Sandip's Programming Zen

An attempt to share tech/coding experiences

Convert scanned pdf into image in C#

I needed to convert scanned PDFs into an image format so that I could run OCR on them for one of the DMS solutions I am working on. I am using Tesseract (an open-source OCR engine from Google) for the OCR step, and it only accepts images as input. After spending a few hours searching and trying a few solutions, I zeroed in on this CodeProject article, which uses Ghostscript for the core functionality.
So far this is the only solution I have found that works seamlessly on most Windows platforms, including Windows 7 (already tested on XP, Windows Server 2003, and Windows 7).
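
To give a rough idea of what the conversion involves, here is a minimal sketch that shells out to the Ghostscript command-line executable from C#. This is not the code from the CodeProject article; it is just one simple way to drive Ghostscript, assuming Ghostscript is installed and you know the path to its console executable (`gswin32c.exe` or `gswin64c.exe` on Windows). The class name `GhostscriptRasterizer`, its method names, and the choice of the `png16m` device at 300 dpi are my own assumptions for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not from the CodeProject article): rasterizes a PDF
// by launching the Ghostscript console executable as a child process.
public static class GhostscriptRasterizer
{
    // Builds the Ghostscript argument string. Kept separate from the process
    // launch so the arguments can be inspected without running anything.
    public static string BuildArguments(string pdfPath, string outputPattern, int dpi, string device)
    {
        // -dSAFER    restricts file access from within the PDF/PostScript
        // -dBATCH    exits after processing instead of waiting for input
        // -dNOPAUSE  does not pause between pages
        // -sDEVICE   output device (image format), e.g. png16m
        // -r         resolution in dpi
        return string.Format(
            "-dSAFER -dBATCH -dNOPAUSE -sDEVICE={0} -r{1} \"-sOutputFile={2}\" \"{3}\"",
            device, dpi, outputPattern, pdfPath);
    }

    // Runs Ghostscript and returns its exit code (0 means success).
    // gsExePath is an assumption: the full path to gswin32c.exe or gswin64c.exe.
    public static int Convert(string gsExePath, string pdfPath, string outputPattern, int dpi, string device)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = gsExePath,
            Arguments = BuildArguments(pdfPath, outputPattern, dpi, device),
            UseShellExecute = false, // launch directly, no shell involved
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

Calling `GhostscriptRasterizer.Convert(gsExePath, "scan.pdf", "page-%03d.png", 300, "png16m")` should write one numbered image per page, since Ghostscript expands `%03d` in the output file name to the page number. Which device and resolution work best for OCR depends on your Tesseract version and the quality of the scans, so treat those values as a starting point.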

Written by Sandip

March 23, 2010 at 7:16 am

Posted in .Net
