
Thread: Extracting content from a scan file

  1. #1
    Join Date
    Nov 2009
    Posts
    792

    Extracting content from a scan file

    Hi,
    I had scanned some papers which were denoting upcoming information on the notice board. Now the pages are scanned in the form of a pdf file. It has certain format which will consume long time to write on Microsoft Word. I want to extract text from it. how to do that. I had heard about a OCR software but I am not familiar about this.

  2. #2
    Join Date
    Mar 2009
    Posts
    1,588

    Re: Extracting content from a scan file

    The easiest option is OCR (optical character recognition) software. The freeware FreeOCR can recover the text from an image of printed text, from a scanned page, and even from a PDF document. Just download the software, open your file in it, and convert it directly to a text file.

  3. #3
    Join Date
    Jan 2008
    Posts
    3,755

    Re: Extracting content from a scan file

    Depending on the quality of your original document, the FreeOCR output may need some corrections, but most characters and words are recognized and you can get the text into a word processor such as Word or OpenOffice. You will then have a text document that you can edit, rework and reuse.

  4. #4
    Join Date
    May 2008
    Posts
    3,316

    Re: Extracting content from a scan file

    Be careful if your document is multilingual, because that can cause problems during extraction. FreeOCR works fine for English; I am not sure about other languages. You need at least the fonts and dictionary support for the document's language installed on your computer so that the software can extract the exact content.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,642

    Re: Extracting content from a scan file

    FreeOCR is a complete scanning software package. It is very simple to use and supports multi-page documents such as faxes, as well as most image types, including compressed TIFF. It currently supports scanning from TWAIN devices. It can scan the pages of a book and turn them into editable text. Its OCR engine (Tesseract) is an open-source product released by Google.

  6. #6
    Join Date
    May 2008
    Posts
    4,570

    Re: Extracting content from a scan file

    I have noticed that converting a document with OCR software can produce a lot of junk in the text file: unrecognizable characters such as boxes or other stray symbols. This happens when the software cannot recognize the exact shape of a character in the image. The issue might be related to the font library on the computer.
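    If the junk is mostly box-like symbols, one option is to clean the text file after conversion. Below is a minimal Python sketch of that idea; `clean_ocr_text` is a hypothetical helper, and which characters count as "junk" (the U+FFFD replacement character and the Unicode geometric-shapes block, where box glyphs like □ live) is an assumption for illustration, not something FreeOCR defines. It only removes obvious debris; it cannot restore characters the OCR engine misread.

    ```python
    import re
    import unicodedata

    # Assumed "junk" set for illustration: the U+FFFD replacement character
    # and the geometric-shapes block U+25A0-U+25FF (box glyphs such as U+25A1).
    _JUNK_CHARS = re.compile(r"[\ufffd\u25a0-\u25ff]")

    def clean_ocr_text(text: str) -> str:
        """Hypothetical helper: strip common junk characters from OCR output."""
        # Normalize compatibility forms, e.g. the ligature U+FB01 becomes "fi".
        text = unicodedata.normalize("NFKC", text)
        # Drop replacement characters and box-like glyphs.
        text = _JUNK_CHARS.sub("", text)
        # Drop control, format, private-use and unassigned characters
        # (Unicode category "C*"), but keep newlines and tabs.
        text = "".join(
            ch for ch in text
            if ch in "\n\t" or unicodedata.category(ch)[0] != "C"
        )
        # Collapse runs of spaces left behind by the removals.
        return re.sub(r" {2,}", " ", text)

    if __name__ == "__main__":
        raw = "Notice\ufffd board \u25a1 meeting\x0c on Monday"
        print(clean_ocr_text(raw))  # Notice board meeting on Monday
    ```

    You would run this on the text file FreeOCR produces before pasting the result into Word or OpenOffice; anything the filter misses still needs a manual proofread.
    
    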

