Optical Character Recognition

From Edgar BV Wiki
Revision as of 10:10, 1 December 2023 by Red (talk | contribs)

There are several options for extracting text from images, PDFs and similar files.

Scanning OCR

So you want to be able to scan a document and save it as a text-searchable PDF?

Some printer drivers (e.g. Epson) come with that capability installed, but others don't.

HP

In the HP Scan software (6970, HP Officejet Pro app on Windows 10), open Print, Scan & Fax and go to the Scan submenu. Under Scan a Document or Photo, edit Save as PDF and change Destination File Type from PDF to Searchable PDF. Set Auto Orient, then click the save icon.

In the HP Officejet Pro 7740 software, click Scan a document or photo, select Save as PDF, and choose Searchable PDF from the File Type drop-down menu.

Click Scan to complete the scan job. You can also press the Save button to create a custom shortcut for later use.

ReadIris Pro

This program lets you run scanned PDFs through it and turn them into text-searchable PDFs.
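If the scan is already on disk as an image, a scripted route is also possible. The following is a minimal sketch, assuming the open-source Tesseract OCR engine (the engine that gImageReader is a front-end for) is installed and on PATH, with the needed language data. The function names are hypothetical and only for illustration. Tesseract reads image files such as PNG or TIFF, not PDFs, so a scanned PDF would first have to be exported to images.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a Tesseract run that writes <output_base>.pdf.

    The trailing "pdf" selects Tesseract's searchable-PDF output.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", lang, "pdf"]


def make_searchable_pdf(image_path, lang="eng"):
    """Run Tesseract on one scanned image and return the path of the PDF it writes.

    Assumption: the `tesseract` executable is installed and on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    image_path = Path(image_path)
    output_base = image_path.with_suffix("")  # scan.png -> scan
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    # Tesseract appends ".pdf" to the output base it was given.
    return Path(str(output_base) + ".pdf")
```

Calling make_searchable_pdf("scan.png") would write scan.pdf next to the image. The lang argument must match a language pack that is actually installed, so results for less common languages depend on what is available.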

Extracting text from pictures

Windows

Snipping tool

You can now use the Snipping Tool for this. Press <win>+<shift>+<s> to select an area of the screen to capture. The screenshot appears in the bottom right corner of the screen; click it to open the picture in the Snipping Tool. You can also launch the Snipping Tool and open an image file directly. At the top centre, the button next to crop is 'text actions'. Clicking it lets you select the text you want to copy and paste elsewhere.

OneNote

You can paste a picture into OneNote, right-click it and select extract all text. You can then paste the extracted text into OneNote. This does not let you choose which text to extract.

Alternatives

Alternatives can be found by searching for "textsnatcher alternatives", e.g. TextShot, gImageReader, Capture2Text, Text-Grab, dpScreenOCR.

Android

Android does not support many character sets (e.g. Tigrinya), so it may select the text properly but not paste it properly. I haven't figured out how to fix this yet.
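One way to tell whether pasted text survived is to check which characters actually arrived. The following is a rough sketch, not a fix: it assumes the pasted text has been saved or copied into Python as a string, and it relies on Tigrinya being written in the Ethiopic script, whose Unicode character names start with "ETHIOPIC". The function name is hypothetical.

```python
import unicodedata


def ethiopic_share(text):
    """Return the fraction of non-whitespace characters that are Ethiopic."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    ethiopic = sum(
        1 for ch in chars if unicodedata.name(ch, "").startswith("ETHIOPIC")
    )
    return ethiopic / len(chars)
```

A share close to 1.0 suggests the Tigrinya text was pasted intact, while a share near 0.0 for text that should be Tigrinya suggests the characters were replaced or lost along the way.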

Gallery

If your gallery has Google Lens, a little T will appear on photos that contain text. Clicking it lets you select text. Doing this also sends metadata to the web-based Google Photos, which lets you select the text from there on Windows.

Google Photos

If you used Google Lens in your phone gallery, Google Photos sometimes lets you extract text from the picture. If you click the button for this at the top of the photo view, you can select individual lines. It doesn't always work, though.

Google Lens

You can open a picture in Google Lens and then select individual pieces of text.

Linux

TextSnatcher (limited number of languages), GreenShot, gImageReader, Frog, dpScreenOCR, normcap