ocr Archives - Magenaut

Pytesseract OCR multiple config options

August 21, 2022 by Magenaut

I am having some problems with pytesseract. I need to configure Tesseract so that it accepts single digits and recognizes only numbers, since the number zero is often confused with an ‘O’.
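As a rough illustration of combining several options, here is a minimal sketch that builds one config string and passes it to pytesseract. The helper names `build_digit_config` and `read_single_digit` are hypothetical; `--psm 10` (treat the image as a single character), `--oem` and `tessedit_char_whitelist` are standard Tesseract options, though how well the whitelist is honored can depend on the installed Tesseract version and engine mode.

```python
def build_digit_config(psm=10, oem=3, whitelist="0123456789"):
    """Combine several Tesseract options into a single config string.

    psm 10 asks Tesseract to treat the image as one character;
    the whitelist restricts recognition to the given characters.
    """
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def read_single_digit(image):
    """Run OCR on an image that is expected to contain a single digit.

    `image` is assumed to be something pytesseract accepts
    (a PIL image, a NumPy array, or a file path).
    """
    # Imported lazily so build_digit_config() works without pytesseract installed.
    import pytesseract

    text = pytesseract.image_to_string(image, config=build_digit_config())
    return text.strip()
```

All options go into one space-separated string, so further `-c name=value` pairs can be appended in the same way.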

How to clean images before OCR with Python OpenCV?

August 21, 2022 by Magenaut

I’ve been trying to clean images for OCR (removing the lines):

Simple Digit Recognition OCR in OpenCV-Python

August 20, 2022 by Magenaut

I am trying to implement a “Digit Recognition OCR” in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

Split text lines in scanned document

August 18, 2022 by Magenaut

I am trying to find a way to split the lines of text in a scanned document that has been adaptively thresholded. Right now, I am storing the pixel values of the document as unsigned ints from 0 to 255. I take the average of the pixels in each row, group consecutive rows into ranges based on whether the average of the pixel values is larger than 250, and then take the median of each range for which this holds as a split point. However, this method sometimes fails, as there can be black splotches on the image.
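The approach described above can be sketched in plain Python as follows. This is only an illustration of the row-average idea, not the poster's actual code: the function name `find_line_splits` and the list-of-rows image representation are assumptions, and the threshold of 250 is taken from the description.

```python
def find_line_splits(rows, blank_threshold=250):
    """Return the row indices at which to split a page into text lines.

    `rows` is a list of non-empty pixel rows, each a list of ints in 0..255.
    A row counts as blank when its mean exceeds `blank_threshold`.
    For every run of consecutive blank rows, the middle row (the median
    of the run's indices, rounded down) is reported as a split point.
    """
    splits = []
    run_start = None  # index of the first row of the current blank run
    for y, row in enumerate(rows):
        is_blank = sum(row) / len(row) > blank_threshold
        if is_blank and run_start is None:
            run_start = y
        elif not is_blank and run_start is not None:
            splits.append((run_start + y - 1) // 2)
            run_start = None
    if run_start is not None:
        # The page ends inside a blank run.
        splits.append((run_start + len(rows) - 1) // 2)
    return splits


# Tiny synthetic page: 2 blank rows, 3 text rows, 3 blank rows, 2 text rows, 1 blank row.
page = [[255] * 4] * 2 + [[0, 255, 0, 255]] * 3 + [[255] * 4] * 3 + [[0] * 4] * 2 + [[255] * 4]
print(find_line_splits(page))  # [0, 6, 10]
```

Because the test is a per-row mean, a dark splotch inside a gap can pull that row below the threshold and split one blank run in two, which is consistent with the failure described.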

Getting the bounding box of the recognized words using python-tesseract

August 12, 2022 by Magenaut

I am using python-tesseract to extract words from an image. This is a Python wrapper for Tesseract, which is an OCR engine.

How to OCR a PDF file and get the text stored within the PDF?

August 7, 2022 by Magenaut

First, apologies if this has been asked before – I searched for a while through the existing posts, but could not find support.