How do I convert images of a page of text to pure Black & White and sharpen the text?

I’m a research student working with large numbers of archival documents. For speed when at the archives, I photograph the documents (with a decent but not fancy digital camera), producing colour .jpg files. For ease of reading, and to improve detail when printing some of the documents, I would like to convert the images to pure Black and White – not Greyscale – and enhance / sharpen the text. Ideally, I would like the finished image to look more like a photocopied page, with the background, any shadows etc faded as much as possible to improve contrast with the text. I’ve tried various things but can’t quite get it there. Apologies if there’s an obvious method I’ve missed – I’m a GIMP novice. See the sample image for what I’m working with.
enter image description here


Method 1

Adjusting brightness/contrast is not easy due to the uneven lighting.

First, to avoid color fringes, work on a grayscale version of the image, using either Image>Mode>Grayscale or Colors>Desaturate. If you are using GIMP 2.10, you can also set Image>Precision to 32-bit floating point/linear.

Then you apply the following technique(*) to even the lighting:

  • duplicate the image layer
  • apply a Gaussian blur that is sufficient to make the text disappear completely (around 50px on your image)
  • set the top layer to Grain extract mode
  • create a new layer with the result: Layer>New from visible

In the resulting layer, the background is gray (around 50%) but is a more uniform gray. In the histogram, this is the big spike. You can then use the Levels tool comfortably to optimize the result. In the “input” settings:

  • right handle slightly left of the middle of the big spike (anything to its right becomes completely white)
  • left handle to where the histogram seems to cease (anything to its left becomes completely black)
  • middle handle adjusted to optimize contrast
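As a rough sketch of what the three input handles do to a single pixel value, here is an assumed formulation of a Levels-style mapping. It is not GIMP's actual code; the function name `levels`, the 0.0 to 1.0 value range, and the convention that the middle handle acts as a gamma exponent are illustrative assumptions.

```python
def levels(value, black, white, gamma=1.0):
    """Map one pixel value (0.0 to 1.0) through a Levels-style adjustment.

    black: position of the left handle (values at or below it become 0.0)
    white: position of the right handle (values at or above it become 1.0)
    gamma: stand-in for the middle handle; above 1.0 lightens the midtones
    """
    if white <= black:
        raise ValueError("white point must be above black point")
    # Stretch the [black, white] range to [0, 1] and clip what falls outside.
    t = (value - black) / (white - black)
    t = min(1.0, max(0.0, t))
    # Bend the midtones; the end points 0.0 and 1.0 stay where they are.
    return t ** (1.0 / gamma)


# Background spike near 0.5, right handle slightly to its left at 0.45:
print(levels(0.50, 0.20, 0.45))  # background pixel is clipped to white
print(levels(0.10, 0.20, 0.45))  # dark text pixel is clipped to black
```

With the right handle placed just left of the background spike, most of the paper ends up pure white, which is why the spike position matters more than the exact value of the middle handle.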

enter image description here

The text on the other side of the page shows through, which somewhat limits how far you can stretch the contrast. Next time you take these pictures, bring a dark sheet of paper (ideally black) and insert it under the page you are shooting.

(*) To explain a bit:

  • With the Gaussian blur a pixel value is replaced by the lightness of the area around it (the blur is assumed to be sufficiently wide to make the influence of local details such as text negligible)
  • The grain extract, which is basically a subtraction, subtracts the average lightness of the area from the pixels in the initial image:
    • for background pixels, the average value of the background around them is removed, so whatever the initial background lightness, the result is close to zero (actually, 50% gray, since Grain extract adds a bias to the result),
    • for subject pixels (which are normally quite different from the background) the difference is far from 0 and they remain visible.
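The explanation above can be sketched in a few lines of plain Python. This is a simplified illustration, not what GIMP runs internally: a box blur (plain neighbourhood mean) stands in for the Gaussian blur, the image is a list of rows with values from 0.0 to 1.0, and the function names are made up for the example. The grain extract step is taken to be "base minus blurred plus a 0.5 bias", as described in the text.

```python
def box_blur(img, radius):
    """Replace each pixel by the mean of its neighbourhood (clamped at borders).

    Stand-in for the Gaussian blur: any blur wide enough to wash out the
    text gives an estimate of the local background lightness.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def grain_extract(base, blurred, bias=0.5):
    """Subtract the blurred layer from the base and add a mid-gray bias."""
    return [
        [min(1.0, max(0.0, b - s + bias)) for b, s in zip(row_b, row_s)]
        for row_b, row_s in zip(base, blurred)
    ]


def flatten_lighting(img, radius):
    """Duplicate, blur, grain-extract: the steps of Method 1 in one call."""
    return grain_extract(img, box_blur(img, radius))


# A page that gets brighter from left to right, with one dark "text" pixel.
page = [[0.5 + 0.04 * x for x in range(9)] for _ in range(9)]
page[4][4] = 0.1
flat = flatten_lighting(page, radius=2)
print(round(flat[0][0], 2), round(flat[0][8], 2), round(flat[4][4], 2))
```

The two background corners, which differ strongly in the input, both land close to 0.5 in the output, while the text pixel stays far below it. The small remaining offset at the corners comes from the blur window being cut off at the image border.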

Method 2

Simple contrast boost will not work because the light is uneven.

The uneven light can be to some degree flattened with high pass filtering. (It’s not perfect because missing local contrast will not be fixed). Here’s a split view scene from the high pass filter:

enter image description here

Applying curves can be used to increase the contrast. It unfortunately also boosts color differences and all unwanted crap at the edges:

enter image description here

But the edges can be painted white or clipped off and the colors can be desaturated (not done here):

enter image description here

The “Threshold” filter makes everything strictly black and white. I found that a steep curve, which retains some shades of gray, gives better readability.
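To illustrate the difference, here is a small sketch comparing a hard threshold with a steep linear curve on a row of pixels that crosses the soft edge of a letter. The function names, the cut-off of 0.5 and the ramp limits 0.4 and 0.6 are arbitrary values chosen for the example, not settings taken from the screenshots.

```python
def threshold(value, cut=0.5):
    """Hard threshold: every pixel becomes pure black or pure white."""
    return 1.0 if value >= cut else 0.0


def steep_curve(value, lo=0.4, hi=0.6):
    """Steep linear ramp: black below lo, white above hi, gray in between."""
    t = (value - lo) / (hi - lo)
    return min(1.0, max(0.0, t))


# One row of pixels running from inside a letter out to the paper.
edge = [0.20, 0.35, 0.45, 0.55, 0.70, 0.85]
print([threshold(v) for v in edge])
print([round(steep_curve(v), 2) for v in edge])
```

The threshold turns the soft edge into a single hard step, while the steep curve keeps a couple of intermediate grays along the edge, which is what makes the letter shapes look smoother at reading size.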

Desaturating at the beginning is useful in your case, because it prevents the color boosting that the contrast increase can cause (OCR is different: there, desaturating increases errors). Colored dirt, for example, can only be targeted while the image is still in color. In addition, you wanted the result in B&W.

This still isn’t the best possible apparent contrast. Darker text can be obtained with edge detection. Unfortunately it produces a negative image, but the result can be inverted in the same step in which the final contrast is stretched to the limit with the curves. In the next example the image was first desaturated, then filtered with Sobel edge detection; the screenshot shows the Curves tool in use:

enter image description here

Note: The paper edges stayed quite clean without clipping anything.
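For readers who want to see why the edge-detected image comes out negative, here is a minimal sketch of a Sobel gradient magnitude followed by an invert-and-stretch step. It is written in plain Python on a list of rows with values from 0.0 to 1.0; the function names are made up for the example, border pixels are simply left at zero, and the stretch is a plain division by the maximum rather than a hand-drawn curve.

```python
import math


def sobel_magnitude(img):
    """Gradient magnitude with the standard 3x3 Sobel kernels.

    Flat areas give 0 (black), edges give large values (bright),
    hence the negative look. Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]
            )
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]
            )
            out[y][x] = math.hypot(gx, gy)
    return out


def invert_and_stretch(edges):
    """Scale so the strongest edge is 1.0, then invert: edges become black."""
    peak = max(max(row) for row in edges)
    if peak == 0:
        return [[1.0] * len(row) for row in edges]
    return [[1.0 - v / peak for v in row] for row in edges]


# Light paper (0.8) with a one-pixel-wide dark vertical stroke (0.2).
page = [[0.8, 0.8, 0.2, 0.8, 0.8] for _ in range(5)]
result = invert_and_stretch(sobel_magnitude(page))
print([round(v, 2) for v in result[2]])
```

In this toy case the two sides of the stroke come out black while its centre column stays white, because the Sobel operator responds to changes in lightness rather than to darkness itself. On real scans the strokes are only a few pixels wide, so the two outlines largely merge.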

Enhancing the letters needs some pattern matching filtering that knows this is typewritten text and replaces the letters with perfect ones. That’s OCR.

I believe at least Google likes to read your documents and do the OCR thing for you. As installable software there are commercial packages and FreeOCR.

I cleaned the edges of the output image and dropped it to FreeOCR. Here’s the result.

enter image description here

All extra spaces are removed. There’s no control over it in FreeOCR.

You can see that a full check is a must, but it can still be faster than retyping. The next question is: what file preprocessing is actually needed for successful OCR? Probably not the highest apparent contrast.

I made some tests. It became clear that the highest possible apparent contrast causes errors. The best results needed:

  • no desaturation: many pieces of crap are colored differently from the text, and OCR can reject them more easily in color
  • high pass filtering to remove overall lightness variations
  • contrast boost to fade the ghost writing that shows through the paper. If the ghost has a clearly different color, it is not useful to fade it completely, because fading also deletes something from the wanted writing.
  • cropping off all extra areas

Here’s a screenshot:

enter image description here

Method 3

This is a complement to my other answer, using new capabilities in Gimp 2.10

  1. De-saturate the image as above
  2. Use Filters>Enhance>Wavelet Decompose
  3. Fill the “Residual” layer with a uniform color. Two ways:
    • Filters>Blur>Pixelize and set the “pixel” size to the image size
    • Use the color picker with a very large radius where the image seems to have the average luminosity, and bucket-fill
  4. Layer>New from visible to create a new layer (make sure it is on top of the stack, outside the layer group created by the wavelet decomposition)
  5. You can then use Brightness/Contrast with very few side effects
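The idea behind these steps can be sketched on a one-dimensional row of pixels. This is a simplified stand-in and not GIMP's Wavelet Decompose filter: it uses moving averages of growing width, plain subtraction without any gray bias, and hypothetical function names. What it shows is that the detail layers plus the residual add back up to the original, and that replacing the residual with a uniform value removes the slow lighting change while keeping the text.

```python
def blur1d(signal, radius):
    """Moving average with the window cut off at both ends."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, scales=3):
    """Split into detail layers (fine to coarse) plus a smooth residual."""
    details = []
    current = list(signal)
    for k in range(scales):
        smooth = blur1d(current, 2 ** k)
        details.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    return details, current


def recompose(details, residual):
    """Add all detail layers back onto the residual."""
    out = list(residual)
    for layer in details:
        out = [o + d for o, d in zip(out, layer)]
    return out


def flatten_residual(residual):
    """Fill the residual with its own average, like a uniform color fill."""
    mean = sum(residual) / len(residual)
    return [mean] * len(residual)


# A row that brightens from left to right, with one dark "text" pixel.
row = [0.5 + 0.02 * i for i in range(16)]
row[8] = 0.1
details, residual = decompose(row)
flat = recompose(details, flatten_residual(residual))
print(round(row[15] - row[0], 3), round(flat[15] - flat[0], 3))
```

The second printed number is smaller than the first: the left-to-right brightness difference shrinks once the residual is flattened, while the dark pixel remains the darkest value in the row.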

Method 4

I did this by:

Image->Mode->Indexed then select “Use black and white (1 bit) palette”

(Gimp 2.1)

All methods was sourced from or, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
