Transcript for video
In this video, I’ll show you some ways you can improve the quality of a page before capturing text, and some considerations to keep in mind as you process your content. I’ll do the experiments, so you don’t have to!
Adding to the Quick Tools toolbar
Before I start with scanned pages, here’s a quick tip for making Acrobat work faster for you. Rather than working from the right-hand pane, take a minute and add the tools you need to the Quick Tools toolbar:
- Right-click in the toolbar area and choose Customize Quick Tools.
- Open the toolset you’re going to work with, in this case, Enhance Scans.
- Select the tool you need, such as the Enhance tool.
- Click the + up arrow to add the tool to the toolbar.
- Click Save to close the dialog.
Here’s our new tool, ready to use.
First up, I have a terribly crooked page, something that can happen when you scan. To fix it, click Enhance Scans > Scanned Document. You’ll see the full Enhance Scans toolbar open. Click Settings. The Filters section shows the status of the available repair filters, with the Deskew filter active by default. Leave the filter selected and click Enhance. When you capture the text, Acrobat redraws the page and aligns the text. As a rule, always leave the Deskew filter active.
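If you’re curious what a deskew step involves, here is a minimal Python sketch of one common approach, projection-profile skew estimation. It is not Acrobat’s implementation, which this video doesn’t describe. The function names (`row_profile_score`, `estimate_skew`, `make_skewed_page`) and the tiny synthetic page are assumptions made purely for illustration.

```python
def row_profile_score(pixels, slope):
    """Score how well text lines align with rows after undoing `slope`.

    `pixels` is a list of rows of 0/1 values (1 = ink). `slope` is the
    vertical drift in rows per column. Well-aligned lines concentrate ink
    in few rows, which gives a high sum of squared row totals.
    """
    totals = {}
    for y, row in enumerate(pixels):
        for x, ink in enumerate(row):
            if ink:
                # Shift each ink pixel back along the candidate slope.
                target = y - round(x * slope)
                totals[target] = totals.get(target, 0) + 1
    return sum(count * count for count in totals.values())


def estimate_skew(pixels, candidates):
    """Return the candidate slope that gives the sharpest row profile."""
    return max(candidates, key=lambda s: row_profile_score(pixels, s))


def make_skewed_page(width, height, slope, line_rows):
    """Build a synthetic page whose text lines drift by `slope` rows per column."""
    page = [[0] * width for _ in range(height)]
    for base in line_rows:
        for x in range(width):
            y = base + round(x * slope)
            if 0 <= y < height:
                page[y][x] = 1
    return page


# Hypothetical crooked page: three "text lines" drifting 0.25 rows per column.
page = make_skewed_page(width=40, height=30, slope=0.25, line_rows=[3, 9, 15])
candidates = [i / 20 for i in range(-10, 11)]  # slopes from -0.5 to 0.5
best = estimate_skew(page, candidates)
```

The idea is that when the guessed slope matches the real skew, each line of text collapses into a single row, so the row totals become very uneven. A real deskew filter would then redraw the page at the corrected angle, which this sketch leaves out.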
The next tool we’ll look at is Background Removal, useful for slightly muddy-looking pages, but I’ve got a real test. Here’s a page from a book written in the 1600s. The pages are very dark with poor contrast and a darkened bleed behind the letters. I’ve got a version of my page adjusted in Photoshop to correct the contrast, and another one with the background removed.
I’ll set Background Removal to High and capture the text. If I then activate the suspects and click Review recognized text, you’ll see some significant differences among the three versions.
In the unaltered scan, the text capture is surprisingly good, although there are quite a few suspects and some unidentified errors. The scan from the page with corrected contrast is far better, both in terms of suspects and missed errors. The black-and-white scan is useless, with virtually every text string a suspect.
The takeaway? Background removal works best when the page has a decent amount of contrast.
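To illustrate why contrast matters, here is a small Python sketch of threshold-based background removal on grayscale values (0 = black, 255 = white). It’s a simplified stand-in, not Acrobat’s actual filter. The function names, the threshold of 128, and the sample pixel values are all assumptions chosen for illustration.

```python
def stretch_contrast(values):
    """Linearly rescale grayscale values so they span the full 0-255 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # Flat input: nothing to stretch.
    return [round((v - lo) * 255 / (hi - lo)) for v in values]


def remove_background(values, threshold=128):
    """Turn every pixel at or above `threshold` white and the rest black."""
    return [255 if v >= threshold else 0 for v in values]


# Hypothetical row from a dark, low-contrast page: ink near 40, paper near 90.
dark_row = [90, 85, 40, 45, 88, 42, 90]

# Thresholding directly treats the dark paper as ink: the whole row goes black.
direct = remove_background(dark_row)

# Stretching the contrast first separates ink from paper before thresholding.
stretched_then_removed = remove_background(stretch_contrast(dark_row))
```

In this toy example, the direct result is an all-black row, while the contrast-stretched row keeps the ink pixels black and turns the paper white. That mirrors the takeaway above: a background filter needs a clear gap between ink and paper to work with.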
Next up, let’s look at descreening. Notice on this page that the text is surrounded by little dots, which can occur when you scan newsprint. In the first example, the capture is done without the Descreen filter applied. The capture isn’t too bad, but let’s try again with the Descreen filter.
There really isn’t any improvement in the capture. Descreen can’t improve this scan, although it may work in some cases.
The final filter sharpens text. Here’s my starting document. In the first example, I’ve captured the content without any sharpening. You can see the capture is very poor. In the second example, I’ll set the text sharpening to High and try again.
Isn’t that disappointing? Although the text looks much heavier on the page, there is no improvement in the capture. The filter may work for some files, but you can’t enhance a poor scan.