Posted: 23 Aug 2019 10:28 EDT Last activity: 23 Aug 2019 17:00 EDT
Can Document OCR not get text out of a scanned pdf?
I have tried processToXml and ProcessToPdf, tried putting ProcessToPdf before each of these, and tried everything with and without ocrImagesAndText set to true. Everything just returns false. I am trying to get text out of a PDF produced by scanning a paper document, but there are even some PDFs that the regular PDF connector can read and Document OCR cannot, unless I just cannot sort out how to use it. I can make it get text from images in Word documents, and it can get text out of a PDF I make with print to PDF, so I know I am not doing everything wrong. Can this component actually not get text from a scanned PDF?
It can, but low scan quality may be the reason it fails on your file. I would suggest opening a support request so that they can examine your specific PDF, unless you can attach it here for the community to examine.
Forgot to say: that code uses the Acrobat SDK, so it is only helpful to people with Acrobat Pro DC. We have to bring a freer, easier solution to the people, Tsasnett Sir. The Acrobat SDK is rough and not free.
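In that spirit, here is a free first check that needs no SDK. It is not OCR, and it has nothing to do with Document OCR or the Acrobat SDK. It is only a rough sketch, using the Python standard library, that guesses whether a PDF is an image-only scan or already carries a text layer, which may help explain why a print-to-PDF file works and a scanned one does not. The function names are made up for illustration, and the heuristic (look for font resources versus image objects, also inside Flate-compressed streams) is an assumption that can be fooled, for example by encrypted files or by scans that were already OCRed.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" in a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def _searchable_bytes(pdf_bytes):
    """Return the raw file bytes plus any streams that inflate cleanly.

    PDF 1.5+ files may keep their dictionaries inside Flate-compressed
    object streams, so the raw bytes alone can hide /Font entries.
    """
    parts = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            parts.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            # Not Flate data (e.g. a JPEG image stream); skip it.
            pass
    return b"\n".join(parts)


def classify_pdf(pdf_bytes):
    """Heuristic guess only: does this PDF look like it has a text layer?"""
    data = _searchable_bytes(pdf_bytes)
    has_fonts = re.search(rb"/Font\b", data) is not None
    has_images = re.search(rb"/Subtype\s*/Image\b", data) is not None
    if has_fonts:
        return "text layer likely present"
    if has_images:
        return "image-only scan likely"
    return "undetermined"
```

To try it, read the file in binary mode and pass the bytes in, e.g. `classify_pdf(open(path, "rb").read())` with your own `path`. If the result is "image-only scan likely", the file has no text for a plain PDF reader to extract, so everything depends on the OCR step and on the scan quality.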