Just looking for advice/suggestions on OCR integration. Our use case: we want to scan an image (a driver's license) and pre-populate the application form from the scanned document. I'm looking for which software has been used for this, whether it is possible, etc.
I've successfully built out a proof-of-concept level integration with Tesseract via the Tess4J library. It's well beyond out-of-the-box Pega functionality, but Tesseract is one of the most efficient and accurate OCR implementations available... and it's open source.
Since Tesseract is a native library (compiled code, a .dll or .so binary), it must be installed on the server machine, and the installation varies by OS and processor architecture. The final result is a system that looks something like this, where parentheses represent "wrapping" or calling:
Data Transform(Function Rule(Java Class(JNA(Tesseract Library(image file)))))
Tess4J handles the Java Class and JNA portions of the architecture, which would be the most challenging pieces for a Pega Developer. Prior to calling the function, the image file needs to be written to disk in a location accessible to the Tesseract library. For attachments, this means they need to be extracted from the database and written as a java.io.File object. Once OCR has been performed, the file can be deleted.
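As a minimal sketch of that temp-file lifecycle: the class and method names below are my own, and the Tess4J calls are shown commented out because they only run once the native Tesseract binary is installed on the server.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class AttachmentOcr {

    // Writes the extracted attachment bytes to a temp file accessible to
    // Tesseract, would run OCR on it, then deletes the file afterwards.
    public static String ocrAttachment(byte[] attachmentBytes) {
        File tmp = null;
        try {
            tmp = File.createTempFile("ocr-", ".png");
            Files.write(tmp.toPath(), attachmentBytes);

            // With Tess4J on the classpath (net.sourceforge.tess4j):
            // ITesseract tesseract = new Tesseract();
            // tesseract.setDatapath("/usr/share/tessdata"); // path is an assumption
            // return tesseract.doOCR(tmp);

            return ""; // placeholder so this sketch compiles without Tess4J
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                tmp.delete(); // clean up once OCR has been performed
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(ocrAttachment(new byte[] {1, 2, 3}));
    }
}
```

In Pega terms, this is the kind of logic the Java Class layer would hold, invoked from a Function rule fed by the Data Transform.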
If only a portion of the file needs to be read, Tesseract supports passing in the coordinates of a rectangle representing the area of the image to be read. Alternatively, the image could be cropped prior to running OCR on it.
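The cropping alternative can be sketched with plain Java (the class name and coordinates here are illustrative; Tess4J also exposes a `doOCR(File, Rectangle)` overload for the first approach):

```java
import java.awt.image.BufferedImage;

public class OcrCrop {

    // Returns just the region of interest, e.g. the name field on a
    // driver's license, so OCR only runs on that area.
    public static BufferedImage cropRegion(BufferedImage source,
                                           int x, int y, int width, int height) {
        return source.getSubimage(x, y, width, height);
    }

    public static void main(String[] args) {
        // Stand-in for a scanned license; real code would use ImageIO.read(file).
        BufferedImage scan = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        BufferedImage nameField = cropRegion(scan, 40, 120, 300, 60);
        System.out.println(nameField.getWidth() + "x" + nameField.getHeight());
    }
}
```

Either way, narrowing the input to a known region tends to improve both speed and accuracy on structured documents like licenses.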
Unfortunately, I was not able to find much more information than what is discussed in this thread, but there are different options available:
Backend integration with an OCR system (e.g. we place an image file / PDF somewhere, or hand it over via a web service to the OCR system, and receive a structured, machine-readable response back with the results)
Native OCR implementation with a 3rd-party library in Pega (i.e. based on Tesseract / Tess4J, as discussed in this thread)
The final decision will be made during the project's inception phase; I will post an update then, and again once the implementation is underway.
For our vision demo, we simulated the OCR part but showed what the UI would look like with options 1-3.
In this case, a PDF order would already have been OCR'd / parsed and shown to the Order Manager for verification and validation (we show the results on the left-hand side and the original PDF document on the right). This would be the case with options 1, 2, and 3, but the actual capabilities would differ. For example, with options 2 and 3 we would be able to interact with the UI, e.g. select something visually in the PDF and automatically generate a new order item on the left; with option 1 (backend integration only), this would be a manual change at the validation step.