Jakub Lebeda (JakubL28)
PricewaterhouseCoopers Advisory SRO

JakubL28 Member since 2018 1 post
Posted: 16 Nov 2018 8:45 EST
Last activity: 15 Apr 2019 5:43 EDT

DocumentOCR - ProcessToText method output

I am trying to extract text from a scanned receipt using DocumentOcr's ProcessToText method. The method returns a string containing two separator characters (apart from ordinary spaces), \u2028 and \u2029, which should help format the string to match the layout of the scanned file. However, I cannot figure out how to achieve that, and the method seems to process different parts of the image at random: not by lines, not by columns, and not by any other recognizable regions.

Attached is one example of the input for the method (test_ocr.jpg) along with a text file containing the output (output.txt), in which a new line and ";" stand in for the two separator characters. As you can see, there is no apparent order in which the method processes the text.

Is there a way to format the processed data to recreate (at least) relative positions of text from the original image?
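
For reference, Unicode defines \u2028 as LINE SEPARATOR and \u2029 as PARAGRAPH SEPARATOR. The following is a minimal Python sketch of how a string containing these two characters could be broken into blocks and lines for inspection. It assumes, without confirmation from the product documentation, that ProcessToText uses the characters in their Unicode sense; the function names and the sample string are hypothetical. Note that this only exposes the grouping already present in the string. It cannot recover positions on the page, because the string carries no coordinates.

```python
LINE_SEP = "\u2028"       # Unicode LINE SEPARATOR
PARAGRAPH_SEP = "\u2029"  # Unicode PARAGRAPH SEPARATOR


def split_ocr_text(text):
    """Split text into blocks (on U+2029), each a list of lines (on U+2028).

    Assumption: the OCR output uses the two characters in their Unicode sense.
    Empty lines and empty blocks are dropped.
    """
    blocks = []
    for raw_block in text.split(PARAGRAPH_SEP):
        lines = [line.strip() for line in raw_block.split(LINE_SEP)]
        lines = [line for line in lines if line]
        if lines:
            blocks.append(lines)
    return blocks


def make_visible(text):
    """Replace the separators with ordinary newlines so the text can be read.

    A paragraph separator becomes a blank line, a line separator a newline.
    """
    return text.replace(PARAGRAPH_SEP, "\n\n").replace(LINE_SEP, "\n")


if __name__ == "__main__":
    # Hypothetical sample, not taken from the attached output.txt.
    sample = "Shop Name\u2028Main Street 1\u2029TOTAL\u202812.50"
    for index, block in enumerate(split_ocr_text(sample)):
        print(f"block {index}: {block}")
    print(make_visible(sample))
```

Printing the blocks one by one makes it easier to see the order in which the recognized regions were emitted, which is a first step toward checking whether that order follows any pattern in the image.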

Robotic Process Automation