Question
Table data extraction from a PDF
Hi,
I need to extract information from a table in a PDF. For example, when I pass a variable with the value "Customer Id" (which appears in the first column), I need to get the corresponding value from the second column. I used iTextSharp to read the contents of the PDF and got the result back as a single string, then used C# string operations to pull out the required data. The problem is that the label "Customer Id" in the PDF contains a space, but iText does not recognize it: in the extracted text the label comes back as a single word with no space. So when I pass "Customer Id" in a variable, it does not match the label in the extracted text.
What can I do about this?
The latest Pega Robotics 8.0 release has support for reading PDFs (I would suggest any version later than 8.0.1037). If you are using iTextSharp, you will likely need to seek support from that vendor, although someone in the forums may have experience with it and be able to help.
In the Pega Robotics PDF Connector, you would read the contents relative to a known item: locate a specific item first, then read the next segment, word, or line after it. Assuming your table is structured fairly consistently, you can read it piece by piece.
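If you stay with plain C# string handling on the extracted text, the same "locate the label, then read what follows" idea can be combined with a comparison that ignores whitespace, so it no longer matters whether the extractor kept the space in "Customer Id". Below is a minimal sketch, not a definitive implementation. It assumes the extracted text has one table row per line with the label first and the value after it. PdfTableLookup and FindValue are hypothetical names made up for this example; they are not part of iTextSharp or Pega Robotics.

```csharp
using System;
using System.Text;

public static class PdfTableLookup
{
    // Returns the input with every whitespace character removed.
    private static string StripWhitespace(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (!char.IsWhiteSpace(c)) sb.Append(c);
        }
        return sb.ToString();
    }

    // Looks for a line that starts with the label (ignoring whitespace and case)
    // and returns the rest of that line, trimmed. Returns null if no line matches.
    // Assumption: one table row per line, label first, value after it.
    public static string FindValue(string extractedText, string label)
    {
        string key = StripWhitespace(label);
        if (key.Length == 0) return null;

        string[] lines = extractedText.Split(
            new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            int matched = 0; // label characters matched so far
            int i = 0;       // current position in the line

            while (i < line.Length && matched < key.Length)
            {
                char c = line[i];
                if (char.IsWhiteSpace(c)) { i++; continue; } // skip spaces in the line
                if (char.ToUpperInvariant(c) != char.ToUpperInvariant(key[matched])) break;
                matched++;
                i++;
            }

            if (matched == key.Length)
            {
                // Everything after the label is treated as the second column.
                return line.Substring(i).Trim();
            }
        }
        return null;
    }
}
```

Calling PdfTableLookup.FindValue(text, "Customer Id") would then match a row whether the extractor produced the label with or without the space. Note that this is a prefix match, so a longer label that begins with the same characters would also match; if your table has such labels, check the longer ones first.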