Question
Pega RPA - How to extract table content from standard PDF
Hi Everyone,
Objective: To extract table content in a readable, structured form from a standard PDF using the Pega Robotics Studio application.
Requirements: The bot receives a table title '' to search for in a given standard PDF. It must extract the whole table into a variable, an Excel file, or similar, in a form that can be read column by column.
Attachments:
- file name: 'Table_automation.png' shows our automation, which searches for the title and reads the text line by line. Issues: it keeps reading until the end of the PDF and extracts every line, because it cannot tell where the table ends. The bot also extracts a paragraph that has the same title.
- file name: 'table_content_output.png' is the bot's output for the PDF table. Everything lands in one cell of the Excel file instead of being structured as a table, so the content is hard to read.
- file name: 'Table_PDF.png' is the original table from the PDF that the bot is supposed to extract.
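Both issues (reading to the end of the PDF, and picking up a paragraph with the same title) can be bounded if a line only counts as the table title when the next line looks like the table's column header, and reading stops at an end marker. The sketch below illustrates this in Python rather than as Pega Robotics automation. The helper name `find_table_lines`, the header pattern, the blank-line end marker, and the sample lines are all hypothetical assumptions, not taken from the attachments.

```python
import re


def find_table_lines(lines, title, header_pattern, end_pattern=r"^\s*$"):
    """Return the header and body lines of the table that follows `title`.

    A line only counts as the table title if it equals `title` exactly AND
    the next line matches `header_pattern`, so a paragraph that reuses the
    title is skipped. Reading stops at the first line matching `end_pattern`
    (by default a blank line) instead of running to the end of the document.
    """
    header_re = re.compile(header_pattern)
    end_re = re.compile(end_pattern)
    for i, line in enumerate(lines):
        has_header = i + 1 < len(lines) and header_re.search(lines[i + 1])
        if line.strip() == title and has_header:
            table = []
            for row in lines[i + 1:]:
                if end_re.search(row):
                    break  # end marker reached: stop reading
                table.append(row)
            return table
    return []  # title not found, or never followed by a header row


# Hypothetical text lines, as a PDF connector might return them.
text_lines = [
    "Table 3",
    "This paragraph reuses the same title.",
    "",
    "Table 3",
    "Code  Description     Limit",
    "A1    Basic cover     1000",
    "A2    Extended        5000",
    "",
    "Notes follow here.",
]

print(find_table_lines(text_lines, "Table 3", r"^Code\s+Description"))
```

The first "Table 3" is skipped because the line after it is prose, not the header; the second one returns the header plus two rows and stops at the blank line.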
We are using the Pega Robotics PDF connectors to extract text. We would really appreciate any possible solution for extracting the table.
Thanks
Without your PDF attached, I can't really validate this, but you seem to be on the right track. In the newer builds of 19.1 (19.1.14 or later), the PDF reader has been enhanced and can handle tables with intersecting lines. In this case, though, there are no intersecting lines to speak of, so I would guess you'd need to read it line by line. From what I can see, the second line of each row follows a pattern, so you could use a regular expression to check whether a new line starts a new row or is the second line of the current row. Depending on your thresholds, you might also be able to count the segments in each line: if a line is missing the first part, it is a continuation of the current row rather than a new row.
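A minimal sketch of the two checks described above, written in Python for illustration rather than as Pega Robotics code. It assumes the PDF connector returns one string per text line with column gaps preserved as two or more spaces, and that each row begins with an ID such as `A1`. The `NEW_ROW` pattern, the helper names, and the sample lines are hypothetical; substitute whatever pattern your table's first column actually follows.

```python
import re

# Assumption: column gaps in the extracted text are runs of 2+ spaces.
SEGMENT_SPLIT = re.compile(r"\s{2,}")

# Hypothetical pattern: every new row begins with an ID such as "A1" or "B12".
NEW_ROW = re.compile(r"^[A-Z]\d+\b")


def segments(line):
    """Split a text line into its column segments."""
    return [s for s in SEGMENT_SPLIT.split(line.strip()) if s]


def is_new_row(line, column_count):
    """Return True if the line starts a new row, False for a continuation.

    Check 1 (regex): the line starts with the row-ID pattern.
    Check 2 (segment count): a continuation line lacks the first column,
    so it has fewer segments than the table has columns.
    """
    if NEW_ROW.match(line.strip()):
        return True
    return len(segments(line)) >= column_count


lines = [
    "A1   Basic cover      1000",
    "     incl. fire",
    "A2   Extended cover   5000",
]
for line in lines:
    print(is_new_row(line, 3), segments(line))
```

The second line fails the ID pattern and has only one segment out of three, so it is treated as the second line of row `A1`. Either check can be used on its own if only one of them fits your table.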
Once you can determine where you are in the table, you can add each "part" to a lookup table. You can then call the ImportTable method to place it wherever you want in Excel. If the table and your Excel document are not in the same order, you'll need to locate the record in your lookup table, get the values you need, and set them cell by cell.
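The lookup-table idea can be sketched in Python, with a list of dictionaries standing in for the Pega lookup table and a CSV string (which Excel can open) standing in for `ImportTable`. This is an illustration under assumptions, not Pega's API. It places each text segment into a column by its character offset relative to the header line, and treats a line with nothing in the first column as a continuation of the previous row. The helper names and sample data are hypothetical.

```python
import csv
import io
import re

# A segment is a run of words separated by single spaces; column gaps are
# assumed to be two or more spaces in the extracted text.
SEGMENT = re.compile(r"\S+(?: \S+)*")


def column_of(offset, starts, tolerance=2):
    """Index of the right-most column whose header starts at or before `offset`."""
    index = 0
    for i, start in enumerate(starts):
        if start <= offset + tolerance:
            index = i
    return index


def build_lookup(header, lines):
    """Group text lines into rows keyed by the header's column names."""
    names = [m.group() for m in SEGMENT.finditer(header)]
    starts = [m.start() for m in SEGMENT.finditer(header)]
    rows = []
    for line in lines:
        cells = {}
        for m in SEGMENT.finditer(line):
            idx = column_of(m.start(), starts)
            cells[idx] = f"{cells.get(idx, '')} {m.group()}".strip()
        if not cells:
            continue  # skip blank lines
        if 0 in cells or not rows:
            # First column has text: this line starts a new row.
            rows.append({name: cells.get(i, "") for i, name in enumerate(names)})
        else:
            # No first-column text: append each part to the previous row.
            for i, text in cells.items():
                rows[-1][names[i]] = f"{rows[-1][names[i]]} {text}".strip()
    return names, rows


def find_row(rows, column, value):
    """Locate a record by key, e.g. when Excel rows are in a different order."""
    return next((row for row in rows if row[column] == value), None)


def to_csv(names, rows):
    """Serialize the rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


header = "Code  Description     Limit"
lines = [
    "A1    Basic cover     1000",
    "      incl. fire",
    "A2    Extended        5000",
]
names, rows = build_lookup(header, lines)
print(to_csv(names, rows))
print(find_row(rows, "Code", "A2")["Limit"])
```

`find_row` covers the case where the table and the Excel document are in different orders: look up the record by its key, then write the individual values cell by cell. Matching on character offsets with a small tolerance is one design choice among several; if the PDF connector does not preserve spacing reliably, the regex-based row check is likely the safer signal.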