Posted: April 8, 2019
Last activity: April 15, 2019
Closed
Pega IVA: Email Attachments: OCR extraction from Word, Excel, and PowerPoint file types
According to the OCR support article, it is possible to extract data from email attachments across several file types, including PDFs.
Is it also possible to extract data from Word, Excel, PowerPoint, and other Office documents?
Dear Sarangan,
You may want to take a look at pySetTextExtractionCapabilities, which controls the supported document types. It is not limited to the OCR component; in other words, if you want to retrieve text from docx or xlsx files, you can.
However, if your scenario is an xlsx document containing a picture that you want to OCR, that is not supported. The OCR component works with images (e.g. jpg), and it can process PDF files as containers for images.
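To check whether a given Office attachment falls into this unsupported case, one option is to look inside the file before sending it on. The sketch below is plain Python, not Pega code, and the function name embedded_images is only an illustration. It relies on the fact that docx, xlsx, and pptx files are ZIP packages that store embedded pictures under a media folder; it does not apply to the legacy doc, xls, or ppt formats.

```python
import zipfile

# Folders where Office Open XML packages (docx, xlsx, pptx) keep embedded pictures.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/")


def embedded_images(path):
    """Return the names of pictures embedded in a docx, xlsx, or pptx file.

    `path` may be a file path or a binary file-like object.
    Hypothetical helper for illustration; not part of any Pega API.
    """
    with zipfile.ZipFile(path) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(MEDIA_PREFIXES)
        ]
```

An empty list suggests the attachment contains no embedded pictures, so plain text extraction should cover it. A non-empty list means some of the content is in pictures, which text extraction will not read.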
Hope it helps.
Best regards, Mariusz