Posted: December 5, 2019
Last activity: December 13, 2019
Closed
How to extract the content from an email attachment using OCR component?
Hi,
We have a requirement to extract the content from an email attachment, which is a PDF, and create a case based on that content. Do we need any additional configuration in the Email channel?
Hi Santosh, the Email channel has a configuration option to enable attachment analysis. Attachments can be analyzed with or without OCR.
With OCR - You need to install the ABBYY core processor on the application server. More info here - https://community.pega.com/knowledgebase/articles/conversational-channels/installing-pega-ocr-component
Without OCR - When you do not have OCR installed, attachments (pdf, doc, xls, etc.) are analyzed using Java libraries, which do a pretty decent job of text extraction. The extracted text is then passed to NLP for intelligent routing. More info here - https://community.pega.com/knowledgebase/release-note/support-extracting-data-file-attachments-during-email-triage
Attachment analysis in the Email channel, in depth -
https://community.pega.com/sites/default/files/help_v74/procomhelpmain.htm#mcp/tasks/mcp-enabling-analysis-attached-file-during-email-triage-tsk.htm
(You can find a similar document for every version of Pega Platform.)
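To make the two paths above a bit more concrete, here is a minimal sketch of the decision in Python. It is only an illustration of the idea, not how Pega implements it: the function name `choose_extraction_path`, the return labels, and the extension set are all hypothetical, and the set lists only the formats mentioned in this reply.

```python
from pathlib import Path

# Hypothetical: only the formats named in the reply above (pdf, doc, xls).
# The real list of supported formats is in the Pega documentation.
ANALYZABLE_EXTENSIONS = {".pdf", ".doc", ".xls"}


def choose_extraction_path(filename: str, ocr_installed: bool) -> str:
    """Pick how an attachment's text would be extracted (illustrative only).

    Returns "ocr" when an OCR component is available, "library" when text
    is pulled out by ordinary document-parsing libraries, and "skip" when
    the attachment type is not in the analyzable set.
    """
    extension = Path(filename).suffix.lower()
    if extension not in ANALYZABLE_EXTENSIONS:
        return "skip"
    if ocr_installed:
        return "ocr"
    return "library"


if __name__ == "__main__":
    for name in ("claim_form.pdf", "notes.doc", "logo.bin"):
        print(name, "->", choose_extraction_path(name, ocr_installed=False))
```

Either way, the extracted text is what gets handed to NLP afterwards, so case creation does not depend on which path produced the text.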