Process a PDF using NLP Text Analyzer | Support Center

Question

MaleeshaW

Member since 2019

Evonsys

Posted: Apr 17, 2020

Last activity: Apr 20, 2020

Closed

Process a PDF using NLP Text Analyzer

Hi,

I have a requirement to process a PDF and find certain keywords in it using a text analyzer.

I am using Pega v8.1.0.

I have referred to https://community.pega.com/knowledgebase/articles/conversational-channels/84/exploring-text-analyzers and it suggests that text analyzers can only be used with emails, chat text messages, or voice commands.

Is there any workaround that can be used for this?

TIA

***Edited by Moderator: Pallavi to update platform capability tags***

Pega Knowledge 8.1

Prediction Studio

User Experience

System Architect

Posted: 18 Apr 2020 7:54 EDT

KevinZheng_GCS

PEGA

replied to MaleeshaW

You will need to convert the PDF to text first. For example, this thread has some sample code: https://collaborate.pega.com/question/need-read-data-doc-and-pdf

Unfortunately, this involves some Java code.

After that, you can follow this article to get started with Pega NLP: https://community.pega.com/knowledgebase/articles/decision-management-overview/learn-natural-language-processing-nlp-sample
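
To make the second step concrete: once the PDF content is available as a plain string, looking for keywords is ordinary text processing. The sketch below is only an illustration in plain Java, not the Pega text analyzer. The names `KeywordScan` and `countKeywords` are made up for this example, and the sample text stands in for whatever the PDF-to-text step returns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Pega API: counts keyword hits in text that has
// already been extracted from a PDF by some earlier step.
class KeywordScan {

    // Returns each keyword with its number of case-insensitive, whole-word matches.
    // Assumes keywords start and end with a letter or digit, so that \b applies.
    static Map<String, Integer> countKeywords(String text, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            // Pattern.quote keeps characters such as '.' in a keyword literal.
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(keyword) + "\\b",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
            Matcher matcher = pattern.matcher(text);
            int hits = 0;
            while (matcher.find()) {
                hits++;
            }
            counts.put(keyword, hits);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the text returned by the PDF-to-text conversion.
        String extracted = "Invoice 42: payment overdue. Please settle the invoice.";
        System.out.println(countKeywords(extracted, List.of("invoice", "overdue", "refund")));
    }
}
```

A Pega text analyzer does considerably more than this (topics, entities, sentiment), so treat the snippet as a way to check that the extracted text is usable before wiring it into NLP.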

Posted: 20 Apr 2020 9:18 EDT

Piotr Skowronek

PEGA

replied to KevinZheng_GCS

For PDF-to-text conversion, you have three options:

For PDFs with a text layer, you can use the Rule-File-Binary#pyExtractText extension point, which uses PDFBox to extract the text.
For PDFs without a text layer (which require OCR), or for PNG or JPG images, on-prem solutions can use the PegaOCR component.
For the same OCR cases in cloud solutions, you can use DPS (Document Processing Service).

In both OCR cases, pyExtractText invokes pyOcrAttachmentAnalysis to perform the OCR.

The pyExtractText activity returns text that can then be fed into Pega NLP. DPS is not yet released, but you can contact @grabm, our product owner, for early access.
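
The text-layer versus OCR split above can be pictured as a simple fallback. The sketch below is plain Java with made-up names (`ExtractionRouter`, `extractText`, and the two extractor functions are hypothetical, not Pega APIs). It also assumes the decision is made by checking whether text-layer extraction produced anything, which this thread does not confirm is how pyExtractText actually decides.

```java
import java.util.function.Function;

// Hypothetical sketch of "try the text layer, otherwise OCR"; not Pega code.
class ExtractionRouter {

    // Runs the text-layer extractor first and falls back to the OCR extractor
    // when the first one returns null or only whitespace.
    static String extractText(byte[] file,
                              Function<byte[], String> textLayerExtractor,
                              Function<byte[], String> ocrExtractor) {
        String text = textLayerExtractor.apply(file);
        if (text != null && !text.trim().isEmpty()) {
            return text;
        }
        return ocrExtractor.apply(file);
    }

    public static void main(String[] args) {
        byte[] scannedPdf = new byte[0]; // placeholder bytes for a scanned document

        // Stand-ins: a text-layer extractor that finds nothing and an OCR step.
        Function<byte[], String> textLayer = bytes -> "";
        Function<byte[], String> ocr = bytes -> "text recovered by OCR";

        System.out.println(extractText(scannedPdf, textLayer, ocr));
    }
}
```

Either way, the caller ends up with a single string, which is what gets passed on to the text analyzer.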
