Blue Cross and Blue Shield of Minnesota
Posted: September 16, 2020
Last activity: October 8, 2020
Query blob to extract files
We have a use case where we want to extract text from PDFs and JPGs (OCR). We decided to use AWS Textract. We are trying to query the BLOB (table pc_data_workattach, column pzpvstream) to extract the files, but the extracted files are not readable even though the extraction itself succeeds. Does Pega encode the BLOB with its own proprietary mechanism? Any suggestions on how to approach our use case?
The BLOB is the entire class instance serialized into a binary format. What you need is the content of the file itself which will be stored in one property inside the blob. I don't know the data model offhand, but you can start by investigating the Data-WorkAttach-File class to find what property the file content is stored in (perhaps pyFileData or pyAttachStream).
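To illustrate the idea above, here is a minimal Python sketch of what could happen once the file-content property has been read out of the attachment instance. It assumes the property value is Base64-encoded text, which has not been confirmed in this thread, so check the actual data model first. The function names decode_attachment and sniff_file_type are hypothetical helpers, not part of Pega or AWS.

```python
import base64
import binascii


def decode_attachment(encoded: str) -> bytes:
    """Decode a property value that is ASSUMED to be Base64 text into raw file bytes."""
    # Strip line breaks and other whitespace that may wrap the encoded text.
    cleaned = "".join(encoded.split())
    try:
        return base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        raise ValueError("value is not valid Base64") from exc


def sniff_file_type(data: bytes) -> str:
    """Guess the file type from its leading magic bytes."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return "unknown"


if __name__ == "__main__":
    # Stand-in for a value read from the attachment property (hypothetical data).
    sample = base64.b64encode(b"%PDF-1.4 sample").decode("ascii")
    raw = decode_attachment(sample)
    print(sniff_file_type(raw))
```

If the decoded bytes start with a recognizable signature such as %PDF- or the JPEG marker, the file content was probably recovered correctly and can be passed on to the OCR step. If not, the property is likely stored in some other form and the Base64 assumption does not hold.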