Posted: 16 Sep 2020 10:11 EDT Last activity: 8 Oct 2020 0:15 EDT
Query blob to extract files
We have a use case where we want to extract text from PDFs and JPGs (via OCR), and we have decided to use AWS Textract.
We are trying to query the blob (pc_data_workattach, pzpvstream) to extract the files, but we cannot read them even after a successful extraction. Does Pega encode the blob with its own proprietary mechanism? Any suggestions on how to approach our use case?
The BLOB is the entire class instance serialized into a binary format. What you need is the content of the file itself, which is stored in one property inside the BLOB. I don't know the data model offhand, but you can start by investigating the Data-WorkAttach-File class to find which property holds the file content (perhaps pyFileData or pyAttachStream).
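Once you have the value of that property, it still has to be turned back into raw file bytes before it can go to Textract. Below is a minimal Python sketch, assuming (this is not confirmed above) that the property stores the file as a Base64 string. It decodes the value and checks the leading signature bytes to confirm the result really is a PDF or JPEG. The function names are hypothetical.

```python
import base64
from typing import Optional

# Leading "magic" bytes of the two formats mentioned in the question.
MAGIC_NUMBERS = {
    b"%PDF-": "pdf",
    b"\xff\xd8\xff": "jpeg",
}


def decode_attachment(value: str) -> bytes:
    """Decode a property value ASSUMED to hold Base64-encoded file content.

    Whitespace (e.g. line breaks in MIME-style Base64) is stripped first.
    Raises ValueError if the value is not valid Base64.
    """
    compact = "".join(value.split())
    try:
        return base64.b64decode(compact, validate=True)
    except ValueError as exc:  # binascii.Error is a subclass of ValueError
        raise ValueError("property value is not valid Base64") from exc


def sniff_file_type(data: bytes) -> Optional[str]:
    """Return 'pdf' or 'jpeg' if data starts with a known signature, else None."""
    for magic, kind in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return kind
    return None


if __name__ == "__main__":
    # Stand-in for a value read from the attachment property.
    sample = base64.b64encode(b"%PDF-1.4 minimal").decode("ascii")
    print(sniff_file_type(decode_attachment(sample)))
```

If `sniff_file_type` returns `None`, the bytes are probably still the serialized instance (or a differently encoded value) rather than the file itself. Bytes that pass the check could then be sent to Textract, for example with boto3's `detect_document_text(Document={"Bytes": data})`, within Textract's supported formats and size limits.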