Question

redds9 Member since 2014 2 posts
PEGA
Posted: November 9, 2016
Last activity: November 10, 2016
Closed

Need to read data from doc and pdf files

Hi,

I have a requirement where files of different types (doc, docx, pdf) arrive on a filesystem path. Based on a file name supplied as input, I pull the file from the server, read its contents, and copy them to a property whose control is a rich text editor. I am able to read the file contents; however, the formatting, alignment, images, and tables all come through as plain text, so the data is displayed without any formatting or alignment. I read the file from Java code because we have to pick only one specific file, which cannot be achieved with a file listener.

Please suggest whether there is a better approach, or whether I need to modify my code, copied below.

This is for PDF.

com.pega.apache.pdfbox.pdmodel.PDDocument pdDoc = null;
PRInputStream fis = null;
ParameterPage pp = tools.getParameterPage();
String filePath = pp.getString("FullFilePathName");
try {
  // Open the file through PRFile/PRInputStream rather than java.io.
  PRFile prfCheck = new PRFile(filePath);
  fis = new PRInputStream(prfCheck);
  com.pega.apache.pdfbox.pdfparser.PDFParser parser = new com.pega.apache.pdfbox.pdfparser.PDFParser(fis);
  parser.parse();
  com.pega.apache.pdfbox.cos.COSDocument cosDoc = parser.getDocument();
  pdDoc = new com.pega.apache.pdfbox.pdmodel.PDDocument(cosDoc);

  // PDFTextStripper returns plain text only; layout, images, and tables are not carried over.
  com.pega.apache.pdfbox.util.PDFTextStripper pdfStripper = new com.pega.apache.pdfbox.util.PDFTextStripper();
  String parsedText = pdfStripper.getText(pdDoc);

  tools.putParamValue("ContentSourceAuthored", parsedText);
} catch (Exception e) {
  throw new PRRuntimeException("Unable to read file '" + filePath + "': " + e);
} finally {
  // Release the document and the stream even when parsing fails.
  try { if (pdDoc != null) pdDoc.close(); } catch (Exception ignored) { }
  try { if (fis != null) fis.close(); } catch (Exception ignored) { }
}
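
Since the same step has to handle doc, docx, and pdf, while the code above covers only PDF, one possible way to choose the extraction path is to branch on the file extension first. The following is only a minimal sketch in plain Java with no Pega or PDFBox classes; the names FileKindSketch, FileKind, and kindOf are hypothetical and not part of any Pega API.

```java
import java.util.Locale;

public class FileKindSketch {

    // Hypothetical enum: the three formats named in the post, plus a fallback.
    enum FileKind { PDF, DOC, DOCX, UNKNOWN }

    // Classify a file name (or full path) by its extension, ignoring case.
    static FileKind kindOf(String fileName) {
        if (fileName == null) {
            return FileKind.UNKNOWN;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FileKind.UNKNOWN; // no extension, or name ends with a dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":  return FileKind.PDF;
            case "doc":  return FileKind.DOC;
            case "docx": return FileKind.DOCX;
            default:     return FileKind.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(kindOf("/data/in/Report.PDF")); // prints PDF
        System.out.println(kindOf("letter.docx"));         // prints DOCX
        System.out.println(kindOf("notes.txt"));           // prints UNKNOWN
    }
}
```

In the activity, the value of FullFilePathName could be passed to such a helper, with the PDF branch calling the PDFBox code above and the doc/docx branches calling whatever reader is chosen for those formats.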

Data Integration Java and Activities
Moderation Team has archived post