Question · 1 reply · 8181 views
Suresh Reddy (redds9), Senior Solutions Engineer, Pegasystems Inc. (IN). Member since 2014, 2 posts.
Posted: November 9, 2016, 2:05 EST
Last activity: November 10, 2016, 9:30 EST
Closed

Need to read data from DOC and PDF files


Hi,

I have a requirement where different types of files (DOC, DOCX, PDF) arrive on a file system path. Based on a file name input, I pull the file from the server, read its contents, and copy them into a property whose control is a Rich Text Editor. I am able to read the file contents; however, the formatting, alignment, images, and tables are lost: everything comes through as plain text and is displayed without any format or alignment. I am reading the file from Java code because we have to pick only a specific file, which cannot be achieved with a file listener.
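For the DOCX case, one possible direction: a .docx file is a ZIP archive whose main body normally lives in `word/document.xml` (WordprocessingML), so paragraph breaks and bold/italic can be recovered by converting that XML to HTML before it is put into the Rich Text Editor property. The class below is a minimal sketch using only the JDK; `DocxBodyToHtml`, `fromDocx` and `toHtml` are hypothetical names, it assumes the Rich Text Editor renders the HTML markup it is given, and it deliberately ignores tables (their cell paragraphs are flattened), images, lists and styles. Legacy .doc files use a binary format and are not handled here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: converts the body of a .docx into simple HTML.
// Keeps paragraphs, bold and italic only; tables are flattened into paragraphs,
// and images, lists and styles are ignored.
final class DocxBodyToHtml {
    // WordprocessingML namespace used in word/document.xml
    private static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A .docx is a ZIP archive; find the main document part and convert it.
    static String fromDocx(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return toHtml(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    // Converts WordprocessingML (document.xml) into <p>, <b> and <i> markup.
    static String toHtml(InputStream documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(documentXml);

        StringBuilder html = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            html.append("<p>");
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                Element run = (Element) runs.item(j);
                // Concatenate the text nodes (<w:t>) of this run
                StringBuilder text = new StringBuilder();
                NodeList texts = run.getElementsByTagNameNS(W, "t");
                for (int k = 0; k < texts.getLength(); k++) {
                    text.append(texts.item(k).getTextContent());
                }
                String piece = escape(text.toString());
                if (isOn(run, "i")) piece = "<i>" + piece + "</i>";
                if (isOn(run, "b")) piece = "<b>" + piece + "</b>";
                html.append(piece);
            }
            html.append("</p>");
        }
        return html.toString();
    }

    // True if the run's properties contain a toggle such as <w:b/> that is not
    // switched off with w:val="0", "false" or "off".
    private static boolean isOn(Element run, String toggle) {
        NodeList rPr = run.getElementsByTagNameNS(W, "rPr");
        if (rPr.getLength() == 0) return false;
        NodeList props = ((Element) rPr.item(0)).getElementsByTagNameNS(W, toggle);
        if (props.getLength() == 0) return false;
        String val = ((Element) props.item(0)).getAttributeNS(W, "val");
        return !(val.equals("0") || val.equals("false") || val.equals("off"));
    }

    // Escapes characters that would otherwise be read as HTML markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In an activity, the result of `DocxBodyToHtml.fromDocx(new PRInputStream(prfCheck))` could then be stored in the Rich Text Editor property instead of plain text; where the helper class lives (a Function rule, a JAR on the classpath, or similar) is left open here.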

Please suggest an approach, or let me know what I need to change in my code below.

This is the code for PDF:

com.pega.apache.pdfbox.pdmodel.PDDocument pdDoc = null;
com.pega.apache.pdfbox.cos.COSDocument cosDoc = null;
PRInputStream pdfIn = null;
ParameterPage pp = tools.getParameterPage();
String filePath = pp.getString("FullFilePathName");
try {
  // Open the requested file through Pega's file abstraction
  PRFile prfCheck = new PRFile(filePath);
  pdfIn = new PRInputStream(prfCheck);

  // Parse the PDF and wrap the low-level COS document
  com.pega.apache.pdfbox.pdfparser.PDFParser parser = new com.pega.apache.pdfbox.pdfparser.PDFParser(pdfIn);
  parser.parse();
  cosDoc = parser.getDocument();
  pdDoc = new com.pega.apache.pdfbox.pdmodel.PDDocument(cosDoc);

  // PDFTextStripper returns plain text only; fonts, alignment, tables and images are dropped
  com.pega.apache.pdfbox.util.PDFTextStripper pdfStripper = new com.pega.apache.pdfbox.util.PDFTextStripper();
  String parsedText = pdfStripper.getText(pdDoc);

  tools.putParamValue("ContentSourceAuthored", parsedText);
} catch (Exception e) {
  throw new PRRuntimeException("Unable to read file '" + filePath + "': " + e);
} finally {
  // Release the parsed document (closing pdDoc also closes cosDoc) and the input stream
  try {
    if (pdDoc != null) {
      pdDoc.close();
    } else if (cosDoc != null) {
      cosDoc.close();
    }
    if (pdfIn != null) {
      pdfIn.close();
    }
  } catch (Exception ignore) {
    // Nothing useful to do if cleanup fails
  }
}
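Since `PDFTextStripper` only produces plain text, the Rich Text Editor (which works with HTML) collapses its line breaks when the value is rendered. A minimal sketch, assuming the property accepts HTML markup, is to escape the extracted text and wrap it in `<p>`/`<br/>` before storing it; this keeps line and paragraph structure but cannot bring back tables, images or alignment. `PlainTextToHtml` and its methods are hypothetical names, not part of Pega or PDFBox.

```java
// Hypothetical helper: turns plain extracted text into minimal HTML so that
// line breaks survive in a Rich Text Editor. It does not recover layout.
final class PlainTextToHtml {

    // Escapes HTML special characters so extracted text cannot be read as markup.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Wraps each blank-line-separated block in <p> and turns the remaining
    // single line breaks into <br/>.
    static String toHtml(String text) {
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        StringBuilder html = new StringBuilder();
        for (String block : normalized.split("\n\\s*\n")) {
            String trimmed = block.trim();
            if (trimmed.isEmpty()) continue;
            html.append("<p>")
                .append(escape(trimmed).replace("\n", "<br/>"))
                .append("</p>");
        }
        return html.toString();
    }
}
```

With such a helper available to the activity, the last step would store `PlainTextToHtml.toHtml(parsedText)` instead of `parsedText`. Escaping first matters: PDF text containing `<` or `&` would otherwise be interpreted as markup by the editor.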

The Moderation Team has archived this post. This thread is closed to future replies, and its content and links will no longer be updated. If you have the same or a similar question, please post a new question.