JackC784, Sky Solutions
Posted: September 10, 2019
Last activity: September 23, 2019

How to convert Microsoft Word (.docx) to HTML


We have a requirement to read a temporary file stored as a Microsoft Word document (.docx) and convert it into HTML, so that the rich-text formatting is preserved while the content is edited in CKEditor. I have looked into various ways to accomplish this. As far as I can tell, Pega has similar functionality when a Word document template is uploaded for correspondence rules, but that function appears to be private, so I can't look into the code.

There are two Java libraries commonly used for .docx-to-HTML conversion: Docx4j and Apache POI. Part of Apache POI is already bundled with Pega, but we are missing XWPFConverter, which the .docx-to-HTML methods need. Docx4j has a .docx-to-HTML method, but it appears to produce a blank HTML document, because the method we used to load the Word document depends on a File input, while we have to use PRFile for the Pega temp folder.
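On the File-versus-PRFile problem: a .docx file is a ZIP package (Office Open XML), so its contents can be read from any InputStream rather than a java.io.File. The sketch below uses only the JDK to pull the main body part, word/document.xml, out of such a stream. It assumes you can obtain an InputStream over the temp file's bytes (how to get that from PRFile is not shown and is an assumption), assumes the part is UTF-8 as Word writes it, and the class name DocxEntryReader is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: pulls the main body XML out of a .docx supplied as a stream. */
final class DocxEntryReader {
    private static final String BODY_PART = "word/document.xml";

    private DocxEntryReader() {}

    /**
     * Reads word/document.xml from a .docx (a ZIP package) without needing a java.io.File.
     * Assumes the part is UTF-8 encoded. Closes the given stream when done.
     */
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (BODY_PART.equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry, not the whole archive.
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException(BODY_PART + " not found; the input may not be a .docx file");
    }
}
```

If the bundled Docx4j is used instead, WordprocessingMLPackage.load also has an InputStream overload, which likewise removes the dependency on a File.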

Is there any way we can accomplish this .docx-to-HTML conversion without importing a new Java library?
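As a rough answer to the no-new-library question, a very limited conversion is possible with only the JDK's built-in DOM parser, working on the text of word/document.xml (the body part inside the .docx ZIP). This is a minimal sketch, not Pega's correspondence-rule logic: it maps each w:p to a paragraph, direct bold/italic/underline run formatting to strong/em/u, and w:br to a line break. It ignores styles, numbering/lists, images, text boxes and hyperlink targets, and flattens tables into paragraphs. The class name DocxXmlToHtml is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical, deliberately minimal WordprocessingML-to-HTML converter (JDK only). */
final class DocxXmlToHtml {
    // Namespace bound to the w: prefix in word/document.xml.
    private static final String W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxXmlToHtml() {}

    /** Converts the contents of word/document.xml into simple HTML paragraphs. */
    static String convert(String documentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // A real document.xml never needs a DOCTYPE; refusing it blocks XXE-style input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));

        StringBuilder html = new StringBuilder();
        // Every w:p in document order; table cells end up as plain paragraphs.
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            html.append("<p>");
            // Also picks up runs nested in w:hyperlink; the link target is dropped.
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                appendRun((Element) runs.item(j), html);
            }
            html.append("</p>");
        }
        return html.toString();
    }

    /** Emits one w:r, honouring only direct bold/italic/underline (not styles). */
    private static void appendRun(Element run, StringBuilder html) {
        Element props = child(run, "rPr");
        boolean bold = props != null && isOn(child(props, "b"));
        boolean italic = props != null && isOn(child(props, "i"));
        Element u = props == null ? null : child(props, "u");
        boolean underline = u != null && !"none".equals(u.getAttributeNS(W, "val"));

        StringBuilder text = new StringBuilder();
        for (Node c = run.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (!W.equals(c.getNamespaceURI())) continue; // skips whitespace text nodes
            if ("t".equals(c.getLocalName())) text.append(escape(c.getTextContent()));
            else if ("br".equals(c.getLocalName())) text.append("<br>");
        }
        String s = text.toString();
        if (s.isEmpty()) return;
        if (underline) s = "<u>" + s + "</u>";
        if (italic) s = "<em>" + s + "</em>";
        if (bold) s = "<strong>" + s + "</strong>";
        html.append(s);
    }

    /** Toggle properties such as w:b are on unless w:val is 0, false or off. */
    private static boolean isOn(Element toggle) {
        if (toggle == null) return false;
        String val = toggle.getAttributeNS(W, "val");
        return !("0".equals(val) || "false".equals(val) || "off".equals(val));
    }

    /** First direct child element in the w: namespace with the given local name. */
    private static Element child(Element parent, String localName) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && W.equals(c.getNamespaceURI())
                    && localName.equals(c.getLocalName())) {
                return (Element) c;
            }
        }
        return null;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The output uses only simple HTML tags, which keeps it easy to load into a rich-text editor, but anything beyond basic paragraphs and inline formatting (lists, tables, images) would still realistically need a conversion library.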

Moderation Team has archived post