I am having trouble converting Word files received as attachments to PDF. For context: I am using Pega CRM 7.3, and I need to convert my doc/docx files to a Base64 string, but in such a way that the content is converted to PDF.
I have read that "[...] ConvertAttachmentToPDF flow action uses ActiveX integration and works only in Internet Explorer browser. Pega 7.3 and later versions do not support ActiveX controls[...]". I am using Chrome and Firefox, and I can't be restricted to a single browser.
How accurate does the PDF have to be? I mean: could you extract the text and the basic structure of the WORD file (using 'doc4jx' maybe) and then create a PDF using (say) PDFBox? Both libraries ship with the Pega Platform.
So the PDF would be a representation of the original WORD file - rather than being a 'screenshot' of the actual WORD file itself - depends on your use-case as to whether that would be acceptable as a 'preview' or not I guess.
Can you use a Print Driver ("Print to PDF") here? That would output the WINWORD file as a PDF - but you would (probably) need to hook out to a Windows Environment to do this.
Or - is there a Web Service which could perform the service for you?
For 'PDFBox' and 'docx4j' - yes, you would most probably need to write a combination of Activities (Java Steps as you say) and Functions to wrap the functionality. I'm afraid I don't have any examples of 'docx4j' - but you could start by taking standalone examples (check out some of the 3rd party forums for a starting point).
You could also try using the Apache POI library (also ships with Pega Platform) here as well: in the most basic form - you could try extracting the text (losing the rich structure) and then using the OOTB activity HTMLTOPDF to generate a (basic) representation of the content. There is an example here for extracting text from a WORD file.
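As a rough illustration of the step between "extract the text" and "generate a PDF from HTML": a minimal sketch, using only the JDK, that wraps extracted plain text in a bare HTML document, one paragraph per line. The class and method names (TextToHtmlSketch, escape, toHtml) are made up for this example, and how the resulting HTML string is handed to the HTMLTOPDF activity is not shown, since that depends on how the activity is called in your application.

```java
// Hypothetical helper, not part of Pega or Apache POI.
final class TextToHtmlSketch {

    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Wraps each non-blank line of the extracted text in its own <p> element. */
    static String toHtml(String plainText) {
        StringBuilder html = new StringBuilder("<html><body>");
        for (String line : plainText.split("\\r?\\n")) {
            if (!line.trim().isEmpty()) {
                html.append("<p>").append(escape(line)).append("</p>");
            }
        }
        return html.append("</body></html>").toString();
    }
}
```

Escaping matters here because the extracted Word text may well contain characters such as '<' or '&', which would otherwise corrupt the HTML.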
With regard to using a Print Driver - again, yes you would then have to figure out a way of capturing the filename which is output and saving that back to Pega. (Possibly Pega Robotics could help there).
By the way: I had a typo in my original reply - the library is called 'docx4j' (not 'doc4jx' - which sounds more like an Australian lager).
Just for reference: I was able to get a (very) basic Word text extraction (text content only - no formatting) working in PRPC 8.1.x, using OOTB libraries.
Including RAP file here - it contains only a single Activity and an example WORD file (which is also attached here).
It works by opening up a Rule-File-Binary (OBJ-OPEN) of the WORD file (which I uploaded first obviously) - converting the 'pyFileSource' base64 to a binary array and then using the following code to extract the text.
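The code from the attached Activity is not reproduced in this post. As a stand-in, here is a minimal sketch of the same two steps (base64 to byte array, then text extraction) that uses only the JDK rather than a Word library. It assumes a .docx file, which is a ZIP archive whose main text lives in word/document.xml; it will not work for the older binary .doc format. The class and method names (DocxTextSketch, decode, extractText) are made up for this example, and reading 'pyFileSource' from the opened rule is left to the surrounding Activity.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Base64;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not the code from the attached RAP file.
final class DocxTextSketch {

    // Namespace of WordprocessingML elements such as w:p (paragraph) and w:t (text).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Decodes a base64 value (for example, the one read from pyFileSource) to bytes. */
    static byte[] decode(String base64) {
        // The MIME decoder tolerates line breaks inside the encoded value.
        return Base64.getMimeDecoder().decode(base64);
    }

    /** Returns the document text, one line per paragraph, with all formatting dropped. */
    static String extractText(byte[] docxBytes) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return paragraphsOf(zip);
                }
            }
        }
        throw new IllegalArgumentException("word/document.xml not found; is this a .docx file?");
    }

    private static String paragraphsOf(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so an uploaded file cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(xml);

        // Limitation: paragraphs nested inside other paragraphs (text boxes) are visited twice.
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

In a Java step this would be called along the lines of DocxTextSketch.extractText(DocxTextSketch.decode(fileSource)), where fileSource is a placeholder for the base64 string taken from the opened Rule-File-Binary.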