Posted: 8 Feb 2016 13:03 EST Last activity: 25 Feb 2016 8:04 EST
Can a Pega HTTP Connector rule encode output in UTF-8?
I am working on a project in the EU for a customer who wishes to use special characters in HTTP communication over an HTTP connector. The special characters are the Euro currency symbol and accented characters that are not available in 7-bit ASCII.
I believe that internally Pega stores strings as UTF-16, because this is the standard that Java uses. When a Pega HTTP Connector takes a string from a property and sends it to the outside world, it must perform some kind of conversion to get an 8-bit representation of the string. Using the tracer, I can see that the string I want to send looks correct in the property at the point where the connector is called. However, the service at the other end is expecting a UTF-8 encoded string and is throwing an incorrect-encoding exception. Somewhere under the hood, Pega is serializing its string for transmission as a sequence of bytes, and I think it is not performing UTF-8 encoding. I can see no way of specifying the encoding anywhere on the Pega HTTP Connector rule.
1. Does anybody know what encoding Pega uses when it serializes a string for output in an HTTP Connector rule?
2. Does anybody know how to specify the encoding for Pega to use when preparing a transmission in an HTTP Connector rule?
3. Does anybody know a way that I can view, byte for byte, exactly what was sent? That is, not just a rendition of it in a browser window, but the exact byte stream, ideally rendered in hexadecimal.
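To make the question concrete, here is a minimal Java sketch (not Pega code; the class and method names are made up for illustration) showing how the same string produces different byte sequences depending on the charset used to serialize it, rendered in hexadecimal. It only illustrates what a correct UTF-8 stream for these characters would look like on the wire.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodingHexDemo {
    // Render bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Euro sign, space, e-acute; written as escapes so the source file's
        // own encoding cannot affect the result.
        String sample = "\u20AC \u00E9";

        // UTF-8: Euro is three bytes, e-acute is two.
        System.out.println("UTF-8:      " + toHex(sample.getBytes(StandardCharsets.UTF_8)));
        // prints: UTF-8:      E2 82 AC 20 C3 A9

        // ISO-8859-1 has no Euro sign, so Java substitutes '?' (0x3F).
        System.out.println("ISO-8859-1: " + toHex(sample.getBytes(StandardCharsets.ISO_8859_1)));
        // prints: ISO-8859-1: 3F 20 E9
    }
}
```

If the receiving service sees 3F or a lone E9 where it expects E2 82 AC or C3 A9, the string was serialized with a non-UTF-8 charset somewhere along the way.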
You can check which encoding is currently set by using the System Management Application (look for file.encoding). To change it, add -Dfile.encoding as a JVM argument, for example -Dfile.encoding=UTF-8.
The exact way to set this argument depends on the application server being used.
You can verify the data being sent over the wire using an HTTP sniffer tool such as Fiddler or HTTPWatch.
Before making any changes, please verify the content using one of the tools mentioned above.
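For reference, the file.encoding value mentioned above can also be read from inside any Java process. The sketch below is plain Java, not a Pega API, and the class name is hypothetical. It reports the system property together with the JVM default charset, which is what calls such as String.getBytes() with no argument fall back to.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class DefaultEncodingCheck {
    // Build a one-line report of the JVM's encoding settings.
    static String report() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Note that this only describes the JVM default. Code that names a charset explicitly when converting a string to bytes is not affected by it, so the setting alone does not show which encoding a particular component uses on the wire.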
Thank you for the advice. I was not aware of the 'file.encoding' setting displayed in SMA, so I have had a look: it is indeed set to "UTF-8". Does this prove, unequivocally and beyond doubt, that my HTTP connector is sending XML in correctly encoded UTF-8? I don't think it is, but I cannot prove it one way or the other (note the need for conclusive proof here, as this is a dispute situation).
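One way to get closer to conclusive proof, once the raw bytes have been captured by some means, is to run them through a strict UTF-8 decoder. The following is a minimal sketch in plain Java (the class and method names are hypothetical, and it assumes the captured body is available as a byte array). It rejects any byte sequence that is not well-formed UTF-8 instead of silently replacing bad bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf8Validator {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of substituting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] euroUtf8 = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // Euro sign in UTF-8
        byte[] euroCp1252 = {(byte) 0x80};                         // Euro sign in Windows-1252
        System.out.println(isValidUtf8(euroUtf8));   // prints: true
        System.out.println(isValidUtf8(euroCp1252)); // prints: false
    }
}
```

A caveat: passing this check shows that the bytes are well-formed UTF-8, not that they carry the intended characters. Text that was mis-decoded and then re-encoded upstream (double encoding) is still valid UTF-8, so the hex values should also be compared against the expected characters.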
Unfortunately, the Pega Dev system is running in the cloud, and the Test and Prod systems are running at the customer's site behind a firewall. In none of these cases am I allowed to install Fiddler, TCPMon, HTTPWatch or a similar tool, either on the system where Pega runs or on the destination server in Germany that hosts the system Pega connects to.
One approach I may be able to use is to install a Pega PVS system natively on my laptop (not in a VMware sandbox) and create an HTTP Service on it. I would then get the cloud Pega system to send an HTTP POST to that service through its HTTP Connector, and use a tool such as TCPMon on my laptop to inspect the incoming byte stream and check that it is correctly encoded. This is going to take some time, but I cannot think of a better approach. It may also not work: laptops often have a dynamic IP address that cannot be reached by URL from the wide-area network.
I will do some research on the net to see if I can find a better way. Maybe someone has written an online tool similar to posttestserver (http://posttestserver.com/post.php) that also reports the encoding of the byte stream it receives.
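A lighter-weight variant of the same idea, sketched here under assumptions, is a tiny receiving endpoint built on the JDK's built-in com.sun.net.httpserver package instead of a full PVS install. The class name, the catch-all "/" context and the choice of a 204 response are all illustrative choices, and the same network reachability problem applies: the sending system still has to be able to reach the machine where this runs.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;

// Hypothetical helper class, for illustration only.
class HexDumpServer {
    // Raw body of the most recent request, kept for inspection.
    static volatile byte[] lastBody = new byte[0];

    // Render bytes as space-separated uppercase hex pairs.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Start a server that logs the Content-Type header and the raw body in hex.
    // Passing port 0 lets the operating system pick a free port.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                lastBody = buf.toByteArray();
            }
            System.out.println("Content-Type: "
                    + exchange.getRequestHeaders().getFirst("Content-Type"));
            System.out.println("Body (hex):   " + toHex(lastBody));
            exchange.sendResponseHeaders(204, -1); // 204 No Content, no response body
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

To use it, call HexDumpServer.start(8080) from a main method (the port number is arbitrary) and point the connector's endpoint URL at that host and port. Each request then prints the declared Content-Type alongside the bytes actually received, which makes a mismatch between the declared charset and the real encoding visible.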
OK, an update: my Pega system IS transmitting correctly encoded UTF-8.
I created an Activity to call the Http-Connector directly, in order to make screenshots and testing easier, and to my surprise the SMS messages I received when I ran the Activity were clearly correct, containing all the special characters that I needed.
Therefore, in my Pega system, the loss of correct encoding is happening somewhere internally, in the Pega Marketing Framework application where the messages are generated and sent. The messages look good when they are created, but are evidently no longer correct by the time they arrive at the Pega HTTP Connector to be transmitted.
Pega Marketing Framework is very complex, and the treatments sent to a Prospect are first written to a database table called BATCHOUTPRxxx en route to being sent. Perhaps it is this step that is affecting the encoding.