The HTML is formatted as shown below. I need the Subject from each record. Could you please help me parse it? If you save the code as an HTML file you can see the table of contents it produces; the system generates HTML like the sample below.
Unfortunately, that HTML is *not* well-formed, so you will *not* be able to use a PARSE-XML rule against it as it stands.
It contains two unclosed 'meta' tags in the 'head' section of the document.
You would first have to clean these up using another method (you will need to treat the document as a 'string' first, then extract the 'clean' XML from it).
Additionally: this HTML (at least the version I 'scraped' from the post) is not in the default (UTF-8) encoding. It appears to be in "Windows-1252" format, so you might also need to take this into account.
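To make the clean-up step concrete, here is a minimal sketch of the 'treat it as a string first' idea. It runs outside PRPC and is written in Python purely for illustration. The function name `clean_html` is hypothetical, and the sketch assumes that the only well-formedness problem is the unclosed 'meta' tags described above (attribute values are quoted, and there is no separate `</meta>` closing tag).

```python
import re
import xml.etree.ElementTree as ET


def clean_html(raw: bytes, encoding: str = "windows-1252") -> str:
    """Decode the raw bytes and self-close any <meta ...> tags.

    Assumption: unclosed <meta> tags are the only thing preventing the
    document from being well-formed XML.
    """
    text = raw.decode(encoding)
    # Rewrite <meta ...> and <meta ... /> alike as <meta .../>.
    return re.sub(
        r"<meta\b([^>]*?)\s*/?>",
        r"<meta\1/>",
        text,
        flags=re.IGNORECASE,
    )


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

The cleaned string is what you would then hand to the PARSE-XML rule. Decoding the bytes explicitly first means the parser no longer has to guess the character set.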
And lastly: I'm not sure how much control you have over the 'shape' of the incoming HTML. If the format is even slightly different (for instance, if text is or isn't contained within a bold "<B>" element), your mapping will fail.
But for the sake of demonstration, here are some screenshots showing how to build a PARSE-XML rule which is able to extract portions of the (cleaned-up) XML.
I have attached the cleaned-up XML for reference.
It only maps the text of the *first* TD element (which is contained within a BOLD tag), for demonstration, but you could extend this rule to map other elements. (You could even use Repeating Structures here, but they are more complicated to build, so I went for the simpler option.)
1. Create a PRPC Text Property; in this example "Extract":
2. Create a PARSE-XML rule (from the 'Integration Mapping' category) and call it 'ExtractTextFromEmail' (but we will *change* the namespace, see below).
Change the root element to 'html' and double-click it:
Click on the 'node' tab and change the 'Node Namespace' to "http://www.w3.org/TR/REC-html40" (because the input XML has the 'xmlns' directive set on the 'html' element).
3. Follow the structure of the input XML document; use 'add element' to add the sub-tree html>body>div>table>tbody>tr>td>p>b.
4. Map the final 'b' element to the property (double-click the element):
5. Test the rule: just use ACTION button > Run.
Use the option: "Text to be parsed" and copy and paste in the complete ('cleaned-up') XML and press 'execute':
Note that the XML representation of the Page correctly shows the text 'EmailID' (picked out of the input XML) mapped to the 'ExtractProperty':
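For anyone who wants to sanity-check the element path and namespace outside PRPC before building the rule, the same mapping can be sketched in a few lines of Python. This is only an illustration of what the rule does, not how PARSE-XML works internally. The `SAMPLE` document is a made-up stand-in for the cleaned-up XML (only the 'EmailID' text and the namespace come from the steps above), and `extract_first_bold` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace declared via 'xmlns' on the root 'html' element (see step 2).
NS = {"h": "http://www.w3.org/TR/REC-html40"}

# Same sub-tree as step 3, relative to the root 'html' element.
PATH = "h:body/h:div/h:table/h:tbody/h:tr/h:td/h:p/h:b"

# Hypothetical stand-in for the cleaned-up XML.
SAMPLE = """<html xmlns="http://www.w3.org/TR/REC-html40"><body><div><table><tbody>
<tr><td><p><b>EmailID</b></p></td><td><p>someone@example.com</p></td></tr>
</tbody></table></div></body></html>"""


def extract_first_bold(xml_text: str):
    """Return the text of the first matching 'b' element, or None."""
    root = ET.fromstring(xml_text)
    node = root.find(PATH, NS)
    return node.text if node is not None else None


print(extract_first_bold(SAMPLE))
```

If the namespace mapping is left out, `find` returns nothing, which is the same reason the 'Node Namespace' has to be set on the rule.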
Again, I think this approach (of trying to use PARSE-XML to process incoming HTML) isn't the ideal solution here, mainly because:
1. You have to 'pre-process' the HTML to make it well-formed and to give it the correct character-set directive.
2. It will break if the 'shape' of the incoming XML changes.
You might want to consider building something in Java (where you can do lower-level processing, treating the input as a String, etc.), then importing that Java into PRPC as a standalone library/class and 'wrapping' it from PRPC.
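To show the kind of lower-level processing I mean, here is a rough sketch of a tolerant extractor. It is written in Python only to keep it short; the suggestion above is to do the equivalent in Java, and nothing here is PRPC-specific. The class and function names are hypothetical. It collects the text of every table cell, row by row, without requiring well-formed input and regardless of whether the text is wrapped in "<B>" or "<P>" elements.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collects the text of every <td>, grouped by <tr>, ignoring nested markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # list of rows; each row is a list of cell strings
        self._cell = None  # text fragments of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <TD> and <td> look the same here.
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            if not self.rows:
                self.rows.append([])  # a <td> that appeared outside any <tr>
            # Join the fragments and collapse runs of whitespace.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_rows(html_text: str):
    """Return the table cells of an HTML string as a list of rows."""
    parser = CellTextCollector()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

Because this works on the tag stream rather than on a fixed element path, unclosed 'meta' tags and optional bold markup do not break it. It still assumes that the data you want lives in table cells.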