Here you go. The key to working with PDFs is to look at the Developer Tools available in the PDFViewer. You can highlight the various parts you can extract and then determine how to navigate the document from there.
I hard-coded this to use the attached PDF, but you could let the user select the file instead if you like.
1. To add any static .NET methods (like System.IO.Path.GetDirectoryName), right-click an area of the Toolbox and select Choose Items. Then select "Pega Robotics Static Members", followed by "From Global Assembly Cache". Finally, select the assembly that contains the method you wish to use. Most of the interesting ones are in either mscorlib or System; in this case, select mscorlib (it is near the bottom of the "m" entries, so you can find it quicker if you click the letter "n" and scroll up) and locate the Directory node. Simply check the box next to whichever method you like.
2. I wanted to attach the PDF to the example for ease of distribution. In practice, you'd load the PDF from another path. Because I attached the file to the deployment, the logic there only exists to locate the file on disk in the extract directory of the solution.
3. Not directly. OCR simply converts whatever file you have into a PDF so that you can use the PDF Connector component on it. I guess you could get the entire text of the file and parse it yourself, though.
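
To make points 2 and 3 a bit more concrete, here is a rough, language-neutral sketch in Python rather than anything taken from the attached solution. It assumes the PDF sits in the same directory as some known file from the deployment, and that the extracted text uses simple "Label: value" lines. The function names, file names, and labels are all made up for illustration; in the automation itself you would do the path part with the static .NET methods from point 1.

```python
import os
import re


def locate_attached_file(known_file, file_name):
    """Build the path of file_name next to known_file.

    Mirrors the idea of taking the directory of a file you can already
    locate (what Path.GetDirectoryName does in .NET) and appending the
    name of the attached PDF. Both arguments are hypothetical.
    """
    base_dir = os.path.dirname(os.path.abspath(known_file))
    return os.path.join(base_dir, file_name)


def parse_fields(text, labels):
    """Pull 'Label: value' pairs out of plain text.

    Assumes each field is on its own line; labels that are not found
    are simply left out of the result.
    """
    fields = {}
    for label in labels:
        pattern = r"^[ \t]*" + re.escape(label) + r"[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            fields[label] = match.group(1)
    return fields
```

You would call `locate_attached_file` once to find the PDF, then pass whatever text you extracted from it to `parse_fields` together with the labels you care about. If your document is not laid out as "Label: value" lines, the pattern would need to change accordingly.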