Posted: 1 May 2017 2:48 EDT Last activity: 5 May 2017 7:58 EDT
Guidelines or Tips for Crawling the Internet Web Sites
Hi. Do we have any guidelines or tips for crawling Internet websites with Pega Robotics?
For example, Yahoo's search function seems to block a robot that accesses it repeatedly in quick succession. In my tests, it returned a warning page after roughly 50 consecutive searches, but it worked fine once I added a 10-second think time between iterations.
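The workaround described above (a fixed think time between consecutive searches) can be sketched generically. This is a minimal Python sketch, not Pega Robotics code, and it uses no Pega Robotics API. The function `run_with_think_time` and its parameters are hypothetical. The 10-second default comes only from the test described in this post, not from any site's documented limit. `action` stands in for whatever automation step submits one search, and `sleep` is injectable so the pacing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_think_time(
    items: Iterable[T],
    action: Callable[[T], R],
    think_time_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Run `action` on each item, pausing `think_time_s` seconds between calls.

    Hypothetical helper: no pause is taken before the first call or after
    the last one, so N items produce N - 1 pauses.
    """
    results: List[R] = []
    for index, item in enumerate(items):
        if index > 0:
            # Think time between consecutive requests, so the site does not
            # see a rapid burst of automated hits.
            sleep(think_time_s)
        results.append(action(item))
    return results


if __name__ == "__main__":
    recorded_pauses: List[float] = []

    def fake_search(query: str) -> str:
        # Stand-in for the automation step that submits one search.
        return f"results for {query}"

    # Record pauses instead of sleeping, to demonstrate the pacing quickly.
    out = run_with_think_time(
        ["a", "b", "c"], fake_search, think_time_s=10.0, sleep=recorded_pauses.append
    )
    print(out)
    print(recorded_pauses)
```

A fixed delay is the simplest choice and mirrors what worked in the test above. Whether it is enough for a given site, or whether that site's limit is based on request count, timing, or something else entirely, can only be determined by testing against that site.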
One of our customers is concerned about how robust our automation executions are from this perspective. Is there anything we can say on this subject?
Websites are built in a vast variety of ways, and many of them include functionality specifically designed to stop malicious scripts from attacking them. That makes it extremely difficult to write guidelines that would cover the potential issues you could run into across all sites. Generally speaking, cases such as the problematic behavior you found with Yahoo are dealt with on a case-by-case basis.
That said, the PDN is a great place to find out how people have dealt with specific issues on common sites, as well as common problems you might run into. Additionally, we have an active support team that is more than willing to help with a particularly troublesome site you come across.