STARTUP: A Web 2.0 Crawler - Pentestmag

My name is Sina Yazdanmehr. I am a penetration tester and information security consultant. My expertise is web and mobile applications security. I currently work for Infigo IS, and have worked for other security firms and CERT since 2009.

My project is a new kind of web crawler that can crawl Web 2.0 web applications. The idea behind the tool is to crawl Web 2.0 applications, capture all AJAX requests, and extract all URLs from each page's content.

Because there is no fixed pattern for developing Web 2.0 applications, it is almost impossible to crawl modern web applications the way traditional tools do. With newer technologies such as Single-Page Applications (SPAs), DOM event listeners, and dynamically changing DOM trees, scanning the HTML content and trying to extract addresses and input parameters from the markup will fail.

This new tool uses a web browser's engine to parse the retrieved content of each page. By modifying built-in, native JavaScript and DOM functions, it tries to capture all static URLs and AJAX requests as well as HTML forms. At the end, it clusters the acquired URLs.
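
The article does not say how the clustering is done. As a minimal sketch of one possible approach, the Python snippet below groups URLs that share the same host, the same path shape (numeric path segments replaced by an `{id}` placeholder), and the same set of query parameter names. The function names `url_signature` and `cluster_urls`, the `{id}` placeholder, and the grouping rule itself are assumptions made for illustration, not the tool's actual logic.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def url_signature(url):
    """Return a (host, path template, parameter names) key for a URL.

    Assumption: two URLs belong together when only numeric path
    segments and parameter values differ.
    """
    parts = urlsplit(url)
    # Replace purely numeric path segments with a placeholder.
    segments = ["{id}" if seg.isdigit() else seg for seg in parts.path.split("/")]
    # Keep only the parameter names, sorted so that order does not matter.
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return parts.netloc.lower(), "/".join(segments), tuple(sorted(names))


def cluster_urls(urls):
    """Group URLs by their signature."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[url_signature(url)].append(url)
    return dict(clusters)


if __name__ == "__main__":
    found = [
        "http://example.com/item/1?id=5",
        "http://example.com/item/22?id=9",
        "http://example.com/about",
    ]
    for signature, members in cluster_urls(found).items():
        print(signature, len(members))
```

Grouping like this would let a tester look at one representative per cluster rather than every individual URL the crawl produced.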

Additionally, its output can be integrated with other tools.

More information:

How did the idea of organizing it appear?

Nowadays, programmers tend to develop Web 2.0 (AJAX-based) web applications, and this trend has made traditional web crawlers useless for such applications. In penetration testing projects, the most important part of the test is reconnaissance, and one of the most useful tools for it is a web crawler, because it helps find the entry points of the target application. Since the existing tools are not useful for crawling these newer web applications, this tool is going to enable penetration testers to crawl AJAX-based web applications and make the test easier and more accurate.

Why is your project interesting and innovative?

Because this tool takes a totally different approach to gathering URLs. It uses a web browser engine to parse the fetched content, then captures all HTTP requests and extracts all URLs from the content.
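
The tool does its capturing inside a browser engine, on JavaScript and DOM functions. The Python sketch below illustrates only the general wrap-and-record pattern behind that idea: replace a function with a wrapper that logs its arguments and then calls the original. `FakePageApi`, its `send` method, and `install_hook` are hypothetical stand-ins invented for this example. They are not part of the tool or of any browser API.

```python
import functools


class FakePageApi:
    """Hypothetical stand-in for an object a page uses to send requests."""

    def send(self, method, url):
        return f"{method} {url} -> 200"


def install_hook(target, name, captured):
    """Replace target.<name> with a wrapper that records every call.

    The wrapper appends the call's positional arguments to `captured`
    and then forwards the call to the original, so behaviour is unchanged.
    """
    original = getattr(target, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        captured.append(args)
        return original(*args, **kwargs)

    setattr(target, name, wrapper)
    return original


if __name__ == "__main__":
    captured = []
    page = FakePageApi()
    install_hook(page, "send", captured)
    page.send("POST", "/api/login")
    print(captured)
```

Because the wrapper forwards every call to the original, the page keeps working while each request it makes is written to the log.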

This method can be used for any type of web application, even old-fashioned ones. It can also crawl web applications built on top of client-side libraries such as AngularJS and jQuery.

What kind of audience can be interested in that?

Penetration testers, security consultants and developers.

How is it different from other similar projects on the market?

All the existing tools use the same method: they just scan the HTML content and look for URLs. These tools cannot cover AJAX requests and HTML forms, and they cannot trace changes to the DOM or analyze them (basically, they are not able to analyze the DOM tree).
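
To make the limitation concrete, here is a minimal sketch of a static link scan using Python's standard `html.parser`. It is a deliberately simple stand-in for the scanning approach described above, not a reproduction of any specific existing tool. The sample page, the `StaticLinkScanner` class, and the `scan` helper are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StaticLinkScanner(HTMLParser):
    """Collect href values from anchor tags without running any script."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, value))


# Hypothetical page: one static link, plus a script that builds an
# endpoint at run time and requests it.
PAGE = """
<html><body>
<a href="/home">Home</a>
<div id="items"></div>
<script>
  var endpoint = "/api/" + "items";
  fetch(endpoint).then(function (r) { return r.json(); });
</script>
</body></html>
"""


def scan(html, base_url):
    scanner = StaticLinkScanner(base_url)
    scanner.feed(html)
    return scanner.links


if __name__ == "__main__":
    print(scan(PAGE, "http://example.com/"))
```

The scan reports only the `/home` link. The endpoint assembled by the script never appears as a URL in the markup, so it is seen only by something that executes the page.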

This tool is going to analyze the HTML content instead of just scanning it.

© HAKIN9 MEDIA SP. Z O.O. SP. K. 2013