2017-01-10 at 10:33 PM UTC
So I got an idea for a program I wanted to write, and I'm about 80 lines deep, but I ran into a problem. The basic idea is that I would load a list of dorks or a single dork, open Google, and automatically search for websites that are vulnerable to SQLi or XSS or whatever the dork is for. To that end I wanted to use Mechanize and Scrapy: I would use Mechanize to open Startpage/Google and enter the dork into the search bar. I would grab the results with Mechanize's `links()` method and store them in an array; from there Scrapy would read them in and do all the heavy lifting.
Unfortunately, Mechanize doesn't seem to support parsing JS, and over at Startpage the links are wrapped in some JS. So when Mechanize looks at them, all it sees is `javascript:void(0);`.
What's more, Google uses some JS magic to detect when you point a bot at it and throws a captcha. So I was wondering if there is a way to parse/process JS with Mechanize, in order to emulate a browser better and make my program work without changing too much of the original design.
Anyway, if it's not possible to do this (looks like it), I am open to suggestions.
2017-01-11 at 12:03 AM UTC
Maybe you can try using the Selenium Python library to automate an actual browser to process it? Something like this or this looks interesting.
Edit: I realize the first link only applies to Java, but it's the same concept; there are many other interesting example snippets for Python in the documentation.
Post last edited by 0Death at 2017-01-11T00:05:45.956375+00:00
2017-01-11 at 1:39 AM UTC
aldra
JIDF Controlled Opposition
Yeah, Mechanize is not going to parse client-side scripts; you'll need a browser emulator to do that.
Alternatively, I would probably recommend looking at the dumped pages to see if you can work out how those links are encoded in the JS, so you can reverse it with string functions. I'm thinking those are your only two options, really.
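As a rough illustration of the string-function route, here is a minimal sketch using only the Python standard library. The thread does not show how the links are actually encoded, so the markup in `SAMPLE` is hypothetical: it assumes the real URL sits as a quoted string inside an `onclick` handler. The names `WrappedLinkParser`, `extract_links`, and `URL_IN_JS` are made up for this example, and the pattern would need adjusting to whatever the dumped pages really contain.

```python
import re
from html.parser import HTMLParser

# Assumption: the real target appears as a quoted http(s) URL inside a JS call.
URL_IN_JS = re.compile(r"""["'](https?://[^"']+)["']""")


class WrappedLinkParser(HTMLParser):
    """Collect target URLs from <a> tags, including javascript: stub links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        if not href.lower().startswith("javascript:"):
            # Ordinary link: the href already is the target.
            if href:
                self.links.append(href)
            return
        # Stub link: look for a quoted URL in the onclick handler (assumed location).
        match = URL_IN_JS.search(attr.get("onclick") or "")
        if match:
            self.links.append(match.group(1))


def extract_links(html):
    """Return the link targets found in an HTML string, in document order."""
    parser = WrappedLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical dumped markup, not taken from any real search page.
SAMPLE = """
<a href="javascript:void(0);" onclick="go('https://example.com/a?id=1')">A</a>
<a href="https://example.org/b">B</a>
<a href="javascript:void(0);">no target</a>
"""

print(extract_links(SAMPLE))
```

A stub link with no recoverable URL is simply skipped. If the target turns out to be built at runtime rather than stored as a literal string, this approach won't work and a real browser is the fallback.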
2017-01-12 at 1:20 AM UTC
Anyway, I went with Selenium, and I'm actually glad I did. Selenium is really straightforward.
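For reference, a minimal sketch of what collecting links through Selenium might look like. This is not the code from the thread. It assumes the current Selenium 4 Python API (`find_elements` with `By`), a local Firefox with its driver available, and two hypothetical helper names, `keep_real_links` and `collect_rendered_links`. Whether the rendered `href` holds the real target depends on the page; some sites keep the stub and navigate from a click handler instead.

```python
from urllib.parse import urlparse


def keep_real_links(hrefs):
    """Drop empty and non-http(s) hrefs (e.g. javascript: stubs); keep order, dedupe."""
    seen = set()
    kept = []
    for href in hrefs:
        if not href:
            continue
        if urlparse(href).scheme not in ("http", "https"):
            continue
        if href not in seen:
            seen.add(href)
            kept.append(href)
    return kept


def collect_rendered_links(url):
    """Load `url` in a real browser and return the hrefs of its <a> elements."""
    # Imported here so keep_real_links() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are installed
    try:
        driver.get(url)
        anchors = driver.find_elements(By.TAG_NAME, "a")
        return keep_real_links(a.get_attribute("href") for a in anchors)
    finally:
        # Always close the browser, even if loading or lookup fails.
        driver.quit()
```

Calling `collect_rendered_links("https://example.com/")` would open a browser window, load the page, and return the filtered list. The filtering is split into its own function so it can be checked without launching a browser.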