For a while I was writing down the tools I had been working with and making. Then my blog blew up. Or, more literally, it locked up and I lost the data, because it was all on a dev machine I didn’t care that much about.
I didn’t really stop working on things, but I didn’t write much about it.
Then yesterday I had an idea. It wasn’t an original idea. It was really a question of how I could make something similar that I could use without needing to install more software.
I came across this tool in a tweet: https://github.com/hakluke/hakcheckurl. Written in Go, it checks URLs and returns their status codes; it looks like it spiders as well. Cool, I thought. Go, I thought.
Could I do it in Python, I thought? I played around. I looked around. I really didn’t want to rewrite a crawler. Lazy, I know, but it’s my project and my time.
Sites have places they don’t want crawled. They list these places in a robots.txt file in the hope that crawlers will respect it and stay away.
Most of these files and folders are benign: style folders, images taken out of context. But some can give people hunting for vulnerabilities a head start.
So, why not work out a way to take a look at them solo or in batches of sites?
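To make the idea concrete, here is a rough sketch (nothing to do with the actual tool’s internals) of pulling the Disallow entries out of a robots.txt body:

```python
def parse_disallows(robots_txt: str) -> list[str]:
    """Pull the Disallow paths out of a robots.txt body."""
    disallows = []
    for line in robots_txt.splitlines():
        # Strip comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # an empty Disallow means "allow everything"
                disallows.append(path)
    return disallows

sample = """User-agent: *
Disallow: /admin/
Disallow: /backup/  # old dumps
Disallow:
"""
print(parse_disallows(sample))  # ['/admin/', '/backup/']
```

The standard library also ships `urllib.robotparser` for answering “may I crawl this path?”, but it doesn’t hand back the raw Disallow list, which is the part that’s interesting here.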
So, what can it do?
Right now it’s pretty simple. Point it at one site or provide a list of sites, and it will check whether each one has a robots.txt file and log that data for review.
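A minimal version of that loop might look like this (a sketch, not the tool’s actual code; the `fetch` callable is injected so the checking logic stays testable without touching the network):

```python
from urllib.parse import urljoin

def check_sites(sites, fetch):
    """Check each site for a robots.txt and collect the results.

    `fetch` is any callable that takes a URL and returns an HTTP
    status code, or raises OSError on connection failure. In real
    use it would wrap urllib.request or requests.
    """
    results = {}
    for site in sites:
        url = urljoin(site, "/robots.txt")
        try:
            results[site] = fetch(url)
        except OSError:
            results[site] = None  # couldn't reach the site at all
    return results

# A stub fetcher standing in for a real HTTP call.
def fake_fetch(url):
    return 200 if "example.com" in url else 404

print(check_sites(["https://example.com", "https://nope.test"], fake_fetch))
# {'https://example.com': 200, 'https://nope.test': 404}
```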
I’m hoping to add the ability to switch between http and https soon, for sites where one scheme doesn’t show results. The thought of following the disallows to see what’s actually there has also crept into my mind.
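The fallback idea could be as simple as trying one scheme and swapping to the other when nothing comes back. Again just a sketch with an injected fetcher, not the tool’s code:

```python
def check_with_fallback(host, fetch):
    """Try https first, then fall back to http if it fails.

    `fetch` takes a full URL and returns a status code, or raises
    OSError on failure. Returns (url_that_worked, status), or
    (None, None) when neither scheme responds.
    """
    for scheme in ("https", "http"):
        url = f"{scheme}://{host}/robots.txt"
        try:
            return url, fetch(url)
        except OSError:
            continue
    return None, None

def https_broken(url):
    # Pretend this host only answers over plain http.
    if url.startswith("https://"):
        raise OSError("connection refused")
    return 200

print(check_with_fallback("example.org", https_broken))
# ('http://example.org/robots.txt', 200)
```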
Download it. Give it a spin. Give it a whirl. Please help me improve it.