A Reddit web scraper bot

Posted on 18-04-2018

After a long break I started playing my favorite childhood game again, the MMORPG Dofus. Being a frequent Reddit user, I also subscribed to the Dofus subreddit so I could easily get the latest updates about the game. There was one problem, however: people kept mentioning items in submissions without listing their stats. Dofus has a great online encyclopedia showcasing all items and their respective stats, but every time someone mentioned an item I had to switch apps on my phone and search for it manually. “How come nobody has made a bot for this yet?” I thought out loud. Hold on, I could make this bot!

Dofus artwork

After some research I quickly settled on Python and figured I could use PRAW for all Reddit interactions. Although I had helped out as a teacher’s aide in the Python classes, I had never done a standalone Python project before, but this proved not to be an issue. The Reddit interaction turned out to be the easiest part and the web scraping was a bore, but the surprisingly hard part was figuring out how to get the item name out of the comments. I decided that users would mention the item between brackets (e.g. [Tofu set]) and the bot would pick up the keywords between those brackets. (Fun fact: in the beginning the bot tried to search for hyperlinks as well, so I had to build a check to prevent that.) Searching on the encyclopedia itself proved troublesome, so I took a step back and went with the easiest approach: a simple Google search. The bot searches for ‘site:dofus.com keywords’, and if the first search result is a link to the encyclopedia, it continues. The scraping class checks whether the URL contains ‘/sets/’, ‘/equipment/’ or ‘/weapon/’ and scrapes the matching data accordingly. Hosting it on Heroku turned out to be surprisingly easy as well.
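To make the bracket parsing and URL check a bit more concrete, here is a minimal sketch of how those two steps could look. It is not the bot’s actual code: the function names, the regular expression and the example URL in the usage note are my own assumptions for illustration. Only the bracket convention, the ‘site:dofus.com keywords’ query and the three URL fragments come from the description above.

```python
import re
from urllib.parse import urlparse

# Matches "[keywords]" but skips Markdown hyperlinks such as "[text](url)":
# the negative lookahead rejects a closing bracket directly followed by "(".
BRACKETED = re.compile(r"\[([^\[\]]+)\](?!\()")

# URL fragments mentioned in the post; each maps to one kind of page.
CATEGORY_MARKERS = ("/sets/", "/equipment/", "/weapon/")


def extract_item_names(comment_body):
    """Return the keywords found between brackets in a comment (hypothetical helper)."""
    names = []
    for match in BRACKETED.findall(comment_body):
        name = match.strip()
        # Extra guard: ignore bracketed text that is itself a URL.
        if name and "://" not in name:
            names.append(name)
    return names


def build_search_query(keywords):
    """Build the site-restricted search string described in the post."""
    return f"site:dofus.com {keywords}"


def classify_encyclopedia_url(url):
    """Return 'sets', 'equipment' or 'weapon', or None if no marker is in the URL path."""
    path = urlparse(url).path
    for marker in CATEGORY_MARKERS:
        if marker in path:
            return marker.strip("/")
    return None
```

For example, `extract_item_names("Try the [Tofu set], see [the wiki](https://example.com)")` picks up only `Tofu set`, and a made-up URL such as `https://www.dofus.com/en/mmorpg/encyclopedia/sets/1-example` would be classified as `sets`. Anything that returns `None` would simply be skipped by the bot.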

Example of an item on the Dofus Encyclopedia

I ran a small beta test and got great feedback about Markdown formatting and duplicate stats, both of which I had totally overlooked. A week later, after implementing all the feedback, the bot was up and running.

The bot replying to a comment

Since this was the first bot and web scraper I ever wrote, I didn’t have a clear approach, which sadly shows in my programming choices. Despite this, I decided to make the project public on my GitLab account and wrote a simple readme so anyone interested can play around with it themselves. All in all, it was a fun side project that offered interesting insights and made me more accustomed to Python.