Anubis on git.gluecode.net

Posted 2025-06-23

TL;DR - Anubis is now protecting git.gluecode.net

I keep a couple of local mirrors of big, public repositories so I can do various git things locally on my server. Repos like mediawiki-core, the Linux kernel, and servo are fairly large, and pulling them regularly as full mirrors means I don’t have to worry about whether or not upstream is available.

Meme of Ralph saying “I’m in danger” captioned with “When you host large repos”

Somehow AmazonBot, Claude, ChatGPT, and others found my server. They decided it was a good idea to click on every comparison link (repeatedly). They refused to honor robots.txt, so I had nginx return a 403 for every user-agent string containing “bot”. That took out the “polite” bots that identify themselves in their user-agent string, like AmazonBot and ChatGPT, but a non-zero contingent of bots did not care and kept hitting my server.
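To make the limits of that user-agent rule concrete, here is a minimal Python sketch of the matching logic. It is not my actual nginx config: it assumes the rule is a case-insensitive substring match on “bot”, and the function name `status_for` and both sample user-agent strings are made up for illustration.

```python
import re

# Assumption: the block rule is a case-insensitive substring match on "bot".
BOT_PATTERN = re.compile(r"bot", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 for clients whose user-agent contains "bot", else 200."""
    return 403 if BOT_PATTERN.search(user_agent) else 200


# Illustrative user-agent strings (hypothetical, not taken from real logs).
SAMPLES = {
    "self-identified crawler": (
        "Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)"
    ),
    "crawler posing as a browser": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
}

if __name__ == "__main__":
    for label, user_agent in SAMPLES.items():
        print(f"{label}: {status_for(user_agent)}")
```

The second sample is the whole problem: a crawler that sends a browser-like user-agent never matches the pattern, so filtering on the string alone cannot stop it.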

Anubis saves the day

I did the somewhat-smarter thing back when I configured nginx and made cgit run within fcgiwrap, instead of the original apache/bozohttpd method of direct CGI invocation.

By running in fcgiwrap, I was already prepared to inject Anubis into the workflow. (Thank goodness for Xe caring about getting Anubis functioning on FreeBSD!) After following a few instructions to move my nginx server block for the cgit instance over to a socket-only listener, adding Anubis was… easy. Nginx is both the TLS frontend in front of Anubis and the backend behind it, so I get full protection for my computationally-expensive subdomain.


I’ve since “enhanced” the policy definition for Anubis to be a bit more brutal with bots, so the answer to every bot-like client will be “Absolutely Not”. I’ll work on allow-listing groups like the Internet Archive, but they don’t seem to play by any rules, and all of this code will be backed up separately anyway.

Thanks again to Xe. I made sure to increase my donations, as this is more than worth the money for an otherwise free product.

If you want to use this for commercial purposes, make sure to sponsor development. Your company is worth spending some money to make sure you survive the current/oncoming LLM crawler onslaught.