torcontrol and Tor circuits managers

November 29, 2022

TLDR

Stem did not encourage me to read specifications. I fucked error handling up and now proudly introduce my own minimalistic Tor controller.

Rants

It all started with the need for circuit management. Suppose you want to scrape something anonymously and would like to take advantage of Tor exit nodes. To reduce traffic on the Tor network, it would be nice to use as few relays as possible. Tor does not allow circuits that consist of a single exit node only, but we can leave out the middle relays. All you need is to make sure your application uses these short circuits, and to build new ones when some of them break.

In a nutshell, that's a pretty easy task. To build circuits, you construct custom paths from relay descriptors and send the EXTENDCIRCUIT 0 command to the Tor service (Stem offers the new_circuit method for that). Then CIRC events tell you when a circuit is built or closed, so you can maintain the desired number of circuits. And to make sure your application uses the right circuits, you set __LeaveStreamsUnattached to 1, listen for STREAM events, and attach each new stream to one of your circuits yourself.
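To make the steps above concrete, here is a rough sketch of the control-port lines involved. The command names come from the Tor control protocol; the helper functions and their names are made up for illustration and do not belong to Stem or torcontrol.

```python
# Hypothetical helpers that only format control-port command lines.
# They do not talk to Tor; sending and reply parsing are left out.

def extend_circuit_cmd(fingerprints, circuit_id=0):
    """Format EXTENDCIRCUIT. Circuit id 0 asks Tor to build a new circuit."""
    path = ",".join("$" + fp for fp in fingerprints)
    return f"EXTENDCIRCUIT {circuit_id} {path}\r\n"

def setup_cmds():
    """Commands sent once after authentication."""
    return [
        # Tor stops attaching streams to circuits on its own.
        "SETCONF __LeaveStreamsUnattached=1\r\n",
        # Ask Tor to report circuit and stream status changes.
        "SETEVENTS CIRC STREAM\r\n",
    ]

def attach_stream_cmd(stream_id, circuit_id):
    """Format ATTACHSTREAM for a stream reported by a STREAM event."""
    return f"ATTACHSTREAM {stream_id} {circuit_id}\r\n"
```

A two-hop path is then just a list of two relay fingerprints, for example a guard followed by an exit, passed to extend_circuit_cmd.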

My first Stem-based implementation seemed to work well, but from time to time it stopped receiving events. I had to restart my circuits manager, and I had no idea where the problem was. I had two places to dig: the Stem codebase or the Tor Controller Specification. I chose the latter, simply because it has far fewer lines to read.

It didn't take long to write my own implementation, but it had the same issue. On the other hand, I had no one to blame but myself, and I finally found the source of the problem: a "while True" loop in the event handler combined with error 551, "unable to attach stream". When the handler failed to attach a stream to an already broken circuit, it moved on to the next circuit, which was fine. But when something was wrong with the stream itself, ATTACHSTREAM never succeeded, the loop never exited, and my circuits manager stopped working.

The correct approach was to try a few times and then give up. Tor closes the stream after a couple of minutes anyway, and the scraping software retries after its timeout.
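A minimal sketch of that "try a few times and give up" logic, not the actual dtcm2 code. AttachError, attach_with_retries and the attach callable are hypothetical names; attach stands for whatever coroutine sends ATTACHSTREAM and raises when Tor replies with 551.

```python
import asyncio

class AttachError(Exception):
    """Hypothetical stand-in for a 551 reply to ATTACHSTREAM."""

async def attach_with_retries(attach, stream_id, circuit_ids, max_attempts=3):
    """Try at most max_attempts circuits, then give up and return None."""
    for circuit_id in circuit_ids[:max_attempts]:
        try:
            await attach(stream_id, circuit_id)
            return circuit_id
        except AttachError:
            # Either this circuit or the stream itself is unusable;
            # try the next circuit, but never loop forever.
            continue
    # Giving up: the stream stays unattached and Tor closes it later.
    return None
```

The bound matters more than its value: with "while True" a bad stream blocks the event handler forever, while here it costs at most max_attempts failed commands.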

I split my prototype into dtcm2 and torcontrol. Although torcontrol is in deep alpha at the moment, both are thoroughly tested in production.

torcontrol is pretty simple. The basic TorConnection class provides the send_command and handle_event methods. If you want to handle events, you need to override the latter in your subclass. All commands (only a few at the time of writing) are implemented in the TorCommands mix-in. If you need commands that aren't implemented yet, you don't need to contribute to the library; you can simply implement them in your subclass.
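The pattern can be sketched with stand-in classes. These are not the real torcontrol classes: Connection, Commands and CircuitWatcher are hypothetical, the method signatures are assumed, and everything is synchronous and records commands instead of sending them, just to show how the pieces fit together.

```python
class Connection:
    """Stand-in connection: records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def send_command(self, line):
        self.sent.append(line)
        return "250 OK"

    def handle_event(self, event):
        pass  # default: ignore events; subclasses override this

class Commands:
    """Mix-in with command helpers built on top of send_command."""

    def set_events(self, *names):
        return self.send_command("SETEVENTS " + " ".join(names))

class CircuitWatcher(Connection, Commands):
    """Subclass that handles events and adds a command of its own."""

    def __init__(self):
        super().__init__()
        self.circ_events = []

    def handle_event(self, event):
        # Keep only asynchronous CIRC notifications.
        if event.startswith("650 CIRC "):
            self.circ_events.append(event)

    def get_info(self, key):
        # A command the mix-in lacks, implemented right in the subclass.
        return self.send_command("GETINFO " + key)
```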

The TorController class is just a subclass of TorConnection and TorCommands. Usage is pretty simple as well:

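import asyncio

from torcontrol import TorController

async def main():
    # Connect to the control port, authenticate, and list current circuits.
    async with TorController('127.0.0.1', 9051) as tc:
        await tc.authenticate('my-secret-password')
        async for circuit in tc.get_circuits():
            print(circuit)

asyncio.run(main())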

I have no idea how elegant my approach is, and some pieces of code are still total shit, but if you want to dig into Tor, it's much easier and more fun to play with a small codebase. In the future I hope to make it look better and implement all commands.

Credit goes to the Stem project, from which I borrowed some parsing code.