Git on pkarr

Posted on by Chris Warburton

An earlier post showed how to configure our OS to resolve pkarr addresses, and a follow-up post showed how DNSLink can associate content identifiers with domain records (including pkarr records) in a way that can be looked up and traversed using Kubo.

Git objects on IPFS

We’ve already seen how git stores files (AKA “blobs”), directories (AKA “trees”), commits and tags as “objects”, which are immutable and referenced by their SHA hash.

The git-remote-ipfs executable in my git-on-ipfs repo allows git to fetch objects from IPFS and (if we have access to a Kubo RPC API) push objects into IPFS too. This liberates git object data from individual directories on individual machines, moving it into a single, distributed, global (“interplanetary”) storage layer.

Git refs

Git repositories also contain “refs”, which are names that point to particular objects (via their SHA hashes). The most common types of ref are “tags”, which point to a particular object and rarely change (e.g. “version 1.0”); and “branches”, which point to some on-going work (e.g. “fix installer”), and are updated as new descendant commits are made.
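To make the "name pointing at a hash" idea concrete, here is a minimal Python sketch that reads the output of `git show-ref` (one `<hash> <refname>` pair per line) into a dictionary, then picks out the branches and tags by their `refs/heads/` and `refs/tags/` prefixes. The function names are my own, for illustration only; they are not part of git-on-ipfs.

```python
def parse_show_ref(output: str) -> dict[str, str]:
    """Map each ref name to the object hash it points at.

    Expects text in the format printed by `git show-ref`:
    one "<hash> <refname>" pair per line.
    """
    refs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def with_prefix(refs: dict[str, str], prefix: str) -> dict[str, str]:
    """Keep only refs under `prefix`, with the prefix stripped from the names."""
    return {
        name[len(prefix):]: sha
        for name, sha in refs.items()
        if name.startswith(prefix)
    }


def branches(refs: dict[str, str]) -> dict[str, str]:
    return with_prefix(refs, "refs/heads/")


def tags(refs: dict[str, str]) -> dict[str, str]:
    return with_prefix(refs, "refs/tags/")
```

Feeding it the text printed by `git show-ref` in any repository gives a plain name-to-hash table, which is all a ref really is.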

Git refs are very similar to DNSLink, since they both associate a “name” with a content address (i.e. a hash). Hence, in order to liberate git fully, we need to store those refs in a single, distributed, global (“interplanetary”) naming layer.

Git on DNS

The Domain Name System seems like an obvious contender; after all, it’s the original use-case for DNSLink. However, DNS on its own does not provide any cryptographic assurances; so we cannot be sure that responses contain the “real” identifiers we requested. This is especially dangerous for a system that manages code, like git; since a spoofed identifier is a code injection vector!

DNS also has other annoyances, like a lack of standard tooling or protocols for managing zone data. The Internet’s main DNS database also has obvious problems with requiring “registration”, and allowing rent-seeking, seizures, etc. Note that cryptographic verification can be added using DNSSEC; but that hands control to a set of gatekeepers, which is not a path to liberation.

In any case, the git-remote-dns executable in my git-on-ipfs repo allows git to fetch refs from a DNS domain. Each ref (e.g. branch) is a subdomain, referred to by a PTR record on the _heads subdomain (similar to DNS-SD). Each ref’s subdomain contains:

Pushing is not supported, due to the lack of any standard tool/protocol.

If a fetch is asked for objects rather than refs, the request will be forwarded to git-remote-ipfs.

If you’ve set up your system to resolve pkarr addresses then those will work with git-remote-dns. However, we can go much further in that case!

Git on pkarr

All of the problems we faced with DNS are avoided by pkarr:

The git-remote-pkipfs executable can fetch refs just like git-remote-dns except, since it’s specialised for pkarr records, it allows pushing refs too. Pushing requires access to the pkarr seed, which we look up using git’s credential system. Personally, I keep my seeds in pass and use pass-git-helper.

Next steps

Dogfood

I’ve been using git-remote-pkipfs for about a year, and it works reasonably well for my needs. This is partly why I haven’t bothered pushing things to centralised places like chriswarbo.net/git for a while!

I’ve only been moving repos across to pkarr/IPFS when I make new commits, so only “active” repos have been moved so far. I’ve been putting their pkarr addresses in their README so I don’t forget which is which!

Kubo git signatures

I hit some issues with Kubo’s git codec (which is used to parse git objects, to traverse their links). It only supports PGP in the signature field, but recent versions of git allow other types, like SSH and X.509. I’ve made a patch for this, but the upstream repository seems pretty dormant. In the meantime, I’ve managed to work around it by having git-remote-ipfs treat commit objects as raw blocks, whose parents and tree we extract ourselves, rather than relying on Kubo to parse them.
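Extracting the tree and parents from a raw commit doesn't require understanding the signature at all. The following Python sketch shows the general idea; it is not the code from git-remote-ipfs, and the function name is hypothetical. It relies only on git's commit format: header lines of the form `key value`, continuation lines that begin with a space (which is how multi-line signatures are stored), and a blank line before the commit message. It also assumes the block may or may not start with git's `commit <size>\0` object header, and copes with both.

```python
def commit_links(raw: bytes) -> tuple[str, list[str]]:
    """Return (tree_hash, parent_hashes) for a raw git commit object.

    Signature headers (PGP, SSH, X.509, ...) are skipped without
    being interpreted, so their format doesn't matter.
    """
    # Assumption: the block may include git's "commit <size>\0" prefix.
    if raw.startswith(b"commit "):
        _, _, raw = raw.partition(b"\0")

    # Headers end at the first blank line; the rest is the message.
    headers, _, _message = raw.partition(b"\n\n")

    tree = None
    parents = []
    for line in headers.split(b"\n"):
        if line.startswith(b" "):
            continue  # continuation of a multi-line header, e.g. gpgsig
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))

    if tree is None:
        raise ValueError("commit object has no tree header")
    return tree, parents
```

A root commit yields an empty parent list, and a merge commit yields two or more parents.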

Repeaters

Data in the Mainline DHT needs to be refreshed periodically, so pkarr comes with a “repeater” tool which can do that. I haven’t set that up yet, since I’d like to run it on my VisionFive2, but some of pkarr’s dependencies don’t seem to build on RISC-V.

Reproviders

IPFS nodes can transfer data between each other, but they need some way to know which nodes have the blocks they need. AFAIK a Kubo node will ask the nodes it’s directly connected to, which works well for transferring data between the same group of nodes (say, a bunch of machines in an organisation). For nodes on the same LAN, this can be mostly automatic, thanks to mDNS.

If we aren’t connected to any peers which provide the blocks we need, we can try looking up some providers in the “Amino” DHT. However, that relies on provider records being announced: whether Kubo announces such records depends on its “provider strategy”, and whether the blocks are “pinned”.

Kubo’s “all” provider strategy tries to announce a record for every block, which in our case includes every git object of every repo. That results in far too much network usage: in fact, I’ve seen it take days to work through all of the blocks on my nodes; at which point, the DHT records are out-of-date and need reproviding (Amino has a 2 day TTL).

Instead, we want to limit the number of blocks/CIDs we announce provider records for: we certainly want to announce the latest commit object for each branch/ref, and perhaps the tree object for those commit objects too. It might be worth announcing ancestor commits too, but I don’t think any further trees (or blobs) would be needed.
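As a rough sketch of that selection policy (nothing like this is configured yet), the Python function below takes a table of refs and a table of already-parsed commits, and returns the object hashes worth announcing: each ref's latest commit, that commit's tree, and optionally a few generations of ancestor commits without their trees. The names `objects_to_announce` and `ancestor_depth` are made up for illustration, and converting git hashes into CIDs is left out.

```python
def objects_to_announce(
    refs: dict[str, str],
    commits: dict[str, tuple[str, list[str]]],
    ancestor_depth: int = 0,
) -> set[str]:
    """Choose which git objects to announce provider records for.

    refs:    ref name -> hash of its latest commit
    commits: commit hash -> (tree hash, list of parent commit hashes)
    ancestor_depth: how many generations of ancestor commits to include
    """
    announce = set()
    for tip in refs.values():
        frontier = [tip]
        for depth in range(ancestor_depth + 1):
            next_frontier = []
            for commit in frontier:
                if commit not in commits:
                    continue  # we don't hold this commit, so skip it
                tree, parents = commits[commit]
                announce.add(commit)
                if depth == 0:
                    announce.add(tree)  # trees only for the ref's latest commit
                next_frontier.extend(parents)
            frontier = next_frontier
    return announce
```

With `ancestor_depth=0` this announces two objects per ref, which is a tiny fraction of what the “all” strategy would attempt.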

Again, I want to set up my VisionFive2 to do this, but haven’t gotten around to configuring it yet (I’d want it tied to the pkarr records, so freshly-announced commits would cause us to announce those new commits/trees instead).

IPNS

As mentioned in my DNSLink post, the IPFS project provides its own naming layer called IPNS, which I tried about a decade ago and eventually stopped using. Once I’ve sorted out my repeaters and reproviders, I might revive those old IPNS addresses and push to them too.

GNS

The GNU Name System looks like a nice alternative to pkarr, but I’ve had trouble bootstrapping GNUnet. My ISP doesn’t provide IPv6, and it’s also been failing to trigger UPnP on my NAT.

GNS is attractive, since it provides the same “public key as an address” and “records stored on a DHT” features as pkarr; but it also has nice features like aliasing and delegation. For example, I could alias one of my zones to a name like chriswarbo, and put my repos on its subdomains (e.g. chriswarbo-net.git.chriswarbo for this site’s git repo), then do all the same _heads, DNSLink, etc. as described in this post.

Of course, I could already set up such aliases for myself using my various name resolvers; but GNS supports delegation, where those aliases are also subdomains of my zone. For example, resolving an address like chriswarbo-net.git.chriswarbo is like asking “What am I calling chriswarbo? What does chriswarbo call git? And what does git.chriswarbo call chriswarbo-net?”.
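That chain of questions can be modelled in a few lines of Python. This is a toy illustration of right-to-left delegation only: it does not use GNUnet's API, and the zone identifiers (`me`, `zone-A`, and so on) are placeholders standing in for real zone keys.

```python
def resolve(name: str, start_zone: str, zones: dict[str, dict[str, str]]) -> str:
    """Resolve `name` by walking its labels from right to left.

    zones maps a zone identifier to that zone's own label -> target table.
    Each step asks the current zone what it calls the next label.
    """
    current = start_zone
    for label in reversed(name.split(".")):
        current = zones[current][label]
    return current


# Placeholder zones: each one only knows its own labels.
zones = {
    "me": {"chriswarbo": "zone-A"},       # what am I calling chriswarbo?
    "zone-A": {"git": "zone-B"},          # what does chriswarbo call git?
    "zone-B": {"chriswarbo-net": "zone-C"},  # what does git.chriswarbo call chriswarbo-net?
}

target = resolve("chriswarbo-net.git.chriswarbo", "me", zones)
```

Only the first lookup depends on my own zone; every later step is answered by whichever zone the previous step delegated to.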

GNS feels like a long-term solution to many of the problems with DNS, whilst pkarr feels like a neat hack. However, pkarr already works, so that’s what I’ll stick with for the time being!