Git on pkarr
An earlier post showed how to configure our OS to resolve pkarr addresses, and a follow-up post showed how DNSLink can associate content identifiers with domain records (including pkarr records) in a way that can be looked up and traversed using Kubo.
Git objects on IPFS
We’ve already seen how git stores files (AKA “blobs”), directories (AKA “trees”), commits and tags as “objects”, which are immutable and referenced by their SHA hash.
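To make that addressing scheme concrete, here is a minimal Python sketch of how a blob's ID can be derived. It assumes a repository using git's default SHA-1 object format, and the function name `git_blob_id` is only for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git would assign to a blob with this content."""
    # Git hashes a header of the form "<type> <size>" plus a NUL byte,
    # followed by the raw content
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()
```

The result should agree with `git hash-object` run on the same bytes. Since the ID depends only on the content, the same file gets the same name on every machine.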
The git-remote-ipfs executable in my git-on-ipfs repo allows git to fetch
objects from IPFS and (if we have access to a Kubo RPC API) push
objects into IPFS too. This liberates git object data from individual
directories on individual machines, moving it into a single, distributed, global
(“interplanetary”) storage layer.
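As a rough illustration of how a git ID can map onto an IPFS identifier, the sketch below wraps a SHA-1 hash in a CIDv1 using the multicodec codes for `git-raw` (0x78) and `sha1` (0x11). Whether git-remote-ipfs builds its CIDs in exactly this way is an assumption, and the function name is hypothetical.

```python
import base64


def git_sha1_to_cid(sha1_hex: str) -> str:
    """Wrap a git SHA-1 object ID in a base32 CIDv1 with the git-raw codec."""
    digest = bytes.fromhex(sha1_hex)
    if len(digest) != 20:
        raise ValueError("expected a 40-character SHA-1 hex string")
    # CIDv1 layout: version (0x01), codec (0x78 = git-raw),
    # then a multihash: function (0x11 = sha1), length (0x14 = 20), digest
    cid_bytes = bytes([0x01, 0x78, 0x11, 0x14]) + digest
    # The multibase prefix "b" marks lowercase base32 without padding
    encoded = base64.b32encode(cid_bytes).decode("ascii")
    return "b" + encoded.lower().rstrip("=")
```

Because the digest is carried over unchanged, no lookup table is needed to go from a git ID to a CID or back again.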
Git refs
Git repositories also contain “refs”, which are names that point to particular objects (via their SHA hashes). The most common types of ref are “tags”, which point to a particular object and rarely change (e.g. “version 1.0”); and “branches”, which point to some ongoing work (e.g. “fix installer”), and are updated as new descendant commits are made.
Git refs are very similar to DNSLink, since both associate a “name” with a content address (i.e. a hash). Hence, in order to liberate git fully, we need to store those refs in a single, distributed, global (“interplanetary”) naming layer.
Git on DNS
The Domain Name System seems like an obvious contender; after all, it’s the original use-case for DNSLink. However, DNS on its own does not provide any cryptographic assurances; so we cannot be sure that responses contain the “real” identifiers we requested. This is especially dangerous for a system that manages code, like git; since a spoofed identifier is a code injection vector!
DNS also has other annoyances, like a lack of standard tooling or protocols for managing zone data. The Internet’s main DNS database also has obvious problems with requiring “registration”, and allowing rent-seeking, seizures, etc. Note that cryptographic verification can be added using DNSSEC; but that hands control to a set of gatekeepers, which is not a path to liberation.
In any case, the git-remote-dns executable in my git-on-ipfs repo allows git to
fetch refs from a DNS domain. Each ref (e.g. branch) is a
subdomain, referred to by a PTR record on the
_heads subdomain (similar to DNS-SD).
Each ref’s subdomain contains:
- A DNSLink record, specifying the ref’s associated content; e.g. the IPFS CID of a git commit.
- A TXT record specifying its name; which can be useful if the ref’s name contains characters which are not valid as part of the subdomain.
Pushing is not supported, due to the lack of any standard tool/protocol.
If a fetch asks for objects rather than refs, the
request is forwarded to git-remote-ipfs.
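The record layout described above can be sketched with an in-memory stand-in for DNS queries. Everything specific here is an assumption made for illustration: the `repo.example` domain, the placeholder CIDs, placing DNSLink values on a `_dnslink` label (as the DNSLink specification suggests), and the `name=` prefix on the name record. The real git-remote-dns may differ on each point.

```python
from typing import Dict, List, Tuple

# Stand-in for real DNS queries: (name, record type) -> list of values
Zone = Dict[Tuple[str, str], List[str]]


def list_refs(zone: Zone, repo_domain: str) -> Dict[str, str]:
    """Map ref names to content paths by following the layout described above."""
    refs: Dict[str, str] = {}
    # Each PTR value on the _heads subdomain is the subdomain of one ref
    for ref_domain in zone.get(("_heads." + repo_domain, "PTR"), []):
        # Assumption: DNSLink values sit in a TXT record on the _dnslink label
        links = [
            value[len("dnslink="):]
            for value in zone.get(("_dnslink." + ref_domain, "TXT"), [])
            if value.startswith("dnslink=")
        ]
        if not links:
            continue  # No content is associated with this ref
        # Assumption: the name record looks like "name=<ref name>"
        names = [
            value[len("name="):]
            for value in zone.get((ref_domain, "TXT"), [])
            if value.startswith("name=")
        ]
        # Fall back to the first DNS label when there is no name record
        refs[names[0] if names else ref_domain.split(".")[0]] = links[0]
    return refs


EXAMPLE_ZONE: Zone = {
    ("_heads.repo.example", "PTR"): [
        "main.repo.example",
        "feature-fix-installer.repo.example",
    ],
    ("_dnslink.main.repo.example", "TXT"): ["dnslink=/ipfs/EXAMPLE_CID_1"],
    ("_dnslink.feature-fix-installer.repo.example", "TXT"): [
        "dnslink=/ipfs/EXAMPLE_CID_2"
    ],
    # A slash is not valid in a DNS label, so the real name is kept in TXT
    ("feature-fix-installer.repo.example", "TXT"): ["name=feature/fix-installer"],
}
```

Calling `list_refs(EXAMPLE_ZONE, "repo.example")` yields one entry per ref, using the name record where one exists and the subdomain label otherwise.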
If you’ve set up your system to resolve pkarr addresses then
those will work with git-remote-dns. However, we
can go much further in that case!
Git on pkarr
All of the problems we faced with DNS are avoided by pkarr:
- It’s cryptographically verified
- There is no need for gatekeepers or “registration”
- It does not allow seizure or rent-seeking (other than “vanity” keys, which are silly anyway)
- There is a standard protocol (BEP-0044) and tooling (pkdns-cli) we can use to update our refs
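The lack of “registration” follows from a pkarr address being nothing more than an encoding of a public key. A minimal sketch of that encoding, assuming a 32-byte Ed25519 public key and the z-base-32 alphabet (the function names are only for illustration):

```python
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32_encode(data: bytes) -> str:
    """Encode bytes as z-base-32, taking five bits per output character."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits on the right, up to a multiple of five
    bits += "0" * (-len(bits) % 5)
    return "".join(
        ZBASE32_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )


def pkarr_address(public_key: bytes) -> str:
    """Return the 52-character address for a 32-byte public key."""
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte public key")
    return zbase32_encode(public_key)
```

Anyone who can generate a keypair can compute such an address offline, without asking a registrar for anything.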
The git-remote-pkipfs executable can fetch refs just
like git-remote-dns except, since it’s specialised for
pkarr records, it allows pushing refs too. Pushing requires
access to the relevant pkarr seed, which we look up using git’s credential
system. Personally, I keep my seeds in pass and use pass-git-helper.
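For readers unfamiliar with git's credential system, the sketch below shows roughly how a helper program can ask it for a secret using `git credential fill`. The `pkarr` protocol value, the use of the pkarr address as the host, and storing the seed in the `password` field are all assumptions for illustration, not necessarily what git-remote-pkipfs does.

```python
import subprocess
from typing import Dict


def credential_request(protocol: str, host: str) -> str:
    """Build the key=value description that `git credential fill` reads on stdin."""
    # A blank line terminates the description
    return f"protocol={protocol}\nhost={host}\n\n"


def parse_credential_output(output: str) -> Dict[str, str]:
    """Parse the key=value lines that `git credential fill` writes to stdout."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        if not line:
            break
        # Only split on the first "=", since values may contain "=" themselves
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def lookup_seed(protocol: str, host: str) -> str:
    """Ask git's configured credential helpers for a secret (hypothetical usage)."""
    result = subprocess.run(
        ["git", "credential", "fill"],
        input=credential_request(protocol, host),
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: the seed is stored in the "password" field
    return parse_credential_output(result.stdout)["password"]
```

Going through git's own mechanism means any configured helper (pass-git-helper, a keyring, etc.) works without the remote helper knowing about it.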
Next steps
Dogfood
I’ve been using git-remote-pkipfs for about a year, and
it works reasonably well for my needs. This is partly why I haven’t
bothered pushing things to centralised places like chriswarbo.net/git
for a while!
I’ve only been moving repos across to pkarr/IPFS when I make new commits, so only “active” repos have been moved so far. I’ve been putting their pkarr addresses in their README so I don’t forget which is which!
Kubo git signatures
I hit some issues with Kubo’s git codec (which is used to parse git
objects, to traverse their links). It only supports PGP in the
signature field, but recent versions of git allow other
types, like SSH and x509. I’ve made a patch for
this, but the upstream repository seems pretty dormant. In the
meantime, I’ve managed to work around it by having
git-remote-ipfs treat commit objects as raw blocks, whose
parents and tree we extract ourselves; rather than relying on Kubo to
parse them.
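A minimal sketch of that kind of workaround is shown below: it pulls the tree and parent IDs out of a commit's bytes without caring what sort of signature is attached. This is an illustration rather than the actual git-remote-ipfs code, and it assumes the block may or may not begin with git's `commit <size>` header.

```python
from typing import List, Tuple


def commit_links(raw: bytes) -> Tuple[str, List[str]]:
    """Return (tree ID, parent IDs) for a commit object's bytes."""
    # Drop the leading "commit <size>" + NUL header, if the block includes it
    head, sep, rest = raw.partition(b"\x00")
    if sep and head.startswith(b"commit ") and b"\n" not in head:
        raw = rest
    # Header lines end at the first empty line; the commit message follows
    headers = raw.split(b"\n\n", 1)[0]
    tree = None
    parents: List[str] = []
    for line in headers.split(b"\n"):
        if line.startswith(b" "):
            continue  # Continuation of a multi-line header, such as gpgsig
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))
    if tree is None:
        raise ValueError("commit has no tree header")
    return tree, parents
```

Since signature lines are skipped rather than parsed, PGP, SSH and x509 signatures are all handled the same way.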
Repeaters
Data in the Mainline DHT needs to be refreshed periodically, so pkarr comes with a “repeater” tool which can do that. I haven’t set that up yet, since I’d like to run it on my VisionFive2, but some of pkarr’s dependencies don’t seem to build on RISC-V.
Reproviders
IPFS nodes can transfer data between each other, but they need some way to know which nodes have the blocks they need. AFAIK a Kubo node will ask the nodes it’s directly connected to, which works well for transferring data between the same group of nodes (say, a bunch of machines in an organisation). For nodes on the same LAN, this can be mostly automatic, thanks to mDNS.
If we aren’t connected to any peers which provide the blocks we need, we can try looking up some providers in the “Amino” DHT. However, that relies on provider records being announced: whether Kubo announces such records depends on its “provider strategy”, and whether the blocks are “pinned”.
Kubo’s “all” provider strategy tries to announce a record for every block, which in our case includes every git object of every repo. That results in far too much network usage: in fact, I’ve seen it take days to work through all of the blocks on my nodes; at which point, the DHT records are out-of-date and need reproviding (Amino has a 2-day TTL).
Instead, we want to limit the number of blocks/CIDs we announce provider records for: we certainly want to announce the latest commit object for each branch/ref, and perhaps the tree object for those commit objects too. It might be worth announcing ancestor commits too, but I don’t think any further trees (or blobs) would be needed.
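That selection policy can be sketched as a small function over an in-memory commit graph. The `Commit` type, the function name and the `ancestor_depth` parameter are hypothetical; a real reprovider would read the graph from git or IPFS, and would still need to turn the resulting IDs into CIDs and announce them.

```python
from typing import Dict, List, NamedTuple, Set


class Commit(NamedTuple):
    tree: str
    parents: List[str]


def objects_to_announce(
    refs: Dict[str, str],
    commits: Dict[str, Commit],
    ancestor_depth: int = 0,
) -> Set[str]:
    """Pick the object IDs worth announcing provider records for."""
    announce: Set[str] = set()
    for tip in refs.values():
        frontier = {tip}
        # Depth 0 is the ref's own commit; higher depths are its ancestors
        for depth in range(ancestor_depth + 1):
            parents: Set[str] = set()
            for commit_id in frontier:
                announce.add(commit_id)
                commit = commits.get(commit_id)
                if commit is None:
                    continue  # Unknown commit: announce it but stop walking
                if depth == 0:
                    # Only the latest commit of each ref gets its tree announced
                    announce.add(commit.tree)
                parents.update(commit.parents)
            frontier = parents
    return announce
```

With `ancestor_depth=0` this announces two records per ref, so the total grows with the number of refs rather than with the number of git objects.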
Again, I want to set up my VisionFive2 to do this, but haven’t gotten around to configuring it yet (I’d want it tied to the pkarr records, so that whenever a ref is updated we’d switch to announcing its new commit/tree instead).
IPNS
As mentioned in my DNSLink post, the IPFS project provides its own naming layer called IPNS, which I tried about a decade ago and eventually stopped using. Once I’ve sorted out my repeaters and reproviders, I might revive those old IPNS addresses and push to them too.
GNS
The GNU Name System looks like a nice alternative to pkarr, but I’ve had trouble bootstrapping GNUnet: my ISP doesn’t provide IPv6, and GNUnet has also been failing to trigger UPnP on my NAT.
GNS is attractive, since it provides the same “public key as an
address” and “records stored on a DHT” features as pkarr; but it also
has nice features like aliasing and delegation. For example, I could
alias one of my zones to a name like chriswarbo, and put my
repos on its subdomains (e.g. chriswarbo-net.git.chriswarbo
for this site’s git repo), then do all the same _heads,
DNSLink, etc. as described in this post.
Of course, I could already set up such aliases for myself
using my various name resolvers; but GNS supports delegation,
where those aliases are also subdomains of my zone. For example,
resolving an address like chriswarbo-net.git.chriswarbo is
like asking “What am I calling chriswarbo? What does
chriswarbo call git? And what does
git.chriswarbo call chriswarbo-net?”.
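Those three questions can be modelled with a toy lookup, where each zone maps a label either to another zone or to a final value. This is only an illustration of the right-to-left traversal, not the GNS protocol or any GNUnet API, and the zone keys and CID below are placeholders.

```python
from typing import Dict

# Toy model: zone key -> {label -> another zone's key, or a final value}
Zones = Dict[str, Dict[str, str]]


def resolve(name: str, start_zone: str, zones: Zones) -> str:
    """Resolve a dotted name by following one label per zone, right to left."""
    current = start_zone
    for label in reversed(name.split(".")):
        # Ask the zone found by the previous step what it calls this label
        current = zones[current][label]
    return current


EXAMPLE_ZONES: Zones = {
    "MY_ZONE": {"chriswarbo": "CHRISWARBO_ZONE"},
    "CHRISWARBO_ZONE": {"git": "GIT_ZONE"},
    "GIT_ZONE": {"chriswarbo-net": "/ipfs/EXAMPLE_CID"},
}
```

Here `resolve("chriswarbo-net.git.chriswarbo", "MY_ZONE", EXAMPLE_ZONES)` walks through all three zones in turn. Someone starting from a different zone, with a different alias for the first step, would reach the same records from there on.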
GNS feels like a long-term solution to many of the problems with DNS, whilst pkarr feels like a neat hack. However, pkarr already works, so that’s what I’ll stick with for the time being!