---
title: Deleting files
---

While trying to free up space on my NixOS installation, I ran into the following
situation.

There is a directory containing `n` files, which we can list in alphabetical
order. Some files are in use, so they can't be deleted, but we don't know which
ones; we want to delete all of the files which aren't in use.

We can delete any set of files in a single operation, but each operation carries
a large constant cost, no matter how many files it names. If any file in the
chosen set is in use, none will be deleted.

What strategy can we follow which will delete all of the required files, using
the fewest delete operations?

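To pin down the rules, here is a minimal Python sketch of the delete operation
described above. The `Directory` class and its `delete` method are hypothetical
names invented for illustration, not a real NixOS or filesystem API; the `calls`
counter stands in for the large constant cost of each operation.

```python
class Directory:
    """Toy model of the problem (hypothetical, for illustration only)."""

    def __init__(self, files, in_use):
        self.files = set(files)    # everything currently in the directory
        self.in_use = set(in_use)  # hidden from the strategy; can't be deleted
        self.calls = 0             # number of delete operations paid for

    def delete(self, batch):
        """All-or-nothing delete: returns True only if every file was removed."""
        self.calls += 1
        batch = set(batch)
        if batch & self.in_use:
            return False           # one in-use file blocks the whole batch
        self.files -= batch
        return True


d = Directory(["a", "b", "c", "d"], in_use=["b"])
print(d.delete(["a", "b"]), sorted(d.files))  # blocked by "b", nothing removed
print(d.delete(["c", "d"]), sorted(d.files))  # succeeds, "c" and "d" removed
print(d.calls)                                # two operations paid for
```

A strategy only ever sees the `True`/`False` result of each call, never the
`in_use` set itself.
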
# Trivial Solutions #

If we try to delete all of the files at once, we use the fewest calls possible
(just one); but we fail to delete *any* files if even one of them is in use, so
on its own this isn't actually a solution.

If we delete each file individually, we're guaranteed to remove all the required
files, but this makes no use of batching: it always takes exactly `n` delete
operations, i.e. O(n).

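As a rough sketch, the two trivial strategies might look like this in Python.
All of the names here (`make_try_delete`, `delete_all_at_once`,
`delete_one_by_one`) are made up for illustration, and `try_delete` is an
assumed stand-in for the real all-or-nothing delete operation.

```python
def make_try_delete(in_use):
    """Build a fake delete operation plus a log of every batch it was given."""
    calls = []

    def try_delete(batch):
        calls.append(list(batch))
        # All-or-nothing: succeed only if no file in the batch is in use
        return not any(f in in_use for f in batch)

    return try_delete, calls


def delete_all_at_once(files, try_delete):
    """One call; deletes everything or nothing. Returns the deleted files."""
    return list(files) if try_delete(files) else []


def delete_one_by_one(files, try_delete):
    """One call per file; always deletes every file that isn't in use."""
    return [f for f in files if try_delete([f])]


files = ["a", "b", "c", "d"]

try_delete, calls = make_try_delete(in_use={"b"})
print(delete_all_at_once(files, try_delete), len(calls))

try_delete, calls = make_try_delete(in_use={"b"})
print(delete_one_by_one(files, try_delete), len(calls))
```

With `"b"` in use, the first strategy spends one call and deletes nothing, while
the second spends four calls and deletes the other three files.
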
# Smarter approaches #

One approach which leaps out to a programmer is divide-and-conquer: we try to
delete the whole lot, and see if that works. If not, we split our list of files
into two halves and repeat the process on each half. This is elegant and
recursive.

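A minimal Python sketch of this recursion, under the assumption that the delete
operation is available as a function `try_delete(batch)` returning `True` on
success. That function, the `fake_try_delete` stand-in and the name
`delete_divide_and_conquer` are all hypothetical.

```python
def delete_divide_and_conquer(files, try_delete):
    """Try the whole batch; on failure, split in half and recurse.

    Returns the list of files which were deleted.
    """
    files = list(files)
    if not files:
        return []
    if try_delete(files):
        return files              # the whole batch went in one call
    if len(files) == 1:
        return []                 # a single file which is in use: give up on it
    mid = len(files) // 2
    return (delete_divide_and_conquer(files[:mid], try_delete)
            + delete_divide_and_conquer(files[mid:], try_delete))


# Stand-in for the real operation: only "c" is in use
in_use = {"c"}
calls = []

def fake_try_delete(batch):
    calls.append(batch)
    return not any(f in in_use for f in batch)

deleted = delete_divide_and_conquer(list("abcdefgh"), fake_try_delete)
print(deleted)     # every file except "c"
print(len(calls))  # 7 calls, versus 8 when deleting one at a time
```

With a single in-use file, only the batches containing that file fail, so most
of the directory is removed in a few large calls.
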
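In the worst case, every other file is in use: no group of two or more
neighbouring files can be deleted, so we end up trying each file individually,
which takes `n` calls as mentioned above. On top of that we pay for the failed
calls made while splitting the list into smaller and smaller pieces; there are
about `n` of those too (1 + 2 + 4 + ... + n/2), so the worst case is roughly
`2n` calls. That is still O(n), but around twice as many calls as deleting the
files one at a time. The O(log(n)) factor only appears if we count the total
number of file names passed across all calls, which is O(n*log(n)).
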
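The worst case can be checked with a small simulation. This is only a sketch
under assumptions: files are modelled as the integers `0` to `n-1`, `simulate`
is a made-up helper, and it counts both the number of delete calls and the total
number of file names passed to them.

```python
def simulate(n, in_use):
    """Run divide-and-conquer on files 0..n-1; return (calls, names_passed)."""
    stats = {"calls": 0, "names": 0}

    def try_delete(batch):
        stats["calls"] += 1
        stats["names"] += len(batch)
        return not any(f in in_use for f in batch)

    def go(batch):
        # Stop on an empty batch, a successful delete, or a failed single file
        if not batch or try_delete(batch) or len(batch) == 1:
            return
        mid = len(batch) // 2
        go(batch[:mid])
        go(batch[mid:])

    go(list(range(n)))
    return stats["calls"], stats["names"]


for n in (8, 64, 1024):
    every_other = set(range(0, n, 2))  # files 0, 2, 4, ... are in use
    print(n, simulate(n, every_other), simulate(n, set()))
```

For `n = 8` with every other file in use, this makes 15 calls (that is,
`2n - 1`) naming 32 files in total, whereas with nothing in use it makes a
single call naming all 8 files.
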
The best case is O(1): when no files are in use, everything can be deleted and
our first call does the lot.

I think this is the best approach, but would love to know if there's something
better!