My previous post dealt with how versioning and non-versioning apps interact under the new Mac OS X Lion release. I now turn to some low-level sleuthing. The motivation is, in a way, similar: I am interested in how command-line tools such as rsync may deal with versioning. Again, the bottom line is that these tools will do exactly what they did under Snow Leopard and earlier releases, but that it is likely going to be hard to get them to back up and sync file versions. I need to do further testing on this point; feel free to provide any details you may have.
Low-level details and storage space
Here is what I found. First, the state of any versioning-aware app (that is, whether or not a given file has been edited since a version was explicitly saved) is stored in
~/Library/Saved Application State, but the actual versions of files are in a separate directory structure off the root of the current volume, specifically
/.DocumentRevisions-V100. That directory looks like this on my system:
    d--x--x--x   7 root  wheel   238 Jul 18 22:39 .
    drwxr-xr-x  36 root  wheel  1292 Jul 28 23:24 ..
    drwx------   5 root  wheel   170 Jul 28 23:27 .cs
    drw-------   2 root  wheel    68 Jul 18 22:39 ChunkTemp
    d--x--x--x   3 root  wheel   102 Jul 18 22:39 PerUID
    drwx------   4 root  wheel   136 Jul 28 23:27 db-V1
    drwx--x--x   2 root  wheel    68 Jul 18 22:39 staging
If you drill down the
PerUID directory (there is a further subdirectory for each user ID, or UID, and further subdirectories off of that), you will see every single version of every single versioned file. The actual file names are replaced with hexadecimal hash codes, but extensions are preserved. You can actually open these files directly, e.g. using
open file_name.ext from the command line. You may think this is very inefficient: imagine a very large Pages or TextEdit file, with versions differing only by a few characters. Does Lion actually save an entire copy of each version? This seems wasteful, and it is also not what version control systems typically do: rather, they save the “deltas,” or “patches” needed to go from one revision to the next. It turns out that Lion is smarter than this, but the details are a bit opaque.
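To enumerate the stored versions without drilling down by hand, something like the following may help. It is a minimal sketch: the function name `list_versions` is made up for illustration, the default path is the one described above, and reading that tree requires root privileges.

```shell
#!/bin/sh
# list_versions [ROOT]: print every regular file below ROOT, one per line.
# ROOT defaults to the per-user versions tree described in the post.
# (Hypothetical helper name; reading the default tree requires root.)
list_versions() {
    find "${1:-/.DocumentRevisions-V100/PerUID}" -type f
}

# Example, from a root shell: show only the RTF versions.
# list_versions | grep '\.rtf$'
```

Any of the paths it prints can then be passed to `open`, as described above.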
First, as the above Ars Technica article explains, Lion keeps track of “file chunks”, and only actually saves those chunks that have changed; these are stored in binary blobs off the
.cs directory. Actually figuring out which chunks have changed is pretty challenging, and again Ars Technica provides hints and links illustrating the principles and heuristics at work. The whole system is pretty sophisticated, but then again, it has to be: versions are stored on your internal hard disk (indeed, on your normal working “volume”); if you are editing a movie, you would not want to keep around 10 almost identical copies of a multi-gigabyte file, would you?
Now, you may remember that Time Machine does something similar to keep the size of backups under control. Again, Ars Technica comes to our aid. Instead of saving a copy of your entire hard disk each time it runs, Time Machine creates hard links to files and directories that have not changed since the last backup, and only copies files that have been modified. However, all this happens at the file level. Lion’s file versioning instead operates below the file level: it tracks and saves file chunks, which requires quite a bit more cleverness.
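To see what a hard link means in practice, here is a small self-contained sketch. It only illustrates the general mechanism (two names, one inode, one copy of the data); it is not Time Machine's actual code, and the file names and the `inode_of` helper are made up for the example.

```shell
#!/bin/sh
# Illustration only: two directory entries sharing a single inode.
# All names below are invented for this example.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/hardlink-demo.XXXXXX")
printf 'unchanged since the last backup\n' > "$workdir/original.txt"

# Create a second name for the same file; no data is copied.
ln "$workdir/original.txt" "$workdir/backup.txt"

# inode_of FILE: print the inode number of FILE (first field of `ls -i`).
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

inode_of "$workdir/original.txt"
inode_of "$workdir/backup.txt"   # same number as the line above

rm -r "$workdir"
```

Because both names resolve to the same inode, the content is stored once no matter how many backups refer to it.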
What’s intriguing is that the above does not explain why, when you look at files off the PerUID directory, you see what appear to be full copies of each version of any given file. For example, here’s what I get by issuing
ls -l /.DocumentRevisions-V100/PerUID/505/3/com.apple.documentVersions:
    total 0
    -r--r--r--@ 1 marciano  staff   512 Aug  6 14:00 655BB90C-85A2-4F64-A9D0-9C469DABD56E.rtf
    -r--r--r--@ 1 marciano  staff   324 Aug  5 17:29 A8DD2DAF-26F1-4CCA-8C9E-1811A379939A.rtf
    -r--r--r--@ 1 marciano  staff   324 Aug  5 17:29 D8BFF539-F540-4EDA-A1A2-14E74C2CE5CA.rtf
    -r--r--r--@ 1 marciano  staff  5057 Aug  6 14:02 DDA0C14D-DCC6-484F-BDB4-5F99A3923EE3.rtf
Again, you can open each of these files, and you will see the corresponding revision. The last one (dated August 6 at 14:02) is apparently 5057 bytes long. This matches what I get if I issue
ls -l in the directory where I keep the file itself:
    total 16
    -rw-r--r--  1 marciano  staff  5057 Aug  6 14:02 external.rtf
However, notice the first lines in the two displays. According to the
ls manual entry, the “total” line reports the number of 512-byte blocks actually used by the files in the directory being listed. I assume that Lion saves files in 4096-byte blocks, so a 5057-byte file requires two such blocks, or exactly 16 512-byte blocks. But note that the versions directory requires zero 512-byte blocks, despite the fact that it seemingly contains 4 non-empty files! So, what gives?
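The block arithmetic can be checked with a few lines of shell. This is just a worked version of the calculation; the 4096-byte allocation block size is my assumption, as stated above, and `blocks_512` is a name made up for the example.

```shell
#!/bin/sh
# blocks_512 SIZE_BYTES ALLOC_BLOCK_BYTES
# Prints how many 512-byte units a file of SIZE_BYTES occupies when space is
# handed out in whole allocation blocks of ALLOC_BLOCK_BYTES.
# The 4096-byte block size used below is an assumption, not a measured value.
blocks_512() {
    size=$1
    alloc=$2
    # Round up to a whole number of allocation blocks (ceiling division).
    nblocks=$(( (size + alloc - 1) / alloc ))
    echo $(( nblocks * alloc / 512 ))
}

blocks_512 5057 4096   # prints 16
blocks_512 324 4096    # prints 8
```

Under the same assumption, the four version files (512, 324, 324 and 5057 bytes) would be expected to add up to 8 + 8 + 8 + 16 = 40 blocks if each were stored as an ordinary file, which is what makes the reported total of 0 so striking.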
Here’s what I think is going on; if any of you readers happen to know for sure, please use the comments section :-) As you can see in this Wikipedia entry, each file in a modern file system is associated with a so-called “inode.” This, in turn, contains (among other things) references to the actual physical locations on the hard disk where the file content is stored, called the “inode pointer structure”. When the OS needs to write a file to disk, it figures out where to place it, then creates an inode structure to keep track of it. However, in principle, the OS could also create an inode pointer structure pointing to physical locations where other files are stored. So, in particular, Lion could be writing to disk the file chunks it tracks for a particular versioned file, then creating an inode whose pointer structure points to these file chunks, and finally associating a file name with that inode. If this is what is going on, then files thus created would not take up any additional space, because they would merely be pointing to file chunks saved elsewhere, whose storage size is already accounted for. Indeed, it may well be that even the “original” file (
external.rtf in the above example) has an inode pointer structure pointing to tracked file chunks.
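Whatever the mechanism turns out to be, the mismatch between a file's apparent size and the space allocated to it is easy to measure. The following is a minimal sketch: `compare_sizes` is a made-up helper name, and it relies only on `wc -c` for the apparent size and `du -k` for the allocated space.

```shell
#!/bin/sh
# compare_sizes DIR: for each regular file directly inside DIR, print its
# apparent size in bytes, the space allocated to it in KiB (per `du -k`),
# and its path. (Hypothetical helper, for illustration.)
compare_sizes() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # `tr` strips the padding that some versions of `wc` add.
        apparent=$(wc -c < "$f" | tr -d ' ')
        allocated=$(du -k "$f" | awk '{ print $1 }')
        echo "${apparent} bytes, ${allocated} KiB allocated: $f"
    done
}

# Reading the versions directory requires root, e.g. from a root shell:
# compare_sizes /.DocumentRevisions-V100/PerUID/505/3/com.apple.documentVersions
```

If the hypothesis above is right, the version files should show their full apparent size but little or no allocated space, consistent with the "total 0" line in the listing.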
Whatever the mechanism, it is a fact that Lion’s versioning does not take up unnecessary space. To check this, I created a roughly 16MB rtf file containing only text lines. I then added one more line to it. In the above directory off
/.DocumentRevisions-V100/PerUID, I see both files, each 16MB in size. At this point, the Finder reported 131,297,953,160 bytes used. Then, I ran TextEdit and, using the Versions UI, deleted the second version. The Finder now reported 131,298,051,464 bytes used; there is a slight increase in hard disk usage, most likely due to virtual memory, intermediate files, or whatever. The key thing is that hard disk usage did not go down by roughly 16MB. Then, I deleted the original version, thus keeping only the file in the regular, visible user folder: the Finder reported 131,298,071,944 bytes used. Finally, I closed TextEdit and deleted the file in my own folder; the Finder finally reported 131,281,700,232 bytes used, or about 16MB less. Bingo!
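For the record, here are the differences between the successive Finder readings, computed in the shell. The variable names and the `delta` helper are just labels for this example; the arithmetic assumes a shell with 64-bit integers, which is the norm on current systems.

```shell
#!/bin/sh
# Byte counts reported by the Finder at each step of the experiment above.
both_versions=131297953160      # both 16MB versions present
second_deleted=131298051464     # after deleting the second version
original_deleted=131298071944   # after deleting the original version
file_deleted=131281700232       # after deleting the document itself

# delta BEFORE AFTER: signed change in bytes used (AFTER minus BEFORE).
delta() {
    echo $(( $2 - $1 ))
}

delta "$both_versions" "$second_deleted"       # prints 98304
delta "$second_deleted" "$original_deleted"    # prints 20480
delta "$original_deleted" "$file_deleted"      # prints -16371712
```

So deleting the two versions changed disk usage by under 100KB each time (upward, in fact), while deleting the document itself freed 16,371,712 bytes.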
The bottom line is that you can use file versioning and still retain full control over disk usage: Lion works hard not to keep redundant information on disk, and you can always decide to delete old, unused versions, thereby saving space. However, there are some implications for Dropbox (and, by extension, rsync):
- first, if you keep your documents in the Dropbox folder, only the “current” revision is backed up: versions live outside the Dropbox folder, and simply do not get picked up.
- second, if you think that you can address this issue by aliasing the
.DocumentRevisions-V100 directory, think again. Yes, it may work, but it’s a bad idea. Most likely, Dropbox would not be able to figure out that files off the
PerUID directory occupy no space, and would instead create full copies of these files, which is a waste of useful (and paid-for) storage space. And that’s assuming this works at all; I haven’t tried and do not plan to!
One thing: there is some (apparent) space used in the
.cs/ChunkStorage directory. Maybe this gets cleaned up occasionally—I don’t know. There is one file,
.../ChunkStorage/0/0/0/1, which contains chunks; it grew much bigger after the above exercise, even though in the end I deleted the file.
A caveat, and a trick
In the course of my investigations, I deleted the entire
/.DocumentRevisions-V100 directory and its contents. Bad idea! I thought that these would be regenerated upon first saving a version of a document, but this is not the case. Actually, versioning apps will simply refuse to save! The aforelinked site krypted.com suggests recreating the
.DocumentRevisions-V100 directory structure; this does enable saving and versioning, but not file chunk tracking. In fact, none of the files under
db-V1, for instance, was recreated upon saving in a versioned app, although version files off of
PerUID were created. What worked for me was using Disk Utility to repair the disk (not just the permissions: the entire disk). You need to run Disk Utility from either the recovery partition (hold CMD+R down while you restart) or your installation DVD, if you created one using the procedure described elsewhere on the internet.
Finally, a little trick:
sudo -s gives you a root shell, without the need to actually enable the root user. Cool, and very useful if you, for instance, need to navigate a directory structure that is not accessible to regular users, such as