Instant Preview: Some Raw Data

Attached file: latexTest-line.py. Remove the .doc extension after downloading!

This is a follow-up to my earlier post on Instant Preview (IP) with LaTeX. After thinking about different IP solutions, I decided to run a little experiment. How can I measure the “instantness” of IP on the Mac, when compiling to either DVI or PDF output?

My first attempt involved a combination of a text editor, a script, and suitable DVI and PDF previewers. As the text editor, I used TextWrangler, because it can be easily controlled via AppleScript, the native Mac OS X scripting language (actually, I should say "scripting infrastructure," as will become clear momentarily). In case you were wondering: yes, I tried my beloved TextMate, and no, I could not make it work, because it does not expose an AppleScript interface. I then wrote a simple AppleScript that periodically asked TextWrangler whether new text had been entered in the main document window. If so, the script asked TextWrangler to save the file, ran the LaTeX compiler to generate either a DVI or a PDF file, and finally asked the viewer app to refresh the displayed document. In other words, I implemented a (very) poor man’s version of Flashmode.
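For readers who prefer Python to AppleScript, the polling logic just described can be sketched as follows. This is not the original AppleScript. `get_text`, `save`, `compile_doc`, and `refresh` are hypothetical callables that stand in for the editor and viewer commands. The 0.1-second polling interval is an arbitrary assumption.

```python
import time
from typing import Callable, Optional


def poll_and_preview(get_text: Callable[[], str],
                     save: Callable[[], None],
                     compile_doc: Callable[[], None],
                     refresh: Callable[[], None],
                     interval: float = 0.1,
                     max_polls: Optional[int] = None) -> int:
    """Poll the editor; whenever its text has changed, save, recompile, refresh.

    Runs forever unless max_polls is given; returns the number of rebuilds.
    """
    last_text: Optional[str] = None
    rebuilds = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        text = get_text()
        if text != last_text:      # new text was entered since the last poll
            save()
            compile_doc()
            refresh()
            last_text = text
            rebuilds += 1
        polls += 1
        time.sleep(interval)
    return rebuilds
```

The first poll always triggers a build, because there is no previous text to compare against. The `max_polls` parameter exists only so that the loop can be stopped, for example in a test.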

As the PDF viewer, TeXshop turned out to be the best choice for this experiment: it is scriptable and fast. Skim also works, but it seems a bit slower. It also requires setting "unsupported" defaults in order to refresh the current page in the background without either asking for user confirmation or displaying an annoying (and time-consuming) progress bar. Finally, Apple’s own Preview app is not scriptable.

I was worried about DVI files. There is no native DVI previewer on the Mac, except for the venerable (and slow, not to mention for-pay) MacDviX. Skim will display DVI files, but it actually converts them to PDF first, which is obviously a problem if the whole point of the exercise is to compare DVI and PDF rendering speed. However, it turns out that TeX Live, the TeX distribution upon which MacTeX is based, ships with the ancient but trusty xdvi. While xdvi runs under the X Window System and is a bit offensive to modern aesthetic sensibilities, it gets the job done, and fast. Furthermore, xdvi can be made to reload the current file by sending it the SIGUSR1 Unix signal.
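In Python, the reload trick amounts to a single `os.kill` call. The sketch below assumes that the PID of the running xdvi process is already known, for example because the script recorded it when it launched the viewer. How to find the PID is left out: in TeX Live, `xdvi` may be a wrapper around a differently named binary. The function name is illustrative.

```python
import os
import signal


def refresh_xdvi(pid: int) -> None:
    """Ask a running xdvi process to reload its DVI file.

    xdvi rereads the current file when it receives SIGUSR1. `pid` is
    assumed to be the viewer's process ID (a hypothetical input here).
    """
    os.kill(pid, signal.SIGUSR1)
```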

Long story short, my little AppleScript more or less worked, and it clearly indicated that DVI is perceptibly faster than PDF as an output format for IP. While both xdvi and TeXshop lagged behind my typing, TeXshop lagged more (about as much as Flashmode does, which is of course what one should expect). The question was how to quantify this difference. After discarding more "creative" solutions (I seriously considered filming myself typing to a beat set by a metronome!), I figured out that the best way to get some hard numbers was to simulate typing via a script. Now, AppleScript is Apple’s native scripting solution, but it just isn’t as powerful as, say, Python for anything even slightly more elaborate than sending a few commands to a running application. So, Python it was.

One minor issue I had to deal with was how to control TeXshop from Python. The naive, slow way is to use os.system() or subprocess.Popen to launch the osascript command-line utility, passing the appropriate AppleScript instructions as an argument. The smarter, much faster way is to use py-appscript. The only catch is that installing it requires the GCC compiler, which comes with the whole Xcode environment, and Xcode is fairly big. Anyway, if you already have a development environment up and running, installing py-appscript is easy. Then you just import appscript in your Python code and send AppleScript instructions using nicely Pythonic syntax; again, things are quite a bit faster. As for xdvi, sending Unix signals from Python is a simple matter.
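The naive route can be sketched as shown below. Each call spawns a fresh `osascript` process, and that per-call overhead is what py-appscript avoids. The helper names are hypothetical. The post does not spell out which TeXshop command refreshes the PDF, so the example only sends the generic `activate` command: `run_applescript('tell application "TeXShop" to activate')`.

```python
import subprocess


def osascript_command(script: str) -> list[str]:
    """Argument list that runs one AppleScript snippet via the osascript CLI."""
    return ["osascript", "-e", script]


def run_applescript(script: str) -> str:
    """Naive approach: launch osascript for every call (macOS only).

    Returns whatever the script prints to standard output.
    """
    result = subprocess.run(osascript_command(script),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```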

The Python script I used is attached to this post, in case you are interested (please remove the .doc extension: it is needed to keep WordPress happy). As described below, I ran a series of tests, each requiring slight modifications to the code; see the comments in the file for details. The basic idea is simple: the Python script "types" a line of text in the "middle" of a file, one character at a time; more precisely, for each integer n it creates a file containing

  1. a LaTeX preamble, the title and author of the "paper," and an initial fixed line of text;
  2. the first n characters of the line being "typed"; and
  3. in some tests, additional "body text," ranging from a few paragraphs to about 21 pages.
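The file-generation step might look like the following minimal sketch. The preamble, fixed line, and body text are placeholders, not the contents of latexTest-line.py. Only the structure follows the description above: preamble, fixed line, the first n characters of the typed line, and optional body text.

```python
from typing import Iterator

# Placeholder LaTeX content (hypothetical; not taken from latexTest-line.py).
PREAMBLE = r"""\documentclass{article}
\newtheorem{theorem}{Theorem}
\title{A Test Paper}
\author{A. N. Author}
\begin{document}
\maketitle
"""
FIXED_LINE = "This fixed line of text precedes the one being typed.\n\n"
END = "\n\\end{document}\n"


def make_document(typed_line: str, n: int, body: str = "") -> str:
    """LaTeX source after the first n characters of typed_line are "typed"."""
    return PREAMBLE + FIXED_LINE + typed_line[:n] + "\n\n" + body + END


def simulated_keystrokes(typed_line: str, body: str = "") -> Iterator[str]:
    """Yield one complete document per keystroke, for n = 1 .. len(typed_line)."""
    for n in range(1, len(typed_line) + 1):
        yield make_document(typed_line, n, body)
```

For a "full body" style test, the body string would simply be repeated, e.g. `body = one_copy * 50`.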

Each time a new file is created, the Python script compiles it to either DVI or PDF and then refreshes the viewer, just like my earlier AppleScript did, except that the typing is now simulated. The advantage is that I can easily time the execution of the script using Python’s time module.
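A minimal sketch of the timed loop is shown below, using `time.perf_counter` from the time module. The compile and refresh steps are passed in as callables, which are hypothetical stand-ins for the real commands. Omitting `refresh_fn` gives a compile-only measurement.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, Optional


def time_typing(documents: Iterable[str], tex_path: Path,
                compile_fn: Callable[[Path], None],
                refresh_fn: Optional[Callable[[], None]] = None) -> float:
    """Write, compile and (optionally) display each document in turn.

    Returns the total wall-clock time in seconds for the whole simulated line.
    """
    start = time.perf_counter()
    for source in documents:
        tex_path.write_text(source)   # overwrite the same .tex file each time
        compile_fn(tex_path)
        if refresh_fn is not None:
            refresh_fn()
    return time.perf_counter() - start
```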

For each output format, I measured the time required to compile, but not display, the "typed" line (from the first to the last character), and then the time required to both compile and display it. I also varied the amount of text after the "typed" line, so as to simulate IP in a short fragment of text vs. in a medium-sized file. The tests were conducted on my Early 2008 iMac (a 3.06 GHz machine, model identifier iMac8,1 in Apple’s nomenclature). All measurements are taken from a single run of the script; I repeated the experiment several times, but found no significant difference. Without further ado, the results are shown in the following table.
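One way to run the two compilers from Python is sketched below. The post does not give the exact command lines used in the original script. The choice of `latex` for DVI, `pdflatex` for PDF, and the `-interaction=nonstopmode` option are therefore assumptions.

```python
import subprocess
from pathlib import Path

ENGINES = {"dvi": "latex", "pdf": "pdflatex"}


def compile_command(tex_path: Path, output: str) -> list[str]:
    """Command line that compiles tex_path to the requested output format."""
    if output not in ENGINES:
        raise ValueError(f"unknown output format: {output!r}")
    # nonstopmode keeps TeX from stopping to prompt when a half-typed line is invalid.
    return [ENGINES[output], "-interaction=nonstopmode", tex_path.name]


def compile_tex(tex_path: Path, output: str) -> int:
    """Run the compiler in the file's directory; return its exit status."""
    result = subprocess.run(compile_command(tex_path, output), cwd=tex_path.parent,
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode
```

The function returns the exit status rather than raising on failure. Intermediate documents that contain a partially typed line may well fail to compile, and the timing run should carry on regardless.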

Instant Preview: DVI vs PDF

To interpret the table: “dvi/pdf, no body” means that there is no further text after the simulated “typed-in” line; “dvi/pdf, 1x body” means that the line is followed by a few paragraphs of text, consisting of a theorem declaration, a displayed equation, and some gibberish; finally, “dvi/pdf, full body” means that the paragraphs just described are copied 50 times, so that the document is 22 pages long overall.

The results: I think there are four clear conclusions to be drawn from this little experiment. First, DVI files are measurably faster to render than PDF files: the time to display DVI output is negligible relative to the time it takes to generate it from the LaTeX input. The same is certainly not true for PDF output. Part of this gap may come from sending a SIGUSR1 signal to xdvi being faster than going through the AppleScript infrastructure. I doubt that this is the main source of the discrepancy. In any event, as far as I know, there is (for the time being) no other way to control TeXshop.

Second, generating PDF files also takes longer, which compounds the difference in rendering times. For instance, without any additional body text, just generating the DVI files corresponding to the simulated typed line takes 11.26 seconds; when the output format is set to PDF, 13.98 seconds are needed, that is, 24% longer. If we include rendering, the time goes from 11.44 seconds to 15.13 seconds, an increase of 32%. With a single copy of the body text, just compiling takes 11.27 seconds for DVI and 15.18 for PDF, i.e. 35% longer; if we include rendering, the total time required is 11.65 seconds for DVI and 18.05 for PDF, i.e. 55% longer!
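The percentages above follow directly from the reported timings. The short script below simply recomputes them: the numbers are copied from the text, and nothing new is measured.

```python
# Timings in seconds for the 77-character line, as reported above: (DVI, PDF).
timings = {
    ("no body", "compile"): (11.26, 13.98),
    ("no body", "compile + display"): (11.44, 15.13),
    ("1x body", "compile"): (11.27, 15.18),
    ("1x body", "compile + display"): (11.65, 18.05),
}


def percent_longer(dvi_seconds: float, pdf_seconds: float) -> int:
    """How much longer PDF takes than DVI, as a rounded percentage."""
    return round(100 * (pdf_seconds / dvi_seconds - 1))


for (body, stage), (dvi, pdf) in timings.items():
    print(f"{body:8} {stage:18} DVI {dvi:5.2f}s  PDF {pdf:5.2f}s  +{percent_longer(dvi, pdf)}%")
```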

Third, when working with a medium-sized or large file, the amount of text that has to be recompiled makes a difference. This can be seen by comparing the results for the "1x body" and "full body" experiments, and it suggests that recompiling only the current page would pay off.

Fourth, and finally, PDF display is too slow to keep up with regular typing speed (as opposed to "hunt and peck" typing). To get an idea, consider the no-body test: the simulated typed-in line had 77 characters, and it took 15.13 seconds to "type" it all (i.e. compile and render the corresponding sequence of files). This translates to 77/15.13 ≈ 5.1 characters per second. With 1x body text, the "typing speed" goes down to 77/18.05 ≈ 4.3 characters per second. You almost surely type faster than that. Try an online typing test, for instance (it’s fun!). I am no touch typist, but I got 7.48 characters per second. If you are like me, or faster (which is likely), you will outpace a naive implementation of IP that does what my simple Python script does on a machine no faster than my 3.06 GHz iMac. It won’t be quite so "instant." This is exactly my experience with Flashmode. In the case of DVI output, with 1x body text the Python script manages 77/11.65 ≈ 6.6 characters per second. That is not as fast as I type (according to that test), but not far behind either.
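Here is the same arithmetic, written out. Dividing the 77 characters by the total time gives the sustainable "typing speed," and its reciprocal gives the time budget per keystroke. The function names are illustrative. The value of 7.48 characters per second is the typing-test result quoted above.

```python
TYPED_CHARS = 77        # length of the simulated line
MY_TYPING_CPS = 7.48    # typing-test result quoted in the text


def chars_per_second(total_seconds: float, n_chars: int = TYPED_CHARS) -> float:
    """Characters per second the compile (+ display) loop can sustain."""
    return n_chars / total_seconds


def keeps_up(total_seconds: float, typist_cps: float = MY_TYPING_CPS) -> bool:
    """True if the preview loop is at least as fast as the typist."""
    return chars_per_second(total_seconds) >= typist_cps


for label, seconds in [("PDF, no body", 15.13), ("PDF, 1x body", 18.05), ("DVI, 1x body", 11.65)]:
    cps = chars_per_second(seconds)
    print(f"{label}: {cps:.1f} chars/s, {1000 / cps:.0f} ms per keystroke, keeps up: {keeps_up(seconds)}")
```

At 7.48 characters per second, each keystroke leaves roughly 134 ms for compiling and displaying. That is less than the roughly 151 ms per character that even the DVI pipeline needs with 1x body text.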

This test is only a starting point; it suggests where one can try to optimize processing. As already noted, focusing on the current page, rather than recompiling the entire file, should help. My Python script can also be modified to experiment with other ideas, for instance the format file trick described in my previous post. Furthermore, PDF output requires including, or embedding, all required fonts; the pdfTeX manual suggests that this may be avoided if the fonts are available system-wide. Perhaps that, too, saves time. In any case, more experimentation, followed by more serious hacking, is required.
