Once again, I haven’t posted anything in a while… I have been pretty busy at work lately: a couple of trips and PhD admissions took up most of my time. However, I have also made some progress with the Sublime Text 2 (ST2) LaTeX plugin. You can take a look at the current GitHub repository; there have been a few blog-worthy additions to the
master branch, but the work I am most excited about has been in the
make-pdf branch. Check that out (literally!) and look for yourself if you feel adventurous; however, note that things may be a bit broken, especially on Windows (my main work and dev platform is OSX). In particular, there is (as of today) no keybinding for the new make facility on Windows: take a look at
Default (OSX).sublime-keymap to see how to add this on a PC. But, if you are not adventurous enough, read on.
In the master branch, turning your tex file into a PDF file is achieved by using the ST2-standard build framework to invoke the
latexmk command on OSX, and the
texify command on Windows. In turn, these commands are instructed to launch a viewer. The output from these commands, including any error or warning messages, is sent by ST2 to the so-called "exec panel", i.e. a display area that pops up in the (roughly) bottom third or fourth of your ST2 window, and can be dismissed by pressing ESC if the user so wishes. Furthermore, ST2 provides a "goto next error" command (bound to F4 by default); to configure it, one must provide a regex (regular expression) that captures the file name and line number (and, optionally, column number) where the error occurred.
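To make the regex idea concrete, here is a minimal Python sketch of a pattern that captures file name, line number, and an optional column from compiler-style output. The pattern and the helper name `parse_error_line` are illustrative assumptions, not the plugin's actual configuration; in ST2, a pattern of this kind would go in the `file_regex` setting of the build file.

```python
import re

# Hypothetical pattern for output such as "main.c:12:5: error: ...".
# Groups: 1 = file name, 2 = line number, 3 = optional column, 4 = message.
FILE_LINE_REGEX = re.compile(r"^(.+?):(\d+):(?:(\d+):)?\s*(.*)$")


def parse_error_line(line):
    """Return (file, line, column, message), or None if the line does not match."""
    m = FILE_LINE_REGEX.match(line)
    if m is None:
        return None
    filename, lineno, col, message = m.groups()
    # The column is optional, so it may be missing from the match.
    return filename, int(lineno), int(col) if col else None, message
```

A line without a `file:line:` prefix simply yields `None`, which is how ordinary output gets skipped when stepping through errors.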
Using the standard build facility has the advantage of being more, er, Sublime–that is, in line with the philosophy, look, and feel of the ST2 editor. That’s a big plus. However, there are two important minuses. First, the regex approach for error detection works very well with languages, such as C or Java, that produce well-structured output and logs (where, by "well-structured," I mean "sane"). This does not include TeX–by a long shot. I provide some details below, as that’s the main focus of this post, and of my work in the
make-pdf branch. The other minus is the fact that, while
texify can invoke a viewer, it is not possible to instruct said viewer to jump to the current line; the reason is that the build system is controlled by a static configuration file, and has no access to things like the current view (at least as far as I can tell, and as of build 2055). However, the ST2 developer has provided the ability to link the build command to any arbitrary Python script (or, more precisely, any ST2 "command"). This makes it possible to overcome any limitation of the default system. The
make-pdf branch of the LaTeX plugin uses just this facility.
Specifically, I implemented a heavily customized version of the ST2
exec command, which is invoked whenever a standard build is initiated. The code is in
makePDF.py, and the
make_pdf command it defines is currently bound to
ctrl+alt+t, the key binding for the old "texify" command in the version 1 plugin (again, this is actually only true for OSX as of today). The overall structure is relatively simple: the system-dependent command line is set up, then invoked in an instance of a worker thread called
CmdThread, to avoid blocking. After the compilation command terminates, the log file is parsed (again in the thread), and the results are displayed. Since all output must happen on the main thread, the
CmdThread thread issues display commands (basically, adding text to the exec panel) via
set_timeout, using callbacks in the
make_pdfCommand class. When this is done, the
jump_to_pdf command is invoked, which causes the viewer to open the just-compiled PDF file (assuming all went well) and go to the line corresponding to the current cursor position. Notice that right now there is no error checking: the next order of business is to at least check for compilation errors before the viewer is invoked.
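The worker-thread pattern described above can be sketched roughly as follows. This is not the code in makePDF.py: the class name `WorkerThread` and its parameters are hypothetical, and the `schedule` argument stands in for `sublime.set_timeout` so that the sketch can run outside the editor.

```python
import threading


class WorkerThread(threading.Thread):
    """Runs a blocking job off the main thread and reports back via callbacks."""

    def __init__(self, job, on_output, on_done, schedule):
        threading.Thread.__init__(self)
        self.job = job              # callable returning a list of text lines
        self.on_output = on_output  # meant to run on the main thread, once per line
        self.on_done = on_done      # meant to run on the main thread, at the end
        self.schedule = schedule    # assumption: sublime.set_timeout inside ST2

    def run(self):
        for line in self.job():
            # Bind the current line as a default argument so that each
            # callback keeps its own value.
            self.schedule(lambda line=line: self.on_output(line), 0)
        self.schedule(self.on_done, 0)
```

The thread itself never touches the display: it only hands callbacks to the scheduler, which runs them later on the main thread.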
The pièce de résistance of the
make_pdf command, however, is the
parseTeXlog function. It, well, parses the TeX log file. This presents five distinct difficulties:
- TeX errors are reported over several lines: the first (starting with an exclamation mark) states the type of error, then a few lines of surrounding text are reported, and finally the line where the error occurred is actually indicated. It is possible to force tex to use a file:line:error format like C or Java; however…
- …each macro package, beginning with LaTeX, reports warnings in a different format that cannot be forced into file-line-error mode. Thus, you need different regexes to match errors and warnings. I guess in principle one could try to use a single, humongous regex, but I am not sure one could cover all cases, and it would be a nightmare to write and maintain. I live by JWZ’s famous quote on regexes. In any case…
- …a regex that catches warnings would not be able to detect which file the warning occurred in—only the line and the warning message. This poses a problem for multi-file documents, which I really want to support (hello book co-authors!). This requires really understanding what the TeX log file is telling you at any given line. In particular, whenever TeX processes a new file (be it a source file, a style file, a class file, or anything else), this is reported on the log file; when processing of that particular file is complete, this, too is indicated. But, to make matters worse…
- …TeX "helpfully" breaks lines longer than 79 characters. Why is this a problem? Because certain style and macro files live deep in the
texmf tree, so that their full path names easily exceed 79 characters. And because, if a user’s account name is
RichardJamesDrofnatsTheThird, and he is working on a file called
UnexplicableImplicationsOfUnrealisticAssumptionsInEconomicTheory.tex in a directory called
CogitationsOfAnUnrepentantEconomist off of his
Documents folder, well, that file name will be truncated (as far as I can tell, TeX always displays the full path name of the master file being compiled). And, finally…
- …there is no way to know whether a line was truncated or not. You can guess, and you can use heuristics, but there is no specific marker to tell you what TeX actually did with your log file. Fun!
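As an illustration of the first difficulty, here is a minimal sketch that pairs each error line (starting with an exclamation mark) with the `l.<number>` line that follows it a few lines later. The function name `find_tex_errors` is hypothetical, and the sketch deliberately ignores warnings, file tracking, and wrapped lines.

```python
import re

# TeX marks the offending source line as "l.<number> <text read so far>".
LINE_MARKER = re.compile(r"^l\.(\d+)(?:\s|$)")


def find_tex_errors(log_lines):
    """Return (line_number, message) pairs for multi-line TeX errors."""
    errors = []
    pending = None  # message of an error whose line number is not yet known
    for line in log_lines:
        if line.startswith("!"):
            # A new error starts; any earlier one without a line marker is dropped.
            pending = line[1:].strip()
        elif pending is not None:
            m = LINE_MARKER.match(line)
            if m:
                errors.append((int(m.group(1)), pending))
                pending = None
    return errors
```

Note that the result carries no file name, which is exactly the gap that tracking the currently open file has to fill.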
So, here’s what I did. To make sense of this mess, one must start with the last problem: how to reassemble long lines. Again, there is no way to know for sure whether a 79-character line is the beginning of, say, a 90-character line, or if it is a line that just happens to be exactly 79 characters long (in case you were wondering, yes, I have found quite a few such lines in actual log files!). However, there are certain empirical regularities, which I exploit. The code in
parseTeXlog joins a 79-character line with the line immediately following it, except under certain conditions. For example, many tex packages output an identification string of the form
Package: geometry.sty... as soon as they are loaded. So, if a 79-character line is followed by a line that starts with
Package:, I don’t join the two lines. There are other exceptions and heuristics: take a look at the file.
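The joining rule can be sketched in a few lines of Python. This is a simplified illustration with a single exception (the `Package:` case mentioned above), not the actual logic in the plugin; the function name `rejoin_wrapped_lines` is made up for the example.

```python
MAX_LOG_WIDTH = 79  # width at which TeX breaks log lines


def rejoin_wrapped_lines(raw_lines):
    """Glue each full-width line to its successor, unless a heuristic says otherwise."""
    joined = []
    buffer = ""  # text of a possibly wrapped line that is still being assembled
    for line in raw_lines:
        line = line.rstrip("\n")
        if buffer and line.startswith("Package:"):
            # Heuristic: the previous line was a genuine 79-character line.
            joined.append(buffer)
            buffer = ""
        buffer += line
        if len(line) != MAX_LOG_WIDTH:
            # A line that is not full width cannot have been wrapped.
            joined.append(buffer)
            buffer = ""
    if buffer:
        joined.append(buffer)
    return joined
```

Further exceptions would slot in next to the `Package:` check; the rest of the loop stays the same.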
To be sure, I am not certain that I have covered all the cases. However, so far, every log file I have processed has been successfully parsed, and (trust me!) I have processed a lot of files. But, if you find something that breaks the parsing logic, I definitely want to know—thanks!
The next issue is to keep track of the file currently being processed. TeX reports it by printing a
( followed by the path name. Usually, the parenthesis is the first character in the line… but when a macro file loads another macro file, this may not be the case. So, you need logic for that. Whenever I see a new file being processed, I add it to a stack (the
files list) and print its name (suitably indented) for debugging purposes (the user does not see this, unless she switches to the Python console).
When TeX is done processing a file, it just prints a closing parenthesis:
). This usually occurs on a line by itself, but for files imported by other packages this may not be the case. Special logic is needed to handle such cases. When I can be reasonably sure that a given file has been fully processed, I "pop" (i.e. remove) it from the stack.
The net result is that, when an error message is encountered, it must refer to the file that is currently at the top of the stack. What’s nice about this approach is that I do not need to know anything about the file being compiled: all the information I need is contained in the log file.
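Here is a deliberately naive sketch of the stack idea. It assumes that every `(` immediately followed by non-space text opens a file and that every `)` closes one, which real logs violate often enough that the actual code needs the special logic described above. The function name `track_files` and the token pattern are illustrative assumptions.

```python
import re

# Assumption: a file name runs from "(" up to the next whitespace or parenthesis.
TOKEN = re.compile(r"\(([^\s()]+)|\)")


def track_files(log_lines):
    """Return (error_line, current_file) pairs for lines starting with '!'."""
    files = []       # stack of files currently being processed
    attributed = []
    for line in log_lines:
        if line.startswith("!"):
            # The error belongs to whatever file is on top of the stack.
            attributed.append((line, files[-1] if files else None))
            continue
        for m in TOKEN.finditer(line):
            if m.group(1) is not None:
                files.append(m.group(1))  # "(" + path: a file was opened
            elif files:
                files.pop()               # ")": the current file is done
    return attributed
```

Because several files can be opened or closed on the same log line, the sketch scans every token in a line rather than looking only at its first character.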
When all this is done, extracting error and warning information is actually easy. I then convert the information into file-line-error format, and tell ST2 to look for errors in that particular guise. This way, the user can hit F4 to navigate through errors. Or, the user can double-click on an error, and—regardless of the file it’s in—ST2 brings up the offending line. Pretty cool! This is actually better than TextMate’s error reporting, I think!
There is still a lot to do. First, as I noted above, I think that the current parsing logic is fairly robust, but it is still full of heuristics and hacks. If you find any problem or issue, please let me know! Second, I need to robustify the support for multi-file documents: there will be a mechanism to invoke compilation of the master file even if the user is editing an included file. Furthermore, there will be logic to only display the PDF output if compilation was successful.
More urgently though, I need to run some tests on Windows. When that is done, I will merge the
make-pdf branch with the master branch and make this code the default build system. Comments welcome!