LaTeX plugin update: new build command, and parsing TeX log files

Once again, I haven’t posted anything in a while… I have been pretty busy at work lately: a couple of trips and PhD admissions took up most of my time. However, I have also made some progress with the Sublime Text 2 (ST2) LaTeX plugin. You can take a look at the current GitHub repository; there have been a few blog-worthy additions to the master branch, but the work I am most excited about has been in the make-pdf branch. Check that out (literally!) and look for yourself if you feel adventurous; however, note that things may be a bit broken, especially on Windows (my main work and dev platform is OSX). In particular, there is (as of today) no keybinding for the new make facility on Windows: take a look at Default (OSX).sublime-keymap to see how to add this on a PC. But, if you are not adventurous enough, read on 🙂

In the master branch, turning your tex file into a PDF file is achieved by using the ST2-standard build framework to invoke the latexmk command on OSX, and the texify command on Windows. In turn, these commands are instructed to launch a viewer. The output from these commands, including any error or warning messages, is sent by ST2 to the so-called "exec panel", i.e. a display area that pops up in the (roughly) bottom third or fourth of your ST2 window, and can be dismissed by pressing ESC if the user so wishes. Furthermore, ST2 provides a "goto next error" command (bound to F4 by default); to configure it, one must provide a regex (regular expression) that captures the file name and line number (and, optionally, column number) where the error occurred.
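To make the regex requirement concrete, here is a minimal Python sketch of the kind of pattern involved. It is an illustration only, not the pattern the plugin ships: the names FILE_LINE_RE and parse_error_line are made up for this example, and the pattern assumes Unix-style paths with no colon in the file name (so it would not cope with a Windows drive letter).

```python
import re

# Hypothetical pattern for "file:line: message" or "file:line:col: message".
# Assumption: the file name itself contains no colon.
FILE_LINE_RE = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<msg>.*)$"
)


def parse_error_line(text):
    """Return (file, line, column or None, message), or None if no match."""
    m = FILE_LINE_RE.match(text)
    if m is None:
        return None
    col = m.group("col")
    return (
        m.group("file"),
        int(m.group("line")),
        int(col) if col else None,
        m.group("msg"),
    )
```

A compiler-style line such as main.c:12:5: error: expected ';' yields all four fields; a line without a column simply returns None in the third slot. In an ST2 build file, a pattern of this sort would go under the file_regex key, with plain capture groups rather than named ones.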

Using the standard build facility has the advantage of being more, er, Sublime–that is, in line with the philosophy, look, and feel of the ST2 editor. That’s a big plus. However, there are two important minuses. First, the regex approach for error detection works very well with languages, such as C or Java, that produce well-structured output and logs (where, by "well-structured," I mean "sane"). This does not include TeX–by a long shot. I provide some details below, as that’s the main focus of this post, and of my work in the make-pdf branch. The other minus is the fact that, while latexmk and texify can invoke a viewer, it is not possible to instruct said viewers to jump to the current line; the reason is that the build system is controlled by a static configuration file, and has no access to things like the current view (at least as far as I can tell, and as of build 2055). However, the ST2 developer has provided the ability to link the build command to any arbitrary Python script (or, more precisely, any ST2 "command"). This makes it possible to overcome any limitation of the default system. The make-pdf branch of the LaTeX plugin uses just this facility.

Specifically, I implemented a heavily customized version of the ST2 exec command, which is invoked whenever a standard build is initiated. The code is in the make-pdf branch, and the make_pdf command it defines is currently bound to ctrl+alt+t, the key binding for the old "texify" command in the version 1 plugin (again, this is actually only true for OSX as of today). The overall structure is relatively simple: the system-dependent command line is set up, then invoked in an instance of a worker thread called CmdThread, to avoid blocking. After the compilation command terminates, the log file is parsed (again in the thread), and the results are displayed. Since all output must happen on the main thread, the CmdThread thread issues display commands (basically, adding text to the exec panel) via set_timeout, with callbacks defined in the make_pdfCommand class. When this is done, the jump_to_pdf command is invoked, which causes the viewer to open the just-compiled PDF file (assuming all went well) and go to the line corresponding to the current cursor position. Notice that right now there is no error checking: the next order of business is to at least check for compilation errors before the viewer is invoked.
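The worker-thread-plus-callback structure described above can be sketched outside the editor. The following is a simplified stand-in, not the plugin's code: MainThreadQueue is a hypothetical replacement for the editor's main loop (in the plugin, sublime.set_timeout plays that role), the CmdThread here only borrows the name, and fake_build stands in for running the real command and parsing its log. It is written for Python 3, whereas ST2 embeds an older Python.

```python
import queue
import threading


class MainThreadQueue:
    """Hypothetical stand-in for the editor's main loop."""

    def __init__(self):
        self._pending = queue.Queue()

    def set_timeout(self, callback, delay_ms=0):
        # The delay is ignored in this sketch; callbacks just queue up.
        self._pending.put(callback)

    def run_pending(self):
        # Run queued callbacks on the calling (i.e. "main") thread.
        while True:
            try:
                callback = self._pending.get_nowait()
            except queue.Empty:
                break
            callback()


class CmdThread(threading.Thread):
    """Does the slow work off the main thread, then schedules display calls."""

    def __init__(self, main_loop, on_output, on_done, work):
        super().__init__()
        self.main_loop = main_loop
        self.on_output = on_output
        self.on_done = on_done
        self.work = work

    def run(self):
        for line in self.work():
            # Bind the current line now; the callback runs later.
            self.main_loop.set_timeout(lambda line=line: self.on_output(line))
        self.main_loop.set_timeout(self.on_done)


def demo():
    loop = MainThreadQueue()
    panel = []  # plays the part of the exec panel

    def fake_build():
        yield "running build..."
        yield "parsing log..."

    worker = CmdThread(loop, panel.append, lambda: panel.append("[done]"), fake_build)
    worker.start()
    worker.join()
    loop.run_pending()
    return panel
```

The point of the design is that the thread never touches the panel directly: it only hands callbacks to the main loop, which runs them in order.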

The pièce de résistance of the make_pdf command, however, is the parseTeXlog function. It, well, parses the TeX log file. This presents five distinct difficulties:

  1. TeX errors are reported over several lines: the first (starting with an exclamation mark) states the type of error, then a few lines of surrounding text are reported, and finally the line where the error occurred is actually indicated. It is possible to force tex to use a file:line:error format like C or Java; however…
  2. …each macro package, beginning with LaTeX, reports warnings in a different format that cannot be forced into file-line-error mode. Thus, you need different regexes to match errors and warnings. I guess in principle one could try to use a single, humongous regex, but I am not sure one could cover all cases, and it would be a nightmare to write and maintain. I live by JWZ’s famous quote on regexes. In any case…
  3. …a regex that catches warnings would not be able to detect which file the warning occurred in, only the line and the warning message. This poses a problem for multi-file documents, which I really want to support (hello book co-authors!). Supporting them requires really understanding what the TeX log file is telling you at any given line. In particular, whenever TeX processes a new file (be it a source file, a style file, a class file, or anything else), this is reported in the log file; when processing of that particular file is complete, this, too, is indicated. But, to make matters worse…
  4. …TeX "helpfully" breaks lines longer than 79 characters. Why is this a problem? Because certain style and macro files live deep in the texmf tree, so that their full path name easily exceeds 79 characters. And because, if a user’s account name is RichardJamesDrofnatsTheThird, and he is working on a file called UnexplicableImplicationsOfUnrealisticAssumptionsInEconomicTheory.tex in a directory called CogitationsOfAnUnrepentantEconomist off of his Documents folder, well, that file name will be truncated (as far as I can tell, TeX always displays the full path name of the master file being compiled). And, finally…
  5. …there is no way to know whether a line was truncated or not. You can guess, and you can use heuristics, but there is no specific marker to tell you what TeX actually did with your log file. Fun!

So, here’s what I did. To make sense of this mess, one must start with the last problem: how to reassemble long lines. Again, there is no way to know for sure whether a 79-character line is the beginning of, say, a 90-character line, or if it is a line that just happens to be exactly 79 characters long (In case you were wondering, yes, I have found quite a few such lines in actual log files!). However, there are certain empirical regularities, which I exploit. The code in parseTeXlog joins a 79-character line with the line immediately following it, except under certain conditions. For example, many tex packages output an identification string of the form Package: geometry.sty... as soon as they are loaded. So, if a 79-character line is followed by a line that starts with Package:, I don’t join the two lines. There are other exceptions and heuristics: take a look at the file.
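The joining rule can be sketched in a few lines of Python. This is a simplified illustration rather than the code in parseTeXlog: the function name rejoin_wrapped_lines is invented here, and the only exception it implements is the Package: one mentioned above, whereas the real code applies further heuristics.

```python
MAX_LOG_LINE = 79  # width at which TeX wraps log lines, as described above


def rejoin_wrapped_lines(lines):
    """Glue a 79-character line to the line after it, with one exception."""
    joined = []
    pending = ""  # text accumulated from preceding full-width lines
    for raw in lines:
        line = raw.rstrip("\n")
        if pending and line.startswith("Package:"):
            # Exception: a package banner starts a fresh line, so the
            # previous line was merely exactly 79 characters long.
            joined.append(pending)
            pending = ""
        current = pending + line
        if len(line) == MAX_LOG_LINE:
            # Possibly wrapped: hold it until the next line is seen.
            pending = current
        else:
            joined.append(current)
            pending = ""
    if pending:
        joined.append(pending)
    return joined
```

Note that a run of several consecutive 79-character lines is accumulated into one logical line, which is what a very long path name would produce.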

To be sure, I am not certain that I have covered all the cases. However, so far, every log file I have processed has been successfully parsed, and (trust me!) I have processed a lot of files. But, if you find something that breaks the parsing logic, I definitely want to know—thanks!

The next issue is to keep track of the file currently being processed. TeX reports it by printing a ( followed by the path name. Usually, the parenthesis is the first character in the line… but when a macro file loads another macro file, this may not be the case. So, you need logic for that. Whenever I see a new file being processed, I add it to a stack (the files list) and print its name (suitably indented) for debugging purposes (the user does not see this, unless she switches to the Python console).

When TeX is done processing a file, it just prints a closing parenthesis: ). This usually occurs on a line by itself, but for files imported by other packages this may not be the case. Special logic is needed to handle such cases. When I can be reasonably sure that a given file has been fully processed, I "pop" (i.e. remove) it from the stack.

The net result is that, when an error message is encountered, it must refer to the file that is currently at the top of the stack. What’s nice about this approach is that I do not need to know anything about the file being compiled: all the information I need is contained in the log file.
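Here is a rough Python sketch of the stack idea, under simplifying assumptions that the real parser does not get to make: paths are Unix-style (starting with /, ./ or ../), lines have already been rejoined, and every parenthesis in the log is balanced. The names TOKEN_RE, current_file and track_files are hypothetical. Parentheses that do not open a file are pushed as None so that their closing parenthesis does not pop a real file.

```python
import re

# "(" optionally followed by a Unix-style path, or a lone ")".
TOKEN_RE = re.compile(r"\((?P<path>\.{0,2}/[^\s()]+)?|\)")


def current_file(stack):
    """Return the innermost real file on the stack, or None."""
    for entry in reversed(stack):
        if entry is not None:
            return entry
    return None


def track_files(log_lines):
    """Return (file, message) pairs for lines starting with '!'."""
    stack = []
    errors = []
    for line in log_lines:
        if line.startswith("!"):
            # The error belongs to whatever file is on top of the stack.
            errors.append((current_file(stack), line[1:].strip()))
            continue
        for m in TOKEN_RE.finditer(line):
            if m.group(0) == ")":
                if stack:
                    stack.pop()
            else:
                # Path for a file open, None for any other parenthesis.
                stack.append(m.group("path"))
    return errors
```

For instance, an error reported while ./chapter1.tex is open inside ./main.tex is attributed to the chapter, and an error reported after the chapter's closing parenthesis is attributed to the master file.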

When all this is done, extracting error and warning information is actually easy. I then convert the information into file-line-error format, and tell ST2 to look for errors in that particular guise. This way, the user can hit F4 to navigate through errors. Or, the user can double-click on an error, and—regardless of the file it’s in—ST2 brings up the offending line. Pretty cool! This is actually better than TextMate’s error reporting, I think!
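The conversion step might look roughly like the sketch below. It assumes the current file name is already known (for example, from the top of the file stack) and relies on the standard layout of a TeX error: a line starting with an exclamation mark, some context, then a line of the form l.<number> giving the source line. The function name to_file_line_error and the exact output format are choices made for this example, not necessarily the plugin's.

```python
import re

# TeX marks the offending source line as "l.<number> <context>".
LINE_NUMBER_RE = re.compile(r"^l\.(\d+)")


def to_file_line_error(filename, log_lines):
    """Collapse multi-line TeX errors into 'file:line: message' strings."""
    results = []
    message = None  # message of the error currently being assembled
    for line in log_lines:
        if line.startswith("!"):
            message = line[1:].strip()
        elif message is not None:
            m = LINE_NUMBER_RE.match(line)
            if m:
                results.append("%s:%s: %s" % (filename, m.group(1), message))
                message = None
    return results
```

Each resulting string can then be matched by a single file-line-error regex, which is what lets F4 and double-click navigation work across files.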

There is still a lot to do. First, as I noted above, I think that the current parsing logic is fairly robust, but it is still full of heuristics and hacks. If you find any problem or issue, please let me know! Second, I need to robustify the support for multi-file documents: there will be a mechanism to invoke compilation of the master file even if the user is editing an included file. Furthermore, there will be logic to only display the PDF output if compilation was successful.

More urgently though, I need to run some tests on Windows. When that is done, I will merge the make-pdf branch with the master branch and make this code the default build system. Comments welcome!


One response to “LaTeX plugin update: new build command, and parsing TeX log files”

  1. Hi Marciano,
    are you aware of the fact that if the path of the source file contains accented characters (for example), the viewer is not invoked?

    Furthermore: any hints on how to configure things in order to typeset Plain TeX files (on Mac OS X)?


