
As humans, we prefer to view and edit text files on a line-by-line basis. Once we think a line of text is long enough, we hit “return” (in the text editor) to signal the end of that line. Behind the scenes, the text editor interprets that as an instruction to add a newline character at the point where you decide to break the line.

Unfortunately, it is not that simple: different operating systems have different notions of what constitutes a newline. To make matters worse, on Windows, newline characters are handled differently depending on whether a file is opened in so-called binary mode or text mode. The result is that, depending on the host operating system, lines in a text file can be terminated by different combinations of two characters: carriage return (ASCII/Unicode character 13) and line feed (ASCII/Unicode character 10), denoted by \r and \n respectively. Windows uses the pair \r\n, Unix-like systems (Linux and macOS) use \n alone, and the classic Mac OS used \r alone.
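To see why this matters, here is a small Python sketch. It is not part of TeX: the byte strings and the helper `naive_split` are hypothetical illustrations of what happens when a program assumes Unix conventions and splits lines on \n alone.

```python
# The same two-line text file saved under three line-ending conventions.
UNIX = b"first line\nsecond line\n"          # LF only (Linux, macOS)
WINDOWS = b"first line\r\nsecond line\r\n"   # CR LF pair
CLASSIC_MAC = b"first line\rsecond line\r"   # CR only (classic Mac OS)


def naive_split(data: bytes) -> list[bytes]:
    """Split on LF only, as a program assuming Unix line endings might."""
    # Drop trailing LFs so the last line does not produce an empty entry.
    return data.rstrip(b"\n").split(b"\n")


for name, data in [("unix", UNIX), ("windows", WINDOWS), ("classic mac", CLASSIC_MAC)]:
    print(name, naive_split(data))
```

With the Windows file every line keeps a stray \r, and the CR-only file comes back as a single “line”: exactly the kind of platform dependence TeX has to neutralize.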

Clearly, to be system-independent, TeX needs a way to deal with the vagaries introduced by the different characters used to terminate lines in the text files it reads and processes.

TeX’s input buffer

You may, or may not, be surprised to learn that TeX engines (including LuaTeX and XeTeX) read input files one line at a time: they don’t read the entire text file into memory. Even though most text files processed by TeX engines are minuscule compared to the memory available on modern devices, each line in the file is individually read and stored in an internal buffer. But, of course, TeX’s process of reading and storing a line has some additional twists.
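As a rough model of reading “a line at a time”, the Python generator below pulls one line per iteration from a binary stream and refuses any line that would not fit in a fixed-size buffer. `BUF_SIZE`, `read_lines` and the use of `OverflowError` are assumptions made for illustration: real engines have their own (much larger, configurable) buffer size and stop with a TeX capacity-exceeded error instead.

```python
import io
from typing import BinaryIO, Iterator

BUF_SIZE = 200  # hypothetical buffer capacity, in bytes


def read_lines(stream: BinaryIO, buf_size: int = BUF_SIZE) -> Iterator[bytes]:
    """Yield one raw line at a time; the whole file is never held in memory."""
    for raw in stream:  # a binary stream yields one LF-terminated chunk per step
        if len(raw) > buf_size:
            raise OverflowError(f"line of {len(raw)} bytes exceeds buffer of {buf_size}")
        yield raw  # line endings are still attached at this point


source = io.BytesIO(b"\\documentclass{article}\r\n\\begin{document}\n")
for line in read_lines(source):
    print(line)
```

Note that each yielded line still carries its original terminator (\r\n or \n); removing those is the first of TeX’s housekeeping tasks.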

“I’ll do it My Way”—TeX’s \endlinechar command

When TeX reads another line of text from an input file it performs two “housekeeping tasks”:

  • it removes any terminating newline characters (\r or \n) found at the end of the line—i.e., it strips out all line endings added when the text file was originally saved to disk;
  • it also removes all trailing space characters found at the end of the line.
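The two housekeeping tasks can be modelled in a few lines of Python. This is a simplified sketch, assuming the raw line still carries whatever mix of \r and \n the editor wrote; actual engines implement this step in their own, system-dependent input routines, and the function name `housekeeping` is hypothetical.

```python
def housekeeping(raw: bytes) -> bytes:
    """Model of TeX's first two steps when it reads a line (simplified)."""
    line = raw.rstrip(b"\r\n")  # 1. strip any trailing CR/LF line terminators
    return line.rstrip(b" ")    # 2. strip trailing spaces (code 32) but not tabs


print(housekeeping(b"Hello world   \r\n"))  # b'Hello world'
print(housekeeping(b"Hello\t\n"))           # b'Hello\t' (trailing tab survives)
```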

These two processes happen before TeX actually starts to scan the characters contained in the line itself: think of them as a form of “housekeeping” in preparation for the next stage of processing (scanning). Having stripped off all platform-dependent line endings (and any trailing spaces), how will TeX know where the line ends? TeX has one more “trick” up its sleeve: the \endlinechar command.

To avoid the problem of platform-dependent newline characters, TeX introduces the concept of \endlinechar, a user-definable parameter whose value TeX uses to append its own end-of-line character to the very end of each line of text it has just read from a file. Note again that this happens before TeX actually starts scanning the characters: it is the final step in TeX’s “housekeeping” before it is ready to start reading (scanning) the actual characters contained in the line.

TeX will use the value stored in \endlinechar to add its own end-of-line terminator if, and only if, \endlinechar is appropriately defined; in Knuth’s TeX, that means it must have a value in the range 0 to 255 (i.e., >-1 and <256). Typically, \endlinechar is assigned the value 13: the carriage return character, usually denoted by \r in programming literature.

But if you write \endlinechar=-1 somewhere within your input, then the next time TeX reads a line of text from a file it will not add any terminator to the end of that line. Consequently, the end of each line will run straight into the start of the next, with no end-of-line character between them, until you reset \endlinechar to an appropriate value, typically 13 (\r):

\endlinechar=13
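A minimal Python sketch of this rule follows. The function name `append_endlinechar` is hypothetical, and the 0 to 255 range is the one used by Knuth’s TeX; the last two lines show why \endlinechar=-1 makes consecutive lines run together.

```python
def append_endlinechar(line: bytes, endlinechar: int) -> bytes:
    """Append TeX's own terminator, but only if \\endlinechar is in range."""
    if 0 <= endlinechar <= 255:  # Knuth's TeX: only values 0..255 are used
        return line + bytes([endlinechar])
    return line                  # e.g. \endlinechar=-1: nothing is added


lines = [b"Hello", b"world"]
print(b"".join(append_endlinechar(ln, 13) for ln in lines))  # b'Hello\rworld\r'
print(b"".join(append_endlinechar(ln, -1) for ln in lines))  # b'Helloworld'
```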

One of TeX’s 16 category codes (value 5) is reserved to identify the “end of line” character. IniTeX assigns this category code to character 13 (\r), so the character that \endlinechar usually inserts is, by default, treated as an end-of-line character.
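For reference, here is a Python sketch of the category codes IniTeX assigns before any format (such as plain TeX or LaTeX) is loaded, following Knuth’s tex.web; it shows that character 13 starts life with category code 5. The function name `initex_catcode` is hypothetical, and formats go on to change many of these values (for example, making { and } grouping characters).

```python
# Category code numbers used below (TeX has 16 in total, 0..15).
ESCAPE, END_OF_LINE, IGNORED, SPACE, LETTER, OTHER, COMMENT, INVALID = (
    0, 5, 9, 10, 11, 12, 14, 15)


def initex_catcode(ch: str) -> int:
    """Category code IniTeX gives a character before any format is loaded."""
    if ch == "\\":
        return ESCAPE
    if ch == "\r":          # character 13, the usual \endlinechar
        return END_OF_LINE
    if ch == " ":
        return SPACE
    if ch == "%":
        return COMMENT
    if ch == "\x00":        # null character
        return IGNORED
    if ch == "\x7f":        # delete character (127)
        return INVALID
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return LETTER
    return OTHER            # everything else, including digits, { } and $


print(initex_catcode("\r"))  # 5
```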

Summary of end-of-line processing

Although these details are quite low-level, they will be of interest to anyone who wants to explore writing macros that deal with reading lines of text.

  1. When TeX reads a line from your file it will first strip out all end-of-line characters (\r and \n) added by your text editor when the file was saved. In addition:
    • TeX also strips out any trailing space characters from the end of the line;
    • TeX does not remove trailing tab characters (ASCII character code 9);
    • Aside: One of LuaTeX’s source code files, the one which has code to perform this stripping of spaces, contains the following note:

      (Cited in the file luatex.c) “David Fuchs mentions that this [space] stripping was done to ensure portability of TeX documents given the padding with spaces on fixed-record "lines" on some systems of the time, e.g., IBM VM/CMS and OS/360.”

  2. After step (1), TeX appends an additional character whose character code is the value stored in \endlinechar (provided that value is suitably defined: in the range 0 to 255).
  3. \endlinechar is typically set to the value 13 (\r), which means that the character added in step (2) is usually character 13 (\r)—but, of course, you can set \endlinechar to another value to achieve special effects via macro programming.
  4. When its input scanning routines detect the character \r (character code 13) at the end of its internal buffer, TeX will, as usual, check its category code in order to decide what to do with it.
  5. Character 13 usually has category code value of 5 (“end of line”) unless, of course, its category code has been changed—some macros make the end-of-line character active in order to do sophisticated processing.
  6. Depending on TeX’s internal state (in effect, what it is doing), TeX may convert the end-of-line character (usually \r, category code 5) into a space token, or simply discard it: this is how line endings become spaces.
  7. Note too that TeX uses characters with category code 5 to detect when it has read an empty line and needs to generate a \par token.
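The steps above can be pulled together into a toy Python model of how the scanner treats the end-of-line character. It is a deliberately simplified sketch: the catcode table is hypothetical (only \r, space and “everything else”), control sequences and all other categories are ignored, and the state names N (new line), M (mid-line) and S (skipping blanks) follow Knuth’s description of TeX’s input states.

```python
from typing import List

EOL, SPACE, OTHER = 5, 10, 12  # category codes used by this toy model


def catcode(ch: str) -> int:
    """Hypothetical, minimal catcode table."""
    if ch == "\r":
        return EOL
    if ch == " ":
        return SPACE
    return OTHER


def tokenize(lines: List[str], endlinechar: int = 13) -> List[str]:
    """Turn lines into character, space and \\par tokens (simplified)."""
    tokens: List[str] = []
    for raw in lines:
        buf = raw.rstrip("\r\n").rstrip(" ")  # step 1: housekeeping
        if 0 <= endlinechar <= 255:           # step 2: append \endlinechar
            buf += chr(endlinechar)
        state = "N"                           # each line starts in state N
        for ch in buf:
            cat = catcode(ch)
            if cat == EOL:
                if state == "N":
                    tokens.append("\\par")    # empty line: paragraph break
                elif state == "M":
                    tokens.append(" ")        # end of a text line: a space
                # in state S the end-of-line character is simply dropped
                break  # anything after a catcode-5 character is ignored
            elif cat == SPACE:
                if state == "M":
                    tokens.append(" ")        # first space becomes a token
                    state = "S"               # further spaces are skipped
            else:
                tokens.append(ch)
                state = "M"
    return tokens


print(tokenize(["Hello  ", "world", "", "Next"]))
```

In this model the trailing spaces on the first line disappear, each line ending becomes a single space, and the empty line produces \par. Calling it with `endlinechar=-1` on `["ab", "cd"]` yields the characters of the single word abcd, mirroring the effect of \endlinechar=-1.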

The following graphic gives a visual summary of steps (1) and (2): stripping newline characters and trailing space characters and inserting \endlinechar ready for the task of scanning the input.

[Figure: How TeX uses \endlinechar]
