% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved. % Copying of this file is authorized only if (1) you are D. E. Knuth, or if % (2) you make absolutely no changes to your copy. (The WEB system provides % for alterations via an auxiliary file; the master file should stay intact.) % See Appendix H of the WEB manual for hints on how to install this program. % And see Appendix A of the TRIP manual for details about how to validate it. % TeX is a trademark of the American Mathematical Society. % METAFONT is a trademark of Addison-Wesley Publishing Company. % Version 0 was released in September 1982 after it passed a variety of tests. % Version 1 was released in November 1983 after thorough testing. % Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984). % Version 1.2 allowed `0' in response to an error, et alia (October 1984). % Version 1.3 made memory allocation more flexible and local (November 1984). % Version 1.4 fixed accents right after line breaks, et alia (April 1985). % Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985). % Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986). % Version 2.1 corrected anomalies in discretionary breaks (January 1987). % Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987). % Version 2.3 avoided incomplete page in premature termination (August 1987). % Version 2.4 fixed \noaligned rules in indented displays (August 1987). % Version 2.5 saved cur_order when expanding tokens (September 1987). % Version 2.6 added 10sp slop when shipping leaders (November 1987). % Version 2.7 improved rounding of negative-width characters (November 1987). % Version 2.8 fixed weird bug if no \patterns are used (December 1987). % Version 2.9 made \csname\endcsname's "relax" local (December 1987). % Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988). % Version 2.92 fixed \patterns, also file names with complex macros (May 1988). % Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988). % Version 2.94 kept open_log_file from calling fatal_error (November 1988). % Version 2.95 solved that problem a better way (December 1988). % Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989). % Version 2.97 corrected blunder in creating 2.95 (February 1989). % Version 2.98 omitted save_for_after at outer level (March 1989). % Version 2.99 caught $$\begingroup\halign..$$ (June 1989). % Version 2.991 caught .5\ifdim.6... (June 1989). % Version 2.992 introduced major changes for 8-bit extensions (September 1989). % Version 2.993 fixed a save_stack synchronization bug et alia (December 1989). % Version 3.0 fixed unusual displays; was more \output robust (March 1990). % Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990). % Version 3.14 fixed unprintable font names and corrected typos (March 1991). % Version 3.141 more of same; reconstituted ligatures better (March 1992). % Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993). % Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995). % Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002). % Version 3.1415926 was a general cleanup with minor fixes (February 2008). % Version 3.14159265 was similar (January 2014). % A reward of $327.68 will be paid to the first finder of any remaining bug. % Although considerable effort has been expended to make the TeX program % correct and reliable, no warranty is implied; the author disclaims any % obligation or liability for damages, including but not limited to % special, indirect, or consequential damages arising out of or in % connection with the use or performance of this software. This work has % been a ``labor of love'' and the author hopes that users enjoy it. % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\noindent\ignorespaces} \def\hangg#1 {\hang\hbox{#1 }} \def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} \font\ninerm=cmr9 \let\mc=\ninerm % medium caps for names like SAIL \def\PASCAL{Pascal} \def\ph{\hbox{Pascal-H}} \def\pct!{{\char`\%}} % percent sign in ordinary text \font\logo=logo10 % font used for the METAFONT logo \def\MF{{\logo META}\-{\logo FONT}} \def\<#1>{$\langle#1\rangle$} \def\section{\mathhexbox278} \def\(#1){} % this is used to make section names sort themselves better \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> \let\?=\relax % we want to be able to \write a \? \def\title{\TeX82} \def\topofcontents{\hsize 5.5in \vglue 0pt plus 1fil minus 1.5in \def\?##1]{\hbox to 1in{\hfil##1.\ }} } \def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} \pageno=3 \def\glob{13} % this should be the section number of "<Global...>" \def\gglob{20, 26} % this should be the next two sections of "<Global...>" \def\.#1{\leavevmode\hbox{\tentex % typewriter type for strings \let\\=\BS % backslash in a string \let\'=\RQ % right quote in a string \let\`=\LQ % left quote in a string \let\{=\LB % left brace in a string \let\}=\RB % right brace in a string \let\~=\TL % tilde in a string \let\ =\SP % space in a string \let\_=\UL % underline in a string \let\&=\AM % ampersand in a string #1\kern.05em}} \def\^{\ifmmode\mathchar"222 \else\char`^ \fi} % pointer or hat \def\LQ{{\tt\char'22}} % left quote in a string \def\RQ{{\tt\char'23}} % right quote in a string \def\dotdot{\mathrel{.\,.}} % double dot, used only in math mode @s dotdot TeX @* Introduction. This is \TeX, a document compiler intended to produce typesetting of high quality. The \PASCAL\ program that follows is the definition of \TeX82, a standard @:PASCAL}{\PASCAL@> @!@:TeX82}{\TeX82@> version of \TeX\ that is designed to be highly portable so that identical output will be obtainable on a great variety of computers. The main purpose of the following program is to explain the algorithms of \TeX\ as clearly as possible. As a result, the program will not necessarily be very efficient when a particular \PASCAL\ compiler has translated it into a particular machine language. However, the program has been written so that it can be tuned to run efficiently in a wide variety of operating environments by making comparatively few changes. Such flexibility is possible because the documentation that follows is written in the \.{WEB} language, which is at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} to \PASCAL\ is able to introduce most of the necessary refinements. Semi-automatic translation to other languages is also feasible, because the program below does not make extensive use of features that are peculiar to \PASCAL. A large piece of software like \TeX\ has inherent complexity that cannot be reduced below a certain level of difficulty, although each individual part is fairly simple by itself. The \.{WEB} language is intended to make the algorithms as readable as possible, by reflecting the way the individual program pieces fit together and by providing the cross-references that connect different parts. Detailed comments about what is going on, and about why things were done in certain ways, have been liberally sprinkled throughout the program. These comments explain features of the implementation, but they rarely attempt to explain the \TeX\ language itself, since the reader is supposed to be familiar with {\sl The \TeX book}. @.WEB@> @:TeXbook}{\sl The \TeX book@> @ The present implementation has a long ancestry, beginning in the summer of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded a prototype @^Plass, Michael Frederick@> @^Liang, Franklin Mark@> @^Knuth, Donald Ervin@> based on some specifications that the author had made in May of that year. This original proto\TeX\ included macro definitions and elementary manipulations on boxes and glue, but it did not have line-breaking, page-breaking, mathematical formulas, alignment routines, error recovery, or the present semantic nest; furthermore, it used character lists instead of token lists, so that a control sequence like \.{\\halign} was represented by a list of seven characters. A complete version of \TeX\ was designed and coded by the author in late 1977 and early 1978; that program, like its prototype, was written in the {\mc SAIL} language, for which an excellent debugging system was available. Preliminary plans to convert the {\mc SAIL} code into a form somewhat like the present ``web'' were developed by Luis Trabb~Pardo and @^Trabb Pardo, Luis Isidoro@> the author at the beginning of 1979, and a complete implementation was created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which @^Zabala Salelles, Ignacio Andr\'es@> was written by the author during the latter part of 1981 and the early part of 1982, also incorporates ideas from the 1979 implementation of @^Guibas, Leonidas Ioannis@> @^Sedgewick, Robert@> @^Wyatt, Douglas Kirk@> \TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick, and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred refinements were introduced into \TeX82 based on the experiences gained with the original implementations, so that essentially every part of the system has been substantially improved. After the appearance of ``Version 0'' in September 1982, this program benefited greatly from the comments of many other people, notably David~R. Fuchs and Howard~W. Trickey. A final revision in September 1989 extended the input character set to eight-bit codes and introduced the ability to hyphenate words from different languages, based on some ideas of Michael~J. Ferguson. @^Fuchs, David Raymond@> @^Trickey, Howard Wellington@> @^Ferguson, Michael John@> No doubt there still is plenty of room for improvement, but the author is firmly committed to keeping \TeX82 ``frozen'' from now on; stability and reliability are to be its main virtues. On the other hand, the \.{WEB} description can be extended without changing the core of \TeX82 itself, and the program has been designed so that such extensions are not extremely difficult to make. The |banner| string defined here should be changed whenever \TeX\ undergoes any modifications, so that it will be clear which version of \TeX\ might be the guilty party when a problem arises. @^extensions to \TeX@> @^system dependencies@> If this program is changed, the resulting system should not be called `\TeX'; the official name `\TeX' by itself is reserved for software systems that are fully compatible with each other. A special test suite called the ``\.{TRIP} test'' is available for helping to determine whether a particular implementation deserves to be known as `\TeX' [cf.~Stanford Computer Science report CS1027, November 1984]. @d banner "This is TeX, Version 3.14159265 (HINT)" /*printed when \TeX\ starts*/ @ Different \PASCAL s have slightly different conventions, and the present @!@:PASCAL H}{\ph@> program expresses \TeX\ in terms of the \PASCAL\ that was available to the author in 1982. Constructions that apply to this particular compiler, which we shall call \ph, should help the reader see how to make an appropriate interface for other systems if necessary. (\ph\ is Charles Hedrick's modification of a compiler @^Hedrick, Charles Locke@> for the DECsystem-10 that was originally developed at the University of Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976), 29--42. The \TeX\ program below is intended to be adaptable, without extensive changes, to most other versions of \PASCAL, so it does not fully use the admirable features of \ph. Indeed, a conscious effort has been made here to avoid using several idiosyncratic features of standard \PASCAL\ itself, so that most of the code can be translated mechanically into other high-level languages. For example, the `\&{with}' and `\\{new}' features are not used, nor are pointer types, set types, or enumerated scalar types; there are no `\&{var}' parameters, except in the case of files; there are no tag fields on variant records; there are no assignments |double=int|; no procedures are declared local to other procedures.) The portions of this program that involve system-dependent code, where changes might be necessary because of differences between \PASCAL\ compilers and/or differences between operating systems, can be identified by looking at the sections whose numbers are listed under `system dependencies' in the index. Furthermore, the index entries for `dirty \PASCAL' list all places where the restrictions of \PASCAL\ have not been followed perfectly, for one reason or another. @!@^system dependencies@> @!@^dirty \PASCAL@> Incidentally, \PASCAL's standard |round| function can be problematical, because it disagrees with the IEEE floating-point standard. Many implementors have therefore chosen to substitute their own home-grown rounding procedure. @ The program begins with a normal \PASCAL\ program heading, whose components will be filled in later, using the conventions of \.{WEB}. @.WEB@> For example, the portion of the program called `\X\glob:Global variables\X' below will be replaced by a sequence of variable declarations that starts in $\section\glob$ of this documentation. In this way, we are able to define each individual global variable when we are prepared to understand what it means; we do not have to define all of the globals at once. Cross references in $\section\glob$, where it says ``See also sections \gglob, \dots,'' also make it possible to look at the set of all global variables, if desired. Similar remarks apply to the other portions of the program heading. Actually the heading shown here is not quite normal: The || line does not mention any |output| file, because \ph\ would ask the \TeX\ user to specify a file name if |output| were specified here. @:PASCAL H}{\ph@> @^system dependencies@> @f type true /*but `|type|' will not be treated as a reserved word*/ @p@t\4@>@<Compiler directives@>@; /*all file names are defined dynamically*/ @<Labels in the outer block@>@; @<Constants in the outer block@>@; @<Types in the outer block@>@; @<Global variables@>@; @# void initialize(void) /*this procedure gets things started properly*/ {@+@<Local variables for initialization@>@; @<Initialize whatever \TeX\ might access@>; } @# @t\4@>@<Basic printing procedures@>@; @t\4@>@<Error handling procedures@>@; @ The overall \TeX\ program begins with the heading just shown, after which comes a bunch of procedure declarations and function declarations. Finally we will get to the main program, which begins with the comment `|start_here|'. If you want to skip down to the main program now, you can look up `|start_here|' in the index. But the author suggests that the best way to understand this program is to follow pretty much the order of \TeX's components as they appear in the \.{WEB} description you are now reading, since the present ordering is intended to combine the advantages of the ``bottom up'' and ``top down'' approaches to the problem of understanding a somewhat complicated system. @ Three labels must be declared in the main program, so we give them symbolic names. @<Labels in the out...@>= @t\hskip-2pt@>@t\hskip-2pt@>@, /*key control points*/ @ Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when \TeX\ is being installed or when system wizards are fooling around with \TeX\ without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords `$|debug|\ldots|debug|$', with apologies to people who wish to preserve the purity of English. Similarly, there is some conditional code delimited by `$|stat|\ldots|tats|$' that is intended for use when statistics are to be kept about \TeX's memory usage. The |stat| $\ldots$ |tats| code also implements diagnostic information for \.{\\tracingparagraphs} and \.{\\tracingpages}. @^debugging@> @ This program has two important variations: (1) There is a long and slow version called \.{INITEX}, which does the extra calculations needed to @.INITEX@> initialize \TeX's internal tables; and (2)~there is a shorter and faster production version, which cuts the initialization to a bare minimum. Parts of the program that are needed in (1) but not in (2) are delimited by the codewords `$|init|\ldots|tini|$'. @<Initialize whatever...@>= @<Set initial values of key variables@>@; #ifdef @!INIT @<Initialize table entries (done by \.{INITEX} only)@>@; #endif @ If the first character of a \PASCAL\ comment is a dollar sign, \ph\ treats the comment as a list of ``compiler directives'' that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the \PASCAL\ debugger when \TeX\ is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases. @:PASCAL H}{\ph@> @^system dependencies@> @^overflow in arithmetic@> @s int8_t int @s uint8_t int @s uint16_t int @s halfword int @s in TeX @s line normal @s to do @<Compiler directives@>= #include <stdint.h> #include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <math.h> #define odd(X) ((X)&1) #define chr(X) ((unsigned char)(X)) #define ord(X) ((int)(X)) #if __SIZEOF_FLOAT__==4 typedef float float32_t; #else #error float type must have size 4 #endif @h @ This \TeX\ implementation conforms to the rules of the {\sl Pascal User @:PASCAL}{\PASCAL@> @^system dependencies@> Manual} published by Jensen and Wirth in 1975, except where system-dependent @^Wirth, Niklaus@> @^Jensen, Kathleen@> code is necessary to make a useful system program, and except in another respect where such conformity would unnecessarily obscure the meaning and clutter up the code: We assume that |case| statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like $$\vbox{\halign{\ignorespaces#\hfil\cr |case x|\cr 1: $\langle\,$code for $x=1\,\rangle$;\cr 3: $\langle\,$code for $x=3\,\rangle$;\cr |default:| $\langle\,$code for |x!=1| and |x!=3|$\,\rangle$\cr |} |\cr}}$$ since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the \ph\ compiler allows `|default:|:' as a default label, and other \PASCAL s allow syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |default:| and |} | should be changed to agree with local conventions. Note that no semicolon appears before |} | in this program, so the definition of |} | should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the |case| statements of \TeX\ will have to be laboriously extended by listing all remaining cases. People who are stuck with such \PASCAL s have, in fact, done this, successfully but not happily!) @:PASCAL H}{\ph@> @ The following parameters can be changed at compile time to extend or reduce \TeX's capacity. They may have different values in \.{INITEX} and in production versions of \TeX. @.INITEX@> @^system dependencies@> @<Constants...@>= enum {@+@!mem_max=30000@+}; /*greatest index in \TeX's internal |mem| array; must be strictly less than |max_halfword|; must be equal to |mem_top| in \.{INITEX}, otherwise | >= mem_top|*/ enum {@+@!mem_min=0@+}; /*smallest index in \TeX's internal |mem| array; must be |min_halfword| or more; must be equal to |mem_bot| in \.{INITEX}, otherwise | <= mem_bot|*/ enum {@+@!buf_size=500@+}; /*maximum number of characters simultaneously present in current lines of open files and in control sequences between \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|*/ enum {@+@!error_line=72@+}; /*width of context lines on terminal error messages*/ enum {@+@!half_error_line=42@+}; /*width of first lines of contexts in terminal error messages; should be between 30 and |error_line-15|*/ enum {@+@!max_print_line=79@+}; /*width of longest text lines output; should be at least 60*/ enum {@+@!stack_size=200@+}; /*maximum number of simultaneous input sources*/ enum {@+@!max_in_open=6@+}; /*maximum number of input files and error insertions that can be going on simultaneously*/ enum {@+@!font_max=75@+}; /*maximum internal font number; must not exceed |max_quarterword| and must be at most |font_base+256|*/ enum {@+@!font_mem_size=20000@+}; /*number of words of |font_info| for all fonts*/ enum {@+@!param_size=60@+}; /*maximum number of simultaneous macro parameters*/ enum {@+@!nest_size=40@+}; /*maximum number of semantic levels simultaneously active*/ enum {@+@!max_strings=3000@+}; /*maximum number of strings; must not exceed |max_halfword|*/ enum {@+@!string_vacancies=8000@+}; /*the minimum number of characters that should be available for the user's control sequences and font names, after \TeX's own error messages are stored*/ enum {@+@!pool_size=32000@+}; /*maximum number of characters in strings, including all error messages and help texts, and the names of all fonts and control sequences; must exceed |string_vacancies| by the total length of \TeX's own strings, which is currently about 23000*/ enum {@+@!save_size=600@+}; /*space for saving values outside of current group; must be at most |max_halfword|*/ enum {@+@!trie_size=8000@+}; /*space for hyphenation patterns; should be larger for \.{INITEX} than it is in production versions of \TeX*/ enum {@+@!trie_op_size=500@+}; /*space for ``opcodes'' in the hyphenation patterns*/ enum {@+@!dvi_buf_size=800@+}; /*size of the output buffer; must be a multiple of 8*/ enum {@+@!file_name_size=40@+}; /*file names shouldn't be longer than this*/ const char *@!pool_name="TeXformats:TEX.POOL "; /*string of length |file_name_size|; tells where the string pool appears*/ @.TeXformats@> @ Like the preceding parameters, the following quantities can be changed at compile time to extend or reduce \TeX's capacity. But if they are changed, it is necessary to rerun the initialization program \.{INITEX} @.INITEX@> to generate new tables for the production \TeX\ program. One can't simply make helter-skelter changes to the following constants, since certain rather complex initialization numbers are computed from them. They are defined here using \.{WEB} macros, instead of being put into \PASCAL's || list, in order to emphasize this distinction. @d mem_bot 0 /*smallest index in the |mem| array dumped by \.{INITEX}; must not be less than |mem_min|*/ @d mem_top 30000 /*largest index in the |mem| array dumped by \.{INITEX}; must be substantially larger than |mem_bot| and not greater than |mem_max|*/ @d font_base 0 /*smallest internal font number; must not be less than |min_quarterword|*/ @d hash_size 2100 /*maximum number of control sequences; it should be at most about |(mem_max-mem_min)/(double)10|*/ @d hash_prime 1777 /*a prime number equal to about 85\pct! of |hash_size|*/ @d hyph_size 307 /*another prime; the number of \.{\\hyphenation} exceptions*/ @^system dependencies@> @ In case somebody has inadvertently made bad settings of the ``constants,'' \TeX\ checks them using a global variable called |bad|. This is the first of many sections of \TeX\ where global variables are defined. @<Glob...@>= int @!bad; /*is some ``constant'' wrong?*/ @ Later on we will say `\ignorespaces|if (mem_max >= max_halfword) bad=14|', or something similar. (We can't do that until |max_halfword| has been defined.) @<Check the ``constant'' values for consistency@>= bad=0; if ((half_error_line < 30)||(half_error_line > error_line-15)) bad=1; if (max_print_line < 60) bad=2; if (dvi_buf_size%8!=0) bad=3; if (mem_bot+1100 > mem_top) bad=4; if (hash_prime > hash_size) bad=5; if (max_in_open >= 128) bad=6; if (mem_top < 256+11) bad=7; /*we will want |null_list > 255|*/ @ Labels are given symbolic names by the following definitions, so that occasional |goto| statements will be meaningful. We insert the label `|end|' just before the `\ignorespaces|} |\unskip' of a procedure in which we have used the `|goto end|' statement defined below; the label `|restart|' is occasionally used at the very beginning of a procedure; and the label `|reswitch|' is occasionally used just prior to a |case| statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the |loop| construction defined below are commonly exited by going to `|done|' or to `|found|' or to `|not_found|', and they are sometimes repeated by going to `|resume|'. If two or more parts of a subroutine start differently but end up the same, the shared code may be gathered together at `|common_ending|'. Incidentally, this program never declares a label that isn't actually used, because some fussy \PASCAL\ compilers will complain about redundant labels. @d done6 36 /*for exiting the sixth loop in a block*/ @ Here are some macros for common programming idioms. @d incr(X) X=X+1 /*increase a variable by unity*/ @d decr(X) X=X-1 /*decrease a variable by unity*/ @d negate(X) X=-X /*change the sign of a variable*/ @d loop @+while (true) @+ /*repeat over and over until a |goto| happens*/ @f loop else /*\.{WEB}'s |else| acts like `\ignorespaces|while true do|\unskip'*/ @d do_nothing /*empty statement*/ @d empty 0 /*symbolic name for a null constant*/ @* The character set. In order to make \TeX\ readily portable to a wide variety of computers, all of its input text is converted to an internal eight-bit code that includes standard ASCII, the ``American Standard Code for Information Interchange.'' This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output to a text file. Such an internal code is relevant to users of \TeX\ primarily because it governs the positions of characters in the fonts. For example, the character `\.A' has ASCII code $65=0101$, and when \TeX\ typesets this letter it specifies character number 65 in the current font. If that font actually has `\.A' in a different position, \TeX\ doesn't know what the real position is; the program that does the actual printing from \TeX's device-independent files is responsible for converting from ASCII to a particular font encoding. @^ASCII code@> \TeX's internal code also defines the value of constants that begin with a reverse apostrophe; and it provides an index to the \.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} tables. @ Characters of text that have been converted to \TeX's internal form are said to be of type |ASCII_code|, which is a subrange of the integers. @<Types...@>= typedef uint8_t ASCII_code; /*eight-bit numbers*/ @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of \TeX\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes 040 through 0176; all of these characters are now available on most computer terminals. Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |unsigned char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |unsigned char| to be a 64-element subrange of a larger data type that has some other name. In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary. @^system dependencies@> @d text_char unsigned char /*the data type of characters in text files*/ @d first_text_char 0 /*ordinal number of the smallest element of |text_char|*/ @d last_text_char 255 /*ordinal number of the largest element of |text_char|*/ @<Local variables for init...@>= int @!i; @ The \TeX\ processor converts between ASCII code and the user's external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL's |ord| and |chr| functions. @<Glob...@>= ASCII_code @!xord[256]; /*specifies conversion of input characters*/ uint8_t @!xchr[256]; /*specifies conversion of output characters*/ @ Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \TeX\ with less complete character sets, and in such cases it will be necessary to change something here. @^system dependencies@> @<Set init...@>= xchr[040]= ' ' ; xchr[041]= '!' ; xchr[042]= '"' ; xchr[043]= '#' ; xchr[044]= '$' ; xchr[045]= '%' ; xchr[046]= '&' ; xchr[047]= '\'' ;@/ xchr[050]= '(' ; xchr[051]= ')' ; xchr[052]= '*' ; xchr[053]= '+' ; xchr[054]= ',' ; xchr[055]= '-' ; xchr[056]= '.' ; xchr[057]= '/' ;@/ xchr[060]= '0' ; xchr[061]= '1' ; xchr[062]= '2' ; xchr[063]= '3' ; xchr[064]= '4' ; xchr[065]= '5' ; xchr[066]= '6' ; xchr[067]= '7' ;@/ xchr[070]= '8' ; xchr[071]= '9' ; xchr[072]= ':' ; xchr[073]= ';' ; xchr[074]= '<' ; xchr[075]= '=' ; xchr[076]= '>' ; xchr[077]= '?' ;@/ xchr[0100]= '@@' ; xchr[0101]= 'A' ; xchr[0102]= 'B' ; xchr[0103]= 'C' ; xchr[0104]= 'D' ; xchr[0105]= 'E' ; xchr[0106]= 'F' ; xchr[0107]= 'G' ;@/ xchr[0110]= 'H' ; xchr[0111]= 'I' ; xchr[0112]= 'J' ; xchr[0113]= 'K' ; xchr[0114]= 'L' ; xchr[0115]= 'M' ; xchr[0116]= 'N' ; xchr[0117]= 'O' ;@/ xchr[0120]= 'P' ; xchr[0121]= 'Q' ; xchr[0122]= 'R' ; xchr[0123]= 'S' ; xchr[0124]= 'T' ; xchr[0125]= 'U' ; xchr[0126]= 'V' ; xchr[0127]= 'W' ;@/ xchr[0130]= 'X' ; xchr[0131]= 'Y' ; xchr[0132]= 'Z' ; xchr[0133]= '[' ; xchr[0134]= '\\' ; xchr[0135]= ']' ; xchr[0136]= '^' ; xchr[0137]= '_' ;@/ xchr[0140]= '`' ; xchr[0141]= 'a' ; xchr[0142]= 'b' ; xchr[0143]= 'c' ; xchr[0144]= 'd' ; xchr[0145]= 'e' ; xchr[0146]= 'f' ; xchr[0147]= 'g' ;@/ xchr[0150]= 'h' ; xchr[0151]= 'i' ; xchr[0152]= 'j' ; xchr[0153]= 'k' ; xchr[0154]= 'l' ; xchr[0155]= 'm' ; xchr[0156]= 'n' ; xchr[0157]= 'o' ;@/ xchr[0160]= 'p' ; xchr[0161]= 'q' ; xchr[0162]= 'r' ; xchr[0163]= 's' ; xchr[0164]= 't' ; xchr[0165]= 'u' ; xchr[0166]= 'v' ; xchr[0167]= 'w' ;@/ xchr[0170]= 'x' ; xchr[0171]= 'y' ; xchr[0172]= 'z' ; xchr[0173]= '{' ; xchr[0174]= '|' ; xchr[0175]= '}' ; xchr[0176]= '~' ;@/ @ Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning. @d null_code 00 /*ASCII code that might disappear*/ @d carriage_return 015 /*ASCII code used at end of line*/ @d invalid_code 0177 /*ASCII code that many systems prohibit in text files*/ @ The ASCII code is ``standard'' only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} gives a complete specification of the intended correspondence between characters and \TeX's internal representation. @:TeXbook}{\sl The \TeX book@> If \TeX\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn't really matter what codes are specified in |xchr[0 dotdot 037]|, but the safest policy is to blank everything out by using the code shown below. However, other settings of |xchr| will make \TeX\ more friendly on computers that have an extended character set, so that users can type things like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \TeX\ are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix~C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than~040. To get the most ``permissive'' character set, change | ' ' | on the right of these assignment statements to |chr(i)|. @^character set dependencies@> @^system dependencies@> @<Set init...@>= for (i=0; i<=037; i++) xchr[i]= ' ' ; for (i=0177; i<=0377; i++) xchr[i]= ' ' ; @ The following system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. Note that if |xchr[i]==xchr[j]| where |i < j < 0177|, the value of |xord[xchr[i]]| will turn out to be |j| or more; hence, standard ASCII code numbers will be used instead of codes below 040 in case there is a coincidence. @<Set init...@>= for (i=first_text_char; i<=last_text_char; i++) xord[chr(i)]=invalid_code; for (i=0200; i<=0377; i++) xord[xchr[i]]=i; for (i=0; i<=0176; i++) xord[xchr[i]]=i; @* Input and output. The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of ``real'' programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let's get it over with. The basic operations we need to do are (1)~inputting and outputting of text, to or from a file or the user's terminal; (2)~inputting and outputting of eight-bit bytes, to or from a file; (3)~instructing the operating system to initiate (``open'') or to terminate (``close'') input or output from a specified file; (4)~testing whether the end of an input file has been reached. \TeX\ needs to deal with two kinds of files. We shall use the term |alpha_file| for a file that contains textual data, and the term |byte_file| for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users. The program actually makes use also of a third kind of file, called a |word_file|, when dumping and reloading base information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined. @<Types...@>= typedef uint8_t eight_bits; /*unsigned one-byte quantity*/ typedef struct {@+FILE *f;@+text_char@,d;@+} alpha_file; /*files that contain textual data*/ typedef struct {@+FILE *f;@+eight_bits@,d;@+} byte_file; /*files that contain binary data*/ @ Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in \PASCAL, i.e., the routines called |get|, |put|, |eof|, and so on. But standard \PASCAL\ does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement \TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| is crucial for our purposes. We shall assume that |name_of_file| is a variable of an appropriate type such that the \PASCAL\ run-time system being used to implement \TeX\ can open a file whose external name is specified by |name_of_file|. @^system dependencies@> @<Glob...@>= uint8_t @!name_of_file0[file_name_size+1]={0}, *const @!name_of_file = @!name_of_file0-1;@;@/ /*on some systems this may be a \&{record} variable*/ uint8_t @!name_length;@/ /*this many characters are actually relevant in |name_of_file| (the rest are blank)*/ @ The \ph\ compiler with which the present version of \TeX\ was prepared has extended the rules of \PASCAL\ in a very convenient way. To open file~|f|, we can write $$\vbox{\halign{#\hfil\qquad&#\hfil\cr |reset(f,@t\\{name}@>,"/O")|&for input;\cr |rewrite(f,@t\\{name}@>,"/O")|&for output.\cr}}$$ The `\\{name}' parameter, which is of type `{\bf packed array $[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in \\{name} are ignored. The `\.{/O}' parameter tells the operating system not to issue its own error messages if something goes wrong. If a file of the specified name cannot be found, or if such a file cannot be opened for some other reason (e.g., someone may already be trying to write the same file), we will have |@!erstat(f)!=0| after an unsuccessful |reset| or |rewrite|. This allows \TeX\ to undertake appropriate corrective action. @:PASCAL H}{\ph@> @^system dependencies@> \TeX's file-opening procedures return |false| if no file identified by |name_of_file| could be opened. @d reset_OK(X) erstat(X)==0 @d rewrite_OK(X) erstat(X)==0 @p bool a_open_in(alpha_file *f) /*open a text file for input*/ {@+reset((*f), name_of_file,"r");return reset_OK((*f)); } @# bool a_open_out(alpha_file *f) /*open a text file for output*/ {@+rewrite((*f), name_of_file,"w");return rewrite_OK((*f)); } @# bool b_open_in(byte_file *f) /*open a binary file for input*/ {@+reset((*f), name_of_file,"rb");return reset_OK((*f)); } @# bool b_open_out(byte_file *f) /*open a binary file for output*/ {@+rewrite((*f), name_of_file,"wb");return rewrite_OK((*f)); } @# bool w_open_in(word_file *f) /*open a word file for input*/ {@+reset((*f), name_of_file,"rb");return reset_OK((*f)); } @# bool w_open_out(word_file *f) /*open a word file for output*/ {@+rewrite((*f), name_of_file,"wb");return rewrite_OK((*f)); } @ Files can be closed with the \ph\ routine `|close(f)|', which @:PASCAL H}{\ph@> @^system dependencies@> should be used when all input or output with respect to |f| has been completed. This makes |f| available to be opened again, if desired; and if |f| was used for output, the |close| operation makes the corresponding external file appear on the user's area, ready to be read. These procedures should not generate error messages if a file is being closed before it has been successfully opened. @p void a_close(alpha_file *f) /*close a text file*/ {@+close((*f)); } @# void b_close(byte_file *f) /*close a binary file*/ {@+close((*f)); } @# void w_close(word_file *f) /*close a word file*/ {@+close((*f)); } @ Binary input and output are done with \PASCAL's ordinary |get| and |put| procedures, so we don't have to make any other special arrangements for binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. The treatment of text input is more difficult, however, because of the necessary translation to |ASCII_code| values. \TeX's conventions should be efficient, and they should blend nicely with the user's operating environment. @ Input from text files is read one line at a time, using a routine called |input_ln|. This function is defined in terms of global variables called |buffer|, |first|, and |last| that will be described in detail later; for now, it suffices for us to know that |buffer| is an array of |ASCII_code| values, and that |first| and |last| are indices into this array representing the beginning and ending of a line of text. @<Glob...@>= ASCII_code @!buffer[buf_size+1]; /*lines of characters being read*/ uint16_t @!first; /*the first unused position in |buffer|*/ uint16_t @!last; /*end of the line just input to |buffer|*/ uint16_t @!max_buf_stack; /*largest index used in |buffer|*/ @ The |input_ln| function brings the next line of input from the specified file into available positions of the buffer array and returns the value |true|, unless the file has already been entirely read, in which case it returns |false| and sets |last=first|. In general, the |ASCII_code| numbers that represent the next line of the file are input into |buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the global variable |last| is set equal to |first| plus the length of the line. Trailing blanks are removed from the line; thus, either |last==first| (in which case the line was entirely blank) or |buffer[last-1]!=' '|. An overflow error is given, however, if the normal actions of |input_ln| would make |last >= buf_size|; this is done so that other parts of \TeX\ can safely look at the contents of |buffer[last+1]| without overstepping the bounds of the |buffer| array. Upon entry to |input_ln|, the condition |first < buf_size| will always hold, so that there is always room for an ``empty'' line. The variable |max_buf_stack|, which is used to keep track of how large the |buf_size| parameter must be to accommodate the present job, is also kept up to date by |input_ln|. If the |bypass_eoln| parameter is |true|, |input_ln| will do a |get| before looking at the first character of the line; this skips over an |eoln| that was in |f.d|. The procedure does not do a |get| when it reaches the end of the line; therefore it can be used to acquire input from the user's terminal as well as from ordinary text files. Standard \PASCAL\ says that a file should have |eoln| immediately before |eof|, but \TeX\ needs only a weaker restriction: If |eof| occurs in the middle of a line, the system function |eoln| should return a |true| result (even though |f.d| will be undefined). Since the inner loop of |input_ln| is part of \TeX's ``inner loop''---each character of input comes in at this place---it is wise to reduce system overhead by making use of special routines that read in an entire array of characters at once, if such routines are available. The following code uses standard \PASCAL\ to illustrate what needs to be done, but finer tuning is often possible at well-developed \PASCAL\ sites. @^inner loop@> @p bool input_ln(alpha_file *f, bool @!bypass_eoln) /*inputs the next line or returns |false|*/ {@+uint16_t last_nonblank; /*|last| with trailing blanks removed*/ if (bypass_eoln) if (!eof((*f))) get((*f)); /*input the first character of the line into |f.d|*/ last=first; /*cf.\ Matthew 19\thinspace:\thinspace30*/ if (eof((*f))) return false; else{@+last_nonblank=first; while (!eoln((*f))) {@+if (last >= max_buf_stack) {@+max_buf_stack=last+1; if (max_buf_stack==buf_size) @<Report overflow of the input buffer, and abort@>; } buffer[last]=xord[(*f).d];get((*f));incr(last); if (buffer[last-1]!=' ') last_nonblank=last; } last=last_nonblank;return true; } } @ The user's terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called |term_in|, and when it is considered an output file the file variable is |term_out|. @^system dependencies@> @<Glob...@>= alpha_file @!term_in; /*the terminal as an input file*/ alpha_file @!term_out; /*the terminal as an output file*/ @ Here is how to open the terminal files in \ph. The `\.{/I}' switch suppresses the first |get|. @:PASCAL H}{\ph@> @^system dependencies@> @d t_open_in term_in.f=stdin /*open the terminal for text input*/ @d t_open_out term_out.f=stdout /*open the terminal for text output*/ @ Sometimes it is necessary to synchronize the input/output mixture that happens on the user's terminal, and three system-dependent procedures are used for this purpose. The first of these, |update_terminal|, is called when we want to make sure that everything we have output to the terminal so far has actually left the computer's internal buffers and been sent. The second, |clear_terminal|, is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, |wake_up_terminal|, is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified in \ph: @:PASCAL H}{\ph@> @^system dependencies@> @d update_terminal fflush(term_out.f) /*empty the terminal output buffer*/ @d clear_terminal fflush(term_in.f) /*clear the terminal input buffer*/ @d wake_up_terminal do_nothing /*cancel the user's cancellation of output*/ @ We need a special routine to read the first line of \TeX\ input from the user's terminal. This line is different because it is read before we have opened the transcript file; there is sort of a ``chicken and egg'' problem here. If the user types `\.{\\input paper}' on the first line, or if some macro invoked by that line does such an \.{\\input}, the transcript file will be named `\.{paper.log}'; but if no \.{\\input} commands are performed during the first line of terminal input, the transcript file will acquire its default name `\.{texput.log}'. (The transcript file will not contain error messages generated by the first line before the first \.{\\input} command.) @.texput@> The first line is even more special if we are lucky enough to have an operating system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object program. It's nice to let the user start running a \TeX\ job by typing a command line like `\.{tex paper}'; in such a case, \TeX\ will operate as if the first line of input were `\.{paper}', i.e., the first line will consist of the remainder of the command line, after the part that invoked \TeX. The first line is special also because it may be read before \TeX\ has input a format file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. (If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the @^system dependencies@> statement `|goto exit(0)|' should be replaced by something that quietly terminates the program.) @<Report overflow of the input buffer, and abort@>= if (format_ident==0) {@+write_ln(term_out,"Buffer size exceeded!");exit(0); @.Buffer size exceeded@> } else{@+cur_input.loc_field=first;cur_input.limit_field=last-1; overflow("buffer size", buf_size); @:TeX capacity exceeded buffer size}{\quad buffer size@> } @ Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications: \yskip\textindent{1)}It should open file |term_in| for input from the terminal. (The file |term_out| will already be open for output to the terminal.) \textindent{2)}If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with `\.{**}', and the first line of input should be whatever is typed in response. \textindent{3)}The first line of input, which might or might not be a command line, should appear in locations |first| to |last-1| of the |buffer| array. \textindent{4)}The global variable |loc| should be set so that the character to be read next by \TeX\ is in |buffer[loc]|. This character should not be blank, and we should have |loc < last|. \yskip\noindent(It may be necessary to prompt the user several times before a non-blank line comes in. The prompt is `\.{**}' instead of the later `\.*' because the meaning is slightly different: `\.{\\input}' need not be typed immediately after~`\.{**}'.) @d loc cur_input.loc_field /*location of first unread character in |buffer|*/ @ The following program does the required initialization without retrieving a possible command line. It should be clear how to modify this routine to deal with command lines, if the system permits them. @^system dependencies@> @p bool init_terminal(void) /*gets the terminal input started*/ {@+ t_open_in; loop@+{@+wake_up_terminal;write(term_out,"**");update_terminal; @.**@> if (!input_ln(&term_in, true)) /*this shouldn't happen*/ {@+write_ln(term_out); write(term_out,"! End of file on the terminal... why?"); @.End of file on the terminal@> return false; } loc=first; while ((loc < last)&&(buffer[loc]==' ')) incr(loc); if (loc < last) {@+return true; /*return unless the line was all blank*/ } write_ln(term_out,"Please type the name of your input file."); } } @* String handling. Control sequence names and diagnostic messages are variable-length strings of eight-bit characters. Since \PASCAL\ does not have a well-developed string mechanism, \TeX\ does all of its string processing by homegrown methods. Elaborate facilities for dynamic strings are not needed, so all of the necessary operations can be handled with a simple data structure. The array |str_pool| contains all of the (eight-bit) ASCII codes in all of the strings, and the array |str_start| contains indices of the starting points of each string. Strings are referred to by integer numbers, so that string number |s| comprises the characters |str_pool[j]| for |str_start[s] <= j < str_start[s+1]|. Additional integer variables |pool_ptr| and |str_ptr| indicate the number of entries used so far in |str_pool| and |str_start|, respectively; locations |str_pool[pool_ptr]| and |str_start[str_ptr]| are ready for the next string to be allocated. String numbers 0 to 255 are reserved for strings that correspond to single ASCII characters. This is in accordance with the conventions of \.{WEB}, @.WEB@> which converts single-character strings into the ASCII code number of the single character involved, while it converts other strings into integers and builds a string pool file. Thus, when the string constant \.{"."} appears in the program below, \.{WEB} converts it into the integer 46, which is the ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} into some integer greater than~255. String number 46 will presumably be the single character `\..'; but some ASCII codes have no standard visible representation, and \TeX\ sometimes needs to be able to print an arbitrary ASCII character, so the first 256 strings are used to specify exactly what should be printed for each of the 256 possibilities. Elements of the |str_pool| array must be ASCII codes that can actually be printed; i.e., they must have an |xchr| equivalent in the local character set. (This restriction applies only to preloaded strings, not to those generated dynamically by the user.) Some \PASCAL\ compilers won't pack integers into a single byte unless the integers lie in the range |-128 dotdot 127|. To accommodate such systems we access the string pool only via macros that can easily be redefined. @^system dependencies@> @d si(X) X /*convert from |ASCII_code| to |packed_ASCII_code|*/ @d so(X) X /*convert from |packed_ASCII_code| to |ASCII_code|*/ @<Types...@>= typedef uint16_t pool_pointer; /*for variables that point into |str_pool|*/ typedef uint16_t str_number; /*for variables that point into |str_start|*/ typedef uint8_t packed_ASCII_code; /*elements of |str_pool| array*/ @ @<Glob...@>= @<prepare for string pool initialization@>@; packed_ASCII_code @!str_pool[pool_size+1]= @<|str_pool| initialization@>; /*the characters*/ pool_pointer @!str_start[max_strings+1]= {@<|str_start| initialization@>}; /*the starting pointers*/ pool_pointer @!pool_ptr=@<|pool_ptr| initialization@>; /*first unused position in |str_pool|*/ str_number @!str_ptr=@<|str_ptr| initialization@>; /*number of the current string being created*/ pool_pointer @!init_pool_ptr; /*the starting value of |pool_ptr|*/ str_number @!init_str_ptr; /*the starting value of |str_ptr|*/ @ Several of the elementary string operations are performed using \.{WEB} macros instead of \PASCAL\ procedures, because many of the operations are done quite frequently and we want to avoid the overhead of procedure calls. For example, here is a simple macro that computes the length of a string. @.WEB@> @d length(X) (str_start[X+1]-str_start[X]) /*the number of characters in string number \#*/ @ The length of the current string is called |cur_length|: @d cur_length (pool_ptr-str_start[str_ptr]) @ Strings are created by appending character codes to |str_pool|. The |append_char| macro, defined here, does not check to see if the value of |pool_ptr| has gotten too high; this test is supposed to be made before |append_char| is used. There is also a |flush_char| macro, which erases the last character appended. To test if there is room to append |l| more characters to |str_pool|, we shall write |str_room(l)|, which aborts \TeX\ and gives an apologetic error message if there isn't enough room. @d append_char(X) /*put |ASCII_code| \# at the end of |str_pool|*/ {@+str_pool[pool_ptr]=si(X);incr(pool_ptr); } @d flush_char decr(pool_ptr) /*forget the last character in the pool*/ @d str_room(X) /*make sure that the pool hasn't overflowed*/ {@+if (pool_ptr+X > pool_size) overflow("pool size", pool_size-init_pool_ptr); @:TeX capacity exceeded pool size}{\quad pool size@> } @ Once a sequence of characters has been appended to |str_pool|, it officially becomes a string when the function |make_string| is called. This function returns the identification number of the new string as its value. @p str_number make_string(void) /*current string enters the pool*/ {@+if (str_ptr==max_strings) overflow("number of strings", max_strings-init_str_ptr); @:TeX capacity exceeded number of strings}{\quad number of strings@> incr(str_ptr);str_start[str_ptr]=pool_ptr; return str_ptr-1; } @ To destroy the most recently made string, we say |flush_string|. @d flush_string {@+decr(str_ptr);pool_ptr=str_start[str_ptr]; } @ The following subroutine compares string |s| with another string of the same length that appears in |buffer| starting at position |k|; the result is |true| if and only if the strings are equal. Empirical tests indicate that |str_eq_buf| is used in such a way that it tends to return |true| about 80 percent of the time. @p bool str_eq_buf(str_number @!s, int @!k) /*test equality of strings*/ {@+ /*loop exit*/ pool_pointer j; /*running index*/ bool @!result; /*result of comparison*/ j=str_start[s]; while (j < str_start[s+1]) {@+if (so(str_pool[j])!=buffer[k]) {@+result=false;goto not_found; } incr(j);incr(k); } result=true; not_found: return result; } @ Here is a similar routine, but it compares two strings in the string pool, and it does not assume that they have the same length. @p bool str_eq_str(str_number @!s, str_number @!t) /*test equality of strings*/ {@+ /*loop exit*/ pool_pointer j, @!k; /*running indices*/ bool @!result; /*result of comparison*/ result=false; if (length(s)!=length(t)) goto not_found; j=str_start[s];k=str_start[t]; while (j < str_start[s+1]) {@+if (str_pool[j]!=str_pool[k]) goto not_found; incr(j);incr(k); } result=true; not_found: return result; } @ The initial values of |str_pool|, |str_start|, |pool_ptr|, and |str_ptr| are computed by the \.{INITEX} program, based in part on the information that \.{WEB} has output while processing \TeX. @.INITEX@> @^string pool@> @p #ifdef @!INIT bool get_strings_started(void) /*initializes the string pool, but returns |false| if something goes wrong*/ {@+ int k, @!l; /*small indices or counters*/ uint8_t @!m, @!n; /*characters input from |pool_file|*/ str_number @!g; /*garbage*/ int @!a; /*accumulator for check sum*/ bool @!c; /*check sum has been checked*/ pool_ptr=0;str_ptr=0;str_start[0]=0; @<Make the first 256 strings@>; @<Read the other strings from the \.{TEX.POOL} file and return |true|, or give an error message and return |false|@>; } #endif @ @d app_lc_hex(X) l=X; if (l < 10) append_char(l+'0')@;@+else append_char(l-10+'a') @<Make the first 256...@>= for (k=0; k<=255; k++) { if (@[@<Character |k| cannot be printed@>@]) { append_char('^');append_char('^'); if (k < 0100) append_char(k+0100)@; else if (k < 0200) append_char(k-0100)@; else{ app_lc_hex(k/16);app_lc_hex(k%16); } } else append_char(k); g=make_string(); } @ The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like `\.{\^\^A}' unless a system-dependent change is made here. Installations that have an extended character set, where for example |xchr[032]==@t\.{\'^^Z\'}@>|, would like string 032 to be the single character 032 instead of the three characters 0136, 0136, 0132 (\.{\^\^Z}). On the other hand, even people with an extended character set will want to represent string 015 by \.{\^\^M}, since 015 is |carriage_return|; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files. Unprintable characters of codes 128--255 are, similarly, rendered \.{\^\^80}--\.{\^\^ff}. The boolean expression defined here should be |true| unless \TeX\ internal code number~|k| corresponds to a non-troublesome visible symbol in the local character set. An appropriate formula for the extended character set recommended in {\sl The \TeX book\/} would, for example, be `|k in[0, 010 dotdot 012, 014, 015, 033, 0177 dotdot 0377]|'. If character |k| cannot be printed, and |k < 0200|, then character |k+0100| or |k-0100| must be printable; moreover, ASCII codes |[041 dotdot 046, 060 dotdot 071, 0136, 0141 dotdot 0146, 0160 dotdot 0171]| must be printable. Thus, at least 81 printable characters are needed. @:TeXbook}{\sl The \TeX book@> @^character set dependencies@> @^system dependencies@> @<Character |k| cannot be printed@>= (k < ' ')||(k > '~') @ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB} description that you are now reading, it outputs the \PASCAL\ program \.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX} @.WEB@>@.INITEX@> program reads the latter file, where each string appears as a two-digit decimal length followed by the string itself, and the information is recorded in \TeX's string memory. @<Glob...@>= #ifdef @!INIT alpha_file @!pool_file; /*the string-pool file output by \.{TANGLE}*/ #endif @ @d bad_pool(X) {@+wake_up_terminal;write_ln(term_out, X); a_close(&pool_file);return false; } @<Read the other strings...@>= {@+int k;@+for(k=1; k<=file_name_size;k++)name_of_file[k]=pool_name[k-1];@+} /*we needn't set |name_length|*/ if (a_open_in(&pool_file)) {@+c=false; @/do@+{@<Read one string, but return |false| if the string memory space is getting too tight for comfort@>; }@+ while (!(c)); a_close(&pool_file);return true; } else bad_pool("! I can't read TEX.POOL.") @.I can't read TEX.POOL@> @ @<Read one string...@>= {@+if (eof(pool_file)) bad_pool("! TEX.POOL has no check sum."); @.TEX.POOL has no check sum@> read(pool_file, m);@+read(pool_file, n); /*read two digits of string length*/ if (m== '*' ) @<Check the pool check sum@>@; else{@+if ((xord[m] < '0')||(xord[m] > '9')||@| (xord[n] < '0')||(xord[n] > '9')) bad_pool("! TEX.POOL line doesn't begin with two digits."); @.TEX.POOL line doesn't...@> l=xord[m]*10+xord[n]-'0'*11; /*compute the length*/ if (pool_ptr+l+string_vacancies > pool_size) bad_pool("! You have to increase POOLSIZE."); @.You have to increase POOLSIZE@> for (k=1; k<=l; k++) {@+if (eoln(pool_file)) m= ' ' ;@+else read(pool_file, m); append_char(xord[m]); } read_ln(pool_file);g=make_string(); } } @ The \.{WEB} operation \.{@@\$} denotes the value that should be at the end of this \.{TEX.POOL} file; any other value means that the wrong pool file has been loaded. @^check sum@> @<Check the pool check sum@>= {@+a=0;k=1; loop@+{@+if ((xord[n] < '0')||(xord[n] > '9')) bad_pool("! TEX.POOL check sum doesn't have nine digits."); @.TEX.POOL check sum...@> a=10*a+xord[n]-'0'; if (k==9) goto done; incr(k);read(pool_file, n); } done: if (a!=0) bad_pool("! TEX.POOL doesn't match; TANGLE me again."); @.TEX.POOL doesn't match@> c=true; } @* On-line and off-line printing. Messages that are sent to a user's terminal and to the transcript-log file are produced by several `|print|' procedures. These procedures will direct their output to a variety of places, based on the setting of the global variable |selector|, which has the following possible values: \yskip \hang |term_and_log|, the normal setting, prints on the terminal and on the transcript file. \hang |log_only|, prints only on the transcript file. \hang |term_only|, prints only on the terminal. \hang |no_print|, doesn't print at all. This is used only in rare cases before the transcript file is open. \hang |pseudo|, puts output into a cyclic buffer that is used by the |show_context| routine; when we get to that routine we shall discuss the reasoning behind this curious mode. \hang |new_string|, appends the output to the current string in the string pool. \hang 0 to 15, prints on one of the sixteen files for \.{\\write} output. \yskip \noindent The symbolic names `|term_and_log|', etc., have been assigned numeric codes that satisfy the convenient relations |no_print+1==term_only|, |no_print+2==log_only|, |term_only+2==log_only+1==term_and_log|. Three additional global variables, |tally| and |term_offset| and |file_offset|, record the number of characters that have been printed since they were most recently cleared to zero. We use |tally| to record the length of (possibly very long) stretches of printing; |term_offset| and |file_offset|, on the other hand, keep track of how many characters have appeared so far on the current line that has been output to the terminal or to the transcript file, respectively. @d no_print 16 /*|selector| setting that makes data disappear*/ @d term_only 17 /*printing is destined for the terminal only*/ @d log_only 18 /*printing is destined for the transcript file only*/ @d term_and_log 19 /*normal |selector| setting*/ @d pseudo 20 /*special |selector| setting for |show_context|*/ @d new_string 21 /*printing is deflected to the string pool*/ @d max_selector 21 /*highest selector setting*/ @<Glob...@>= alpha_file @!log_file; /*transcript of \TeX\ session*/ uint8_t @!selector; /*where to print a message*/ uint8_t @!dig[23]; /*digits in a number being output*/ int @!tally; /*the number of characters recently printed*/ uint8_t @!term_offset; /*the number of characters on the current terminal line*/ uint8_t @!file_offset; /*the number of characters on the current file line*/ ASCII_code @!trick_buf[error_line+1]; /*circular buffer for pseudoprinting*/ int @!trick_count; /*threshold for pseudoprinting, explained later*/ int @!first_count; /*another variable for pseudoprinting*/ @ @<Initialize the output routines@>= selector=term_only;tally=0;term_offset=0;file_offset=0; @ Macro abbreviations for output to the terminal and to the log file are defined here for convenience. Some systems need special conventions for terminal output, and it is possible to adhere to those conventions by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section. @^system dependencies@> @<Compiler directives@>= #define put(file) @[fwrite(&((file).d),sizeof((file).d),1,(file).f)@] #define get(file) @[fread(&((file).d),sizeof((file).d),1,(file).f)@] #define reset(file,name,mode) @[((file).f=fopen((char *)(name)+1,mode),\ (file).f!=NULL?get(file):0)@] #define rewrite(file,name,mode) @[((file).f=fopen((char *)(name)+1,mode))@] #define close(file) @[fclose((file).f)@] #define eof(file) @[feof((file).f)@] #define eoln(file) @[((file).d=='\n'||eof(file))@] #define erstat(file) @[((file).f==NULL?-1:ferror((file).f))@] #define read(file,x) @[((x)=(file).d,get(file))@] #define read_ln(file) @[do get(file); while (!eoln(file))@] #define write(file, format,...) @[fprintf(file.f,format,## __VA_ARGS__)@] #define write_ln(file,...) @[write(file,__VA_ARGS__"\n")@] #define wterm(format,...) @[write(term_out,format, ## __VA_ARGS__)@] #define wterm_ln(format,...) @[wterm(format "\n", ## __VA_ARGS__)@] #define wterm_cr @[write(term_out,"\n")@] #define wlog(format, ...) @[write(log_file,format, ## __VA_ARGS__)@] #define wlog_ln(format, ...) @[wlog(format "\n", ## __VA_ARGS__)@] #define wlog_cr @[write(log_file,"\n")@] @ To end a line of text output, we call |print_ln|. @<Basic print...@>= void print_ln(void) /*prints an end-of-line*/ {@+switch (selector) { case term_and_log: {@+wterm_cr;wlog_cr; term_offset=0;file_offset=0; } @+break; case log_only: {@+wlog_cr;file_offset=0; } @+break; case term_only: {@+wterm_cr;term_offset=0; } @+break; case no_print: case pseudo: case new_string: do_nothing;@+break; default:write_ln(write_file[selector]); } @/ } /*|tally| is not affected*/ @ The |print_char| procedure sends one character to the desired destination, using the |xchr| array to map it into an external character compatible with |input_ln|. All printing comes through |print_ln| or |print_char|. @<Basic printing...@>= void print_char(ASCII_code @!s) /*prints a single character*/ {@+ if (@<Character |s| is the current new-line character@>) if (selector < pseudo) {@+print_ln();return; } switch (selector) { case term_and_log: {@+wterm("%c",xchr[s]);wlog("%c",xchr[s]); incr(term_offset);incr(file_offset); if (term_offset==max_print_line) {@+wterm_cr;term_offset=0; } if (file_offset==max_print_line) {@+wlog_cr;file_offset=0; } } @+break; case log_only: {@+wlog("%c",xchr[s]);incr(file_offset); if (file_offset==max_print_line) print_ln(); } @+break; case term_only: {@+wterm("%c",xchr[s]);incr(term_offset); if (term_offset==max_print_line) print_ln(); } @+break; case no_print: do_nothing;@+break; case pseudo: if (tally < trick_count) trick_buf[tally%error_line]=s;@+break; case new_string: {@+if (pool_ptr < pool_size) append_char(s); } @+break; /*we drop characters if the string space is full*/ default:write(write_file[selector],"%c", xchr[s]); } @/ incr(tally); } @ An entire string is output by calling |print|. Note that if we are outputting the single standard ASCII character \.c, we could call |print('c')|, since |'c'==99| is the number of a single-character string, as explained above. But |print_char('c')| is quicker, so \TeX\ goes directly to the |print_char| routine when it knows that this is safe. (The present implementation assumes that it is always safe to print a visible ASCII character.) @^system dependencies@> @<Basic print...@>= void print(int @!s) /*prints string |s|*/ {@+ pool_pointer j; /*current character code position*/ int @!nl; /*new-line character to restore*/ if (s >= str_ptr) s=@[@<|"???"|@>@]; /*this can't happen*/ @.???@> else if (s < 256) if (s < 0) s=@[@<|"???"|@>@]; /*can't happen*/ else{@+if (selector > pseudo) {@+print_char(s);return; /*internal strings are not expanded*/ } if ((@<Character |s| is the current new-line character@>)) if (selector < pseudo) {@+print_ln();return; } nl=new_line_char;new_line_char=-1; /*temporarily disable new-line character*/ j=str_start[s]; while (j < str_start[s+1]) {@+print_char(so(str_pool[j]));incr(j); } new_line_char=nl;return; } j=str_start[s]; while (j < str_start[s+1]) {@+print_char(so(str_pool[j]));incr(j); } } void print_str(char *s) /* the simple version */ {while (*s!=0) print_char(*s++);@+ } @ Control sequence names, file names, and strings constructed with \.{\\string} might contain |ASCII_code| values that can't be printed using |print_char|. Therefore we use |slow_print| for them: @<Basic print...@>= void slow_print(int @!s) /*prints string |s|*/ {@+pool_pointer j; /*current character code position*/ if ((s >= str_ptr)||(s < 256)) print(s); else{@+j=str_start[s]; while (j < str_start[s+1]) {@+print(so(str_pool[j]));incr(j); } } } @ Here is the very first thing that \TeX\ prints: a headline that identifies the version number and format package. The |term_offset| variable is temporarily incorrect, but the discrepancy is not serious since we assume that the banner and format identifier together will occupy at most |max_print_line| character positions. @<Initialize the output...@>= wterm("%s",banner); if (format_ident==0) wterm_ln(" (no format preloaded)"); else{@+slow_print(format_ident);print_ln(); } update_terminal; @ The procedure |print_nl| is like |print|, but it makes sure that the string appears at the beginning of a new line. @<Basic print...@>= void print_nl(char *@!s) /*prints string |s| at beginning of line*/ {@+if (((term_offset > 0)&&(odd(selector)))||@| ((file_offset > 0)&&(selector >= log_only))) print_ln(); print_str(s); } @ The procedure |print_esc| prints a string that is preceded by the user's escape character (which is usually a backslash). @<Basic print...@>= void print_esc(str_number @!s) /*prints escape character, then |s|*/ {@+int c; /*the escape character code*/ @<Set variable |c| to the current escape character@>; if (c >= 0) if (c < 256) print(c); slow_print(s); } @ An array of digits in the range |0 dotdot 15| is printed by |print_the_digs|. @<Basic print...@>= void print_the_digs(eight_bits @!k) /*prints |dig[k-1]|$\,\ldots\,$|dig[0]|*/ {@+while (k > 0) {@+decr(k); if (dig[k] < 10) print_char('0'+dig[k]); else print_char('A'-10+dig[k]); } } @ The following procedure, which prints out the decimal representation of a given integer |n|, has been written carefully so that it works properly if |n==0| or if |(-n)| would cause overflow. It does not apply |%| or |/| to negative arguments, since such operations are not implemented consistently by all \PASCAL\ compilers. @<Basic print...@>= void print_int(int @!n) /*prints an integer in decimal form*/ {@+uint8_t k; /*index to current digit; we assume that $|n|<10^{23}$*/ int @!m; /*used to negate |n| in possibly dangerous cases*/ k=0; if (n < 0) {@+print_char('-'); if (n > -100000000) negate(n); else{@+m=-1-n;n=m/10;m=(m%10)+1;k=1; if (m < 10) dig[0]=m; else{@+dig[0]=0;incr(n); } } } @/do@+{dig[k]=n%10;n=n/10;incr(k); }@+ while (!(n==0)); print_the_digs(k); } @ Here is a trivial procedure to print two digits; it is usually called with a parameter in the range |0 <= n <= 99|. @p void print_two(int @!n) /*prints two least significant digits*/ {@+n=abs(n)%100;print_char('0'+(n/10)); print_char('0'+(n%10)); } @ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|. @p void print_hex(int @!n) /*prints a positive integer in hexadecimal form*/ {@+uint8_t k; /*index to current digit; we assume that $0\L n<16^{22}$*/ k=0;print_char('"'); @/do@+{dig[k]=n%16;n=n/16;incr(k); }@+ while (!(n==0)); print_the_digs(k); } @ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function is now subsumed by |print|. We retain the old name here as a possible aid to future software arch\ae ologists. @d print_ASCII print @ Roman numerals are produced by the |print_roman_int| routine. Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given. Notice that 1990 yields \.{mcmxc}, not \.{mxm}. @p void print_roman_int(int @!n) {@+ pool_pointer j, @!k; /*mysterious indices into |str_pool|*/ nonnegative_integer @!u, @!v; /*mysterious numbers*/ j=str_start[@[@<|"m2d5c2l5x2v5i"|@>@]];v=1000; loop@+{@+while (n >= v) {@+print_char(so(str_pool[j]));n=n-v; } if (n <= 0) return; /*nonpositive input produces no output*/ k=j+2;u=v/(so(str_pool[k-1])-'0'); if (str_pool[k-1]==si('2')) {@+k=k+2;u=u/(so(str_pool[k-1])-'0'); } if (n+u >= v) {@+print_char(so(str_pool[k]));n=n+u; } else{@+j=j+2;v=v/(so(str_pool[j-1])-'0'); } } } @ The |print| subroutine will not print a string that is still being created. The following procedure will. @p void print_current_string(void) /*prints a yet-unmade string*/ {@+pool_pointer j; /*points to current character code*/ j=str_start[str_ptr]; while (j < pool_ptr) {@+print_char(so(str_pool[j]));incr(j); } } @ Here is a procedure that asks the user to type a line of input, assuming that the |selector| setting is either |term_only| or |term_and_log|. The input is placed into locations |first| through |last-1| of the |buffer| array, and echoed on the transcript file if appropriate. This procedure is never called when |interaction < scroll_mode|. @d prompt_input(X) {@+wake_up_terminal;print_str(X);term_input(); } /*prints a string and gets a line of input*/ @p void term_input(void) /*gets a line from the terminal*/ {@+int k; /*index into |buffer|*/ update_terminal; /*now the user sees the prompt for sure*/ if (!input_ln(&term_in, true)) fatal_error("End of file on the terminal!"); @.End of file on the terminal@> term_offset=0; /*the user's line ended with \<\rm return>*/ decr(selector); /*prepare to echo the input*/ if (last!=first) for (k=first; k<=last-1; k++) print(buffer[k]); print_ln();incr(selector); /*restore previous status*/ } @* Reporting errors. When something anomalous is detected, \TeX\ typically does something like this: $$\vbox{\halign{#\hfil\cr |print_err("Something anomalous has been detected");|\cr |help3("This is the first line of my offer to help.")|\cr |("This is the second line. I'm trying to")|\cr |("explain the best way for you to proceed.");|\cr |error;|\cr}}$$ A two-line help message would be given using |help2|, etc.; these informal helps should use simple vocabulary that complements the words used in the official error message that was printed. (Outside the U.S.A., the help messages should preferably be translated into the local vernacular. Each line of help is at most 60 characters long, in the present implementation, so that |max_print_line| will not be exceeded.) The |print_err| procedure supplies a `\.!' before the official message, and makes sure that the terminal is awake if a stop is going to occur. The |error| procedure supplies a `\..' after the official message, then it shows the location of the error; and if |interaction==error_stop_mode|, it also enters into a dialog with the user, during which time the help message may be printed. @^system dependencies@> @ The global variable |interaction| has four settings, representing increasing amounts of user interaction: @d batch_mode 0 /*omits all stops and omits terminal output*/ @d nonstop_mode 1 /*omits all stops*/ @d scroll_mode 2 /*omits error stops*/ @d error_stop_mode 3 /*stops at every opportunity to interact*/ @d print_err(X) {@+if (interaction==error_stop_mode) wake_up_terminal; print_nl("! ");print_str(X); } @<Glob...@>= uint8_t @!interaction; /*current level of interaction*/ @ @<Set init...@>=interaction=error_stop_mode; @ \TeX\ is careful not to call |error| when the print |selector| setting might be unusual. The only possible values of |selector| at the time of error messages are \yskip\hang|no_print| (when |interaction==batch_mode| and |log_file| not yet open); \hang|term_only| (when |interaction > batch_mode| and |log_file| not yet open); \hang|log_only| (when |interaction==batch_mode| and |log_file| is open); \hang|term_and_log| (when |interaction > batch_mode| and |log_file| is open). @<Initialize the print |selector| based on |interaction|@>= if (interaction==batch_mode) selector=no_print;@+else selector=term_only @ A global variable |deletions_allowed| is set |false| if the |get_next| routine is active when |error| is called; this ensures that |get_next| and related routines like |get_token| will never be called recursively. A similar interlock is provided by |set_box_allowed|. @^recursion@> The global variable |history| records the worst level of error that has been detected. It has four possible values: |spotless|, |warning_issued|, |error_message_issued|, and |fatal_error_stop|. Another global variable, |error_count|, is increased by one when an |error| occurs without an interactive dialog, and it is reset to zero at the end of every paragraph. If |error_count| reaches 100, \TeX\ decides that there is no point in continuing further. @d spotless 0 /*|history| value when nothing has been amiss yet*/ @d warning_issued 1 /*|history| value when |begin_diagnostic| has been called*/ @d error_message_issued 2 /*|history| value when |error| has been called*/ @d fatal_error_stop 3 /*|history| value when termination was premature*/ @<Glob...@>= bool @!deletions_allowed; /*is it safe for |error| to call |get_token|?*/ bool @!set_box_allowed; /*is it safe to do a \.{\\setbox} assignment?*/ uint8_t @!history; /*has the source input been clean so far?*/ int8_t @!error_count; /*the number of scrolled errors since the last paragraph ended*/ @ The value of |history| is initially |fatal_error_stop|, but it will be changed to |spotless| if \TeX\ survives the initialization process. @<Set init...@>= deletions_allowed=true;set_box_allowed=true; error_count=0; /*|history| is initialized elsewhere*/ @ Since errors can be detected almost anywhere in \TeX, we want to declare the error procedures near the beginning of the program. But the error procedures in turn use some other procedures, which need to be declared |forward| before we get to |error| itself. It is possible for |error| to be called recursively if some error arises when |get_token| is being used to delete a token, and/or if some fatal error occurs while \TeX\ is trying to fix a non-fatal one. But such recursion @^recursion@> is never more than two levels deep. @<Error handling...@>= void normalize_selector(void);@/ void get_token(void);@/ void term_input(void);@/ void show_context(void);@/ void begin_file_reading(void);@/ void open_log_file(void);@/ void close_files_and_terminate(void);@/ void clear_for_error_prompt(void);@/ void give_err_help(void);@/ #ifdef @!DEBUG void debug_help(void); #else #define debug_help() do_nothing #endif @ Individual lines of help are recorded in the array |help_line|, which contains entries in positions |0 dotdot(help_ptr-1)|. They should be printed in reverse order, i.e., with |help_line[0]| appearing last. @d hlp1(X) help_line[0]=X;@+} @d hlp2(X) help_line[1]=X;hlp1 @d hlp3(X) help_line[2]=X;hlp2 @d hlp4(X) help_line[3]=X;hlp3 @d hlp5(X) help_line[4]=X;hlp4 @d hlp6(X) help_line[5]=X;hlp5 @d help0 help_ptr=0 /*sometimes there might be no help*/ @d help1 @+{@+help_ptr=1;hlp1 /*use this with one help line*/ @d help2 @+{@+help_ptr=2;hlp2 /*use this with two help lines*/ @d help3 @+{@+help_ptr=3;hlp3 /*use this with three help lines*/ @d help4 @+{@+help_ptr=4;hlp4 /*use this with four help lines*/ @d help5 @+{@+help_ptr=5;hlp5 /*use this with five help lines*/ @d help6 @+{@+help_ptr=6;hlp6 /*use this with six help lines*/ @<Glob...@>= char *@!help_line[6]; /*helps for the next |error|*/ uint8_t @!help_ptr; /*the number of help lines present*/ bool @!use_err_help; /*should the |err_help| list be shown?*/ @ @<Set init...@>= help_ptr=0;use_err_help=false; @ The |jump_out| procedure just cuts across all active procedure levels and goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the whole program. It is used when there is no recovery from a particular error. Some \PASCAL\ compilers do not implement non-local |goto| statements. @^system dependencies@> In such cases the body of |jump_out| should simply be `|close_files_and_terminate|;\thinspace' followed by a call on some system procedure that quietly terminates the program. @<Error hand...@>= void jump_out(void) {@+ close_files_and_terminate(); exit(0); } @ Here now is the general |error| routine. @<Error hand...@>= void error(void) /*completes the job of error reporting*/ {@+ ASCII_code c; /*what the user types*/ int @!s1, @!s2, @!s3, @!s4; /*used to save global variables when deleting tokens*/ if (history < error_message_issued) history=error_message_issued; print_char('.');show_context(); if (interaction==error_stop_mode) @<Get user's advice and |return|@>; incr(error_count); if (error_count==100) {@+print_nl("(That makes 100 errors; please try again.)"); @.That makes 100 errors...@> history=fatal_error_stop;jump_out(); } @<Put help message on the transcript file@>; } @ @<Get user's advice...@>= loop@+{@+resume: clear_for_error_prompt();prompt_input("? "); @.?\relax@> if (last==first) return; c=buffer[first]; if (c >= 'a') c=c+'A'-'a'; /*convert to uppercase*/ @<Interpret code |c| and |return| if done@>; } @ It is desirable to provide an `\.E' option here that gives the user an easy way to return from \TeX\ to the system editor, with the offending line ready to be edited. But such an extension requires some system wizardry, so the present implementation simply types out the name of the file that should be edited and the relevant line number. @^system dependencies@> There is a secret `\.D' option available when the debugging routines haven't been commented~out. @^debugging@> @<Interpret code |c| and |return| if done@>= switch (c) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': if (deletions_allowed) @<Delete \(c)|c-"0"| tokens and |goto continue|@>@;@+break; @t\4\4@>@; #ifdef @!DEBUG case 'D': {@+debug_help();goto resume;@+} #endif case 'E': if (base_ptr > 0) {@+print_nl("You want to edit file "); @.You want to edit file x@> slow_print(input_stack[base_ptr].name_field); print_str(" at line ");print_int(line); interaction=scroll_mode;jump_out(); } @+break; case 'H': @<Print the help information and |goto continue|@>@; case 'I': @<Introduce new material from the terminal and |return|@>@; case 'Q': case 'R': case 'S': @<Change the interaction level and |return|@>@; case 'X': {@+interaction=scroll_mode;jump_out(); } @+break; default:do_nothing; } @/ @<Print the menu of available options@>@; @ @<Print the menu...@>= {@+print_str("Type <return> to proceed, S to scroll future error messages,");@/ @.Type <return> to proceed...@> print_nl("R to run without stopping, Q to run quietly,");@/ print_nl("I to insert something, "); if (base_ptr > 0) print_str("E to edit your file,"); if (deletions_allowed) print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); print_nl("H for help, X to quit."); } @ Here the author of \TeX\ apologizes for making use of the numerical relation between |'Q'|, |'R'|, |'S'|, and the desired interaction settings |batch_mode|, |nonstop_mode|, |scroll_mode|. @^Knuth, Donald Ervin@> @<Change the interaction...@>= {@+error_count=0;interaction=batch_mode+c-'Q'; print_str("OK, entering "); switch (c) { case 'Q': {@+print_esc(@[@<|"batchmode"|@>@]);decr(selector); } @+break; case 'R': print_esc(@[@<|"nonstopmode"|@>@]);@+break; case 'S': print_esc(@[@<|"scrollmode"|@>@]); } /*there are no other cases*/ print_str("...");print_ln();update_terminal;return; } @ When the following code is executed, |buffer[(first+1)dotdot(last-1)]| may contain the material inserted by the user; otherwise another prompt will be given. In order to understand this part of the program fully, you need to be familiar with \TeX's input stacks. @<Introduce new material...@>= {@+begin_file_reading(); /*enter a new syntactic level for terminal input*/ /*now |state==mid_line|, so an initial blank space will count as a blank*/ if (last > first+1) {@+loc=first+1;buffer[first]=' '; } else{@+prompt_input("insert>");loc=first; @.insert>@> } first=last; cur_input.limit_field=last-1; /*no |end_line_char| ends this line*/ return; } @ We allow deletion of up to 99 tokens at a time. @<Delete \(c)|c-"0"| tokens...@>= {@+s1=cur_tok;s2=cur_cmd;s3=cur_chr;s4=align_state; align_state=1000000;OK_to_interrupt=false; if ((last > first+1)&&(buffer[first+1] >= '0')&&(buffer[first+1] <= '9')) c=c*10+buffer[first+1]-'0'*11; else c=c-'0'; while (c > 0) {@+get_token(); /*one-level recursive call of |error| is possible*/ decr(c); } cur_tok=s1;cur_cmd=s2;cur_chr=s3;align_state=s4;OK_to_interrupt=true; help2("I have just deleted some text, as you asked.")@/ ("You can now delete more, or insert, or whatever."); show_context();goto resume; } @ @<Print the help info...@>= {@+if (use_err_help) {@+give_err_help();use_err_help=false; } else{@+if (help_ptr==0) help2("Sorry, I don't know how to help in this situation.")@/ @t\kern1em@>("Maybe you should try asking a human?"); @/do@+{decr(help_ptr);print_str(help_line[help_ptr]);print_ln(); }@+ while (!(help_ptr==0)); } help4("Sorry, I already gave what help I could...")@/ ("Maybe you should try asking a human?")@/ ("An error might have occurred before I noticed any problems.")@/ ("``If all else fails, read the instructions.'");@/ goto resume; } @ @<Put help message on the transcript file@>= if (interaction > batch_mode) decr(selector); /*avoid terminal output*/ if (use_err_help) {@+print_ln();give_err_help(); } else while (help_ptr > 0) {@+decr(help_ptr);print_nl(help_line[help_ptr]); } print_ln(); if (interaction > batch_mode) incr(selector); /*re-enable terminal output*/ print_ln() @ A dozen or so error messages end with a parenthesized integer, so we save a teeny bit of program space by declaring the following procedure: @p void int_error(int @!n) {@+print_str(" (");print_int(n);print_char(')');error(); } @ In anomalous cases, the print selector might be in an unknown state; the following subroutine is called to fix things just enough to keep running a bit longer. @p void normalize_selector(void) {@+if (log_opened) selector=term_and_log; else selector=term_only; if (job_name==0) open_log_file(); if (interaction==batch_mode) decr(selector); } @ The following procedure prints \TeX's last words before dying. @d succumb {@+if (interaction==error_stop_mode) interaction=scroll_mode; /*no more interaction*/ if (log_opened) error(); if (interaction > batch_mode) debug_help(); history=fatal_error_stop;jump_out(); /*irrecoverable error*/ } @<Error hand...@>= void fatal_error(char *@!s) /*prints |s|, and that's it*/ {@+normalize_selector();@/ print_err("Emergency stop");help1(s);succumb; @.Emergency stop@> } @ Here is the most dreaded error message. @<Error hand...@>= void overflow(char *@!s, int @!n) /*stop due to finiteness*/ {@+normalize_selector(); print_err("TeX capacity exceeded, sorry ["); @.TeX capacity exceeded ...@> print_str(s);print_char('=');print_int(n);print_char(']'); help2("If you really absolutely need more capacity,")@/ ("you can ask a wizard to enlarge me."); succumb; } @ The program might sometime run completely amok, at which point there is no choice but to stop. If no previous error has been detected, that's bad news; a message is printed that is really intended for the \TeX\ maintenance person instead of the user (unless the user has been particularly diabolical). The index entries for `this can't happen' may help to pinpoint the problem. @^dry rot@> @<Error hand...@>= void confusion(str_number @!s) /*consistency check violated; |s| tells where*/ {@+normalize_selector(); if (history < error_message_issued) {@+print_err("This can't happen (");print(s);print_char(')'); @.This can't happen@> help1("I'm broken. Please show this to someone who can fix can fix"); } else{@+print_err("I can't go on meeting you like this"); @.I can't go on...@> help2("One of your faux pas seems to have wounded me deeply...")@/ ("in fact, I'm barely conscious. Please fix it and try again."); } succumb; } @ Users occasionally want to interrupt \TeX\ while it's running. If the \PASCAL\ runtime system allows this, one can implement a routine that sets the global variable |interrupt| to some nonzero value when such an interrupt is signalled. Otherwise there is probably at least a way to make |interrupt| nonzero using the \PASCAL\ debugger. @^system dependencies@> @^debugging@> @d check_interrupt {@+if (interrupt!=0) pause_for_instructions(); } @<Global...@>= int @!interrupt; /*should \TeX\ pause for instructions?*/ bool @!OK_to_interrupt; /*should interrupts be observed?*/ @ @<Set init...@>= interrupt=0;OK_to_interrupt=true; @ When an interrupt has been detected, the program goes into its highest interaction level and lets the user have nearly the full flexibility of the |error| routine. \TeX\ checks for interrupts only at times when it is safe to do this. @p void pause_for_instructions(void) {@+if (OK_to_interrupt) {@+interaction=error_stop_mode; if ((selector==log_only)||(selector==no_print)) incr(selector); print_err("Interruption"); @.Interruption@> help3("You rang?")@/ ("Try to insert some instructions for me (e.g.,`I\\showlists'),")@/ ("unless you just want to quit by typing `X'."); deletions_allowed=false;error();deletions_allowed=true; interrupt=0; } } @* Arithmetic with scaled dimensions. The principal computations performed by \TeX\ are done entirely in terms of integers less than $2^{31}$ in magnitude; and divisions are done only when both dividend and divisor are nonnegative. Thus, the arithmetic specified in this program can be carried out in exactly the same way on a wide variety of computers, including some small ones. Why? Because the arithmetic calculations need to be spelled out precisely in order to guarantee that \TeX\ will produce identical output on different machines. If some quantities were rounded differently in different implementations, we would find that line breaks and even page breaks might occur in different places. Hence the arithmetic of \TeX\ has been designed with care, and systems that claim to be implementations of \TeX82 should follow precisely the @:TeX82}{\TeX82@> calculations as they appear in the present program. (Actually there are three places where \TeX\ uses |/| with a possibly negative numerator. These are harmless; see |/| in the index. Also if the user sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic information will involve negative-numerator division. The same remarks apply for |%| as well as for |/|.) @ Here is a routine that calculates half of an integer, using an unambiguous convention with respect to signed odd numbers. @p int half(int @!x) {@+if (odd(x)) return(x+1)/2; else return x/2; } @ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit positions from the right end of a binary computer word. @d unity 0200000 /*$2^{16}$, represents 1.00000*/ @d two 0400000 /*$2^{17}$, represents 2.00000*/ @<Types...@>= typedef int scaled; /*this type is used for scaled integers*/ typedef uint32_t nonnegative_integer; /*$0\L x<2^{31}$*/ typedef uint8_t small_number; /*this type is self-explanatory*/ @ The following function is used to create a scaled integer from a given decimal fraction $(.d_0d_1\ldots d_{k-1})$, where |0 <= k <= 17|. The digit $d_i$ is given in |dig[i]|, and the calculation produces a correctly rounded result. @p scaled round_decimals(small_number @!k) /*converts a decimal fraction*/ {@+int a; /*the accumulator*/ a=0; while (k > 0) {@+decr(k);a=(a+dig[k]*two)/10; } return(a+1)/2; } @ Conversely, here is a procedure analogous to |print_int|. If the output of this procedure is subsequently read by \TeX\ and converted by the |round_decimals| routine above, it turns out that the original value will be reproduced exactly; the ``simplest'' such decimal number is output, but there is always at least one digit following the decimal point. The invariant relation in the \&{repeat} loop is that a sequence of decimal digits yet to be printed will yield the original number if and only if they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}f<s$. We can stop if and only if $f=0$ satisfies this condition; the loop will terminate before $s$ can possibly become zero. @p void print_scaled(scaled @!s) /*prints scaled real, rounded to five digits*/ {@+scaled delta; /*amount of allowable inaccuracy*/ if (s < 0) {@+print_char('-');negate(s); /*print the sign, if negative*/ } print_int(s/unity); /*print the integer part*/ print_char('.'); s=10*(s%unity)+5;delta=10; @/do@+{if (delta > unity) s=s+0100000-50000; /*round the last digit*/ print_char('0'+(s/unity));s=10*(s%unity);delta=delta*10; }@+ while (!(s <= delta)); } @ Physical sizes that a \TeX\ user specifies for portions of documents are represented internally as scaled points. Thus, if we define an `sp' (scaled @^sp@> point) as a unit equal to $2^{-16}$ printer's points, every dimension inside of \TeX\ is an integer number of sp. There are exactly 4,736,286.72 sp per inch. Users are not allowed to specify dimensions larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 meters); two such quantities can be added without overflow on a 32-bit computer. The present implementation of \TeX\ does not check for overflow when @^overflow in arithmetic@> dimensions are added or subtracted. This could be done by inserting a few dozen tests of the form `\ignorespaces|if (x >= 010000000000)@/ @t\\{report\_overflow}@>|', but the chance of overflow is so remote that such tests do not seem worthwhile. \TeX\ needs to do only a few arithmetic operations on scaled quantities, other than addition and subtraction, and the following subroutines do most of the work. A single computation might use several subroutine calls, and it is desirable to avoid producing multiple error messages in case of arithmetic overflow; so the routines set the global variable |arith_error| to |true| instead of reporting errors directly to the user. Another global variable, |rem|, holds the remainder after a division. @<Glob...@>= bool @!arith_error; /*has arithmetic overflow occurred recently?*/ scaled @!rem; /*amount subtracted to get an exact division*/ @ The first arithmetical subroutine we need computes $nx+y$, where |x| and~|y| are |scaled| and |n| is an integer. We will also use it to multiply integers. @d nx_plus_y(X, Y, Z) mult_and_add(X, Y, Z, 07777777777) @d mult_integers(X, Y) mult_and_add(X, Y, 0, 017777777777) @p scaled mult_and_add(int @!n, scaled @!x, scaled @!y, scaled @!max_answer) {@+if (n < 0) {@+negate(x);negate(n); } if (n==0) return y; else if (((x <= (max_answer-y)/n)&&(-x <= (max_answer+y)/n))) return n*x+y; else{@+arith_error=true;return 0; } } @ We also need to divide scaled dimensions by integers. @p scaled x_over_n(scaled @!x, int @!n) {@+bool negative; /*should |rem| be negated?*/ scaled x_over_n; negative=false; if (n==0) {@+arith_error=true;x_over_n=0;rem=x; } else{@+if (n < 0) {@+negate(x);negate(n);negative=true; } if (x >= 0) {@+x_over_n=x/n;rem=x%n; } else{@+x_over_n=-((-x)/n);rem=-((-x)%n); } } if (negative) negate(rem); return x_over_n;} @ Then comes the multiplication of a scaled number by a fraction |n/(double)d|, where |n| and |d| are nonnegative integers | <= @t$2^{16}$@>| and |d| is positive. It would be too dangerous to multiply by~|n| and then divide by~|d|, in separate operations, since overflow might well occur; and it would be too inaccurate to divide by |d| and then multiply by |n|. Hence this subroutine simulates 1.5-precision arithmetic. @p scaled xn_over_d(scaled @!x, int @!n, int @!d) {@+bool positive; /*was |x >= 0|?*/ nonnegative_integer @!t, @!u, @!v; /*intermediate quantities*/ scaled xn_over_d; if (x >= 0) positive=true; else{@+negate(x);positive=false; } t=(x%0100000)*n; u=(x/0100000)*n+(t/0100000); v=(u%d)*0100000+(t%0100000); if (u/d >= 0100000) arith_error=true; else u=0100000*(u/d)+(v/d); if (positive) {@+xn_over_d=u;rem=v%d; } else{@+xn_over_d=-u;rem=-(v%d); } return xn_over_d;} @ The next subroutine is used to compute the ``badness'' of glue, when a total~|t| is supposed to be made from amounts that sum to~|s|. According to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$; however, badness is simply a heuristic, so we need not squeeze out the last drop of accuracy when computing it. All we really want is an approximation that has similar properties. @:TeXbook}{\sl The \TeX book@> The actual method used to compute the badness is easier to read from the program than to describe in words. It produces an integer value that is a reasonably close approximation to $100(t/s)^3$, and all implementations of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is treated as infinitely bad, and represented by 10000. It is not difficult to prove that $$\hbox{|badness(t+1, s) >= badness(t, s) >= badness(t, s+1)|}.$$ The badness function defined here is capable of computing at most 1095 distinct values, but that is plenty. @d inf_bad 10000 /*infinitely bad value*/ @p halfword badness(scaled @!t, scaled @!s) /*compute badness, given |t >= 0|*/ {@+int r; /*approximation to $\alpha t/s$, where $\alpha^3\approx 100\cdot2^{18}$*/ if (t==0) return 0; else if (s <= 0) return inf_bad; else{@+if (t <= 7230584) r=(t*297)/s; /*$297^3=99.94\times2^{18}$*/ else if (s >= 1663497) r=t/(s/297); else r=t; if (r > 1290) return inf_bad; /*$1290^3<2^{31}<1291^3$*/ else return(r*r*r+0400000)/01000000; } /*that was $r^3/2^{18}$, rounded to the nearest integer*/ } @ When \TeX\ ``packages'' a list into a box, it needs to calculate the proportionality ratio by which the glue inside the box should stretch or shrink. This calculation does not affect \TeX's decision making, so the precise details of rounding, etc., in the glue calculation are not of critical importance for the consistency of results on different computers. We shall use the type |glue_ratio| for such proportionality ratios. A glue ratio should take the same amount of memory as an |int| (usually 32 bits) if it is to blend smoothly with \TeX's other data structures. Thus |glue_ratio| should be equivalent to |short_real| in some implementations of \PASCAL. Alternatively, it is possible to deal with glue ratios using nothing but fixed-point arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the routines cited there must be modified to allow negative glue ratios.) @^system dependencies@> @d set_glue_ratio_zero(X) X=0.0 /*store the representation of zero ratio*/ @d set_glue_ratio_one(X) X=1.0 /*store the representation of unit ratio*/ @d float(X) ((double)(X)) /*convert from |glue_ratio| to type |double|*/ @d unfloat(X) ((glue_ratio)(X)) /*convert from |double| to type |glue_ratio|*/ @d float_constant(X) ((double)(X)) /*convert |int| constant to |double|*/ @<Types...@>= typedef float32_t @!glue_ratio; /*one-word representation of a glue expansion factor*/ @* Packed data. In order to make efficient use of storage space, \TeX\ bases its major data structures on a |memory_word|, which contains either a (signed) integer, possibly scaled, or a (signed) |glue_ratio|, or a small number of fields that are one half or one quarter of the size used for storing integers. If |x| is a variable of type |memory_word|, it contains up to four fields that can be referred to as follows: $$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr |x|&.|i|&(an |int|)\cr |x|&.|sc|\qquad&(a |scaled| integer)\cr |x|&.|gr|&(a |glue_ratio|)\cr |x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr |x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword field)\cr |x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ This is somewhat cumbersome to write, and not very readable either, but macros will be used to make the notation shorter and more transparent. The \PASCAL\ code below gives a formal definition of |memory_word| and its subsidiary types, using packed variant records. \TeX\ makes no assumptions about the relative positions of the fields within a word. Since we are assuming 32-bit integers, a halfword must contain at least 16 bits, and a quarterword must contain at least 8 bits. @^system dependencies@> But it doesn't hurt to have more bits; for example, with enough 36-bit words you might be able to have |mem_max| as large as 262142, which is eight times as much memory as anybody had during the first four years of \TeX's existence. N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled by a \PASCAL\ that packs all of the |memory_word| variants into the space of a single integer. This means, for example, that |glue_ratio| words should be |short_real| instead of |double| on some computers. Some \PASCAL\ compilers will pack an integer whose subrange is `|0 dotdot 255|' into an eight-bit field, but others insist on allocating space for an additional sign bit; on such systems you can get 256 values into a quarterword only if the subrange is `|-128 dotdot 127|'. The present implementation tries to accommodate as many variations as possible, so it makes few assumptions. If integers having the subrange `|min_quarterword dotdot max_quarterword|' can be packed into a quarterword, and if integers having the subrange `|min_halfword dotdot max_halfword|' can be packed into a halfword, everything should work satisfactorily. It is usually most efficient to have |min_quarterword==min_halfword==0|, so one should try to achieve this unless it causes a severe problem. The values defined here are recommended for most 32-bit computers. @d min_quarterword 0 /*smallest allowable value in a |quarterword|*/ @d max_quarterword 255 /*largest allowable value in a |quarterword|*/ @d min_halfword 0 /*smallest allowable value in a |halfword|*/ @d max_halfword 65535 /*largest allowable value in a |halfword|*/ @ Here are the inequalities that the quarterword and halfword values must satisfy (or rather, the inequalities that they mustn't satisfy): @<Check the ``constant''...@>= #ifdef @!INIT if ((mem_min!=mem_bot)||(mem_max!=mem_top)) bad=10; #endif @;@/ if ((mem_min > mem_bot)||(mem_max < mem_top)) bad=10; if ((min_quarterword > 0)||(max_quarterword < 127)) bad=11; if ((min_halfword > 0)||(max_halfword < 32767)) bad=12; if ((min_quarterword < min_halfword)||@| (max_quarterword > max_halfword)) bad=13; if ((mem_min < min_halfword)||(mem_max >= max_halfword)||@| (mem_bot-mem_min > max_halfword+1)) bad=14; if ((font_base < min_quarterword)||(font_max > max_quarterword)) bad=15; if (font_max > font_base+256) bad=16; if ((save_size > max_halfword)||(max_strings > max_halfword)) bad=17; if (buf_size > max_halfword) bad=18; if (max_quarterword-min_quarterword < 255) bad=19; @ The operation of adding or subtracting |min_quarterword| occurs quite frequently in \TeX, so it is convenient to abbreviate this operation by using the macros |qi| and |qo| for input and output to and from quarterword format. The inner loop of \TeX\ will run faster with respect to compilers that don't optimize expressions like `|x+0|' and `|x-0|', if these macros are simplified in the obvious way when |min_quarterword==0|. @^inner loop@>@^system dependencies@> @d qi(X) X+min_quarterword /*to put an |eight_bits| item into a quarterword*/ @d qo(X) X-min_quarterword /*to take an |eight_bits| item out of a quarterword*/ @d hi(X) X+min_halfword /*to put a sixteen-bit item into a halfword*/ @d ho(X) X-min_halfword /*to take a sixteen-bit item from a halfword*/ @ The reader should study the following definitions closely: @^system dependencies@> @d sc i /*|scaled| data is equivalent to |int|*/ @<Types...@>= typedef uint8_t quarterword; /*1/4 of a word*/ typedef uint16_t halfword; /*1/2 of a word*/ typedef uint8_t two_choices; /*used when there are two variants in a record*/ typedef uint8_t four_choices; /*used when there are four variants in a record*/ typedef struct { @;@/ halfword @!rh; union { halfword @!lh; struct { quarterword @!b0;quarterword @!b1;} ; };} two_halves; typedef struct { @;@/ quarterword @!b0; quarterword @!b1; quarterword @!b2; quarterword @!b3; } four_quarters; typedef struct { @;@/ union { int @!i; glue_ratio @!gr; two_halves @!hh; four_quarters @!qqqq; };} memory_word; typedef struct {@+FILE *f;@+memory_word@,d;@+} word_file; @ When debugging, we may want to print a |memory_word| without knowing what type it is; so we print it in all modes. @^dirty \PASCAL@>@^debugging@> @p #ifdef @!DEBUG void print_word(memory_word @!w) /*prints |w| in all ways*/ {@+print_int(w.i);print_char(' ');@/ print_scaled(w.sc);print_char(' ');@/ print_scaled(round(unity*float(w.gr)));print_ln();@/ @^real multiplication@> print_int(w.hh.lh);print_char('=');print_int(w.hh.b0);print_char(':'); print_int(w.hh.b1);print_char(';');print_int(w.hh.rh);print_char(' ');@/ print_int(w.qqqq.b0);print_char(':');print_int(w.qqqq.b1);print_char(':'); print_int(w.qqqq.b2);print_char(':');print_int(w.qqqq.b3); } #endif @* Dynamic memory allocation. The \TeX\ system does nearly all of its own memory allocation, so that it can readily be transported into environments that do not have automatic facilities for strings, garbage collection, etc., and so that it can be in control of what error messages the user receives. The dynamic storage requirements of \TeX\ are handled by providing a large array |mem| in which consecutive blocks of words are used as nodes by the \TeX\ routines. Pointer variables are indices into this array, or into another array called |eqtb| that will be explained later. A pointer variable might also be a special flag that lies outside the bounds of |mem|, so we allow pointers to assume any |halfword| value. The minimum halfword value represents a null pointer. \TeX\ does not assume that |mem[null]| exists. @d pointer halfword /*a flag or a location in |mem| or |eqtb|*/ @s pointer int @d null min_halfword /*the null pointer*/ @<Glob...@>= pointer @!temp_ptr; /*a pointer variable for occasional emergency use*/ @ The |mem| array is divided into two regions that are allocated separately, but the dividing line between these two regions is not fixed; they grow together until finding their ``natural'' size in a particular job. Locations less than or equal to |lo_mem_max| are used for storing variable-length records consisting of two or more words each. This region is maintained using an algorithm similar to the one described in exercise 2.5--19 of {\sl The Art of Computer Programming}. However, no size field appears in the allocated nodes; the program is responsible for knowing the relevant size when a node is freed. Locations greater than or equal to |hi_mem_min| are used for storing one-word records; a conventional \.{AVAIL} stack is used for allocation in this region. Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part of preloaded format files, by the \.{INITEX} preprocessor. @.INITEX@> Production versions of \TeX\ may extend the memory at both ends in order to provide more space; locations between |mem_min| and |mem_bot| are always used for variable-size nodes, and locations between |mem_top| and |mem_max| are always used for single-word nodes. The key pointers that govern |mem| allocation have a prescribed order: $$\advance\thickmuskip-2mu \hbox{|null <= mem_min <= mem_bot < lo_mem_max < hi_mem_min < mem_top <= mem_end <= mem_max|.}$$ Empirical tests show that the present implementation of \TeX\ tends to spend about 9\pct! of its running time allocating nodes, and about 6\pct! deallocating them after their use. @<Glob...@>= memory_word @!mem0[mem_max-mem_min+1], *const @!mem = @!mem0-mem_min; /*the big dynamic storage area*/ pointer @!lo_mem_max; /*the largest location of variable-size memory in use*/ pointer @!hi_mem_min; /*the smallest location of one-word memory in use*/ @ In order to study the memory requirements of particular applications, it is possible to prepare a version of \TeX\ that keeps track of current and maximum memory usage. When code between the delimiters | #ifdef @!STAT | $\ldots$ |tats| is not ``commented out,'' \TeX\ will run a bit slower but it will report these statistics when |tracing_stats| is sufficiently large. @<Glob...@>= int @!var_used, @!dyn_used; /*how much memory is in use*/ #ifdef @!DEBUG #define incr_dyn_used @[incr(dyn_used)@] #define decr_dyn_used @[decr(dyn_used)@] #else #define incr_dyn_used #define decr_dyn_used #endif @ Let's consider the one-word memory region first, since it's the simplest. The pointer variable |mem_end| holds the highest-numbered location of |mem| that has ever been used. The free locations of |mem| that occur between |hi_mem_min| and |mem_end|, inclusive, are of type |two_halves|, and we write |info(p)| and |link(p)| for the |lh| and |rh| fields of |mem[p]| when it is of this type. The single-word free locations form a linked list $$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$ terminated by |null|. @d link(X) mem[X].hh.rh /*the |link| field of a memory word*/ @d info(X) mem[X].hh.lh /*the |info| field of a memory word*/ @<Glob...@>= pointer @!avail; /*head of the list of available one-word nodes*/ pointer @!mem_end; /*the last one-word node used in |mem|*/ @ If memory is exhausted, it might mean that the user has forgotten a right brace. We will define some procedures later that try to help pinpoint the trouble. @p@<Declare the procedure called |show_token_list|@>@; @<Declare the procedure called |runaway|@>@; @ The function |get_avail| returns a pointer to a new one-word node whose |link| field is null. However, \TeX\ will halt if there is no more room left. @^inner loop@> If the available-space list is empty, i.e., if |avail==null|, we try first to increase |mem_end|. If that cannot be done, i.e., if |mem_end==mem_max|, we try to decrease |hi_mem_min|. If that cannot be done, i.e., if |hi_mem_min==lo_mem_max+1|, we have to quit. @p pointer get_avail(void) /*single-word node allocation*/ {@+pointer p; /*the new node being got*/ p=avail; /*get top location in the |avail| stack*/ if (p!=null) avail=link(avail); /*and pop it off*/ else if (mem_end < mem_max) /*or go into virgin territory*/ {@+incr(mem_end);p=mem_end; } else{@+decr(hi_mem_min);p=hi_mem_min; if (hi_mem_min <= lo_mem_max) {@+runaway(); /*if memory is exhausted, display possible runaway text*/ overflow("main memory size", mem_max+1-mem_min); /*quit; all one-word nodes are busy*/ @:TeX capacity exceeded main memory size}{\quad main memory size@> } } link(p)=null; /*provide an oft-desired initialization of the new node*/ incr_dyn_used; /*maintain statistics*/ return p; } @ Conversely, a one-word node is recycled by calling |free_avail|. This routine is part of \TeX's ``inner loop,'' so we want it to be fast. @^inner loop@> @d free_avail(X) /*single-word node liberation*/ {@+link(X)=avail;avail=X; decr_dyn_used; } @ There's also a |fast_get_avail| routine, which saves the procedure-call overhead at the expense of extra programming. This routine is used in the places that would otherwise account for the most calls of |get_avail|. @^inner loop@> @d fast_get_avail(X) @t@>@;@/ {@+X=avail; /*avoid |get_avail| if possible, to save time*/ if (X==null) X=get_avail(); else{@+avail=link(X);link(X)=null; incr_dyn_used; } } @ The procedure |flush_list(p)| frees an entire linked list of one-word nodes that starts at position |p|. @^inner loop@> @p void flush_list(pointer @!p) /*makes list of single-word nodes available*/ {@+pointer @!q, @!r; /*list traversers*/ if (p!=null) {@+r=p; @/do@+{q=r;r=link(r); decr_dyn_used; }@+ while (!(r==null)); /*now |q| is the last node on the list*/ link(q)=avail;avail=p; } } @ The available-space list that keeps track of the variable-size portion of |mem| is a nonempty, doubly-linked circular list of empty nodes, pointed to by the roving pointer |rover|. Each empty node has size 2 or more; the first word contains the special value |max_halfword| in its |link| field and the size in its |info| field; the second word contains the two pointers for double linking. Each nonempty node also has size 2 or more. Its first word is of type |two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|. Otherwise there is complete flexibility with respect to the contents of its other fields and its other words. (We require |mem_max < max_halfword| because terrible things can happen when |max_halfword| appears in the |link| field of a nonempty node.) @d empty_flag max_halfword /*the |link| of an empty variable-size node*/ @d is_empty(X) (link(X)==empty_flag) /*tests for empty node*/ @d node_size info /*the size field in empty variable-size nodes*/ @d llink(X) info(X+1) /*left link in doubly-linked list of empty nodes*/ @d rlink(X) link(X+1) /*right link in doubly-linked list of empty nodes*/ @<Glob...@>= pointer @!rover; /*points to some node in the list of empties*/ @ A call to |get_node| with argument |s| returns a pointer to a new node of size~|s|, which must be 2~or more. The |link| field of the first word of this new node is set to null. An overflow stop occurs if no suitable space exists. If |get_node| is called with $s=2^{30}$, it simply merges adjacent free areas and returns the value |max_halfword|. @p pointer get_node(int @!s) /*variable-size node allocation*/ {@+ pointer p; /*the node currently under inspection*/ pointer @!q; /*the node physically after node |p|*/ int @!r; /*the newly allocated node, or a candidate for this honor*/ int @!t; /*temporary register*/ restart: p=rover; /*start at some free node in the ring*/ @/do@+{@<Try to allocate within node |p| and its physical successors, and |goto found| if allocation was possible@>; @^inner loop@> p=rlink(p); /*move to the next node in the ring*/ }@+ while (!(p==rover)); /*repeat until the whole list has been traversed*/ if (s==010000000000) {@+return max_halfword; } if (lo_mem_max+2 < hi_mem_min) if (lo_mem_max+2 <= mem_bot+max_halfword) @<Grow more variable-size memory and |goto restart|@>; overflow("main memory size", mem_max+1-mem_min); /*sorry, nothing satisfactory is left*/ @:TeX capacity exceeded main memory size}{\quad main memory size@> found: link(r)=null; /*this node is now nonempty*/ #ifdef @!STAT var_used=var_used+s; /*maintain usage statistics*/ #endif @;@/ return r; } @ The lower part of |mem| grows by 1000 words at a time, unless we are very close to going under. When it grows, we simply link a new node into the available-space list. This method of controlled growth helps to keep the |mem| usage consecutive when \TeX\ is implemented on ``virtual memory'' systems. @^virtual memory@> @<Grow more variable-size memory and |goto restart|@>= {@+if (hi_mem_min-lo_mem_max >= 1998) t=lo_mem_max+1000; else t=lo_mem_max+1+(hi_mem_min-lo_mem_max)/2; /*|lo_mem_max+2 <= t < hi_mem_min|*/ p=llink(rover);q=lo_mem_max;rlink(p)=q;llink(rover)=q;@/ if (t > mem_bot+max_halfword) t=mem_bot+max_halfword; rlink(q)=rover;llink(q)=p;link(q)=empty_flag;node_size(q)=t-lo_mem_max;@/ lo_mem_max=t;link(lo_mem_max)=null;info(lo_mem_max)=null; rover=q;goto restart; } @ Empirical tests show that the routine in this section performs a node-merging operation about 0.75 times per allocation, on the average, after which it finds that |r > p+1| about 95\pct! of the time. @<Try to allocate...@>= q=p+node_size(p); /*find the physical successor*/ @^inner loop@> while (is_empty(q)) /*merge node |p| with node |q|*/ {@+t=rlink(q); if (q==rover) rover=t; llink(t)=llink(q);rlink(llink(q))=t;@/ q=q+node_size(q); } r=q-s; if (r > p+1) @<Allocate from the top of node |p| and |goto found|@>; if (r==p) if (rlink(p)!=p) @<Allocate entire node |p| and |goto found|@>; node_size(p)=q-p /*reset the size in case it grew*/ @ @<Allocate from the top...@>= {@+node_size(p)=r-p; /*store the remaining size*/ @^inner loop@> rover=p; /*start searching here next time*/ goto found; } @ Here we delete node |p| from the ring, and let |rover| rove around. @<Allocate entire...@>= {@+rover=rlink(p);t=llink(p); llink(rover)=t;rlink(t)=rover; goto found; } @ Conversely, when some variable-size node |p| of size |s| is no longer needed, the operation |free_node(p, s)| will make its words available, by inserting |p| as a new empty node just before where |rover| now points. @^inner loop@> @p void free_node(pointer @!p, halfword @!s) /*variable-size node liberation*/ {@+pointer q; /*|llink(rover)|*/ node_size(p)=s;link(p)=empty_flag; q=llink(rover);llink(p)=q;rlink(p)=rover; /*set both links*/ llink(rover)=p;rlink(q)=p; /*insert |p| into the ring*/ #ifdef @!STAT var_used=var_used-s; #endif @; /*maintain statistics*/ } @ Just before \.{INITEX} writes out the memory, it sorts the doubly linked available space list. The list is probably very short at such times, so a simple insertion sort is used. The smallest available location will be pointed to by |rover|, the next-smallest by |rlink(rover)|, etc. @p #ifdef @!INIT void sort_avail(void) /*sorts the available variable-size nodes by location*/ {@+pointer p, @!q, @!r; /*indices into |mem|*/ pointer @!old_rover; /*initial |rover| setting*/ p=get_node(010000000000); /*merge adjacent free areas*/ p=rlink(rover);rlink(rover)=max_halfword;old_rover=rover; while (p!=old_rover) @<Sort \(p)|p| into the list starting at |rover| and advance |p| to |rlink(p)|@>; p=rover; while (rlink(p)!=max_halfword) {@+llink(rlink(p))=p;p=rlink(p); } rlink(p)=rover;llink(rover)=p; } #endif @ The following |while | loop is guaranteed to terminate, since the list that starts at |rover| ends with |max_halfword| during the sorting procedure. @<Sort \(p)|p|...@>= if (p < rover) {@+q=p;p=rlink(q);rlink(q)=rover;rover=q; } else{@+q=rover; while (rlink(q) < p) q=rlink(q); r=rlink(p);rlink(p)=rlink(q);rlink(q)=p;p=r; } @* Data structures for boxes and their friends. From the computer's standpoint, \TeX's chief mission is to create horizontal and vertical lists. We shall now investigate how the elements of these lists are represented internally as nodes in the dynamic memory. A horizontal or vertical list is linked together by |link| fields in the first word of each node. Individual nodes represent boxes, glue, penalties, or special things like discretionary hyphens; because of this variety, some nodes are longer than others, and we must distinguish different kinds of nodes. We do this by putting a `|type|' field in the first word, together with the link and an optional `|subtype|'. @d type(X) mem[X].hh.b0 /*identifies what kind of node this is*/ @d subtype(X) mem[X].hh.b1 /*secondary identification in some cases*/ @ A |@!char_node|, which represents a single character, is the most important kind of node because it accounts for the vast majority of all boxes. Special precautions are therefore taken to ensure that a |char_node| does not take up much memory space. Every such node is one word long, and in fact it is identifiable by this property, since other kinds of nodes have at least two words, and they appear in |mem| locations less than |hi_mem_min|. This makes it possible to omit the |type| field in a |char_node|, leaving us room for two bytes that identify a |font| and a |character| within that font. Note that the format of a |char_node| allows for up to 256 different fonts and up to 256 characters per font; but most implementations will probably limit the total number of fonts to fewer than 75 per job, and most fonts will stick to characters whose codes are less than 128 (since higher codes are more difficult to access on most keyboards). Extensions of \TeX\ intended for oriental languages will need even more than $256\times256$ possible characters, when we consider different sizes @^oriental characters@>@^Chinese characters@>@^Japanese characters@> and styles of type. It is suggested that Chinese and Japanese fonts be handled by representing such characters in two consecutive |char_node| entries: The first of these has |font==font_base|, and its |link| points to the second; the second identifies the font and the character dimensions. The saving feature about oriental characters is that most of them have the same box dimensions. The |character| field of the first |char_node| is a ``\\{charext}'' that distinguishes between graphic symbols whose dimensions are identical for typesetting purposes. (See the \MF\ manual.) Such an extension of \TeX\ would not be difficult; further details are left to the reader. In order to make sure that the |character| code fits in a quarterword, \TeX\ adds the quantity |min_quarterword| to the actual code. Character nodes appear only in horizontal lists, never in vertical lists. @d is_char_node(X) (X >= hi_mem_min) /*does the argument point to a |char_node|?*/ @d font type /*the font code in a |char_node|*/ @d character subtype /*the character code in a |char_node|*/ @ An |hlist_node| stands for a box that was made from a horizontal list. Each |hlist_node| is seven words long, and contains the following fields (in addition to the mandatory |type| and |link|, which we shall not mention explicitly when discussing the other node types): The |height| and |width| and |depth| are scaled integers denoting the dimensions of the box. There is also a |shift_amount| field, a scaled integer indicating how much this box should be lowered (if it appears in a horizontal list), or how much it should be moved to the right (if it appears in a vertical list). There is a |list_ptr| field, which points to the beginning of the list from which this box was fabricated; if |list_ptr| is |null|, the box is empty. Finally, there are three fields that represent the setting of the glue: |glue_set(p)| is a word of type |glue_ratio| that represents the proportionality constant for glue setting; |glue_sign(p)| is |stretching| or |shrinking| or |normal| depending on whether or not the glue should stretch or shrink or remain rigid; and |glue_order(p)| specifies the order of infinity to which glue setting applies (|normal|, |fil|, |fill|, or |filll|). The |subtype| field is not used. @d hlist_node 0 /*|type| of hlist nodes*/ @d box_node_size 7 /*number of words to allocate for a box node*/ @d width_offset 1 /*position of |width| field in a box node*/ @d depth_offset 2 /*position of |depth| field in a box node*/ @d height_offset 3 /*position of |height| field in a box node*/ @d width(X) mem[X+width_offset].sc /*width of the box, in sp*/ @d depth(X) mem[X+depth_offset].sc /*depth of the box, in sp*/ @d height(X) mem[X+height_offset].sc /*height of the box, in sp*/ @d shift_amount(X) mem[X+4].sc /*repositioning distance, in sp*/ @d list_offset 5 /*position of |list_ptr| field in a box node*/ @d list_ptr(X) link(X+list_offset) /*beginning of the list inside the box*/ @d glue_order(X) subtype(X+list_offset) /*applicable order of infinity*/ @d glue_sign(X) type(X+list_offset) /*stretching or shrinking*/ @d normal 0 /*the most common case when several cases are named*/ @d stretching 1 /*glue setting applies to the stretch components*/ @d shrinking 2 /*glue setting applies to the shrink components*/ @d glue_offset 6 /*position of |glue_set| in a box node*/ @d glue_set(X) mem[X+glue_offset].gr /*a word of type |glue_ratio| for glue setting*/ @ The |new_null_box| function returns a pointer to an |hlist_node| in which all subfields have the values corresponding to `\.{\\hbox\{\}}'. The |subtype| field is set to |min_quarterword|, since that's the desired |span_count| value if this |hlist_node| is changed to an |unset_node|. @p pointer new_null_box(void) /*creates a new box node*/ {@+pointer p; /*the new node*/ p=get_node(box_node_size);type(p)=hlist_node; subtype(p)=min_quarterword; width(p)=0;depth(p)=0;height(p)=0;shift_amount(p)=0;list_ptr(p)=null; glue_sign(p)=normal;glue_order(p)=normal;set_glue_ratio_zero(glue_set(p)); return p; } @ A |vlist_node| is like an |hlist_node| in all respects except that it contains a vertical list. @d vlist_node 1 /*|type| of vlist nodes*/ @ A |rule_node| stands for a solid black rectangle; it has |width|, |depth|, and |height| fields just as in an |hlist_node|. However, if any of these dimensions is $-2^{30}$, the actual value will be determined by running the rule up to the boundary of the innermost enclosing box. This is called a ``running dimension.'' The |width| is never running in an hlist; the |height| and |depth| are never running in a~vlist. @d rule_node 2 /*|type| of rule nodes*/ @d rule_node_size 4 /*number of words to allocate for a rule node*/ @d null_flag -010000000000 /*$-2^{30}$, signifies a missing item*/ @d is_running(X) (X==null_flag) /*tests for a running dimension*/ @ A new rule node is delivered by the |new_rule| function. It makes all the dimensions ``running,'' so you have to change the ones that are not allowed to run. @p pointer new_rule(void) {@+pointer p; /*the new node*/ p=get_node(rule_node_size);type(p)=rule_node; subtype(p)=0; /*the |subtype| is not used*/ width(p)=null_flag;depth(p)=null_flag;height(p)=null_flag; return p; } @ Insertions are represented by |ins_node| records, where the |subtype| indicates the corresponding box number. For example, `\.{\\insert 250}' leads to an |ins_node| whose |subtype| is |250+min_quarterword|. The |height| field of an |ins_node| is slightly misnamed; it actually holds the natural height plus depth of the vertical list being inserted. The |depth| field holds the |split_max_depth| to be used in case this insertion is split, and the |split_top_ptr| points to the corresponding |split_top_skip|. The |float_cost| field holds the |floating_penalty| that will be used if this insertion floats to a subsequent page after a split insertion of the same class. There is one more field, the |ins_ptr|, which points to the beginning of the vlist for the insertion. @d ins_node 3 /*|type| of insertion nodes*/ @d ins_node_size 5 /*number of words to allocate for an insertion*/ @d float_cost(X) mem[X+1].i /*the |floating_penalty| to be used*/ @d ins_ptr(X) info(X+4) /*the vertical list to be inserted*/ @d split_top_ptr(X) link(X+4) /*the |split_top_skip| to be used*/ @ A |mark_node| has a |mark_ptr| field that points to the reference count of a token list that contains the user's \.{\\mark} text. This field occupies a full word instead of a halfword, because there's nothing to put in the other halfword; it is easier in \PASCAL\ to use the full word than to risk leaving garbage in the unused half. @d mark_node 4 /*|type| of a mark node*/ @d small_node_size 2 /*number of words to allocate for most node types*/ @d mark_ptr(X) mem[X+1].i /*head of the token list for a mark*/ @ An |adjust_node|, which occurs only in horizontal lists, specifies material that will be moved out into the surrounding vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}' operation. The |adjust_ptr| field points to the vlist containing this material. @d adjust_node 5 /*|type| of an adjust node*/ @d adjust_ptr mark_ptr /*vertical list to be moved out of horizontal list*/ @ A |ligature_node|, which occurs only in horizontal lists, specifies a character that was fabricated from the interaction of two or more actual characters. The second word of the node, which is called the |lig_char| word, contains |font| and |character| fields just as in a |char_node|. The characters that generated the ligature have not been forgotten, since they are needed for diagnostic messages and for hyphenation; the |lig_ptr| field points to a linked list of character nodes for all original characters that have been deleted. (This list might be empty if the characters that generated the ligature were retained in other nodes.) The |subtype| field is 0, plus 2 and/or 1 if the original source of the ligature included implicit left and/or right boundaries. @d ligature_node 6 /*|type| of a ligature node*/ @d lig_char(X) X+1 /*the word where the ligature is to be found*/ @d lig_ptr(X) link(lig_char(X)) /*the list of characters*/ @ The |new_ligature| function creates a ligature node having given contents of the |font|, |character|, and |lig_ptr| fields. We also have a |new_lig_item| function, which returns a two-word node having a given |character| field. Such nodes are used for temporary processing as ligatures are being created. @p pointer new_ligature(quarterword @!f, quarterword @!c, pointer @!q) {@+pointer p; /*the new node*/ p=get_node(small_node_size);type(p)=ligature_node; font(lig_char(p))=f;character(lig_char(p))=c;lig_ptr(p)=q; subtype(p)=0;return p; } @# pointer new_lig_item(quarterword @!c) {@+pointer p; /*the new node*/ p=get_node(small_node_size);character(p)=c;lig_ptr(p)=null; return p; } @ A |disc_node|, which occurs only in horizontal lists, specifies a ``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text that starts at |pre_break(p)| will precede the break, the text that starts at |post_break(p)| will follow the break, and text that appears in the next |replace_count(p)| nodes will be ignored. For example, an ordinary discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with |pre_break| pointing to a |char_node| containing a hyphen, |post_break==null|, and |replace_count==0|. All three of the discretionary texts must be lists that consist entirely of character, kern, box, rule, and ligature nodes. If |pre_break(p)==null|, the |ex_hyphen_penalty| will be charged for this break. Otherwise the |hyphen_penalty| will be charged. The texts will actually be substituted into the list by the line-breaking algorithm if it decides to make the break, and the discretionary node will disappear at that time; thus, the output routine sees only discretionaries that were not chosen. @d disc_node 7 /*|type| of a discretionary node*/ @d replace_count subtype /*how many subsequent nodes to replace*/ @d pre_break llink /*text that precedes a discretionary break*/ @d post_break rlink /*text that follows a discretionary break*/ @p pointer new_disc(void) /*creates an empty |disc_node|*/ {@+pointer p; /*the new node*/ p=get_node(small_node_size);type(p)=disc_node; replace_count(p)=0;pre_break(p)=null;post_break(p)=null; return p; } @ A |whatsit_node| is a wild card reserved for extensions to \TeX. The |subtype| field in its first word says what `\\{whatsit}' it is, and implicitly determines the node size (which must be 2 or more) and the format of the remaining words. When a |whatsit_node| is encountered in a list, special actions are invoked; knowledgeable people who are careful not to mess up the rest of \TeX\ are able to make \TeX\ do new things by adding code at the end of the program. For example, there might be a `\TeX nicolor' extension to specify different colors of ink, @^extensions to \TeX@> and the whatsit node might contain the desired parameters. The present implementation of \TeX\ treats the features associated with `\.{\\write}' and `\.{\\special}' as if they were extensions, in order to illustrate how such routines might be coded. We shall defer further discussion of extensions until the end of this program. @d whatsit_node 8 /*|type| of special extension nodes*/ @ A |math_node|, which occurs only in horizontal lists, appears before and after mathematical formulas. The |subtype| field is |before| before the formula and |after| after it. There is a |width| field, which represents the amount of surrounding space inserted by \.{\\mathsurround}. @d math_node 9 /*|type| of a math node*/ @d before 0 /*|subtype| for math node that introduces a formula*/ @d after 1 /*|subtype| for math node that winds up a formula*/ @p pointer new_math(scaled @!w, small_number @!s) {@+pointer p; /*the new node*/ p=get_node(small_node_size);type(p)=math_node; subtype(p)=s;width(p)=w;return p; } @ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|, |rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|, |disc_node|, |whatsit_node|, and |math_node| are at the low end of the type codes, by permitting a break at glue in a list if and only if the |type| of the previous node is less than |math_node|. Furthermore, a node is discarded after a break if its type is |math_node| or~more. @d precedes_break(X) (type(X) < math_node) @d non_discardable(X) (type(X) < math_node) @ A |glue_node| represents glue in a list. However, it is really only a pointer to a separate glue specification, since \TeX\ makes use of the fact that many essentially identical nodes of glue are usually present. If |p| points to a |glue_node|, |glue_ptr(p)| points to another packet of words that specify the stretch and shrink components, etc. Glue nodes also serve to represent leaders; the |subtype| is used to distinguish between ordinary glue (which is called |normal|) and the three kinds of leaders (which are called |a_leaders|, |c_leaders|, and |x_leaders|). The |leader_ptr| field points to a rule node or to a box node containing the leaders; it is set to |null| in ordinary glue nodes. Many kinds of glue are computed from \TeX's ``skip'' parameters, and it is helpful to know which parameter has led to a particular glue node. Therefore the |subtype| is set to indicate the source of glue, whenever it originated as a parameter. We will be defining symbolic names for the parameter numbers later (e.g., |line_skip_code==0|, |baseline_skip_code==1|, etc.); it suffices for now to say that the |subtype| of parametric glue will be the same as the parameter number, plus~one. In math formulas there are two more possibilities for the |subtype| in a glue node: |mu_glue| denotes an \.{\\mskip} (where the units are scaled \.{mu} instead of scaled \.{pt}); and |cond_math_glue| denotes the `\.{\\nonscript}' feature that cancels the glue node immediately following if it appears in a subscript. @d glue_node 10 /*|type| of node that points to a glue specification*/ @d cond_math_glue 98 /*special |subtype| to suppress glue in the next node*/ @d mu_glue 99 /*|subtype| for math glue*/ @d a_leaders 100 /*|subtype| for aligned leaders*/ @d c_leaders 101 /*|subtype| for centered leaders*/ @d x_leaders 102 /*|subtype| for expanded leaders*/ @d glue_ptr llink /*pointer to a glue specification*/ @d leader_ptr rlink /*pointer to box or rule node for leaders*/ @ A glue specification has a halfword reference count in its first word, @^reference counts@> representing |null| plus the number of glue nodes that point to it (less one). Note that the reference count appears in the same position as the |link| field in list nodes; this is the field that is initialized to |null| when a node is allocated, and it is also the field that is flagged by |empty_flag| in empty nodes. Glue specifications also contain three |scaled| fields, for the |width|, |stretch|, and |shrink| dimensions. Finally, there are two one-byte fields called |stretch_order| and |shrink_order|; these contain the orders of infinity (|normal|, |fil|, |fill|, or |filll|) corresponding to the stretch and shrink values. @d glue_spec_size 4 /*number of words to allocate for a glue specification*/ @d glue_ref_count(X) link(X) /*reference count of a glue specification*/ @d stretch(X) mem[X+2].sc /*the stretchability of this glob of glue*/ @d shrink(X) mem[X+3].sc /*the shrinkability of this glob of glue*/ @d stretch_order type /*order of infinity for stretching*/ @d shrink_order subtype /*order of infinity for shrinking*/ @d fil 1 /*first-order infinity*/ @d fill 2 /*second-order infinity*/ @d filll 3 /*third-order infinity*/ @<Types...@>= typedef uint8_t glue_ord; /*infinity to the 0, 1, 2, or 3 power*/ @ Here is a function that returns a pointer to a copy of a glue spec. The reference count in the copy is |null|, because there is assumed to be exactly one reference to the new specification. @p pointer new_spec(pointer @!p) /*duplicates a glue specification*/ {@+pointer q; /*the new spec*/ q=get_node(glue_spec_size);@/ mem[q]=mem[p];glue_ref_count(q)=null;@/ width(q)=width(p);stretch(q)=stretch(p);shrink(q)=shrink(p); return q; } @ And here's a function that creates a glue node for a given parameter identified by its code number; for example, |new_param_glue(line_skip_code)| returns a pointer to a glue node for the current \.{\\lineskip}. @p pointer new_param_glue(small_number @!n) {@+pointer p; /*the new node*/ pointer @!q; /*the glue specification*/ p=get_node(small_node_size);type(p)=glue_node;subtype(p)=n+1; leader_ptr(p)=null;@/ q=@<Current |mem| equivalent of glue parameter number |n|@>@t@>; glue_ptr(p)=q;incr(glue_ref_count(q)); return p; } @ Glue nodes that are more or less anonymous are created by |new_glue|, whose argument points to a glue specification. @p pointer new_glue(pointer @!q) {@+pointer p; /*the new node*/ p=get_node(small_node_size);type(p)=glue_node;subtype(p)=normal; leader_ptr(p)=null;glue_ptr(p)=q;incr(glue_ref_count(q)); return p; } @ Still another subroutine is needed: This one is sort of a combination of |new_param_glue| and |new_glue|. It creates a glue node for one of the current glue parameters, but it makes a fresh copy of the glue specification, since that specification will probably be subject to change, while the parameter will stay put. The global variable |temp_ptr| is set to the address of the new spec. @p pointer new_skip_param(small_number @!n) {@+pointer p; /*the new node*/ temp_ptr=new_spec(@<Current |mem| equivalent of glue parameter...@>); p=new_glue(temp_ptr);glue_ref_count(temp_ptr)=null;subtype(p)=n+1; return p; } @ A |kern_node| has a |width| field to specify a (normally negative) amount of spacing. This spacing correction appears in horizontal lists between letters like A and V when the font designer said that it looks better to move them closer together or further apart. A kern node can also appear in a vertical list, when its `|width|' denotes additional spacing in the vertical direction. The |subtype| is either |normal| (for kerns inserted from font information or math mode calculations) or |explicit| (for kerns inserted from \.{\\kern} and \.{\\/} commands) or |acc_kern| (for kerns inserted from non-math accents) or |mu_glue| (for kerns inserted from \.{\\mkern} specifications in math formulas). @d kern_node 11 /*|type| of a kern node*/ @d explicit 1 /*|subtype| of kern nodes from \.{\\kern} and \.{\\/}*/ @d acc_kern 2 /*|subtype| of kern nodes from accents*/ @ The |new_kern| function creates a kern node having a given width. @p pointer new_kern(scaled @!w) {@+pointer p; /*the new node*/ p=get_node(small_node_size);type(p)=kern_node; subtype(p)=normal; width(p)=w; return p; } @ A |penalty_node| specifies the penalty associated with line or page breaking, in its |penalty| field. This field is a fullword integer, but the full range of integer values is not used: Any penalty | >= 10000| is treated as infinity, and no break will be allowed for such high values. Similarly, any penalty | <= -10000| is treated as negative infinity, and a break will be forced. @d penalty_node 12 /*|type| of a penalty node*/ @d inf_penalty inf_bad /*``infinite'' penalty value*/ @d eject_penalty (-inf_penalty) /*``negatively infinite'' penalty value*/ @d penalty(X) mem[X+1].i /*the added cost of breaking a list here*/ @ Anyone who has been reading the last few sections of the program will be able to guess what comes next. @p pointer new_penalty(int @!m) {@+pointer p; /*the new node*/ p=get_node(small_node_size);type(p)=penalty_node; subtype(p)=0; /*the |subtype| is not used*/ penalty(p)=m;return p; } @ You might think that we have introduced enough node types by now. Well, almost, but there is one more: An |unset_node| has nearly the same format as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign} or \.{\\valign} that are not yet in their final form, since the box dimensions are their ``natural'' sizes before any glue adjustment has been made. The |glue_set| word is not present; instead, we have a |glue_stretch| field, which contains the total stretch of order |glue_order| that is present in the hlist or vlist being boxed. Similarly, the |shift_amount| field is replaced by a |glue_shrink| field, containing the total shrink of order |glue_sign| that is present. The |subtype| field is called |span_count|; an unset box typically contains the data for |qo(span_count)+1| columns. Unset nodes will be changed to box nodes when alignment is completed. @d unset_node 13 /*|type| for an unset node*/ @d glue_stretch(X) mem[X+glue_offset].sc /*total stretch in an unset node*/ @d glue_shrink shift_amount /*total shrink in an unset node*/ @d span_count subtype /*indicates the number of spanned columns*/ @ In fact, there are still more types coming. When we get to math formula processing we will see that a |style_node| has |type==14|; and a number of larger type codes will also be defined, for use in math mode only. @ Warning: If any changes are made to these data structure layouts, such as changing any of the node sizes or even reordering the words of nodes, the |copy_node_list| procedure and the memory initialization code below may have to be changed. Such potentially dangerous parts of the program are listed in the index under `data structure assumptions'. @!@^data structure assumptions@> However, other references to the nodes are made symbolically in terms of the \.{WEB} macro definitions above, so that format changes will leave \TeX's other algorithms intact. @^system dependencies@> @* Memory layout. Some areas of |mem| are dedicated to fixed usage, since static allocation is more efficient than dynamic allocation when we can get away with it. For example, locations |mem_bot| to |mem_bot+3| are always used to store the specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The following macro definitions accomplish the static allocation by giving symbolic names to the fixed positions. Static variable-size nodes appear in locations |mem_bot| through |lo_mem_stat_max|, and static single-word nodes appear in locations |hi_mem_stat_min| through |mem_top|, inclusive. It is harmless to let |lig_trick| and |garbage| share the same location of |mem|. @d zero_glue mem_bot /*specification for \.{0pt plus 0pt minus 0pt}*/ @d fil_glue zero_glue+glue_spec_size /*\.{0pt plus 1fil minus 0pt}*/ @d fill_glue fil_glue+glue_spec_size /*\.{0pt plus 1fill minus 0pt}*/ @d ss_glue fill_glue+glue_spec_size /*\.{0pt plus 1fil minus 1fil}*/ @d fil_neg_glue ss_glue+glue_spec_size /*\.{0pt plus -1fil minus 0pt}*/ @d lo_mem_stat_max fil_neg_glue+glue_spec_size-1 /*largest statically allocated word in the variable-size |mem|*/ @# @d page_ins_head mem_top /*list of insertion data for current page*/ @d contrib_head mem_top-1 /*vlist of items not yet on current page*/ @d page_head mem_top-2 /*vlist for current page*/ @d temp_head mem_top-3 /*head of a temporary list of some kind*/ @d hold_head mem_top-4 /*head of a temporary list of another kind*/ @d adjust_head mem_top-5 /*head of adjustment list returned by |hpack|*/ @d active mem_top-7 /*head of active list in |line_break|, needs two words*/ @d align_head mem_top-8 /*head of preamble list for alignments*/ @d end_span mem_top-9 /*tail of spanned-width lists*/ @d omit_template mem_top-10 /*a constant token list*/ @d null_list mem_top-11 /*permanently empty list*/ @d lig_trick mem_top-12 /*a ligature masquerading as a |char_node|*/ @d garbage mem_top-12 /*used for scrap information*/ @d backup_head mem_top-13 /*head of token list built by |scan_keyword|*/ @d hi_mem_stat_min mem_top-13 /*smallest statically allocated word in the one-word |mem|*/ @d hi_mem_stat_usage 14 /*the number of one-word nodes always present*/ @ The following code gets |mem| off to a good start, when \TeX\ is initializing itself the slow~way. @<Local variables for init...@>= int @!k; /*index into |mem|, |eqtb|, etc.*/ @ @<Initialize table entries...@>= for (k=mem_bot+1; k<=lo_mem_stat_max; k++) mem[k].sc=0; /*all glue dimensions are zeroed*/ @^data structure assumptions@> k=mem_bot;@+while (k <= lo_mem_stat_max) /*set first words of glue specifications*/ {@+glue_ref_count(k)=null+1; stretch_order(k)=normal;shrink_order(k)=normal; k=k+glue_spec_size; } stretch(fil_glue)=unity;stretch_order(fil_glue)=fil;@/ stretch(fill_glue)=unity;stretch_order(fill_glue)=fill;@/ stretch(ss_glue)=unity;stretch_order(ss_glue)=fil;@/ shrink(ss_glue)=unity;shrink_order(ss_glue)=fil;@/ stretch(fil_neg_glue)=-unity;stretch_order(fil_neg_glue)=fil;@/ rover=lo_mem_stat_max+1; link(rover)=empty_flag; /*now initialize the dynamic memory*/ node_size(rover)=1000; /*which is a 1000-word available node*/ llink(rover)=rover;rlink(rover)=rover;@/ lo_mem_max=rover+1000;link(lo_mem_max)=null;info(lo_mem_max)=null;@/ for (k=hi_mem_stat_min; k<=mem_top; k++) mem[k]=mem[lo_mem_max]; /*clear list heads*/ @<Initialize the special list heads and constant nodes@>; avail=null;mem_end=mem_top; hi_mem_min=hi_mem_stat_min; /*initialize the one-word memory*/ var_used=lo_mem_stat_max+1-mem_bot;dyn_used=hi_mem_stat_usage; /*initialize statistics*/ @ If \TeX\ is extended improperly, the |mem| array might get screwed up. For example, some pointers might be wrong, or some ``dead'' nodes might not have been freed when the last reference to them disappeared. Procedures |check_mem| and |search_mem| are available to help diagnose such problems. These procedures make use of two arrays called |is_free| and |was_free| that are present only if \TeX's debugging routines have been included. (You may want to decrease the size of |mem| while you @^debugging@> are debugging.) @<Glob...@>= #ifdef @!DEBUG bool @!is_free0[mem_max-mem_min+1], *const @!is_free = @!is_free0-mem_min; /*free cells*/ @t\hskip10pt@>bool @!was_free0[mem_max-mem_min+1], *const @!was_free = @!was_free0-mem_min; /*previously free cells*/ @t\hskip10pt@>pointer @!was_mem_end, @!was_lo_max, @!was_hi_min; /*previous |mem_end|, |lo_mem_max|, and |hi_mem_min|*/ @t\hskip10pt@>bool @!panicking; /*do we want to check memory constantly?*/ #endif @ @<Set initial...@>= #ifdef @!DEBUG was_mem_end=mem_min; /*indicate that everything was previously free*/ was_lo_max=mem_min;was_hi_min=mem_max; panicking=false; #endif @ Procedure |check_mem| makes sure that the available space lists of |mem| are well formed, and it optionally prints out all locations that are reserved now but were free the last time this procedure was called. @p #ifdef @!DEBUG void check_mem(bool @!print_locs) {@+ /*loop exits*/ pointer p, @!q; /*current locations of interest in |mem|*/ bool @!clobbered; /*is something amiss?*/ for (p=mem_min; p<=lo_mem_max; p++) is_free[p]=false; /*you can probably do this faster*/ for (p=hi_mem_min; p<=mem_end; p++) is_free[p]=false; /*ditto*/ @<Check single-word |avail| list@>; @<Check variable-size |avail| list@>; @<Check flags of unavailable nodes@>; if (print_locs) @<Print newly busy locations@>; for (p=mem_min; p<=lo_mem_max; p++) was_free[p]=is_free[p]; for (p=hi_mem_min; p<=mem_end; p++) was_free[p]=is_free[p]; /*|was_free=is_free| might be faster*/ was_mem_end=mem_end;was_lo_max=lo_mem_max;was_hi_min=hi_mem_min; } #endif @ @<Check single-word...@>= p=avail;q=null;clobbered=false; while (p!=null) {@+if ((p > mem_end)||(p < hi_mem_min)) clobbered=true; else if (is_free[p]) clobbered=true; if (clobbered) {@+print_nl("AVAIL list clobbered at "); @.AVAIL list clobbered...@> print_int(q);goto done1; } is_free[p]=true;q=p;p=link(q); } done1: @ @<Check variable-size...@>= p=rover;q=null;clobbered=false; @/do@+{if ((p >= lo_mem_max)||(p < mem_min)) clobbered=true; else if ((rlink(p) >= lo_mem_max)||(rlink(p) < mem_min)) clobbered=true; else if (!(is_empty(p))||(node_size(p) < 2)||@| (p+node_size(p) > lo_mem_max)||@|(llink(rlink(p))!=p)) clobbered=true; if (clobbered) {@+print_nl("Double-AVAIL list clobbered at "); print_int(q);goto done2; } for (q=p; q<=p+node_size(p)-1; q++) /*mark all locations free*/ {@+if (is_free[q]) {@+print_nl("Doubly free location at "); @.Doubly free location...@> print_int(q);goto done2; } is_free[q]=true; } q=p;p=rlink(p); }@+ while (!(p==rover)); done2: @ @<Check flags...@>= p=mem_min; while (p <= lo_mem_max) /*node |p| should not be empty*/ {@+if (is_empty(p)) {@+print_nl("Bad flag at ");print_int(p); @.Bad flag...@> } while ((p <= lo_mem_max)&&!is_free[p]) incr(p); while ((p <= lo_mem_max)&&is_free[p]) incr(p); } @ @<Print newly busy...@>= {@+print_nl("New busy locs:"); for (p=mem_min; p<=lo_mem_max; p++) if (!is_free[p]&&((p > was_lo_max)||was_free[p])) {@+print_char(' ');print_int(p); } for (p=hi_mem_min; p<=mem_end; p++) if (!is_free[p]&& ((p < was_hi_min)||(p > was_mem_end)||was_free[p])) {@+print_char(' ');print_int(p); } } @ The |search_mem| procedure attempts to answer the question ``Who points to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem| that might not be of type |two_halves|. Strictly speaking, this is @^dirty \PASCAL@> undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to point to |p| purely by coincidence). But for debugging purposes, we want to rule out the places that do {\sl not\/} point to |p|, so a few false drops are tolerable. @p #ifdef @!DEBUG void search_mem(pointer @!p) /*look for pointers to |p|*/ {@+int q; /*current position being searched*/ for (q=mem_min; q<=lo_mem_max; q++) {@+if (link(q)==p) {@+print_nl("LINK(");print_int(q);print_char(')'); } if (info(q)==p) {@+print_nl("INFO(");print_int(q);print_char(')'); } } for (q=hi_mem_min; q<=mem_end; q++) {@+if (link(q)==p) {@+print_nl("LINK(");print_int(q);print_char(')'); } if (info(q)==p) {@+print_nl("INFO(");print_int(q);print_char(')'); } } @<Search |eqtb| for equivalents equal to |p|@>; @<Search |save_stack| for equivalents that point to |p|@>; @<Search |hyph_list| for pointers to |p|@>; } #endif @* Displaying boxes. We can reinforce our knowledge of the data structures just introduced by considering two procedures that display a list in symbolic form. The first of these, called |short_display|, is used in ``overfull box'' messages to give the top-level description of a list. The other one, called |show_node_list|, prints a detailed description of exactly what is in the data structure. The philosophy of |short_display| is to ignore the fine points about exactly what is inside boxes, except that ligatures and discretionary breaks are expanded. As a result, |short_display| is a recursive procedure, but the recursion is never more than one level deep. @^recursion@> A global variable |font_in_short_display| keeps track of the font code that is assumed to be present when |short_display| begins; deviations from this font will be printed. @<Glob...@>= int @!font_in_short_display; /*an internal font number*/ @ Boxes, rules, inserts, whatsits, marks, and things in general that are sort of ``complicated'' are indicated only by printing `\.{[]}'. @p void short_display(int @!p) /*prints highlights of list |p|*/ {@+int n; /*for replacement counts*/ while (p > mem_min) {@+if (is_char_node(p)) {@+if (p <= mem_end) {@+if (font(p)!=font_in_short_display) {@+if ((font(p) < font_base)||(font(p) > font_max)) print_char('*'); @.*\relax@> else@<Print the font identifier for |font(p)|@>; print_char(' ');font_in_short_display=font(p); } print_ASCII(qo(character(p))); } } else@<Print a short indication of the contents of node |p|@>; p=link(p); } } @ @<Print a short indication of the contents of node |p|@>= switch (type(p)) { case hlist_node: case vlist_node: case ins_node: case whatsit_node: case mark_node: case adjust_node: case unset_node: print_str("[]");@+break; case rule_node: print_char('|');@+break; case glue_node: if (glue_ptr(p)!=zero_glue) print_char(' ');@+break; case math_node: print_char('$');@+break; case ligature_node: short_display(lig_ptr(p));@+break; case disc_node: {@+short_display(pre_break(p)); short_display(post_break(p));@/ n=replace_count(p); while (n > 0) {@+if (link(p)!=null) p=link(p); decr(n); } } @+break; default:do_nothing; } @ The |show_node_list| routine requires some auxiliary subroutines: one to print a font-and-character combination, one to print a token list without its reference count, and one to print a rule dimension. @p void print_font_and_char(int @!p) /*prints |char_node| data*/ {@+if (p > mem_end) print_esc(@[@<|"CLOBBERED."|@>@]); else{@+if ((font(p) < font_base)||(font(p) > font_max)) print_char('*'); @.*\relax@> else@<Print the font identifier for |font(p)|@>; print_char(' ');print_ASCII(qo(character(p))); } } @# void print_mark(int @!p) /*prints token list data in braces*/ {@+print_char('{'); if ((p < hi_mem_min)||(p > mem_end)) print_esc(@[@<|"CLOBBERED."|@>@]); else show_token_list(link(p), null, max_print_line-10); print_char('}'); } @# void print_rule_dimen(scaled @!d) /*prints dimension in rule node*/ {@+if (is_running(d)) print_char('*');else print_scaled(d); @.*\relax@> } @ Then there is a subroutine that prints glue stretch and shrink, possibly followed by the name of finite units: @p void print_glue(scaled @!d, int @!order, str_number @!s) /*prints a glue component*/ {@+print_scaled(d); if ((order < normal)||(order > filll)) print_str("foul"); else if (order > normal) {@+print_str("fil"); while (order > fil) {@+print_char('l');decr(order); } } else if (s!=0) print(s); } @ The next subroutine prints a whole glue specification. @p void print_spec(int @!p, str_number @!s) /*prints a glue specification*/ {@+if ((p < mem_min)||(p >= lo_mem_max)) print_char('*'); @.*\relax@> else{@+print_scaled(width(p)); if (s!=0) print(s); if (stretch(p)!=0) {@+print_str(" plus ");print_glue(stretch(p), stretch_order(p), s); } if (shrink(p)!=0) {@+print_str(" minus ");print_glue(shrink(p), shrink_order(p), s); } } } @ We also need to declare some procedures that appear later in this documentation. @p@<Declare procedures needed for displaying the elements of mlists@>@; @<Declare the procedure called |print_skip_param|@>@; @ Since boxes can be inside of boxes, |show_node_list| is inherently recursive, @^recursion@> up to a given maximum number of levels. The history of nesting is indicated by the current string, which will be printed at the beginning of each line; the length of this string, namely |cur_length|, is the depth of nesting. Recursive calls on |show_node_list| therefore use the following pattern: @d node_list_display(X) {@+append_char('.');show_node_list(X);flush_char; } /*|str_room| need not be checked; see |show_box| below*/ @ A global variable called |depth_threshold| is used to record the maximum depth of nesting for which |show_node_list| will show information. If we have |depth_threshold==0|, for example, only the top level information will be given and no sublists will be traversed. Another global variable, called |breadth_max|, tells the maximum number of items to show at each level; |breadth_max| had better be positive, or you won't see anything. @<Glob...@>= int @!depth_threshold; /*maximum nesting depth in box displays*/ int @!breadth_max; /*maximum number of items shown at the same list level*/ @ Now we are ready for |show_node_list| itself. This procedure has been written to be ``extra robust'' in the sense that it should not crash or get into a loop even if the data structures have been messed up by bugs in the rest of the program. You can safely call its parent routine |show_box(p)| for arbitrary values of |p| when you are debugging \TeX. However, in the presence of bad data, the procedure may @^dirty \PASCAL@>@^debugging@> fetch a |memory_word| whose variant is different from the way it was stored; for example, it might try to read |mem[p].hh| when |mem[p]| contains a scaled integer, if |p| is a pointer that has been clobbered or chosen at random. @p void show_node_list(int @!p) /*prints a node list symbolically*/ {@+ int n; /*the number of items already printed at this level*/ double @!g; /*a glue ratio, as a floating point number*/ if (cur_length > depth_threshold) {@+if (p > null) print_str(" []"); /*indicate that there's been some truncation*/ return; } n=0; while (p > mem_min) {@+print_ln();print_current_string(); /*display the nesting history*/ if (p > mem_end) /*pointer out of range*/ {@+print_str("Bad link, display aborted.");return; @.Bad link...@> } incr(n);if (n > breadth_max) /*time to stop*/ {@+print_str("etc.");return; @.etc@> } @<Display node |p|@>; p=link(p); } } @ @<Display node |p|@>= if (is_char_node(p)) print_font_and_char(p); else switch (type(p)) { case hlist_node: case vlist_node: case unset_node: @<Display box |p|@>@;@+break; case rule_node: @<Display rule |p|@>@;@+break; case ins_node: @<Display insertion |p|@>@;@+break; case whatsit_node: @<Display the whatsit node |p|@>@;@+break; case glue_node: @<Display glue |p|@>@;@+break; case kern_node: @<Display kern |p|@>@;@+break; case math_node: @<Display math node |p|@>@;@+break; case ligature_node: @<Display ligature |p|@>@;@+break; case penalty_node: @<Display penalty |p|@>@;@+break; case disc_node: @<Display discretionary |p|@>@;@+break; case mark_node: @<Display mark |p|@>@;@+break; case adjust_node: @<Display adjustment |p|@>@;@+break; @t\4@>@<Cases of |show_node_list| that arise in mlists only@>@; default:print_str("Unknown node type!"); } @ @<Display box |p|@>= {@+if (type(p)==hlist_node) print_esc('h'); else if (type(p)==vlist_node) print_esc('v'); else print_esc(@[@<|"unset"|@>@]); print_str("box(");print_scaled(height(p));print_char('+'); print_scaled(depth(p));print_str(")x");print_scaled(width(p)); if (type(p)==unset_node) @<Display special fields of the unset node |p|@>@; else{@+@<Display the value of |glue_set(p)|@>; if (shift_amount(p)!=0) {@+print_str(", shifted ");print_scaled(shift_amount(p)); } } node_list_display(list_ptr(p)); /*recursive call*/ } @ @<Display special fields of the unset node |p|@>= {@+if (span_count(p)!=min_quarterword) {@+print_str(" (");print_int(qo(span_count(p))+1); print_str(" columns)"); } if (glue_stretch(p)!=0) {@+print_str(", stretch ");print_glue(glue_stretch(p), glue_order(p), 0); } if (glue_shrink(p)!=0) {@+print_str(", shrink ");print_glue(glue_shrink(p), glue_sign(p), 0); } } @ The code will have to change in this place if |glue_ratio| is a structured type instead of an ordinary |double|. Note that this routine should avoid arithmetic errors even if the |glue_set| field holds an arbitrary random value. The following code assumes that a properly formed nonzero |double| number has absolute value $2^{20}$ or more when it is regarded as an integer; this precaution was adequate to prevent floating point underflow on the author's computer. @^system dependencies@> @^dirty \PASCAL@> @<Display the value of |glue_set(p)|@>= g=float(glue_set(p)); if ((g!=float_constant(0))&&(glue_sign(p)!=normal)) {@+print_str(", glue set "); if (glue_sign(p)==shrinking) print_str("- "); if (abs(mem[p+glue_offset].i) < 04000000) print_str("?.?"); else if (abs(g) > float_constant(20000)) {@+if (g > float_constant(0)) print_char('>'); else print_str("< -"); print_glue(20000*unity, glue_order(p), 0); } else print_glue(round(unity*g), glue_order(p), 0); @^real multiplication@> } @ @<Display rule |p|@>= {@+print_esc(@[@<|"rule("|@>@]);print_rule_dimen(height(p));print_char('+'); print_rule_dimen(depth(p));print_str(")x");print_rule_dimen(width(p)); } @ @<Display insertion |p|@>= {@+print_esc(@[@<|"insert"|@>@]);print_int(qo(subtype(p))); print_str(", natural size ");print_scaled(height(p)); print_str("; split(");print_spec(split_top_ptr(p), 0); print_char(',');print_scaled(depth(p)); print_str("); float cost ");print_int(float_cost(p)); node_list_display(ins_ptr(p)); /*recursive call*/ } @ @<Display glue |p|@>= if (subtype(p) >= a_leaders) @<Display leaders |p|@>@; else{@+print_esc(@[@<|"glue"|@>@]); if (subtype(p)!=normal) {@+print_char('('); if (subtype(p) < cond_math_glue) print_skip_param(subtype(p)-1); else if (subtype(p)==cond_math_glue) print_esc(@[@<|"nonscript"|@>@]); else print_esc(@[@<|"mskip"|@>@]); print_char(')'); } if (subtype(p)!=cond_math_glue) {@+print_char(' '); if (subtype(p) < cond_math_glue) print_spec(glue_ptr(p), 0); else print_spec(glue_ptr(p),@[@<|"mu"|@>@]); } } @ @<Display leaders |p|@>= {@+print_esc(empty_string); if (subtype(p)==c_leaders) print_char('c'); else if (subtype(p)==x_leaders) print_char('x'); print_str("leaders ");print_spec(glue_ptr(p), 0); node_list_display(leader_ptr(p)); /*recursive call*/ } @ An ``explicit'' kern value is indicated implicitly by an explicit space. @<Display kern |p|@>= if (subtype(p)!=mu_glue) {@+print_esc(@[@<|"kern"|@>@]); if (subtype(p)!=normal) print_char(' '); print_scaled(width(p)); if (subtype(p)==acc_kern) print_str(" (for accent)"); @.for accent@> } else{@+print_esc(@[@<|"mkern"|@>@]);print_scaled(width(p));print_str("mu"); } @ @<Display math node |p|@>= {@+print_esc(@[@<|"math"|@>@]); if (subtype(p)==before) print_str("on"); else print_str("off"); if (width(p)!=0) {@+print_str(", surrounded ");print_scaled(width(p)); } } @ @<Display ligature |p|@>= {@+print_font_and_char(lig_char(p));print_str(" (ligature "); if (subtype(p) > 1) print_char('|'); font_in_short_display=font(lig_char(p));short_display(lig_ptr(p)); if (odd(subtype(p))) print_char('|'); print_char(')'); } @ @<Display penalty |p|@>= {@+print_esc(@[@<|"penalty "|@>@]);print_int(penalty(p)); } @ The |post_break| list of a discretionary node is indicated by a prefixed `\.{\char'174}' instead of the `\..' before the |pre_break| list. @<Display discretionary |p|@>= {@+print_esc(@[@<|"discretionary"|@>@]); if (replace_count(p) > 0) {@+print_str(" replacing ");print_int(replace_count(p)); } node_list_display(pre_break(p)); /*recursive call*/ append_char('|');show_node_list(post_break(p));flush_char; /*recursive call*/ } @ @<Display mark |p|@>= {@+print_esc(@[@<|"mark"|@>@]);print_mark(mark_ptr(p)); } @ @<Display adjustment |p|@>= {@+print_esc(@[@<|"vadjust"|@>@]);node_list_display(adjust_ptr(p)); /*recursive call*/ } @ The recursive machinery is started by calling |show_box|. @^recursion@> @p void show_box(pointer @!p) {@+@<Assign the values |depth_threshold:=show_box_depth| and |breadth_max:=show_box_breadth|@>; if (breadth_max <= 0) breadth_max=5; if (pool_ptr+depth_threshold >= pool_size) depth_threshold=pool_size-pool_ptr-1; /*now there's enough room for prefix string*/ show_node_list(p); /*the show starts at |p|*/ print_ln(); } @* Destroying boxes. When we are done with a node list, we are obliged to return it to free storage, including all of its sublists. The recursive procedure |flush_node_list| does this for us. @ First, however, we shall consider two non-recursive procedures that do simpler tasks. The first of these, |delete_token_ref|, is called when a pointer to a token list's reference count is being removed. This means that the token list should disappear if the reference count was |null|, otherwise the count should be decreased by one. @^reference counts@> @d token_ref_count(X) info(X) /*reference count preceding a token list*/ @p void delete_token_ref(pointer @!p) /*|p| points to the reference count of a token list that is losing one reference*/ {@+if (token_ref_count(p)==null) flush_list(p); else decr(token_ref_count(p)); } @ Similarly, |delete_glue_ref| is called when a pointer to a glue specification is being withdrawn. @^reference counts@> @d fast_delete_glue_ref(X) @t@>@;@/ {@+if (glue_ref_count(X)==null) free_node(X, glue_spec_size); else decr(glue_ref_count(X)); } @p void delete_glue_ref(pointer @!p) /*|p| points to a glue specification*/ fast_delete_glue_ref(p) @ Now we are ready to delete any node list, recursively. In practice, the nodes deleted are usually charnodes (about 2/3 of the time), and they are glue nodes in about half of the remaining cases. @^recursion@> @p void flush_node_list(pointer @!p) /*erase list of nodes starting at |p|*/ {@+ /*go here when node |p| has been freed*/ pointer q; /*successor to node |p|*/ while (p!=null) @^inner loop@> {@+q=link(p); if (is_char_node(p)) free_avail(p)@; else{@+switch (type(p)) { case hlist_node: case vlist_node: case unset_node: {@+flush_node_list(list_ptr(p)); free_node(p, box_node_size);goto done; } case rule_node: {@+free_node(p, rule_node_size);goto done; } case ins_node: {@+flush_node_list(ins_ptr(p)); delete_glue_ref(split_top_ptr(p)); free_node(p, ins_node_size);goto done; } case whatsit_node: @<Wipe out the whatsit node |p| and |goto done|@>@; case glue_node: {@+fast_delete_glue_ref(glue_ptr(p)); if (leader_ptr(p)!=null) flush_node_list(leader_ptr(p)); } @+break; case kern_node: case math_node: case penalty_node: do_nothing;@+break; case ligature_node: flush_node_list(lig_ptr(p));@+break; case mark_node: delete_token_ref(mark_ptr(p));@+break; case disc_node: {@+flush_node_list(pre_break(p)); flush_node_list(post_break(p)); } @+break; case adjust_node: flush_node_list(adjust_ptr(p));@+break; @t\4@>@<Cases of |flush_node_list| that arise in mlists only@>@; default:confusion(@[@<|"flushing"|@>@]); @:this can't happen flushing}{\quad flushing@> } @/ free_node(p, small_node_size); done: ;} p=q; } } @* Copying boxes. Another recursive operation that acts on boxes is sometimes needed: The procedure |copy_node_list| returns a pointer to another node list that has the same structure and meaning as the original. Note that since glue specifications and token lists have reference counts, we need not make copies of them. Reference counts can never get too large to fit in a halfword, since each pointer to a node is in a different memory address, and the total number of memory addresses fits in a halfword. @^recursion@> @^reference counts@> (Well, there actually are also references from outside |mem|; if the |save_stack| is made arbitrarily large, it would theoretically be possible to break \TeX\ by overflowing a reference count. But who would want to do that?) @d add_token_ref(X) incr(token_ref_count(X)) /*new reference to a token list*/ @d add_glue_ref(X) incr(glue_ref_count(X)) /*new reference to a glue spec*/ @ The copying procedure copies words en masse without bothering to look at their individual fields. If the node format changes---for example, if the size is altered, or if some link field is moved to another relative position---then this code may need to be changed too. @^data structure assumptions@> @p pointer copy_node_list(pointer @!p) /*makes a duplicate of the node list that starts at |p| and returns a pointer to the new list*/ {@+pointer h; /*temporary head of copied list*/ pointer @!q; /*previous position in new list*/ pointer @!r; /*current node being fabricated for new list*/ uint8_t @!words; /*number of words remaining to be copied*/ h=get_avail();q=h; while (p!=null) {@+@<Make a copy of node |p| in node |r|@>; link(q)=r;q=r;p=link(p); } link(q)=null;q=link(h);free_avail(h); return q; } @ @<Make a copy of node |p|...@>= words=1; /*this setting occurs in more branches than any other*/ if (is_char_node(p)) r=get_avail(); else@<Case statement to copy different types and set |words| to the number of initial words not yet copied@>; while (words > 0) {@+decr(words);mem[r+words]=mem[p+words]; } @ @<Case statement to copy...@>= switch (type(p)) { case hlist_node: case vlist_node: case unset_node: {@+r=get_node(box_node_size); mem[r+6]=mem[p+6];mem[r+5]=mem[p+5]; /*copy the last two words*/ list_ptr(r)=copy_node_list(list_ptr(p)); /*this affects |mem[r+5]|*/ words=5; } @+break; case rule_node: {@+r=get_node(rule_node_size);words=rule_node_size; } @+break; case ins_node: {@+r=get_node(ins_node_size);mem[r+4]=mem[p+4]; add_glue_ref(split_top_ptr(p)); ins_ptr(r)=copy_node_list(ins_ptr(p)); /*this affects |mem[r+4]|*/ words=ins_node_size-1; } @+break; case whatsit_node: @<Make a partial copy of the whatsit node |p| and make |r| point to it; set |words| to the number of initial words not yet copied@>@;@+break; case glue_node: {@+r=get_node(small_node_size);add_glue_ref(glue_ptr(p)); glue_ptr(r)=glue_ptr(p);leader_ptr(r)=copy_node_list(leader_ptr(p)); } @+break; case kern_node: case math_node: case penalty_node: {@+r=get_node(small_node_size); words=small_node_size; } @+break; case ligature_node: {@+r=get_node(small_node_size); mem[lig_char(r)]=mem[lig_char(p)]; /*copy |font| and |character|*/ lig_ptr(r)=copy_node_list(lig_ptr(p)); } @+break; case disc_node: {@+r=get_node(small_node_size); pre_break(r)=copy_node_list(pre_break(p)); post_break(r)=copy_node_list(post_break(p)); } @+break; case mark_node: {@+r=get_node(small_node_size);add_token_ref(mark_ptr(p)); words=small_node_size; } @+break; case adjust_node: {@+r=get_node(small_node_size); adjust_ptr(r)=copy_node_list(adjust_ptr(p)); } @+break; /*|words==1==small_node_size-1|*/ default:confusion(@[@<|"copying"|@>@]); @:this can't happen copying}{\quad copying@> } @* The command codes. Before we can go any further, we need to define symbolic names for the internal code numbers that represent the various commands obeyed by \TeX. These codes are somewhat arbitrary, but not completely so. For example, the command codes for character types are fixed by the language, since a user says, e.g., `\.{\\catcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter, and the command code |math_shift| is equal to~3. Some other codes have been made adjacent so that |case| statements in the program need not consider cases that are widely spaced, or so that |case| statements can be replaced by |if (| statements. At any rate, here is the list, for future reference. First come the ``catcode'' commands, several of which share their numeric codes with ordinary commands when the catcode cannot emerge from \TeX's scanning routine. @d escape 0 /*escape delimiter (called \.\\ in {\sl The \TeX book\/})*/ @:TeXbook}{\sl The \TeX book@> @d relax 0 /*do nothing ( \.{\\relax} )*/ @d left_brace 1 /*beginning of a group ( \.\{ )*/ @d right_brace 2 /*ending of a group ( \.\} )*/ @d math_shift 3 /*mathematics shift character ( \.\$ )*/ @d tab_mark 4 /*alignment delimiter ( \.\&, \.{\\span} )*/ @d car_ret 5 /*end of line ( |carriage_return|, \.{\\cr}, \.{\\crcr} )*/ @d out_param 5 /*output a macro parameter*/ @d mac_param 6 /*macro parameter symbol ( \.\# )*/ @d sup_mark 7 /*superscript ( \.{\char'136} )*/ @d sub_mark 8 /*subscript ( \.{\char'137} )*/ @d ignore 9 /*characters to ignore ( \.{\^\^@@} )*/ @d endv 9 /*end of \<v_j> list in alignment template*/ @d spacer 10 /*characters equivalent to blank space ( \.{\ } )*/ @d letter 11 /*characters regarded as letters ( \.{A..Z}, \.{a..z} )*/ @d other_char 12 /*none of the special character types*/ @d active_char 13 /*characters that invoke macros ( \.{\char`\~} )*/ @d par_end 13 /*end of paragraph ( \.{\\par} )*/ @d match 13 /*match a macro parameter*/ @d comment 14 /*characters that introduce comments ( \.\% )*/ @d end_match 14 /*end of parameters to macro*/ @d stop 14 /*end of job ( \.{\\end}, \.{\\dump} )*/ @d invalid_char 15 /*characters that shouldn't appear ( \.{\^\^?} )*/ @d delim_num 15 /*specify delimiter numerically ( \.{\\delimiter} )*/ @d max_char_code 15 /*largest catcode for individual characters*/ @ Next are the ordinary run-of-the-mill command codes. Codes that are |min_internal| or more represent internal quantities that might be expanded by `\.{\\the}'. @d char_num 16 /*character specified numerically ( \.{\\char} )*/ @d math_char_num 17 /*explicit math code ( \.{\\mathchar} )*/ @d mark 18 /*mark definition ( \.{\\mark} )*/ @d xray 19 /*peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)*/ @d make_box 20 /*make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)*/ @d hmove 21 /*horizontal motion ( \.{\\moveleft}, \.{\\moveright} )*/ @d vmove 22 /*vertical motion ( \.{\\raise}, \.{\\lower} )*/ @d un_hbox 23 /*unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )*/ @d un_vbox 24 /*unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )*/ @d remove_item 25 /*nullify last item ( \.{\\unpenalty}, \.{\\unkern}, \.{\\unskip} )*/ @d hskip 26 /*horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)*/ @d vskip 27 /*vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)*/ @d mskip 28 /*math glue ( \.{\\mskip} )*/ @d kern 29 /*fixed space ( \.{\\kern})*/ @d mkern 30 /*math kern ( \.{\\mkern} )*/ @d leader_ship 31 /*use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)*/ @d halign 32 /*horizontal table alignment ( \.{\\halign} )*/ @d valign 33 /*vertical table alignment ( \.{\\valign} )*/ @d no_align 34 /*temporary escape from alignment ( \.{\\noalign} )*/ @d vrule 35 /*vertical rule ( \.{\\vrule} )*/ @d hrule 36 /*horizontal rule ( \.{\\hrule} )*/ @d insert 37 /*vlist inserted in box ( \.{\\insert} )*/ @d vadjust 38 /*vlist inserted in enclosing paragraph ( \.{\\vadjust} )*/ @d ignore_spaces 39 /*gobble |spacer| tokens ( \.{\\ignorespaces} )*/ @d after_assignment 40 /*save till assignment is done ( \.{\\afterassignment} )*/ @d after_group 41 /*save till group is done ( \.{\\aftergroup} )*/ @d break_penalty 42 /*additional badness ( \.{\\penalty} )*/ @d start_par 43 /*begin paragraph ( \.{\\indent}, \.{\\noindent} )*/ @d ital_corr 44 /*italic correction ( \.{\\/} )*/ @d accent 45 /*attach accent in text ( \.{\\accent} )*/ @d math_accent 46 /*attach accent in math ( \.{\\mathaccent} )*/ @d discretionary 47 /*discretionary texts ( \.{\\-}, \.{\\discretionary} )*/ @d eq_no 48 /*equation number ( \.{\\eqno}, \.{\\leqno} )*/ @d left_right 49 /*variable delimiter ( \.{\\left}, \.{\\right} )*/ @d math_comp 50 /*component of formula ( \.{\\mathbin}, etc.~)*/ @d limit_switch 51 /*diddle limit conventions ( \.{\\displaylimits}, etc.~)*/ @d above 52 /*generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)*/ @d math_style 53 /*style specification ( \.{\\displaystyle}, etc.~)*/ @d math_choice 54 /*choice specification ( \.{\\mathchoice} )*/ @d non_script 55 /*conditional math glue ( \.{\\nonscript} )*/ @d vcenter 56 /*vertically center a vbox ( \.{\\vcenter} )*/ @d case_shift 57 /*force specific case ( \.{\\lowercase}, \.{\\uppercase}~)*/ @d message 58 /*send to user ( \.{\\message}, \.{\\errmessage} )*/ @d extension 59 /*extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)*/ @d in_stream 60 /*files for reading ( \.{\\openin}, \.{\\closein} )*/ @d begin_group 61 /*begin local grouping ( \.{\\begingroup} )*/ @d end_group 62 /*end local grouping ( \.{\\endgroup} )*/ @d omit 63 /*omit alignment template ( \.{\\omit} )*/ @d ex_space 64 /*explicit space ( \.{\\\ } )*/ @d no_boundary 65 /*suppress boundary ligatures ( \.{\\noboundary} )*/ @d radical 66 /*square root and similar signs ( \.{\\radical} )*/ @d end_cs_name 67 /*end control sequence ( \.{\\endcsname} )*/ @d min_internal 68 /*the smallest code that can follow \.{\\the}*/ @d char_given 68 /*character code defined by \.{\\chardef}*/ @d math_given 69 /*math code defined by \.{\\mathchardef}*/ @d last_item 70 /*most recent item ( \.{\\lastpenalty}, \.{\\lastkern}, \.{\\lastskip} )*/ @d max_non_prefixed_command 70 /*largest command code that can't be \.{\\global}*/ @ The next codes are special; they all relate to mode-independent assignment of values to \TeX's internal registers or tables. Codes that are |max_internal| or less represent internal quantities that might be expanded by `\.{\\the}'. @d toks_register 71 /*token list register ( \.{\\toks} )*/ @d assign_toks 72 /*special token list ( \.{\\output}, \.{\\everypar}, etc.~)*/ @d assign_int 73 /*user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)*/ @d assign_dimen 74 /*user-defined length ( \.{\\hsize}, etc.~)*/ @d assign_glue 75 /*user-defined glue ( \.{\\baselineskip}, etc.~)*/ @d assign_mu_glue 76 /*user-defined muglue ( \.{\\thinmuskip}, etc.~)*/ @d assign_font_dimen 77 /*user-defined font dimension ( \.{\\fontdimen} )*/ @d assign_font_int 78 /*user-defined font integer ( \.{\\hyphenchar}, \.{\\skewchar} )*/ @d set_aux 79 /*specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )*/ @d set_prev_graf 80 /*specify state info ( \.{\\prevgraf} )*/ @d set_page_dimen 81 /*specify state info ( \.{\\pagegoal}, etc.~)*/ @d set_page_int 82 /*specify state info ( \.{\\deadcycles}, \.{\\insertpenalties} )*/ @d set_box_dimen 83 /*change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )*/ @d set_shape 84 /*specify fancy paragraph shape ( \.{\\parshape} )*/ @d def_code 85 /*define a character code ( \.{\\catcode}, etc.~)*/ @d def_family 86 /*declare math fonts ( \.{\\textfont}, etc.~)*/ @d set_font 87 /*set current font ( font identifiers )*/ @d def_font 88 /*define a font file ( \.{\\font} )*/ @d register 89 /*internal register ( \.{\\count}, \.{\\dimen}, etc.~)*/ @d max_internal 89 /*the largest code that can follow \.{\\the}*/ @d advance 90 /*advance a register or parameter ( \.{\\advance} )*/ @d multiply 91 /*multiply a register or parameter ( \.{\\multiply} )*/ @d divide 92 /*divide a register or parameter ( \.{\\divide} )*/ @d prefix 93 /*qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )*/ @d let 94 /*assign a command code ( \.{\\let}, \.{\\futurelet} )*/ @d shorthand_def 95 /*code definition ( \.{\\chardef}, \.{\\countdef}, etc.~)*/ @d read_to_cs 96 /*read into a control sequence ( \.{\\read} )*/ @d def 97 /*macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )*/ @d set_box 98 /*set a box ( \.{\\setbox} )*/ @d hyph_data 99 /*hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )*/ @d set_interaction 100 /*define level of interaction ( \.{\\batchmode}, etc.~)*/ @d max_command 100 /*the largest command code seen at |big_switch|*/ @ The remaining command codes are extra special, since they cannot get through \TeX's scanner to the main control routine. They have been given values higher than |max_command| so that their special nature is easily discernible. The ``expandable'' commands come first. @d undefined_cs (max_command+1) /*initial state of most |eq_type| fields*/ @d expand_after (max_command+2) /*special expansion ( \.{\\expandafter} )*/ @d no_expand (max_command+3) /*special nonexpansion ( \.{\\noexpand} )*/ @d input (max_command+4) /*input a source file ( \.{\\input}, \.{\\endinput} )*/ @d if_test (max_command+5) /*conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)*/ @d fi_or_else (max_command+6) /*delimiters for conditionals ( \.{\\else}, etc.~)*/ @d cs_name (max_command+7) /*make a control sequence from tokens ( \.{\\csname} )*/ @d convert (max_command+8) /*convert to text ( \.{\\number}, \.{\\string}, etc.~)*/ @d the (max_command+9) /*expand an internal quantity ( \.{\\the} )*/ @d top_bot_mark (max_command+10) /*inserted mark ( \.{\\topmark}, etc.~)*/ @d call (max_command+11) /*non-long, non-outer control sequence*/ @d long_call (max_command+12) /*long, non-outer control sequence*/ @d outer_call (max_command+13) /*non-long, outer control sequence*/ @d long_outer_call (max_command+14) /*long, outer control sequence*/ @d end_template (max_command+15) /*end of an alignment template*/ @d dont_expand (max_command+16) /*the following token was marked by \.{\\noexpand}*/ @d glue_ref (max_command+17) /*the equivalent points to a glue specification*/ @d shape_ref (max_command+18) /*the equivalent points to a parshape specification*/ @d box_ref (max_command+19) /*the equivalent points to a box node, or is |null|*/ @d data (max_command+20) /*the equivalent is simply a halfword number*/ @* The semantic nest. \TeX\ is typically in the midst of building many lists at once. For example, when a math formula is being processed, \TeX\ is in math mode and working on an mlist; this formula has temporarily interrupted \TeX\ from being in horizontal mode and building the hlist of a paragraph; and this paragraph has temporarily interrupted \TeX\ from being in vertical mode and building the vlist for the next page of a document. Similarly, when a \.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily interrupted from working in restricted horizontal mode, and it enters internal vertical mode. The ``semantic nest'' is a stack that keeps track of what lists and modes are currently suspended. At each level of processing we are in one of six modes: \yskip\hang|vmode| stands for vertical mode (the page builder); \hang|hmode| stands for horizontal mode (the paragraph builder); \hang|mmode| stands for displayed formula mode; \hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox}); \hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox}); \hang|-mmode| stands for math formula mode (not displayed). \yskip\noindent The mode is temporarily set to zero while processing \.{\\write} texts in the |ship_out| routine. Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that \TeX's ``big semantic switch'' can select the appropriate thing to do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current mode and |cur_cmd| is the current command code. @d vmode 1 /*vertical mode*/ @d hmode (vmode+max_command+1) /*horizontal mode*/ @d mmode (hmode+max_command+1) /*math mode*/ @p void print_mode(int @!m) /*prints the mode represented by |m|*/ {@+if (m > 0) switch (m/(max_command+1)) { case 0: print_str("vertical");@+break; case 1: print_str("horizontal");@+break; case 2: print_str("display math"); } else if (m==0) print_str("no"); else switch ((-m)/(max_command+1)) { case 0: print_str("internal vertical");@+break; case 1: print_str("restricted horizontal");@+break; case 2: print_str("math"); } print_str(" mode"); } @ The state of affairs at any semantic level can be represented by five values: \yskip\hang|mode| is the number representing the semantic mode, as just explained. \yskip\hang|head| is a |pointer| to a list head for the list being built; |link(head)| therefore points to the first element of the list, or to |null| if the list is empty. \yskip\hang|tail| is a |pointer| to the final node of the list being built; thus, |tail==head| if and only if the list is empty. \yskip\hang|prev_graf| is the number of lines of the current paragraph that have already been put into the present vertical list. \yskip\hang|aux| is an auxiliary |memory_word| that gives further information that is needed to characterize the situation. \yskip\noindent In vertical mode, |aux| is also known as |prev_depth|; it is the scaled value representing the depth of the previous box, for use in baseline calculations, or it is | <= -1000|pt if the next box on the vertical list is to be exempt from baseline calculations. In horizontal mode, |aux| is also known as |space_factor| and |clang|; it holds the current space factor used in spacing calculations, and the current language used for hyphenation. (The value of |clang| is undefined in restricted horizontal mode.) In math mode, |aux| is also known as |incompleat_noad|; if not |null|, it points to a record that represents the numerator of a generalized fraction for which the denominator is currently being formed in the current list. There is also a sixth quantity, |mode_line|, which correlates the semantic nest with the user's input; |mode_line| contains the source line number at which the current level of nesting was entered. The negative of this line number is the |mode_line| at the level of the user's output routine. In horizontal mode, the |prev_graf| field is used for initial language data. The semantic nest is an array called |nest| that holds the |mode|, |head|, |tail|, |prev_graf|, |aux|, and |mode_line| values for all semantic levels below the currently active one. Information about the currently active level is kept in the global quantities |mode|, |head|, |tail|, |prev_graf|, |aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to be pushed onto |nest| if necessary. @d ignore_depth -65536000 /*|prev_depth| value that is ignored*/ @<Types...@>= typedef struct { int16_t @!mode_field;@+ pointer @!head_field, @!tail_field; int @!pg_field, @!ml_field;@+ memory_word @!aux_field; } list_state_record; @ @d mode cur_list.mode_field /*current mode*/ @d head cur_list.head_field /*header node of current list*/ @d tail cur_list.tail_field /*final node on current list*/ @d prev_graf cur_list.pg_field /*number of paragraph lines accumulated*/ @d aux cur_list.aux_field /*auxiliary data about the current list*/ @d prev_depth aux.sc /*the name of |aux| in vertical mode*/ @d space_factor aux.hh.lh /*part of |aux| in horizontal mode*/ @d clang aux.hh.rh /*the other part of |aux| in horizontal mode*/ @d incompleat_noad aux.i /*the name of |aux| in math mode*/ @d mode_line cur_list.ml_field /*source file line number at beginning of list*/ @<Glob...@>= list_state_record @!nest[nest_size+1]; uint8_t @!nest_ptr; /*first unused location of |nest|*/ uint8_t @!max_nest_stack; /*maximum of |nest_ptr| when pushing*/ list_state_record @!cur_list; /*the ``top'' semantic state*/ int16_t @!shown_mode; /*most recent mode shown by \.{\\tracingcommands}*/ @ Here is a common way to make the current list grow: @d tail_append(X) {@+link(tail)=X;tail=link(tail); } @ We will see later that the vertical list at the bottom semantic level is split into two parts; the ``current page'' runs from |page_head| to |page_tail|, and the ``contribution list'' runs from |contrib_head| to |tail| of semantic level zero. The idea is that contributions are first formed in vertical mode, then ``contributed'' to the current page (during which time the page-breaking decisions are made). For now, we don't need to know any more details about the page-building process. @<Set init...@>= nest_ptr=0;max_nest_stack=0; mode=vmode;head=contrib_head;tail=contrib_head; prev_depth=ignore_depth;mode_line=0; prev_graf=0;shown_mode=0; @<Start a new current page@>; @ When \TeX's work on one level is interrupted, the state is saved by calling |push_nest|. This routine changes |head| and |tail| so that a new (empty) list is begun; it does not change |mode| or |aux|. @p void push_nest(void) /*enter a new semantic level, save the old*/ {@+if (nest_ptr > max_nest_stack) {@+max_nest_stack=nest_ptr; if (nest_ptr==nest_size) overflow("semantic nest size", nest_size); @:TeX capacity exceeded semantic nest size}{\quad semantic nest size@> } nest[nest_ptr]=cur_list; /*stack the record*/ incr(nest_ptr);head=get_avail();tail=head;prev_graf=0;mode_line=line; } @ Conversely, when \TeX\ is finished on the current level, the former state is restored by calling |pop_nest|. This routine will never be called at the lowest semantic level, nor will it be called unless |head| is a node that should be returned to free memory. @p void pop_nest(void) /*leave a semantic level, re-enter the old*/ {@+free_avail(head);decr(nest_ptr);cur_list=nest[nest_ptr]; } @ Here is a procedure that displays what \TeX\ is working on, at all levels. @p void print_totals(void); void show_activities(void) {@+int p; /*index into |nest|*/ int16_t @!m; /*mode*/ memory_word @!a; /*auxiliary*/ pointer @!q, @!r; /*for showing the current page*/ int @!t; /*ditto*/ nest[nest_ptr]=cur_list; /*put the top level into the array*/ print_nl("");print_ln(); for (p=nest_ptr; p>=0; p--) {@+m=nest[p].mode_field;a=nest[p].aux_field; print_nl("### ");print_mode(m); print_str(" entered at line ");print_int(abs(nest[p].ml_field)); if (m==hmode) if (nest[p].pg_field!=040600000) {@+print_str(" (language");print_int(nest[p].pg_field%0200000); print_str(":hyphenmin");print_int(nest[p].pg_field/020000000); print_char(',');print_int((nest[p].pg_field/0200000)%0100); print_char(')'); } if (nest[p].ml_field < 0) print_str(" (\\output routine)"); if (p==0) {@+@<Show the status of the current page@>; if (link(contrib_head)!=null) print_nl("### recent contributions:"); } show_box(link(nest[p].head_field)); @<Show the auxiliary field, |a|@>; } } @ @<Show the auxiliary...@>= switch (abs(m)/(max_command+1)) { case 0: {@+print_nl("prevdepth "); if (a.sc <= ignore_depth) print_str("ignored"); else print_scaled(a.sc); if (nest[p].pg_field!=0) {@+print_str(", prevgraf "); print_int(nest[p].pg_field);print_str(" line"); if (nest[p].pg_field!=1) print_char('s'); } } @+break; case 1: {@+print_nl("spacefactor ");print_int(a.hh.lh); if (m > 0) @+if (a.hh.rh > 0) {@+print_str(", current language ");print_int(a.hh.rh);@+ } } @+break; case 2: if (a.i!=null) {@+print_str("this will be denominator of:");show_box(a.i);@+ } } /*there are no other cases*/ @* The table of equivalents. Now that we have studied the data structures for \TeX's semantic routines, we ought to consider the data structures used by its syntactic routines. In other words, our next concern will be the tables that \TeX\ looks at when it is scanning what the user has written. The biggest and most important such table is called |eqtb|. It holds the current ``equivalents'' of things; i.e., it explains what things mean or what their current values are, for all quantities that are subject to the nesting structure provided by \TeX's grouping mechanism. There are six parts to |eqtb|: \yskip\hangg 1) |eqtb[active_base dotdot(hash_base-1)]| holds the current equivalents of single-character control sequences. \yskip\hangg 2) |eqtb[hash_base dotdot(glue_base-1)]| holds the current equivalents of multiletter control sequences. \yskip\hangg 3) |eqtb[glue_base dotdot(local_base-1)]| holds the current equivalents of glue parameters like the current baselineskip. \yskip\hangg 4) |eqtb[local_base dotdot(int_base-1)]| holds the current equivalents of local halfword quantities like the current box registers, the current ``catcodes,'' the current font, and a pointer to the current paragraph shape. \yskip\hangg 5) |eqtb[int_base dotdot(dimen_base-1)]| holds the current equivalents of fullword integer parameters like the current hyphenation penalty. \yskip\hangg 6) |eqtb[dimen_base dotdot eqtb_size]| holds the current equivalents of fullword dimension parameters like the current hsize or amount of hanging indentation. \yskip\noindent Note that, for example, the current amount of baselineskip glue is determined by the setting of a particular location in region~3 of |eqtb|, while the current meaning of the control sequence `\.{\\baselineskip}' (which might have been changed by \.{\\def} or \.{\\let}) appears in region~2. @ Each entry in |eqtb| is a |memory_word|. Most of these words are of type |two_halves|, and subdivided into three fields: \yskip\hangg 1) The |eq_level| (a quarterword) is the level of grouping at which this equivalent was defined. If the level is |level_zero|, the equivalent has never been defined; |level_one| refers to the outer level (outside of all groups), and this level is also used for global definitions that never go away. Higher levels are for equivalents that will disappear at the end of their group. @^global definitions@> \yskip\hangg 2) The |eq_type| (another quarterword) specifies what kind of entry this is. There are many types, since each \TeX\ primitive like \.{\\hbox}, \.{\\def}, etc., has its own special code. The list of command codes above includes all possible settings of the |eq_type| field. \yskip\hangg 3) The |equiv| (a halfword) is the current equivalent value. This may be a font number, a pointer into |mem|, or a variety of other things. @d eq_level_field(X) X.hh.b1 @d eq_type_field(X) X.hh.b0 @d equiv_field(X) X.hh.rh @d eq_level(X) eq_level_field(eqtb[X]) /*level of definition*/ @d eq_type(X) eq_type_field(eqtb[X]) /*command code for equivalent*/ @d equiv(X) equiv_field(eqtb[X]) /*equivalent value*/ @d level_zero min_quarterword /*level for undefined quantities*/ @d level_one (level_zero+1) /*outermost level for defined quantities*/ @ Many locations in |eqtb| have symbolic names. The purpose of the next paragraphs is to define these names, and to set up the initial values of the equivalents. In the first region we have 256 equivalents for ``active characters'' that act as control sequences, followed by 256 equivalents for single-character control sequences. Then comes region~2, which corresponds to the hash table that we will define later. The maximum address in this region is used for a dummy control sequence that is perpetually undefined. There also are several locations for control sequences that are perpetually defined (since they are used in error recovery). @d active_base 1 /*beginning of region 1, for active character equivalents*/ @d single_base (active_base+256) /*equivalents of one-character control sequences*/ @d null_cs (single_base+256) /*equivalent of \.{\\csname\\endcsname}*/ @d hash_base (null_cs+1) /*beginning of region 2, for the hash table*/ @d frozen_control_sequence (hash_base+hash_size) /*for error recovery*/ @d frozen_protection frozen_control_sequence /*inaccessible but definable*/ @d frozen_cr (frozen_control_sequence+1) /*permanent `\.{\\cr}'*/ @d frozen_end_group (frozen_control_sequence+2) /*permanent `\.{\\endgroup}'*/ @d frozen_right (frozen_control_sequence+3) /*permanent `\.{\\right}'*/ @d frozen_fi (frozen_control_sequence+4) /*permanent `\.{\\fi}'*/ @d frozen_end_template (frozen_control_sequence+5) /*permanent `\.{\\endtemplate}'*/ @d frozen_endv (frozen_control_sequence+6) /*second permanent `\.{\\endtemplate}'*/ @d frozen_relax (frozen_control_sequence+7) /*permanent `\.{\\relax}'*/ @d end_write (frozen_control_sequence+8) /*permanent `\.{\\endwrite}'*/ @d frozen_dont_expand (frozen_control_sequence+9) /*permanent `\.{\\notexpanded:}'*/ @d frozen_null_font (frozen_control_sequence+10) /*permanent `\.{\\nullfont}'*/ @d font_id_base (frozen_null_font-font_base) /*begins table of 257 permanent font identifiers*/ @d undefined_control_sequence (frozen_null_font+257) /*dummy location*/ @d glue_base (undefined_control_sequence+1) /*beginning of region 3*/ @<Initialize table entries...@>= eq_type(undefined_control_sequence)=undefined_cs; equiv(undefined_control_sequence)=null; eq_level(undefined_control_sequence)=level_zero; for (k=active_base; k<=undefined_control_sequence-1; k++) eqtb[k]=eqtb[undefined_control_sequence]; @ Here is a routine that displays the current meaning of an |eqtb| entry in region 1 or~2. (Similar routines for the other regions will appear below.) @<Show equivalent |n|, in region 1 or 2@>= {@+sprint_cs(n);print_char('=');print_cmd_chr(eq_type(n), equiv(n)); if (eq_type(n) >= call) {@+print_char(':');show_token_list(link(equiv(n)), null, 32); } } @ Region 3 of |eqtb| contains the 256 \.{\\skip} registers, as well as the glue parameters defined here. It is important that the ``muskip'' parameters have larger numbers than the others. @d line_skip_code 0 /*interline glue if |baseline_skip| is infeasible*/ @d baseline_skip_code 1 /*desired glue between baselines*/ @d par_skip_code 2 /*extra glue just above a paragraph*/ @d above_display_skip_code 3 /*extra glue just above displayed math*/ @d below_display_skip_code 4 /*extra glue just below displayed math*/ @d above_display_short_skip_code 5 /*glue above displayed math following short lines*/ @d below_display_short_skip_code 6 /*glue below displayed math following short lines*/ @d left_skip_code 7 /*glue at left of justified lines*/ @d right_skip_code 8 /*glue at right of justified lines*/ @d top_skip_code 9 /*glue at top of main pages*/ @d split_top_skip_code 10 /*glue at top of split pages*/ @d tab_skip_code 11 /*glue between aligned entries*/ @d space_skip_code 12 /*glue between words (if not |zero_glue|)*/ @d xspace_skip_code 13 /*glue after sentences (if not |zero_glue|)*/ @d par_fill_skip_code 14 /*glue on last line of paragraph*/ @d thin_mu_skip_code 15 /*thin space in math formula*/ @d med_mu_skip_code 16 /*medium space in math formula*/ @d thick_mu_skip_code 17 /*thick space in math formula*/ @d glue_pars 18 /*total number of glue parameters*/ @d skip_base (glue_base+glue_pars) /*table of 256 ``skip'' registers*/ @d mu_skip_base (skip_base+256) /*table of 256 ``muskip'' registers*/ @d local_base (mu_skip_base+256) /*beginning of region 4*/ @# @d skip(X) equiv(skip_base+X) /*|mem| location of glue specification*/ @d mu_skip(X) equiv(mu_skip_base+X) /*|mem| location of math glue spec*/ @d glue_par(X) equiv(glue_base+X) /*|mem| location of glue specification*/ @d line_skip glue_par(line_skip_code) @d baseline_skip glue_par(baseline_skip_code) @d par_skip glue_par(par_skip_code) @d above_display_skip glue_par(above_display_skip_code) @d below_display_skip glue_par(below_display_skip_code) @d above_display_short_skip glue_par(above_display_short_skip_code) @d below_display_short_skip glue_par(below_display_short_skip_code) @d left_skip glue_par(left_skip_code) @d right_skip glue_par(right_skip_code) @d top_skip glue_par(top_skip_code) @d split_top_skip glue_par(split_top_skip_code) @d tab_skip glue_par(tab_skip_code) @d space_skip glue_par(space_skip_code) @d xspace_skip glue_par(xspace_skip_code) @d par_fill_skip glue_par(par_fill_skip_code) @d thin_mu_skip glue_par(thin_mu_skip_code) @d med_mu_skip glue_par(med_mu_skip_code) @d thick_mu_skip glue_par(thick_mu_skip_code) @<Current |mem| equivalent of glue parameter number |n|@>=glue_par(n) @ Sometimes we need to convert \TeX's internal code numbers into symbolic form. The |print_skip_param| routine gives the symbolic name of a glue parameter. @<Declare the procedure called |print_skip_param|@>= void print_skip_param(int @!n) {@+switch (n) { case line_skip_code: print_esc(@[@<|"lineskip"|@>@]);@+break; case baseline_skip_code: print_esc(@[@<|"baselineskip"|@>@]);@+break; case par_skip_code: print_esc(@[@<|"parskip"|@>@]);@+break; case above_display_skip_code: print_esc(@[@<|"abovedisplayskip"|@>@]);@+break; case below_display_skip_code: print_esc(@[@<|"belowdisplayskip"|@>@]);@+break; case above_display_short_skip_code: print_esc(@[@<|"abovedisplayshortskip"|@>@]);@+break; case below_display_short_skip_code: print_esc(@[@<|"belowdisplayshortskip"|@>@]);@+break; case left_skip_code: print_esc(@[@<|"leftskip"|@>@]);@+break; case right_skip_code: print_esc(@[@<|"rightskip"|@>@]);@+break; case top_skip_code: print_esc(@[@<|"topskip"|@>@]);@+break; case split_top_skip_code: print_esc(@[@<|"splittopskip"|@>@]);@+break; case tab_skip_code: print_esc(@[@<|"tabskip"|@>@]);@+break; case space_skip_code: print_esc(@[@<|"spaceskip"|@>@]);@+break; case xspace_skip_code: print_esc(@[@<|"xspaceskip"|@>@]);@+break; case par_fill_skip_code: print_esc(@[@<|"parfillskip"|@>@]);@+break; case thin_mu_skip_code: print_esc(@[@<|"thinmuskip"|@>@]);@+break; case med_mu_skip_code: print_esc(@[@<|"medmuskip"|@>@]);@+break; case thick_mu_skip_code: print_esc(@[@<|"thickmuskip"|@>@]);@+break; default:print_str("[unknown glue parameter!]"); } } @ The symbolic names for glue parameters are put into \TeX's hash table by using the routine called |primitive|, defined below. Let us enter them now, so that we don't have to list all those parameter names anywhere else. @<Put each of \TeX's primitives into the hash table@>= primitive(@[@<|"lineskip"|@>@], assign_glue, glue_base+line_skip_code);@/ @!@:line_skip_}{\.{\\lineskip} primitive@> primitive(@[@<|"baselineskip"|@>@], assign_glue, glue_base+baseline_skip_code);@/ @!@:baseline_skip_}{\.{\\baselineskip} primitive@> primitive(@[@<|"parskip"|@>@], assign_glue, glue_base+par_skip_code);@/ @!@:par_skip_}{\.{\\parskip} primitive@> primitive(@[@<|"abovedisplayskip"|@>@], assign_glue, glue_base+above_display_skip_code);@/ @!@:above_display_skip_}{\.{\\abovedisplayskip} primitive@> primitive(@[@<|"belowdisplayskip"|@>@], assign_glue, glue_base+below_display_skip_code);@/ @!@:below_display_skip_}{\.{\\belowdisplayskip} primitive@> primitive(@[@<|"abovedisplayshortskip"|@>@], assign_glue, glue_base+above_display_short_skip_code);@/ @!@:above_display_short_skip_}{\.{\\abovedisplayshortskip} primitive@> primitive(@[@<|"belowdisplayshortskip"|@>@], assign_glue, glue_base+below_display_short_skip_code);@/ @!@:below_display_short_skip_}{\.{\\belowdisplayshortskip} primitive@> primitive(@[@<|"leftskip"|@>@], assign_glue, glue_base+left_skip_code);@/ @!@:left_skip_}{\.{\\leftskip} primitive@> primitive(@[@<|"rightskip"|@>@], assign_glue, glue_base+right_skip_code);@/ @!@:right_skip_}{\.{\\rightskip} primitive@> primitive(@[@<|"topskip"|@>@], assign_glue, glue_base+top_skip_code);@/ @!@:top_skip_}{\.{\\topskip} primitive@> primitive(@[@<|"splittopskip"|@>@], assign_glue, glue_base+split_top_skip_code);@/ @!@:split_top_skip_}{\.{\\splittopskip} primitive@> primitive(@[@<|"tabskip"|@>@], assign_glue, glue_base+tab_skip_code);@/ @!@:tab_skip_}{\.{\\tabskip} primitive@> primitive(@[@<|"spaceskip"|@>@], assign_glue, glue_base+space_skip_code);@/ @!@:space_skip_}{\.{\\spaceskip} primitive@> primitive(@[@<|"xspaceskip"|@>@], assign_glue, glue_base+xspace_skip_code);@/ @!@:xspace_skip_}{\.{\\xspaceskip} primitive@> primitive(@[@<|"parfillskip"|@>@], assign_glue, glue_base+par_fill_skip_code);@/ @!@:par_fill_skip_}{\.{\\parfillskip} primitive@> primitive(@[@<|"thinmuskip"|@>@], assign_mu_glue, glue_base+thin_mu_skip_code);@/ @!@:thin_mu_skip_}{\.{\\thinmuskip} primitive@> primitive(@[@<|"medmuskip"|@>@], assign_mu_glue, glue_base+med_mu_skip_code);@/ @!@:med_mu_skip_}{\.{\\medmuskip} primitive@> primitive(@[@<|"thickmuskip"|@>@], assign_mu_glue, glue_base+thick_mu_skip_code);@/ @!@:thick_mu_skip_}{\.{\\thickmuskip} primitive@> @ @<Cases of |print_cmd_chr| for symbolic printing of primitives@>= case assign_glue: case assign_mu_glue: if (chr_code < skip_base) print_skip_param(chr_code-glue_base); else if (chr_code < mu_skip_base) {@+print_esc(@[@<|"skip"|@>@]);print_int(chr_code-skip_base); } else{@+print_esc(@[@<|"muskip"|@>@]);print_int(chr_code-mu_skip_base); } @+break; @ All glue parameters and registers are initially `\.{0pt plus0pt minus0pt}'. @<Initialize table entries...@>= equiv(glue_base)=zero_glue;eq_level(glue_base)=level_one; eq_type(glue_base)=glue_ref; for (k=glue_base+1; k<=local_base-1; k++) eqtb[k]=eqtb[glue_base]; glue_ref_count(zero_glue)=glue_ref_count(zero_glue)+local_base-glue_base; @ @<Show equivalent |n|, in region 3@>= if (n < skip_base) {@+print_skip_param(n-glue_base);print_char('='); if (n < glue_base+thin_mu_skip_code) print_spec(equiv(n),@[@<|"pt"|@>@]); else print_spec(equiv(n),@[@<|"mu"|@>@]); } else if (n < mu_skip_base) {@+print_esc(@[@<|"skip"|@>@]);print_int(n-skip_base);print_char('='); print_spec(equiv(n),@[@<|"pt"|@>@]); } else{@+print_esc(@[@<|"muskip"|@>@]);print_int(n-mu_skip_base);print_char('='); print_spec(equiv(n),@[@<|"mu"|@>@]); } @ Region 4 of |eqtb| contains the local quantities defined here. The bulk of this region is taken up by five tables that are indexed by eight-bit characters; these tables are important to both the syntactic and semantic portions of \TeX. There are also a bunch of special things like font and token parameters, as well as the tables of \.{\\toks} and \.{\\box} registers. @d par_shape_loc local_base /*specifies paragraph shape*/ @d output_routine_loc (local_base+1) /*points to token list for \.{\\output}*/ @d every_par_loc (local_base+2) /*points to token list for \.{\\everypar}*/ @d every_math_loc (local_base+3) /*points to token list for \.{\\everymath}*/ @d every_display_loc (local_base+4) /*points to token list for \.{\\everydisplay}*/ @d every_hbox_loc (local_base+5) /*points to token list for \.{\\everyhbox}*/ @d every_vbox_loc (local_base+6) /*points to token list for \.{\\everyvbox}*/ @d every_job_loc (local_base+7) /*points to token list for \.{\\everyjob}*/ @d every_cr_loc (local_base+8) /*points to token list for \.{\\everycr}*/ @d err_help_loc (local_base+9) /*points to token list for \.{\\errhelp}*/ @d toks_base (local_base+10) /*table of 256 token list registers*/ @d box_base (toks_base+256) /*table of 256 box registers*/ @d cur_font_loc (box_base+256) /*internal font number outside math mode*/ @d math_font_base (cur_font_loc+1) /*table of 48 math font numbers*/ @d cat_code_base (math_font_base+48) /*table of 256 command codes (the ``catcodes'')*/ @d lc_code_base (cat_code_base+256) /*table of 256 lowercase mappings*/ @d uc_code_base (lc_code_base+256) /*table of 256 uppercase mappings*/ @d sf_code_base (uc_code_base+256) /*table of 256 spacefactor mappings*/ @d math_code_base (sf_code_base+256) /*table of 256 math mode mappings*/ @d int_base (math_code_base+256) /*beginning of region 5*/ @# @d par_shape_ptr equiv(par_shape_loc) @d output_routine equiv(output_routine_loc) @d every_par equiv(every_par_loc) @d every_math equiv(every_math_loc) @d every_display equiv(every_display_loc) @d every_hbox equiv(every_hbox_loc) @d every_vbox equiv(every_vbox_loc) @d every_job equiv(every_job_loc) @d every_cr equiv(every_cr_loc) @d err_help equiv(err_help_loc) @d toks(X) equiv(toks_base+X) @d box(X) equiv(box_base+X) @d cur_font equiv(cur_font_loc) @d fam_fnt(X) equiv(math_font_base+X) @d cat_code(X) equiv(cat_code_base+X) @d lc_code(X) equiv(lc_code_base+X) @d uc_code(X) equiv(uc_code_base+X) @d sf_code(X) equiv(sf_code_base+X) @d math_code(X) equiv(math_code_base+X) /*Note: |math_code(c)| is the true math code plus |min_halfword|*/ @<Put each...@>= primitive(@[@<|"output"|@>@], assign_toks, output_routine_loc); @!@:output_}{\.{\\output} primitive@> primitive(@[@<|"everypar"|@>@], assign_toks, every_par_loc); @!@:every_par_}{\.{\\everypar} primitive@> primitive(@[@<|"everymath"|@>@], assign_toks, every_math_loc); @!@:every_math_}{\.{\\everymath} primitive@> primitive(@[@<|"everydisplay"|@>@], assign_toks, every_display_loc); @!@:every_display_}{\.{\\everydisplay} primitive@> primitive(@[@<|"everyhbox"|@>@], assign_toks, every_hbox_loc); @!@:every_hbox_}{\.{\\everyhbox} primitive@> primitive(@[@<|"everyvbox"|@>@], assign_toks, every_vbox_loc); @!@:every_vbox_}{\.{\\everyvbox} primitive@> primitive(@[@<|"everyjob"|@>@], assign_toks, every_job_loc); @!@:every_job_}{\.{\\everyjob} primitive@> primitive(@[@<|"everycr"|@>@], assign_toks, every_cr_loc); @!@:every_cr_}{\.{\\everycr} primitive@> primitive(@[@<|"errhelp"|@>@], assign_toks, err_help_loc); @!@:err_help_}{\.{\\errhelp} primitive@> @ @<Cases of |print_cmd_chr|...@>= case assign_toks: if (chr_code >= toks_base) {@+print_esc(@[@<|"toks"|@>@]);print_int(chr_code-toks_base); } else switch (chr_code) { case output_routine_loc: print_esc(@[@<|"output"|@>@]);@+break; case every_par_loc: print_esc(@[@<|"everypar"|@>@]);@+break; case every_math_loc: print_esc(@[@<|"everymath"|@>@]);@+break; case every_display_loc: print_esc(@[@<|"everydisplay"|@>@]);@+break; case every_hbox_loc: print_esc(@[@<|"everyhbox"|@>@]);@+break; case every_vbox_loc: print_esc(@[@<|"everyvbox"|@>@]);@+break; case every_job_loc: print_esc(@[@<|"everyjob"|@>@]);@+break; case every_cr_loc: print_esc(@[@<|"everycr"|@>@]);@+break; default:print_esc(@[@<|"errhelp"|@>@]); } @+break; @ We initialize most things to null or undefined values. An undefined font is represented by the internal code |font_base|. However, the character code tables are given initial values based on the conventional interpretation of ASCII code. These initial values should not be changed when \TeX\ is adapted for use with non-English languages; all changes to the initialization conventions should be made in format packages, not in \TeX\ itself, so that global interchange of formats is possible. @d null_font font_base @d var_code 070000 /*math code meaning ``use the current family''*/ @<Initialize table entries...@>= par_shape_ptr=null;eq_type(par_shape_loc)=shape_ref; eq_level(par_shape_loc)=level_one;@/ for (k=output_routine_loc; k<=toks_base+255; k++) eqtb[k]=eqtb[undefined_control_sequence]; box(0)=null;eq_type(box_base)=box_ref;eq_level(box_base)=level_one; for (k=box_base+1; k<=box_base+255; k++) eqtb[k]=eqtb[box_base]; cur_font=null_font;eq_type(cur_font_loc)=data; eq_level(cur_font_loc)=level_one;@/ for (k=math_font_base; k<=math_font_base+47; k++) eqtb[k]=eqtb[cur_font_loc]; equiv(cat_code_base)=0;eq_type(cat_code_base)=data; eq_level(cat_code_base)=level_one;@/ for (k=cat_code_base+1; k<=int_base-1; k++) eqtb[k]=eqtb[cat_code_base]; for (k=0; k<=255; k++) {@+cat_code(k)=other_char;math_code(k)=hi(k);sf_code(k)=1000; } cat_code(carriage_return)=car_ret;cat_code(' ')=spacer; cat_code('\\')=escape;cat_code('%')=comment; cat_code(invalid_code)=invalid_char;cat_code(null_code)=ignore; for (k='0'; k<='9'; k++) math_code(k)=hi(k+var_code); for (k='A'; k<='Z'; k++) {@+cat_code(k)=letter;cat_code(k+'a'-'A')=letter;@/ math_code(k)=hi(k+var_code+0x100); math_code(k+'a'-'A')=hi(k+'a'-'A'+var_code+0x100);@/ lc_code(k)=k+'a'-'A';lc_code(k+'a'-'A')=k+'a'-'A';@/ uc_code(k)=k;uc_code(k+'a'-'A')=k;@/ sf_code(k)=999; } @ @<Show equivalent |n|, in region 4@>= if (n==par_shape_loc) {@+print_esc(@[@<|"parshape"|@>@]);print_char('='); if (par_shape_ptr==null) print_char('0'); else print_int(info(par_shape_ptr)); } else if (n < toks_base) {@+print_cmd_chr(assign_toks, n);print_char('='); if (equiv(n)!=null) show_token_list(link(equiv(n)), null, 32); } else if (n < box_base) {@+print_esc(@[@<|"toks"|@>@]);print_int(n-toks_base);print_char('='); if (equiv(n)!=null) show_token_list(link(equiv(n)), null, 32); } else if (n < cur_font_loc) {@+print_esc(@[@<|"box"|@>@]);print_int(n-box_base);print_char('='); if (equiv(n)==null) print_str("void"); else{@+depth_threshold=0;breadth_max=1;show_node_list(equiv(n)); } } else if (n < cat_code_base) @<Show the font identifier in |eqtb[n]|@>@; else@<Show the halfword code in |eqtb[n]|@>@; @ @<Show the font identifier in |eqtb[n]|@>= {@+if (n==cur_font_loc) print_str("current font"); else if (n < math_font_base+16) {@+print_esc(@[@<|"textfont"|@>@]);print_int(n-math_font_base); } else if (n < math_font_base+32) {@+print_esc(@[@<|"scriptfont"|@>@]);print_int(n-math_font_base-16); } else{@+print_esc(@[@<|"scriptscriptfont"|@>@]);print_int(n-math_font_base-32); } print_char('=');@/ print_esc(hash[font_id_base+equiv(n)].rh); /*that's |font_id_text(equiv(n))|*/ } @ @<Show the halfword code in |eqtb[n]|@>= if (n < math_code_base) {@+if (n < lc_code_base) {@+print_esc(@[@<|"catcode"|@>@]);print_int(n-cat_code_base); } else if (n < uc_code_base) {@+print_esc(@[@<|"lccode"|@>@]);print_int(n-lc_code_base); } else if (n < sf_code_base) {@+print_esc(@[@<|"uccode"|@>@]);print_int(n-uc_code_base); } else{@+print_esc(@[@<|"sfcode"|@>@]);print_int(n-sf_code_base); } print_char('=');print_int(equiv(n)); } else{@+print_esc(@[@<|"mathcode"|@>@]);print_int(n-math_code_base); print_char('=');print_int(ho(equiv(n))); } @ Region 5 of |eqtb| contains the integer parameters and registers defined here, as well as the |del_code| table. The latter table differs from the |cat_code dotdot math_code| tables that precede it, since delimiter codes are fullword integers while the other kinds of codes occupy at most a halfword. This is what makes region~5 different from region~4. We will store the |eq_level| information in an auxiliary array of quarterwords that will be defined later. @d pretolerance_code 0 /*badness tolerance before hyphenation*/ @d tolerance_code 1 /*badness tolerance after hyphenation*/ @d line_penalty_code 2 /*added to the badness of every line*/ @d hyphen_penalty_code 3 /*penalty for break after discretionary hyphen*/ @d ex_hyphen_penalty_code 4 /*penalty for break after explicit hyphen*/ @d club_penalty_code 5 /*penalty for creating a club line*/ @d widow_penalty_code 6 /*penalty for creating a widow line*/ @d display_widow_penalty_code 7 /*ditto, just before a display*/ @d broken_penalty_code 8 /*penalty for breaking a page at a broken line*/ @d bin_op_penalty_code 9 /*penalty for breaking after a binary operation*/ @d rel_penalty_code 10 /*penalty for breaking after a relation*/ @d pre_display_penalty_code 11 /*penalty for breaking just before a displayed formula*/ @d post_display_penalty_code 12 /*penalty for breaking just after a displayed formula*/ @d inter_line_penalty_code 13 /*additional penalty between lines*/ @d double_hyphen_demerits_code 14 /*demerits for double hyphen break*/ @d final_hyphen_demerits_code 15 /*demerits for final hyphen break*/ @d adj_demerits_code 16 /*demerits for adjacent incompatible lines*/ @d mag_code 17 /*magnification ratio*/ @d delimiter_factor_code 18 /*ratio for variable-size delimiters*/ @d looseness_code 19 /*change in number of lines for a paragraph*/ @d time_code 20 /*current time of day*/ @d day_code 21 /*current day of the month*/ @d month_code 22 /*current month of the year*/ @d year_code 23 /*current year of our Lord*/ @d show_box_breadth_code 24 /*nodes per level in |show_box|*/ @d show_box_depth_code 25 /*maximum level in |show_box|*/ @d hbadness_code 26 /*hboxes exceeding this badness will be shown by |hpack|*/ @d vbadness_code 27 /*vboxes exceeding this badness will be shown by |vpack|*/ @d pausing_code 28 /*pause after each line is read from a file*/ @d tracing_online_code 29 /*show diagnostic output on terminal*/ @d tracing_macros_code 30 /*show macros as they are being expanded*/ @d tracing_stats_code 31 /*show memory usage if \TeX\ knows it*/ @d tracing_paragraphs_code 32 /*show line-break calculations*/ @d tracing_pages_code 33 /*show page-break calculations*/ @d tracing_output_code 34 /*show boxes when they are shipped out*/ @d tracing_lost_chars_code 35 /*show characters that aren't in the font*/ @d tracing_commands_code 36 /*show command codes at |big_switch|*/ @d tracing_restores_code 37 /*show equivalents when they are restored*/ @d uc_hyph_code 38 /*hyphenate words beginning with a capital letter*/ @d output_penalty_code 39 /*penalty found at current page break*/ @d max_dead_cycles_code 40 /*bound on consecutive dead cycles of output*/ @d hang_after_code 41 /*hanging indentation changes after this many lines*/ @d floating_penalty_code 42 /*penalty for insertions heldover after a split*/ @d global_defs_code 43 /*override \.{\\global} specifications*/ @d cur_fam_code 44 /*current family*/ @d escape_char_code 45 /*escape character for token output*/ @d default_hyphen_char_code 46 /*value of \.{\\hyphenchar} when a font is loaded*/ @d default_skew_char_code 47 /*value of \.{\\skewchar} when a font is loaded*/ @d end_line_char_code 48 /*character placed at the right end of the buffer*/ @d new_line_char_code 49 /*character that prints as |print_ln|*/ @d language_code 50 /*current hyphenation table*/ @d left_hyphen_min_code 51 /*minimum left hyphenation fragment size*/ @d right_hyphen_min_code 52 /*minimum right hyphenation fragment size*/ @d holding_inserts_code 53 /*do not remove insertion nodes from \.{\\box255}*/ @d error_context_lines_code 54 /*maximum intermediate line pairs shown*/ @d int_pars 55 /*total number of integer parameters*/ @d count_base (int_base+int_pars) /*256 user \.{\\count} registers*/ @d del_code_base (count_base+256) /*256 delimiter code mappings*/ @d dimen_base (del_code_base+256) /*beginning of region 6*/ @# @d del_code(X) eqtb[del_code_base+X].i @d count(X) eqtb[count_base+X].i @d int_par(X) eqtb[int_base+X].i /*an integer parameter*/ @d pretolerance int_par(pretolerance_code) @d tolerance int_par(tolerance_code) @d line_penalty int_par(line_penalty_code) @d hyphen_penalty int_par(hyphen_penalty_code) @d ex_hyphen_penalty int_par(ex_hyphen_penalty_code) @d club_penalty int_par(club_penalty_code) @d widow_penalty int_par(widow_penalty_code) @d display_widow_penalty int_par(display_widow_penalty_code) @d broken_penalty int_par(broken_penalty_code) @d bin_op_penalty int_par(bin_op_penalty_code) @d rel_penalty int_par(rel_penalty_code) @d pre_display_penalty int_par(pre_display_penalty_code) @d post_display_penalty int_par(post_display_penalty_code) @d inter_line_penalty int_par(inter_line_penalty_code) @d double_hyphen_demerits int_par(double_hyphen_demerits_code) @d final_hyphen_demerits int_par(final_hyphen_demerits_code) @d adj_demerits int_par(adj_demerits_code) @d mag int_par(mag_code) @d delimiter_factor int_par(delimiter_factor_code) @d looseness int_par(looseness_code) @d time int_par(time_code) @d day int_par(day_code) @d month int_par(month_code) @d year int_par(year_code) @d show_box_breadth int_par(show_box_breadth_code) @d show_box_depth int_par(show_box_depth_code) @d hbadness int_par(hbadness_code) @d vbadness int_par(vbadness_code) @d pausing int_par(pausing_code) @d tracing_online int_par(tracing_online_code) @d tracing_macros int_par(tracing_macros_code) @d tracing_stats int_par(tracing_stats_code) @d tracing_paragraphs int_par(tracing_paragraphs_code) @d tracing_pages int_par(tracing_pages_code) @d tracing_output int_par(tracing_output_code) @d tracing_lost_chars int_par(tracing_lost_chars_code) @d tracing_commands int_par(tracing_commands_code) @d tracing_restores int_par(tracing_restores_code) @d uc_hyph int_par(uc_hyph_code) @d output_penalty int_par(output_penalty_code) @d max_dead_cycles int_par(max_dead_cycles_code) @d hang_after int_par(hang_after_code) @d floating_penalty int_par(floating_penalty_code) @d global_defs int_par(global_defs_code) @d cur_fam int_par(cur_fam_code) @d escape_char int_par(escape_char_code) @d default_hyphen_char int_par(default_hyphen_char_code) @d default_skew_char int_par(default_skew_char_code) @d end_line_char int_par(end_line_char_code) @d new_line_char int_par(new_line_char_code) @d language int_par(language_code) @d left_hyphen_min int_par(left_hyphen_min_code) @d right_hyphen_min int_par(right_hyphen_min_code) @d holding_inserts int_par(holding_inserts_code) @d error_context_lines int_par(error_context_lines_code) @<Assign the values |depth_threshold:=show_box_depth|...@>= depth_threshold=show_box_depth; breadth_max=show_box_breadth @ We can print the symbolic name of an integer parameter as follows. @p void print_param(int @!n) {@+switch (n) { case pretolerance_code: print_esc(@[@<|"pretolerance"|@>@]);@+break; case tolerance_code: print_esc(@[@<|"tolerance"|@>@]);@+break; case line_penalty_code: print_esc(@[@<|"linepenalty"|@>@]);@+break; case hyphen_penalty_code: print_esc(@[@<|"hyphenpenalty"|@>@]);@+break; case ex_hyphen_penalty_code: print_esc(@[@<|"exhyphenpenalty"|@>@]);@+break; case club_penalty_code: print_esc(@[@<|"clubpenalty"|@>@]);@+break; case widow_penalty_code: print_esc(@[@<|"widowpenalty"|@>@]);@+break; case display_widow_penalty_code: print_esc(@[@<|"displaywidowpenalty"|@>@]);@+break; case broken_penalty_code: print_esc(@[@<|"brokenpenalty"|@>@]);@+break; case bin_op_penalty_code: print_esc(@[@<|"binoppenalty"|@>@]);@+break; case rel_penalty_code: print_esc(@[@<|"relpenalty"|@>@]);@+break; case pre_display_penalty_code: print_esc(@[@<|"predisplaypenalty"|@>@]);@+break; case post_display_penalty_code: print_esc(@[@<|"postdisplaypenalty"|@>@]);@+break; case inter_line_penalty_code: print_esc(@[@<|"interlinepenalty"|@>@]);@+break; case double_hyphen_demerits_code: print_esc(@[@<|"doublehyphendemerits"|@>@]);@+break; case final_hyphen_demerits_code: print_esc(@[@<|"finalhyphendemerits"|@>@]);@+break; case adj_demerits_code: print_esc(@[@<|"adjdemerits"|@>@]);@+break; case mag_code: print_esc(@[@<|"mag"|@>@]);@+break; case delimiter_factor_code: print_esc(@[@<|"delimiterfactor"|@>@]);@+break; case looseness_code: print_esc(@[@<|"looseness"|@>@]);@+break; case time_code: print_esc(@[@<|"time"|@>@]);@+break; case day_code: print_esc(@[@<|"day"|@>@]);@+break; case month_code: print_esc(@[@<|"month"|@>@]);@+break; case year_code: print_esc(@[@<|"year"|@>@]);@+break; case show_box_breadth_code: print_esc(@[@<|"showboxbreadth"|@>@]);@+break; case show_box_depth_code: print_esc(@[@<|"showboxdepth"|@>@]);@+break; case hbadness_code: print_esc(@[@<|"hbadness"|@>@]);@+break; case vbadness_code: print_esc(@[@<|"vbadness"|@>@]);@+break; case pausing_code: print_esc(@[@<|"pausing"|@>@]);@+break; case tracing_online_code: print_esc(@[@<|"tracingonline"|@>@]);@+break; case tracing_macros_code: print_esc(@[@<|"tracingmacros"|@>@]);@+break; case tracing_stats_code: print_esc(@[@<|"tracingstats"|@>@]);@+break; case tracing_paragraphs_code: print_esc(@[@<|"tracingparagraphs"|@>@]);@+break; case tracing_pages_code: print_esc(@[@<|"tracingpages"|@>@]);@+break; case tracing_output_code: print_esc(@[@<|"tracingoutput"|@>@]);@+break; case tracing_lost_chars_code: print_esc(@[@<|"tracinglostchars"|@>@]);@+break; case tracing_commands_code: print_esc(@[@<|"tracingcommands"|@>@]);@+break; case tracing_restores_code: print_esc(@[@<|"tracingrestores"|@>@]);@+break; case uc_hyph_code: print_esc(@[@<|"uchyph"|@>@]);@+break; case output_penalty_code: print_esc(@[@<|"outputpenalty"|@>@]);@+break; case max_dead_cycles_code: print_esc(@[@<|"maxdeadcycles"|@>@]);@+break; case hang_after_code: print_esc(@[@<|"hangafter"|@>@]);@+break; case floating_penalty_code: print_esc(@[@<|"floatingpenalty"|@>@]);@+break; case global_defs_code: print_esc(@[@<|"globaldefs"|@>@]);@+break; case cur_fam_code: print_esc(@[@<|"fam"|@>@]);@+break; case escape_char_code: print_esc(@[@<|"escapechar"|@>@]);@+break; case default_hyphen_char_code: print_esc(@[@<|"defaulthyphenchar"|@>@]);@+break; case default_skew_char_code: print_esc(@[@<|"defaultskewchar"|@>@]);@+break; case end_line_char_code: print_esc(@[@<|"endlinechar"|@>@]);@+break; case new_line_char_code: print_esc(@[@<|"newlinechar"|@>@]);@+break; case language_code: print_esc(@[@<|"language"|@>@]);@+break; case left_hyphen_min_code: print_esc(@[@<|"lefthyphenmin"|@>@]);@+break; case right_hyphen_min_code: print_esc(@[@<|"righthyphenmin"|@>@]);@+break; case holding_inserts_code: print_esc(@[@<|"holdinginserts"|@>@]);@+break; case error_context_lines_code: print_esc(@[@<|"errorcontextlines"|@>@]);@+break; default:print_str("[unknown integer parameter!]"); } } @ The integer parameter names must be entered into the hash table. @<Put each...@>= primitive(@[@<|"pretolerance"|@>@], assign_int, int_base+pretolerance_code);@/ @!@:pretolerance_}{\.{\\pretolerance} primitive@> primitive(@[@<|"tolerance"|@>@], assign_int, int_base+tolerance_code);@/ @!@:tolerance_}{\.{\\tolerance} primitive@> primitive(@[@<|"linepenalty"|@>@], assign_int, int_base+line_penalty_code);@/ @!@:line_penalty_}{\.{\\linepenalty} primitive@> primitive(@[@<|"hyphenpenalty"|@>@], assign_int, int_base+hyphen_penalty_code);@/ @!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@> primitive(@[@<|"exhyphenpenalty"|@>@], assign_int, int_base+ex_hyphen_penalty_code);@/ @!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@> primitive(@[@<|"clubpenalty"|@>@], assign_int, int_base+club_penalty_code);@/ @!@:club_penalty_}{\.{\\clubpenalty} primitive@> primitive(@[@<|"widowpenalty"|@>@], assign_int, int_base+widow_penalty_code);@/ @!@:widow_penalty_}{\.{\\widowpenalty} primitive@> primitive(@[@<|"displaywidowpenalty"|@>@], assign_int, int_base+display_widow_penalty_code);@/ @!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@> primitive(@[@<|"brokenpenalty"|@>@], assign_int, int_base+broken_penalty_code);@/ @!@:broken_penalty_}{\.{\\brokenpenalty} primitive@> primitive(@[@<|"binoppenalty"|@>@], assign_int, int_base+bin_op_penalty_code);@/ @!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@> primitive(@[@<|"relpenalty"|@>@], assign_int, int_base+rel_penalty_code);@/ @!@:rel_penalty_}{\.{\\relpenalty} primitive@> primitive(@[@<|"predisplaypenalty"|@>@], assign_int, int_base+pre_display_penalty_code);@/ @!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@> primitive(@[@<|"postdisplaypenalty"|@>@], assign_int, int_base+post_display_penalty_code);@/ @!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@> primitive(@[@<|"interlinepenalty"|@>@], assign_int, int_base+inter_line_penalty_code);@/ @!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@> primitive(@[@<|"doublehyphendemerits"|@>@], assign_int, int_base+double_hyphen_demerits_code);@/ @!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@> primitive(@[@<|"finalhyphendemerits"|@>@], assign_int, int_base+final_hyphen_demerits_code);@/ @!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@> primitive(@[@<|"adjdemerits"|@>@], assign_int, int_base+adj_demerits_code);@/ @!@:adj_demerits_}{\.{\\adjdemerits} primitive@> primitive(@[@<|"mag"|@>@], assign_int, int_base+mag_code);@/ @!@:mag_}{\.{\\mag} primitive@> primitive(@[@<|"delimiterfactor"|@>@], assign_int, int_base+delimiter_factor_code);@/ @!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@> primitive(@[@<|"looseness"|@>@], assign_int, int_base+looseness_code);@/ @!@:looseness_}{\.{\\looseness} primitive@> primitive(@[@<|"time"|@>@], assign_int, int_base+time_code);@/ @!@:time_}{\.{\\time} primitive@> primitive(@[@<|"day"|@>@], assign_int, int_base+day_code);@/ @!@:day_}{\.{\\day} primitive@> primitive(@[@<|"month"|@>@], assign_int, int_base+month_code);@/ @!@:month_}{\.{\\month} primitive@> primitive(@[@<|"year"|@>@], assign_int, int_base+year_code);@/ @!@:year_}{\.{\\year} primitive@> primitive(@[@<|"showboxbreadth"|@>@], assign_int, int_base+show_box_breadth_code);@/ @!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@> primitive(@[@<|"showboxdepth"|@>@], assign_int, int_base+show_box_depth_code);@/ @!@:show_box_depth_}{\.{\\showboxdepth} primitive@> primitive(@[@<|"hbadness"|@>@], assign_int, int_base+hbadness_code);@/ @!@:hbadness_}{\.{\\hbadness} primitive@> primitive(@[@<|"vbadness"|@>@], assign_int, int_base+vbadness_code);@/ @!@:vbadness_}{\.{\\vbadness} primitive@> primitive(@[@<|"pausing"|@>@], assign_int, int_base+pausing_code);@/ @!@:pausing_}{\.{\\pausing} primitive@> primitive(@[@<|"tracingonline"|@>@], assign_int, int_base+tracing_online_code);@/ @!@:tracing_online_}{\.{\\tracingonline} primitive@> primitive(@[@<|"tracingmacros"|@>@], assign_int, int_base+tracing_macros_code);@/ @!@:tracing_macros_}{\.{\\tracingmacros} primitive@> primitive(@[@<|"tracingstats"|@>@], assign_int, int_base+tracing_stats_code);@/ @!@:tracing_stats_}{\.{\\tracingstats} primitive@> primitive(@[@<|"tracingparagraphs"|@>@], assign_int, int_base+tracing_paragraphs_code);@/ @!@:tracing_paragraphs_}{\.{\\tracingparagraphs} primitive@> primitive(@[@<|"tracingpages"|@>@], assign_int, int_base+tracing_pages_code);@/ @!@:tracing_pages_}{\.{\\tracingpages} primitive@> primitive(@[@<|"tracingoutput"|@>@], assign_int, int_base+tracing_output_code);@/ @!@:tracing_output_}{\.{\\tracingoutput} primitive@> primitive(@[@<|"tracinglostchars"|@>@], assign_int, int_base+tracing_lost_chars_code);@/ @!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@> primitive(@[@<|"tracingcommands"|@>@], assign_int, int_base+tracing_commands_code);@/ @!@:tracing_commands_}{\.{\\tracingcommands} primitive@> primitive(@[@<|"tracingrestores"|@>@], assign_int, int_base+tracing_restores_code);@/ @!@:tracing_restores_}{\.{\\tracingrestores} primitive@> primitive(@[@<|"uchyph"|@>@], assign_int, int_base+uc_hyph_code);@/ @!@:uc_hyph_}{\.{\\uchyph} primitive@> primitive(@[@<|"outputpenalty"|@>@], assign_int, int_base+output_penalty_code);@/ @!@:output_penalty_}{\.{\\outputpenalty} primitive@> primitive(@[@<|"maxdeadcycles"|@>@], assign_int, int_base+max_dead_cycles_code);@/ @!@:max_dead_cycles_}{\.{\\maxdeadcycles} primitive@> primitive(@[@<|"hangafter"|@>@], assign_int, int_base+hang_after_code);@/ @!@:hang_after_}{\.{\\hangafter} primitive@> primitive(@[@<|"floatingpenalty"|@>@], assign_int, int_base+floating_penalty_code);@/ @!@:floating_penalty_}{\.{\\floatingpenalty} primitive@> primitive(@[@<|"globaldefs"|@>@], assign_int, int_base+global_defs_code);@/ @!@:global_defs_}{\.{\\globaldefs} primitive@> primitive(@[@<|"fam"|@>@], assign_int, int_base+cur_fam_code);@/ @!@:fam_}{\.{\\fam} primitive@> primitive(@[@<|"escapechar"|@>@], assign_int, int_base+escape_char_code);@/ @!@:escape_char_}{\.{\\escapechar} primitive@> primitive(@[@<|"defaulthyphenchar"|@>@], assign_int, int_base+default_hyphen_char_code);@/ @!@:default_hyphen_char_}{\.{\\defaulthyphenchar} primitive@> primitive(@[@<|"defaultskewchar"|@>@], assign_int, int_base+default_skew_char_code);@/ @!@:default_skew_char_}{\.{\\defaultskewchar} primitive@> primitive(@[@<|"endlinechar"|@>@], assign_int, int_base+end_line_char_code);@/ @!@:end_line_char_}{\.{\\endlinechar} primitive@> primitive(@[@<|"newlinechar"|@>@], assign_int, int_base+new_line_char_code);@/ @!@:new_line_char_}{\.{\\newlinechar} primitive@> primitive(@[@<|"language"|@>@], assign_int, int_base+language_code);@/ @!@:language_}{\.{\\language} primitive@> primitive(@[@<|"lefthyphenmin"|@>@], assign_int, int_base+left_hyphen_min_code);@/ @!@:left_hyphen_min_}{\.{\\lefthyphenmin} primitive@> primitive(@[@<|"righthyphenmin"|@>@], assign_int, int_base+right_hyphen_min_code);@/ @!@:right_hyphen_min_}{\.{\\righthyphenmin} primitive@> primitive(@[@<|"holdinginserts"|@>@], assign_int, int_base+holding_inserts_code);@/ @!@:holding_inserts_}{\.{\\holdinginserts} primitive@> primitive(@[@<|"errorcontextlines"|@>@], assign_int, int_base+error_context_lines_code);@/ @!@:error_context_lines_}{\.{\\errorcontextlines} primitive@> @ @<Cases of |print_cmd_chr|...@>= case assign_int: if (chr_code < count_base) print_param(chr_code-int_base); else{@+print_esc(@[@<|"count"|@>@]);print_int(chr_code-count_base); } @+break; @ The integer parameters should really be initialized by a macro package; the following initialization does the minimum to keep \TeX\ from complete failure. @^null delimiter@> @<Initialize table entries...@>= for (k=int_base; k<=del_code_base-1; k++) eqtb[k].i=0; mag=1000;tolerance=10000;hang_after=1;max_dead_cycles=25; escape_char='\\';end_line_char=carriage_return; for (k=0; k<=255; k++) del_code(k)=-1; del_code('.')=0; /*this null delimiter is used in error recovery*/ @ The following procedure, which is called just before \TeX\ initializes its input and output, establishes the initial values of the date and time. @^system dependencies@> Since standard \PASCAL\ cannot provide such information, something special is needed. The program here simply specifies July 4, 1776, at noon; but users probably want a better approximation to the truth. @p void fix_date_and_time(void) {@+time=12*60; /*minutes since midnight*/ day=4; /*fourth day of the month*/ month=7; /*seventh month of the year*/ year=1776; /*Anno Domini*/ } @ @<Show equivalent |n|, in region 5@>= {@+if (n < count_base) print_param(n-int_base); else if (n < del_code_base) {@+print_esc(@[@<|"count"|@>@]);print_int(n-count_base); } else{@+print_esc(@[@<|"delcode"|@>@]);print_int(n-del_code_base); } print_char('=');print_int(eqtb[n].i); } @ @<Set variable |c| to the current escape character@>=c=escape_char @ @<Character |s| is the current new-line character@>=s==new_line_char @ \TeX\ is occasionally supposed to print diagnostic information that goes only into the transcript file, unless |tracing_online| is positive. Here are two routines that adjust the destination of print commands: @p void begin_diagnostic(void) /*prepare to do some tracing*/ {@+old_setting=selector; if ((tracing_online <= 0)&&(selector==term_and_log)) {@+decr(selector); if (history==spotless) history=warning_issued; } } @# void end_diagnostic(bool @!blank_line) /*restore proper conditions after tracing*/ {@+print_nl(""); if (blank_line) print_ln(); selector=old_setting; } @ Of course we had better declare another global variable, if the previous routines are going to work. @<Glob...@>= uint8_t @!old_setting; @ The final region of |eqtb| contains the dimension parameters defined here, and the 256 \.{\\dimen} registers. @d par_indent_code 0 /*indentation of paragraphs*/ @d math_surround_code 1 /*space around math in text*/ @d line_skip_limit_code 2 /*threshold for |line_skip| instead of |baseline_skip|*/ @d hsize_code 3 /*line width in horizontal mode*/ @d vsize_code 4 /*page height in vertical mode*/ @d max_depth_code 5 /*maximum depth of boxes on main pages*/ @d split_max_depth_code 6 /*maximum depth of boxes on split pages*/ @d box_max_depth_code 7 /*maximum depth of explicit vboxes*/ @d hfuzz_code 8 /*tolerance for overfull hbox messages*/ @d vfuzz_code 9 /*tolerance for overfull vbox messages*/ @d delimiter_shortfall_code 10 /*maximum amount uncovered by variable delimiters*/ @d null_delimiter_space_code 11 /*blank space in null delimiters*/ @d script_space_code 12 /*extra space after subscript or superscript*/ @d pre_display_size_code 13 /*length of text preceding a display*/ @d display_width_code 14 /*length of line for displayed equation*/ @d display_indent_code 15 /*indentation of line for displayed equation*/ @d overfull_rule_code 16 /*width of rule that identifies overfull hboxes*/ @d hang_indent_code 17 /*amount of hanging indentation*/ @d h_offset_code 18 /*amount of horizontal offset when shipping pages out*/ @d v_offset_code 19 /*amount of vertical offset when shipping pages out*/ @d emergency_stretch_code 20 /*reduces badnesses on final pass of line-breaking*/ @d dimen_pars 21 /*total number of dimension parameters*/ @d scaled_base (dimen_base+dimen_pars) /*table of 256 user-defined \.{\\dimen} registers*/ @d eqtb_size (scaled_base+255) /*largest subscript of |eqtb|*/ @# @d dimen(X) eqtb[scaled_base+X].sc @d dimen_par(X) eqtb[dimen_base+X].sc /*a scaled quantity*/ @d par_indent dimen_par(par_indent_code) @d math_surround dimen_par(math_surround_code) @d line_skip_limit dimen_par(line_skip_limit_code) @d hsize dimen_par(hsize_code) @d vsize dimen_par(vsize_code) @d max_depth dimen_par(max_depth_code) @d split_max_depth dimen_par(split_max_depth_code) @d box_max_depth dimen_par(box_max_depth_code) @d hfuzz dimen_par(hfuzz_code) @d vfuzz dimen_par(vfuzz_code) @d delimiter_shortfall dimen_par(delimiter_shortfall_code) @d null_delimiter_space dimen_par(null_delimiter_space_code) @d script_space dimen_par(script_space_code) @d pre_display_size dimen_par(pre_display_size_code) @d display_width dimen_par(display_width_code) @d display_indent dimen_par(display_indent_code) @d overfull_rule dimen_par(overfull_rule_code) @d hang_indent dimen_par(hang_indent_code) @d h_offset dimen_par(h_offset_code) @d v_offset dimen_par(v_offset_code) @d emergency_stretch dimen_par(emergency_stretch_code) @p void print_length_param(int @!n) {@+switch (n) { case par_indent_code: print_esc(@[@<|"parindent"|@>@]);@+break; case math_surround_code: print_esc(@[@<|"mathsurround"|@>@]);@+break; case line_skip_limit_code: print_esc(@[@<|"lineskiplimit"|@>@]);@+break; case hsize_code: print_esc(@[@<|"hsize"|@>@]);@+break; case vsize_code: print_esc(@[@<|"vsize"|@>@]);@+break; case max_depth_code: print_esc(@[@<|"maxdepth"|@>@]);@+break; case split_max_depth_code: print_esc(@[@<|"splitmaxdepth"|@>@]);@+break; case box_max_depth_code: print_esc(@[@<|"boxmaxdepth"|@>@]);@+break; case hfuzz_code: print_esc(@[@<|"hfuzz"|@>@]);@+break; case vfuzz_code: print_esc(@[@<|"vfuzz"|@>@]);@+break; case delimiter_shortfall_code: print_esc(@[@<|"delimitershortfall"|@>@]);@+break; case null_delimiter_space_code: print_esc(@[@<|"nulldelimiterspace"|@>@]);@+break; case script_space_code: print_esc(@[@<|"scriptspace"|@>@]);@+break; case pre_display_size_code: print_esc(@[@<|"predisplaysize"|@>@]);@+break; case display_width_code: print_esc(@[@<|"displaywidth"|@>@]);@+break; case display_indent_code: print_esc(@[@<|"displayindent"|@>@]);@+break; case overfull_rule_code: print_esc(@[@<|"overfullrule"|@>@]);@+break; case hang_indent_code: print_esc(@[@<|"hangindent"|@>@]);@+break; case h_offset_code: print_esc(@[@<|"hoffset"|@>@]);@+break; case v_offset_code: print_esc(@[@<|"voffset"|@>@]);@+break; case emergency_stretch_code: print_esc(@[@<|"emergencystretch"|@>@]);@+break; default:print_str("[unknown dimen parameter!]"); } } @ @<Put each...@>= primitive(@[@<|"parindent"|@>@], assign_dimen, dimen_base+par_indent_code);@/ @!@:par_indent_}{\.{\\parindent} primitive@> primitive(@[@<|"mathsurround"|@>@], assign_dimen, dimen_base+math_surround_code);@/ @!@:math_surround_}{\.{\\mathsurround} primitive@> primitive(@[@<|"lineskiplimit"|@>@], assign_dimen, dimen_base+line_skip_limit_code);@/ @!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@> primitive(@[@<|"hsize"|@>@], assign_dimen, dimen_base+hsize_code);@/ @!@:hsize_}{\.{\\hsize} primitive@> primitive(@[@<|"vsize"|@>@], assign_dimen, dimen_base+vsize_code);@/ @!@:vsize_}{\.{\\vsize} primitive@> primitive(@[@<|"maxdepth"|@>@], assign_dimen, dimen_base+max_depth_code);@/ @!@:max_depth_}{\.{\\maxdepth} primitive@> primitive(@[@<|"splitmaxdepth"|@>@], assign_dimen, dimen_base+split_max_depth_code);@/ @!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@> primitive(@[@<|"boxmaxdepth"|@>@], assign_dimen, dimen_base+box_max_depth_code);@/ @!@:box_max_depth_}{\.{\\boxmaxdepth} primitive@> primitive(@[@<|"hfuzz"|@>@], assign_dimen, dimen_base+hfuzz_code);@/ @!@:hfuzz_}{\.{\\hfuzz} primitive@> primitive(@[@<|"vfuzz"|@>@], assign_dimen, dimen_base+vfuzz_code);@/ @!@:vfuzz_}{\.{\\vfuzz} primitive@> primitive(@[@<|"delimitershortfall"|@>@], assign_dimen, dimen_base+delimiter_shortfall_code);@/ @!@:delimiter_shortfall_}{\.{\\delimitershortfall} primitive@> primitive(@[@<|"nulldelimiterspace"|@>@], assign_dimen, dimen_base+null_delimiter_space_code);@/ @!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@> primitive(@[@<|"scriptspace"|@>@], assign_dimen, dimen_base+script_space_code);@/ @!@:script_space_}{\.{\\scriptspace} primitive@> primitive(@[@<|"predisplaysize"|@>@], assign_dimen, dimen_base+pre_display_size_code);@/ @!@:pre_display_size_}{\.{\\predisplaysize} primitive@> primitive(@[@<|"displaywidth"|@>@], assign_dimen, dimen_base+display_width_code);@/ @!@:display_width_}{\.{\\displaywidth} primitive@> primitive(@[@<|"displayindent"|@>@], assign_dimen, dimen_base+display_indent_code);@/ @!@:display_indent_}{\.{\\displayindent} primitive@> primitive(@[@<|"overfullrule"|@>@], assign_dimen, dimen_base+overfull_rule_code);@/ @!@:overfull_rule_}{\.{\\overfullrule} primitive@> primitive(@[@<|"hangindent"|@>@], assign_dimen, dimen_base+hang_indent_code);@/ @!@:hang_indent_}{\.{\\hangindent} primitive@> primitive(@[@<|"hoffset"|@>@], assign_dimen, dimen_base+h_offset_code);@/ @!@:h_offset_}{\.{\\hoffset} primitive@> primitive(@[@<|"voffset"|@>@], assign_dimen, dimen_base+v_offset_code);@/ @!@:v_offset_}{\.{\\voffset} primitive@> primitive(@[@<|"emergencystretch"|@>@], assign_dimen, dimen_base+emergency_stretch_code);@/ @!@:emergency_stretch_}{\.{\\emergencystretch} primitive@> @ @<Cases of |print_cmd_chr|...@>= case assign_dimen: if (chr_code < scaled_base) print_length_param(chr_code-dimen_base); else{@+print_esc(@[@<|"dimen"|@>@]);print_int(chr_code-scaled_base); } @+break; @ @<Initialize table entries...@>= for (k=dimen_base; k<=eqtb_size; k++) eqtb[k].sc=0; @ @<Show equivalent |n|, in region 6@>= {@+if (n < scaled_base) print_length_param(n-dimen_base); else{@+print_esc(@[@<|"dimen"|@>@]);print_int(n-scaled_base); } print_char('=');print_scaled(eqtb[n].sc);print_str("pt"); } @ Here is a procedure that displays the contents of |eqtb[n]| symbolically. @p@t\4@>@<Declare the procedure called |print_cmd_chr|@>@;@/ #ifdef @!STAT void show_eqtb(pointer @!n) {@+if (n < active_base) print_char('?'); /*this can't happen*/ else if (n < glue_base) @<Show equivalent |n|, in region 1 or 2@>@; else if (n < local_base) @<Show equivalent |n|, in region 3@>@; else if (n < int_base) @<Show equivalent |n|, in region 4@>@; else if (n < dimen_base) @<Show equivalent |n|, in region 5@>@; else if (n <= eqtb_size) @<Show equivalent |n|, in region 6@>@; else print_char('?'); /*this can't happen either*/ } #endif @ The last two regions of |eqtb| have fullword values instead of the three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary, but \TeX\ needs to store the |eq_level| information in another array called |xeq_level|. @<Glob...@>= memory_word @!eqtb0[eqtb_size-active_base+1], *const @!eqtb = @!eqtb0-active_base; quarterword @!xeq_level0[eqtb_size-int_base+1], *const @!xeq_level = @!xeq_level0-int_base; @ @<Set init...@>= for (k=int_base; k<=eqtb_size; k++) xeq_level[k]=level_one; @ When the debugging routine |search_mem| is looking for pointers having a given value, it is interested only in regions 1 to~3 of~|eqtb|, and in the first part of region~4. @<Search |eqtb| for equivalents equal to |p|@>= for (q=active_base; q<=box_base+255; q++) {@+if (equiv(q)==p) {@+print_nl("EQUIV(");print_int(q);print_char(')'); } } @* The hash table. Control sequences are stored and retrieved by means of a fairly standard hash table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C in {\sl The Art of Computer Programming\/}). Once a control sequence enters the table, it is never removed, because there are complicated situations involving \.{\\gdef} where the removal of a control sequence at the end of a group would be a mistake preventable only by the introduction of a complicated reference-count mechanism. The actual sequence of letters forming a control sequence identifier is stored in the |str_pool| array together with all the other strings. An auxiliary array |hash| consists of items with two halfword fields per word. The first of these, called |next(p)|, points to the next identifier belonging to the same coalesced list as the identifier corresponding to~|p|; and the other, called |text(p)|, points to the |str_start| entry for |p|'s identifier. If position~|p| of the hash table is empty, we have |text(p)==0|; if position |p| is either empty or the end of a coalesced hash list, we have |next(p)==0|. An auxiliary pointer variable called |hash_used| is maintained in such a way that all locations |p >= hash_used| are nonempty. The global variable |cs_count| tells how many multiletter control sequences have been defined, if statistics are being kept. A global boolean variable called |no_new_control_sequence| is set to |true| during the time that new hash table entries are forbidden. @d next(X) hash[X].lh /*link for coalesced lists*/ @d text(X) hash[X].rh /*string number for control sequence name*/ @d hash_is_full (hash_used==hash_base) /*test if all positions are occupied*/ @d font_id_text(X) text(font_id_base+X) /*a frozen font identifier's name*/ @<Glob...@>= two_halves @!hash0[undefined_control_sequence-hash_base], *const @!hash = @!hash0-hash_base; /*the hash table*/ pointer @!hash_used; /*allocation pointer for |hash|*/ bool @!no_new_control_sequence; /*are new identifiers legal?*/ int @!cs_count; /*total number of known identifiers*/ @ @<Set init...@>= no_new_control_sequence=true; /*new identifiers are usually forbidden*/ next(hash_base)=0;text(hash_base)=0; for (k=hash_base+1; k<=undefined_control_sequence-1; k++) hash[k]=hash[hash_base]; @ @<Initialize table entries...@>= hash_used=frozen_control_sequence; /*nothing is used*/ cs_count=0; eq_type(frozen_dont_expand)=dont_expand; text(frozen_dont_expand)=@[@<|"notexpanded:"|@>@]; @.notexpanded:@> @ Here is the subroutine that searches the hash table for an identifier that matches a given string of length |l > 1| appearing in |buffer[j dotdot (j+l-1)]|. If the identifier is found, the corresponding hash table address is returned. Otherwise, if the global variable |no_new_control_sequence| is |true|, the dummy address |undefined_control_sequence| is returned. Otherwise the identifier is inserted into the hash table and its location is returned. @p pointer id_lookup(int @!j, int @!l) /*search the hash table*/ {@+ /*go here if you found it*/ int h; /*hash code*/ int @!d; /*number of characters in incomplete current string*/ pointer @!p; /*index in |hash| array*/ pointer @!k; /*index in |buffer| array*/ @<Compute the hash code |h|@>; p=h+hash_base; /*we start searching here; note that |0 <= h < hash_prime|*/ loop@+{@+if (text(p) > 0) if (length(text(p))==l) if (str_eq_buf(text(p), j)) goto found; if (next(p)==0) {@+if (no_new_control_sequence) p=undefined_control_sequence; else@<Insert a new control sequence after |p|, then make |p| point to it@>; goto found; } p=next(p); } found: return p; } @ @<Insert a new control...@>= {@+if (text(p) > 0) {@+@/do@+{if (hash_is_full) overflow("hash size", hash_size); @:TeX capacity exceeded hash size}{\quad hash size@> decr(hash_used); }@+ while (!(text(hash_used)==0)); /*search for an empty location in |hash|*/ next(p)=hash_used;p=hash_used; } str_room(l);d=cur_length; while (pool_ptr > str_start[str_ptr]) {@+decr(pool_ptr);str_pool[pool_ptr+l]=str_pool[pool_ptr]; } /*move current string up to make room for another*/ for (k=j; k<=j+l-1; k++) append_char(buffer[k]); text(p)=make_string();pool_ptr=pool_ptr+d; #ifdef @!STAT incr(cs_count); #endif @;@/ } @ The value of |hash_prime| should be roughly 85\pct! of |hash_size|, and it should be a prime number. The theory of hashing tells us to expect fewer than two table probes, on the average, when the search is successful. [See J.~S. Vitter, {\sl Journal of the ACM\/ \bf30} (1983), 231--258.] @^Vitter, Jeffrey Scott@> @<Compute the hash code |h|@>= h=buffer[j]; for (k=j+1; k<=j+l-1; k++) {@+h=h+h+buffer[k]; while (h >= hash_prime) h=h-hash_prime; } @ Single-character control sequences do not need to be looked up in a hash table, since we can use the character code itself as a direct address. The procedure |print_cs| prints the name of a control sequence, given a pointer to its address in |eqtb|. A space is printed after the name unless it is a single nonletter or an active character. This procedure might be invoked with invalid data, so it is ``extra robust.'' The individual characters must be printed one at a time using |print|, since they may be unprintable. @<Basic printing...@>= void print_cs(int @!p) /*prints a purported control sequence*/ {@+if (p < hash_base) /*single character*/ if (p >= single_base) if (p==null_cs) {@+print_esc(@[@<|"csname"|@>@]);print_esc(@[@<|"endcsname"|@>@]);print_char(' '); } else{@+print_esc(p-single_base); if (cat_code(p-single_base)==letter) print_char(' '); } else if (p < active_base) print_esc(@[@<|"IMPOSSIBLE."|@>@]); @.IMPOSSIBLE@> else print(p-active_base); else if (p >= undefined_control_sequence) print_esc(@[@<|"IMPOSSIBLE."|@>@]); else if ((text(p) < 0)||(text(p) >= str_ptr)) print_esc(@[@<|"NONEXISTENT."|@>@]); @.NONEXISTENT@> else{@+print_esc(text(p)); print_char(' '); } } @ Here is a similar procedure; it avoids the error checks, and it never prints a space after the control sequence. @<Basic printing procedures@>= void sprint_cs(pointer @!p) /*prints a control sequence*/ {@+if (p < hash_base) if (p < single_base) print(p-active_base); else if (p < null_cs) print_esc(p-single_base); else{@+print_esc(@[@<|"csname"|@>@]);print_esc(@[@<|"endcsname"|@>@]); } else print_esc(text(p)); } @ We need to put \TeX's ``primitive'' control sequences into the hash table, together with their command code (which will be the |eq_type|) and an operand (which will be the |equiv|). The |primitive| procedure does this, in a way that no \TeX\ user can. The global value |cur_val| contains the new |eqtb| pointer after |primitive| has acted. @p #ifdef @!INIT void primitive(str_number @!s, quarterword @!c, halfword @!o) {@+pool_pointer k; /*index into |str_pool|*/ int @!j; /*index into |buffer|*/ small_number @!l; /*length of the string*/ if (s < 256) cur_val=s+single_base; else{@+k=str_start[s];l=str_start[s+1]-k; /*we will move |s| into the (empty) |buffer|*/ for (j=0; j<=l-1; j++) buffer[j]=so(str_pool[k+j]); cur_val=id_lookup(0, l); /*|no_new_control_sequence| is |false|*/ flush_string;text(cur_val)=s; /*we don't want to have the string twice*/ } eq_level(cur_val)=level_one;eq_type(cur_val)=c;equiv(cur_val)=o; } #endif @ Many of \TeX's primitives need no |equiv|, since they are identifiable by their |eq_type| alone. These primitives are loaded into the hash table as follows: @<Put each of \TeX's primitives into the hash table@>= primitive(' ', ex_space, 0);@/ @!@:Single-character primitives /}{\quad\.{\\\ }@> primitive('/', ital_corr, 0);@/ @!@:Single-character primitives /}{\quad\.{\\/}@> primitive(@[@<|"accent"|@>@], accent, 0);@/ @!@:accent_}{\.{\\accent} primitive@> primitive(@[@<|"advance"|@>@], advance, 0);@/ @!@:advance_}{\.{\\advance} primitive@> primitive(@[@<|"afterassignment"|@>@], after_assignment, 0);@/ @!@:after_assignment_}{\.{\\afterassignment} primitive@> primitive(@[@<|"aftergroup"|@>@], after_group, 0);@/ @!@:after_group_}{\.{\\aftergroup} primitive@> primitive(@[@<|"begingroup"|@>@], begin_group, 0);@/ @!@:begin_group_}{\.{\\begingroup} primitive@> primitive(@[@<|"char"|@>@], char_num, 0);@/ @!@:char_}{\.{\\char} primitive@> primitive(@[@<|"csname"|@>@], cs_name, 0);@/ @!@:cs_name_}{\.{\\csname} primitive@> primitive(@[@<|"delimiter"|@>@], delim_num, 0);@/ @!@:delimiter_}{\.{\\delimiter} primitive@> primitive(@[@<|"divide"|@>@], divide, 0);@/ @!@:divide_}{\.{\\divide} primitive@> primitive(@[@<|"endcsname"|@>@], end_cs_name, 0);@/ @!@:end_cs_name_}{\.{\\endcsname} primitive@> primitive(@[@<|"endgroup"|@>@], end_group, 0); @!@:end_group_}{\.{\\endgroup} primitive@> text(frozen_end_group)=@[@<|"endgroup"|@>@];eqtb[frozen_end_group]=eqtb[cur_val];@/ primitive(@[@<|"expandafter"|@>@], expand_after, 0);@/ @!@:expand_after_}{\.{\\expandafter} primitive@> primitive(@[@<|"font"|@>@], def_font, 0);@/ @!@:font_}{\.{\\font} primitive@> primitive(@[@<|"fontdimen"|@>@], assign_font_dimen, 0);@/ @!@:font_dimen_}{\.{\\fontdimen} primitive@> primitive(@[@<|"halign"|@>@], halign, 0);@/ @!@:halign_}{\.{\\halign} primitive@> primitive(@[@<|"hrule"|@>@], hrule, 0);@/ @!@:hrule_}{\.{\\hrule} primitive@> primitive(@[@<|"ignorespaces"|@>@], ignore_spaces, 0);@/ @!@:ignore_spaces_}{\.{\\ignorespaces} primitive@> primitive(@[@<|"insert"|@>@], insert, 0);@/ @!@:insert_}{\.{\\insert} primitive@> primitive(@[@<|"mark"|@>@], mark, 0);@/ @!@:mark_}{\.{\\mark} primitive@> primitive(@[@<|"mathaccent"|@>@], math_accent, 0);@/ @!@:math_accent_}{\.{\\mathaccent} primitive@> primitive(@[@<|"mathchar"|@>@], math_char_num, 0);@/ @!@:math_char_}{\.{\\mathchar} primitive@> primitive(@[@<|"mathchoice"|@>@], math_choice, 0);@/ @!@:math_choice_}{\.{\\mathchoice} primitive@> primitive(@[@<|"multiply"|@>@], multiply, 0);@/ @!@:multiply_}{\.{\\multiply} primitive@> primitive(@[@<|"noalign"|@>@], no_align, 0);@/ @!@:no_align_}{\.{\\noalign} primitive@> primitive(@[@<|"noboundary"|@>@], no_boundary, 0);@/ @!@:no_boundary_}{\.{\\noboundary} primitive@> primitive(@[@<|"noexpand"|@>@], no_expand, 0);@/ @!@:no_expand_}{\.{\\noexpand} primitive@> primitive(@[@<|"nonscript"|@>@], non_script, 0);@/ @!@:non_script_}{\.{\\nonscript} primitive@> primitive(@[@<|"omit"|@>@], omit, 0);@/ @!@:omit_}{\.{\\omit} primitive@> primitive(@[@<|"parshape"|@>@], set_shape, 0);@/ @!@:par_shape_}{\.{\\parshape} primitive@> primitive(@[@<|"penalty"|@>@], break_penalty, 0);@/ @!@:penalty_}{\.{\\penalty} primitive@> primitive(@[@<|"prevgraf"|@>@], set_prev_graf, 0);@/ @!@:prev_graf_}{\.{\\prevgraf} primitive@> primitive(@[@<|"radical"|@>@], radical, 0);@/ @!@:radical_}{\.{\\radical} primitive@> primitive(@[@<|"read"|@>@], read_to_cs, 0);@/ @!@:read_}{\.{\\read} primitive@> primitive(@[@<|"relax"|@>@], relax, 256); /*cf.\ |scan_file_name|*/ @!@:relax_}{\.{\\relax} primitive@> text(frozen_relax)=@[@<|"relax"|@>@];eqtb[frozen_relax]=eqtb[cur_val];@/ primitive(@[@<|"setbox"|@>@], set_box, 0);@/ @!@:set_box_}{\.{\\setbox} primitive@> primitive(@[@<|"the"|@>@], the, 0);@/ @!@:the_}{\.{\\the} primitive@> primitive(@[@<|"toks"|@>@], toks_register, 0);@/ @!@:toks_}{\.{\\toks} primitive@> primitive(@[@<|"vadjust"|@>@], vadjust, 0);@/ @!@:vadjust_}{\.{\\vadjust} primitive@> primitive(@[@<|"valign"|@>@], valign, 0);@/ @!@:valign_}{\.{\\valign} primitive@> primitive(@[@<|"vcenter"|@>@], vcenter, 0);@/ @!@:vcenter_}{\.{\\vcenter} primitive@> primitive(@[@<|"vrule"|@>@], vrule, 0);@/ @!@:vrule_}{\.{\\vrule} primitive@> @ Each primitive has a corresponding inverse, so that it is possible to display the cryptic numeric contents of |eqtb| in symbolic form. Every call of |primitive| in this program is therefore accompanied by some straightforward code that forms part of the |print_cmd_chr| routine below. @<Cases of |print_cmd_chr|...@>= case accent: print_esc(@[@<|"accent"|@>@]);@+break; case advance: print_esc(@[@<|"advance"|@>@]);@+break; case after_assignment: print_esc(@[@<|"afterassignment"|@>@]);@+break; case after_group: print_esc(@[@<|"aftergroup"|@>@]);@+break; case assign_font_dimen: print_esc(@[@<|"fontdimen"|@>@]);@+break; case begin_group: print_esc(@[@<|"begingroup"|@>@]);@+break; case break_penalty: print_esc(@[@<|"penalty"|@>@]);@+break; case char_num: print_esc(@[@<|"char"|@>@]);@+break; case cs_name: print_esc(@[@<|"csname"|@>@]);@+break; case def_font: print_esc(@[@<|"font"|@>@]);@+break; case delim_num: print_esc(@[@<|"delimiter"|@>@]);@+break; case divide: print_esc(@[@<|"divide"|@>@]);@+break; case end_cs_name: print_esc(@[@<|"endcsname"|@>@]);@+break; case end_group: print_esc(@[@<|"endgroup"|@>@]);@+break; case ex_space: print_esc(' ');@+break; case expand_after: print_esc(@[@<|"expandafter"|@>@]);@+break; case halign: print_esc(@[@<|"halign"|@>@]);@+break; case hrule: print_esc(@[@<|"hrule"|@>@]);@+break; case ignore_spaces: print_esc(@[@<|"ignorespaces"|@>@]);@+break; case insert: print_esc(@[@<|"insert"|@>@]);@+break; case ital_corr: print_esc('/');@+break; case mark: print_esc(@[@<|"mark"|@>@]);@+break; case math_accent: print_esc(@[@<|"mathaccent"|@>@]);@+break; case math_char_num: print_esc(@[@<|"mathchar"|@>@]);@+break; case math_choice: print_esc(@[@<|"mathchoice"|@>@]);@+break; case multiply: print_esc(@[@<|"multiply"|@>@]);@+break; case no_align: print_esc(@[@<|"noalign"|@>@]);@+break; case no_boundary: print_esc(@[@<|"noboundary"|@>@]);@+break; case no_expand: print_esc(@[@<|"noexpand"|@>@]);@+break; case non_script: print_esc(@[@<|"nonscript"|@>@]);@+break; case omit: print_esc(@[@<|"omit"|@>@]);@+break; case radical: print_esc(@[@<|"radical"|@>@]);@+break; case read_to_cs: print_esc(@[@<|"read"|@>@]);@+break; case relax: print_esc(@[@<|"relax"|@>@]);@+break; case set_box: print_esc(@[@<|"setbox"|@>@]);@+break; case set_prev_graf: print_esc(@[@<|"prevgraf"|@>@]);@+break; case set_shape: print_esc(@[@<|"parshape"|@>@]);@+break; case the: print_esc(@[@<|"the"|@>@]);@+break; case toks_register: print_esc(@[@<|"toks"|@>@]);@+break; case vadjust: print_esc(@[@<|"vadjust"|@>@]);@+break; case valign: print_esc(@[@<|"valign"|@>@]);@+break; case vcenter: print_esc(@[@<|"vcenter"|@>@]);@+break; case vrule: print_esc(@[@<|"vrule"|@>@]);@+break; @ We will deal with the other primitives later, at some point in the program where their |eq_type| and |equiv| values are more meaningful. For example, the primitives for math mode will be loaded when we consider the routines that deal with formulas. It is easy to find where each particular primitive was treated by looking in the index at the end; for example, the section where |@[@<|"radical"|@>@]| entered |eqtb| is listed under `\.{\\radical} primitive'. (Primitives consisting of a single nonalphabetic character, @!like `\.{\\/}', are listed under `Single-character primitives'.) @!@^Single-character primitives@> Meanwhile, this is a convenient place to catch up on something we were unable to do before the hash table was defined: @<Print the font identifier for |font(p)|@>= print_esc(font_id_text(font(p))) @* Saving and restoring equivalents. The nested structure provided by `$\.{\char'173}\ldots\.{\char'175}$' groups in \TeX\ means that |eqtb| entries valid in outer groups should be saved and restored later if they are overridden inside the braces. When a new |eqtb| value is being assigned, the program therefore checks to see if the previous entry belongs to an outer level. In such a case, the old value is placed on the |save_stack| just before the new value enters |eqtb|. At the end of a grouping level, i.e., when the right brace is sensed, the |save_stack| is used to restore the outer values, and the inner ones are destroyed. Entries on the |save_stack| are of type |memory_word|. The top item on this stack is |save_stack[p]|, where |p==save_ptr-1|; it contains three fields called |save_type|, |save_level|, and |save_index|, and it is interpreted in one of four ways: \yskip\hangg 1) If |save_type(p)==restore_old_value|, then |save_index(p)| is a location in |eqtb| whose current value should be destroyed at the end of the current group and replaced by |save_stack[p-1]|. Furthermore if |save_index(p) >= int_base|, then |save_level(p)| should replace the corresponding entry in |xeq_level|. \yskip\hangg 2) If |save_type(p)==restore_zero|, then |save_index(p)| is a location in |eqtb| whose current value should be destroyed at the end of the current group, when it should be replaced by the current value of |eqtb[undefined_control_sequence]|. \yskip\hangg 3) If |save_type(p)==insert_token|, then |save_index(p)| is a token that should be inserted into \TeX's input when the current group ends. \yskip\hangg 4) If |save_type(p)==level_boundary|, then |save_level(p)| is a code explaining what kind of group we were previously in, and |save_index(p)| points to the level boundary word at the bottom of the entries for that group. @d save_type(X) save_stack[X].hh.b0 /*classifies a |save_stack| entry*/ @d save_level(X) save_stack[X].hh.b1 /*saved level for regions 5 and 6, or group code*/ @d save_index(X) save_stack[X].hh.rh /*|eqtb| location or token or |save_stack| location*/ @d restore_old_value 0 /*|save_type| when a value should be restored later*/ @d restore_zero 1 /*|save_type| when an undefined entry should be restored*/ @d insert_token 2 /*|save_type| when a token is being saved for later use*/ @d level_boundary 3 /*|save_type| corresponding to beginning of group*/ @ Here are the group codes that are used to discriminate between different kinds of groups. They allow \TeX\ to decide what special actions, if any, should be performed when a group ends. \def\grp{\.{\char'173...\char'175}} Some groups are not supposed to be ended by right braces. For example, the `\.\$' that begins a math formula causes a |math_shift_group| to be started, and this should be terminated by a matching `\.\$'. Similarly, a group that starts with \.{\\left} should end with \.{\\right}, and one that starts with \.{\\begingroup} should end with \.{\\endgroup}. @d bottom_level 0 /*group code for the outside world*/ @d simple_group 1 /*group code for local structure only*/ @d hbox_group 2 /*code for `\.{\\hbox}\grp'*/ @d adjusted_hbox_group 3 /*code for `\.{\\hbox}\grp' in vertical mode*/ @d vbox_group 4 /*code for `\.{\\vbox}\grp'*/ @d vtop_group 5 /*code for `\.{\\vtop}\grp'*/ @d align_group 6 /*code for `\.{\\halign}\grp', `\.{\\valign}\grp'*/ @d no_align_group 7 /*code for `\.{\\noalign}\grp'*/ @d output_group 8 /*code for output routine*/ @d math_group 9 /*code for, e.g., `\.{\char'136}\grp'*/ @d disc_group 10 /*code for `\.{\\discretionary}\grp\grp\grp'*/ @d insert_group 11 /*code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'*/ @d vcenter_group 12 /*code for `\.{\\vcenter}\grp'*/ @d math_choice_group 13 /*code for `\.{\\mathchoice}\grp\grp\grp\grp'*/ @d semi_simple_group 14 /*code for `\.{\\begingroup...\\endgroup}'*/ @d math_shift_group 15 /*code for `\.{\$...\$}'*/ @d math_left_group 16 /*code for `\.{\\left...\\right}'*/ @d max_group_code 16 @<Types...@>= typedef uint8_t group_code; /*|save_level| for a level boundary*/ @ The global variable |cur_group| keeps track of what sort of group we are currently in. Another global variable, |cur_boundary|, points to the topmost |level_boundary| word. And |cur_level| is the current depth of nesting. The routines are designed to preserve the condition that no entry in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|. @ @<Glob...@>= memory_word @!save_stack[save_size+1]; uint16_t @!save_ptr; /*first unused entry on |save_stack|*/ uint16_t @!max_save_stack; /*maximum usage of save stack*/ quarterword @!cur_level; /*current nesting level for groups*/ group_code @!cur_group; /*current group type*/ uint16_t @!cur_boundary; /*where the current level begins*/ @ At this time it might be a good idea for the reader to review the introduction to |eqtb| that was given above just before the long lists of parameter names. Recall that the ``outer level'' of the program is |level_one|, since undefined control sequences are assumed to be ``defined'' at |level_zero|. @<Set init...@>= save_ptr=0;cur_level=level_one;cur_group=bottom_level;cur_boundary=0; max_save_stack=0; @ The following macro is used to test if there is room for up to six more entries on |save_stack|. By making a conservative test like this, we can get by with testing for overflow in only a few places. @d check_full_save_stack if (save_ptr > max_save_stack) {@+max_save_stack=save_ptr; if (max_save_stack > save_size-6) overflow("save size", save_size); @:TeX capacity exceeded save size}{\quad save size@> } @ Procedure |new_save_level| is called when a group begins. The argument is a group identification code like `|hbox_group|'. After calling this routine, it is safe to put five more entries on |save_stack|. In some cases integer-valued items are placed onto the |save_stack| just below a |level_boundary| word, because this is a convenient place to keep information that is supposed to ``pop up'' just when the group has finished. For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt dimension is stored on |save_stack| just before |new_save_level| is called. We use the notation |saved(k)| to stand for an integer item that appears in location |save_ptr+k| of the save stack. @d saved(X) save_stack[save_ptr+X].i @p void new_save_level(group_code @!c) /*begin a new level of grouping*/ {@+check_full_save_stack; save_type(save_ptr)=level_boundary;save_level(save_ptr)=cur_group; save_index(save_ptr)=cur_boundary; if (cur_level==max_quarterword) overflow("grouping levels", @:TeX capacity exceeded grouping levels}{\quad grouping levels@> max_quarterword-min_quarterword); /*quit if |(cur_level+1)| is too big to be stored in |eqtb|*/ cur_boundary=save_ptr;incr(cur_level);incr(save_ptr);cur_group=c; } @ Just before an entry of |eqtb| is changed, the following procedure should be called to update the other data structures properly. It is important to keep in mind that reference counts in |mem| include references from within |save_stack|, so these counts must be handled carefully. @^reference counts@> @p void eq_destroy(memory_word @!w) /*gets ready to forget |w|*/ {@+pointer q; /*|equiv| field of |w|*/ switch (eq_type_field(w)) { case call: case long_call: case outer_call: case long_outer_call: delete_token_ref(equiv_field(w));@+break; case glue_ref: delete_glue_ref(equiv_field(w));@+break; case shape_ref: {@+q=equiv_field(w); /*we need to free a \.{\\parshape} block*/ if (q!=null) free_node(q, info(q)+info(q)+1); } @+break; /*such a block is |2 n+1| words long, where |n==info(q)|*/ case box_ref: flush_node_list(equiv_field(w));@+break; default:do_nothing; } } @ To save a value of |eqtb[p]| that was established at level |l|, we can use the following subroutine. @p void eq_save(pointer @!p, quarterword @!l) /*saves |eqtb[p]|*/ {@+check_full_save_stack; if (l==level_zero) save_type(save_ptr)=restore_zero; else{@+save_stack[save_ptr]=eqtb[p];incr(save_ptr); save_type(save_ptr)=restore_old_value; } save_level(save_ptr)=l;save_index(save_ptr)=p;incr(save_ptr); } @ The procedure |eq_define| defines an |eqtb| entry having specified |eq_type| and |equiv| fields, and saves the former value if appropriate. This procedure is used only for entries in the first four regions of |eqtb|, i.e., only for entries that have |eq_type| and |equiv| fields. After calling this routine, it is safe to put four more entries on |save_stack|, provided that there was room for four more entries before the call, since |eq_save| makes the necessary test. @p void eq_define(pointer @!p, quarterword @!t, halfword @!e) /*new data for |eqtb|*/ {@+if (eq_level(p)==cur_level) eq_destroy(eqtb[p]); else if (cur_level > level_one) eq_save(p, eq_level(p)); eq_level(p)=cur_level;eq_type(p)=t;equiv(p)=e; } @ The counterpart of |eq_define| for the remaining (fullword) positions in |eqtb| is called |eq_word_define|. Since |xeq_level[p] >= level_one| for all |p|, a `|restore_zero|' will never be used in this case. @p void eq_word_define(pointer @!p, int @!w) {@+if (xeq_level[p]!=cur_level) {@+eq_save(p, xeq_level[p]);xeq_level[p]=cur_level; } eqtb[p].i=w; } @ The |eq_define| and |eq_word_define| routines take care of local definitions. @^global definitions@> Global definitions are done in almost the same way, but there is no need to save old values, and the new value is associated with |level_one|. @p void geq_define(pointer @!p, quarterword @!t, halfword @!e) /*global |eq_define|*/ {@+eq_destroy(eqtb[p]); eq_level(p)=level_one;eq_type(p)=t;equiv(p)=e; } @# void geq_word_define(pointer @!p, int @!w) /*global |eq_word_define|*/ {@+eqtb[p].i=w;xeq_level[p]=level_one; } @ Subroutine |save_for_after| puts a token on the stack for save-keeping. @p void save_for_after(halfword @!t) {@+if (cur_level > level_one) {@+check_full_save_stack; save_type(save_ptr)=insert_token;save_level(save_ptr)=level_zero; save_index(save_ptr)=t;incr(save_ptr); } } @ The |unsave| routine goes the other way, taking items off of |save_stack|. This routine takes care of restoration when a level ends; everything belonging to the topmost group is cleared off of the save stack. @p@t\4@>@<Declare the procedure called |restore_trace|@>@;@/ void back_input(void); void unsave(void) /*pops the top level off the save stack*/ {@+ pointer p; /*position to be restored*/ quarterword @!l; /*saved level, if in fullword regions of |eqtb|*/ halfword @!t; /*saved value of |cur_tok|*/ if (cur_level > level_one) {@+decr(cur_level); @<Clear off top level from |save_stack|@>; } else confusion(@[@<|"curlevel"|@>@]); /*|unsave| is not used when |cur_group==bottom_level|*/ @:this can't happen curlevel}{\quad curlevel@> } @ @<Clear off...@>= loop@+{@+decr(save_ptr); if (save_type(save_ptr)==level_boundary) goto done; p=save_index(save_ptr); if (save_type(save_ptr)==insert_token) @<Insert token |p| into \TeX's input@>@; else{@+if (save_type(save_ptr)==restore_old_value) {@+l=save_level(save_ptr);decr(save_ptr); } else save_stack[save_ptr]=eqtb[undefined_control_sequence]; @<Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless |eqtb[p]| holds a global value@>; } } done: cur_group=save_level(save_ptr);cur_boundary=save_index(save_ptr) @ A global definition, which sets the level to |level_one|, @^global definitions@> will not be undone by |unsave|. If at least one global definition of |eqtb[p]| has been carried out within the group that just ended, the last such definition will therefore survive. @<Store \(s)|save...@>= if (p < int_base) if (eq_level(p)==level_one) {@+eq_destroy(save_stack[save_ptr]); /*destroy the saved value*/ #ifdef @!STAT if (tracing_restores > 0) restore_trace(p,@[@<|"retaining"|@>@]); #endif @;@/ } else{@+eq_destroy(eqtb[p]); /*destroy the current value*/ eqtb[p]=save_stack[save_ptr]; /*restore the saved value*/ #ifdef @!STAT if (tracing_restores > 0) restore_trace(p,@[@<|"restoring"|@>@]); #endif @;@/ } else if (xeq_level[p]!=level_one) {@+eqtb[p]=save_stack[save_ptr];xeq_level[p]=l; #ifdef @!STAT if (tracing_restores > 0) restore_trace(p,@[@<|"restoring"|@>@]); #endif @;@/ } else{ #ifdef @!STAT if (tracing_restores > 0) restore_trace(p,@[@<|"retaining"|@>@]); #endif @;@/ } @ @<Declare the procedure called |restore_trace|@>= #ifdef @!STAT void restore_trace(pointer @!p, str_number @!s) /*|eqtb[p]| has just been restored or retained*/ {@+begin_diagnostic();print_char('{');print(s);print_char(' '); show_eqtb(p);print_char('}'); end_diagnostic(false); } #endif @ When looking for possible pointers to a memory location, it is helpful to look for references from |eqtb| that might be waiting on the save stack. Of course, we might find spurious pointers too; but this routine is merely an aid when debugging, and at such times we are grateful for any scraps of information, even if they prove to be irrelevant. @^dirty \PASCAL@> @<Search |save_stack| for equivalents that point to |p|@>= if (save_ptr > 0) for (q=0; q<=save_ptr-1; q++) {@+if (equiv_field(save_stack[q])==p) {@+print_nl("SAVE(");print_int(q);print_char(')'); } } @ Most of the parameters kept in |eqtb| can be changed freely, but there's an exception: The magnification should not be used with two different values during any \TeX\ job, since a single magnification is applied to an entire run. The global variable |mag_set| is set to the current magnification whenever it becomes necessary to ``freeze'' it at a particular value. @<Glob...@>= int @!mag_set; /*if nonzero, this magnification should be used henceforth*/ @ @<Set init...@>= mag_set=0; @ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag| for magnification. @p void prepare_mag(void) {@+if ((mag_set > 0)&&(mag!=mag_set)) {@+print_err("Incompatible magnification (");print_int(mag); @.Incompatible magnification@> print_str(");");print_nl(" the previous value will be retained"); help2("I can handle only one magnification ratio per job. So I've")@/ ("reverted to the magnification you used earlier on this run.");@/ int_error(mag_set); geq_word_define(int_base+mag_code, mag_set); /*|mag=mag_set|*/ } if ((mag <= 0)||(mag > 32768)) {@+print_err("Illegal magnification has been changed to 1000");@/ @.Illegal magnification...@> help1("The magnification ratio must be between 1 and 32768."); int_error(mag);geq_word_define(int_base+mag_code, 1000); } mag_set=mag; } @* Token lists. A \TeX\ token is either a character or a control sequence, and it is @^token@> represented internally in one of two ways: (1)~A character whose ASCII code number is |c| and whose command code is |m| is represented as the number $2^8m+c$; the command code is in the range |1 <= m <= 14|. (2)~A control sequence whose |eqtb| address is |p| is represented as the number |cs_token_flag+p|. Here |cs_token_flag==@t$2^{12}-1$@>| is larger than $2^8m+c$, yet it is small enough that |cs_token_flag+p < max_halfword|; thus, a token fits comfortably in a halfword. A token |t| represents a |left_brace| command if and only if |t < left_brace_limit|; it represents a |right_brace| command if and only if we have |left_brace_limit <= t < right_brace_limit|; and it represents a |match| or |end_match| command if and only if |match_token <= t <= end_match_token|. The following definitions take care of these token-oriented constants and a few others. @d cs_token_flag 07777 /*amount added to the |eqtb| location in a token that stands for a control sequence; is a multiple of~256, less~1*/ @d left_brace_token 00400 /*$2^8\cdot|left_brace|$*/ @d left_brace_limit 01000 /*$2^8\cdot(|left_brace|+1)$*/ @d right_brace_token 01000 /*$2^8\cdot|right_brace|$*/ @d right_brace_limit 01400 /*$2^8\cdot(|right_brace|+1)$*/ @d math_shift_token 01400 /*$2^8\cdot|math_shift|$*/ @d tab_token 02000 /*$2^8\cdot|tab_mark|$*/ @d out_param_token 02400 /*$2^8\cdot|out_param|$*/ @d space_token 05040 /*$2^8\cdot|spacer|+|' '|$*/ @d letter_token 05400 /*$2^8\cdot|letter|$*/ @d other_token 06000 /*$2^8\cdot|other_char|$*/ @d match_token 06400 /*$2^8\cdot|match|$*/ @d end_match_token 07000 /*$2^8\cdot|end_match|$*/ @ @<Check the ``constant''...@>= if (cs_token_flag+undefined_control_sequence > max_halfword) bad=21; @ A token list is a singly linked list of one-word nodes in |mem|, where each word contains a token and a link. Macro definitions, output-routine definitions, marks, \.{\\write} texts, and a few other things are remembered by \TeX\ in the form of token lists, usually preceded by a node with a reference count in its |token_ref_count| field. The token stored in location |p| is called |info(p)|. Three special commands appear in the token lists of macro definitions. When |m==match|, it means that \TeX\ should scan a parameter for the current macro; when |m==end_match|, it means that parameter matching should end and \TeX\ should start reading the macro text; and when |m==out_param|, it means that \TeX\ should insert parameter number |c| into the text at this point. The enclosing \.{\char'173} and \.{\char'175} characters of a macro definition are omitted, but the final right brace of an output routine is included at the end of its token list. Here is an example macro definition that illustrates these conventions. After \TeX\ processes the text $$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$ the definition of \.{\\mac} is represented as a token list containing $$\def\,{\hskip2pt} \vbox{\halign{\hfil#\hfil\cr (reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ , \.{\\b}, |end_match|,\cr |out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#, |other_char|\,\.1,\cr |out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$ The procedure |scan_toks| builds such token lists, and |macro_call| does the parameter matching. @^reference counts@> Examples such as $$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$ explain why reference counts would be needed even if \TeX\ had no \.{\\let} operation: When the token list for \.{\\m} is being read, the redefinition of \.{\\m} changes the |eqtb| entry before the token list has been fully consumed, so we dare not simply destroy a token list when its control sequence is being redefined. If the parameter-matching part of a definition ends with `\.{\#\{}', the corresponding token list will have `\.\{' just before the `|end_match|' and also at the very end. The first `\.\{' is used to delimit the parameter; the second one keeps the first from disappearing. @ The procedure |show_token_list|, which prints a symbolic form of the token list that starts at a given node |p|, illustrates these conventions. The token list being displayed should not begin with a reference count. However, the procedure is intended to be robust, so that if the memory links are awry or if |p| is not really a pointer to a token list, nothing catastrophic will happen. An additional parameter |q| is also given; this parameter is either null or it points to a node in the token list where a certain magic computation takes place that will be explained later. (Basically, |q| is non-null when we are printing the two-line context information at the time of an error message; |q| marks the place corresponding to where the second line should begin.) For example, if |p| points to the node containing the first \.a in the token list above, then |show_token_list| will print the string $$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$ and if |q| points to the node containing the second \.a, the magic computation will be performed just before the second \.a is printed. The generation will stop, and `\.{\\ETC.}' will be printed, if the length of printing exceeds a given limit~|l|. Anomalous entries are printed in the form of control sequences that are not followed by a blank space, e.g., `\.{\\BAD.}'; this cannot be confused with actual control sequences because a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'. @<Declare the procedure called |show_token_list|@>= void show_token_list(int @!p, int @!q, int @!l) {@+ int m, @!c; /*pieces of a token*/ ASCII_code @!match_chr; /*character used in a `|match|'*/ ASCII_code @!n; /*the highest parameter number, as an ASCII digit*/ match_chr='#';n='0';tally=0; while ((p!=null)&&(tally < l)) {@+if (p==q) @<Do magic computation@>; @<Display token |p|, and |return| if there are problems@>; p=link(p); } if (p!=null) print_esc(@[@<|"ETC."|@>@]); @.ETC@> } @ @<Display token |p|...@>= if ((p < hi_mem_min)||(p > mem_end)) {@+print_esc(@[@<|"CLOBBERED."|@>@]);return; @.CLOBBERED@> } if (info(p) >= cs_token_flag) print_cs(info(p)-cs_token_flag); else{@+m=info(p)/0400;c=info(p)%0400; if (info(p) < 0) print_esc(@[@<|"BAD."|@>@]); @.BAD@> else@<Display the token $(|m|,|c|)$@>; } @ The procedure usually ``learns'' the character code used for macro parameters by seeing one in a |match| command before it runs into any |out_param| commands. @<Display the token...@>= switch (m) { case left_brace: case right_brace: case math_shift: case tab_mark: case sup_mark: case sub_mark: case spacer: case letter: case other_char: print(c);@+break; case mac_param: {@+print(c);print(c); } @+break; case out_param: {@+print(match_chr); if (c <= 9) print_char(c+'0'); else{@+print_char('!');return; } } @+break; case match: {@+match_chr=c;print(c);incr(n);print_char(n); if (n > '9') return; } @+break; case end_match: print_str("->");@+break; @.->@> default:print_esc(@[@<|"BAD."|@>@]); @.BAD@> } @ Here's the way we sometimes want to display a token list, given a pointer to its reference count; the pointer may be null. @p void token_show(pointer @!p) {@+if (p!=null) show_token_list(link(p), null, 10000000); } @ The |print_meaning| subroutine displays |cur_cmd| and |cur_chr| in symbolic form, including the expansion of a macro or mark. @p void print_meaning(void) {@+print_cmd_chr(cur_cmd, cur_chr); if (cur_cmd >= call) {@+print_char(':');print_ln();token_show(cur_chr); } else if (cur_cmd==top_bot_mark) {@+print_char(':');print_ln(); token_show(cur_mark[cur_chr]); } } @* Introduction to the syntactic routines. Let's pause a moment now and try to look at the Big Picture. The \TeX\ program consists of three main parts: syntactic routines, semantic routines, and output routines. The chief purpose of the syntactic routines is to deliver the user's input to the semantic routines, one token at a time. The semantic routines act as an interpreter responding to these tokens, which may be regarded as commands. And the output routines are periodically called on to convert box-and-glue lists into a compact set of instructions that will be sent to a typesetter. We have discussed the basic data structures and utility routines of \TeX, so we are good and ready to plunge into the real activity by considering the syntactic routines. Our current goal is to come to grips with the |get_next| procedure, which is the keystone of \TeX's input mechanism. Each call of |get_next| sets the value of three variables |cur_cmd|, |cur_chr|, and |cur_cs|, representing the next input token. $$\vbox{\halign{#\hfil\cr \hbox{|cur_cmd| denotes a command code from the long list of codes given above;}\cr \hbox{|cur_chr| denotes a character code or other modifier of the command code;}\cr \hbox{|cur_cs| is the |eqtb| location of the current control sequence,}\cr \hbox{\qquad if the current token was a control sequence, otherwise it's zero.}\cr}}$$ Underlying this external behavior of |get_next| is all the machinery necessary to convert from character files to tokens. At a given time we may be only partially finished with the reading of several files (for which \.{\\input} was specified), and partially finished with the expansion of some user-defined macros and/or some macro parameters, and partially finished with the generation of some text in a template for \.{\\halign}, and so on. When reading a character file, special characters must be classified as math delimiters, etc.; comments and extra blank spaces must be removed, paragraphs must be recognized, and control sequences must be found in the hash table. Furthermore there are occasions in which the scanning routines have looked ahead for a word like `\.{plus}' but only part of that word was found, hence a few characters must be put back into the input and scanned again. To handle these situations, which might all be present simultaneously, \TeX\ uses various stacks that hold information about the incomplete activities, and there is a finite state control for each level of the input mechanism. These stacks record the current state of an implicitly recursive process, but the |get_next| procedure is not recursive. Therefore it will not be difficult to translate these algorithms into low-level languages that do not support recursion. @<Glob...@>= eight_bits @!cur_cmd; /*current command set by |get_next|*/ halfword @!cur_chr; /*operand of current command*/ pointer @!cur_cs; /*control sequence found here, zero if none found*/ halfword @!cur_tok; /*packed representative of |cur_cmd| and |cur_chr|*/ @ The |print_cmd_chr| routine prints a symbolic interpretation of a command code and its modifier. This is used in certain `\.{You can\'t}' error messages, and in the implementation of diagnostic routines like \.{\\show}. The body of |print_cmd_chr| is a rather tedious listing of print commands, and most of it is essentially an inverse to the |primitive| routine that enters a \TeX\ primitive into |eqtb|. Therefore much of this procedure appears elsewhere in the program, together with the corresponding |primitive| calls. @d chr_cmd(X) {@+print_str(X);print_ASCII(chr_code); } @<Declare the procedure called |print_cmd_chr|@>= void print_cmd_chr(quarterword @!cmd, halfword @!chr_code) {@+switch (cmd) { case left_brace: chr_cmd("begin-group character ")@;@+break; case right_brace: chr_cmd("end-group character ")@;@+break; case math_shift: chr_cmd("math shift character ")@;@+break; case mac_param: chr_cmd("macro parameter character ")@;@+break; case sup_mark: chr_cmd("superscript character ")@;@+break; case sub_mark: chr_cmd("subscript character ")@;@+break; case endv: print_str("end of alignment template");@+break; case spacer: chr_cmd("blank space ")@;@+break; case letter: chr_cmd("the letter ")@;@+break; case other_char: chr_cmd("the character ")@;@+break; @t\4@>@<Cases of |print_cmd_chr| for symbolic printing of primitives@>@; default:print_str("[unknown command code!]"); } } @ Here is a procedure that displays the current command. @p void show_cur_cmd_chr(void) {@+begin_diagnostic();print_nl("{"); if (mode!=shown_mode) {@+print_mode(mode);print_str(": ");shown_mode=mode; } print_cmd_chr(cur_cmd, cur_chr);print_char('}'); end_diagnostic(false); } @* Input stacks and states. This implementation of \TeX\ uses two different conventions for representing sequential stacks. @^stack conventions@>@^conventions for representing stacks@> \yskip\hangg 1) If there is frequent access to the top entry, and if the stack is essentially never empty, then the top entry is kept in a global variable (even better would be a machine register), and the other entries appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the semantic stack described above is handled this way, and so is the input stack that we are about to study. \yskip\hangg 2) If there is infrequent top access, the entire stack contents are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack| is treated this way, as we have seen. \yskip\noindent The state of \TeX's input mechanism appears in the input stack, whose entries are records with six fields, called |state|, |index|, |start|, |loc|, |limit|, and |name|. This stack is maintained with convention~(1), so it is declared in the following way: @<Types...@>= typedef struct { quarterword @!state_field, @!index_field; halfword @!start_field, @!loc_field, @!limit_field, @!name_field; } in_state_record; @ @<Glob...@>= in_state_record @!input_stack[stack_size+1]; uint8_t @!input_ptr; /*first unused location of |input_stack|*/ uint8_t @!max_in_stack; /*largest value of |input_ptr| when pushing*/ in_state_record @!cur_input; /*the ``top'' input state, according to convention (1)*/ @ We've already defined the special variable |loc====cur_input.loc_field| in our discussion of basic input-output routines. The other components of |cur_input| are defined in the same way: @d state cur_input.state_field /*current scanner state*/ @d index cur_input.index_field /*reference for buffer information*/ @d start cur_input.start_field /*starting position in |buffer|*/ @d limit cur_input.limit_field /*end of current line in |buffer|*/ @d name cur_input.name_field /*name of the current file*/ @ Let's look more closely now at the control variables (|state|,~|index|,~|start|,~|loc|,~|limit|,~|name|), assuming that \TeX\ is reading a line of characters that have been input from some file or from the user's terminal. There is an array called |buffer| that acts as a stack of all lines of characters that are currently being read from files, including all lines on subsidiary levels of the input stack that are not yet completed. \TeX\ will return to the other lines when it is finished with the present input file. (Incidentally, on a machine with byte-oriented addressing, it might be appropriate to combine |buffer| with the |str_pool| array, letting the buffer entries grow downward from the top of the string pool and checking that these two tables don't bump into each other.) The line we are currently working on begins in position |start| of the buffer; the next character we are about to read is |buffer[loc]|; and |limit| is the location of the last character present. If |loc > limit|, the line has been completely read. Usually |buffer[limit]| is the |end_line_char|, denoting the end of a line, but this is not true if the current line is an insertion that was entered on the user's terminal in response to an error message. The |name| variable is a string number that designates the name of the current file, if we are reading a text file. It is zero if we are reading from the terminal; it is |n+1| if we are reading from input stream |n|, where |0 <= n <= 16|. (Input stream 16 stands for an invalid stream number; in such cases the input is actually from the terminal, under control of the procedure |read_toks|.) The |state| variable has one of three values, when we are scanning such files: $$\baselineskip 15pt\vbox{\halign{#\hfil\cr 1) |state==mid_line| is the normal state.\cr 2) |state==skip_blanks| is like |mid_line|, but blanks are ignored.\cr 3) |state==new_line| is the state at the beginning of a line.\cr}}$$ These state values are assigned numeric codes so that if we add the state code to the next character's command code, we get distinct values. For example, `|mid_line+spacer|' stands for the case that a blank space character occurs in the middle of a line when it is not being ignored; after this case is processed, the next value of |state| will be |skip_blanks|. @d mid_line 1 /*|state| code when scanning a line of characters*/ @d skip_blanks (2+max_char_code) /*|state| code when ignoring blanks*/ @d new_line (3+max_char_code+max_char_code) /*|state| code at start of line*/ @ Additional information about the current line is available via the |index| variable, which counts how many lines of characters are present in the buffer below the current level. We have |index==0| when reading from the terminal and prompting the user for each line; then if the user types, e.g., `\.{\\input paper}', we will have |index==1| while reading the file \.{paper.tex}. However, it does not follow that |index| is the same as the input stack pointer, since many of the levels on the input stack may come from token lists. For example, the instruction `\.{\\input paper}' might occur in a token list. The global variable |in_open| is equal to the |index| value of the highest non-token-list level. Thus, the number of partially read lines in the buffer is |in_open+1|, and we have |in_open==index| when we are not reading a token list. If we are not currently reading from the terminal, or from an input stream, we are reading from the file variable |input_file[index]|. We use the notation |terminal_input| as a convenient abbreviation for |name==0|, and |cur_file| as an abbreviation for |input_file[index]|. The global variable |line| contains the line number in the topmost open file, for use in error messages. If we are not reading from the terminal, |line_stack[index]| holds the line number for the enclosing level, so that |line| can be restored when the current file has been read. Line numbers should never be negative, since the negative of the current line number is used to identify the user's output routine in the |mode_line| field of the semantic nest entries. If more information about the input state is needed, it can be included in small arrays like those shown here. For example, the current page or segment number in the input file might be put into a variable |@!page|, maintained for enclosing levels in `\ignorespaces|@!page_stack: array[1 dotdot max_in_open]int|\unskip' by analogy with |line_stack|. @^system dependencies@> @d terminal_input (name==0) /*are we reading from the terminal?*/ @d cur_file input_file[index] /*the current |alpha_file| variable*/ @<Glob...@>= uint8_t @!in_open; /*the number of lines in the buffer, less one*/ uint8_t @!open_parens; /*the number of open text files*/ alpha_file @!input_file0[max_in_open], *const @!input_file = @!input_file0-1; int @!line; /*current line number in the current source file*/ int @!line_stack0[max_in_open], *const @!line_stack = @!line_stack0-1; @ Users of \TeX\ sometimes forget to balance left and right braces properly, and one of the ways \TeX\ tries to spot such errors is by considering an input file as broken into subfiles by control sequences that are declared to be \.{\\outer}. A variable called |scanner_status| tells \TeX\ whether or not to complain when a subfile ends. This variable has six possible values: \yskip\hang|normal|, means that a subfile can safely end here without incident. \yskip\hang|skipping|, means that a subfile can safely end here, but not a file, because we're reading past some conditional text that was not selected. \yskip\hang|defining|, means that a subfile shouldn't end now because a macro is being defined. \yskip\hang|matching|, means that a subfile shouldn't end now because a macro is being used and we are searching for the end of its arguments. \yskip\hang|aligning|, means that a subfile shouldn't end now because we are not finished with the preamble of an \.{\\halign} or \.{\\valign}. \yskip\hang|absorbing|, means that a subfile shouldn't end now because we are reading a balanced token list for \.{\\message}, \.{\\write}, etc. \yskip\noindent If the |scanner_status| is not |normal|, the variable |warning_index| points to the |eqtb| location for the relevant control sequence name to print in an error message. @d skipping 1 /*|scanner_status| when passing conditional text*/ @d defining 2 /*|scanner_status| when reading a macro definition*/ @d matching 3 /*|scanner_status| when reading macro arguments*/ @d aligning 4 /*|scanner_status| when reading an alignment preamble*/ @d absorbing 5 /*|scanner_status| when reading a balanced text*/ @<Glob...@>= uint8_t @!scanner_status; /*can a subfile end now?*/ pointer @!warning_index; /*identifier relevant to non-|normal| scanner status*/ pointer @!def_ref; /*reference count of token list being defined*/ @ Here is a procedure that uses |scanner_status| to print a warning message when a subfile has ended, and at certain other crucial times: @<Declare the procedure called |runaway|@>= void runaway(void) {@+pointer p; /*head of runaway list*/ if (scanner_status > skipping) {@+print_nl("Runaway "); @.Runaway...@> switch (scanner_status) { case defining: {@+print_str("definition");p=def_ref; } @+break; case matching: {@+print_str("argument");p=temp_head; } @+break; case aligning: {@+print_str("preamble");p=hold_head; } @+break; case absorbing: {@+print_str("text");p=def_ref; } } /*there are no other cases*/ print_char('?');print_ln();show_token_list(link(p), null, error_line-10); } } @ However, all this discussion about input state really applies only to the case that we are inputting from a file. There is another important case, namely when we are currently getting input from a token list. In this case |state==token_list|, and the conventions about the other state variables are different: \yskip\hang|loc| is a pointer to the current node in the token list, i.e., the node that will be read next. If |loc==null|, the token list has been fully read. \yskip\hang|start| points to the first node of the token list; this node may or may not contain a reference count, depending on the type of token list involved. \yskip\hang|token_type|, which takes the place of |index| in the discussion above, is a code number that explains what kind of token list is being scanned. \yskip\hang|name| points to the |eqtb| address of the control sequence being expanded, if the current token list is a macro. \yskip\hang|param_start|, which takes the place of |limit|, tells where the parameters of the current macro begin in the |param_stack|, if the current token list is a macro. \yskip\noindent The |token_type| can take several values, depending on where the current token list came from: \yskip\hang|parameter|, if a parameter is being scanned; \hang|u_template|, if the \<u_j> part of an alignment template is being scanned; \hang|v_template|, if the \<v_j> part of an alignment template is being scanned; \hang|backed_up|, if the token list being scanned has been inserted as `to be read again'. \hang|inserted|, if the token list being scanned has been inserted as the text expansion of a \.{\\count} or similar variable; \hang|macro|, if a user-defined control sequence is being scanned; \hang|output_text|, if an \.{\\output} routine is being scanned; \hang|every_par_text|, if the text of \.{\\everypar} is being scanned; \hang|every_math_text|, if the text of \.{\\everymath} is being scanned; \hang|every_display_text|, if the text of \.{\\everydisplay} is being scanned; \hang|every_hbox_text|, if the text of \.{\\everyhbox} is being scanned; \hang|every_vbox_text|, if the text of \.{\\everyvbox} is being scanned; \hang|every_job_text|, if the text of \.{\\everyjob} is being scanned; \hang|every_cr_text|, if the text of \.{\\everycr} is being scanned; \hang|mark_text|, if the text of a \.{\\mark} is being scanned; \hang|write_text|, if the text of a \.{\\write} is being scanned. \yskip\noindent The codes for |output_text|, |every_par_text|, etc., are equal to a constant plus the corresponding codes for token list parameters |output_routine_loc|, |every_par_loc|, etc. The token list begins with a reference count if and only if |token_type >= macro|. @^reference counts@> @d token_list 0 /*|state| code when scanning a token list*/ @d token_type index /*type of current token list*/ @d param_start limit /*base of macro parameters in |param_stack|*/ @d parameter 0 /*|token_type| code for parameter*/ @d u_template 1 /*|token_type| code for \<u_j> template*/ @d v_template 2 /*|token_type| code for \<v_j> template*/ @d backed_up 3 /*|token_type| code for text to be reread*/ @d inserted 4 /*|token_type| code for inserted texts*/ @d macro 5 /*|token_type| code for defined control sequences*/ @d output_text 6 /*|token_type| code for output routines*/ @d every_par_text 7 /*|token_type| code for \.{\\everypar}*/ @d every_math_text 8 /*|token_type| code for \.{\\everymath}*/ @d every_display_text 9 /*|token_type| code for \.{\\everydisplay}*/ @d every_hbox_text 10 /*|token_type| code for \.{\\everyhbox}*/ @d every_vbox_text 11 /*|token_type| code for \.{\\everyvbox}*/ @d every_job_text 12 /*|token_type| code for \.{\\everyjob}*/ @d every_cr_text 13 /*|token_type| code for \.{\\everycr}*/ @d mark_text 14 /*|token_type| code for \.{\\topmark}, etc.*/ @d write_text 15 /*|token_type| code for \.{\\write}*/ @ The |param_stack| is an auxiliary array used to hold pointers to the token lists for parameters at the current level and subsidiary levels of input. This stack is maintained with convention (2), and it grows at a different rate from the others. @<Glob...@>= pointer @!param_stack[param_size+1]; /*token list pointers for parameters*/ uint8_t @!param_ptr; /*first unused entry in |param_stack|*/ int @!max_param_stack; /*largest value of |param_ptr|, will be | <= param_size+9|*/ @ The input routines must also interact with the processing of \.{\\halign} and \.{\\valign}, since the appearance of tab marks and \.{\\cr} in certain places is supposed to trigger the beginning of special \<v_j> template text in the scanner. This magic is accomplished by an |align_state| variable that is increased by~1 when a `\.{\char'173}' is scanned and decreased by~1 when a `\.{\char'175}' is scanned. The |align_state| is nonzero during the \<u_j> template, after which it is set to zero; the \<v_j> template begins when a tab mark or \.{\\cr} occurs at a time that |align_state==0|. @<Glob...@>= int @!align_state; /*group level with respect to current alignment*/ @ Thus, the ``current input state'' can be very complicated indeed; there can be many levels and each level can arise in a variety of ways. The |show_context| procedure, which is used by \TeX's error-reporting routine to print out the current input state on all levels down to the most recent line of characters from an input file, illustrates most of these conventions. The global variable |base_ptr| contains the lowest level that was displayed by this procedure. @<Glob...@>= uint8_t @!base_ptr; /*shallowest level shown by |show_context|*/ @ The status at each level is indicated by printing two lines, where the first line indicates what was read so far and the second line shows what remains to be read. The context is cropped, if necessary, so that the first line contains at most |half_error_line| characters, and the second contains at most |error_line|. Non-current input levels whose |token_type| is `|backed_up|' are shown only if they have not been fully read. @p void show_context(void) /*prints where the scanner is*/ {@+ uint8_t old_setting; /*saved |selector| setting*/ int @!nn; /*number of contexts shown so far, less one*/ bool @!bottom_line; /*have we reached the final context to be shown?*/ @<Local variables for formatting calculations@>@; base_ptr=input_ptr;input_stack[base_ptr]=cur_input; /*store current state*/ nn=-1;bottom_line=false; loop@+{@+cur_input=input_stack[base_ptr]; /*enter into the context*/ if ((state!=token_list)) if ((name > 17)||(base_ptr==0)) bottom_line=true; if ((base_ptr==input_ptr)||bottom_line||(nn < error_context_lines)) @<Display the current context@>@; else if (nn==error_context_lines) {@+print_nl("...");incr(nn); /*omitted if |error_context_lines < 0|*/ } if (bottom_line) goto done; decr(base_ptr); } done: cur_input=input_stack[input_ptr]; /*restore original state*/ } @ @<Display the current context@>= {@+if ((base_ptr==input_ptr)||(state!=token_list)|| (token_type!=backed_up)||(loc!=null)) /*we omit backed-up token lists that have already been read*/ {@+tally=0; /*get ready to count characters*/ old_setting=selector; if (state!=token_list) {@+@<Print location of current line@>; @<Pseudoprint the line@>; } else{@+@<Print type of token list@>; @<Pseudoprint the token list@>; } selector=old_setting; /*stop pseudoprinting*/ @<Print two lines using the tricky pseudoprinted information@>; incr(nn); } } @ This routine should be changed, if necessary, to give the best possible indication of where the current line resides in the input file. For example, on some systems it is best to print both a page and line number. @^system dependencies@> @<Print location of current line@>= if (name <= 17) if (terminal_input) if (base_ptr==0) print_nl("<*>");else print_nl("<insert> "); else{@+print_nl("<read "); if (name==17) print_char('*');@+else print_int(name-1); @.*\relax@> print_char('>'); } else{@+print_nl("l.");print_int(line); } print_char(' ') @ @<Print type of token list@>= switch (token_type) { case parameter: print_nl("<argument> ");@+break; case u_template: case v_template: print_nl("<template> ");@+break; case backed_up: if (loc==null) print_nl("<recently read> "); else print_nl("<to be read again> ");@+break; case inserted: print_nl("<inserted text> ");@+break; case macro: {@+print_ln();print_cs(name); } @+break; case output_text: print_nl("<output> ");@+break; case every_par_text: print_nl("<everypar> ");@+break; case every_math_text: print_nl("<everymath> ");@+break; case every_display_text: print_nl("<everydisplay> ");@+break; case every_hbox_text: print_nl("<everyhbox> ");@+break; case every_vbox_text: print_nl("<everyvbox> ");@+break; case every_job_text: print_nl("<everyjob> ");@+break; case every_cr_text: print_nl("<everycr> ");@+break; case mark_text: print_nl("<mark> ");@+break; case write_text: print_nl("<write> ");@+break; default:print_nl("?"); /*this should never happen*/ } @ Here it is necessary to explain a little trick. We don't want to store a long string that corresponds to a token list, because that string might take up lots of memory; and we are printing during a time when an error message is being given, so we dare not do anything that might overflow one of \TeX's tables. So `pseudoprinting' is the answer: We enter a mode of printing that stores characters into a buffer of length |error_line|, where character $k+1$ is placed into \hbox{|trick_buf[k%error_line]|} if |k < trick_count|, otherwise character |k| is dropped. Initially we set |tally=0| and |trick_count=1000000|; then when we reach the point where transition from line 1 to line 2 should occur, we set |first_count=tally| and |trick_count=@tmax@>(error_line, tally+1+error_line-half_error_line)|. At the end of the pseudoprinting, the values of |first_count|, |tally|, and |trick_count| give us all the information we need to print the two lines, and all of the necessary text is in |trick_buf|. Namely, let |l| be the length of the descriptive information that appears on the first line. The length of the context information gathered for that line is |k==first_count|, and the length of the context information gathered for line~2 is $m=\min(|tally|, |trick_count|)-k$. If |l+k <= h|, where |h==half_error_line|, we print |trick_buf[0 dotdot k-1]| after the descriptive information on line~1, and set |n=l+k|; here |n| is the length of line~1. If $l+k>h$, some cropping is necessary, so we set |n=h| and print `\.{...}' followed by $$\hbox{|trick_buf[(l+k-h+3)dotdot k-1]|,}$$ where subscripts of |trick_buf| are circular modulo |error_line|. The second line consists of |n|~spaces followed by |trick_buf[k dotdot(k+m-1)]|, unless |n+m > error_line|; in the latter case, further cropping is done. This is easier to program than to explain. @<Local variables for formatting...@>= int @!i; /*index into |buffer|*/ uint16_t @!j; /*end of current line in |buffer|*/ uint8_t @!l; /*length of descriptive information on line 1*/ int @!m; /*context information gathered for line 2*/ uint8_t @!n; /*length of line 1*/ int @!p; /*starting or ending place in |trick_buf|*/ int @!q; /*temporary index*/ @ The following code sets up the print routines so that they will gather the desired information. @d begin_pseudoprint {@+l=tally;tally=0;selector=pseudo; trick_count=1000000; } @d set_trick_count {@+first_count=tally; trick_count=tally+1+error_line-half_error_line; if (trick_count < error_line) trick_count=error_line; } @ And the following code uses the information after it has been gathered. @<Print two lines using the tricky pseudoprinted information@>= if (trick_count==1000000) set_trick_count; /*|set_trick_count| must be performed*/ if (tally < trick_count) m=tally-first_count; else m=trick_count-first_count; /*context on line 2*/ if (l+first_count <= half_error_line) {@+p=0;n=l+first_count; } else{@+print_str("...");p=l+first_count-half_error_line+3; n=half_error_line; } for (q=p; q<=first_count-1; q++) print_char(trick_buf[q%error_line]); print_ln(); for (q=1; q<=n; q++) print_char(' '); /*print |n| spaces to begin line~2*/ if (m+n <= error_line) p=first_count+m;else p=first_count+(error_line-n-3); for (q=first_count; q<=p-1; q++) print_char(trick_buf[q%error_line]); if (m+n > error_line) print_str("...") @ But the trick is distracting us from our current goal, which is to understand the input state. So let's concentrate on the data structures that are being pseudoprinted as we finish up the |show_context| procedure. @<Pseudoprint the line@>= begin_pseudoprint; if (buffer[limit]==end_line_char) j=limit; else j=limit+1; /*determine the effective end of the line*/ if (j > 0) for (i=start; i<=j-1; i++) {@+if (i==loc) set_trick_count; print(buffer[i]); } @ @<Pseudoprint the token list@>= begin_pseudoprint; if (token_type < macro) show_token_list(start, loc, 100000); else show_token_list(link(start), loc, 100000) /*avoid reference count*/ @ Here is the missing piece of |show_token_list| that is activated when the token beginning line~2 is about to be shown: @<Do magic computation@>=set_trick_count @* Maintaining the input stacks. The following subroutines change the input status in commonly needed ways. First comes |push_input|, which stores the current state and creates a new level (having, initially, the same properties as the old). @d push_input @t@> /*enter a new input level, save the old*/ {@+if (input_ptr > max_in_stack) {@+max_in_stack=input_ptr; if (input_ptr==stack_size) overflow("input stack size", stack_size); @:TeX capacity exceeded input stack size}{\quad input stack size@> } input_stack[input_ptr]=cur_input; /*stack the record*/ incr(input_ptr); } @ And of course what goes up must come down. @d pop_input @t@> /*leave an input level, re-enter the old*/ {@+decr(input_ptr);cur_input=input_stack[input_ptr]; } @ Here is a procedure that starts a new level of token-list input, given a token list |p| and its type |t|. If |t==macro|, the calling routine should set |name| and |loc|. @d back_list(X) begin_token_list(X, backed_up) /*backs up a simple token list*/ @d ins_list(X) begin_token_list(X, inserted) /*inserts a simple token list*/ @p void begin_token_list(pointer @!p, quarterword @!t) {@+push_input;state=token_list;start=p;token_type=t; if (t >= macro) /*the token list starts with a reference count*/ {@+add_token_ref(p); if (t==macro) param_start=param_ptr; else{@+loc=link(p); if (tracing_macros > 1) {@+begin_diagnostic();print_nl(""); switch (t) { case mark_text: print_esc(@[@<|"mark"|@>@]);@+break; case write_text: print_esc(@[@<|"write"|@>@]);@+break; default:print_cmd_chr(assign_toks, t-output_text+output_routine_loc); } @/ print_str("->");token_show(p);end_diagnostic(false); } } } else loc=p; } @ When a token list has been fully scanned, the following computations should be done as we leave that level of input. The |token_type| tends to be equal to either |backed_up| or |inserted| about 2/3 of the time. @^inner loop@> @p void end_token_list(void) /*leave a token-list input level*/ {@+if (token_type >= backed_up) /*token list to be deleted*/ {@+if (token_type <= inserted) flush_list(start); else{@+delete_token_ref(start); /*update reference count*/ if (token_type==macro) /*parameters must be flushed*/ while (param_ptr > param_start) {@+decr(param_ptr); flush_list(param_stack[param_ptr]); } } } else if (token_type==u_template) if (align_state > 500000) align_state=0; else fatal_error("(interwoven alignment preambles are not allowed)"); @.interwoven alignment preambles...@> pop_input; check_interrupt; } @ Sometimes \TeX\ has read too far and wants to ``unscan'' what it has seen. The |back_input| procedure takes care of this by putting the token just scanned back into the input stream, ready to be read again. This procedure can be used only if |cur_tok| represents the token to be replaced. Some applications of \TeX\ use this procedure a lot, so it has been slightly optimized for speed. @^inner loop@> @p void back_input(void) /*undoes one token of input*/ {@+pointer p; /*a token list of length one*/ while ((state==token_list)&&(loc==null)&&(token_type!=v_template)) end_token_list(); /*conserve stack space*/ p=get_avail();info(p)=cur_tok; if (cur_tok < right_brace_limit) if (cur_tok < left_brace_limit) decr(align_state); else incr(align_state); push_input;state=token_list;start=p;token_type=backed_up; loc=p; /*that was |back_list(p)|, without procedure overhead*/ } @ @<Insert token |p| into \TeX's input@>= {@+t=cur_tok;cur_tok=p;back_input();cur_tok=t; } @ The |back_error| routine is used when we want to replace an offending token just before issuing an error message. This routine, like |back_input|, requires that |cur_tok| has been set. We disable interrupts during the call of |back_input| so that the help message won't be lost. @p void back_error(void) /*back up one token and call |error|*/ {@+OK_to_interrupt=false;back_input();OK_to_interrupt=true;error(); } @# void ins_error(void) /*back up one inserted token and call |error|*/ {@+OK_to_interrupt=false;back_input();token_type=inserted; OK_to_interrupt=true;error(); } @ The |begin_file_reading| procedure starts a new level of input for lines of characters to be read from a file, or as an insertion from the terminal. It does not take care of opening the file, nor does it set |loc| or |limit| or |line|. @^system dependencies@> @p void begin_file_reading(void) {@+if (in_open==max_in_open) overflow("text input levels", max_in_open); @:TeX capacity exceeded text input levels}{\quad text input levels@> if (first==buf_size) overflow("buffer size", buf_size); @:TeX capacity exceeded buffer size}{\quad buffer size@> incr(in_open);push_input;index=in_open; line_stack[index]=line;start=first;state=mid_line; name=0; /*|terminal_input| is now |true|*/ } @ Conversely, the variables must be downdated when such a level of input is finished: @p void end_file_reading(void) {@+first=start;line=line_stack[index]; if (name > 17) a_close(&cur_file); /*forget it*/ pop_input;decr(in_open); } @ In order to keep the stack from overflowing during a long sequence of inserted `\.{\\show}' commands, the following routine removes completed error-inserted lines from memory. @p void clear_for_error_prompt(void) {@+while ((state!=token_list)&&terminal_input&&@| (input_ptr > 0)&&(loc > limit)) end_file_reading(); print_ln();clear_terminal; } @ To get \TeX's whole input mechanism going, we perform the following actions. @<Initialize the input routines@>= {@+input_ptr=0;max_in_stack=0; in_open=0;open_parens=0;max_buf_stack=0; param_ptr=0;max_param_stack=0; first=buf_size;@/do@+{buffer[first]=0;decr(first);}@+ while (!(first==0)); scanner_status=normal;warning_index=null;first=1; state=new_line;start=1;index=0;line=0;name=0; force_eof=false; align_state=1000000;@/ if (!init_terminal()) exit(0); limit=last;first=last+1; /*|init_terminal| has set |loc| and |last|*/ } @* Getting the next token. The heart of \TeX's input mechanism is the |get_next| procedure, which we shall develop in the next few sections of the program. Perhaps we shouldn't actually call it the ``heart,'' however, because it really acts as \TeX's eyes and mouth, reading the source files and gobbling them up. And it also helps \TeX\ to regurgitate stored token lists that are to be processed again. @^eyes and mouth@> The main duty of |get_next| is to input one token and to set |cur_cmd| and |cur_chr| to that token's command code and modifier. Furthermore, if the input token is a control sequence, the |eqtb| location of that control sequence is stored in |cur_cs|; otherwise |cur_cs| is set to zero. Underlying this simple description is a certain amount of complexity because of all the cases that need to be handled. However, the inner loop of |get_next| is reasonably short and fast. When |get_next| is asked to get the next token of a \.{\\read} line, it sets |cur_cmd==cur_chr==cur_cs==0| in the case that no more tokens appear on that line. (There might not be any tokens at all, if the |end_line_char| has |ignore| as its catcode.) @ The value of |par_loc| is the |eqtb| address of `\.{\\par}'. This quantity is needed because a blank line of input is supposed to be exactly equivalent to the appearance of \.{\\par}; we must set |cur_cs=par_loc| when detecting a blank line. @<Glob...@>= pointer @!par_loc; /*location of `\.{\\par}' in |eqtb|*/ halfword @!par_token; /*token representing `\.{\\par}'*/ @ @<Put each...@>= primitive(@[@<|"par"|@>@], par_end, 256); /*cf.\ |scan_file_name|*/ @!@:par_}{\.{\\par} primitive@> par_loc=cur_val;par_token=cs_token_flag+par_loc; @ @<Cases of |print_cmd_chr|...@>= case par_end: print_esc(@[@<|"par"|@>@]);@+break; @ Before getting into |get_next|, let's consider the subroutine that is called when an `\.{\\outer}' control sequence has been scanned or when the end of a file has been reached. These two cases are distinguished by |cur_cs|, which is zero at the end of a file. @p void check_outer_validity(void) {@+pointer p; /*points to inserted token list*/ pointer @!q; /*auxiliary pointer*/ if (scanner_status!=normal) {@+deletions_allowed=false; @<Back up an outer control sequence so that it can be reread@>; if (scanner_status > skipping) @<Tell the user what has run away and try to recover@>@; else{@+print_err("Incomplete ");print_cmd_chr(if_test, cur_if); @.Incomplete \\if...@> print_str("; all text was ignored after line ");print_int(skip_line); help3("A forbidden control sequence occurred in skipped text.")@/ ("This kind of error happens when you say `\\if...' and forget")@/ ("the matching `\\fi'. I've inserted a `\\fi'; this might work."); if (cur_cs!=0) cur_cs=0; else help_line[2]=@| "The file ended while I was skipping conditional text."; cur_tok=cs_token_flag+frozen_fi;ins_error(); } deletions_allowed=true; } } @ An outer control sequence that occurs in a \.{\\read} will not be reread, since the error recovery for \.{\\read} is not very powerful. @<Back up an outer control sequence so that it can be reread@>= if (cur_cs!=0) {@+if ((state==token_list)||(name < 1)||(name > 17)) {@+p=get_avail();info(p)=cs_token_flag+cur_cs; back_list(p); /*prepare to read the control sequence again*/ } cur_cmd=spacer;cur_chr=' '; /*replace it by a space*/ } @ @<Tell the user what has run away...@>= {@+runaway(); /*print a definition, argument, or preamble*/ if (cur_cs==0) print_err("File ended")@; @.File ended while scanning...@> else{@+cur_cs=0;print_err("Forbidden control sequence found"); @.Forbidden control sequence...@> } print_str(" while scanning "); @<Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}', and insert tokens that should lead to recovery@>; print_str(" of ");sprint_cs(warning_index); help4("I suspect you have forgotten a `}', causing me")@/ ("to read past where you wanted me to stop.")@/ ("I'll try to recover; but if the error is serious,")@/ ("you'd better type `E' or `X' now and fix your file.");@/ error(); } @ The recovery procedure can't be fully understood without knowing more about the \TeX\ routines that should be aborted, but we can sketch the ideas here: For a runaway definition we will insert a right brace; for a runaway preamble, we will insert a special \.{\\cr} token and a right brace; and for a runaway argument, we will set |long_state| to |outer_call| and insert \.{\\par}. @<Print either `\.{definition}' or...@>= p=get_avail(); switch (scanner_status) { case defining: {@+print_str("definition");info(p)=right_brace_token+'}'; } @+break; case matching: {@+print_str("use");info(p)=par_token;long_state=outer_call; } @+break; case aligning: {@+print_str("preamble");info(p)=right_brace_token+'}';q=p; p=get_avail();link(p)=q;info(p)=cs_token_flag+frozen_cr; align_state=-1000000; } @+break; case absorbing: {@+print_str("text");info(p)=right_brace_token+'}'; } } /*there are no other cases*/ ins_list(p) @ We need to mention a procedure here that may be called by |get_next|. @p void firm_up_the_line(void); @ Now we're ready to take the plunge into |get_next| itself. Parts of this routine are executed more often than any other instructions of \TeX. @^mastication@>@^inner loop@> @p void get_next(void) /*sets |cur_cmd|, |cur_chr|, |cur_cs| to next token*/ {@+ /*go here to get the next input token*/ /*go here to eat the next character from a file*/ /*go here to digest it again*/ /*go here to start looking for a control sequence*/ /*go here when a control sequence has been found*/ /*go here when the next input token has been got*/ uint16_t k; /*an index into |buffer|*/ halfword @!t; /*a token*/ uint8_t @!cat; /*|cat_code(cur_chr)|, usually*/ ASCII_code @!c, @!cc; /*constituents of a possible expanded code*/ uint8_t @!d; /*number of excess characters in an expanded code*/ restart: cur_cs=0; if (state!=token_list) @<Input from external file, |goto restart| if no input found@>@; else@<Input from token list, |goto restart| if end of list or if a parameter needs to be expanded@>; @<If an alignment entry has just ended, take appropriate action@>; } @ An alignment entry ends when a tab or \.{\\cr} occurs, provided that the current level of braces is the same as the level that was present at the beginning of that alignment entry; i.e., provided that |align_state| has returned to the value it had after the \<u_j> template for that entry. @^inner loop@> @<If an alignment entry has just ended, take appropriate action@>= if (cur_cmd <= car_ret) if (cur_cmd >= tab_mark) if (align_state==0) @<Insert the \(v)\<v_j> template and |goto restart|@>@; @ @<Input from external file, |goto restart| if no input found@>= @^inner loop@> {@+get_cur_chr: if (loc <= limit) /*current line not yet finished*/ {@+cur_chr=buffer[loc];incr(loc); reswitch: cur_cmd=cat_code(cur_chr); @<Change state if necessary, and |goto switch| if the current character should be ignored, or |goto reswitch| if the current character changes to another@>; } else{@+state=new_line;@/ @<Move to next line of file, or |goto restart| if there is no next line, or |return| if a \.{\\read} line has finished@>; check_interrupt; goto get_cur_chr; } } @ The following 48-way switch accomplishes the scanning quickly, assuming that a decent \PASCAL\ compiler has translated the code. Note that the numeric values for |mid_line|, |skip_blanks|, and |new_line| are spaced apart from each other by |max_char_code+1|, so we can add a character's command code to the state to get a single number that characterizes both. @d any_state_plus(X) case mid_line+X: case skip_blanks+X: case new_line+X @<Change state if necessary...@>= switch (state+cur_cmd) { @<Cases where character is ignored@>: goto get_cur_chr; any_state_plus(escape): @<Scan a control sequence and set |state:=skip_blanks| or |mid_line|@>@;@+break; any_state_plus(active_char): @<Process an active-character control sequence and set |state:=mid_line|@>@;@+break; any_state_plus(sup_mark): @<If this |sup_mark| starts an expanded character like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|, otherwise set |state:=mid_line|@>@;@+break; any_state_plus(invalid_char): @<Decry the invalid character and |goto restart|@>@; @t\4@>@<Handle situations involving spaces, braces, changes of state@>@; default:do_nothing; } @ @<Cases where character is ignored@>= any_state_plus(ignore): case skip_blanks+spacer: case new_line+spacer @ We go to |restart| instead of to |get_cur_chr|, because |state| might equal |token_list| after the error has been dealt with (cf.\ |clear_for_error_prompt|). @<Decry the invalid...@>= {@+print_err("Text line contains an invalid character"); @.Text line contains...@> help2("A funny symbol that I can't read has just been input.")@/ ("Continue, and I'll forget that it ever happened.");@/ deletions_allowed=false;error();deletions_allowed=true; goto restart; } @ @d add_delims_to(X) X+math_shift: X+tab_mark: X+mac_param: X+sub_mark: X+letter: X+other_char @<Handle situations involving spaces, braces, changes of state@>= case mid_line+spacer: @<Enter |skip_blanks| state, emit a space@>@;@+break; case mid_line+car_ret: @<Finish line, emit a space@>@;@+break; case skip_blanks+car_ret: any_state_plus(comment): @<Finish line, |goto switch|@>@; case new_line+car_ret: @<Finish line, emit a \.{\\par}@>@;@+break; case mid_line+left_brace: incr(align_state);@+break; case skip_blanks+left_brace: case new_line+left_brace: {@+ state=mid_line;incr(align_state); } @+break; case mid_line+right_brace: decr(align_state);@+break; case skip_blanks+right_brace: case new_line+right_brace: {@+ state=mid_line;decr(align_state); } @+break; add_delims_to(case skip_blanks): add_delims_to(case new_line): state=mid_line;@+break; @ When a character of type |spacer| gets through, its character code is changed to $\.{"\ "}=040$. This means that the ASCII codes for tab and space, and for the space inserted at the end of a line, will be treated alike when macro parameters are being matched. We do this since such characters are indistinguishable on most computer terminal displays. @<Finish line, emit a space@>= {@+loc=limit+1;cur_cmd=spacer;cur_chr=' '; } @ The following code is performed only when |cur_cmd==spacer|. @<Enter |skip_blanks| state, emit a space@>= {@+state=skip_blanks;cur_chr=' '; } @ @<Finish line, |goto switch|@>= {@+loc=limit+1;goto get_cur_chr; } @ @<Finish line, emit a \.{\\par}@>= {@+loc=limit+1;cur_cs=par_loc;cur_cmd=eq_type(cur_cs); cur_chr=equiv(cur_cs); if (cur_cmd >= outer_call) check_outer_validity(); } @ Notice that a code like \.{\^\^8} becomes \.x if not followed by a hex digit. @d is_hex(X) (((X >= '0')&&(X <= '9'))||((X >= 'a')&&(X <= 'f'))) @d hex_to_cur_chr if (c <= '9') cur_chr=c-'0';@+else cur_chr=c-'a'+10; if (cc <= '9') cur_chr=16*cur_chr+cc-'0'; else cur_chr=16*cur_chr+cc-'a'+10 @<If this |sup_mark| starts an expanded character...@>= {@+if (cur_chr==buffer[loc]) if (loc < limit) {@+c=buffer[loc+1];@+if (c < 0200) /*yes we have an expanded char*/ {@+loc=loc+2; if (is_hex(c)) if (loc <= limit) {@+cc=buffer[loc];@+if (is_hex(cc)) {@+incr(loc);hex_to_cur_chr;goto reswitch; } } if (c < 0100) cur_chr=c+0100;@+else cur_chr=c-0100; goto reswitch; } } state=mid_line; } @ @<Process an active-character...@>= {@+cur_cs=cur_chr+active_base; cur_cmd=eq_type(cur_cs);cur_chr=equiv(cur_cs);state=mid_line; if (cur_cmd >= outer_call) check_outer_validity(); } @ Control sequence names are scanned only when they appear in some line of a file; once they have been scanned the first time, their |eqtb| location serves as a unique identification, so \TeX\ doesn't need to refer to the original name any more except when it prints the equivalent in symbolic form. The program that scans a control sequence has been written carefully in order to avoid the blowups that might otherwise occur if a malicious user tried something like `\.{\\catcode\'15=0}'. The algorithm might look at |buffer[limit+1]|, but it never looks at |buffer[limit+2]|. If expanded characters like `\.{\^\^A}' or `\.{\^\^df}' appear in or just following a control sequence name, they are converted to single characters in the buffer and the process is repeated, slowly but surely. @<Scan a control...@>= {@+if (loc > limit) cur_cs=null_cs; /*|state| is irrelevant in this case*/ else{@+start_cs: k=loc;cur_chr=buffer[k];cat=cat_code(cur_chr); incr(k); if (cat==letter) state=skip_blanks; else if (cat==spacer) state=skip_blanks; else state=mid_line; if ((cat==letter)&&(k <= limit)) @<Scan ahead in the buffer until finding a nonletter; if an expanded code is encountered, reduce it and |goto start_cs|; otherwise if a multiletter control sequence is found, adjust |cur_cs| and |loc|, and |goto found|@>@; else@<If an expanded code is present, reduce it and |goto start_cs|@>; cur_cs=single_base+buffer[loc];incr(loc); } found: cur_cmd=eq_type(cur_cs);cur_chr=equiv(cur_cs); if (cur_cmd >= outer_call) check_outer_validity(); } @ Whenever we reach the following piece of code, we will have |cur_chr==buffer[k-1]| and |k <= limit+1| and |cat==cat_code(cur_chr)|. If an expanded code like \.{\^\^A} or \.{\^\^df} appears in |buffer[(k-1)dotdot(k+1)]| or |buffer[(k-1)dotdot(k+2)]|, we will store the corresponding code in |buffer[k-1]| and shift the rest of the buffer left two or three places. @<If an expanded...@>= {@+if (buffer[k]==cur_chr) @+if (cat==sup_mark) @+if (k < limit) {@+c=buffer[k+1];@+if (c < 0200) /*yes, one is indeed present*/ {@+d=2; if (is_hex(c)) @+if (k+2 <= limit) {@+cc=buffer[k+2];@+if (is_hex(cc)) incr(d); } if (d > 2) {@+hex_to_cur_chr;buffer[k-1]=cur_chr; } else if (c < 0100) buffer[k-1]=c+0100; else buffer[k-1]=c-0100; limit=limit-d;first=first-d; while (k <= limit) {@+buffer[k]=buffer[k+d];incr(k); } goto start_cs; } } } @ @<Scan ahead in the buffer...@>= {@+@/do@+{cur_chr=buffer[k];cat=cat_code(cur_chr);incr(k); }@+ while (!((cat!=letter)||(k > limit))); @<If an expanded...@>; if (cat!=letter) decr(k); /*now |k| points to first nonletter*/ if (k > loc+1) /*multiletter control sequence has been scanned*/ {@+cur_cs=id_lookup(loc, k-loc);loc=k;goto found; } } @ Let's consider now what happens when |get_next| is looking at a token list. @<Input from token list, |goto restart| if end of list or if a parameter needs to be expanded@>= if (loc!=null) /*list not exhausted*/ @^inner loop@> {@+t=info(loc);loc=link(loc); /*move to next*/ if (t >= cs_token_flag) /*a control sequence token*/ {@+cur_cs=t-cs_token_flag; cur_cmd=eq_type(cur_cs);cur_chr=equiv(cur_cs); if (cur_cmd >= outer_call) if (cur_cmd==dont_expand) @<Get the next token, suppressing expansion@>@; else check_outer_validity(); } else{@+cur_cmd=t/0400;cur_chr=t%0400; switch (cur_cmd) { case left_brace: incr(align_state);@+break; case right_brace: decr(align_state);@+break; case out_param: @<Insert macro parameter and |goto restart|@>@; default:do_nothing; } } } else{@+ /*we are done with this token list*/ end_token_list();goto restart; /*resume previous level*/ } @ The present point in the program is reached only when the |expand| routine has inserted a special marker into the input. In this special case, |info(loc)| is known to be a control sequence token, and |link(loc)==null|. @d no_expand_flag 257 /*this characterizes a special variant of |relax|*/ @<Get the next token, suppressing expansion@>= {@+cur_cs=info(loc)-cs_token_flag;loc=null;@/ cur_cmd=eq_type(cur_cs);cur_chr=equiv(cur_cs); if (cur_cmd > max_command) {@+cur_cmd=relax;cur_chr=no_expand_flag; } } @ @<Insert macro parameter...@>= {@+begin_token_list(param_stack[param_start+cur_chr-1], parameter); goto restart; } @ All of the easy branches of |get_next| have now been taken care of. There is one more branch. @d end_line_char_inactive (end_line_char < 0)||(end_line_char > 255) @<Move to next line of file, or |goto restart|...@>= if (name > 17) @<Read next line of file into |buffer|, or |goto restart| if the file has ended@>@; else{@+if (!terminal_input) /*\.{\\read} line has ended*/ {@+cur_cmd=0;cur_chr=0;return; } if (input_ptr > 0) /*text was inserted during error recovery*/ {@+end_file_reading();goto restart; /*resume previous level*/ } if (selector < log_only) open_log_file(); if (interaction > nonstop_mode) {@+if (end_line_char_inactive) incr(limit); if (limit==start) /*previous line was empty*/ print_nl("(Please type a command or say `\\end')"); @.Please type...@> print_ln();first=start; prompt_input("*"); /*input on-line into |buffer|*/ @.*\relax@> limit=last; if (end_line_char_inactive) decr(limit); else buffer[limit]=end_line_char; first=limit+1; loc=start; } else fatal_error("*** (job aborted, no legal \\end found)"); @.job aborted@> /*nonstop mode, which is intended for overnight batch processing, never waits for on-line input*/ } @ The global variable |force_eof| is normally |false|; it is set |true| by an \.{\\endinput} command. @<Glob...@>= bool @!force_eof; /*should the next \.{\\input} be aborted early?*/ @ @<Read next line of file into |buffer|, or |goto restart| if the file has ended@>= {@+incr(line);first=start; if (!force_eof) {@+if (input_ln(&cur_file, true)) /*not end of file*/ firm_up_the_line(); /*this sets |limit|*/ else force_eof=true; } if (force_eof) {@+print_char(')');decr(open_parens); update_terminal; /*show user that file has been read*/ force_eof=false; end_file_reading(); /*resume previous level*/ check_outer_validity();goto restart; } if (end_line_char_inactive) decr(limit); else buffer[limit]=end_line_char; first=limit+1;loc=start; /*ready to read*/ } @ If the user has set the |pausing| parameter to some positive value, and if nonstop mode has not been selected, each line of input is displayed on the terminal and the transcript file, followed by `\.{=>}'. \TeX\ waits for a response. If the response is simply |carriage_return|, the line is accepted as it stands, otherwise the line typed is used instead of the line in the file. @p void firm_up_the_line(void) {@+int k; /*an index into |buffer|*/ limit=last; if (pausing > 0) if (interaction > nonstop_mode) {@+wake_up_terminal;print_ln(); if (start < limit) for (k=start; k<=limit-1; k++) print(buffer[k]); first=limit;prompt_input("=>"); /*wait for user response*/ @.=>@> if (last > first) {@+for (k=first; k<=last-1; k++) /*move line down in buffer*/ buffer[k+start-first]=buffer[k]; limit=start+last-first; } } } @ Since |get_next| is used so frequently in \TeX, it is convenient to define three related procedures that do a little more: \yskip\hang|get_token| not only sets |cur_cmd| and |cur_chr|, it also sets |cur_tok|, a packed halfword version of the current token. \yskip\hang|get_x_token|, meaning ``get an expanded token,'' is like |get_token|, but if the current token turns out to be a user-defined control sequence (i.e., a macro call), or a conditional, or something like \.{\\topmark} or \.{\\expandafter} or \.{\\csname}, it is eliminated from the input by beginning the expansion of the macro or the evaluation of the conditional. \yskip\hang|x_token| is like |get_x_token| except that it assumes that |get_next| has already been called. \yskip\noindent In fact, these three procedures account for almost every use of |get_next|. @ No new control sequences will be defined except during a call of |get_token|, or when \.{\\csname} compresses a token list, because |no_new_control_sequence| is always |true| at other times. @p void get_token(void) /*sets |cur_cmd|, |cur_chr|, |cur_tok|*/ {@+no_new_control_sequence=false;get_next();no_new_control_sequence=true; @^inner loop@> if (cur_cs==0) cur_tok=(cur_cmd*0400)+cur_chr; else cur_tok=cs_token_flag+cur_cs; } @* Expanding the next token. Only a dozen or so command codes | > max_command| can possibly be returned by |get_next|; in increasing order, they are |undefined_cs|, |expand_after|, |no_expand|, |input|, |if_test|, |fi_or_else|, |cs_name|, |convert|, |the|, |top_bot_mark|, |call|, |long_call|, |outer_call|, |long_outer_call|, and |end_template|.{\emergencystretch=40pt\par} The |expand| subroutine is used when |cur_cmd > max_command|. It removes a ``call'' or a conditional or one of the other special operations just listed. It follows that |expand| might invoke itself recursively. In all cases, |expand| destroys the current token, but it sets things up so that the next |get_next| will deliver the appropriate next token. The value of |cur_tok| need not be known when |expand| is called. Since several of the basic scanning routines communicate via global variables, their values are saved as local variables of |expand| so that recursive calls don't invalidate them. @^recursion@> @p@t\4@>@<Declare the procedure called |macro_call|@>@;@/ @t\4@>@<Declare the procedure called |insert_relax|@>@;@/ void pass_text(void); void start_input(void); void conditional(void); void get_x_token(void); void conv_toks(void); void ins_the_toks(void); void expand(void) {@+halfword t; /*token that is being ``expanded after''*/ pointer @!p, @!q, @!r; /*for list manipulation*/ uint16_t @!j; /*index into |buffer|*/ int @!cv_backup; /*to save the global quantity |cur_val|*/ small_number @!cvl_backup, @!radix_backup, @!co_backup; /*to save |cur_val_level|, etc.*/ pointer @!backup_backup; /*to save |link(backup_head)|*/ small_number @!save_scanner_status; /*temporary storage of |scanner_status|*/ cv_backup=cur_val;cvl_backup=cur_val_level;radix_backup=radix; co_backup=cur_order;backup_backup=link(backup_head); if (cur_cmd < call) @<Expand a nonmacro@>@; else if (cur_cmd < end_template) macro_call(); else@<Insert a token containing |frozen_endv|@>; cur_val=cv_backup;cur_val_level=cvl_backup;radix=radix_backup; cur_order=co_backup;link(backup_head)=backup_backup; } @ @<Expand a nonmacro@>= {@+if (tracing_commands > 1) show_cur_cmd_chr(); switch (cur_cmd) { case top_bot_mark: @<Insert the \(a)appropriate mark text into the scanner@>@;@+break; case expand_after: @<Expand the token after the next token@>@;@+break; case no_expand: @<Suppress expansion of the next token@>@;@+break; case cs_name: @<Manufacture a control sequence name@>@;@+break; case convert: conv_toks();@+break; /*this procedure is discussed in Part 27 below*/ case the: ins_the_toks();@+break; /*this procedure is discussed in Part 27 below*/ case if_test: conditional();@+break; /*this procedure is discussed in Part 28 below*/ case fi_or_else: @<Terminate the current conditional and skip to \.{\\fi}@>@;@+break; case input: @<Initiate or terminate input from a file@>;@+break; default:@<Complain about an undefined macro@>@; } } @ It takes only a little shuffling to do what \TeX\ calls \.{\\expandafter}. @<Expand the token after...@>= {@+get_token();t=cur_tok;get_token(); if (cur_cmd > max_command) expand();@+else back_input(); cur_tok=t;back_input(); } @ The implementation of \.{\\noexpand} is a bit trickier, because it is necessary to insert a special `|dont_expand|' marker into \TeX's reading mechanism. This special marker is processed by |get_next|, but it does not slow down the inner loop. Since \.{\\outer} macros might arise here, we must also clear the |scanner_status| temporarily. @<Suppress expansion...@>= {@+save_scanner_status=scanner_status;scanner_status=normal; get_token();scanner_status=save_scanner_status;t=cur_tok; back_input(); /*now |start| and |loc| point to the backed-up token |t|*/ if (t >= cs_token_flag) {@+p=get_avail();info(p)=cs_token_flag+frozen_dont_expand; link(p)=loc;start=p;loc=p; } } @ @<Complain about an undefined macro@>= {@+print_err("Undefined control sequence"); @.Undefined control sequence@> help5("The control sequence at the end of the top line")@/ ("of your error message was never \\def'ed. If you have")@/ ("misspelled it (e.g., `\\hobx'), type `I' and the correct")@/ ("spelling (e.g., `I\\hbox'). Otherwise just continue,")@/ ("and I'll forget about whatever was undefined."); error(); } @ The |expand| procedure and some other routines that construct token lists find it convenient to use the following macros, which are valid only if the variables |p| and |q| are reserved for token-list building. @d store_new_token(X) {@+q=get_avail();link(p)=q;info(q)=X; p=q; /*|link(p)| is |null|*/ } @d fast_store_new_token(X) {@+fast_get_avail(q);link(p)=q;info(q)=X; p=q; /*|link(p)| is |null|*/ } @ @<Manufacture a control...@>= {@+r=get_avail();p=r; /*head of the list of characters*/ @/do@+{get_x_token(); if (cur_cs==0) store_new_token(cur_tok); }@+ while (!(cur_cs!=0)); if (cur_cmd!=end_cs_name) @<Complain about missing \.{\\endcsname}@>; @<Look up the characters of list |r| in the hash table, and set |cur_cs|@>; flush_list(r); if (eq_type(cur_cs)==undefined_cs) {@+eq_define(cur_cs, relax, 256); /*N.B.: The |save_stack| might change*/ } /*the control sequence will now match `\.{\\relax}'*/ cur_tok=cur_cs+cs_token_flag;back_input(); } @ @<Complain about missing \.{\\endcsname}@>= {@+print_err("Missing ");print_esc(@[@<|"endcsname"|@>@]);print_str(" inserted"); @.Missing \\endcsname...@> help2("The control sequence marked <to be read again> should")@/ ("not appear between \\csname and \\endcsname."); back_error(); } @ @<Look up the characters of list |r| in the hash table...@>= j=first;p=link(r); while (p!=null) {@+if (j >= max_buf_stack) {@+max_buf_stack=j+1; if (max_buf_stack==buf_size) overflow("buffer size", buf_size); @:TeX capacity exceeded buffer size}{\quad buffer size@> } buffer[j]=info(p)%0400;incr(j);p=link(p); } if (j > first+1) {@+no_new_control_sequence=false;cur_cs=id_lookup(first, j-first); no_new_control_sequence=true; } else if (j==first) cur_cs=null_cs; /*the list is empty*/ else cur_cs=single_base+buffer[first] /*the list has length one*/ @ An |end_template| command is effectively changed to an |endv| command by the following code. (The reason for this is discussed below; the |frozen_end_template| at the end of the template has passed the |check_outer_validity| test, so its mission of error detection has been accomplished.) @<Insert a token containing |frozen_endv|@>= {@+cur_tok=cs_token_flag+frozen_endv;back_input(); } @ The processing of \.{\\input} involves the |start_input| subroutine, which will be declared later; the processing of \.{\\endinput} is trivial. @<Put each...@>= primitive(@[@<|"input"|@>@], input, 0);@/ @!@:input_}{\.{\\input} primitive@> primitive(@[@<|"endinput"|@>@], input, 1);@/ @!@:end_input_}{\.{\\endinput} primitive@> @ @<Cases of |print_cmd_chr|...@>= case input: if (chr_code==0) print_esc(@[@<|"input"|@>@]);@+else print_esc(@[@<|"endinput"|@>@]);@+break; @ @<Initiate or terminate input...@>= if (cur_chr > 0) force_eof=true; else if (name_in_progress) insert_relax(); else start_input() @ Sometimes the expansion looks too far ahead, so we want to insert a harmless \.{\\relax} into the user's input. @<Declare the procedure called |insert_relax|@>= void insert_relax(void) {@+cur_tok=cs_token_flag+cur_cs;back_input(); cur_tok=cs_token_flag+frozen_relax;back_input();token_type=inserted; } @ Here is a recursive procedure that is \TeX's usual way to get the next token of input. It has been slightly optimized to take account of common cases. @p void get_x_token(void) /*sets |cur_cmd|, |cur_chr|, |cur_tok|, and expands macros*/ {@+ restart: get_next(); @^inner loop@> if (cur_cmd <= max_command) goto done; if (cur_cmd >= call) if (cur_cmd < end_template) macro_call(); else{@+cur_cs=frozen_endv;cur_cmd=endv; goto done; /*|cur_chr==null_list|*/ } else expand(); goto restart; done: if (cur_cs==0) cur_tok=(cur_cmd*0400)+cur_chr; else cur_tok=cs_token_flag+cur_cs; } @ The |get_x_token| procedure is equivalent to two consecutive procedure calls: |get_next;x_token|. @p void x_token(void) /*|get_x_token| without the initial |get_next|*/ {@+while (cur_cmd > max_command) {@+expand(); get_next(); } if (cur_cs==0) cur_tok=(cur_cmd*0400)+cur_chr; else cur_tok=cs_token_flag+cur_cs; } @ A control sequence that has been \.{\\def}'ed by the user is expanded by \TeX's |macro_call| procedure. Before we get into the details of |macro_call|, however, let's consider the treatment of primitives like \.{\\topmark}, since they are essentially macros without parameters. The token lists for such marks are kept in a global array of five pointers; we refer to the individual entries of this array by symbolic names |top_mark|, etc. The value of |top_mark| is either |null| or a pointer to the reference count of a token list. @d top_mark_code 0 /*the mark in effect at the previous page break*/ @d first_mark_code 1 /*the first mark between |top_mark| and |bot_mark|*/ @d bot_mark_code 2 /*the mark in effect at the current page break*/ @d split_first_mark_code 3 /*the first mark found by \.{\\vsplit}*/ @d split_bot_mark_code 4 /*the last mark found by \.{\\vsplit}*/ @d top_mark cur_mark[top_mark_code] @d first_mark cur_mark[first_mark_code] @d bot_mark cur_mark[bot_mark_code] @d split_first_mark cur_mark[split_first_mark_code] @d split_bot_mark cur_mark[split_bot_mark_code] @<Glob...@>= pointer @!cur_mark0[split_bot_mark_code-top_mark_code+1], *const @!cur_mark = @!cur_mark0-top_mark_code; /*token lists for marks*/ @ @<Set init...@>= top_mark=null;first_mark=null;bot_mark=null; split_first_mark=null;split_bot_mark=null; @ @<Put each...@>= primitive(@[@<|"topmark"|@>@], top_bot_mark, top_mark_code); @!@:top_mark_}{\.{\\topmark} primitive@> primitive(@[@<|"firstmark"|@>@], top_bot_mark, first_mark_code); @!@:first_mark_}{\.{\\firstmark} primitive@> primitive(@[@<|"botmark"|@>@], top_bot_mark, bot_mark_code); @!@:bot_mark_}{\.{\\botmark} primitive@> primitive(@[@<|"splitfirstmark"|@>@], top_bot_mark, split_first_mark_code); @!@:split_first_mark_}{\.{\\splitfirstmark} primitive@> primitive(@[@<|"splitbotmark"|@>@], top_bot_mark, split_bot_mark_code); @!@:split_bot_mark_}{\.{\\splitbotmark} primitive@> @ @<Cases of |print_cmd_chr|...@>= case top_bot_mark: switch (chr_code) { case first_mark_code: print_esc(@[@<|"firstmark"|@>@]);@+break; case bot_mark_code: print_esc(@[@<|"botmark"|@>@]);@+break; case split_first_mark_code: print_esc(@[@<|"splitfirstmark"|@>@]);@+break; case split_bot_mark_code: print_esc(@[@<|"splitbotmark"|@>@]);@+break; default:print_esc(@[@<|"topmark"|@>@]); } @+break; @ The following code is activated when |cur_cmd==top_bot_mark| and when |cur_chr| is a code like |top_mark_code|. @<Insert the \(a)appropriate mark text into the scanner@>= {@+if (cur_mark[cur_chr]!=null) begin_token_list(cur_mark[cur_chr], mark_text); } @ Now let's consider |macro_call| itself, which is invoked when \TeX\ is scanning a control sequence whose |cur_cmd| is either |call|, |long_call|, |outer_call|, or |long_outer_call|. The control sequence definition appears in the token list whose reference count is in location |cur_chr| of |mem|. The global variable |long_state| will be set to |call| or to |long_call|, depending on whether or not the control sequence disallows \.{\\par} in its parameters. The |get_next| routine will set |long_state| to |outer_call| and emit \.{\\par}, if a file ends or if an \.{\\outer} control sequence occurs in the midst of an argument. @<Glob...@>= uint8_t @!long_state; /*governs the acceptance of \.{\\par}*/ @ The parameters, if any, must be scanned before the macro is expanded. Parameters are token lists without reference counts. They are placed on an auxiliary stack called |pstack| while they are being scanned, since the |param_stack| may be losing entries during the matching process. (Note that |param_stack| can't be gaining entries, since |macro_call| is the only routine that puts anything onto |param_stack|, and it is not recursive.) @<Glob...@>= pointer @!pstack[9]; /*arguments supplied to a macro*/ @ After parameter scanning is complete, the parameters are moved to the |param_stack|. Then the macro body is fed to the scanner; in other words, |macro_call| places the defined text of the control sequence at the top of\/ \TeX's input stack, so that |get_next| will proceed to read it next. The global variable |cur_cs| contains the |eqtb| address of the control sequence being expanded, when |macro_call| begins. If this control sequence has not been declared \.{\\long}, i.e., if its command code in the |eq_type| field is not |long_call| or |long_outer_call|, its parameters are not allowed to contain the control sequence \.{\\par}. If an illegal \.{\\par} appears, the macro call is aborted, and the \.{\\par} will be rescanned. @<Declare the procedure called |macro_call|@>= void macro_call(void) /*invokes a user-defined control sequence*/ {@+ pointer r; /*current node in the macro's token list*/ pointer @!p; /*current node in parameter token list being built*/ pointer @!q; /*new node being put into the token list*/ pointer @!s; /*backup pointer for parameter matching*/ pointer @!t; /*cycle pointer for backup recovery*/ pointer @!u, @!v; /*auxiliary pointers for backup recovery*/ pointer @!rbrace_ptr; /*one step before the last |right_brace| token*/ small_number @!n; /*the number of parameters scanned*/ halfword @!unbalance; /*unmatched left braces in current parameter*/ int @!m; /*the number of tokens or groups (usually)*/ pointer @!ref_count; /*start of the token list*/ small_number @!save_scanner_status; /*|scanner_status| upon entry*/ pointer @!save_warning_index; /*|warning_index| upon entry*/ ASCII_code @!match_chr; /*character used in parameter*/ save_scanner_status=scanner_status;save_warning_index=warning_index; warning_index=cur_cs;ref_count=cur_chr;r=link(ref_count);n=0; if (tracing_macros > 0) @<Show the text of the macro being expanded@>; if (info(r)!=end_match_token) @<Scan the parameters and make |link(r)| point to the macro body; but |return| if an illegal \.{\\par} is detected@>; @<Feed the macro body and its parameters to the scanner@>; end: scanner_status=save_scanner_status;warning_index=save_warning_index; } @ Before we put a new token list on the input stack, it is wise to clean off all token lists that have recently been depleted. Then a user macro that ends with a call to itself will not require unbounded stack space. @<Feed the macro body and its parameters to the scanner@>= while ((state==token_list)&&(loc==null)&&(token_type!=v_template)) end_token_list(); /*conserve stack space*/ begin_token_list(ref_count, macro);name=warning_index;loc=link(r); if (n > 0) {@+if (param_ptr+n > max_param_stack) {@+max_param_stack=param_ptr+n; if (max_param_stack > param_size) overflow("parameter stack size", param_size); @:TeX capacity exceeded parameter stack size}{\quad parameter stack size@> } for (m=0; m<=n-1; m++) param_stack[param_ptr+m]=pstack[m]; param_ptr=param_ptr+n; } @ At this point, the reader will find it advisable to review the explanation of token list format that was presented earlier, since many aspects of that format are of importance chiefly in the |macro_call| routine. The token list might begin with a string of compulsory tokens before the first |match| or |end_match|. In that case the macro name is supposed to be followed by those tokens; the following program will set |s==null| to represent this restriction. Otherwise |s| will be set to the first token of a string that will delimit the next parameter. @<Scan the parameters and make |link(r)| point to the macro body...@>= {@+scanner_status=matching;unbalance=0; long_state=eq_type(cur_cs); if (long_state >= outer_call) long_state=long_state-2; @/do@+{link(temp_head)=null; if ((info(r) > match_token+255)||(info(r) < match_token)) s=null; else{@+match_chr=info(r)-match_token;s=link(r);r=s; p=temp_head;m=0; } @<Scan a parameter until its delimiter string has been found; or, if |s=null|, simply scan the delimiter string@>;@/ /*now |info(r)| is a token whose command code is either |match| or |end_match|*/ }@+ while (!(info(r)==end_match_token)); } @ If |info(r)| is a |match| or |end_match| command, it cannot be equal to any token found by |get_token|. Therefore an undelimited parameter---i.e., a |match| that is immediately followed by |match| or |end_match|---will always fail the test `|cur_tok==info(r)|' in the following algorithm. @<Scan a parameter until its delimiter string has been found; or,...@>= resume: get_token(); /*set |cur_tok| to the next token of input*/ if (cur_tok==info(r)) @<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully matched, otherwise |goto continue|@>; @<Contribute the recently matched tokens to the current parameter, and |goto continue| if a partial match is still in effect; but abort if |s=null|@>; if (cur_tok==par_token) if (long_state!=long_call) @<Report a runaway argument and abort@>; if (cur_tok < right_brace_limit) if (cur_tok < left_brace_limit) @<Contribute an entire group to the current parameter@>@; else@<Report an extra right brace and |goto continue|@>@; else@<Store the current token, but |goto continue| if it is a blank space that would become an undelimited parameter@>; incr(m); if (info(r) > end_match_token) goto resume; if (info(r) < match_token) goto resume; found: if (s!=null) @<Tidy up the parameter just scanned, and tuck it away@>@; @ @<Store the current token, but |goto continue| if it is...@>= {@+if (cur_tok==space_token) if (info(r) <= end_match_token) if (info(r) >= match_token) goto resume; store_new_token(cur_tok); } @ A slightly subtle point arises here: When the parameter delimiter ends with `\.{\#\{}', the token list will have a left brace both before and after the |end_match|\kern-.4pt. Only one of these should affect the |align_state|, but both will be scanned, so we must make a correction. @<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully...@>= {@+r=link(r); if ((info(r) >= match_token)&&(info(r) <= end_match_token)) {@+if (cur_tok < left_brace_limit) decr(align_state); goto found; } else goto resume; } @ @<Report an extra right brace and |goto continue|@>= {@+back_input();print_err("Argument of ");sprint_cs(warning_index); @.Argument of \\x has...@> print_str(" has an extra }"); help6("I've run across a `}' that doesn't seem to match anything.")@/ ("For example, `\\def\\a#1{...}' and `\\a}' would produce")@/ ("this error. If you simply proceed now, the `\\par' that")@/ ("I've just inserted will cause me to report a runaway")@/ ("argument that might be the root of the problem. But if")@/ ("your `}' was spurious, just type `2' and it will go away."); incr(align_state);long_state=call;cur_tok=par_token;ins_error(); goto resume; } /*a white lie; the \.{\\par} won't always trigger a runaway*/ @ If |long_state==outer_call|, a runaway argument has already been reported. @<Report a runaway argument and abort@>= {@+if (long_state==call) {@+runaway();print_err("Paragraph ended before "); @.Paragraph ended before...@> sprint_cs(warning_index);print_str(" was complete"); help3("I suspect you've forgotten a `}', causing me to apply this")@/ ("control sequence to too much text. How can we recover?")@/ ("My plan is to forget the whole thing and hope for the best."); back_error(); } pstack[n]=link(temp_head);align_state=align_state-unbalance; for (m=0; m<=n; m++) flush_list(pstack[m]); goto end; } @ When the following code becomes active, we have matched tokens from |s| to the predecessor of |r|, and we have found that |cur_tok!=info(r)|. An interesting situation now presents itself: If the parameter is to be delimited by a string such as `\.{ab}', and if we have scanned `\.{aa}', we want to contribute one `\.a' to the current parameter and resume looking for a `\.b'. The program must account for such partial matches and for others that can be quite complex. But most of the time we have |s==r| and nothing needs to be done. Incidentally, it is possible for \.{\\par} tokens to sneak in to certain parameters of non-\.{\\long} macros. For example, consider a case like `\.{\\def\\a\#1\\par!\{...\}}' where the first \.{\\par} is not followed by an exclamation point. In such situations it does not seem appropriate to prohibit the \.{\\par}, so \TeX\ keeps quiet about this bending of the rules. @<Contribute the recently matched tokens to the current parameter...@>= if (s!=r) if (s==null) @<Report an improper use of the macro and abort@>@; else{@+t=s; @/do@+{store_new_token(info(t));incr(m);u=link(t);v=s; loop@+{@+if (u==r) if (cur_tok!=info(v)) goto done; else{@+r=link(v);goto resume; } if (info(u)!=info(v)) goto done; u=link(u);v=link(v); } done: t=link(t); }@+ while (!(t==r)); r=s; /*at this point, no tokens are recently matched*/ } @ @<Report an improper use...@>= {@+print_err("Use of ");sprint_cs(warning_index); @.Use of x doesn't match...@> print_str(" doesn't match its definition"); help4("If you say, e.g., `\\def\\a1{...}', then you must always")@/ ("put `1' after `\\a', since control sequence names are")@/ ("made up of letters only. The macro here has not been")@/ ("followed by the required stuff, so I'm ignoring it."); error();goto end; } @ @<Contribute an entire group to the current parameter@>= {@+unbalance=1; @^inner loop@> loop@+{@+fast_store_new_token(cur_tok);get_token(); if (cur_tok==par_token) if (long_state!=long_call) @<Report a runaway argument and abort@>; if (cur_tok < right_brace_limit) if (cur_tok < left_brace_limit) incr(unbalance); else{@+decr(unbalance); if (unbalance==0) goto done1; } } done1: rbrace_ptr=p;store_new_token(cur_tok); } @ If the parameter consists of a single group enclosed in braces, we must strip off the enclosing braces. That's why |rbrace_ptr| was introduced. @<Tidy up the parameter just scanned, and tuck it away@>= {@+if ((m==1)&&(info(p) < right_brace_limit)&&(p!=temp_head)) {@+link(rbrace_ptr)=null;free_avail(p); p=link(temp_head);pstack[n]=link(p);free_avail(p); } else pstack[n]=link(temp_head); incr(n); if (tracing_macros > 0) {@+begin_diagnostic();print_nl("");print(match_chr);print_int(n); print_str("<-");show_token_list(pstack[n-1], null, 1000); end_diagnostic(false); } } @ @<Show the text of the macro being expanded@>= {@+begin_diagnostic();print_ln();print_cs(warning_index); token_show(ref_count);end_diagnostic(false); } @* Basic scanning subroutines. Let's turn now to some procedures that \TeX\ calls upon frequently to digest certain kinds of patterns in the input. Most of these are quite simple; some are quite elaborate. Almost all of the routines call |get_x_token|, which can cause them to be invoked recursively. @^stomach@> @^recursion@> @ The |scan_left_brace| routine is called when a left brace is supposed to be the next non-blank token. (The term ``left brace'' means, more precisely, a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to appear before the |left_brace|. @p void scan_left_brace(void) /*reads a mandatory |left_brace|*/ {@+@<Get the next non-blank non-relax non-call token@>; if (cur_cmd!=left_brace) {@+print_err("Missing { inserted"); @.Missing \{ inserted@> help4("A left brace was mandatory here, so I've put one in.")@/ ("You might want to delete and/or insert some corrections")@/ ("so that I will find a matching right brace soon.")@/ ("(If you're confused by all this, try typing `I}' now.)"); back_error();cur_tok=left_brace_token+'{';cur_cmd=left_brace; cur_chr='{';incr(align_state); } } @ @<Get the next non-blank non-relax non-call token@>= @/do@+{get_x_token(); }@+ while (!((cur_cmd!=spacer)&&(cur_cmd!=relax))) @ The |scan_optional_equals| routine looks for an optional `\.=' sign preceded by optional spaces; `\.{\\relax}' is not ignored here. @p void scan_optional_equals(void) {@+@<Get the next non-blank non-call token@>; if (cur_tok!=other_token+'=') back_input(); } @ @<Get the next non-blank non-call token@>= @/do@+{get_x_token(); }@+ while (!(cur_cmd!=spacer)) @ In case you are getting bored, here is a slightly less trivial routine: Given a string of lowercase letters, like `\.{pt}' or `\.{plus}' or `\.{width}', the |scan_keyword| routine checks to see whether the next tokens of input match this string. The match must be exact, except that uppercase letters will match their lowercase counterparts; uppercase equivalents are determined by subtracting |'a'-'A'|, rather than using the |uc_code| table, since \TeX\ uses this routine only for its own limited set of keywords. If a match is found, the characters are effectively removed from the input and |true| is returned. Otherwise |false| is returned, and the input is left essentially unchanged (except for the fact that some macros may have been expanded, etc.). @^inner loop@> @p bool scan_keyword(str_number @!s) /*look for a given string*/ {@+ pointer p; /*tail of the backup list*/ pointer @!q; /*new node being added to the token list via |store_new_token|*/ pool_pointer @!k; /*index into |str_pool|*/ p=backup_head;link(p)=null;k=str_start[s]; while (k < str_start[s+1]) {@+get_x_token(); /*recursion is possible here*/ @^recursion@> if ((cur_cs==0)&&@| ((cur_chr==so(str_pool[k]))||(cur_chr==so(str_pool[k])-'a'+'A'))) {@+store_new_token(cur_tok);incr(k); } else if ((cur_cmd!=spacer)||(p!=backup_head)) {@+back_input(); if (p!=backup_head) back_list(link(backup_head)); return false; } } flush_list(link(backup_head));return true; } @ Here is a procedure that sounds an alarm when mu and non-mu units are being switched. @p void mu_error(void) {@+print_err("Incompatible glue units"); @.Incompatible glue units@> help1("I'm going to assume that 1mu=1pt when they're mixed."); error(); } @ The next routine `|scan_something_internal|' is used to fetch internal numeric quantities like `\.{\\hsize}', and also to handle the `\.{\\the}' when expanding constructions like `\.{\\the\\toks0}' and `\.{\\the\\baselineskip}'. Soon we will be considering the |scan_int| procedure, which calls |scan_something_internal|; on the other hand, |scan_something_internal| also calls |scan_int|, for constructions like `\.{\\catcode\`\\\$}' or `\.{\\fontdimen} \.3 \.{\\ff}'. So we have to declare |scan_int| as a |forward| procedure. A few other procedures are also declared at this point. @p void scan_int(void); /*scans an integer value*/ @t\4\4@>@<Declare procedures that scan restricted classes of integers@>@; @t\4\4@>@<Declare procedures that scan font-related stuff@>@; @ \TeX\ doesn't know exactly what to expect when |scan_something_internal| begins. For example, an integer or dimension or glue value could occur immediately after `\.{\\hskip}'; and one can even say \.{\\the} with respect to token lists in constructions like `\.{\\xdef\\o\{\\the\\output\}}'. On the other hand, only integers are allowed after a construction like `\.{\\count}'. To handle the various possibilities, |scan_something_internal| has a |level| parameter, which tells the ``highest'' kind of quantity that |scan_something_internal| is allowed to produce. Six levels are distinguished, namely |int_val|, |dimen_val|, |glue_val|, |mu_val|, |ident_val|, and |tok_val|. The output of |scan_something_internal| (and of the other routines |scan_int|, |scan_dimen|, and |scan_glue| below) is put into the global variable |cur_val|, and its level is put into |cur_val_level|. The highest values of |cur_val_level| are special: |mu_val| is used only when |cur_val| points to something in a ``muskip'' register, or to one of the three parameters \.{\\thinmuskip}, \.{\\medmuskip}, \.{\\thickmuskip}; |ident_val| is used only when |cur_val| points to a font identifier; |tok_val| is used only when |cur_val| points to |null| or to the reference count of a token list. The last two cases are allowed only when |scan_something_internal| is called with |level==tok_val|. If the output is glue, |cur_val| will point to a glue specification, and the reference count of that glue will have been updated to reflect this reference; if the output is a nonempty token list, |cur_val| will point to its reference count, but in this case the count will not have been updated. Otherwise |cur_val| will contain the integer or scaled value in question. @d int_val 0 /*integer values*/ @d dimen_val 1 /*dimension values*/ @d glue_val 2 /*glue specifications*/ @d mu_val 3 /*math glue specifications*/ @d ident_val 4 /*font identifier*/ @d tok_val 5 /*token lists*/ @<Glob...@>= int @!cur_val; /*value returned by numeric scanners*/ uint8_t @!cur_val_level; /*the ``level'' of this value*/ @ The hash table is initialized with `\.{\\count}', `\.{\\dimen}', `\.{\\skip}', and `\.{\\muskip}' all having |register| as their command code; they are distinguished by the |chr_code|, which is either |int_val|, |dimen_val|, |glue_val|, or |mu_val|. @<Put each...@>= primitive(@[@<|"count"|@>@], register, int_val); @!@:count_}{\.{\\count} primitive@> primitive(@[@<|"dimen"|@>@], register, dimen_val); @!@:dimen_}{\.{\\dimen} primitive@> primitive(@[@<|"skip"|@>@], register, glue_val); @!@:skip_}{\.{\\skip} primitive@> primitive(@[@<|"muskip"|@>@], register, mu_val); @!@:mu_skip_}{\.{\\muskip} primitive@> @ @<Cases of |print_cmd_chr|...@>= case register: if (chr_code==int_val) print_esc(@[@<|"count"|@>@]); else if (chr_code==dimen_val) print_esc(@[@<|"dimen"|@>@]); else if (chr_code==glue_val) print_esc(@[@<|"skip"|@>@]); else print_esc(@[@<|"muskip"|@>@]);@+break; @ OK, we're ready for |scan_something_internal| itself. A second parameter, |negative|, is set |true| if the value that is found should be negated. It is assumed that |cur_cmd| and |cur_chr| represent the first token of the internal quantity to be scanned; an error will be signalled if |cur_cmd < min_internal| or |cur_cmd > max_internal|. @d scanned_result_end(X) cur_val_level=X;@+} @d scanned_result(X) @+{@+cur_val=X;scanned_result_end @p void scan_something_internal(small_number @!level, bool @!negative) /*fetch an internal parameter*/ {@+halfword m; /*|chr_code| part of the operand token*/ uint8_t @!p; /*index into |nest|*/ m=cur_chr; switch (cur_cmd) { case def_code: @<Fetch a character code from some table@>@;@+break; case toks_register: case assign_toks: case def_family: case set_font: case def_font: @<Fetch a token list or font identifier, provided that |level=tok_val|@>@;@+break; case assign_int: scanned_result(eqtb[m].i)(int_val)@;@+break; case assign_dimen: scanned_result(eqtb[m].sc)(dimen_val)@;@+break; case assign_glue: scanned_result(equiv(m))(glue_val)@;@+break; case assign_mu_glue: scanned_result(equiv(m))(mu_val)@;@+break; case set_aux: @<Fetch the |space_factor| or the |prev_depth|@>@;@+break; case set_prev_graf: @<Fetch the |prev_graf|@>@;@+break; case set_page_int: @<Fetch the |dead_cycles| or the |insert_penalties|@>@;@+break; case set_page_dimen: @<Fetch something on the |page_so_far|@>@;@+break; case set_shape: @<Fetch the |par_shape| size@>@;@+break; case set_box_dimen: @<Fetch a box dimension@>@;@+break; case char_given: case math_given: scanned_result(cur_chr)(int_val)@;@+break; case assign_font_dimen: @<Fetch a font dimension@>@;@+break; case assign_font_int: @<Fetch a font integer@>@;@+break; case register: @<Fetch a register@>@;@+break; case last_item: @<Fetch an item in the current node, if appropriate@>@;@+break; default:@<Complain that \.{\\the} can't do this; give zero result@>@; } @/ while (cur_val_level > level) @<Convert \(c)|cur_val| to a lower level@>; @<Fix the reference count, if any, and negate |cur_val| if |negative|@>; } @ @<Fetch a character code from some table@>= {@+scan_char_num(); if (m==math_code_base) scanned_result(ho(math_code(cur_val)))(int_val)@; else if (m < math_code_base) scanned_result(equiv(m+cur_val))(int_val)@; else scanned_result(eqtb[m+cur_val].i)(int_val); } @ @<Fetch a token list...@>= if (level!=tok_val) {@+print_err("Missing number, treated as zero"); @.Missing number...@> help3("A number should have been here; I inserted `0'.")@/ ("(If you can't figure out why I needed to see a number,")@/ ("look up `weird error' in the index to The TeXbook.)"); @:TeXbook}{\sl The \TeX book@> back_error();scanned_result(0)(dimen_val); } else if (cur_cmd <= assign_toks) {@+if (cur_cmd < assign_toks) /*|cur_cmd==toks_register|*/ {@+scan_eight_bit_int();m=toks_base+cur_val; } scanned_result(equiv(m))(tok_val); } else{@+back_input();scan_font_ident(); scanned_result(font_id_base+cur_val)(ident_val); } @ Users refer to `\.{\\the\\spacefactor}' only in horizontal mode, and to `\.{\\the\\prevdepth}' only in vertical mode; so we put the associated mode in the modifier part of the |set_aux| command. The |set_page_int| command has modifier 0 or 1, for `\.{\\deadcycles}' and `\.{\\insertpenalties}', respectively. The |set_box_dimen| command is modified by either |width_offset|, |height_offset|, or |depth_offset|. And the |last_item| command is modified by either |int_val|, |dimen_val|, |glue_val|, |input_line_no_code|, or |badness_code|. @d input_line_no_code (glue_val+1) /*code for \.{\\inputlineno}*/ @d badness_code (glue_val+2) /*code for \.{\\badness}*/ @<Put each...@>= primitive(@[@<|"spacefactor"|@>@], set_aux, hmode); @!@:space_factor_}{\.{\\spacefactor} primitive@> primitive(@[@<|"prevdepth"|@>@], set_aux, vmode);@/ @!@:prev_depth_}{\.{\\prevdepth} primitive@> primitive(@[@<|"deadcycles"|@>@], set_page_int, 0); @!@:dead_cycles_}{\.{\\deadcycles} primitive@> primitive(@[@<|"insertpenalties"|@>@], set_page_int, 1); @!@:insert_penalties_}{\.{\\insertpenalties} primitive@> primitive(@[@<|"wd"|@>@], set_box_dimen, width_offset); @!@:wd_}{\.{\\wd} primitive@> primitive(@[@<|"ht"|@>@], set_box_dimen, height_offset); @!@:ht_}{\.{\\ht} primitive@> primitive(@[@<|"dp"|@>@], set_box_dimen, depth_offset); @!@:dp_}{\.{\\dp} primitive@> primitive(@[@<|"lastpenalty"|@>@], last_item, int_val); @!@:last_penalty_}{\.{\\lastpenalty} primitive@> primitive(@[@<|"lastkern"|@>@], last_item, dimen_val); @!@:last_kern_}{\.{\\lastkern} primitive@> primitive(@[@<|"lastskip"|@>@], last_item, glue_val); @!@:last_skip_}{\.{\\lastskip} primitive@> primitive(@[@<|"inputlineno"|@>@], last_item, input_line_no_code); @!@:input_line_no_}{\.{\\inputlineno} primitive@> primitive(@[@<|"badness"|@>@], last_item, badness_code); @!@:badness_}{\.{\\badness} primitive@> @ @<Cases of |print_cmd_chr|...@>= case set_aux: if (chr_code==vmode) print_esc(@[@<|"prevdepth"|@>@]); @+else print_esc(@[@<|"spacefactor"|@>@]);@+break; case set_page_int: if (chr_code==0) print_esc(@[@<|"deadcycles"|@>@]); @+else print_esc(@[@<|"insertpenalties"|@>@]);@+break; case set_box_dimen: if (chr_code==width_offset) print_esc(@[@<|"wd"|@>@]); else if (chr_code==height_offset) print_esc(@[@<|"ht"|@>@]); else print_esc(@[@<|"dp"|@>@]);@+break; case last_item: switch (chr_code) { case int_val: print_esc(@[@<|"lastpenalty"|@>@]);@+break; case dimen_val: print_esc(@[@<|"lastkern"|@>@]);@+break; case glue_val: print_esc(@[@<|"lastskip"|@>@]);@+break; case input_line_no_code: print_esc(@[@<|"inputlineno"|@>@]);@+break; default:print_esc(@[@<|"badness"|@>@]); } @+break; @ @<Fetch the |space_factor| or the |prev_depth|@>= if (abs(mode)!=m) {@+print_err("Improper ");print_cmd_chr(set_aux, m); @.Improper \\spacefactor@> @.Improper \\prevdepth@> help4("You can refer to \\spacefactor only in horizontal mode;")@/ ("you can refer to \\prevdepth only in vertical mode; and")@/ ("neither of these is meaningful inside \\write. So")@/ ("I'm forgetting what you said and using zero instead."); error(); if (level!=tok_val) scanned_result(0)(dimen_val)@; else scanned_result(0)(int_val); } else if (m==vmode) scanned_result(prev_depth)(dimen_val)@; else scanned_result(space_factor)(int_val) @ @<Fetch the |dead_cycles| or the |insert_penalties|@>= {@+if (m==0) cur_val=dead_cycles;@+else cur_val=insert_penalties; cur_val_level=int_val; } @ @<Fetch a box dimension@>= {@+scan_eight_bit_int(); if (box(cur_val)==null) cur_val=0;@+else cur_val=mem[box(cur_val)+m].sc; cur_val_level=dimen_val; } @ Inside an \.{\\output} routine, a user may wish to look at the page totals that were present at the moment when output was triggered. @d max_dimen 07777777777 /*$2^{30}-1$*/ @<Fetch something on the |page_so_far|@>= {@+if ((page_contents==empty)&&(!output_active)) if (m==0) cur_val=max_dimen;@+else cur_val=0; else cur_val=page_so_far[m]; cur_val_level=dimen_val; } @ @<Fetch the |prev_graf|@>= if (mode==0) scanned_result(0)(int_val)@; /*|prev_graf==0| within \.{\\write}*/ else{@+nest[nest_ptr]=cur_list;p=nest_ptr; while (abs(nest[p].mode_field)!=vmode) decr(p); scanned_result(nest[p].pg_field)(int_val); } @ @<Fetch the |par_shape| size@>= {@+if (par_shape_ptr==null) cur_val=0; else cur_val=info(par_shape_ptr); cur_val_level=int_val; } @ Here is where \.{\\lastpenalty}, \.{\\lastkern}, and \.{\\lastskip} are implemented. The reference count for \.{\\lastskip} will be updated later. We also handle \.{\\inputlineno} and \.{\\badness} here, because they are legal in similar contexts. @<Fetch an item in the current node...@>= if (cur_chr > glue_val) {@+if (cur_chr==input_line_no_code) cur_val=line; else cur_val=last_badness; /*|cur_chr==badness_code|*/ cur_val_level=int_val; } else{@+if (cur_chr==glue_val) cur_val=zero_glue;@+else cur_val=0; cur_val_level=cur_chr; if (!is_char_node(tail)&&(mode!=0)) switch (cur_chr) { case int_val: if (type(tail)==penalty_node) cur_val=penalty(tail);@+break; case dimen_val: if (type(tail)==kern_node) cur_val=width(tail);@+break; case glue_val: if (type(tail)==glue_node) {@+cur_val=glue_ptr(tail); if (subtype(tail)==mu_glue) cur_val_level=mu_val; } } /*there are no other cases*/ else if ((mode==vmode)&&(tail==head)) switch (cur_chr) { case int_val: cur_val=last_penalty;@+break; case dimen_val: cur_val=last_kern;@+break; case glue_val: if (last_glue!=max_halfword) cur_val=last_glue; } /*there are no other cases*/ } @ @<Fetch a font dimension@>= {@+find_font_dimen(false);font_info[fmem_ptr].sc=0; scanned_result(font_info[cur_val].sc)(dimen_val); } @ @<Fetch a font integer@>= {@+scan_font_ident(); if (m==0) scanned_result(hyphen_char[cur_val])(int_val)@; else scanned_result(skew_char[cur_val])(int_val); } @ @<Fetch a register@>= {@+scan_eight_bit_int(); switch (m) { case int_val: cur_val=count(cur_val);@+break; case dimen_val: cur_val=dimen(cur_val);@+break; case glue_val: cur_val=skip(cur_val);@+break; case mu_val: cur_val=mu_skip(cur_val); } /*there are no other cases*/ cur_val_level=m; } @ @<Complain that \.{\\the} can't do this; give zero result@>= {@+print_err("You can't use `");print_cmd_chr(cur_cmd, cur_chr); @.You can't use x after ...@> print_str("' after ");print_esc(@[@<|"the"|@>@]); help1("I'm forgetting what you said and using zero instead."); error(); if (level!=tok_val) scanned_result(0)(dimen_val)@; else scanned_result(0)(int_val); } @ When a |glue_val| changes to a |dimen_val|, we use the width component of the glue; there is no need to decrease the reference count, since it has not yet been increased. When a |dimen_val| changes to an |int_val|, we use scaled points so that the value doesn't actually change. And when a |mu_val| changes to a |glue_val|, the value doesn't change either. @<Convert \(c)|cur_val| to a lower level@>= {@+if (cur_val_level==glue_val) cur_val=width(cur_val); else if (cur_val_level==mu_val) mu_error(); decr(cur_val_level); } @ If |cur_val| points to a glue specification at this point, the reference count for the glue does not yet include the reference by |cur_val|. If |negative| is |true|, |cur_val_level| is known to be | <= mu_val|. @<Fix the reference count, if any,...@>= if (negative) if (cur_val_level >= glue_val) {@+cur_val=new_spec(cur_val); @<Negate all three glue components of |cur_val|@>; } else negate(cur_val); else if ((cur_val_level >= glue_val)&&(cur_val_level <= mu_val)) add_glue_ref(cur_val) @ @<Negate all three...@>= {@+negate(width(cur_val)); negate(stretch(cur_val)); negate(shrink(cur_val)); } @ Our next goal is to write the |scan_int| procedure, which scans anything that \TeX\ treats as an integer. But first we might as well look at some simple applications of |scan_int| that have already been made inside of |scan_something_internal|. @ @<Declare procedures that scan restricted classes of integers@>= void scan_eight_bit_int(void) {@+scan_int(); if ((cur_val < 0)||(cur_val > 255)) {@+print_err("Bad register code"); @.Bad register code@> help2("A register number must be between 0 and 255.")@/ ("I changed this one to zero.");int_error(cur_val);cur_val=0; } } @ @<Declare procedures that scan restricted classes of integers@>= void scan_char_num(void) {@+scan_int(); if ((cur_val < 0)||(cur_val > 255)) {@+print_err("Bad character code"); @.Bad character code@> help2("A character number must be between 0 and 255.")@/ ("I changed this one to zero.");int_error(cur_val);cur_val=0; } } @ While we're at it, we might as well deal with similar routines that will be needed later. @<Declare procedures that scan restricted classes of integers@>= void scan_four_bit_int(void) {@+scan_int(); if ((cur_val < 0)||(cur_val > 15)) {@+print_err("Bad number"); @.Bad number@> help2("Since I expected to read a number between 0 and 15,")@/ ("I changed this one to zero.");int_error(cur_val);cur_val=0; } } @ @<Declare procedures that scan restricted classes of integers@>= void scan_fifteen_bit_int(void) {@+scan_int(); if ((cur_val < 0)||(cur_val > 077777)) {@+print_err("Bad mathchar"); @.Bad mathchar@> help2("A mathchar number must be between 0 and 32767.")@/ ("I changed this one to zero.");int_error(cur_val);cur_val=0; } } @ @<Declare procedures that scan restricted classes of integers@>= void scan_twenty_seven_bit_int(void) {@+scan_int(); if ((cur_val < 0)||(cur_val > 0777777777)) {@+print_err("Bad delimiter code"); @.Bad delimiter code@> help2("A numeric delimiter code must be between 0 and 2^{27}-1.")@/ ("I changed this one to zero.");int_error(cur_val);cur_val=0; } } @ An integer number can be preceded by any number of spaces and `\.+' or `\.-' signs. Then comes either a decimal constant (i.e., radix 10), an octal constant (i.e., radix 8, preceded by~\.\'), a hexadecimal constant (radix 16, preceded by~\."), an alphabetic constant (preceded by~\.\`), or an internal variable. After scanning is complete, |cur_val| will contain the answer, which must be at most $2^{31}-1=2147483647$ in absolute value. The value of |radix| is set to 10, 8, or 16 in the cases of decimal, octal, or hexadecimal constants, otherwise |radix| is set to zero. An optional space follows a constant. @d octal_token (other_token+'\'') /*apostrophe, indicates an octal constant*/ @d hex_token (other_token+'"') /*double quote, indicates a hex constant*/ @d alpha_token (other_token+'`') /*reverse apostrophe, precedes alpha constants*/ @d point_token (other_token+'.') /*decimal point*/ @d continental_point_token (other_token+',') /*decimal point, Eurostyle*/ @<Glob...@>= small_number @!radix; /*|scan_int| sets this to 8, 10, 16, or zero*/ @ We initialize the following global variables just in case |expand| comes into action before any of the basic scanning routines has assigned them a value. @<Set init...@>= cur_val=0;cur_val_level=int_val;radix=0;cur_order=normal; @ The |scan_int| routine is used also to scan the integer part of a fraction; for example, the `\.3' in `\.{3.14159}' will be found by |scan_int|. The |scan_dimen| routine assumes that |cur_tok==point_token| after the integer part of such a fraction has been scanned by |scan_int|, and that the decimal point has been backed up to be scanned again. @p void scan_int(void) /*sets |cur_val| to an integer*/ {@+ bool negative; /*should the answer be negated?*/ int @!m; /*|@t$2^{31}$@>/radix|, the threshold of danger*/ small_number @!d; /*the digit just scanned*/ bool @!vacuous; /*have no digits appeared?*/ bool @!OK_so_far; /*has an error message been issued?*/ radix=0;OK_so_far=true;@/ @<Get the next non-blank non-sign token; set |negative| appropriately@>; if (cur_tok==alpha_token) @<Scan an alphabetic character code into |cur_val|@>@; else if ((cur_cmd >= min_internal)&&(cur_cmd <= max_internal)) scan_something_internal(int_val, false); else@<Scan a numeric constant@>; if (negative) negate(cur_val); } @ @<Get the next non-blank non-sign token...@>= negative=false; @/do@+{@<Get the next non-blank non-call token@>; if (cur_tok==other_token+'-') {@+negative=!negative;cur_tok=other_token+'+'; } }@+ while (!(cur_tok!=other_token+'+')) @ A space is ignored after an alphabetic character constant, so that such constants behave like numeric ones. @<Scan an alphabetic character code into |cur_val|@>= {@+get_token(); /*suppress macro expansion*/ if (cur_tok < cs_token_flag) {@+cur_val=cur_chr; if (cur_cmd <= right_brace) if (cur_cmd==right_brace) incr(align_state); else decr(align_state); } else if (cur_tok < cs_token_flag+single_base) cur_val=cur_tok-cs_token_flag-active_base; else cur_val=cur_tok-cs_token_flag-single_base; if (cur_val > 255) {@+print_err("Improper alphabetic constant"); @.Improper alphabetic constant@> help2("A one-character control sequence belongs after a ` mark.")@/ ("So I'm essentially inserting \\0 here."); cur_val='0';back_error(); } else@<Scan an optional space@>; } @ @<Scan an optional space@>= {@+get_x_token();if (cur_cmd!=spacer) back_input(); } @ @<Scan a numeric constant@>= {@+radix=10;m=214748364; if (cur_tok==octal_token) {@+radix=8;m=02000000000;get_x_token(); } else if (cur_tok==hex_token) {@+radix=16;m=01000000000;get_x_token(); } vacuous=true;cur_val=0;@/ @<Accumulate the constant until |cur_tok| is not a suitable digit@>; if (vacuous) @<Express astonishment that no number was here@>@; else if (cur_cmd!=spacer) back_input(); } @ @d infinity 017777777777 /*the largest positive value that \TeX\ knows*/ @d zero_token (other_token+'0') /*zero, the smallest digit*/ @d A_token (letter_token+'A') /*the smallest special hex digit*/ @d other_A_token (other_token+'A') /*special hex digit of type |other_char|*/ @<Accumulate the constant...@>= loop@+{@+if ((cur_tok < zero_token+radix)&&(cur_tok >= zero_token)&& (cur_tok <= zero_token+9)) d=cur_tok-zero_token; else if (radix==16) if ((cur_tok <= A_token+5)&&(cur_tok >= A_token)) d=cur_tok-A_token+10; else if ((cur_tok <= other_A_token+5)&&(cur_tok >= other_A_token)) d=cur_tok-other_A_token+10; else goto done; else goto done; vacuous=false; if ((cur_val >= m)&&((cur_val > m)||(d > 7)||(radix!=10))) {@+if (OK_so_far) {@+print_err("Number too big"); @.Number too big@> help2("I can only go up to 2147483647='17777777777=\"7FFFFFFF,")@/ ("so I'm using that number instead of yours."); error();cur_val=infinity;OK_so_far=false; } } else cur_val=cur_val*radix+d; get_x_token(); } done: @ @<Express astonishment...@>= {@+print_err("Missing number, treated as zero"); @.Missing number...@> help3("A number should have been here; I inserted `0'.")@/ ("(If you can't figure out why I needed to see a number,")@/ ("look up `weird error' in the index to The TeXbook.)"); @:TeXbook}{\sl The \TeX book@> back_error(); } @ The |scan_dimen| routine is similar to |scan_int|, but it sets |cur_val| to a |scaled| value, i.e., an integral number of sp. One of its main tasks is therefore to interpret the abbreviations for various kinds of units and to convert measurements to scaled points. There are three parameters: |mu| is |true| if the finite units must be `\.{mu}', while |mu| is |false| if `\.{mu}' units are disallowed; |inf| is |true| if the infinite units `\.{fil}', `\.{fill}', `\.{filll}' are permitted; and |shortcut| is |true| if |cur_val| already contains an integer and only the units need to be considered. The order of infinity that was found in the case of infinite glue is returned in the global variable |cur_order|. @<Glob...@>= glue_ord @!cur_order; /*order of infinity found by |scan_dimen|*/ @ Constructions like `\.{-\'77 pt}' are legal dimensions, so |scan_dimen| may begin with |scan_int|. This explains why it is convenient to use |scan_int| also for the integer part of a decimal fraction. Several branches of |scan_dimen| work with |cur_val| as an integer and with an auxiliary fraction |f|, so that the actual quantity of interest is $|cur_val|+|f|/2^{16}$. At the end of the routine, this ``unpacked'' representation is put into the single word |cur_val|, which suddenly switches significance from |int| to |scaled|. @d scan_normal_dimen scan_dimen(false, false, false) @p void scan_dimen(bool @!mu, bool @!inf, bool @!shortcut) /*sets |cur_val| to a dimension*/ {@+ bool negative; /*should the answer be negated?*/ int @!f; /*numerator of a fraction whose denominator is $2^{16}$*/ @<Local variables for dimension calculations@>@; f=0;arith_error=false;cur_order=normal;negative=false; if (!shortcut) {@+@<Get the next non-blank non-sign...@>; if ((cur_cmd >= min_internal)&&(cur_cmd <= max_internal)) @<Fetch an internal dimension and |goto attach_sign|, or fetch an internal integer@>@; else{@+back_input(); if (cur_tok==continental_point_token) cur_tok=point_token; if (cur_tok!=point_token) scan_int(); else{@+radix=10;cur_val=0; } if (cur_tok==continental_point_token) cur_tok=point_token; if ((radix==10)&&(cur_tok==point_token)) @<Scan decimal fraction@>; } } if (cur_val < 0) /*in this case |f==0|*/ {@+negative=!negative;negate(cur_val); } @<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there are |x| sp per unit; |goto attach_sign| if the units are internal@>; @<Scan an optional space@>; attach_sign: if (arith_error||(abs(cur_val) >= 010000000000)) @<Report that this dimension is out of range@>; if (negative) negate(cur_val); } @ @<Fetch an internal dimension and |goto attach_sign|...@>= if (mu) {@+scan_something_internal(mu_val, false); @<Coerce glue to a dimension@>; if (cur_val_level==mu_val) goto attach_sign; if (cur_val_level!=int_val) mu_error(); } else{@+scan_something_internal(dimen_val, false); if (cur_val_level==dimen_val) goto attach_sign; } @ @<Local variables for dimension calculations@>= uint32_t @!num, @!denom; /*conversion ratio for the scanned units*/ int @!k, @!kk; /*number of digits in a decimal fraction*/ pointer @!p, @!q; /*top of decimal digit stack*/ scaled @!v; /*an internal dimension*/ int @!save_cur_val; /*temporary storage of |cur_val|*/ @ The following code is executed when |scan_something_internal| was called asking for |mu_val|, when we really wanted a ``mudimen'' instead of ``muglue.'' @<Coerce glue to a dimension@>= if (cur_val_level >= glue_val) {@+v=width(cur_val);delete_glue_ref(cur_val);cur_val=v; } @ When the following code is executed, we have |cur_tok==point_token|, but this token has been backed up using |back_input|; we must first discard it. It turns out that a decimal point all by itself is equivalent to `\.{0.0}'. Let's hope people don't use that fact. @<Scan decimal fraction@>= {@+k=0;p=null;get_token(); /*|point_token| is being re-scanned*/ loop@+{@+get_x_token(); if ((cur_tok > zero_token+9)||(cur_tok < zero_token)) goto done1; if (k < 17) /*digits for |k >= 17| cannot affect the result*/ {@+q=get_avail();link(q)=p;info(q)=cur_tok-zero_token; p=q;incr(k); } } done1: for (kk=k; kk>=1; kk--) {@+dig[kk-1]=info(p);q=p;p=link(p);free_avail(q); } f=round_decimals(k); if (cur_cmd!=spacer) back_input(); } @ Now comes the harder part: At this point in the program, |cur_val| is a nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1; we want to multiply the sum of these two quantities by the appropriate factor, based on the specified units, in order to produce a |scaled| result, and we want to do the calculation with fixed point arithmetic that does not overflow. @<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>= if (inf) @<Scan for \(f)\.{fil} units; |goto attach_fraction| if found@>; @<Scan for \(u)units that are internal dimensions; |goto attach_sign| with |cur_val| set if found@>; if (mu) @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>; if (scan_keyword(@[@<|"true"|@>@])) @<Adjust \(f)for the magnification ratio@>; @.true@> if (scan_keyword(@[@<|"pt"|@>@])) goto attach_fraction; /*the easy case*/ @.pt@> @<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly; |goto done| in the case of scaled points@>; attach_fraction: if (cur_val >= 040000) arith_error=true; else cur_val=cur_val*unity+f; done: @ A specification like `\.{filllll}' or `\.{fill L L L}' will lead to two error messages (one for each additional keyword \.{"l"}). @<Scan for \(f)\.{fil} units...@>= if (scan_keyword(@[@<|"fil"|@>@])) @.fil@> {@+cur_order=fil; while (scan_keyword('l')) {@+if (cur_order==filll) {@+print_err("Illegal unit of measure ("); @.Illegal unit of measure@> print_str("replaced by filll)"); help1("I dddon't go any higher than filll.");error(); } else incr(cur_order); } goto attach_fraction; } @ @<Scan for \(u)units that are internal dimensions...@>= save_cur_val=cur_val; @<Get the next non-blank non-call...@>; if ((cur_cmd < min_internal)||(cur_cmd > max_internal)) back_input(); else{@+if (mu) {@+scan_something_internal(mu_val, false);@<Coerce glue...@>; if (cur_val_level!=mu_val) mu_error(); } else scan_something_internal(dimen_val, false); v=cur_val;goto found; } if (mu) goto not_found; if (scan_keyword(@[@<|"em"|@>@])) v=(@<The em width for |cur_font|@>); @.em@> else if (scan_keyword(@[@<|"ex"|@>@])) v=(@<The x-height for |cur_font|@>); @.ex@> else goto not_found; @<Scan an optional space@>; found: cur_val=nx_plus_y(save_cur_val, v, xn_over_d(v, f, 0200000)); goto attach_sign; not_found: @ @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>= if (scan_keyword(@[@<|"mu"|@>@])) goto attach_fraction; @.mu@> else{@+print_err("Illegal unit of measure (");print_str("mu inserted)"); @.Illegal unit of measure@> help4("The unit of measurement in math glue must be mu.")@/ ("To recover gracefully from this error, it's best to")@/ ("delete the erroneous units; e.g., type `2' to delete")@/ ("two letters. (See Chapter 27 of The TeXbook.)"); @:TeXbook}{\sl The \TeX book@> error();goto attach_fraction; } @ @<Adjust \(f)for the magnification ratio@>= {@+prepare_mag(); if (mag!=1000) {@+cur_val=xn_over_d(cur_val, 1000, mag); f=(1000*f+0200000*rem)/mag; cur_val=cur_val+(f/0200000);f=f%0200000; } } @ The necessary conversion factors can all be specified exactly as fractions whose numerator and denominator sum to 32768 or less. According to the definitions here, $\rm2660\,dd\approx1000.33297\,mm$; this agrees well with the value $\rm1000.333\,mm$ cited by Bosshard @^Bosshard, Hans Rudolf@> in {\sl Technische Grundlagen zur Satzherstellung\/} (Bern, 1980). @d set_conversion_end(X) denom=X;} @d set_conversion(X) @+{@+num=X;set_conversion_end @<Scan for \(a)all other units and adjust |cur_val| and |f|...@>= if (scan_keyword(@[@<|"in"|@>@])) set_conversion(7227)(100)@; @.in@> else if (scan_keyword(@[@<|"pc"|@>@])) set_conversion(12)(1)@; @.pc@> else if (scan_keyword(@[@<|"cm"|@>@])) set_conversion(7227)(254)@; @.cm@> else if (scan_keyword(@[@<|"mm"|@>@])) set_conversion(7227)(2540)@; @.mm@> else if (scan_keyword(@[@<|"bp"|@>@])) set_conversion(7227)(7200)@; @.bp@> else if (scan_keyword(@[@<|"dd"|@>@])) set_conversion(1238)(1157)@; @.dd@> else if (scan_keyword(@[@<|"cc"|@>@])) set_conversion(14856)(1157)@; @.cc@> else if (scan_keyword(@[@<|"sp"|@>@])) goto done; @.sp@> else@<Complain about unknown unit and |goto done2|@>; cur_val=xn_over_d(cur_val, num, denom); f=(num*f+0200000*rem)/denom;@/ cur_val=cur_val+(f/0200000);f=f%0200000; done2: @ @<Complain about unknown unit...@>= {@+print_err("Illegal unit of measure (");print_str("pt inserted)"); @.Illegal unit of measure@> help6("Dimensions can be in units of em, ex, in, pt, pc,")@/ ("cm, mm, dd, cc, bp, or sp; but yours is a new one!")@/ ("I'll assume that you meant to say pt, for printer's points.")@/ ("To recover gracefully from this error, it's best to")@/ ("delete the erroneous units; e.g., type `2' to delete")@/ ("two letters. (See Chapter 27 of The TeXbook.)"); @:TeXbook}{\sl The \TeX book@> error();goto done2; } @ @<Report that this dimension is out of range@>= {@+print_err("Dimension too large"); @.Dimension too large@> help2("I can't work with sizes bigger than about 19 feet.")@/ ("Continue and I'll use the largest value I can.");@/ error();cur_val=max_dimen;arith_error=false; } @ The final member of \TeX's value-scanning trio is |scan_glue|, which makes |cur_val| point to a glue specification. The reference count of that glue spec will take account of the fact that |cur_val| is pointing to~it. The |level| parameter should be either |glue_val| or |mu_val|. Since |scan_dimen| was so much more complex than |scan_int|, we might expect |scan_glue| to be even worse. But fortunately, it is very simple, since most of the work has already been done. @p void scan_glue(small_number @!level) /*sets |cur_val| to a glue spec pointer*/ {@+ bool negative; /*should the answer be negated?*/ pointer @!q; /*new glue specification*/ bool @!mu; /*does |level==mu_val|?*/ mu=(level==mu_val);@<Get the next non-blank non-sign...@>; if ((cur_cmd >= min_internal)&&(cur_cmd <= max_internal)) {@+scan_something_internal(level, negative); if (cur_val_level >= glue_val) {@+if (cur_val_level!=level) mu_error(); return; } if (cur_val_level==int_val) scan_dimen(mu, false, true); else if (level==mu_val) mu_error(); } else{@+back_input();scan_dimen(mu, false, false); if (negative) negate(cur_val); } @<Create a new glue specification whose width is |cur_val|; scan for its stretch and shrink components@>; } @ @<Create a new glue specification whose width is |cur_val|...@>= q=new_spec(zero_glue);width(q)=cur_val; if (scan_keyword(@[@<|"plus"|@>@])) @.plus@> {@+scan_dimen(mu, true, false); stretch(q)=cur_val;stretch_order(q)=cur_order; } if (scan_keyword(@[@<|"minus"|@>@])) @.minus@> {@+scan_dimen(mu, true, false); shrink(q)=cur_val;shrink_order(q)=cur_order; } cur_val=q @ Here's a similar procedure that returns a pointer to a rule node. This routine is called just after \TeX\ has seen \.{\\hrule} or \.{\\vrule}; therefore |cur_cmd| will be either |hrule| or |vrule|. The idea is to store the default rule dimensions in the node, then to override them if `\.{height}' or `\.{width}' or `\.{depth}' specifications are found (in any order). @d default_rule 26214 /*0.4\thinspace pt*/ @p pointer scan_rule_spec(void) {@+ pointer q; /*the rule node being created*/ q=new_rule(); /*|width|, |depth|, and |height| all equal |null_flag| now*/ if (cur_cmd==vrule) width(q)=default_rule; else{@+height(q)=default_rule;depth(q)=0; } reswitch: if (scan_keyword(@[@<|"width"|@>@])) @.width@> {@+scan_normal_dimen;width(q)=cur_val;goto reswitch; } if (scan_keyword(@[@<|"height"|@>@])) @.height@> {@+scan_normal_dimen;height(q)=cur_val;goto reswitch; } if (scan_keyword(@[@<|"depth"|@>@])) @.depth@> {@+scan_normal_dimen;depth(q)=cur_val;goto reswitch; } return q; } @* Building token lists. The token lists for macros and for other things like \.{\\mark} and \.{\\output} and \.{\\write} are produced by a procedure called |scan_toks|. Before we get into the details of |scan_toks|, let's consider a much simpler task, that of converting the current string into a token list. The |str_toks| function does this; it classifies spaces as type |spacer| and everything else as type |other_char|. The token list created by |str_toks| begins at |link(temp_head)| and ends at the value |p| that is returned. (If |p==temp_head|, the list is empty.) @p pointer str_toks(pool_pointer @!b) /*changes the string |str_pool[b dotdot pool_ptr]| to a token list*/ {@+pointer p; /*tail of the token list*/ pointer @!q; /*new node being added to the token list via |store_new_token|*/ halfword @!t; /*token being appended*/ pool_pointer @!k; /*index into |str_pool|*/ str_room(1); p=temp_head;link(p)=null;k=b; while (k < pool_ptr) {@+t=so(str_pool[k]); if (t==' ') t=space_token; else t=other_token+t; fast_store_new_token(t); incr(k); } pool_ptr=b;return p; } @ The main reason for wanting |str_toks| is the next function, |the_toks|, which has similar input/output characteristics. This procedure is supposed to scan something like `\.{\\skip\\count12}', i.e., whatever can follow `\.{\\the}', and it constructs a token list containing something like `\.{-3.0pt minus 0.5fill}'. @p pointer the_toks(void) {@+uint8_t old_setting; /*holds |selector| setting*/ pointer @!p, @!q, @!r; /*used for copying a token list*/ pool_pointer @!b; /*base of temporary string*/ get_x_token();scan_something_internal(tok_val, false); if (cur_val_level >= ident_val) @<Copy the token list@>@; else{@+old_setting=selector;selector=new_string;b=pool_ptr; switch (cur_val_level) { case int_val: print_int(cur_val);@+break; case dimen_val: {@+print_scaled(cur_val);print_str("pt"); } @+break; case glue_val: {@+print_spec(cur_val,@[@<|"pt"|@>@]);delete_glue_ref(cur_val); } @+break; case mu_val: {@+print_spec(cur_val,@[@<|"mu"|@>@]);delete_glue_ref(cur_val); } } /*there are no other cases*/ selector=old_setting;return str_toks(b); } } @ @<Copy the token list@>= {@+p=temp_head;link(p)=null; if (cur_val_level==ident_val) store_new_token(cs_token_flag+cur_val)@; else if (cur_val!=null) {@+r=link(cur_val); /*do not copy the reference count*/ while (r!=null) {@+fast_store_new_token(info(r));r=link(r); } } return p; } @ Here's part of the |expand| subroutine that we are now ready to complete: @p void ins_the_toks(void) {@+link(garbage)=the_toks();ins_list(link(temp_head)); } @ The primitives \.{\\number}, \.{\\romannumeral}, \.{\\string}, \.{\\meaning}, \.{\\fontname}, and \.{\\jobname} are defined as follows. @d number_code 0 /*command code for \.{\\number}*/ @d roman_numeral_code 1 /*command code for \.{\\romannumeral}*/ @d string_code 2 /*command code for \.{\\string}*/ @d meaning_code 3 /*command code for \.{\\meaning}*/ @d font_name_code 4 /*command code for \.{\\fontname}*/ @d job_name_code 5 /*command code for \.{\\jobname}*/ @<Put each...@>= primitive(@[@<|"number"|@>@], convert, number_code);@/ @!@:number_}{\.{\\number} primitive@> primitive(@[@<|"romannumeral"|@>@], convert, roman_numeral_code);@/ @!@:roman_numeral_}{\.{\\romannumeral} primitive@> primitive(@[@<|"string"|@>@], convert, string_code);@/ @!@:string_}{\.{\\string} primitive@> primitive(@[@<|"meaning"|@>@], convert, meaning_code);@/ @!@:meaning_}{\.{\\meaning} primitive@> primitive(@[@<|"fontname"|@>@], convert, font_name_code);@/ @!@:font_name_}{\.{\\fontname} primitive@> primitive(@[@<|"jobname"|@>@], convert, job_name_code);@/ @!@:job_name_}{\.{\\jobname} primitive@> @ @<Cases of |print_cmd_chr|...@>= case convert: switch (chr_code) { case number_code: print_esc(@[@<|"number"|@>@]);@+break; case roman_numeral_code: print_esc(@[@<|"romannumeral"|@>@]);@+break; case string_code: print_esc(@[@<|"string"|@>@]);@+break; case meaning_code: print_esc(@[@<|"meaning"|@>@]);@+break; case font_name_code: print_esc(@[@<|"fontname"|@>@]);@+break; default:print_esc(@[@<|"jobname"|@>@]); } @+break; @ The procedure |conv_toks| uses |str_toks| to insert the token list for |convert| functions into the scanner; `\.{\\outer}' control sequences are allowed to follow `\.{\\string}' and `\.{\\meaning}'. @p void conv_toks(void) {@+uint8_t old_setting; /*holds |selector| setting*/ uint8_t @!c; /*desired type of conversion*/ small_number @!save_scanner_status; /*|scanner_status| upon entry*/ pool_pointer @!b; /*base of temporary string*/ c=cur_chr;@<Scan the argument for command |c|@>; old_setting=selector;selector=new_string;b=pool_ptr; @<Print the result of command |c|@>; selector=old_setting;link(garbage)=str_toks(b);ins_list(link(temp_head)); } @ @<Scan the argument for command |c|@>= switch (c) { case number_code: case roman_numeral_code: scan_int();@+break; case string_code: case meaning_code: {@+save_scanner_status=scanner_status; scanner_status=normal;get_token();scanner_status=save_scanner_status; } @+break; case font_name_code: scan_font_ident();@+break; case job_name_code: if (job_name==0) open_log_file(); } /*there are no other cases*/ @ @<Print the result of command |c|@>= switch (c) { case number_code: print_int(cur_val);@+break; case roman_numeral_code: print_roman_int(cur_val);@+break; case string_code: if (cur_cs!=0) sprint_cs(cur_cs); else print_char(cur_chr);@+break; case meaning_code: print_meaning();@+break; case font_name_code: {@+print(font_name[cur_val]); if (font_size[cur_val]!=font_dsize[cur_val]) {@+print_str(" at ");print_scaled(font_size[cur_val]); print_str("pt"); } } @+break; case job_name_code: print(job_name); } /*there are no other cases*/ @ Now we can't postpone the difficulties any longer; we must bravely tackle |scan_toks|. This function returns a pointer to the tail of a new token list, and it also makes |def_ref| point to the reference count at the head of that list. There are two boolean parameters, |macro_def| and |xpand|. If |macro_def| is true, the goal is to create the token list for a macro definition; otherwise the goal is to create the token list for some other \TeX\ primitive: \.{\\mark}, \.{\\output}, \.{\\everypar}, \.{\\lowercase}, \.{\\uppercase}, \.{\\message}, \.{\\errmessage}, \.{\\write}, or \.{\\special}. In the latter cases a left brace must be scanned next; this left brace will not be part of the token list, nor will the matching right brace that comes at the end. If |xpand| is false, the token list will simply be copied from the input using |get_token|. Otherwise all expandable tokens will be expanded until unexpandable tokens are left, except that the results of expanding `\.{\\the}' are not expanded further. If both |macro_def| and |xpand| are true, the expansion applies only to the macro body (i.e., to the material following the first |left_brace| character). The value of |cur_cs| when |scan_toks| begins should be the |eqtb| address of the control sequence to display in ``runaway'' error messages. @p pointer scan_toks(bool @!macro_def, bool @!xpand) {@+ halfword t; /*token representing the highest parameter number*/ halfword @!s; /*saved token*/ pointer @!p; /*tail of the token list being built*/ pointer @!q; /*new node being added to the token list via |store_new_token|*/ halfword @!unbalance; /*number of unmatched left braces*/ halfword @!hash_brace; /*possible `\.{\#\{}' token*/ if (macro_def) scanner_status=defining; @+else scanner_status=absorbing; warning_index=cur_cs;def_ref=get_avail();token_ref_count(def_ref)=null; p=def_ref;hash_brace=0;t=zero_token; if (macro_def) @<Scan and build the parameter part of the macro definition@>@; else scan_left_brace(); /*remove the compulsory left brace*/ @<Scan and build the body of the token list; |goto found| when finished@>; found: scanner_status=normal; if (hash_brace!=0) store_new_token(hash_brace); return p; } @ @<Scan and build the parameter part...@>= {@+loop{@+get_token(); /*set |cur_cmd|, |cur_chr|, |cur_tok|*/ if (cur_tok < right_brace_limit) goto done1; if (cur_cmd==mac_param) @<If the next character is a parameter number, make |cur_tok| a |match| token; but if it is a left brace, store `|left_brace|, |end_match|', set |hash_brace|, and |goto done|@>; store_new_token(cur_tok); } done1: store_new_token(end_match_token); if (cur_cmd==right_brace) @<Express shock at the missing left brace; |goto found|@>; done: ;} @ @<Express shock...@>= {@+print_err("Missing { inserted");incr(align_state); @.Missing \{ inserted@> help2("Where was the left brace? You said something like `\\def\\a}',")@/ ("which I'm going to interpret as `\\def\\a{}'.");error();goto found; } @ @<If the next character is a parameter number...@>= {@+s=match_token+cur_chr;get_token(); if (cur_cmd==left_brace) {@+hash_brace=cur_tok; store_new_token(cur_tok);store_new_token(end_match_token); goto done; } if (t==zero_token+9) {@+print_err("You already have nine parameters"); @.You already have nine...@> help1("I'm going to ignore the # sign you just used.");error(); } else{@+incr(t); if (cur_tok!=t) {@+print_err("Parameters must be numbered consecutively"); @.Parameters...consecutively@> help2("I've inserted the digit you should have used after the #.")@/ ("Type `1' to delete what you did use.");back_error(); } cur_tok=s; } } @ @<Scan and build the body of the token list; |goto found| when finished@>= unbalance=1; loop@+{@+if (xpand) @<Expand the next part of the input@>@; else get_token(); if (cur_tok < right_brace_limit) if (cur_cmd < right_brace) incr(unbalance); else{@+decr(unbalance); if (unbalance==0) goto found; } else if (cur_cmd==mac_param) if (macro_def) @<Look for parameter number or \.{\#\#}@>; store_new_token(cur_tok); } @ Here we insert an entire token list created by |the_toks| without expanding it further. @<Expand the next part of the input@>= {@+loop{@+get_next(); if (cur_cmd <= max_command) goto done2; if (cur_cmd!=the) expand(); else{@+q=the_toks(); if (link(temp_head)!=null) {@+link(p)=link(temp_head);p=q; } } } done2: x_token(); } @ @<Look for parameter number...@>= {@+s=cur_tok; if (xpand) get_x_token();else get_token(); if (cur_cmd!=mac_param) if ((cur_tok <= zero_token)||(cur_tok > t)) {@+print_err("Illegal parameter number in definition of "); @.Illegal parameter number...@> sprint_cs(warning_index); help3("You meant to type ## instead of #, right?")@/ ("Or maybe a } was forgotten somewhere earlier, and things")@/ ("are all screwed up? I'm going to assume that you meant ##."); back_error();cur_tok=s; } else cur_tok=out_param_token-'0'+cur_chr; } @ Another way to create a token list is via the \.{\\read} command. The sixteen files potentially usable for reading appear in the following global variables. The value of |read_open[n]| will be |closed| if stream number |n| has not been opened or if it has been fully read; |just_open| if an \.{\\openin} but not a \.{\\read} has been done; and |normal| if it is open and ready to read the next line. @d closed 2 /*not open, or at end of file*/ @d just_open 1 /*newly opened, first line not yet read*/ @<Glob...@>= alpha_file @!read_file[16]; /*used for \.{\\read}*/ uint8_t @!read_open[17]; /*state of |read_file[n]|*/ @ @<Set init...@>= for (k=0; k<=16; k++) read_open[k]=closed; @ The |read_toks| procedure constructs a token list like that for any macro definition, and makes |cur_val| point to it. Parameter |r| points to the control sequence that will receive this token list. @p void read_toks(int @!n, pointer @!r) {@+ pointer p; /*tail of the token list*/ pointer @!q; /*new node being added to the token list via |store_new_token|*/ int @!s; /*saved value of |align_state|*/ small_number @!m; /*stream number*/ scanner_status=defining;warning_index=r; def_ref=get_avail();token_ref_count(def_ref)=null; p=def_ref; /*the reference count*/ store_new_token(end_match_token); if ((n < 0)||(n > 15)) m=16;@+else m=n; s=align_state;align_state=1000000; /*disable tab marks, etc.*/ @/do@+{@<Input and store tokens from the next line of the file@>; }@+ while (!(align_state==1000000)); cur_val=def_ref;scanner_status=normal;align_state=s; } @ @<Input and store tokens from the next line of the file@>= begin_file_reading();name=m+1; if (read_open[m]==closed) @<Input for \.{\\read} from the terminal@>; else if (read_open[m]==just_open) @<Input the first line of |read_file[m]|@>@; else@<Input the next line of |read_file[m]|@>; limit=last; if (end_line_char_inactive) decr(limit); else buffer[limit]=end_line_char; first=limit+1;loc=start;state=new_line;@/ loop@+{@+get_token(); if (cur_tok==0) goto done; /*|cur_cmd==cur_chr==0| will occur at the end of the line*/ if (align_state < 1000000) /*unmatched `\.\}' aborts the line*/ {@+@/do@+{get_token();}@+ while (!(cur_tok==0)); align_state=1000000;goto done; } store_new_token(cur_tok); } done: end_file_reading() @ Here we input on-line into the |buffer| array, prompting the user explicitly if |n >= 0|. The value of |n| is set negative so that additional prompts will not be given in the case of multi-line input. @<Input for \.{\\read} from the terminal@>= if (interaction > nonstop_mode) if (n < 0) prompt_input("")@; else{@+wake_up_terminal; print_ln();sprint_cs(r);prompt_input("=");n=-1; } else fatal_error("*** (cannot \\read from terminal in nonstop modes)") @.cannot \\read@> @ The first line of a file must be treated specially, since |input_ln| must be told not to start with |get|. @^system dependencies@> @<Input the first line of |read_file[m]|@>= if (input_ln(&read_file[m], false)) read_open[m]=normal; else{@+a_close(&read_file[m]);read_open[m]=closed; } @ An empty line is appended at the end of a |read_file|. @^empty line at end of file@> @<Input the next line of |read_file[m]|@>= {@+if (!input_ln(&read_file[m], true)) {@+a_close(&read_file[m]);read_open[m]=closed; if (align_state!=1000000) {@+runaway(); print_err("File ended within ");print_esc(@[@<|"read"|@>@]); @.File ended within \\read@> help1("This \\read has unbalanced braces."); align_state=1000000;error(); } } } @* Conditional processing. We consider now the way \TeX\ handles various kinds of \.{\\if} commands. @d if_char_code 0 /* `\.{\\if}' */ @d if_cat_code 1 /* `\.{\\ifcat}' */ @d if_int_code 2 /* `\.{\\ifnum}' */ @d if_dim_code 3 /* `\.{\\ifdim}' */ @d if_odd_code 4 /* `\.{\\ifodd}' */ @d if_vmode_code 5 /* `\.{\\ifvmode}' */ @d if_hmode_code 6 /* `\.{\\ifhmode}' */ @d if_mmode_code 7 /* `\.{\\ifmmode}' */ @d if_inner_code 8 /* `\.{\\ifinner}' */ @d if_void_code 9 /* `\.{\\ifvoid}' */ @d if_hbox_code 10 /* `\.{\\ifhbox}' */ @d if_vbox_code 11 /* `\.{\\ifvbox}' */ @d ifx_code 12 /* `\.{\\ifx}' */ @d if_eof_code 13 /* `\.{\\ifeof}' */ @d if_true_code 14 /* `\.{\\iftrue}' */ @d if_false_code 15 /* `\.{\\iffalse}' */ @d if_case_code 16 /* `\.{\\ifcase}' */ @<Put each...@>= primitive(@[@<|"if"|@>@], if_test, if_char_code); @!@:if_char_}{\.{\\if} primitive@> primitive(@[@<|"ifcat"|@>@], if_test, if_cat_code); @!@:if_cat_code_}{\.{\\ifcat} primitive@> primitive(@[@<|"ifnum"|@>@], if_test, if_int_code); @!@:if_int_}{\.{\\ifnum} primitive@> primitive(@[@<|"ifdim"|@>@], if_test, if_dim_code); @!@:if_dim_}{\.{\\ifdim} primitive@> primitive(@[@<|"ifodd"|@>@], if_test, if_odd_code); @!@:if_odd_}{\.{\\ifodd} primitive@> primitive(@[@<|"ifvmode"|@>@], if_test, if_vmode_code); @!@:if_vmode_}{\.{\\ifvmode} primitive@> primitive(@[@<|"ifhmode"|@>@], if_test, if_hmode_code); @!@:if_hmode_}{\.{\\ifhmode} primitive@> primitive(@[@<|"ifmmode"|@>@], if_test, if_mmode_code); @!@:if_mmode_}{\.{\\ifmmode} primitive@> primitive(@[@<|"ifinner"|@>@], if_test, if_inner_code); @!@:if_inner_}{\.{\\ifinner} primitive@> primitive(@[@<|"ifvoid"|@>@], if_test, if_void_code); @!@:if_void_}{\.{\\ifvoid} primitive@> primitive(@[@<|"ifhbox"|@>@], if_test, if_hbox_code); @!@:if_hbox_}{\.{\\ifhbox} primitive@> primitive(@[@<|"ifvbox"|@>@], if_test, if_vbox_code); @!@:if_vbox_}{\.{\\ifvbox} primitive@> primitive(@[@<|"ifx"|@>@], if_test, ifx_code); @!@:ifx_}{\.{\\ifx} primitive@> primitive(@[@<|"ifeof"|@>@], if_test, if_eof_code); @!@:if_eof_}{\.{\\ifeof} primitive@> primitive(@[@<|"iftrue"|@>@], if_test, if_true_code); @!@:if_true_}{\.{\\iftrue} primitive@> primitive(@[@<|"iffalse"|@>@], if_test, if_false_code); @!@:if_false_}{\.{\\iffalse} primitive@> primitive(@[@<|"ifcase"|@>@], if_test, if_case_code); @!@:if_case_}{\.{\\ifcase} primitive@> @ @<Cases of |print_cmd_chr|...@>= case if_test: switch (chr_code) { case if_cat_code: print_esc(@[@<|"ifcat"|@>@]);@+break; case if_int_code: print_esc(@[@<|"ifnum"|@>@]);@+break; case if_dim_code: print_esc(@[@<|"ifdim"|@>@]);@+break; case if_odd_code: print_esc(@[@<|"ifodd"|@>@]);@+break; case if_vmode_code: print_esc(@[@<|"ifvmode"|@>@]);@+break; case if_hmode_code: print_esc(@[@<|"ifhmode"|@>@]);@+break; case if_mmode_code: print_esc(@[@<|"ifmmode"|@>@]);@+break; case if_inner_code: print_esc(@[@<|"ifinner"|@>@]);@+break; case if_void_code: print_esc(@[@<|"ifvoid"|@>@]);@+break; case if_hbox_code: print_esc(@[@<|"ifhbox"|@>@]);@+break; case if_vbox_code: print_esc(@[@<|"ifvbox"|@>@]);@+break; case ifx_code: print_esc(@[@<|"ifx"|@>@]);@+break; case if_eof_code: print_esc(@[@<|"ifeof"|@>@]);@+break; case if_true_code: print_esc(@[@<|"iftrue"|@>@]);@+break; case if_false_code: print_esc(@[@<|"iffalse"|@>@]);@+break; case if_case_code: print_esc(@[@<|"ifcase"|@>@]);@+break; default:print_esc(@[@<|"if"|@>@]); } @+break; @ Conditions can be inside conditions, and this nesting has a stack that is independent of the |save_stack|. Four global variables represent the top of the condition stack: |cond_ptr| points to pushed-down entries, if any; |if_limit| specifies the largest code of a |fi_or_else| command that is syntactically legal; |cur_if| is the name of the current type of conditional; and |if_line| is the line number at which it began. If no conditions are currently in progress, the condition stack has the special state |cond_ptr==null|, |if_limit==normal|, |cur_if==0|, |if_line==0|. Otherwise |cond_ptr| points to a two-word node; the |type|, |subtype|, and |link| fields of the first word contain |if_limit|, |cur_if|, and |cond_ptr| at the next level, and the second word contains the corresponding |if_line|. @d if_node_size 2 /*number of words in stack entry for conditionals*/ @d if_line_field(X) mem[X+1].i @d if_code 1 /*code for \.{\\if...} being evaluated*/ @d fi_code 2 /*code for \.{\\fi}*/ @d else_code 3 /*code for \.{\\else}*/ @d or_code 4 /*code for \.{\\or}*/ @<Glob...@>= pointer @!cond_ptr; /*top of the condition stack*/ uint8_t @!if_limit; /*upper bound on |fi_or_else| codes*/ small_number @!cur_if; /*type of conditional being worked on*/ int @!if_line; /*line where that conditional began*/ @ @<Set init...@>= cond_ptr=null;if_limit=normal;cur_if=0;if_line=0; @ @<Put each...@>= primitive(@[@<|"fi"|@>@], fi_or_else, fi_code); @!@:fi_}{\.{\\fi} primitive@> text(frozen_fi)=@[@<|"fi"|@>@];eqtb[frozen_fi]=eqtb[cur_val]; primitive(@[@<|"or"|@>@], fi_or_else, or_code); @!@:or_}{\.{\\or} primitive@> primitive(@[@<|"else"|@>@], fi_or_else, else_code); @!@:else_}{\.{\\else} primitive@> @ @<Cases of |print_cmd_chr|...@>= case fi_or_else: if (chr_code==fi_code) print_esc(@[@<|"fi"|@>@]); else if (chr_code==or_code) print_esc(@[@<|"or"|@>@]); else print_esc(@[@<|"else"|@>@]);@+break; @ When we skip conditional text, we keep track of the line number where skipping began, for use in error messages. @<Glob...@>= int @!skip_line; /*skipping began here*/ @ Here is a procedure that ignores text until coming to an \.{\\or}, \.{\\else}, or \.{\\fi} at level zero of $\.{\\if}\ldots\.{\\fi}$ nesting. After it has acted, |cur_chr| will indicate the token that was found, but |cur_tok| will not be set (because this makes the procedure run faster). @p void pass_text(void) {@+ int l; /*level of $\.{\\if}\ldots\.{\\fi}$ nesting*/ small_number @!save_scanner_status; /*|scanner_status| upon entry*/ save_scanner_status=scanner_status;scanner_status=skipping;l=0; skip_line=line; loop@+{@+get_next(); if (cur_cmd==fi_or_else) {@+if (l==0) goto done; if (cur_chr==fi_code) decr(l); } else if (cur_cmd==if_test) incr(l); } done: scanner_status=save_scanner_status; } @ When we begin to process a new \.{\\if}, we set |if_limit=if_code|; then if\/ \.{\\or} or \.{\\else} or \.{\\fi} occurs before the current \.{\\if} condition has been evaluated, \.{\\relax} will be inserted. For example, a sequence of commands like `\.{\\ifvoid1\\else...\\fi}' would otherwise require something after the `\.1'. @<Push the condition stack@>= {@+p=get_node(if_node_size);link(p)=cond_ptr;type(p)=if_limit; subtype(p)=cur_if;if_line_field(p)=if_line; cond_ptr=p;cur_if=cur_chr;if_limit=if_code;if_line=line; } @ @<Pop the condition stack@>= {@+p=cond_ptr;if_line=if_line_field(p); cur_if=subtype(p);if_limit=type(p);cond_ptr=link(p); free_node(p, if_node_size); } @ Here's a procedure that changes the |if_limit| code corresponding to a given value of |cond_ptr|. @p void change_if_limit(small_number @!l, pointer @!p) {@+ pointer q; if (p==cond_ptr) if_limit=l; /*that's the easy case*/ else{@+q=cond_ptr; loop@+{@+if (q==null) confusion(@[@<|"if"|@>@]); @:this can't happen if}{\quad if@> if (link(q)==p) {@+type(q)=l;return; } q=link(q); } } } @ A condition is started when the |expand| procedure encounters an |if_test| command; in that case |expand| reduces to |conditional|, which is a recursive procedure. @^recursion@> @p void conditional(void) {@+ bool b; /*is the condition true?*/ uint8_t @!r; /*relation to be evaluated*/ int @!m, @!n; /*to be tested against the second operand*/ pointer @!p, @!q; /*for traversing token lists in \.{\\ifx} tests*/ small_number @!save_scanner_status; /*|scanner_status| upon entry*/ pointer @!save_cond_ptr; /*|cond_ptr| corresponding to this conditional*/ small_number @!this_if; /*type of this conditional*/ @<Push the condition stack@>;@+save_cond_ptr=cond_ptr;this_if=cur_chr;@/ @<Either process \.{\\ifcase} or set |b| to the value of a boolean condition@>; if (tracing_commands > 1) @<Display the value of |b|@>; if (b) {@+change_if_limit(else_code, save_cond_ptr); return; /*wait for \.{\\else} or \.{\\fi}*/ } @<Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|@>; common_ending: if (cur_chr==fi_code) @<Pop the condition stack@>@; else if_limit=fi_code; /*wait for \.{\\fi}*/ } @ In a construction like `\.{\\if\\iftrue abc\\else d\\fi}', the first \.{\\else} that we come to after learning that the \.{\\if} is false is not the \.{\\else} we're looking for. Hence the following curious logic is needed. @ @<Skip to \.{\\else} or \.{\\fi}...@>= loop@+{@+pass_text(); if (cond_ptr==save_cond_ptr) {@+if (cur_chr!=or_code) goto common_ending; print_err("Extra ");print_esc(@[@<|"or"|@>@]); @.Extra \\or@> help1("I'm ignoring this; it doesn't match any \\if."); error(); } else if (cur_chr==fi_code) @<Pop the condition stack@>; } @ @<Either process \.{\\ifcase} or set |b|...@>= switch (this_if) { case if_char_code: case if_cat_code: @<Test if two characters match@>@;@+break; case if_int_code: case if_dim_code: @<Test relation between integers or dimensions@>@;@+break; case if_odd_code: @<Test if an integer is odd@>@;@+break; case if_vmode_code: b=(abs(mode)==vmode);@+break; case if_hmode_code: b=(abs(mode)==hmode);@+break; case if_mmode_code: b=(abs(mode)==mmode);@+break; case if_inner_code: b=(mode < 0);@+break; case if_void_code: case if_hbox_code: case if_vbox_code: @<Test box register status@>@;@+break; case ifx_code: @<Test if two tokens match@>@;@+break; case if_eof_code: {@+scan_four_bit_int();b=(read_open[cur_val]==closed); } @+break; case if_true_code: b=true;@+break; case if_false_code: b=false;@+break; case if_case_code: @<Select the appropriate case and |return| or |goto common_ending|@>; } /*there are no other cases*/ @ @<Display the value of |b|@>= {@+begin_diagnostic(); if (b) print_str("{true}");@+else print_str("{false}"); end_diagnostic(false); } @ Here we use the fact that |'<'|, |'='|, and |'>'| are consecutive ASCII codes. @^ASCII code@> @<Test relation between integers or dimensions@>= {@+if (this_if==if_int_code) scan_int();@+else scan_normal_dimen; n=cur_val;@<Get the next non-blank non-call...@>; if ((cur_tok >= other_token+'<')&&(cur_tok <= other_token+'>')) r=cur_tok-other_token; else{@+print_err("Missing = inserted for "); @.Missing = inserted@> print_cmd_chr(if_test, this_if); help1("I was expecting to see `<', `=', or `>'. Didn't."); back_error();r='='; } if (this_if==if_int_code) scan_int();@+else scan_normal_dimen; switch (r) { case '<': b=(n < cur_val);@+break; case '=': b=(n==cur_val);@+break; case '>': b=(n > cur_val); } } @ @<Test if an integer is odd@>= {@+scan_int();b=odd(cur_val); } @ @<Test box register status@>= {@+scan_eight_bit_int();p=box(cur_val); if (this_if==if_void_code) b=(p==null); else if (p==null) b=false; else if (this_if==if_hbox_code) b=(type(p)==hlist_node); else b=(type(p)==vlist_node); } @ An active character will be treated as category 13 following \.{\\if\\noexpand} or following \.{\\ifcat\\noexpand}. We use the fact that active characters have the smallest tokens, among all control sequences. @d get_x_token_or_active_char @t@>@; {@+get_x_token(); if (cur_cmd==relax) if (cur_chr==no_expand_flag) {@+cur_cmd=active_char; cur_chr=cur_tok-cs_token_flag-active_base; } } @<Test if two characters match@>= {@+get_x_token_or_active_char; if ((cur_cmd > active_char)||(cur_chr > 255)) /*not a character*/ {@+m=relax;n=256; } else{@+m=cur_cmd;n=cur_chr; } get_x_token_or_active_char; if ((cur_cmd > active_char)||(cur_chr > 255)) {@+cur_cmd=relax;cur_chr=256; } if (this_if==if_char_code) b=(n==cur_chr);@+else b=(m==cur_cmd); } @ Note that `\.{\\ifx}' will declare two macros different if one is \\{long} or \\{outer} and the other isn't, even though the texts of the macros are the same. We need to reset |scanner_status|, since \.{\\outer} control sequences are allowed, but we might be scanning a macro definition or preamble. @<Test if two tokens match@>= {@+save_scanner_status=scanner_status;scanner_status=normal; get_next();n=cur_cs;p=cur_cmd;q=cur_chr; get_next();if (cur_cmd!=p) b=false; else if (cur_cmd < call) b=(cur_chr==q); else@<Test if two macro texts match@>; scanner_status=save_scanner_status; } @ Note also that `\.{\\ifx}' decides that macros \.{\\a} and \.{\\b} are different in examples like this: $$\vbox{\halign{\.{#}\hfil&\qquad\.{#}\hfil\cr {}\\def\\a\{\\c\}& {}\\def\\c\{\}\cr {}\\def\\b\{\\d\}& {}\\def\\d\{\}\cr}}$$ @<Test if two macro texts match@>= {@+p=link(cur_chr);q=link(equiv(n)); /*omit reference counts*/ if (p==q) b=true; else{@+while ((p!=null)&&(q!=null)) if (info(p)!=info(q)) p=null; else{@+p=link(p);q=link(q); } b=((p==null)&&(q==null)); } } @ @<Select the appropriate case and |return| or |goto common_ending|@>= {@+scan_int();n=cur_val; /*|n| is the number of cases to pass*/ if (tracing_commands > 1) {@+begin_diagnostic();print_str("{case ");print_int(n);print_char('}'); end_diagnostic(false); } while (n!=0) {@+pass_text(); if (cond_ptr==save_cond_ptr) if (cur_chr==or_code) decr(n); else goto common_ending; else if (cur_chr==fi_code) @<Pop the condition stack@>; } change_if_limit(or_code, save_cond_ptr); return; /*wait for \.{\\or}, \.{\\else}, or \.{\\fi}*/ } @ The processing of conditionals is complete except for the following code, which is actually part of |expand|. It comes into play when \.{\\or}, \.{\\else}, or \.{\\fi} is scanned. @<Terminate the current conditional and skip to \.{\\fi}@>= if (cur_chr > if_limit) if (if_limit==if_code) insert_relax(); /*condition not yet evaluated*/ else{@+print_err("Extra ");print_cmd_chr(fi_or_else, cur_chr); @.Extra \\or@> @.Extra \\else@> @.Extra \\fi@> help1("I'm ignoring this; it doesn't match any \\if."); error(); } else{@+while (cur_chr!=fi_code) pass_text(); /*skip to \.{\\fi}*/ @<Pop the condition stack@>; } @* File names. It's time now to fret about file names. Besides the fact that different operating systems treat files in different ways, we must cope with the fact that completely different naming conventions are used by different groups of people. The following programs show what is required for one particular operating system; similar routines for other systems are not difficult to devise. @^fingers@> @^system dependencies@> \TeX\ assumes that a file name has three parts: the name proper; its ``extension''; and a ``file area'' where it is found in an external file system. The extension of an input file or a write file is assumed to be `\.{.tex}' unless otherwise specified; it is `\.{.log}' on the transcript file that records each run of \TeX; it is `\.{.tfm}' on the font metric files that describe characters in the fonts \TeX\ uses; it is `\.{.dvi}' on the output files that specify typesetting information; and it is `\.{.fmt}' on the format files written by \.{INITEX} to initialize \TeX. The file area can be arbitrary on input files, but files are usually output to the user's current area. If an input file cannot be found on the specified area, \TeX\ will look for it on a special system area; this special area is intended for commonly used input files like \.{webmac.tex}. Simple uses of \TeX\ refer only to file names that have no explicit extension or area. For example, a person usually says `\.{\\input} \.{paper}' or `\.{\\font\\tenrm} \.= \.{helvetica}' instead of `\.{\\input} \.{paper.new}' or `\.{\\font\\tenrm} \.= \.{<csd.knuth>test}'. Simple file names are best, because they make the \TeX\ source files portable; whenever a file name consists entirely of letters and digits, it should be treated in the same way by all implementations of \TeX. However, users need the ability to refer to other files in their environment, especially when responding to error messages concerning unopenable files; therefore we want to let them use the syntax that appears in their favorite operating system. The following procedures don't allow spaces to be part of file names; but some users seem to like names that are spaced-out. System-dependent changes to allow such things should probably be made with reluctance, and only when an entire file name that includes spaces is ``quoted'' somehow. @ In order to isolate the system-dependent aspects of file names, the @^system dependencies@> system-independent parts of \TeX\ are expressed in terms of three system-dependent procedures called |begin_name|, |more_name|, and |end_name|. In essence, if the user-specified characters of the file name are $c_1\ldots c_n$, the system-independent driver program does the operations $$|begin_name|;\,|more_name|(c_1);\,\ldots\,;\,|more_name|(c_n); \,|end_name|.$$ These three procedures communicate with each other via global variables. Afterwards the file name will appear in the string pool as three strings called |cur_name|\penalty10000\hskip-.05em, |cur_area|, and |cur_ext|; the latter two are null (i.e., |empty_string|), unless they were explicitly specified by the user. Actually the situation is slightly more complicated, because \TeX\ needs to know when the file name ends. The |more_name| routine is a function (with side effects) that returns |true| on the calls |more_name|$(c_1)$, \dots, |more_name|$(c_{n-1})$. The final call |more_name|$(c_n)$ returns |false|; or, it returns |true| and the token following $c_n$ is something like `\.{\\hbox}' (i.e., not a character). In other words, |more_name| is supposed to return |true| unless it is sure that the file name has been completely scanned; and |end_name| is supposed to be able to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of whether $|more_name|(c_n)$ returned |true| or |false|. @<Glob...@>= str_number @!cur_name; /*name of file just scanned*/ str_number @!cur_area; /*file area just scanned, or \.{""}*/ str_number @!cur_ext; /*file extension just scanned, or \.{""}*/ @ The file names we shall deal with for illustrative purposes have the following structure: If the name contains `\.>' or `\.:', the file area consists of all characters up to and including the final such character; otherwise the file area is null. If the remaining file name contains `\..', the file extension consists of all such characters from the first remaining `\..' to the end, otherwise the file extension is null. @^system dependencies@> We can scan such file names easily by using two global variables that keep track of the occurrences of area and extension delimiters: @<Glob...@>= pool_pointer @!area_delimiter; /*the most recent `\.>' or `\.:', if any*/ pool_pointer @!ext_delimiter; /*the relevant `\..', if any*/ @ Input files that can't be found in the user's area may appear in a standard system area called |TEX_area|. Font metric files whose areas are not given explicitly are assumed to appear in a standard system area called |TEX_font_area|. These system area names will, of course, vary from place to place. @^system dependencies@> @.TeXinputs@> @.TeXfonts@> @ Here now is the first of the system-dependent routines for file name scanning. @^system dependencies@> @p void begin_name(void) {@+area_delimiter=0;ext_delimiter=0; } @ And here's the second. The string pool might change as the file name is being scanned, since a new \.{\\csname} might be entered; therefore we keep |area_delimiter| and |ext_delimiter| relative to the beginning of the current string, instead of assigning an absolute address like |pool_ptr| to them. @^system dependencies@> @p bool more_name(ASCII_code @!c) {@+if (c==' ') return false; else{@+str_room(1);append_char(c); /*contribute |c| to the current string*/ if ((c=='>')||(c==':')) {@+area_delimiter=cur_length;ext_delimiter=0; } else if ((c=='.')&&(ext_delimiter==0)) ext_delimiter=cur_length; return true; } } @ The third. @^system dependencies@> @p void end_name(void) {@+if (str_ptr+3 > max_strings) overflow("number of strings", max_strings-init_str_ptr); @:TeX capacity exceeded number of strings}{\quad number of strings@> if (area_delimiter==0) cur_area=empty_string; else{@+cur_area=str_ptr; str_start[str_ptr+1]=str_start[str_ptr]+area_delimiter;incr(str_ptr); } if (ext_delimiter==0) {@+cur_ext=empty_string;cur_name=make_string(); } else{@+cur_name=str_ptr; str_start[str_ptr+1]=str_start[str_ptr]+ext_delimiter-area_delimiter-1; incr(str_ptr);cur_ext=make_string(); } } @ Conversely, here is a routine that takes three strings and prints a file name that might have produced them. (The routine is system dependent, because some operating systems put the file area last instead of first.) @^system dependencies@> @<Basic printing...@>= void print_file_name(int @!n, int @!a, int @!e) {@+slow_print(a);slow_print(n);slow_print(e); } @ Another system-dependent routine is needed to convert three internal \TeX\ strings into the |name_of_file| value that is used to open files. The present code allows both lowercase and uppercase letters in the file name. @^system dependencies@> @d append_to_name(X) {@+c=X;incr(k); if (k <= file_name_size) name_of_file[k]=xchr[c]; } @p void pack_file_name(str_number @!n, str_number @!a, str_number @!e) {@+int k; /*number of positions filled in |name_of_file|*/ ASCII_code @!c; /*character being packed*/ int @!j; /*index into |str_pool|*/ k=0; for (j=str_start[a]; j<=str_start[a+1]-1; j++) append_to_name(so(str_pool[j])); for (j=str_start[n]; j<=str_start[n+1]-1; j++) append_to_name(so(str_pool[j])); for (j=str_start[e]; j<=str_start[e+1]-1; j++) append_to_name(so(str_pool[j])); if (k <= file_name_size) name_length=k;@+else name_length=file_name_size; name_of_file[name_length+1]=0; } @ A messier routine is also needed, since format file names must be scanned before \TeX's string mechanism has been initialized. We shall use the global variable |TEX_format_default| to supply the text for default system areas and extensions related to format files. @^system dependencies@> @d format_default_length 20 /*length of the |TEX_format_default| string*/ @d format_area_length 11 /*length of its area part*/ @d format_ext_length 4 /*length of its `\.{.fmt}' part*/ @d format_extension_str ".fmt" /*the extension, as a \.{WEB} constant*/ @<Glob...@>= ASCII_code @!TEX_format_default[1+format_default_length+1]=" TeXformats/plain.fmt"; @.TeXformats@> @.plain@> @^system dependencies@> @ @<Check the ``constant'' values for consistency@>= if (format_default_length > file_name_size) bad=31; @ Here is the messy routine that was just mentioned. It sets |name_of_file| from the first |n| characters of |TEX_format_default|, followed by |buffer[a dotdot b]|, followed by the last |format_ext_length| characters of |TEX_format_default|. We dare not give error messages here, since \TeX\ calls this routine before the |error| routine is ready to roll. Instead, we simply drop excess characters, since the error will be detected in another way when a strange file name isn't found. @^system dependencies@> @p void pack_buffered_name(small_number @!n, int @!a, int @!b) {@+int k; /*number of positions filled in |name_of_file|*/ ASCII_code @!c; /*character being packed*/ int @!j; /*index into |buffer| or |TEX_format_default|*/ if (n+b-a+1+format_ext_length > file_name_size) b=a+file_name_size-n-1-format_ext_length; k=0; for (j=1; j<=n; j++) append_to_name(xord[TEX_format_default[j]]); for (j=a; j<=b; j++) append_to_name(buffer[j]); for (j=format_default_length-format_ext_length+1; j<=format_default_length; j++) append_to_name(xord[TEX_format_default[j]]); if (k <= file_name_size) name_length=k;@+else name_length=file_name_size; name_of_file[name_length+1]=0; } @ Here is the only place we use |pack_buffered_name|. This part of the program becomes active when a ``virgin'' \TeX\ is trying to get going, just after the preliminary initialization, or when the user is substituting another format file by typing `\.\&' after the initial `\.{**}' prompt. The buffer contains the first line of input in |buffer[loc dotdot(last-1)]|, where |loc < last| and |buffer[loc]!=' '|. @<Declare the function called |open_fmt_file|@>= bool open_fmt_file(void) {@+ uint16_t j; /*the first space after the format file name*/ j=loc; if (buffer[loc]=='&') {@+incr(loc);j=loc;buffer[last]=' '; while (buffer[j]!=' ') incr(j); pack_buffered_name(0, loc, j-1); /*try first without the system file area*/ if (w_open_in(&fmt_file)) goto found; pack_buffered_name(format_area_length, loc, j-1); /*now try the system format file area*/ if (w_open_in(&fmt_file)) goto found; wake_up_terminal; wterm_ln("Sorry, I can't find that format; will try PLAIN."); @.Sorry, I can't find...@> update_terminal; } /*now pull out all the stops: try for the system \.{plain} file*/ pack_buffered_name(format_default_length-format_ext_length, 1, 0); if (!w_open_in(&fmt_file)) {@+wake_up_terminal; wterm_ln("I can't find the PLAIN format file!"); @.I can't find PLAIN...@> @.plain@> return false; } found: loc=j;return true; } @ Operating systems often make it possible to determine the exact name (and possible version number) of a file that has been opened. The following routine, which simply makes a \TeX\ string from the value of |name_of_file|, should ideally be changed to deduce the full name of file~|f|, which is the file most recently opened, if it is possible to do this in a \PASCAL\ program. @^system dependencies@> This routine might be called after string memory has overflowed, hence we dare not use `|str_room|'. @p str_number make_name_string(void) {@+int k; /*index into |name_of_file|*/ if ((pool_ptr+name_length > pool_size)||(str_ptr==max_strings)|| (cur_length > 0)) return'?'; else{@+for (k=1; k<=name_length; k++) append_char(xord[name_of_file[k]]); return make_string(); } } str_number a_make_name_string(alpha_file *f) {@+return make_name_string(); } str_number b_make_name_string(byte_file *f) {@+return make_name_string(); } str_number w_make_name_string(word_file *f) {@+return make_name_string(); } @ Now let's consider the ``driver'' routines by which \TeX\ deals with file names in a system-independent manner. First comes a procedure that looks for a file name in the input by calling |get_x_token| for the information. @p void scan_file_name(void) {@+ name_in_progress=true;begin_name(); @<Get the next non-blank non-call...@>; loop@+{@+if ((cur_cmd > other_char)||(cur_chr > 255)) /*not a character*/ {@+back_input();goto done; } if (!more_name(cur_chr)) goto done; get_x_token(); } done: end_name();name_in_progress=false; } @ The global variable |name_in_progress| is used to prevent recursive use of |scan_file_name|, since the |begin_name| and other procedures communicate via global variables. Recursion would arise only by devious tricks like `\.{\\input\\input f}'; such attempts at sabotage must be thwarted. Furthermore, |name_in_progress| prevents \.{\\input} @^recursion@> from being initiated when a font size specification is being scanned. Another global variable, |job_name|, contains the file name that was first \.{\\input} by the user. This name is extended by `\.{.log}' and `\.{.dvi}' and `\.{.fmt}' in the names of \TeX's output files. @<Glob...@>= bool @!name_in_progress; /*is a file name being scanned?*/ str_number @!job_name; /*principal file name*/ bool @!log_opened; /*has the transcript file been opened?*/ @ Initially |job_name==0|; it becomes nonzero as soon as the true name is known. We have |job_name==0| if and only if the `\.{log}' file has not been opened, except of course for a short time just after |job_name| has become nonzero. @<Initialize the output...@>= job_name=0;name_in_progress=false;log_opened=false; @ Here is a routine that manufactures the output file names, assuming that |job_name!=0|. It ignores and changes the current settings of |cur_area| and |cur_ext|. @d pack_cur_name pack_file_name(cur_name, cur_area, cur_ext) @p void pack_job_name(str_number @!s) /*|s==@[@<|".log"|@>@]|, |@[@<|".dvi"|@>@]|, or |format_extension|*/ {@+cur_area=empty_string;cur_ext=s; cur_name=job_name;pack_cur_name; } @ If some trouble arises when \TeX\ tries to open a file, the following routine calls upon the user to supply another file name. Parameter~|s| is used in the error message to identify the type of file; parameter~|e| is the default extension if none is given. Upon exit from the routine, variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are ready for another attempt at file opening. @p void prompt_file_name(char *@!s, str_number @!e) {@+ uint16_t k; /*index into |buffer|*/ if (interaction==scroll_mode) wake_up_terminal; if (strcmp(s,"input file name")==0) print_err("I can't find file `")@; @.I can't find file x@> else print_err("I can't write on file `"); @.I can't write on file x@> print_file_name(cur_name, cur_area, cur_ext);print_str("'."); if (e==@[@<|".tex"|@>@]) show_context(); print_nl("Please type another ");print_str(s); @.Please type...@> if (interaction < scroll_mode) fatal_error("*** (job aborted, file error in nonstop mode)"); @.job aborted, file error...@> clear_terminal;prompt_input(": ");@<Scan file name in the buffer@>; if (cur_ext==empty_string) cur_ext=e; pack_cur_name; } @ @<Scan file name in the buffer@>= {@+begin_name();k=first; while ((buffer[k]==' ')&&(k < last)) incr(k); loop@+{@+if (k==last) goto done; if (!more_name(buffer[k])) goto done; incr(k); } done: end_name(); } @ Here's an example of how these conventions are used. Whenever it is time to ship out a box of stuff, we shall use |@<ensure dvi open@>|. @<ensure dvi open@>= if (output_file_name==0) {@+if (job_name==0) open_log_file(); pack_job_name(@[@<|".dvi"|@>@]); while (!b_open_out(&dvi_file)) prompt_file_name("file name for output",@[@<|".dvi"|@>@]); output_file_name=b_make_name_string(&dvi_file); } @ @<Glob...@>= byte_file @!dvi_file; /*the device-independent output goes here*/ str_number @!output_file_name; /*full name of the output file*/ str_number @!log_name; /*full name of the log file*/ @ @<Initialize the output...@>=output_file_name=0; @ The |open_log_file| routine is used to open the transcript file and to help it catch up to what has previously been printed on the terminal. @p void open_log_file(void) {@+uint8_t old_setting; /*previous |selector| setting*/ int @!k; /*index into |months| and |buffer|*/ uint16_t @!l; /*end of first input line*/ @!ASCII_code @!months[]=" JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"; /*abbreviations of month names*/ old_setting=selector; if (job_name==0) job_name=@[@<|"texput"|@>@]; @.texput@> pack_job_name(@[@<|".log"|@>@]); while (!a_open_out(&log_file)) @<Try to get a different log file name@>; log_name=a_make_name_string(&log_file); selector=log_only;log_opened=true; @<Print the banner line, including the date and time@>; input_stack[input_ptr]=cur_input; /*make sure bottom level is in memory*/ print_nl("**"); @.**@> l=input_stack[0].limit_field; /*last position of first line*/ if (buffer[l]==end_line_char) decr(l); for (k=1; k<=l; k++) print(buffer[k]); print_ln(); /*now the transcript file contains the first line of input*/ selector=old_setting+2; /*|log_only| or |term_and_log|*/ } @ Sometimes |open_log_file| is called at awkward moments when \TeX\ is unable to print error messages or even to |show_context|. The |prompt_file_name| routine can result in a |fatal_error|, but the |error| routine will not be invoked because |log_opened| will be false. The normal idea of |batch_mode| is that nothing at all should be written on the terminal. However, in the unusual case that no log file could be opened, we make an exception and allow an explanatory message to be seen. Incidentally, the program always refers to the log file as a `\.{transcript file}', because some systems cannot use the extension `\.{.log}' for this file. @<Try to get a different log file name@>= {@+selector=term_only; prompt_file_name("transcript file name",@[@<|".log"|@>@]); } @ @<Print the banner...@>= {@+wlog("%s",banner); slow_print(format_ident);print_str(" "); print_int(day);print_char(' '); for (k=3*month-2; k<=3*month; k++) wlog("%c",months[k]); print_char(' ');print_int(year);print_char(' '); print_two(time/60);print_char(':');print_two(time%60); } @ Let's turn now to the procedure that is used to initiate file reading when an `\.{\\input}' command is being processed. @p void start_input(void) /*\TeX\ will \.{\\input} something*/ {@+ scan_file_name(); /*set |cur_name| to desired file name*/ if (cur_ext==empty_string) cur_ext=@[@<|".tex"|@>@]; pack_cur_name; loop@+{@+begin_file_reading(); /*set up |cur_file| and new level of input*/ if (a_open_in(&cur_file)) goto done; if (cur_area==empty_string) {@+pack_file_name(cur_name, TEX_area, cur_ext); if (a_open_in(&cur_file)) goto done; } end_file_reading(); /*remove the level that didn't work*/ prompt_file_name("input file name",@[@<|".tex"|@>@]); } done: name=a_make_name_string(&cur_file); if (job_name==0) {@+job_name=cur_name;open_log_file(); } /*|open_log_file| doesn't |show_context|, so |limit| and |loc| needn't be set to meaningful values yet*/ if (term_offset+length(name) > max_print_line-2) print_ln(); else if ((term_offset > 0)||(file_offset > 0)) print_char(' '); print_char('(');incr(open_parens);slow_print(name);update_terminal; state=new_line; if (name==str_ptr-1) /*we can conserve string pool space now*/ {@+flush_string;name=cur_name; } @<Read the first line of the new file@>; } @ Here we have to remember to tell the |input_ln| routine not to start with a |get|. If the file is empty, it is considered to contain a single blank line. @^system dependencies@> @^empty line at end of file@> @<Read the first line...@>= {@+line=1; if (input_ln(&cur_file, false)) do_nothing; firm_up_the_line(); if (end_line_char_inactive) decr(limit); else buffer[limit]=end_line_char; first=limit+1;loc=start; } @* Font metric data. \TeX\ gets its knowledge about fonts from font metric files, also called \.{TFM} files; the `\.T' in `\.{TFM}' stands for \TeX, but other programs know about them too. @:TFM files}{\.{TFM} files@> @^font metric files@> The information in a \.{TFM} file appears in a sequence of 8-bit bytes. Since the number of bytes is always a multiple of 4, we could also regard the file as a sequence of 32-bit words, but \TeX\ uses the byte interpretation. The format of \.{TFM} files was designed by Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds @^Ramshaw, Lyle Harold@> of information in a compact but useful form. @<Glob...@>= byte_file @!tfm_file; @ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit integers that give the lengths of the various subsequent portions of the file. These twelve integers are, in order: $$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr |lf|&length of the entire file, in words;\cr |lh|&length of the header data, in words;\cr |bc|&smallest character code in the font;\cr |ec|&largest character code in the font;\cr |nw|&number of words in the width table;\cr |nh|&number of words in the height table;\cr |nd|&number of words in the depth table;\cr |ni|&number of words in the italic correction table;\cr |nl|&number of words in the lig/kern table;\cr |nk|&number of words in the kern table;\cr |ne|&number of words in the extensible character table;\cr |np|&number of font parameter words.\cr}}$$ They are all nonnegative and less than $2^{15}$. We must have |bc-1 <= ec <= 255|, and $$\hbox{|lf==6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$ Note that a font may contain as many as 256 characters (if |bc==0| and |ec==255|), and as few as 0 characters (if |bc==ec+1|). Incidentally, when two or more 8-bit bytes are combined to form an integer of 16 or more bits, the most significant bytes appear first in the file. This is called BigEndian order. @!@^BigEndian order@> @ The rest of the \.{TFM} file may be regarded as a sequence of ten data arrays having the informal specification $$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2} \def\PB#1{\arr#1} \vbox{\halign{\hfil\\{#}&$\,:\,$#\hfil\cr header&|[0 dotdot lh-1]@t\\{stuff}@>|\cr char\_info&|[bc dotdot ec]char_info_word|\cr width&|[0 dotdot nw-1]fix_word|\cr height&|[0 dotdot nh-1]fix_word|\cr depth&|[0 dotdot nd-1]fix_word|\cr italic&|[0 dotdot ni-1]fix_word|\cr lig\_kern&|[0 dotdot nl-1]lig_kern_command|\cr kern&|[0 dotdot nk-1]fix_word|\cr exten&|[0 dotdot ne-1]extensible_recipe|\cr param&|[1 dotdot np]fix_word|\cr}}$$ The most important data type used here is a |@!fix_word|, which is a 32-bit representation of a binary fraction. A |fix_word| is a signed quantity, with the two's complement of the entire word used to represent negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and the smallest is $-2048$. We will see below, however, that all but two of the |fix_word| values must lie between $-16$ and $+16$. @ The first data array is a block of header information, which contains general facts about the font. The header must contain at least two words, |header[0]| and |header[1]|, whose meaning is explained below. Additional header information of use to other software routines might also be included, but \TeX82 does not need to know about such details. For example, 16 more words of header information are in use at the Xerox Palo Alto Research Center; the first ten specify the character coding scheme used (e.g., `\.{XEROX text}' or `\.{TeX math symbols}'), the next five give the font identifier (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the last gives the ``face byte.'' The program that converts \.{DVI} files to Xerox printing format gets this information by looking at the \.{TFM} file, which it needs to read anyway because of other information that is not explicitly repeated in \.{DVI}~format. \yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into the \.{DVI} output file. Later on when the \.{DVI} file is printed, possibly on another computer, the actual font that gets used is supposed to have a check sum that agrees with the one in the \.{TFM} file used by \TeX. In this way, users will be warned about potential incompatibilities. (However, if the check sum is zero in either the font file or the \.{TFM} file, no check is made.) The actual relation between this check sum and the rest of the \.{TFM} file is not important; the check sum is simply an identification number with the property that incompatible fonts almost always have distinct check sums. @^check sum@> \yskip\hang|header[1]| is a |fix_word| containing the design size of the font, in units of \TeX\ points. This number must be at least 1.0; it is fairly arbitrary, but usually the design size is 10.0 for a ``10 point'' font, i.e., a font that was designed to look best at a 10-point size, whatever that really means. When a \TeX\ user asks for a font `\.{at} $\delta$ \.{pt}', the effect is to override the design size and replace it by $\delta$, and to multiply the $x$ and~$y$ coordinates of the points in the font image by a factor of $\delta$ divided by the design size. {\sl All other dimensions in the\/ \.{TFM} file are |fix_word|\kern-1pt\ numbers in design-size units}, with the exception of |param[1]| (which denotes the slant ratio). Thus, for example, the value of |param[6]|, which defines the \.{em} unit, is often the |fix_word| value $2^{20}=1.0$, since many fonts have a design size equal to one em. The other dimensions must be less than 16 design-size units in absolute value; thus, |header[1]| and |param[1]| are the only |fix_word| entries in the whole \.{TFM} file whose first byte might be something besides 0 or 255. @ Next comes the |char_info| array, which contains one |@!char_info_word| per character. Each word in this part of the file contains six fields packed into four bytes as follows. \yskip\hang first byte: |@!width_index| (8 bits)\par \hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index| (4~bits)\par \hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag| (2~bits)\par \hang fourth byte: |@!rem| (8 bits)\par \yskip\noindent The actual width of a character is \\{width}|[width_index]|, in design-size units; this is a device for compressing information, since many characters have the same width. Since it is quite common for many characters to have the same height, depth, or italic correction, the \.{TFM} format imposes a limit of 16 different heights, 16 different depths, and 64 different italic corrections. @!@^italic correction@> The italic correction of a character has two different uses. (a)~In ordinary text, the italic correction is added to the width only if the \TeX\ user specifies `\.{\\/}' after the character. (b)~In math formulas, the italic correction is always added to the width, except with respect to the positioning of subscripts. Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]= \\{italic}[0]=0$ should always hold, so that an index of zero implies a value of zero. The |width_index| should never be zero unless the character does not exist in the font, since a character is valid if and only if it lies between |bc| and |ec| and has a nonzero |width_index|. @ The |tag| field in a |char_info_word| has four values that explain how to interpret the |rem| field. \yskip\hangg|tag==0| (|no_tag|) means that |rem| is unused.\par \hangg|tag==1| (|lig_tag|) means that this character has a ligature/kerning program starting at position |rem| in the |lig_kern| array.\par \hangg|tag==2| (|list_tag|) means that this character is part of a chain of characters of ascending sizes, and not the largest in the chain. The |rem| field gives the character code of the next larger character.\par \hangg|tag==3| (|ext_tag|) means that this character code represents an extensible character, i.e., a character that is built up of smaller pieces so that it can be made arbitrarily large. The pieces are specified in |@!exten[rem]|.\par \yskip\noindent Characters with |tag==2| and |tag==3| are treated as characters with |tag==0| unless they are used in special circumstances in math formulas. For example, the \.{\\sum} operation looks for a |list_tag|, and the \.{\\left} operation looks for both |list_tag| and |ext_tag|. @d no_tag 0 /*vanilla character*/ @d lig_tag 1 /*character has a ligature/kerning program*/ @d list_tag 2 /*character has a successor in a charlist*/ @d ext_tag 3 /*character is extensible*/ @ The |lig_kern| array contains instructions in a simple programming language that explains what to do for special letter pairs. Each word in this array is a |@!lig_kern_command| of four bytes. \yskip\hang first byte: |skip_byte|, indicates that this is the final program step if the byte is 128 or more, otherwise the next step is obtained by skipping this number of intervening steps.\par \hang second byte: |next_char|, ``if |next_char| follows the current character, then perform the operation and stop, otherwise continue.''\par \hang third byte: |op_byte|, indicates a ligature step if less than~128, a kern step otherwise.\par \hang fourth byte: |rem|.\par \yskip\noindent In a kern step, an additional space equal to |kern[256*(op_byte-128)+rem]| is inserted between the current character and |next_char|. This amount is often negative, so that the characters are brought closer together by kerning; but it might be positive. There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where $0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is |rem| is inserted between the current character and |next_char|; then the current character is deleted if $b=0$, and |next_char| is deleted if $c=0$; then we pass over $a$~characters to reach the next current character (which may have a ligature/kerning program of its own). If the very first instruction of the |lig_kern| array has |skip_byte==255|, the |next_char| byte is the so-called right boundary character of this font; the value of |next_char| need not lie between |bc| and~|ec|. If the very last instruction of the |lig_kern| array has |skip_byte==255|, there is a special ligature/kerning program for a left boundary character, beginning at location |256*op_byte+rem|. The interpretation is that \TeX\ puts implicit boundary characters before and after each consecutive string of characters from the same font. These implicit characters do not appear in the output, but they can affect ligatures and kerning. If the very first instruction of a character's |lig_kern| program has |skip_byte > 128|, the program actually begins in location |256*op_byte+rem|. This feature allows access to large |lig_kern| arrays, because the first instruction must otherwise appear in a location | <= 255|. Any instruction with |skip_byte > 128| in the |lig_kern| array must satisfy the condition $$\hbox{|256*op_byte+rem < nl|.}$$ If such an instruction is encountered during normal program execution, it denotes an unconditional halt; no ligature or kerning command is performed. @d stop_flag qi(128) /*value indicating `\.{STOP}' in a lig/kern program*/ @d kern_flag qi(128) /*op code for a kern step*/ @d skip_byte(X) X.b0 @d next_char(X) X.b1 @d op_byte(X) X.b2 @d rem_byte(X) X.b3 @ Extensible characters are specified by an |@!extensible_recipe|, which consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this order). These bytes are the character codes of individual pieces used to build up a large symbol. If |top|, |mid|, or |bot| are zero, they are not present in the built-up result. For example, an extensible vertical line is like an extensible bracket, except that the top and bottom pieces are missing. Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box if the piece isn't present. Then the extensible characters have the form $TR^kMR^kB$ from top to bottom, for some |k >= 0|, unless $M$ is absent; in the latter case we can have $TR^kB$ for both even and odd values of~|k|. The width of the extensible character is the width of $R$; and the height-plus-depth is the sum of the individual height-plus-depths of the components used, since the pieces are butted together in a vertical list. @d ext_top(X) X.b0 /*|top| piece in a recipe*/ @d ext_mid(X) X.b1 /*|mid| piece in a recipe*/ @d ext_bot(X) X.b2 /*|bot| piece in a recipe*/ @d ext_rep(X) X.b3 /*|rep| piece in a recipe*/ @ The final portion of a \.{TFM} file is the |param| array, which is another sequence of |fix_word| values. \yskip\hang|param[1]==slant| is the amount of italic slant, which is used to help position accents. For example, |slant==.25| means that when you go up one unit, you also go .25 units to the right. The |slant| is a pure number; it's the only |fix_word| other than the design size itself that is not scaled by the design size. \hang|param[2]==space| is the normal spacing between words in text. Note that character |' '| in the font need not have anything to do with blank spaces. \hang|param[3]==space_stretch| is the amount of glue stretching between words. \hang|param[4]==space_shrink| is the amount of glue shrinking between words. \hang|param[5]==x_height| is the size of one ex in the font; it is also the height of letters for which accents don't have to be raised or lowered. \hang|param[6]==quad| is the size of one em in the font. \hang|param[7]==extra_space| is the amount added to |param[2]| at the ends of sentences. \yskip\noindent If fewer than seven parameters are present, \TeX\ sets the missing parameters to zero. Fonts used for math symbols are required to have additional parameter information, which is explained later. @d slant_code 1 @d space_code 2 @d space_stretch_code 3 @d space_shrink_code 4 @d x_height_code 5 @d quad_code 6 @d extra_space_code 7 @ So that is what \.{TFM} files hold. Since \TeX\ has to absorb such information about lots of fonts, it stores most of the data in a large array called |font_info|. Each item of |font_info| is a |memory_word|; the |fix_word| data gets converted into |scaled| entries, while everything else goes into words of type |four_quarters|. When the user defines \.{\\font\\f}, say, \TeX\ assigns an internal number to the user's font~\.{\\f}. Adding this number to |font_id_base| gives the |eqtb| location of a ``frozen'' control sequence that will always select the font. @<Types...@>= typedef uint8_t internal_font_number; /*|font| in a |char_node|*/ typedef uint16_t font_index; /*index into |font_info|*/ @ Here now is the (rather formidable) array of font arrays. @d non_char qi(256) /*a |halfword| code that can't match a real character*/ @d non_address 0 /*a spurious |bchar_label|*/ @<Glob...@>= memory_word @!font_info[font_mem_size+1]; /*the big collection of font data*/ font_index @!fmem_ptr; /*first unused word of |font_info|*/ internal_font_number @!font_ptr; /*largest internal font number in use*/ four_quarters @!font_check0[font_max-font_base+1], *const @!font_check = @!font_check0-font_base; /*check sum*/ scaled @!font_size0[font_max-font_base+1], *const @!font_size = @!font_size0-font_base; /*``at'' size*/ scaled @!font_dsize0[font_max-font_base+1], *const @!font_dsize = @!font_dsize0-font_base; /*``design'' size*/ font_index @!font_params0[font_max-font_base+1], *const @!font_params = @!font_params0-font_base; /*how many font parameters are present*/ str_number @!font_name0[font_max-font_base+1], *const @!font_name = @!font_name0-font_base; /*name of the font*/ str_number @!font_area0[font_max-font_base+1], *const @!font_area = @!font_area0-font_base; /*area of the font*/ eight_bits @!font_bc0[font_max-font_base+1], *const @!font_bc = @!font_bc0-font_base; /*beginning (smallest) character code*/ eight_bits @!font_ec0[font_max-font_base+1], *const @!font_ec = @!font_ec0-font_base; /*ending (largest) character code*/ pointer @!font_glue0[font_max-font_base+1], *const @!font_glue = @!font_glue0-font_base; /*glue specification for interword space, |null| if not allocated*/ bool @!font_used0[font_max-font_base+1], *const @!font_used = @!font_used0-font_base; /*has a character from this font actually appeared in the output?*/ int @!hyphen_char0[font_max-font_base+1], *const @!hyphen_char = @!hyphen_char0-font_base; /*current \.{\\hyphenchar} values*/ int @!skew_char0[font_max-font_base+1], *const @!skew_char = @!skew_char0-font_base; /*current \.{\\skewchar} values*/ font_index @!bchar_label0[font_max-font_base+1], *const @!bchar_label = @!bchar_label0-font_base; /*start of |lig_kern| program for left boundary character, |non_address| if there is none*/ uint16_t @!font_bchar0[font_max-font_base+1], *const @!font_bchar = @!font_bchar0-font_base; /*right boundary character, |non_char| if there is none*/ uint16_t @!font_false_bchar0[font_max-font_base+1], *const @!font_false_bchar = @!font_false_bchar0-font_base; /*|font_bchar| if it doesn't exist in the font, otherwise |non_char|*/ @ Besides the arrays just enumerated, we have directory arrays that make it easy to get at the individual entries in |font_info|. For example, the |char_info| data for character |c| in font |f| will be in |font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index| part of this word (the |b0| field), the width of the character is |font_info[width_base[f]+w].sc|. (These formulas assume that |min_quarterword| has already been added to |c| and to |w|, since \TeX\ stores its quarterwords that way.) @<Glob...@>= int @!char_base0[font_max-font_base+1], *const @!char_base = @!char_base0-font_base; /*base addresses for |char_info|*/ int @!width_base0[font_max-font_base+1], *const @!width_base = @!width_base0-font_base; /*base addresses for widths*/ int @!height_base0[font_max-font_base+1], *const @!height_base = @!height_base0-font_base; /*base addresses for heights*/ int @!depth_base0[font_max-font_base+1], *const @!depth_base = @!depth_base0-font_base; /*base addresses for depths*/ int @!italic_base0[font_max-font_base+1], *const @!italic_base = @!italic_base0-font_base; /*base addresses for italic corrections*/ int @!lig_kern_base0[font_max-font_base+1], *const @!lig_kern_base = @!lig_kern_base0-font_base; /*base addresses for ligature/kerning programs*/ int @!kern_base0[font_max-font_base+1], *const @!kern_base = @!kern_base0-font_base; /*base addresses for kerns*/ int @!exten_base0[font_max-font_base+1], *const @!exten_base = @!exten_base0-font_base; /*base addresses for extensible recipes*/ int @!param_base0[font_max-font_base+1], *const @!param_base = @!param_base0-font_base; /*base addresses for font parameters*/ @ @<Set init...@>= for (k=font_base; k<=font_max; k++) font_used[k]=false; @ \TeX\ always knows at least one font, namely the null font. It has no characters, and its seven parameters are all equal to zero. @<Initialize table...@>= font_ptr=null_font;fmem_ptr=7; font_name[null_font]=@[@<|"nullfont"|@>@];font_area[null_font]=empty_string; hyphen_char[null_font]='-';skew_char[null_font]=-1; bchar_label[null_font]=non_address; font_bchar[null_font]=non_char;font_false_bchar[null_font]=non_char; font_bc[null_font]=1;font_ec[null_font]=0; font_size[null_font]=0;font_dsize[null_font]=0; char_base[null_font]=0;width_base[null_font]=0; height_base[null_font]=0;depth_base[null_font]=0; italic_base[null_font]=0;lig_kern_base[null_font]=0; kern_base[null_font]=0;exten_base[null_font]=0; font_glue[null_font]=null;font_params[null_font]=7; param_base[null_font]=-1; for (k=0; k<=6; k++) font_info[k].sc=0; @ @<Put each...@>= primitive(@[@<|"nullfont"|@>@], set_font, null_font); @!@:null_font_}{\.{\\nullfont} primitive@> text(frozen_null_font)=@[@<|"nullfont"|@>@];eqtb[frozen_null_font]=eqtb[cur_val]; @ Of course we want to define macros that suppress the detail of how font information is actually packed, so that we don't have to write things like $$\hbox{|font_info[width_base[f]+font_info[char_base[f]+c].qqqq.b0].sc|}$$ too often. The \.{WEB} definitions here make |char_info(f)(c)| the |four_quarters| word of font information corresponding to character |c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be the character's width; hence the long formula above is at least abbreviated to $$\hbox{|char_width(f)(char_info(f)(c))|.}$$ Usually, of course, we will fetch |q| first and look at several of its fields at the same time. The italic correction of a character will be denoted by |char_italic(f)(q)|, so it is analogous to |char_width|. But we will get at the height and depth in a slightly different way, since we usually want to compute both height and depth if we want either one. The value of |height_depth(q)| will be the 8-bit quantity $$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and depth of the character |c| for which |q==char_info(f)(c)|. Got that? The tag field will be called |char_tag(q)|; the remainder byte will be called |rem_byte(q)|, using a macro that we have already defined above. Access to a character's |width|, |height|, |depth|, and |tag| fields is part of \TeX's inner loop, so we want these macros to produce code that is as fast as possible under the circumstances. @^inner loop@> @d char_info_end(X) X].qqqq @d char_info(X) font_info[char_base[X]+char_info_end @d char_width_end(X) X.b0].sc @d char_width(X) font_info[width_base[X]+char_width_end @d char_exists(X) (X.b0 > min_quarterword) @d char_italic_end(X) (qo(X.b2))/4].sc @d char_italic(X) font_info[italic_base[X]+char_italic_end @d height_depth(X) qo(X.b1) @d char_height_end(X) (X)/16].sc @d char_height(X) font_info[height_base[X]+char_height_end @d char_depth_end(X) (X)%16].sc @d char_depth(X) font_info[depth_base[X]+char_depth_end @d char_tag(X) ((qo(X.b2))%4) @ The global variable |null_character| is set up to be a word of |char_info| for a character that doesn't exist. Such a word provides a convenient way to deal with erroneous situations. @<Glob...@>= four_quarters @!null_character; /*nonexistent character information*/ @ @<Set init...@>= null_character.b0=min_quarterword;null_character.b1=min_quarterword; null_character.b2=min_quarterword;null_character.b3=min_quarterword; @ Here are some macros that help process ligatures and kerns. We write |char_kern(f)(j)| to find the amount of kerning specified by kerning command~|j| in font~|f|. If |j| is the |char_info| for a character with a ligature/kern program, the first instruction of that program is either |i==font_info[lig_kern_start(f)(j)]| or |font_info[lig_kern_restart(f)(i)]|, depending on whether or not |skip_byte(i) <= stop_flag|. The constant |kern_base_offset| should be simplified, for \PASCAL\ compilers that do not do local optimization. @^system dependencies@> @d char_kern_end(X) 256*op_byte(X)+rem_byte(X)].sc @d char_kern(X) font_info[kern_base[X]+char_kern_end @d kern_base_offset 256*(128+min_quarterword) @d lig_kern_start(X) lig_kern_base[X]+rem_byte /*beginning of lig/kern program*/ @d lig_kern_restart_end(X) 256*op_byte(X)+rem_byte(X)+32768-kern_base_offset @d lig_kern_restart(X) lig_kern_base[X]+lig_kern_restart_end @ Font parameters are referred to as |slant(f)|, |space(f)|, etc. @d param_end(X) param_base[X]].sc @d param(X) font_info[X+param_end @d slant param(slant_code) /*slant to the right, per unit distance upward*/ @d space param(space_code) /*normal space between words*/ @d space_stretch param(space_stretch_code) /*stretch between words*/ @d space_shrink param(space_shrink_code) /*shrink between words*/ @d x_height param(x_height_code) /*one ex*/ @d quad param(quad_code) /*one em*/ @d extra_space param(extra_space_code) /*additional space at end of sentence*/ @<The em width for |cur_font|@>=quad(cur_font) @ @<The x-height for |cur_font|@>=x_height(cur_font) @ \TeX\ checks the information of a \.{TFM} file for validity as the file is being read in, so that no further checks will be needed when typesetting is going on. The somewhat tedious subroutine that does this is called |read_font_info|. It has four parameters: the user font identifier~|u|, the file name and area strings |nom| and |aire|, and the ``at'' size~|s|. If |s|~is negative, it's the negative of a scale factor to be applied to the design size; |s==-1000| is the normal case. Otherwise |s| will be substituted for the design size; in this case, |s| must be positive and less than $2048\rm\,pt$ (i.e., it must be less than $2^{27}$ when considered as an integer). The subroutine opens and closes a global file variable called |tfm_file|. It returns the value of the internal font number that was just loaded. If an error is detected, an error message is issued and no font information is stored; |null_font| is returned in this case. @d abort goto bad_tfm /*do this when the \.{TFM} data is wrong*/ @p internal_font_number read_font_info(pointer @!u, str_number @!nom, str_number @!aire, scaled @!s) /*input a \.{TFM} file*/ {@+ font_index k; /*index into |font_info|*/ bool @!file_opened; /*was |tfm_file| successfully opened?*/ halfword @!lf, @!lh, @!bc, @!ec, @!nw, @!nh, @!nd, @!ni, @!nl, @!nk, @!ne, @!np; /*sizes of subfiles*/ internal_font_number @!f; /*the new font's number*/ internal_font_number @!g; /*the number to return*/ eight_bits @!a, @!b, @!c, @!d; /*byte variables*/ four_quarters @!qw;scaled @!sw; /*accumulators*/ int @!bch_label; /*left boundary start location, or infinity*/ uint16_t @!bchar; /*right boundary character, or 256*/ scaled @!z; /*the design size or the ``at'' size*/ int @!alpha;uint8_t @!beta; /*auxiliary quantities used in fixed-point multiplication*/ g=null_font;@/ @<Read and check the font data; |abort| if the \.{TFM} file is malformed; if there's no room for this font, say so and |goto done|; otherwise |incr(font_ptr)| and |goto done|@>; bad_tfm: @<Report that the font won't be loaded@>; done: if (file_opened) b_close(&tfm_file); return g; } @ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert between the \.{TFM} format and a symbolic property-list format that can be easily edited. These programs contain extensive diagnostic information, so \TeX\ does not have to bother giving precise details about why it rejects a particular \.{TFM} file. @.TFtoPL@> @.PLtoTF@> @d start_font_error_message print_err("Font ");sprint_cs(u); print_char('=');print_file_name(nom, aire, empty_string); if (s >= 0) {@+print_str(" at ");print_scaled(s);print_str("pt"); } else if (s!=-1000) {@+print_str(" scaled ");print_int(-s); } @<Report that the font won't be loaded@>= start_font_error_message; @.Font x=xx not loadable...@> if (file_opened) print_str(" not loadable: Bad metric (TFM) file"); else print_str(" not loadable: Metric (TFM) file not found"); help5("I wasn't able to read the size data for this font,")@/ ("so I will ignore the font specification.")@/ ("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/ ("You might try inserting a different font spec;")@/ ("e.g., type `I\\font<same font id>=<substitute font name>'."); error() @ @<Read and check...@>= @<Open |tfm_file| for input@>; @<Read the {\.{TFM}} size fields@>; @<Use size fields to allocate font information@>; @<Read the {\.{TFM}} header@>; @<Read character data@>; @<Read box dimensions@>; @<Read ligature/kern program@>; @<Read extensible character recipes@>; @<Read font parameters@>; @<Make final adjustments and |goto done|@>@; @ @<Open |tfm_file| for input@>= file_opened=false; if (aire==empty_string) pack_file_name(nom, TEX_font_area,@[@<|".tfm"|@>@]); else pack_file_name(nom, aire,@[@<|".tfm"|@>@]); if (!b_open_in(&tfm_file)) abort; file_opened=true @ Note: A malformed \.{TFM} file might be shorter than it claims to be; thus |eof(tfm_file)| might be true when |read_font_info| refers to |tfm_file.d| or when it says |get(tfm_file)|. If such circumstances cause system error messages, you will have to defeat them somehow, for example by defining |fget| to be `\ignorespaces|{@+get(tfm_file);| |if (eof(tfm_file)) abort;} |\unskip'. @^system dependencies@> @d fget get(tfm_file) @d fbyte tfm_file.d @d read_sixteen(X) {@+X=fbyte; if (X > 127) abort; fget;X=X*0400+fbyte; } @d store_four_quarters(X) {@+fget;a=fbyte;qw.b0=qi(a); fget;b=fbyte;qw.b1=qi(b); fget;c=fbyte;qw.b2=qi(c); fget;d=fbyte;qw.b3=qi(d); X=qw; } @ @<Read the {\.{TFM}} size fields@>= {@+read_sixteen(lf); fget;read_sixteen(lh); fget;read_sixteen(bc); fget;read_sixteen(ec); if ((bc > ec+1)||(ec > 255)) abort; if (bc > 255) /*|bc==256| and |ec==255|*/ {@+bc=1;ec=0; } fget;read_sixteen(nw); fget;read_sixteen(nh); fget;read_sixteen(nd); fget;read_sixteen(ni); fget;read_sixteen(nl); fget;read_sixteen(nk); fget;read_sixteen(ne); fget;read_sixteen(np); if (lf!=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np) abort; if ((nw==0)||(nh==0)||(nd==0)||(ni==0)) abort; } @ The preliminary settings of the index-offset variables |char_base|, |width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be corrected later by subtracting |min_quarterword| from them; and we will subtract 1 from |param_base| too. It's best to forget about such anomalies until later. @<Use size fields to allocate font information@>= lf=lf-6-lh; /*|lf| words should be loaded into |font_info|*/ if (np < 7) lf=lf+7-np; /*at least seven parameters will appear*/ if ((font_ptr==font_max)||(fmem_ptr+lf > font_mem_size)) @<Apologize for not loading the font, |goto done|@>; f=font_ptr+1; char_base[f]=fmem_ptr-bc; width_base[f]=char_base[f]+ec+1; height_base[f]=width_base[f]+nw; depth_base[f]=height_base[f]+nh; italic_base[f]=depth_base[f]+nd; lig_kern_base[f]=italic_base[f]+ni; kern_base[f]=lig_kern_base[f]+nl-kern_base_offset; exten_base[f]=kern_base[f]+kern_base_offset+nk; param_base[f]=exten_base[f]+ne @ @<Apologize for not loading...@>= {@+start_font_error_message; print_str(" not loaded: Not enough room left"); @.Font x=xx not loaded...@> help4("I'm afraid I won't be able to make use of this font,")@/ ("because my memory for character-size data is too small.")@/ ("If you're really stuck, ask a wizard to enlarge me.")@/ ("Or maybe try `I\\font<same font id>=<name of loaded font>'."); error();goto done; } @ Only the first two words of the header are needed by \TeX82. @<Read the {\.{TFM}} header@>= {@+if (lh < 2) abort; store_four_quarters(font_check[f]); fget;read_sixteen(z); /*this rejects a negative design size*/ fget;z=z*0400+fbyte;fget;z=(z*020)+(fbyte/020); if (z < unity) abort; while (lh > 2) {@+fget;fget;fget;fget;decr(lh); /*ignore the rest of the header*/ } font_dsize[f]=z; if (s!=-1000) if (s >= 0) z=s; else z=xn_over_d(z,-s, 1000); font_size[f]=z; } @ @<Read character data@>= for (k=fmem_ptr; k<=width_base[f]-1; k++) {@+store_four_quarters(font_info[k].qqqq); if ((a >= nw)||(b/020 >= nh)||(b%020 >= nd)|| (c/4 >= ni)) abort; switch (c%4) { case lig_tag: if (d >= nl) abort;@+break; case ext_tag: if (d >= ne) abort;@+break; case list_tag: @<Check for charlist cycle@>@;@+break; default:do_nothing; /*|no_tag|*/ } } @ We want to make sure that there is no cycle of characters linked together by |list_tag| entries, since such a cycle would get \TeX\ into an endless loop. If such a cycle exists, the routine here detects it when processing the largest character code in the cycle. @d check_byte_range(X) {@+if ((X < bc)||(X > ec)) abort;@+} @d current_character_being_worked_on k+bc-fmem_ptr @<Check for charlist cycle@>= {@+check_byte_range(d); while (d < current_character_being_worked_on) {@+qw=char_info(f)(d); /*N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet*/ if (char_tag(qw)!=list_tag) goto not_found; d=qo(rem_byte(qw)); /*next character on the list*/ } if (d==current_character_being_worked_on) abort; /*yes, there's a cycle*/ not_found: ;} @ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents the number $$x=\left\{\vcenter{\halign{$#$,\hfil\qquad&if $#$\hfil\cr b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=0;\cr -16+b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=255.\cr}}\right.$$ (No other choices of |a| are allowed, since the magnitude of a number in design-size units must be less than 16.) We want to multiply this quantity by the integer~|z|, which is known to be less than $2^{27}$. If $|z|<2^{23}$, the individual multiplications $b\cdot z$, $c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2, 4, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can compensate for this later. If |z| has thereby been replaced by $|z|^\prime=|z|/2^e$, let $\beta=2^{4-e}$; we shall compute $$\lfloor(b+c\cdot2^{-8}+d\cdot2^{-16})\,z^\prime/\beta\rfloor$$ if $a=0$, or the same quantity minus $\alpha=2^{4+e}z^\prime$ if $a=255$. This calculation must be done exactly, in order to guarantee portability of \TeX\ between computers. @d store_scaled(X) {@+fget;a=fbyte;fget;b=fbyte; fget;c=fbyte;fget;d=fbyte;@/ sw=(((((d*z)/0400)+(c*z))/0400)+(b*z))/beta; if (a==0) X=sw;@+else if (a==255) X=sw-alpha;@+else abort; } @<Read box dimensions@>= {@+@<Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$@>; for (k=width_base[f]; k<=lig_kern_base[f]-1; k++) store_scaled(font_info[k].sc); if (font_info[width_base[f]].sc!=0) abort; /*\\{width}[0] must be zero*/ if (font_info[height_base[f]].sc!=0) abort; /*\\{height}[0] must be zero*/ if (font_info[depth_base[f]].sc!=0) abort; /*\\{depth}[0] must be zero*/ if (font_info[italic_base[f]].sc!=0) abort; /*\\{italic}[0] must be zero*/ } @ @<Replace |z|...@>= {@+alpha=16; while (z >= 040000000) {@+z=z/2;alpha=alpha+alpha; } beta=256/alpha;alpha=alpha*z; } @ @d check_existence(X) @t@>@;@/ {@+check_byte_range(X); qw=char_info(f)(X); /*N.B.: not |qi(X)|*/ if (!char_exists(qw)) abort; } @<Read ligature/kern program@>= bch_label=077777;bchar=256; if (nl > 0) {@+for (k=lig_kern_base[f]; k<=kern_base[f]+kern_base_offset-1; k++) {@+store_four_quarters(font_info[k].qqqq); if (a > 128) {@+if (256*c+d >= nl) abort; if (a==255) if (k==lig_kern_base[f]) bchar=b; } else{@+if (b!=bchar) check_existence(b); if (c < 128) check_existence(d)@; /*check ligature*/ else if (256*(c-128)+d >= nk) abort; /*check kern*/ if (a < 128) if (k-lig_kern_base[f]+a+1 >= nl) abort; } } if (a==255) bch_label=256*c+d; } for (k=kern_base[f]+kern_base_offset; k<=exten_base[f]-1; k++) store_scaled(font_info[k].sc); @ @<Read extensible character recipes@>= for (k=exten_base[f]; k<=param_base[f]-1; k++) {@+store_four_quarters(font_info[k].qqqq); if (a!=0) check_existence(a); if (b!=0) check_existence(b); if (c!=0) check_existence(c); check_existence(d); } @ We check to see that the \.{TFM} file doesn't end prematurely; but no error message is given for files having more than |lf| words. @<Read font parameters@>= {@+for (k=1; k<=np; k++) if (k==1) /*the |slant| parameter is a pure number*/ {@+fget;sw=fbyte;if (sw > 127) sw=sw-256; fget;sw=sw*0400+fbyte;fget;sw=sw*0400+fbyte; fget;font_info[param_base[f]].sc= (sw*020)+(fbyte/020); } else store_scaled(font_info[param_base[f]+k-1].sc); if (eof(tfm_file)) abort; for (k=np+1; k<=7; k++) font_info[param_base[f]+k-1].sc=0; } @ Now to wrap it up, we have checked all the necessary things about the \.{TFM} file, and all we need to do is put the finishing touches on the data for the new font. @d adjust(X) X[f]=qo(X[f]) /*correct for the excess |min_quarterword| that was added*/ @<Make final adjustments...@>= if (np >= 7) font_params[f]=np;@+else font_params[f]=7; hyphen_char[f]=default_hyphen_char;skew_char[f]=default_skew_char; if (bch_label < nl) bchar_label[f]=bch_label+lig_kern_base[f]; else bchar_label[f]=non_address; font_bchar[f]=qi(bchar); font_false_bchar[f]=qi(bchar); if (bchar <= ec) if (bchar >= bc) {@+qw=char_info(f)(bchar); /*N.B.: not |qi(bchar)|*/ if (char_exists(qw)) font_false_bchar[f]=non_char; } font_name[f]=nom; font_area[f]=aire; font_bc[f]=bc;font_ec[f]=ec;font_glue[f]=null; adjust(char_base);adjust(width_base);adjust(lig_kern_base); adjust(kern_base);adjust(exten_base); decr(param_base[f]); fmem_ptr=fmem_ptr+lf;font_ptr=f;g=f;goto done @ Before we forget about the format of these tables, let's deal with two of \TeX's basic scanning routines related to font information. @<Declare procedures that scan font-related stuff@>= void scan_font_ident(void) {@+internal_font_number f; halfword @!m; @<Get the next non-blank non-call...@>; if (cur_cmd==def_font) f=cur_font; else if (cur_cmd==set_font) f=cur_chr; else if (cur_cmd==def_family) {@+m=cur_chr;scan_four_bit_int();f=equiv(m+cur_val); } else{@+print_err("Missing font identifier"); @.Missing font identifier@> help2("I was looking for a control sequence whose")@/ ("current meaning has been defined by \\font."); back_error();f=null_font; } cur_val=f; } @ The following routine is used to implement `\.{\\fontdimen} |n| |f|'. The boolean parameter |writing| is set |true| if the calling program intends to change the parameter value. @<Declare procedures that scan font-related stuff@>= void find_font_dimen(bool @!writing) /*sets |cur_val| to |font_info| location*/ {@+internal_font_number f; int @!n; /*the parameter number*/ scan_int();n=cur_val;scan_font_ident();f=cur_val; if (n <= 0) cur_val=fmem_ptr; else{@+if (writing&&(n <= space_shrink_code)&&@| (n >= space_code)&&(font_glue[f]!=null)) {@+delete_glue_ref(font_glue[f]); font_glue[f]=null; } if (n > font_params[f]) if (f < font_ptr) cur_val=fmem_ptr; else@<Increase the number of parameters in the last font@>@; else cur_val=n+param_base[f]; } @<Issue an error message if |cur_val=fmem_ptr|@>; } @ @<Issue an error message if |cur_val=fmem_ptr|@>= if (cur_val==fmem_ptr) {@+print_err("Font ");print_esc(font_id_text(f)); print_str(" has only ");print_int(font_params[f]); print_str(" fontdimen parameters"); @.Font x has only...@> help2("To increase the number of font parameters, you must")@/ ("use \\fontdimen immediately after the \\font is loaded."); error(); } @ @<Increase the number of parameters...@>= {@+@/do@+{if (fmem_ptr==font_mem_size) overflow("font memory", font_mem_size); @:TeX capacity exceeded font memory}{\quad font memory@> font_info[fmem_ptr].sc=0;incr(fmem_ptr);incr(font_params[f]); }@+ while (!(n==font_params[f])); cur_val=fmem_ptr-1; /*this equals |param_base[f]+font_params[f]|*/ } @ When \TeX\ wants to typeset a character that doesn't exist, the character node is not created; thus the output routine can assume that characters exist when it sees them. The following procedure prints a warning message unless the user has suppressed it. @p void char_warning(internal_font_number @!f, eight_bits @!c) {@+if (tracing_lost_chars > 0) {@+begin_diagnostic(); print_nl("Missing character: There is no "); @.Missing character@> print_ASCII(c);print_str(" in font "); slow_print(font_name[f]);print_char('!');end_diagnostic(false); } } @ Here is a function that returns a pointer to a character node for a given character in a given font. If that character doesn't exist, |null| is returned instead. @p pointer new_character(internal_font_number @!f, eight_bits @!c) {@+ pointer p; /*newly allocated node*/ if (font_bc[f] <= c) if (font_ec[f] >= c) if (char_exists(char_info(f)(qi(c)))) {@+p=get_avail();font(p)=f;character(p)=qi(c); return p; } char_warning(f, c); return null; } @* Device-independent file format. The most important output produced by a run of \TeX\ is the ``device independent'' (\.{DVI}) file that specifies where characters and rules are to appear on printed pages. The form of these files was designed by David R. Fuchs in 1979. Almost any reasonable typesetting device can be @^Fuchs, David Raymond@> @:DVI_files}{\.{DVI} files@> driven by a program that takes \.{DVI} files as input, and dozens of such \.{DVI}-to-whatever programs have been written. Thus, it is possible to print the output of \TeX\ on many different kinds of equipment, using \TeX\ as a device-independent ``front end.'' A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a series of commands in a machine-like language. The first byte of each command is the operation code, and this code is followed by zero or more bytes that provide parameters to the command. The parameters themselves may consist of several consecutive bytes; for example, the `|set_rule|' command has two parameters, each of which is four bytes long. Parameters are usually regarded as nonnegative integers; but four-byte-long parameters, and shorter parameters that denote distances, can be either positive or negative. Such parameters are given in two's complement notation. For example, a two-byte-long distance parameter has a value between $-2^{15}$ and $2^{15}-1$. As in \.{TFM} files, numbers that occupy more than one byte position appear in BigEndian order. A \.{DVI} file consists of a ``preamble,'' followed by a sequence of one or more ``pages,'' followed by a ``postamble.'' The preamble is simply a |pre| command, with its parameters that define the dimensions used in the file; this must come first. Each ``page'' consists of a |bop| command, followed by any number of other commands that tell where characters are to be placed on a physical page, followed by an |eop| command. The pages appear in the order that \TeX\ generated them. If we ignore |nop| commands and \\{fnt\_def} commands (which are allowed between any two commands in the file), each |eop| command is immediately followed by a |bop| command, or by a |post| command; in the latter case, there are no more pages in the file, and the remaining bytes form the postamble. Further details about the postamble will be explained later. Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte quantities that give the location number of some other byte in the file; the first byte is number~0, then comes number~1, and so on. For example, one of the parameters of a |bop| command points to the previous |bop|; this makes it feasible to read the pages in backwards order, in case the results are being directed to a device that stacks its output face up. Suppose the preamble of a \.{DVI} file occupies bytes 0 to 99. Now if the first page occupies bytes 100 to 999, say, and if the second page occupies bytes 1000 to 1999, then the |bop| that starts in byte 1000 points to 100 and the |bop| that starts in byte 2000 points to 1000. (The very first |bop|, i.e., the one starting in byte 100, has a pointer of~$-1$.) @ The \.{DVI} format is intended to be both compact and easily interpreted by a machine. Compactness is achieved by making most of the information implicit instead of explicit. When a \.{DVI}-reading program reads the commands for a page, it keeps track of several quantities: (a)~The current font |f| is an integer; this value is changed only by \\{fnt} and \\{fnt\_num} commands. (b)~The current position on the page is given by two numbers called the horizontal and vertical coordinates, |h| and |v|. Both coordinates are zero at the upper left corner of the page; moving to the right corresponds to increasing the horizontal coordinate, and moving down corresponds to increasing the vertical coordinate. Thus, the coordinates are essentially Cartesian, except that vertical directions are flipped; the Cartesian version of |(h, v)| would be |(h,-v)|. (c)~The current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|, where |w| and~|x| are used for horizontal spacing and where |y| and~|z| are used for vertical spacing. (d)~There is a stack containing |(h, v, w, x, y, z)| values; the \.{DVI} commands |push| and |pop| are used to change the current level of operation. Note that the current font~|f| is not pushed and popped; the stack contains only information about positioning. The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up to 32 bits, including the sign. Since they represent physical distances, there is a small unit of measurement such that increasing |h| by~1 means moving a certain tiny distance to the right. The actual unit of measurement is variable, as explained below; \TeX\ sets things up so that its \.{DVI} output is in sp units, i.e., scaled points, in agreement with all the |scaled| dimensions in \TeX's data structures. @ Here is a list of all the commands that may appear in a \.{DVI} file. Each command is specified by its symbolic name (e.g., |bop|), its opcode byte (e.g., 139), and its parameters (if any). The parameters are followed by a bracketed number telling how many bytes they occupy; for example, `|p[4]|' means that parameter |p| is four bytes long. \yskip\hang|set_char_0| 0. Typeset character number~0 from font~|f| such that the reference point of the character is at |(h, v)|. Then increase |h| by the width of that character. Note that a character may have zero or negative width, so one cannot be sure that |h| will advance after this command; but |h| usually does increase. \yskip\hang\\{set\_char\_1} through \\{set\_char\_127} (opcodes 1 to 127). Do the operations of |set_char_0|; but use the character whose number matches the opcode, instead of character~0. \yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character number~|c| is typeset. \TeX82 uses this command for characters in the range |128 <= c < 256|. \yskip\hang|@!set2| 129 |c[2]|. Same as |set1|, except that |c|~is two bytes long, so it is in the range |0 <= c < 65536|. \TeX82 never uses this command, but it should come in handy for extensions of \TeX\ that deal with oriental languages. @^oriental characters@>@^Chinese characters@>@^Japanese characters@> \yskip\hang|@!set3| 130 |c[3]|. Same as |set1|, except that |c|~is three bytes long, so it can be as large as $2^{24}-1$. Not even the Chinese language has this many characters, but this command might prove useful in some yet unforeseen extension. \yskip\hang|@!set4| 131 |c[4]|. Same as |set1|, except that |c|~is four bytes long. Imagine that. \yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle of height~|a| and width~|b|, with its bottom left corner at |(h, v)|. Then set |h=h+b|. If either |a <= 0| or |b <= 0|, nothing should be typeset. Note that if |b < 0|, the value of |h| will decrease even though nothing else happens. See below for details about how to typeset rules so that consistency with \MF\ is guaranteed. \yskip\hang|@!put1| 133 |c[1]|. Typeset character number~|c| from font~|f| such that the reference point of the character is at |(h, v)|. (The `put' commands are exactly like the `set' commands, except that they simply put out a character or a rule without moving the reference point afterwards.) \yskip\hang|@!put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed. \yskip\hang|@!put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed. \yskip\hang|@!put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed. \yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that |h| is not changed. \yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s may occur between \.{DVI} commands, but a |nop| cannot be inserted between a command and its parameters or between two parameters. \yskip\hang|bop| 139 $c_0[4]$ $c_1[4]$ $\ldots$ $c_9[4]$ $p[4]$. Beginning of a page: Set |(h, v, w, x, y, z)=(0, 0, 0, 0, 0, 0)| and set the stack empty. Set the current font |f| to an undefined value. The ten $c_i$ parameters hold the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ at the time \.{\\shipout} was invoked for this page; they can be used to identify pages, if a user wants to print only part of a \.{DVI} file. The parameter |p| points to the previous |bop| in the file; the first |bop| has $p=-1$. \yskip\hang|eop| 140. End of page: Print what you have read since the previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading programs that drive most output devices will have kept a buffer of the material that appears on the page that has just ended. This material is largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by |h|~coordinate; so it usually needs to be sorted into some order that is appropriate for the device in question.) \yskip\hang|push| 141. Push the current values of |(h, v, w, x, y, z)| onto the top of the stack; do not change any of these values. Note that |f| is not pushed. \yskip\hang|pop| 142. Pop the top six values off of the stack and assign them respectively to |(h, v, w, x, y, z)|. The number of pops should never exceed the number of pushes, since it would be highly embarrassing if the stack were empty at the time of a |pop| command. \yskip\hang|right1| 143 |b[1]|. Set |h=h+b|, i.e., move right |b| units. The parameter is a signed number in two's complement notation, |-128 <= b < 128|; if |b < 0|, the reference point moves left. \yskip\hang|right2| 144 |b[2]|. Same as |right1|, except that |b| is a two-byte quantity in the range |-32768 <= b < 32768|. \yskip\hang|right3| 145 |b[3]|. Same as |right1|, except that |b| is a three-byte quantity in the range |@t$-2^{23}$@> <= b < @t$2^{23}$@>|. \yskip\hang|right4| 146 |b[4]|. Same as |right1|, except that |b| is a four-byte quantity in the range |@t$-2^{31}$@> <= b < @t$2^{31}$@>|. \yskip\hang|w0| 147. Set |h=h+w|; i.e., move right |w| units. With luck, this parameterless command will usually suffice, because the same kind of motion will occur several times in succession; the following commands explain how |w| gets particular values. \yskip\hang|w1| 148 |b[1]|. Set |w=b| and |h=h+b|. The value of |b| is a signed quantity in two's complement notation, |-128 <= b < 128|. This command changes the current |w|~spacing and moves right by |b|. \yskip\hang|@!w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long, |-32768 <= b < 32768|. \yskip\hang|@!w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long, |@t$-2^{23}$@> <= b < @t$2^{23}$@>|. \yskip\hang|@!w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long, |@t$-2^{31}$@> <= b < @t$2^{31}$@>|. \yskip\hang|x0| 152. Set |h=h+x|; i.e., move right |x| units. The `|x|' commands are like the `|w|' commands except that they involve |x| instead of |w|. \yskip\hang|x1| 153 |b[1]|. Set |x=b| and |h=h+b|. The value of |b| is a signed quantity in two's complement notation, |-128 <= b < 128|. This command changes the current |x|~spacing and moves right by |b|. \yskip\hang|@!x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long, |-32768 <= b < 32768|. \yskip\hang|@!x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long, |@t$-2^{23}$@> <= b < @t$2^{23}$@>|. \yskip\hang|@!x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long, |@t$-2^{31}$@> <= b < @t$2^{31}$@>|. \yskip\hang|down1| 157 |a[1]|. Set |v=v+a|, i.e., move down |a| units. The parameter is a signed number in two's complement notation, |-128 <= a < 128|; if |a < 0|, the reference point moves up. \yskip\hang|@!down2| 158 |a[2]|. Same as |down1|, except that |a| is a two-byte quantity in the range |-32768 <= a < 32768|. \yskip\hang|@!down3| 159 |a[3]|. Same as |down1|, except that |a| is a three-byte quantity in the range |@t$-2^{23}$@> <= a < @t$2^{23}$@>|. \yskip\hang|@!down4| 160 |a[4]|. Same as |down1|, except that |a| is a four-byte quantity in the range |@t$-2^{31}$@> <= a < @t$2^{31}$@>|. \yskip\hang|y0| 161. Set |v=v+y|; i.e., move down |y| units. With luck, this parameterless command will usually suffice, because the same kind of motion will occur several times in succession; the following commands explain how |y| gets particular values. \yskip\hang|y1| 162 |a[1]|. Set |y=a| and |v=v+a|. The value of |a| is a signed quantity in two's complement notation, |-128 <= a < 128|. This command changes the current |y|~spacing and moves down by |a|. \yskip\hang|@!y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long, |-32768 <= a < 32768|. \yskip\hang|@!y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long, |@t$-2^{23}$@> <= a < @t$2^{23}$@>|. \yskip\hang|@!y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long, |@t$-2^{31}$@> <= a < @t$2^{31}$@>|. \yskip\hang|z0| 166. Set |v=v+z|; i.e., move down |z| units. The `|z|' commands are like the `|y|' commands except that they involve |z| instead of |y|. \yskip\hang|z1| 167 |a[1]|. Set |z=a| and |v=v+a|. The value of |a| is a signed quantity in two's complement notation, |-128 <= a < 128|. This command changes the current |z|~spacing and moves down by |a|. \yskip\hang|@!z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long, |-32768 <= a < 32768|. \yskip\hang|@!z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long, |@t$-2^{23}$@> <= a < @t$2^{23}$@>|. \yskip\hang|@!z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long, |@t$-2^{31}$@> <= a < @t$2^{31}$@>|. \yskip\hang|fnt_num_0| 171. Set |f=0|. Font 0 must previously have been defined by a \\{fnt\_def} instruction, as explained below. \yskip\hang\\{fnt\_num\_1} through \\{fnt\_num\_63} (opcodes 172 to 234). Set |f=1|, \dots, \hbox{|f=63|}, respectively. \yskip\hang|fnt1| 235 |k[1]|. Set |f=k|. \TeX82 uses this command for font numbers in the range |64 <= k < 256|. \yskip\hang|@!fnt2| 236 |k[2]|. Same as |fnt1|, except that |k|~is two bytes long, so it is in the range |0 <= k < 65536|. \TeX82 never generates this command, but large font numbers may prove useful for specifications of color or texture, or they may be used for special fonts that have fixed numbers in some external coding scheme. \yskip\hang|@!fnt3| 237 |k[3]|. Same as |fnt1|, except that |k|~is three bytes long, so it can be as large as $2^{24}-1$. \yskip\hang|@!fnt4| 238 |k[4]|. Same as |fnt1|, except that |k|~is four bytes long; this is for the really big font numbers (and for the negative ones). \yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in general; it functions as a $(k+2)$-byte |nop| unless special \.{DVI}-reading programs are being used. \TeX82 generates |xxx1| when a short enough \.{\\special} appears, setting |k| to the number of bytes being sent. It is recommended that |x| be a string having the form of a keyword followed by possible parameters relevant to that keyword. \yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0 <= k < 65536|. \yskip\hang|@!xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0 <= k < @t$2^{24}$@>|. \yskip\hang|xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be ridiculously large. \TeX82 uses |xxx4| when sending a string of length 256 or more. \yskip\hang|fnt_def1| 243 |k[1]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. Define font |k|, where |0 <= k < 256|; font definitions will be explained shortly. \yskip\hang|@!fnt_def2| 244 |k[2]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. Define font |k|, where |0 <= k < 65536|. \yskip\hang|@!fnt_def3| 245 |k[3]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. Define font |k|, where |0 <= k < @t$2^{24}$@>|. \yskip\hang|@!fnt_def4| 246 |k[4]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. Define font |k|, where |@t$-2^{31}$@> <= k < @t$2^{31}$@>|. \yskip\hang|pre| 247 |i[1]| |num[4]| |den[4]| |mag[4]| |k[1]| |x[k]|. Beginning of the preamble; this must come at the very beginning of the file. Parameters |i|, |num|, |den|, |mag|, |k|, and |x| are explained below. \yskip\hang|post| 248. Beginning of the postamble, see below. \yskip\hang|post_post| 249. Ending of the postamble, see below. \yskip\noindent Commands 250--255 are undefined at the present time. @ @d set_char_0 0 /*typeset character 0 and move right*/ @d set1 128 /*typeset a character and move right*/ @d set_rule 132 /*typeset a rule and move right*/ @d put_rule 137 /*typeset a rule*/ @d nop 138 /*no operation*/ @d bop 139 /*beginning of page*/ @d eop 140 /*ending of page*/ @d push 141 /*save the current positions*/ @d pop 142 /*restore previous positions*/ @d right1 143 /*move right*/ @d w0 147 /*move right by |w|*/ @d w1 148 /*move right and set |w|*/ @d x0 152 /*move right by |x|*/ @d x1 153 /*move right and set |x|*/ @d down1 157 /*move down*/ @d y0 161 /*move down by |y|*/ @d y1 162 /*move down and set |y|*/ @d z0 166 /*move down by |z|*/ @d z1 167 /*move down and set |z|*/ @d fnt_num_0 171 /*set current font to 0*/ @d fnt1 235 /*set current font*/ @d xxx1 239 /*extension to \.{DVI} primitives*/ @d xxx4 242 /*potentially long extension to \.{DVI} primitives*/ @d fnt_def1 243 /*define the meaning of a font number*/ @d pre 247 /*preamble*/ @d post 248 /*postamble beginning*/ @d post_post 249 /*postamble ending*/ @ The preamble contains basic information about the file as a whole. As stated above, there are six parameters: $$\hbox{|@!i[1]| |@!num[4]| |@!den[4]| |@!mag[4]| |@!k[1]| |@!x[k]|.}$$ The |i| byte identifies \.{DVI} format; currently this byte is always set to~2. (The value |i==3| is currently used for an extended format that allows a mixture of right-to-left and left-to-right typesetting. Some day we will set |i==4|, when \.{DVI} format makes another incompatible change---perhaps in the year 2048.) The next two parameters, |num| and |den|, are positive integers that define the units of measurement; they are the numerator and denominator of a fraction by which all dimensions in the \.{DVI} file could be multiplied in order to get lengths in units of $10^{-7}$ meters. Since $\rm 7227{pt} = 254{cm}$, and since \TeX\ works with scaled points where there are $2^{16}$ sp in a point, \TeX\ sets $|num|/|den|=(254\cdot10^5)/(7227\cdot2^{16})=25400000/473628672$. @^sp@> The |mag| parameter is what \TeX\ calls \.{\\mag}, i.e., 1000 times the desired magnification. The actual fraction by which dimensions are multiplied is therefore $|mag|\cdot|num|/1000|den|$. Note that if a \TeX\ source document does not call for any `\.{true}' dimensions, and if you change it only by specifying a different \.{\\mag} setting, the \.{DVI} file that \TeX\ creates will be completely unchanged except for the value of |mag| in the preamble and postamble. (Fancy \.{DVI}-reading programs allow users to override the |mag|~setting when a \.{DVI} file is being printed.) Finally, |k| and |x| allow the \.{DVI} writer to include a comment, which is not interpreted further. The length of comment |x| is |k|, where |0 <= k < 256|. @d id_byte 2 /*identifies the kind of \.{DVI} files described here*/ @ Font definitions for a given font number |k| contain further parameters $$\hbox{|c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.}$$ The four-byte value |c| is the check sum that \TeX\ found in the \.{TFM} file for this font; |c| should match the check sum of the font found by programs that read this \.{DVI} file. @^check sum@> Parameter |s| contains a fixed-point scale factor that is applied to the character widths in font |k|; font dimensions in \.{TFM} files and other font files are relative to this quantity, which is called the ``at size'' elsewhere in this documentation. The value of |s| is always positive and less than $2^{27}$. It is given in the same units as the other \.{DVI} dimensions, i.e., in sp when \TeX82 has made the file. Parameter |d| is similar to |s|; it is the ``design size,'' and (like~|s|) it is given in \.{DVI} units. Thus, font |k| is to be used at $|mag|\cdot s/1000d$ times its normal size. The remaining part of a font definition gives the external name of the font, which is an ASCII string of length |a+l|. The number |a| is the length of the ``area'' or directory, and |l| is the length of the font name itself; the standard local system font area is supposed to be used when |a==0|. The |n| field contains the area in its first |a| bytes. Font definitions must appear before the first use of a particular font number. Once font |k| is defined, it must not be defined again; however, we shall see below that font definitions appear in the postamble as well as in the pages, so in this sense each font number is defined exactly twice, if at all. Like |nop| commands, font definitions can appear before the first |bop|, or between an |eop| and a |bop|. @ Sometimes it is desirable to make horizontal or vertical rules line up precisely with certain features in characters of a font. It is possible to guarantee the correct matching between \.{DVI} output and the characters generated by \MF\ by adhering to the following principles: (1)~The \MF\ characters should be positioned so that a bottom edge or left edge that is supposed to line up with the bottom or left edge of a rule appears at the reference point, i.e., in row~0 and column~0 of the \MF\ raster. This ensures that the position of the rule will not be rounded differently when the pixel size is not a perfect multiple of the units of measurement in the \.{DVI} file. (2)~A typeset rule of height $a>0$ and width $b>0$ should be equivalent to a \MF-generated character having black pixels in precisely those raster positions whose \MF\ coordinates satisfy |0 <= x < @t$\alpha$@>b| and |0 <= y < @t$\alpha$@>a|, where $\alpha$ is the number of pixels per \.{DVI} unit. @:METAFONT}{\MF@> @^alignment of rules with characters@> @^rules aligning with characters@> @ The last page in a \.{DVI} file is followed by `|post|'; this command introduces the postamble, which summarizes important facts that \TeX\ has accumulated about the file, making it possible to print subsets of the data with reasonable efficiency. The postamble has the form $$\vbox{\halign{\hbox{#\hfil}\cr |post| |p[4]| |num[4]| |den[4]| |mag[4]| |l[4]| |u[4]| |s[2]| |t[2]|\cr $\langle\,$font definitions$\,\rangle$\cr |post_post| |q[4]| |i[1]| 223's$[{\G}4]$\cr}}$$ Here |p| is a pointer to the final |bop| in the file. The next three parameters, |num|, |den|, and |mag|, are duplicates of the quantities that appeared in the preamble. Parameters |l| and |u| give respectively the height-plus-depth of the tallest page and the width of the widest page, in the same units as other dimensions of the file. These numbers might be used by a \.{DVI}-reading program to position individual ``pages'' on large sheets of film or paper; however, the standard convention for output on normal size paper is to position each page so that the upper left-hand corner is exactly one inch from the left and the top. Experience has shown that it is unwise to design \.{DVI}-to-printer software that attempts cleverly to center the output; a fixed position of the upper left corner is easiest for users to understand and to work with. Therefore |l| and~|u| are often ignored. Parameter |s| is the maximum stack depth (i.e., the largest excess of |push| commands over |pop| commands) needed to process this file. Then comes |t|, the total number of pages (|bop| commands) present. The postamble continues with font definitions, which are any number of \\{fnt\_def} commands as described above, possibly interspersed with |nop| commands. Each font number that is used in the \.{DVI} file must be defined exactly twice: Once before it is first selected by a \\{fnt} command, and once in the postamble. @ The last part of the postamble, following the |post_post| byte that signifies the end of the font definitions, contains |q|, a pointer to the |post| command that started the postamble. An identification byte, |i|, comes next; this currently equals~2, as in the preamble. The |i| byte is followed by four or more bytes that are all equal to the decimal number 223 (i.e., 0337 in octal). \TeX\ puts out four to seven of these trailing bytes, until the total length of the file is a multiple of four bytes, since this works out best on machines that pack four bytes per word; but any number of 223's is allowed, as long as there are at least four of them. In effect, 223 is a sort of signature that is added at the very end. @^Fuchs, David Raymond@> This curious way to finish off a \.{DVI} file makes it feasible for \.{DVI}-reading programs to find the postamble first, on most computers, even though \TeX\ wants to write the postamble last. Most operating systems permit random access to individual words or bytes of a file, so the \.{DVI} reader can start at the end and skip backwards over the 223's until finding the identification byte. Then it can back up four bytes, read |q|, and move to byte |q| of the file. This byte should, of course, contain the value 248 (|post|); now the postamble can be read, so the \.{DVI} reader can discover all the information needed for typesetting the pages. Note that it is also possible to skip through the \.{DVI} file at reasonably high speed to locate a particular page, if that proves desirable. This saves a lot of time, since \.{DVI} files used in production jobs tend to be large. Unfortunately, however, standard \PASCAL\ does not include the ability to @^system dependencies@> access a random position in a file, or even to determine the length of a file. Almost all systems nowadays provide the necessary capabilities, so \.{DVI} format has been designed to work most efficiently with modern operating systems. But if \.{DVI} files have to be processed under the restrictions of standard \PASCAL, one can simply read them from front to back, since the necessary header information is present in the preamble and in the font definitions. (The |l| and |u| and |s| and |t| parameters, which appear only in the postamble, are ``frills'' that are handy but not absolutely necessary.) @* Shipping pages out. After considering \TeX's eyes and stomach, we come now to the bowels. @^bowels@> The |ship_out| procedure is given a pointer to a box; its mission is to describe that box in \.{DVI} form, outputting a ``page'' to |dvi_file|. The \.{DVI} coordinates $(h,v)=(0,0)$ should correspond to the upper left corner of the box being shipped. Since boxes can be inside of boxes inside of boxes, the main work of |ship_out| is done by two mutually recursive routines, |hlist_out| and |vlist_out|, which traverse the hlists and vlists inside of horizontal and vertical boxes. As individual pages are being processed, we need to accumulate information about the entire set of pages, since such statistics must be reported in the postamble. The global variables |total_pages|, |max_v|, |max_h|, |max_push|, and |last_bop| are used to record this information. The variable |doing_leaders| is |true| while leaders are being output. The variable |dead_cycles| contains the number of times an output routine has been initiated since the last |ship_out|. A few additional global variables are also defined here for use in |vlist_out| and |hlist_out|. They could have been local variables, but that would waste stack space when boxes are deeply nested, since the values of these variables are not needed during recursive calls. @^recursion@> @<Glob...@>= int @!total_pages; /*the number of pages that have been shipped out*/ scaled @!max_v; /*maximum height-plus-depth of pages shipped so far*/ scaled @!max_h; /*maximum width of pages shipped so far*/ int @!max_push; /*deepest nesting of |push| commands encountered so far*/ int @!last_bop; /*location of previous |bop| in the \.{DVI} output*/ int @!dead_cycles; /*recent outputs that didn't ship anything out*/ bool @!doing_leaders; /*are we inside a leader box?*/ @# quarterword @!c, @!f; /*character and font in current |char_node|*/ scaled @!rule_ht, @!rule_dp, @!rule_wd; /*size of current rule being output*/ pointer @!g; /*current glue specification*/ int @!lq, @!lr; /*quantities used in calculations for leaders*/ @ @<Set init...@>= total_pages=0;max_v=0;max_h=0;max_push=0;last_bop=-1; doing_leaders=false;dead_cycles=0;cur_s=-1; @ The \.{DVI} bytes are output to a buffer instead of being written directly to the output file. This makes it possible to reduce the overhead of subroutine calls, thereby measurably speeding up the computation, since output of \.{DVI} bytes is part of \TeX's inner loop. And it has another advantage as well, since we can change instructions in the buffer in order to make the output more compact. For example, a `|down2|' command can be changed to a `|y2|', thereby making a subsequent `|y0|' command possible, saving two bytes. The output buffer is divided into two parts of equal size; the bytes found in |dvi_buf[0 dotdot half_buf-1]| constitute the first half, and those in |dvi_buf[half_buf dotdot dvi_buf_size-1]| constitute the second. The global variable |dvi_ptr| points to the position that will receive the next output byte. When |dvi_ptr| reaches |dvi_limit|, which is always equal to one of the two values |half_buf| or |dvi_buf_size|, the half buffer that is about to be invaded next is sent to the output and |dvi_limit| is changed to its other value. Thus, there is always at least a half buffer's worth of information present, except at the very beginning of the job. Bytes of the \.{DVI} file are numbered sequentially starting with 0; the next byte to be generated will be number |dvi_offset+dvi_ptr|. A byte is present in the buffer only if its number is | >= dvi_gone|. @<Types...@>= typedef uint16_t dvi_index; /*an index into the output buffer*/ @ Some systems may find it more efficient to make |dvi_buf| a || array, since output of four bytes at once may be facilitated. @^system dependencies@> @<Glob...@>= eight_bits @!dvi_buf[dvi_buf_size+1]; /*buffer for \.{DVI} output*/ dvi_index @!half_buf; /*half of |dvi_buf_size|*/ dvi_index @!dvi_limit; /*end of the current half buffer*/ dvi_index @!dvi_ptr; /*the next available buffer address*/ int @!dvi_offset; /*|dvi_buf_size| times the number of times the output buffer has been fully emptied*/ int @!dvi_gone; /*the number of bytes already output to |dvi_file|*/ @ Initially the buffer is all in one piece; we will output half of it only after it first fills up. @<Set init...@>= half_buf=dvi_buf_size/2;dvi_limit=dvi_buf_size;dvi_ptr=0; dvi_offset=0;dvi_gone=0; @ The actual output of |dvi_buf[a dotdot b]| to |dvi_file| is performed by calling |write_dvi(a, b)|. For best results, this procedure should be optimized to run as fast as possible on each particular system, since it is part of \TeX's inner loop. It is safe to assume that |a| and |b+1| will both be multiples of 4 when |write_dvi(a, b)| is called; therefore it is possible on many machines to use efficient methods to pack four bytes per word and to output an array of words with one system call. @^system dependencies@> @^inner loop@> @^defecation@> @p void write_dvi(dvi_index @!a, dvi_index @!b) {@+int k; for (k=a; k<=b; k++) write(dvi_file, "%c", dvi_buf[k]); } @ To put a byte in the buffer without paying the cost of invoking a procedure each time, we use the macro |dvi_out|. @d dvi_out(X) @+{@+dvi_buf[dvi_ptr]=X;incr(dvi_ptr); if (dvi_ptr==dvi_limit) dvi_swap(); } @p void dvi_swap(void) /*outputs half of the buffer*/ {@+if (dvi_limit==dvi_buf_size) {@+write_dvi(0, half_buf-1);dvi_limit=half_buf; dvi_offset=dvi_offset+dvi_buf_size;dvi_ptr=0; } else{@+write_dvi(half_buf, dvi_buf_size-1);dvi_limit=dvi_buf_size; } dvi_gone=dvi_gone+half_buf; } @ Here is how we clean out the buffer when \TeX\ is all through; |dvi_ptr| will be a multiple of~4. @<Empty the last bytes out of |dvi_buf|@>= if (dvi_limit==half_buf) write_dvi(half_buf, dvi_buf_size-1); if (dvi_ptr > 0) write_dvi(0, dvi_ptr-1) @ The |dvi_four| procedure outputs four bytes in two's complement notation, without risking arithmetic overflow. @p void dvi_four(int @!x) {@+if (x >= 0) dvi_out(x/0100000000)@; else{@+x=x+010000000000; x=x+010000000000; dvi_out((x/0100000000)+128); } x=x%0100000000;dvi_out(x/0200000); x=x%0200000;dvi_out(x/0400); dvi_out(x%0400); } @ A mild optimization of the output is performed by the |dvi_pop| routine, which issues a |pop| unless it is possible to cancel a `|push| |pop|' pair. The parameter to |dvi_pop| is the byte address following the old |push| that matches the new |pop|. @p void dvi_pop(int @!l) {@+if ((l==dvi_offset+dvi_ptr)&&(dvi_ptr > 0)) decr(dvi_ptr); else dvi_out(pop); } @ Here's a procedure that outputs a font definition. Since \TeX82 uses at most 256 different fonts per job, |fnt_def1| is always used as the command code. @p void dvi_font_def(internal_font_number @!f) {@+int k; /*index into |str_pool|*/ dvi_out(fnt_def1); dvi_out(f-font_base-1);@/ dvi_out(qo(font_check[f].b0)); dvi_out(qo(font_check[f].b1)); dvi_out(qo(font_check[f].b2)); dvi_out(qo(font_check[f].b3));@/ dvi_four(font_size[f]); dvi_four(font_dsize[f]);@/ dvi_out(length(font_area[f])); dvi_out(length(font_name[f])); @<Output the font name whose internal number is |f|@>; } @ @<Output the font name whose internal number is |f|@>= for (k=str_start[font_area[f]]; k<=str_start[font_area[f]+1]-1; k++) dvi_out(so(str_pool[k])); for (k=str_start[font_name[f]]; k<=str_start[font_name[f]+1]-1; k++) dvi_out(so(str_pool[k])) @ Versions of \TeX\ intended for small computers might well choose to omit the ideas in the next few parts of this program, since it is not really necessary to optimize the \.{DVI} code by making use of the |w0|, |x0|, |y0|, and |z0| commands. Furthermore, the algorithm that we are about to describe does not pretend to give an optimum reduction in the length of the \.{DVI} code; after all, speed is more important than compactness. But the method is surprisingly effective, and it takes comparatively little time. We can best understand the basic idea by first considering a simpler problem that has the same essential characteristics. Given a sequence of digits, say $3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9$, we want to assign subscripts $d$, $y$, or $z$ to each digit so as to maximize the number of ``$y$-hits'' and ``$z$-hits''; a $y$-hit is an instance of two appearances of the same digit with the subscript $y$, where no $y$'s intervene between the two appearances, and a $z$-hit is defined similarly. For example, the sequence above could be decorated with subscripts as follows: $$3_z\,1_y\,4_d\,1_y\,5_y\,9_d\,2_d\,6_d\,5_y\,3_z\,5_y\,8_d\,9_d.$$ There are three $y$-hits ($1_y\ldots1_y$ and $5_y\ldots5_y\ldots5_y$) and one $z$-hit ($3_z\ldots3_z$); there are no $d$-hits, since the two appearances of $9_d$ have $d$'s between them, but we don't count $d$-hits so it doesn't matter how many there are. These subscripts are analogous to the \.{DVI} commands called \\{down}, $y$, and $z$, and the digits are analogous to different amounts of vertical motion; a $y$-hit or $z$-hit corresponds to the opportunity to use the one-byte commands |y0| or |z0| in a \.{DVI} file. \TeX's method of assigning subscripts works like this: Append a new digit, say $\delta$, to the right of the sequence. Now look back through the sequence until one of the following things happens: (a)~You see $\delta_y$ or $\delta_z$, and this was the first time you encountered a $y$ or $z$ subscript, respectively. Then assign $y$ or $z$ to the new $\delta$; you have scored a hit. (b)~You see $\delta_d$, and no $y$ subscripts have been encountered so far during this search. Then change the previous $\delta_d$ to $\delta_y$ (this corresponds to changing a command in the output buffer), and assign $y$ to the new $\delta$; it's another hit. (c)~You see $\delta_d$, and a $y$ subscript has been seen but not a $z$. Change the previous $\delta_d$ to $\delta_z$ and assign $z$ to the new $\delta$. (d)~You encounter both $y$ and $z$ subscripts before encountering a suitable $\delta$, or you scan all the way to the front of the sequence. Assign $d$ to the new $\delta$; this assignment may be changed later. The subscripts $3_z\,1_y\,4_d\ldots\,$ in the example above were, in fact, produced by this procedure, as the reader can verify. (Go ahead and try it.) @ In order to implement such an idea, \TeX\ maintains a stack of pointers to the \\{down}, $y$, and $z$ commands that have been generated for the current page. And there is a similar stack for \\{right}, |w|, and |x| commands. These stacks are called the down stack and right stack, and their top elements are maintained in the variables |down_ptr| and |right_ptr|. Each entry in these stacks contains four fields: The |width| field is the amount of motion down or to the right; the |location| field is the byte number of the \.{DVI} command in question (including the appropriate |dvi_offset|); the |link| field points to the next item below this one on the stack; and the |info| field encodes the options for possible change in the \.{DVI} command. @d movement_node_size 3 /*number of words per entry in the down and right stacks*/ @d location(X) mem[X+2].i /*\.{DVI} byte number for a movement command*/ @<Glob...@>= pointer @!down_ptr, @!right_ptr; /*heads of the down and right stacks*/ @ @<Set init...@>= down_ptr=null;right_ptr=null; @ Here is a subroutine that produces a \.{DVI} command for some specified downward or rightward motion. It has two parameters: |w| is the amount of motion, and |o| is either |down1| or |right1|. We use the fact that the command codes have convenient arithmetic properties: |y1-down1==w1-right1| and |z1-down1==x1-right1|. @p void movement(scaled @!w, eight_bits @!o) {@+ small_number mstate; /*have we seen a |y| or |z|?*/ pointer @!p, @!q; /*current and top nodes on the stack*/ int @!k; /*index into |dvi_buf|, modulo |dvi_buf_size|*/ q=get_node(movement_node_size); /*new node for the top of the stack*/ width(q)=w;location(q)=dvi_offset+dvi_ptr; if (o==down1) {@+link(q)=down_ptr;down_ptr=q; } else{@+link(q)=right_ptr;right_ptr=q; } @<Look at the other stack entries until deciding what sort of \.{DVI} command to generate; |goto found| if node |p| is a ``hit''@>; @<Generate a |down| or |right| command for |w| and |return|@>; found: @<Generate a |y0| or |z0| command in order to reuse a previous appearance of~|w|@>; } @ The |info| fields in the entries of the down stack or the right stack have six possible settings: |y_here| or |z_here| mean that the \.{DVI} command refers to |y| or |z|, respectively (or to |w| or |x|, in the case of horizontal motion); |yz_OK| means that the \.{DVI} command is \\{down} (or \\{right}) but can be changed to either |y| or |z| (or to either |w| or |x|); |y_OK| means that it is \\{down} and can be changed to |y| but not |z|; |z_OK| is similar; and |d_fixed| means it must stay \\{down}. The four settings |yz_OK|, |y_OK|, |z_OK|, |d_fixed| would not need to be distinguished from each other if we were simply solving the digit-subscripting problem mentioned above. But in \TeX's case there is a complication because of the nested structure of |push| and |pop| commands. Suppose we add parentheses to the digit-subscripting problem, redefining hits so that $\delta_y\ldots \delta_y$ is a hit if all $y$'s between the $\delta$'s are enclosed in properly nested parentheses, and if the parenthesis level of the right-hand $\delta_y$ is deeper than or equal to that of the left-hand one. Thus, `(' and `)' correspond to `|push|' and `|pop|'. Now if we want to assign a subscript to the final 1 in the sequence $$2_y\,7_d\,1_d\,(\,8_z\,2_y\,8_z\,)\,1$$ we cannot change the previous $1_d$ to $1_y$, since that would invalidate the $2_y\ldots2_y$ hit. But we can change it to $1_z$, scoring a hit since the intervening $8_z$'s are enclosed in parentheses. The program below removes movement nodes that are introduced after a |push|, before it outputs the corresponding |pop|. @d y_here 1 /*|info| when the movement entry points to a |y| command*/ @d z_here 2 /*|info| when the movement entry points to a |z| command*/ @d yz_OK 3 /*|info| corresponding to an unconstrained \\{down} command*/ @d y_OK 4 /*|info| corresponding to a \\{down} that can't become a |z|*/ @d z_OK 5 /*|info| corresponding to a \\{down} that can't become a |y|*/ @d d_fixed 6 /*|info| corresponding to a \\{down} that can't change*/ @ When the |movement| procedure gets to the label |found|, the value of |info(p)| will be either |y_here| or |z_here|. If it is, say, |y_here|, the procedure generates a |y0| command (or a |w0| command), and marks all |info| fields between |q| and |p| so that |y| is not OK in that range. @<Generate a |y0| or |z0| command...@>= info(q)=info(p); if (info(q)==y_here) {@+dvi_out(o+y0-down1); /*|y0| or |w0|*/ while (link(q)!=p) {@+q=link(q); switch (info(q)) { case yz_OK: info(q)=z_OK;@+break; case y_OK: info(q)=d_fixed;@+break; default:do_nothing; } } } else{@+dvi_out(o+z0-down1); /*|z0| or |x0|*/ while (link(q)!=p) {@+q=link(q); switch (info(q)) { case yz_OK: info(q)=y_OK;@+break; case z_OK: info(q)=d_fixed;@+break; default:do_nothing; } } } @ @<Generate a |down| or |right|...@>= info(q)=yz_OK; if (abs(w) >= 040000000) {@+dvi_out(o+3); /*|down4| or |right4|*/ dvi_four(w);return; } if (abs(w) >= 0100000) {@+dvi_out(o+2); /*|down3| or |right3|*/ if (w < 0) w=w+0100000000; dvi_out(w/0200000);w=w%0200000;goto label2; } if (abs(w) >= 0200) {@+dvi_out(o+1); /*|down2| or |right2|*/ if (w < 0) w=w+0200000; goto label2; } dvi_out(o); /*|down1| or |right1|*/ if (w < 0) w=w+0400; goto label1; label2: dvi_out(w/0400); label1: dvi_out(w%0400);return @ As we search through the stack, we are in one of three states, |y_seen|, |z_seen|, or |none_seen|, depending on whether we have encountered |y_here| or |z_here| nodes. These states are encoded as multiples of 6, so that they can be added to the |info| fields for quick decision-making. @^inner loop@> @d none_seen 0 /*no |y_here| or |z_here| nodes have been encountered yet*/ @d y_seen 6 /*we have seen |y_here| but not |z_here|*/ @d z_seen 12 /*we have seen |z_here| but not |y_here|*/ @<Look at the other stack entries until deciding...@>= p=link(q);mstate=none_seen; while (p!=null) {@+if (width(p)==w) @<Consider a node with matching width; |goto found| if it's a hit@>@; else switch (mstate+info(p)) { case none_seen+y_here: mstate=y_seen;@+break; case none_seen+z_here: mstate=z_seen;@+break; case y_seen+z_here: case z_seen+y_here: goto not_found; default:do_nothing; } p=link(p); } not_found: @ We might find a valid hit in a |y| or |z| byte that is already gone from the buffer. But we can't change bytes that are gone forever; ``the moving finger writes, $\ldots\,\,$.'' @<Consider a node with matching width...@>= switch (mstate+info(p)) { case none_seen+yz_OK: case none_seen+y_OK: case z_seen+yz_OK: case z_seen+y_OK: @t@>@;@/ if (location(p) < dvi_gone) goto not_found; else@<Change buffered instruction to |y| or |w| and |goto found|@>@;@+break; case none_seen+z_OK: case y_seen+yz_OK: case y_seen+z_OK: @t@>@;@/ if (location(p) < dvi_gone) goto not_found; else@<Change buffered instruction to |z| or |x| and |goto found|@>@;@+break; case none_seen+y_here: case none_seen+z_here: case y_seen+z_here: case z_seen+y_here: goto found; default:do_nothing; } @ @<Change buffered instruction to |y| or |w| and |goto found|@>= {@+k=location(p)-dvi_offset; if (k < 0) k=k+dvi_buf_size; dvi_buf[k]=dvi_buf[k]+y1-down1; info(p)=y_here;goto found; } @ @<Change buffered instruction to |z| or |x| and |goto found|@>= {@+k=location(p)-dvi_offset; if (k < 0) k=k+dvi_buf_size; dvi_buf[k]=dvi_buf[k]+z1-down1; info(p)=z_here;goto found; } @ In case you are wondering when all the movement nodes are removed from \TeX's memory, the answer is that they are recycled just before |hlist_out| and |vlist_out| finish outputting a box. This restores the down and right stacks to the state they were in before the box was output, except that some |info|'s may have become more restrictive. @p void prune_movements(int @!l) /*delete movement nodes with |location >= l|*/ {@+ pointer p; /*node being deleted*/ while (down_ptr!=null) {@+if (location(down_ptr) < l) goto done; p=down_ptr;down_ptr=link(p);free_node(p, movement_node_size); } done: while (right_ptr!=null) {@+if (location(right_ptr) < l) return; p=right_ptr;right_ptr=link(p);free_node(p, movement_node_size); } } @ The actual distances by which we want to move might be computed as the sum of several separate movements. For example, there might be several glue nodes in succession, or we might want to move right by the width of some box plus some amount of glue. More importantly, the baselineskip distances are computed in terms of glue together with the depth and height of adjacent boxes, and we want the \.{DVI} file to lump these three quantities together into a single motion. Therefore, \TeX\ maintains two pairs of global variables: |dvi_h| and |dvi_v| are the |h| and |v| coordinates corresponding to the commands actually output to the \.{DVI} file, while |cur_h| and |cur_v| are the coordinates corresponding to the current state of the output routines. Coordinate changes will accumulate in |cur_h| and |cur_v| without being reflected in the output, until such a change becomes necessary or desirable; we can call the |movement| procedure whenever we want to make |dvi_h==cur_h| or |dvi_v==cur_v|. The current font reflected in the \.{DVI} output is called |dvi_f|; there is no need for a `\\{cur\_f}' variable. The depth of nesting of |hlist_out| and |vlist_out| is called |cur_s|; this is essentially the depth of |push| commands in the \.{DVI} output. @d synch_h if (cur_h!=dvi_h) {@+movement(cur_h-dvi_h, right1);dvi_h=cur_h; } @d synch_v if (cur_v!=dvi_v) {@+movement(cur_v-dvi_v, down1);dvi_v=cur_v; } @<Glob...@>= scaled @!dvi_h, @!dvi_v; /*a \.{DVI} reader program thinks we are here*/ scaled @!cur_h, @!cur_v; /*\TeX\ thinks we are here*/ internal_font_number @!dvi_f; /*the current font*/ int @!cur_s; /*current depth of output box nesting, initially $-1$*/ @ @<Initialize variables as |ship_out| begins@>= dvi_h=0;dvi_v=0;cur_h=h_offset;dvi_f=null_font; @<ensure dvi open@>; if (total_pages==0) {@+dvi_out(pre);dvi_out(id_byte); /*output the preamble*/ @^preamble of \.{DVI} file@> dvi_four(25400000);dvi_four(473628672); /*conversion ratio for sp*/ prepare_mag();dvi_four(mag); /*magnification factor is frozen*/ old_setting=selector;selector=new_string; print_str(" TeX output ");print_int(year);print_char('.'); print_two(month);print_char('.');print_two(day); print_char(':');print_two(time/60); print_two(time%60); selector=old_setting;dvi_out(cur_length); for (s=str_start[str_ptr]; s<=pool_ptr-1; s++) dvi_out(so(str_pool[s])); pool_ptr=str_start[str_ptr]; /*flush the current string*/ } @ When |hlist_out| is called, its duty is to output the box represented by the |hlist_node| pointed to by |temp_ptr|. The reference point of that box has coordinates |(cur_h, cur_v)|. Similarly, when |vlist_out| is called, its duty is to output the box represented by the |vlist_node| pointed to by |temp_ptr|. The reference point of that box has coordinates |(cur_h, cur_v)|. @^recursion@> @p void vlist_out(void); /*|hlist_out| and |vlist_out| are mutually recursive*/ @ The recursive procedures |hlist_out| and |vlist_out| each have local variables |save_h| and |save_v| to hold the values of |dvi_h| and |dvi_v| just before entering a new level of recursion. In effect, the values of |save_h| and |save_v| on \TeX's run-time stack correspond to the values of |h| and |v| that a \.{DVI}-reading program will push onto its coordinate stack. @p@t\4@>@<Declare procedures needed in |hlist_out|, |vlist_out|@>@t@>@/ void hlist_out(void) /*output an |hlist_node| box*/ {@+ scaled base_line; /*the baseline coordinate for this box*/ scaled @!left_edge; /*the left coordinate for this box*/ scaled @!save_h, @!save_v; /*what |dvi_h| and |dvi_v| should pop to*/ pointer @!this_box; /*pointer to containing box*/ glue_ord @!g_order; /*applicable order of infinity for glue*/ uint8_t @!g_sign; /*selects type of glue*/ pointer @!p; /*current position in the hlist*/ int @!save_loc; /*\.{DVI} byte location upon entry*/ pointer @!leader_box; /*the leader box being replicated*/ scaled @!leader_wd; /*width of leader box being replicated*/ scaled @!lx; /*extra space between leader boxes*/ bool @!outer_doing_leaders; /*were we doing leaders?*/ scaled @!edge; /*left edge of sub-box, or right edge of leader space*/ double @!glue_temp; /*glue value before rounding*/ double @!cur_glue; /*glue seen so far*/ scaled @!cur_g; /*rounded equivalent of |cur_glue| times the glue ratio*/ cur_g=0;cur_glue=float_constant(0); this_box=temp_ptr;g_order=glue_order(this_box); g_sign=glue_sign(this_box);p=list_ptr(this_box); incr(cur_s); if (cur_s > 0) dvi_out(push); if (cur_s > max_push) max_push=cur_s; save_loc=dvi_offset+dvi_ptr;base_line=cur_v;left_edge=cur_h; while (p!=null) @<Output node |p| for |hlist_out| and move to the next node, maintaining the condition |cur_v=base_line|@>; prune_movements(save_loc); if (cur_s > 0) dvi_pop(save_loc); decr(cur_s); } @ We ought to give special care to the efficiency of one part of |hlist_out|, since it belongs to \TeX's inner loop. When a |char_node| is encountered, we save a little time by processing several nodes in succession until reaching a non-|char_node|. The program uses the fact that |set_char_0==0|. @^inner loop@> @<Output node |p| for |hlist_out|...@>= reswitch: if (is_char_node(p)) {@+synch_h;synch_v; @/do@+{f=font(p);c=character(p); if (f!=dvi_f) @<Change font |dvi_f| to |f|@>; if (c >= qi(128)) dvi_out(set1); dvi_out(qo(c));@/ cur_h=cur_h+char_width(f)(char_info(f)(c)); p=link(p); }@+ while (!(!is_char_node(p))); dvi_h=cur_h; } else@<Output the non-|char_node| |p| for |hlist_out| and move to the next node@>@; @ @<Change font |dvi_f| to |f|@>= {@+if (!font_used[f]) {@+dvi_font_def(f);font_used[f]=true; } if (f <= 64+font_base) dvi_out(f-font_base-1+fnt_num_0)@; else{@+dvi_out(fnt1);dvi_out(f-font_base-1); } dvi_f=f; } @ @<Output the non-|char_node| |p| for |hlist_out|...@>= {@+switch (type(p)) { case hlist_node: case vlist_node: @<Output a box in an hlist@>@;@+break; case rule_node: {@+rule_ht=height(p);rule_dp=depth(p);rule_wd=width(p); goto fin_rule; } case whatsit_node: @<Output the whatsit node |p| in an hlist@>;@+break; case glue_node: @<Move right or output leaders@>@; case kern_node: case math_node: cur_h=cur_h+width(p);@+break; case ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>@; default:do_nothing; } @/ goto next_p; fin_rule: @<Output a rule in an hlist@>; move_past: cur_h=cur_h+rule_wd; next_p: p=link(p); } @ @<Output a box in an hlist@>= if (list_ptr(p)==null) cur_h=cur_h+width(p); else{@+save_h=dvi_h;save_v=dvi_v; cur_v=base_line+shift_amount(p); /*shift the box down*/ temp_ptr=p;edge=cur_h; if (type(p)==vlist_node) vlist_out();@+else hlist_out(); dvi_h=save_h;dvi_v=save_v; cur_h=edge+width(p);cur_v=base_line; } @ @<Output a rule in an hlist@>= if (is_running(rule_ht)) rule_ht=height(this_box); if (is_running(rule_dp)) rule_dp=depth(this_box); rule_ht=rule_ht+rule_dp; /*this is the rule thickness*/ if ((rule_ht > 0)&&(rule_wd > 0)) /*we don't output empty rules*/ {@+synch_h;cur_v=base_line+rule_dp;synch_v; dvi_out(set_rule);dvi_four(rule_ht);dvi_four(rule_wd); cur_v=base_line;dvi_h=dvi_h+rule_wd; } @ @d billion float_constant(1000000000) @d vet_glue(X) glue_temp=X; if (glue_temp > billion) glue_temp=billion; else if (glue_temp < -billion) glue_temp=-billion @<Move right or output leaders@>= {@+g=glue_ptr(p);rule_wd=width(g)-cur_g; if (g_sign!=normal) {@+if (g_sign==stretching) {@+if (stretch_order(g)==g_order) {@+cur_glue=cur_glue+stretch(g); vet_glue(float(glue_set(this_box))*cur_glue); @^real multiplication@> cur_g=round(glue_temp); } } else if (shrink_order(g)==g_order) {@+cur_glue=cur_glue-shrink(g); vet_glue(float(glue_set(this_box))*cur_glue); cur_g=round(glue_temp); } } rule_wd=rule_wd+cur_g; if (subtype(p) >= a_leaders) @<Output leaders in an hlist, |goto fin_rule| if a rule or to |next_p| if done@>; goto move_past; } @ @<Output leaders in an hlist...@>= {@+leader_box=leader_ptr(p); if (type(leader_box)==rule_node) {@+rule_ht=height(leader_box);rule_dp=depth(leader_box); goto fin_rule; } leader_wd=width(leader_box); if ((leader_wd > 0)&&(rule_wd > 0)) {@+rule_wd=rule_wd+10; /*compensate for floating-point rounding*/ edge=cur_h+rule_wd;lx=0; @<Let |cur_h| be the position of the first box, and set |leader_wd+lx| to the spacing between corresponding parts of boxes@>; while (cur_h+leader_wd <= edge) @<Output a leader box at |cur_h|, then advance |cur_h| by |leader_wd+lx|@>; cur_h=edge-10;goto next_p; } } @ The calculations related to leaders require a bit of care. First, in the case of |a_leaders| (aligned leaders), we want to move |cur_h| to |left_edge| plus the smallest multiple of |leader_wd| for which the result is not less than the current value of |cur_h|; i.e., |cur_h| should become $|left_edge|+|leader_wd|\times\lceil (|cur_h|-|left_edge|)/|leader_wd|\rceil$. The program here should work in all cases even though some implementations of \PASCAL\ give nonstandard results for the |/| operation when |cur_h| is less than |left_edge|. In the case of |c_leaders| (centered leaders), we want to increase |cur_h| by half of the excess space not occupied by the leaders; and in the case of |x_leaders| (expanded leaders) we increase |cur_h| by $1/(q+1)$ of this excess space, where $q$ is the number of times the leader box will be replicated. Slight inaccuracies in the division might accumulate; half of this rounding error is placed at each end of the leaders. @<Let |cur_h| be the position of the first box,...@>= if (subtype(p)==a_leaders) {@+save_h=cur_h; cur_h=left_edge+leader_wd*((cur_h-left_edge)/leader_wd); if (cur_h < save_h) cur_h=cur_h+leader_wd; } else{@+lq=rule_wd/leader_wd; /*the number of box copies*/ lr=rule_wd%leader_wd; /*the remaining space*/ if (subtype(p)==c_leaders) cur_h=cur_h+(lr/2); else{@+lx=lr/(lq+1); cur_h=cur_h+((lr-(lq-1)*lx)/2); } } @ The `\\{synch}' operations here are intended to decrease the number of bytes needed to specify horizontal and vertical motion in the \.{DVI} output. @<Output a leader box at |cur_h|,...@>= {@+cur_v=base_line+shift_amount(leader_box);synch_v;save_v=dvi_v;@/ synch_h;save_h=dvi_h;temp_ptr=leader_box; outer_doing_leaders=doing_leaders;doing_leaders=true; if (type(leader_box)==vlist_node) vlist_out();@+else hlist_out(); doing_leaders=outer_doing_leaders; dvi_v=save_v;dvi_h=save_h;cur_v=base_line; cur_h=save_h+leader_wd+lx; } @ The |vlist_out| routine is similar to |hlist_out|, but a bit simpler. @p void vlist_out(void) /*output a |vlist_node| box*/ {@+ scaled left_edge; /*the left coordinate for this box*/ scaled @!top_edge; /*the top coordinate for this box*/ scaled @!save_h, @!save_v; /*what |dvi_h| and |dvi_v| should pop to*/ pointer @!this_box; /*pointer to containing box*/ glue_ord @!g_order; /*applicable order of infinity for glue*/ uint8_t @!g_sign; /*selects type of glue*/ pointer @!p; /*current position in the vlist*/ int @!save_loc; /*\.{DVI} byte location upon entry*/ pointer @!leader_box; /*the leader box being replicated*/ scaled @!leader_ht; /*height of leader box being replicated*/ scaled @!lx; /*extra space between leader boxes*/ bool @!outer_doing_leaders; /*were we doing leaders?*/ scaled @!edge; /*bottom boundary of leader space*/ double @!glue_temp; /*glue value before rounding*/ double @!cur_glue; /*glue seen so far*/ scaled @!cur_g; /*rounded equivalent of |cur_glue| times the glue ratio*/ cur_g=0;cur_glue=float_constant(0); this_box=temp_ptr;g_order=glue_order(this_box); g_sign=glue_sign(this_box);p=list_ptr(this_box); incr(cur_s); if (cur_s > 0) dvi_out(push); if (cur_s > max_push) max_push=cur_s; save_loc=dvi_offset+dvi_ptr;left_edge=cur_h;cur_v=cur_v-height(this_box); top_edge=cur_v; while (p!=null) @<Output node |p| for |vlist_out| and move to the next node, maintaining the condition |cur_h=left_edge|@>; prune_movements(save_loc); if (cur_s > 0) dvi_pop(save_loc); decr(cur_s); } @ @<Output node |p| for |vlist_out|...@>= {@+if (is_char_node(p)) confusion(@[@<|"vlistout"|@>@]); @:this can't happen vlistout}{\quad vlistout@> else@<Output the non-|char_node| |p| for |vlist_out|@>; next_p: p=link(p); } @ @<Output the non-|char_node| |p| for |vlist_out|@>= {@+switch (type(p)) { case hlist_node: case vlist_node: @<Output a box in a vlist@>@;@+break; case rule_node: {@+rule_ht=height(p);rule_dp=depth(p);rule_wd=width(p); goto fin_rule; } case whatsit_node: @<Output the whatsit node |p| in a vlist@>;@+break; case glue_node: @<Move down or output leaders@>@; case kern_node: cur_v=cur_v+width(p);@+break; default:do_nothing; } @/ goto next_p; fin_rule: @<Output a rule in a vlist, |goto next_p|@>; move_past: cur_v=cur_v+rule_ht; } @ The |synch_v| here allows the \.{DVI} output to use one-byte commands for adjusting |v| in most cases, since the baselineskip distance will usually be constant. @<Output a box in a vlist@>= if (list_ptr(p)==null) cur_v=cur_v+height(p)+depth(p); else{@+cur_v=cur_v+height(p);synch_v; save_h=dvi_h;save_v=dvi_v; cur_h=left_edge+shift_amount(p); /*shift the box right*/ temp_ptr=p; if (type(p)==vlist_node) vlist_out();@+else hlist_out(); dvi_h=save_h;dvi_v=save_v; cur_v=save_v+depth(p);cur_h=left_edge; } @ @<Output a rule in a vlist...@>= if (is_running(rule_wd)) rule_wd=width(this_box); rule_ht=rule_ht+rule_dp; /*this is the rule thickness*/ cur_v=cur_v+rule_ht; if ((rule_ht > 0)&&(rule_wd > 0)) /*we don't output empty rules*/ {@+synch_h;synch_v; dvi_out(put_rule);dvi_four(rule_ht);dvi_four(rule_wd); } goto next_p @ @<Move down or output leaders@>= {@+g=glue_ptr(p);rule_ht=width(g)-cur_g; if (g_sign!=normal) {@+if (g_sign==stretching) {@+if (stretch_order(g)==g_order) {@+cur_glue=cur_glue+stretch(g); vet_glue(float(glue_set(this_box))*cur_glue); @^real multiplication@> cur_g=round(glue_temp); } } else if (shrink_order(g)==g_order) {@+cur_glue=cur_glue-shrink(g); vet_glue(float(glue_set(this_box))*cur_glue); cur_g=round(glue_temp); } } rule_ht=rule_ht+cur_g; if (subtype(p) >= a_leaders) @<Output leaders in a vlist, |goto fin_rule| if a rule or to |next_p| if done@>; goto move_past; } @ @<Output leaders in a vlist...@>= {@+leader_box=leader_ptr(p); if (type(leader_box)==rule_node) {@+rule_wd=width(leader_box);rule_dp=0; goto fin_rule; } leader_ht=height(leader_box)+depth(leader_box); if ((leader_ht > 0)&&(rule_ht > 0)) {@+rule_ht=rule_ht+10; /*compensate for floating-point rounding*/ edge=cur_v+rule_ht;lx=0; @<Let |cur_v| be the position of the first box, and set |leader_ht+lx| to the spacing between corresponding parts of boxes@>; while (cur_v+leader_ht <= edge) @<Output a leader box at |cur_v|, then advance |cur_v| by |leader_ht+lx|@>; cur_v=edge-10;goto next_p; } } @ @<Let |cur_v| be the position of the first box,...@>= if (subtype(p)==a_leaders) {@+save_v=cur_v; cur_v=top_edge+leader_ht*((cur_v-top_edge)/leader_ht); if (cur_v < save_v) cur_v=cur_v+leader_ht; } else{@+lq=rule_ht/leader_ht; /*the number of box copies*/ lr=rule_ht%leader_ht; /*the remaining space*/ if (subtype(p)==c_leaders) cur_v=cur_v+(lr/2); else{@+lx=lr/(lq+1); cur_v=cur_v+((lr-(lq-1)*lx)/2); } } @ When we reach this part of the program, |cur_v| indicates the top of a leader box, not its baseline. @<Output a leader box at |cur_v|,...@>= {@+cur_h=left_edge+shift_amount(leader_box);synch_h;save_h=dvi_h;@/ cur_v=cur_v+height(leader_box);synch_v;save_v=dvi_v; temp_ptr=leader_box; outer_doing_leaders=doing_leaders;doing_leaders=true; if (type(leader_box)==vlist_node) vlist_out();@+else hlist_out(); doing_leaders=outer_doing_leaders; dvi_v=save_v;dvi_h=save_h;cur_h=left_edge; cur_v=save_v-height(leader_box)+leader_ht+lx; } @ The |hlist_out| and |vlist_out| procedures are now complete, so we are ready for the |ship_out| routine that gets them started in the first place. @p void ship_out(pointer @!p) /*output the box |p|*/ {@+ int page_loc; /*location of the current |bop|*/ uint8_t @!j, @!k; /*indices to first ten count registers*/ int @!s; /*index into |str_pool|*/ uint8_t @!old_setting; /*saved |selector| setting*/ if (tracing_output > 0) {@+print_nl("");print_ln(); print_str("Completed box being shipped out"); @.Completed box...@> } if (term_offset > max_print_line-9) print_ln(); else if ((term_offset > 0)||(file_offset > 0)) print_char(' '); print_char('[');j=9; while ((count(j)==0)&&(j > 0)) decr(j); for (k=0; k<=j; k++) {@+print_int(count(k)); if (k < j) print_char('.'); } update_terminal; if (tracing_output > 0) {@+print_char(']'); begin_diagnostic();show_box(p);end_diagnostic(true); } @<Ship box |p| out@>; if (tracing_output <= 0) print_char(']'); dead_cycles=0; update_terminal; /*progress report*/ @<Flush the box from memory, showing statistics if requested@>; } @ @<Flush the box from memory, showing statistics if requested@>= #ifdef @!STAT if (tracing_stats > 1) {@+print_nl("Memory usage before: "); @.Memory usage...@> print_int(var_used);print_char('&'); print_int(dyn_used);print_char(';'); } #endif flush_node_list(p); #ifdef @!STAT if (tracing_stats > 1) {@+print_str(" after: "); print_int(var_used);print_char('&'); print_int(dyn_used);print_str("; still untouched: "); print_int(hi_mem_min-lo_mem_max-1);print_ln(); } #endif @ @<Ship box |p| out@>= @<Update the values of |max_h| and |max_v|; but if the page is too large, |goto done|@>; @<Initialize variables as |ship_out| begins@>; page_loc=dvi_offset+dvi_ptr; dvi_out(bop); for (k=0; k<=9; k++) dvi_four(count(k)); dvi_four(last_bop);last_bop=page_loc; cur_v=height(p)+v_offset;temp_ptr=p; if (type(p)==vlist_node) vlist_out();@+else hlist_out(); dvi_out(eop);incr(total_pages);cur_s=-1; done: @ Sometimes the user will generate a huge page because other error messages are being ignored. Such pages are not output to the \.{dvi} file, since they may confuse the printing software. @<Update the values of |max_h| and |max_v|; but if the page is too large...@>= if ((height(p) > max_dimen)||@|(depth(p) > max_dimen)||@| (height(p)+depth(p)+v_offset > max_dimen)||@| (width(p)+h_offset > max_dimen)) {@+print_err("Huge page cannot be shipped out"); @.Huge page...@> help2("The page just created is more than 18 feet tall or")@/ ("more than 18 feet wide, so I suspect something went wrong."); error(); if (tracing_output <= 0) {@+begin_diagnostic(); print_nl("The following box has been deleted:"); @.The following...deleted@> show_box(p); end_diagnostic(true); } goto done; } if (height(p)+depth(p)+v_offset > max_v) max_v=height(p)+depth(p)+v_offset; if (width(p)+h_offset > max_h) max_h=width(p)+h_offset @ At the end of the program, we must finish things off by writing the post\-amble. If |total_pages==0|, the \.{DVI} file was never opened. If |total_pages >= 65536|, the \.{DVI} file will lie. And if |max_push >= 65536|, the user deserves whatever chaos might ensue. An integer variable |k| will be declared for use by this routine. @<Finish the \.{DVI} file@>= while (cur_s > -1) {@+if (cur_s > 0) dvi_out(pop)@; else{@+dvi_out(eop);incr(total_pages); } decr(cur_s); } if (total_pages==0) print_nl("No pages of output."); @.No pages of output@> else{@+dvi_out(post); /*beginning of the postamble*/ dvi_four(last_bop);last_bop=dvi_offset+dvi_ptr-5; /*|post| location*/ dvi_four(25400000);dvi_four(473628672); /*conversion ratio for sp*/ prepare_mag();dvi_four(mag); /*magnification factor*/ dvi_four(max_v);dvi_four(max_h);@/ dvi_out(max_push/256);dvi_out(max_push%256);@/ dvi_out((total_pages/256)%256);dvi_out(total_pages%256);@/ @<Output the font definitions for all fonts that were used@>; dvi_out(post_post);dvi_four(last_bop);dvi_out(id_byte);@/ k=4+((dvi_buf_size-dvi_ptr)%4); /*the number of 223's*/ while (k > 0) {@+dvi_out(223);decr(k); } @<Empty the last bytes out of |dvi_buf|@>; print_nl("Output written on ");slow_print(output_file_name); @.Output written on x@> print_str(" (");print_int(total_pages);print_str(" page"); if (total_pages!=1) print_char('s'); print_str(", ");print_int(dvi_offset+dvi_ptr);print_str(" bytes)."); b_close(&dvi_file); } @ @<Output the font definitions...@>= while (font_ptr > font_base) {@+if (font_used[font_ptr]) dvi_font_def(font_ptr); decr(font_ptr); } @* Packaging. We're essentially done with the parts of \TeX\ that are concerned with the input (|get_next|) and the output (|ship_out|). So it's time to get heavily into the remaining part, which does the real work of typesetting. After lists are constructed, \TeX\ wraps them up and puts them into boxes. Two major subroutines are given the responsibility for this task: |hpack| applies to horizontal lists (hlists) and |vpack| applies to vertical lists (vlists). The main duty of |hpack| and |vpack| is to compute the dimensions of the resulting boxes, and to adjust the glue if one of those dimensions is pre-specified. The computed sizes normally enclose all of the material inside the new box; but some items may stick out if negative glue is used, if the box is overfull, or if a \.{\\vbox} includes other boxes that have been shifted left. The subroutine call |hpack(p, w, m)| returns a pointer to an |hlist_node| for a box containing the hlist that starts at |p|. Parameter |w| specifies a width; and parameter |m| is either `|exactly|' or `|additional|'. Thus, |hpack(p, w, exactly)| produces a box whose width is exactly |w|, while |hpack(p, w, additional)| yields a box whose width is the natural width plus |w|. It is convenient to define a macro called `|natural|' to cover the most common case, so that we can say |hpack(p, natural)| to get a box that has the natural width of list |p|. Similarly, |vpack(p, w, m)| returns a pointer to a |vlist_node| for a box containing the vlist that starts at |p|. In this case |w| represents a height instead of a width; the parameter |m| is interpreted as in |hpack|. @d exactly 0 /*a box dimension is pre-specified*/ @d additional 1 /*a box dimension is increased from the natural one*/ @d natural 0, additional /*shorthand for parameters to |hpack| and |vpack|*/ @ The parameters to |hpack| and |vpack| correspond to \TeX's primitives like `\.{\\hbox} \.{to} \.{300pt}', `\.{\\hbox} \.{spread} \.{10pt}'; note that `\.{\\hbox}' with no dimension following it is equivalent to `\.{\\hbox} \.{spread} \.{0pt}'. The |scan_spec| subroutine scans such constructions in the user's input, including the mandatory left brace that follows them, and it puts the specification onto |save_stack| so that the desired box can later be obtained by executing the following code: $$\vbox{\halign{#\hfil\cr |save_ptr=save_ptr-2;|\cr |hpack(p, saved(1), saved(0)).|\cr}}$$ Special care is necessary to ensure that the special |save_stack| codes are placed just below the new group code, because scanning can change |save_stack| when \.{\\csname} appears. @p void scan_spec(group_code @!c, bool @!three_codes) /*scans a box specification and left brace*/ {@+ int @!s; /*temporarily saved value*/ uint8_t @!spec_code; if (three_codes) s=saved(0); if (scan_keyword(@[@<|"to"|@>@])) spec_code=exactly; @.to@> else if (scan_keyword(@[@<|"spread"|@>@])) spec_code=additional; @.spread@> else{@+spec_code=additional;cur_val=0; goto found; } scan_normal_dimen; found: if (three_codes) {@+saved(0)=s;incr(save_ptr); } saved(0)=spec_code;saved(1)=cur_val;save_ptr=save_ptr+2; new_save_level(c);scan_left_brace(); } @ To figure out the glue setting, |hpack| and |vpack| determine how much stretchability and shrinkability are present, considering all four orders of infinity. The highest order of infinity that has a nonzero coefficient is then used as if no other orders were present. For example, suppose that the given list contains six glue nodes with the respective stretchabilities 3pt, 8fill, 5fil, 6pt, $-3$fil, $-8$fill. Then the total is essentially 2fil; and if a total additional space of 6pt is to be achieved by stretching, the actual amounts of stretch will be 0pt, 0pt, 15pt, 0pt, $-9$pt, and 0pt, since only `fil' glue will be considered. (The `fill' glue is therefore not really stretching infinitely with respect to `fil'; nobody would actually want that to happen.) The arrays |total_stretch| and |total_shrink| are used to determine how much glue of each kind is present. A global variable |last_badness| is used to implement \.{\\badness}. @<Glob...@>= scaled @!total_stretch0[filll-normal+1], *const @!total_stretch = @!total_stretch0-normal, @!total_shrink0[filll-normal+1], *const @!total_shrink = @!total_shrink0-normal; /*glue found by |hpack| or |vpack|*/ int @!last_badness; /*badness of the most recently packaged box*/ @ If the global variable |adjust_tail| is non-null, the |hpack| routine also removes all occurrences of |ins_node|, |mark_node|, and |adjust_node| items and appends the resulting material onto the list that ends at location |adjust_tail|. @<Glob...@>= pointer @!adjust_tail; /*tail of adjustment list*/ @ @<Set init...@>=adjust_tail=null;last_badness=0; @ Here now is |hpack|, which contains few if any surprises. @p pointer hpack(pointer @!p, scaled @!w, small_number @!m) {@+ pointer r; /*the box node that will be returned*/ pointer @!q; /*trails behind |p|*/ scaled @!h, @!d, @!x; /*height, depth, and natural width*/ scaled @!s; /*shift amount*/ pointer @!g; /*points to a glue specification*/ glue_ord @!o; /*order of infinity*/ internal_font_number @!f; /*the font in a |char_node|*/ four_quarters @!i; /*font information about a |char_node|*/ eight_bits @!hd; /*height and depth indices for a character*/ last_badness=0;r=get_node(box_node_size);type(r)=hlist_node; subtype(r)=min_quarterword;shift_amount(r)=0; q=r+list_offset;link(q)=p;@/ h=0;@<Clear dimensions to zero@>; while (p!=null) @<Examine node |p| in the hlist, taking account of its effect on the dimensions of the new box, or moving it to the adjustment list; then advance |p| to the next node@>; if (adjust_tail!=null) link(adjust_tail)=null; height(r)=h;depth(r)=d;@/ @<Determine the value of |width(r)| and the appropriate glue setting; then |return| or |goto common_ending|@>; common_ending: @<Finish issuing a diagnostic message for an overfull or underfull hbox@>; end: return r; } @ @<Clear dimensions to zero@>= d=0;x=0; total_stretch[normal]=0;total_shrink[normal]=0; total_stretch[fil]=0;total_shrink[fil]=0; total_stretch[fill]=0;total_shrink[fill]=0; total_stretch[filll]=0;total_shrink[filll]=0 @ @<Examine node |p| in the hlist, taking account of its effect...@>= @^inner loop@> {@+reswitch: while (is_char_node(p)) @<Incorporate character dimensions into the dimensions of the hbox that will contain~it, then move to the next node@>; if (p!=null) {@+switch (type(p)) { case hlist_node: case vlist_node: case rule_node: case unset_node: @<Incorporate box dimensions into the dimensions of the hbox that will contain~it@>@;@+break; case ins_node: case mark_node: case adjust_node: if (adjust_tail!=null) @<Transfer node |p| to the adjustment list@>@;@+break; case whatsit_node: @<Incorporate a whatsit node into an hbox@>;@+break; case glue_node: @<Incorporate glue into the horizontal totals@>@;@+break; case kern_node: case math_node: x=x+width(p);@+break; case ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>@; default:do_nothing; } @/ p=link(p); } } @ @<Make node |p| look like a |char_node| and |goto reswitch|@>= {@+mem[lig_trick]=mem[lig_char(p)];link(lig_trick)=link(p); p=lig_trick;goto reswitch; } @ The code here implicitly uses the fact that running dimensions are indicated by |null_flag|, which will be ignored in the calculations because it is a highly negative number. @<Incorporate box dimensions into the dimensions of the hbox...@>= {@+x=x+width(p); if (type(p) >= rule_node) s=0;@+else s=shift_amount(p); if (height(p)-s > h) h=height(p)-s; if (depth(p)+s > d) d=depth(p)+s; } @ The following code is part of \TeX's inner loop; i.e., adding another character of text to the user's input will cause each of these instructions to be exercised one more time. @^inner loop@> @<Incorporate character dimensions into the dimensions of the hbox...@>= {@+f=font(p);i=char_info(f)(character(p));hd=height_depth(i); x=x+char_width(f)(i);@/ s=char_height(f)(hd);@+if (s > h) h=s; s=char_depth(f)(hd);@+if (s > d) d=s; p=link(p); } @ Although node |q| is not necessarily the immediate predecessor of node |p|, it always points to some node in the list preceding |p|. Thus, we can delete nodes by moving |q| when necessary. The algorithm takes linear time, and the extra computation does not intrude on the inner loop unless it is necessary to make a deletion. @^inner loop@> @<Transfer node |p| to the adjustment list@>= {@+while (link(q)!=p) q=link(q); if (type(p)==adjust_node) {@+link(adjust_tail)=adjust_ptr(p); while (link(adjust_tail)!=null) adjust_tail=link(adjust_tail); p=link(p);free_node(link(q), small_node_size); } else{@+link(adjust_tail)=p;adjust_tail=p;p=link(p); } link(q)=p;p=q; } @ @<Incorporate glue into the horizontal totals@>= {@+g=glue_ptr(p);x=x+width(g);@/ o=stretch_order(g);total_stretch[o]=total_stretch[o]+stretch(g); o=shrink_order(g);total_shrink[o]=total_shrink[o]+shrink(g); if (subtype(p) >= a_leaders) {@+g=leader_ptr(p); if (height(g) > h) h=height(g); if (depth(g) > d) d=depth(g); } } @ When we get to the present part of the program, |x| is the natural width of the box being packaged. @<Determine the value of |width(r)| and the appropriate glue setting...@>= if (m==additional) w=x+w; width(r)=w;x=w-x; /*now |x| is the excess to be made up*/ if (x==0) {@+glue_sign(r)=normal;glue_order(r)=normal; set_glue_ratio_zero(glue_set(r)); goto end; } else if (x > 0) @<Determine horizontal glue stretch setting, then |return| or \hbox{|goto common_ending|}@>@; else@<Determine horizontal glue shrink setting, then |return| or \hbox{|goto common_ending|}@>@; @ @<Determine horizontal glue stretch setting...@>= {@+@<Determine the stretch order@>; glue_order(r)=o;glue_sign(r)=stretching; if (total_stretch[o]!=0) glue_set(r)=unfloat(x/(double)total_stretch[o]); @^real division@> else{@+glue_sign(r)=normal; set_glue_ratio_zero(glue_set(r)); /*there's nothing to stretch*/ } if (o==normal) if (list_ptr(r)!=null) @<Report an underfull hbox and |goto common_ending|, if this box is sufficiently bad@>; goto end; } @ @<Determine the stretch order@>= if (total_stretch[filll]!=0) o=filll; else if (total_stretch[fill]!=0) o=fill; else if (total_stretch[fil]!=0) o=fil; else o=normal @ @<Report an underfull hbox and |goto common_ending|, if...@>= {@+last_badness=badness(x, total_stretch[normal]); if (last_badness > hbadness) {@+print_ln(); if (last_badness > 100) print_nl("Underfull");@+else print_nl("Loose"); print_str(" \\hbox (badness ");print_int(last_badness); @.Underfull \\hbox...@> @.Loose \\hbox...@> goto common_ending; } } @ In order to provide a decent indication of where an overfull or underfull box originated, we use a global variable |pack_begin_line| that is set nonzero only when |hpack| is being called by the paragraph builder or the alignment finishing routine. @<Glob...@>= int @!pack_begin_line; /*source file line where the current paragraph or alignment began; a negative value denotes alignment*/ @ @<Set init...@>= pack_begin_line=0; @ @<Finish issuing a diagnostic message for an overfull or underfull hbox@>= if (output_active) print_str(") has occurred while \\output is active"); else{@+if (pack_begin_line!=0) {@+if (pack_begin_line > 0) print_str(") in paragraph at lines "); else print_str(") in alignment at lines "); print_int(abs(pack_begin_line)); print_str("--"); } else print_str(") detected at line "); print_int(line); } print_ln();@/ font_in_short_display=null_font;short_display(list_ptr(r));print_ln();@/ begin_diagnostic();show_box(r);end_diagnostic(true) @ @<Determine horizontal glue shrink setting...@>= {@+@<Determine the shrink order@>; glue_order(r)=o;glue_sign(r)=shrinking; if (total_shrink[o]!=0) glue_set(r)=unfloat((-x)/(double)total_shrink[o]); @^real division@> else{@+glue_sign(r)=normal; set_glue_ratio_zero(glue_set(r)); /*there's nothing to shrink*/ } if ((total_shrink[o] < -x)&&(o==normal)&&(list_ptr(r)!=null)) {@+last_badness=1000000; set_glue_ratio_one(glue_set(r)); /*use the maximum shrinkage*/ @<Report an overfull hbox and |goto common_ending|, if this box is sufficiently bad@>; } else if (o==normal) if (list_ptr(r)!=null) @<Report a tight hbox and |goto common_ending|, if this box is sufficiently bad@>; goto end; } @ @<Determine the shrink order@>= if (total_shrink[filll]!=0) o=filll; else if (total_shrink[fill]!=0) o=fill; else if (total_shrink[fil]!=0) o=fil; else o=normal @ @<Report an overfull hbox and |goto common_ending|, if...@>= if ((-x-total_shrink[normal] > hfuzz)||(hbadness < 100)) {@+if ((overfull_rule > 0)&&(-x-total_shrink[normal] > hfuzz)) {@+while (link(q)!=null) q=link(q); link(q)=new_rule(); width(link(q))=overfull_rule; } print_ln();print_nl("Overfull \\hbox ("); @.Overfull \\hbox...@> print_scaled(-x-total_shrink[normal]);print_str("pt too wide"); goto common_ending; } @ @<Report a tight hbox and |goto common_ending|, if...@>= {@+last_badness=badness(-x, total_shrink[normal]); if (last_badness > hbadness) {@+print_ln();print_nl("Tight \\hbox (badness ");print_int(last_badness); @.Tight \\hbox...@> goto common_ending; } } @ The |vpack| subroutine is actually a special case of a slightly more general routine called |vpackage|, which has four parameters. The fourth parameter, which is |max_dimen| in the case of |vpack|, specifies the maximum depth of the page box that is constructed. The depth is first computed by the normal rules; if it exceeds this limit, the reference point is simply moved down until the limiting depth is attained. @p #define vpack(...) @[vpackage(__VA_ARGS__, max_dimen)@] /*special case of unconstrained depth*/ pointer vpackage(pointer @!p, scaled @!h, small_number @!m, scaled @!l) {@+ pointer r; /*the box node that will be returned*/ scaled @!w, @!d, @!x; /*width, depth, and natural height*/ scaled @!s; /*shift amount*/ pointer @!g; /*points to a glue specification*/ glue_ord @!o; /*order of infinity*/ last_badness=0;r=get_node(box_node_size);type(r)=vlist_node; subtype(r)=min_quarterword;shift_amount(r)=0; list_ptr(r)=p;@/ w=0;@<Clear dimensions to zero@>; while (p!=null) @<Examine node |p| in the vlist, taking account of its effect on the dimensions of the new box; then advance |p| to the next node@>; width(r)=w; if (d > l) {@+x=x+d-l;depth(r)=l; } else depth(r)=d; @<Determine the value of |height(r)| and the appropriate glue setting; then |return| or |goto common_ending|@>; common_ending: @<Finish issuing a diagnostic message for an overfull or underfull vbox@>; end: return r; } @ @<Examine node |p| in the vlist, taking account of its effect...@>= {@+if (is_char_node(p)) confusion(@[@<|"vpack"|@>@]); @:this can't happen vpack}{\quad vpack@> else switch (type(p)) { case hlist_node: case vlist_node: case rule_node: case unset_node: @<Incorporate box dimensions into the dimensions of the vbox that will contain~it@>@;@+break; case whatsit_node: @<Incorporate a whatsit node into a vbox@>;@+break; case glue_node: @<Incorporate glue into the vertical totals@>@;@+break; case kern_node: {@+x=x+d+width(p);d=0; } @+break; default:do_nothing; } p=link(p); } @ @<Incorporate box dimensions into the dimensions of the vbox...@>= {@+x=x+d+height(p);d=depth(p); if (type(p) >= rule_node) s=0;@+else s=shift_amount(p); if (width(p)+s > w) w=width(p)+s; } @ @<Incorporate glue into the vertical totals@>= {@+x=x+d;d=0;@/ g=glue_ptr(p);x=x+width(g);@/ o=stretch_order(g);total_stretch[o]=total_stretch[o]+stretch(g); o=shrink_order(g);total_shrink[o]=total_shrink[o]+shrink(g); if (subtype(p) >= a_leaders) {@+g=leader_ptr(p); if (width(g) > w) w=width(g); } } @ When we get to the present part of the program, |x| is the natural height of the box being packaged. @<Determine the value of |height(r)| and the appropriate glue setting...@>= if (m==additional) h=x+h; height(r)=h;x=h-x; /*now |x| is the excess to be made up*/ if (x==0) {@+glue_sign(r)=normal;glue_order(r)=normal; set_glue_ratio_zero(glue_set(r)); goto end; } else if (x > 0) @<Determine vertical glue stretch setting, then |return| or \hbox{|goto common_ending|}@>@; else@<Determine vertical glue shrink setting, then |return| or \hbox{|goto common_ending|}@>@; @ @<Determine vertical glue stretch setting...@>= {@+@<Determine the stretch order@>; glue_order(r)=o;glue_sign(r)=stretching; if (total_stretch[o]!=0) glue_set(r)=unfloat(x/(double)total_stretch[o]); @^real division@> else{@+glue_sign(r)=normal; set_glue_ratio_zero(glue_set(r)); /*there's nothing to stretch*/ } if (o==normal) if (list_ptr(r)!=null) @<Report an underfull vbox and |goto common_ending|, if this box is sufficiently bad@>; goto end; } @ @<Report an underfull vbox and |goto common_ending|, if...@>= {@+last_badness=badness(x, total_stretch[normal]); if (last_badness > vbadness) {@+print_ln(); if (last_badness > 100) print_nl("Underfull");@+else print_nl("Loose"); print_str(" \\vbox (badness ");print_int(last_badness); @.Underfull \\vbox...@> @.Loose \\vbox...@> goto common_ending; } } @ @<Finish issuing a diagnostic message for an overfull or underfull vbox@>= if (output_active) print_str(") has occurred while \\output is active"); else{@+if (pack_begin_line!=0) /*it's actually negative*/ {@+print_str(") in alignment at lines "); print_int(abs(pack_begin_line)); print_str("--"); } else print_str(") detected at line "); print_int(line); print_ln();@/ } begin_diagnostic();show_box(r);end_diagnostic(true) @ @<Determine vertical glue shrink setting...@>= {@+@<Determine the shrink order@>; glue_order(r)=o;glue_sign(r)=shrinking; if (total_shrink[o]!=0) glue_set(r)=unfloat((-x)/(double)total_shrink[o]); @^real division@> else{@+glue_sign(r)=normal; set_glue_ratio_zero(glue_set(r)); /*there's nothing to shrink*/ } if ((total_shrink[o] < -x)&&(o==normal)&&(list_ptr(r)!=null)) {@+last_badness=1000000; set_glue_ratio_one(glue_set(r)); /*use the maximum shrinkage*/ @<Report an overfull vbox and |goto common_ending|, if this box is sufficiently bad@>; } else if (o==normal) if (list_ptr(r)!=null) @<Report a tight vbox and |goto common_ending|, if this box is sufficiently bad@>; goto end; } @ @<Report an overfull vbox and |goto common_ending|, if...@>= if ((-x-total_shrink[normal] > vfuzz)||(vbadness < 100)) {@+print_ln();print_nl("Overfull \\vbox ("); @.Overfull \\vbox...@> print_scaled(-x-total_shrink[normal]);print_str("pt too high"); goto common_ending; } @ @<Report a tight vbox and |goto common_ending|, if...@>= {@+last_badness=badness(-x, total_shrink[normal]); if (last_badness > vbadness) {@+print_ln();print_nl("Tight \\vbox (badness ");print_int(last_badness); @.Tight \\vbox...@> goto common_ending; } } @ When a box is being appended to the current vertical list, the baselineskip calculation is handled by the |append_to_vlist| routine. @p void append_to_vlist(pointer @!b) {@+scaled d; /*deficiency of space between baselines*/ pointer @!p; /*a new glue node*/ if (prev_depth > ignore_depth) {@+d=width(baseline_skip)-prev_depth-height(b); if (d < line_skip_limit) p=new_param_glue(line_skip_code); else{@+p=new_skip_param(baseline_skip_code); width(temp_ptr)=d; /*|temp_ptr==glue_ptr(p)|*/ } link(tail)=p;tail=p; } link(tail)=b;tail=b;prev_depth=depth(b); } @* Data structures for math mode. When \TeX\ reads a formula that is enclosed between \.\$'s, it constructs an {\sl mlist}, which is essentially a tree structure representing that formula. An mlist is a linear sequence of items, but we can regard it as a tree structure because mlists can appear within mlists. For example, many of the entries can be subscripted or superscripted, and such ``scripts'' are mlists in their own right. An entire formula is parsed into such a tree before any of the actual typesetting is done, because the current style of type is usually not known until the formula has been fully scanned. For example, when the formula `\.{\$a+b \\over c+d\$}' is being read, there is no way to tell that `\.{a+b}' will be in script size until `\.{\\over}' has appeared. During the scanning process, each element of the mlist being built is classified as a relation, a binary operator, an open parenthesis, etc., or as a construct like `\.{\\sqrt}' that must be built up. This classification appears in the mlist data structure. After a formula has been fully scanned, the mlist is converted to an hlist so that it can be incorporated into the surrounding text. This conversion is controlled by a recursive procedure that decides all of the appropriate styles by a ``top-down'' process starting at the outermost level and working in towards the subformulas. The formula is ultimately pasted together using combinations of horizontal and vertical boxes, with glue and penalty nodes inserted as necessary. An mlist is represented internally as a linked list consisting chiefly of ``noads'' (pronounced ``no-adds''), to distinguish them from the somewhat similar ``nodes'' in hlists and vlists. Certain kinds of ordinary nodes are allowed to appear in mlists together with the noads; \TeX\ tells the difference by means of the |type| field, since a noad's |type| is always greater than that of a node. An mlist does not contain character nodes, hlist nodes, vlist nodes, math nodes, ligature nodes, or unset nodes; in particular, each mlist item appears in the variable-size part of |mem|, so the |type| field is always present. @ Each noad is four or more words long. The first word contains the |type| and |subtype| and |link| fields that are already so familiar to us; the second, third, and fourth words are called the noad's |nucleus|, |subscr|, and |supscr| fields. Consider, for example, the simple formula `\.{\$x\^2\$}', which would be parsed into an mlist containing a single element called an |ord_noad|. The |nucleus| of this noad is a representation of `\.x', the |subscr| is empty, and the |supscr| is a representation of `\.2'. The |nucleus|, |subscr|, and |supscr| fields are further broken into subfields. If |p| points to a noad, and if |q| is one of its principal fields (e.g., |q==subscr(p)|), there are several possibilities for the subfields, depending on the |math_type| of |q|. \yskip\hang|math_type(q)==math_char| means that |fam(q)| refers to one of the sixteen font families, and |character(q)| is the number of a character within a font of that family, as in a character node. \yskip\hang|math_type(q)==math_text_char| is similar, but the character is unsubscripted and unsuperscripted and it is followed immediately by another character from the same font. (This |math_type| setting appears only briefly during the processing; it is used to suppress unwanted italic corrections.) \yskip\hang|math_type(q)==empty| indicates a field with no value (the corresponding attribute of noad |p| is not present). \yskip\hang|math_type(q)==sub_box| means that |info(q)| points to a box node (either an |hlist_node| or a |vlist_node|) that should be used as the value of the field. The |shift_amount| in the subsidiary box node is the amount by which that box will be shifted downward. \yskip\hang|math_type(q)==sub_mlist| means that |info(q)| points to an mlist; the mlist must be converted to an hlist in order to obtain the value of this field. \yskip\noindent In the latter case, we might have |info(q)==null|. This is not the same as |math_type(q)==empty|; for example, `\.{\$P\_\{\}\$}' and `\.{\$P\$}' produce different results (the former will not have the ``italic correction'' added to the width of |P|, but the ``script skip'' will be added). The definitions of subfields given here are evidently wasteful of space, since a halfword is being used for the |math_type| although only three bits would be needed. However, there are hardly ever many noads present at once, since they are soon converted to nodes that take up even more space, so we can afford to represent them in whatever way simplifies the programming. @d noad_size 4 /*number of words in a normal noad*/ @d nucleus(X) X+1 /*the |nucleus| field of a noad*/ @d supscr(X) X+2 /*the |supscr| field of a noad*/ @d subscr(X) X+3 /*the |subscr| field of a noad*/ @d math_type link /*a |halfword| in |mem|*/ @d fam font /*a |quarterword| in |mem|*/ @d math_char 1 /*|math_type| when the attribute is simple*/ @d sub_box 2 /*|math_type| when the attribute is a box*/ @d sub_mlist 3 /*|math_type| when the attribute is a formula*/ @d math_text_char 4 /*|math_type| when italic correction is dubious*/ @ Each portion of a formula is classified as Ord, Op, Bin, Rel, Ope, Clo, Pun, or Inn, for purposes of spacing and line breaking. An |ord_noad|, |op_noad|, |bin_noad|, |rel_noad|, |open_noad|, |close_noad|, |punct_noad|, or |inner_noad| is used to represent portions of the various types. For example, an `\.=' sign in a formula leads to the creation of a |rel_noad| whose |nucleus| field is a representation of an equals sign (usually |fam==0|, |character==075|). A formula preceded by \.{\\mathrel} also results in a |rel_noad|. When a |rel_noad| is followed by an |op_noad|, say, and possibly separated by one or more ordinary nodes (not noads), \TeX\ will insert a penalty node (with the current |rel_penalty|) just after the formula that corresponds to the |rel_noad|, unless there already was a penalty immediately following; and a ``thick space'' will be inserted just before the formula that corresponds to the |op_noad|. A noad of type |ord_noad|, |op_noad|, \dots, |inner_noad| usually has a |subtype==normal|. The only exception is that an |op_noad| might have |subtype==limits| or |no_limits|, if the normal positioning of limits has been overridden for this operator. @d ord_noad (unset_node+3) /*|type| of a noad classified Ord*/ @d op_noad (ord_noad+1) /*|type| of a noad classified Op*/ @d bin_noad (ord_noad+2) /*|type| of a noad classified Bin*/ @d rel_noad (ord_noad+3) /*|type| of a noad classified Rel*/ @d open_noad (ord_noad+4) /*|type| of a noad classified Ope*/ @d close_noad (ord_noad+5) /*|type| of a noad classified Clo*/ @d punct_noad (ord_noad+6) /*|type| of a noad classified Pun*/ @d inner_noad (ord_noad+7) /*|type| of a noad classified Inn*/ @d limits 1 /*|subtype| of |op_noad| whose scripts are to be above, below*/ @d no_limits 2 /*|subtype| of |op_noad| whose scripts are to be normal*/ @ A |radical_noad| is five words long; the fifth word is the |left_delimiter| field, which usually represents a square root sign. A |fraction_noad| is six words long; it has a |right_delimiter| field as well as a |left_delimiter|. Delimiter fields are of type |four_quarters|, and they have four subfields called |small_fam|, |small_char|, |large_fam|, |large_char|. These subfields represent variable-size delimiters by giving the ``small'' and ``large'' starting characters, as explained in Chapter~17 of {\sl The \TeX book}. @:TeXbook}{\sl The \TeX book@> A |fraction_noad| is actually quite different from all other noads. Not only does it have six words, it has |thickness|, |denominator|, and |numerator| fields instead of |nucleus|, |subscr|, and |supscr|. The |thickness| is a scaled value that tells how thick to make a fraction rule; however, the special value |default_code| is used to stand for the |default_rule_thickness| of the current size. The |numerator| and |denominator| point to mlists that define a fraction; we always have $$\hbox{|math_type(numerator)==math_type(denominator)==sub_mlist|}.$$ The |left_delimiter| and |right_delimiter| fields specify delimiters that will be placed at the left and right of the fraction. In this way, a |fraction_noad| is able to represent all of \TeX's operators \.{\\over}, \.{\\atop}, \.{\\above}, \.{\\overwithdelims}, \.{\\atopwithdelims}, and \.{\\abovewithdelims}. @d left_delimiter(X) X+4 /*first delimiter field of a noad*/ @d right_delimiter(X) X+5 /*second delimiter field of a fraction noad*/ @d radical_noad (inner_noad+1) /*|type| of a noad for square roots*/ @d radical_noad_size 5 /*number of |mem| words in a radical noad*/ @d fraction_noad (radical_noad+1) /*|type| of a noad for generalized fractions*/ @d fraction_noad_size 6 /*number of |mem| words in a fraction noad*/ @d small_fam(X) mem[X].qqqq.b0 /*|fam| for ``small'' delimiter*/ @d small_char(X) mem[X].qqqq.b1 /*|character| for ``small'' delimiter*/ @d large_fam(X) mem[X].qqqq.b2 /*|fam| for ``large'' delimiter*/ @d large_char(X) mem[X].qqqq.b3 /*|character| for ``large'' delimiter*/ @d thickness width /*|thickness| field in a fraction noad*/ @d default_code 010000000000 /*denotes |default_rule_thickness|*/ @d numerator supscr /*|numerator| field in a fraction noad*/ @d denominator subscr /*|denominator| field in a fraction noad*/ @ The global variable |empty_field| is set up for initialization of empty fields in new noads. Similarly, |null_delimiter| is for the initialization of delimiter fields. @<Glob...@>= two_halves @!empty_field; four_quarters @!null_delimiter; @ @<Set init...@>= empty_field.rh=empty;empty_field.lh=null;@/ null_delimiter.b0=0;null_delimiter.b1=min_quarterword;@/ null_delimiter.b2=0;null_delimiter.b3=min_quarterword; @ The |new_noad| function creates an |ord_noad| that is completely null. @p pointer new_noad(void) {@+pointer p; p=get_node(noad_size); type(p)=ord_noad;subtype(p)=normal; mem[nucleus(p)].hh=empty_field; mem[subscr(p)].hh=empty_field; mem[supscr(p)].hh=empty_field; return p; } @ A few more kinds of noads will complete the set: An |under_noad| has its nucleus underlined; an |over_noad| has it overlined. An |accent_noad| places an accent over its nucleus; the accent character appears as |fam(accent_chr(p))| and |character(accent_chr(p))|. A |vcenter_noad| centers its nucleus vertically with respect to the axis of the formula; in such noads we always have |math_type(nucleus(p))==sub_box|. And finally, we have |left_noad| and |right_noad| types, to implement \TeX's \.{\\left} and \.{\\right}. The |nucleus| of such noads is replaced by a |delimiter| field; thus, for example, `\.{\\left(}' produces a |left_noad| such that |delimiter(p)| holds the family and character codes for all left parentheses. A |left_noad| never appears in an mlist except as the first element, and a |right_noad| never appears in an mlist except as the last element; furthermore, we either have both a |left_noad| and a |right_noad|, or neither one is present. The |subscr| and |supscr| fields are always |empty| in a |left_noad| and a |right_noad|. @d under_noad (fraction_noad+1) /*|type| of a noad for underlining*/ @d over_noad (under_noad+1) /*|type| of a noad for overlining*/ @d accent_noad (over_noad+1) /*|type| of a noad for accented subformulas*/ @d accent_noad_size 5 /*number of |mem| words in an accent noad*/ @d accent_chr(X) X+4 /*the |accent_chr| field of an accent noad*/ @d vcenter_noad (accent_noad+1) /*|type| of a noad for \.{\\vcenter}*/ @d left_noad (vcenter_noad+1) /*|type| of a noad for \.{\\left}*/ @d right_noad (left_noad+1) /*|type| of a noad for \.{\\right}*/ @d delimiter nucleus /*|delimiter| field in left and right noads*/ @d scripts_allowed(X) (type(X) >= ord_noad)&&(type(X) < left_noad) @ Math formulas can also contain instructions like \.{\\textstyle} that override \TeX's normal style rules. A |style_node| is inserted into the data structure to record such instructions; it is three words long, so it is considered a node instead of a noad. The |subtype| is either |display_style| or |text_style| or |script_style| or |script_script_style|. The second and third words of a |style_node| are not used, but they are present because a |choice_node| is converted to a |style_node|. \TeX\ uses even numbers 0, 2, 4, 6 to encode the basic styles |display_style|, \dots, |script_script_style|, and adds~1 to get the ``cramped'' versions of these styles. This gives a numerical order that is backwards from the convention of Appendix~G in {\sl The \TeX book\/}; i.e., a smaller style has a larger numerical value. @:TeXbook}{\sl The \TeX book@> @d style_node (unset_node+1) /*|type| of a style node*/ @d style_node_size 3 /*number of words in a style node*/ @d display_style 0 /*|subtype| for \.{\\displaystyle}*/ @d text_style 2 /*|subtype| for \.{\\textstyle}*/ @d script_style 4 /*|subtype| for \.{\\scriptstyle}*/ @d script_script_style 6 /*|subtype| for \.{\\scriptscriptstyle}*/ @d cramped 1 /*add this to an uncramped style if you want to cramp it*/ @p pointer new_style(small_number @!s) /*create a style node*/ {@+pointer p; /*the new node*/ p=get_node(style_node_size);type(p)=style_node; subtype(p)=s;width(p)=0;depth(p)=0; /*the |width| and |depth| are not used*/ return p; } @ Finally, the \.{\\mathchoice} primitive creates a |choice_node|, which has special subfields |display_mlist|, |text_mlist|, |script_mlist|, and |script_script_mlist| pointing to the mlists for each style. @d choice_node (unset_node+2) /*|type| of a choice node*/ @d display_mlist(X) info(X+1) /*mlist to be used in display style*/ @d text_mlist(X) link(X+1) /*mlist to be used in text style*/ @d script_mlist(X) info(X+2) /*mlist to be used in script style*/ @d script_script_mlist(X) link(X+2) /*mlist to be used in scriptscript style*/ @p pointer new_choice(void) /*create a choice node*/ {@+pointer p; /*the new node*/ p=get_node(style_node_size);type(p)=choice_node; subtype(p)=0; /*the |subtype| is not used*/ display_mlist(p)=null;text_mlist(p)=null;script_mlist(p)=null; script_script_mlist(p)=null; return p; } @ Let's consider now the previously unwritten part of |show_node_list| that displays the things that can only be present in mlists; this program illustrates how to access the data structures just defined. In the context of the following program, |p| points to a node or noad that should be displayed, and the current string contains the ``recursion history'' that leads to this point. The recursion history consists of a dot for each outer level in which |p| is subsidiary to some node, or in which |p| is subsidiary to the |nucleus| field of some noad; the dot is replaced by `\.\_' or `\.\^' or `\./' or `\.\\' if |p| is descended from the |subscr| or |supscr| or |denominator| or |numerator| fields of noads. For example, the current string would be `\.{.\^.\_/}' if |p| points to the |ord_noad| for |x| in the (ridiculous) formula `\.{\$\\sqrt\{a\^\{\\mathinner\{b\_\{c\\over x+y\}\}\}\}\$}'. @<Cases of |show_node_list| that arise...@>= case style_node: print_style(subtype(p));@+break; case choice_node: @<Display choice node |p|@>@;@+break; case ord_noad: case op_noad: case bin_noad: case rel_noad: case open_noad: case close_noad: case punct_noad: case inner_noad: case radical_noad: case over_noad: case under_noad: case vcenter_noad: case accent_noad: case left_noad: case right_noad: @<Display normal noad |p|@>@;@+break; case fraction_noad: @<Display fraction noad |p|@>@;@+break; @ Here are some simple routines used in the display of noads. @<Declare procedures needed for displaying the elements of mlists@>= void print_fam_and_char(pointer @!p) /*prints family and character*/ {@+print_esc(@[@<|"fam"|@>@]);print_int(fam(p));print_char(' '); print_ASCII(qo(character(p))); } @# void print_delimiter(pointer @!p) /*prints a delimiter as 24-bit hex value*/ {@+int a; /*accumulator*/ a=small_fam(p)*256+qo(small_char(p)); a=a*0x1000+large_fam(p)*256+qo(large_char(p)); if (a < 0) print_int(a); /*this should never happen*/ else print_hex(a); } @ The next subroutine will descend to another level of recursion when a subsidiary mlist needs to be displayed. The parameter |c| indicates what character is to become part of the recursion history. An empty mlist is distinguished from a field with |math_type(p)==empty|, because these are not equivalent (as explained above). @^recursion@> @<Declare procedures needed for displaying...@>= void show_info(void); /*|show_node_list(info(temp_ptr))|*/ void print_subsidiary_data(pointer @!p, ASCII_code @!c) /*display a noad field*/ {@+if (cur_length >= depth_threshold) {@+if (math_type(p)!=empty) print_str(" []"); } else{@+append_char(c); /*include |c| in the recursion history*/ temp_ptr=p; /*prepare for |show_info| if recursion is needed*/ switch (math_type(p)) { case math_char: {@+print_ln();print_current_string();print_fam_and_char(p); } @+break; case sub_box: show_info();@+break; /*recursive call*/ case sub_mlist: if (info(p)==null) {@+print_ln();print_current_string();print_str("{}"); } else show_info();@+break; /*recursive call*/ default:do_nothing; /*|empty|*/ } @/ flush_char; /*remove |c| from the recursion history*/ } } @ The inelegant introduction of |show_info| in the code above seems better than the alternative of using \PASCAL's strange |forward| declaration for a procedure with parameters. The \PASCAL\ convention about dropping parameters from a post-|forward| procedure is, frankly, so intolerable to the author of \TeX\ that he would rather stoop to communication via a global temporary variable. (A similar stoopidity occurred with respect to |hlist_out| and |vlist_out| above, and it will occur with respect to |mlist_to_hlist| below.) @^Knuth, Donald Ervin@> @:PASCAL}{\PASCAL@> @p void show_info(void) /*the reader will kindly forgive this*/ {@+show_node_list(info(temp_ptr)); } @ @<Declare procedures needed for displaying...@>= void print_style(int @!c) {@+switch (c/2) { case 0: print_esc(@[@<|"displaystyle"|@>@]);@+break; /*|display_style==0|*/ case 1: print_esc(@[@<|"textstyle"|@>@]);@+break; /*|text_style==2|*/ case 2: print_esc(@[@<|"scriptstyle"|@>@]);@+break; /*|script_style==4|*/ case 3: print_esc(@[@<|"scriptscriptstyle"|@>@]);@+break; /*|script_script_style==6|*/ default:print_str("Unknown style!"); } } @ @<Display choice node |p|@>= {@+print_esc(@[@<|"mathchoice"|@>@]); append_char('D');show_node_list(display_mlist(p));flush_char; append_char('T');show_node_list(text_mlist(p));flush_char; append_char('S');show_node_list(script_mlist(p));flush_char; append_char('s');show_node_list(script_script_mlist(p));flush_char; } @ @<Display normal noad |p|@>= {@+switch (type(p)) { case ord_noad: print_esc(@[@<|"mathord"|@>@]);@+break; case op_noad: print_esc(@[@<|"mathop"|@>@]);@+break; case bin_noad: print_esc(@[@<|"mathbin"|@>@]);@+break; case rel_noad: print_esc(@[@<|"mathrel"|@>@]);@+break; case open_noad: print_esc(@[@<|"mathopen"|@>@]);@+break; case close_noad: print_esc(@[@<|"mathclose"|@>@]);@+break; case punct_noad: print_esc(@[@<|"mathpunct"|@>@]);@+break; case inner_noad: print_esc(@[@<|"mathinner"|@>@]);@+break; case over_noad: print_esc(@[@<|"overline"|@>@]);@+break; case under_noad: print_esc(@[@<|"underline"|@>@]);@+break; case vcenter_noad: print_esc(@[@<|"vcenter"|@>@]);@+break; case radical_noad: {@+print_esc(@[@<|"radical"|@>@]);print_delimiter(left_delimiter(p)); } @+break; case accent_noad: {@+print_esc(@[@<|"accent"|@>@]);print_fam_and_char(accent_chr(p)); } @+break; case left_noad: {@+print_esc(@[@<|"left"|@>@]);print_delimiter(delimiter(p)); } @+break; case right_noad: {@+print_esc(@[@<|"right"|@>@]);print_delimiter(delimiter(p)); } } if (subtype(p)!=normal) if (subtype(p)==limits) print_esc(@[@<|"limits"|@>@]); else print_esc(@[@<|"nolimits"|@>@]); if (type(p) < left_noad) print_subsidiary_data(nucleus(p),'.'); print_subsidiary_data(supscr(p),'^'); print_subsidiary_data(subscr(p),'_'); } @ @<Display fraction noad |p|@>= {@+print_esc(@[@<|"fraction, thickness "|@>@]); if (thickness(p)==default_code) print_str("= default"); else print_scaled(thickness(p)); if ((small_fam(left_delimiter(p))!=0)||@+ (small_char(left_delimiter(p))!=min_quarterword)||@| (large_fam(left_delimiter(p))!=0)||@| (large_char(left_delimiter(p))!=min_quarterword)) {@+print_str(", left-delimiter ");print_delimiter(left_delimiter(p)); } if ((small_fam(right_delimiter(p))!=0)||@| (small_char(right_delimiter(p))!=min_quarterword)||@| (large_fam(right_delimiter(p))!=0)||@| (large_char(right_delimiter(p))!=min_quarterword)) {@+print_str(", right-delimiter ");print_delimiter(right_delimiter(p)); } print_subsidiary_data(numerator(p),'\\'); print_subsidiary_data(denominator(p),'/'); } @ That which can be displayed can also be destroyed. @<Cases of |flush_node_list| that arise...@>= case style_node: {@+free_node(p, style_node_size);goto done; } case choice_node: {@+flush_node_list(display_mlist(p)); flush_node_list(text_mlist(p)); flush_node_list(script_mlist(p)); flush_node_list(script_script_mlist(p)); free_node(p, style_node_size);goto done; } case ord_noad: case op_noad: case bin_noad: case rel_noad: case open_noad: case close_noad: case punct_noad: case inner_noad: case radical_noad: case over_noad: case under_noad: case vcenter_noad: case accent_noad: @t@>@;@/ {@+if (math_type(nucleus(p)) >= sub_box) flush_node_list(info(nucleus(p))); if (math_type(supscr(p)) >= sub_box) flush_node_list(info(supscr(p))); if (math_type(subscr(p)) >= sub_box) flush_node_list(info(subscr(p))); if (type(p)==radical_noad) free_node(p, radical_noad_size); else if (type(p)==accent_noad) free_node(p, accent_noad_size); else free_node(p, noad_size); goto done; } case left_noad: case right_noad: {@+free_node(p, noad_size);goto done; } case fraction_noad: {@+flush_node_list(info(numerator(p))); flush_node_list(info(denominator(p))); free_node(p, fraction_noad_size);goto done; } @* Subroutines for math mode. In order to convert mlists to hlists, i.e., noads to nodes, we need several subroutines that are conveniently dealt with now. Let us first introduce the macros that make it easy to get at the parameters and other font information. A size code, which is a multiple of 16, is added to a family number to get an index into the table of internal font numbers for each combination of family and size. (Be alert: Size codes get larger as the type gets smaller.) @d text_size 0 /*size code for the largest size in a family*/ @d script_size 16 /*size code for the medium size in a family*/ @d script_script_size 32 /*size code for the smallest size in a family*/ @<Basic printing procedures@>= void print_size(int @!s) {@+if (s==text_size) print_esc(@[@<|"textfont"|@>@]); else if (s==script_size) print_esc(@[@<|"scriptfont"|@>@]); else print_esc(@[@<|"scriptscriptfont"|@>@]); } @ Before an mlist is converted to an hlist, \TeX\ makes sure that the fonts in family~2 have enough parameters to be math-symbol fonts, and that the fonts in family~3 have enough parameters to be math-extension fonts. The math-symbol parameters are referred to by using the following macros, which take a size code as their parameter; for example, |num1(cur_size)| gives the value of the |num1| parameter for the current size. @^parameters for symbols@> @^font parameters@> @d mathsy_end(X) fam_fnt(2+X)]].sc @d mathsy(X) font_info[X+param_base[mathsy_end @d math_x_height mathsy(5) /*height of `\.x'*/ @d math_quad mathsy(6) /*\.{18mu}*/ @d num1 mathsy(8) /*numerator shift-up in display styles*/ @d num2 mathsy(9) /*numerator shift-up in non-display, non-\.{\\atop}*/ @d num3 mathsy(10) /*numerator shift-up in non-display \.{\\atop}*/ @d denom1 mathsy(11) /*denominator shift-down in display styles*/ @d denom2 mathsy(12) /*denominator shift-down in non-display styles*/ @d sup1 mathsy(13) /*superscript shift-up in uncramped display style*/ @d sup2 mathsy(14) /*superscript shift-up in uncramped non-display*/ @d sup3 mathsy(15) /*superscript shift-up in cramped styles*/ @d sub1 mathsy(16) /*subscript shift-down if superscript is absent*/ @d sub2 mathsy(17) /*subscript shift-down if superscript is present*/ @d sup_drop mathsy(18) /*superscript baseline below top of large box*/ @d sub_drop mathsy(19) /*subscript baseline below bottom of large box*/ @d delim1 mathsy(20) /*size of \.{\\atopwithdelims} delimiters in display styles*/ @d delim2 mathsy(21) /*size of \.{\\atopwithdelims} delimiters in non-displays*/ @d axis_height mathsy(22) /*height of fraction lines above the baseline*/ @d total_mathsy_params 22 @ The math-extension parameters have similar macros, but the size code is omitted (since it is always |cur_size| when we refer to such parameters). @^parameters for symbols@> @^font parameters@> @d mathex(X) font_info[X+param_base[fam_fnt(3+cur_size)]].sc @d default_rule_thickness mathex(8) /*thickness of \.{\\over} bars*/ @d big_op_spacing1 mathex(9) /*minimum clearance above a displayed op*/ @d big_op_spacing2 mathex(10) /*minimum clearance below a displayed op*/ @d big_op_spacing3 mathex(11) /*minimum baselineskip above displayed op*/ @d big_op_spacing4 mathex(12) /*minimum baselineskip below displayed op*/ @d big_op_spacing5 mathex(13) /*padding above and below displayed limits*/ @d total_mathex_params 13 @ We also need to compute the change in style between mlists and their subsidiaries. The following macros define the subsidiary style for an overlined nucleus (|cramped_style|), for a subscript or a superscript (|sub_style| or |sup_style|), or for a numerator or denominator (|num_style| or |denom_style|). @d cramped_style(X) 2*(X/2)+cramped /*cramp the style*/ @d sub_style(X) 2*(X/4)+script_style+cramped /*smaller and cramped*/ @d sup_style(X) 2*(X/4)+script_style+(X%2) /*smaller*/ @d num_style(X) X+2-2*(X/6) /*smaller unless already script-script*/ @d denom_style(X) 2*(X/2)+cramped+2-2*(X/6) /*smaller, cramped*/ @ When the style changes, the following piece of program computes associated information: @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>= {@+if (cur_style < script_style) cur_size=text_size; else cur_size=16*((cur_style-text_style)/2); cur_mu=x_over_n(math_quad(cur_size), 18); } @ Here is a function that returns a pointer to a rule node having a given thickness |t|. The rule will extend horizontally to the boundary of the vlist that eventually contains it. @p pointer fraction_rule(scaled @!t) /*construct the bar for a fraction*/ {@+pointer p; /*the new node*/ p=new_rule();height(p)=t;depth(p)=0;return p; } @ The |overbar| function returns a pointer to a vlist box that consists of a given box |b|, above which has been placed a kern of height |k| under a fraction rule of thickness |t| under additional space of height |t|. @p pointer overbar(pointer @!b, scaled @!k, scaled @!t) {@+pointer p, @!q; /*nodes being constructed*/ p=new_kern(k);link(p)=b;q=fraction_rule(t);link(q)=p; p=new_kern(t);link(p)=q;return vpack(p, natural); } @ The |var_delimiter| function, which finds or constructs a sufficiently large delimiter, is the most interesting of the auxiliary functions that currently concern us. Given a pointer |d| to a delimiter field in some noad, together with a size code |s| and a vertical distance |v|, this function returns a pointer to a box that contains the smallest variant of |d| whose height plus depth is |v| or more. (And if no variant is large enough, it returns the largest available variant.) In particular, this routine will construct arbitrarily large delimiters from extensible components, if |d| leads to such characters. The value returned is a box whose |shift_amount| has been set so that the box is vertically centered with respect to the axis in the given size. If a built-up symbol is returned, the height of the box before shifting will be the height of its topmost component. @p@t\4@>@<Declare subprocedures for |var_delimiter|@>@; pointer var_delimiter(pointer @!d, small_number @!s, scaled @!v) {@+ pointer b; /*the box that will be constructed*/ internal_font_number @!f, @!g; /*best-so-far and tentative font codes*/ quarterword @!c, @!x, @!y; /*best-so-far and tentative character codes*/ int @!m, @!n; /*the number of extensible pieces*/ scaled @!u; /*height-plus-depth of a tentative character*/ scaled @!w; /*largest height-plus-depth so far*/ four_quarters @!q; /*character info*/ eight_bits @!hd; /*height-depth byte*/ four_quarters @!r; /*extensible pieces*/ small_number @!z; /*runs through font family members*/ bool @!large_attempt; /*are we trying the ``large'' variant?*/ f=null_font;w=0;large_attempt=false; z=small_fam(d);x=small_char(d); loop@+{@+@<Look at the variants of |(z,x)|; set |f| and |c| whenever a better character is found; |goto found| as soon as a large enough variant is encountered@>; if (large_attempt) goto found; /*there were none large enough*/ large_attempt=true;z=large_fam(d);x=large_char(d); } found: if (f!=null_font) @<Make variable |b| point to a box for |(f,c)|@>; else{@+b=new_null_box(); width(b)=null_delimiter_space; /*use this width if no delimiter was found*/ } shift_amount(b)=half(height(b)-depth(b))-axis_height(s); return b; } @ The search process is complicated slightly by the facts that some of the characters might not be present in some of the fonts, and they might not be probed in increasing order of height. @<Look at the variants of |(z,x)|; set |f| and |c|...@>= if ((z!=0)||(x!=min_quarterword)) {@+z=z+s+16; @/do@+{z=z-16;g=fam_fnt(z); if (g!=null_font) @<Look at the list of characters starting with |x| in font |g|; set |f| and |c| whenever a better character is found; |goto found| as soon as a large enough variant is encountered@>; }@+ while (!(z < 16)); } @ @<Look at the list of characters starting with |x|...@>= {@+y=x; if ((qo(y) >= font_bc[g])&&(qo(y) <= font_ec[g])) {@+resume: q=char_info(g)(y); if (char_exists(q)) {@+if (char_tag(q)==ext_tag) {@+f=g;c=y;goto found; } hd=height_depth(q); u=char_height(g)(hd)+char_depth(g)(hd); if (u > w) {@+f=g;c=y;w=u; if (u >= v) goto found; } if (char_tag(q)==list_tag) {@+y=rem_byte(q);goto resume; } } } } @ Here is a subroutine that creates a new box, whose list contains a single character, and whose width includes the italic correction for that character. The height or depth of the box will be negative, if the height or depth of the character is negative; thus, this routine may deliver a slightly different result than |hpack| would produce. @<Declare subprocedures for |var_delimiter|@>= pointer char_box(internal_font_number @!f, quarterword @!c) {@+four_quarters q; eight_bits @!hd; /*|height_depth| byte*/ pointer @!b, @!p; /*the new box and its character node*/ q=char_info(f)(c);hd=height_depth(q); b=new_null_box();width(b)=char_width(f)(q)+char_italic(f)(q); height(b)=char_height(f)(hd);depth(b)=char_depth(f)(hd); p=get_avail();character(p)=c;font(p)=f;list_ptr(b)=p;return b; } @ When the following code is executed, |char_tag(q)| will be equal to |ext_tag| if and only if a built-up symbol is supposed to be returned. @<Make variable |b| point to a box for |(f,c)|@>= if (char_tag(q)==ext_tag) @<Construct an extensible character in a new box |b|, using recipe |rem_byte(q)| and font |f|@>@; else b=char_box(f, c) @ When we build an extensible character, it's handy to have the following subroutine, which puts a given character on top of the characters already in box |b|: @<Declare subprocedures for |var_delimiter|@>= void stack_into_box(pointer @!b, internal_font_number @!f, quarterword @!c) {@+pointer p; /*new node placed into |b|*/ p=char_box(f, c);link(p)=list_ptr(b);list_ptr(b)=p; height(b)=height(p); } @ Another handy subroutine computes the height plus depth of a given character: @<Declare subprocedures for |var_delimiter|@>= scaled height_plus_depth(internal_font_number @!f, quarterword @!c) {@+four_quarters q; eight_bits @!hd; /*|height_depth| byte*/ q=char_info(f)(c);hd=height_depth(q); return char_height(f)(hd)+char_depth(f)(hd); } @ @<Construct an extensible...@>= {@+b=new_null_box(); type(b)=vlist_node; r=font_info[exten_base[f]+rem_byte(q)].qqqq;@/ @<Compute the minimum suitable height, |w|, and the corresponding number of extension steps, |n|; also set |width(b)|@>; c=ext_bot(r); if (c!=min_quarterword) stack_into_box(b, f, c); c=ext_rep(r); for (m=1; m<=n; m++) stack_into_box(b, f, c); c=ext_mid(r); if (c!=min_quarterword) {@+stack_into_box(b, f, c);c=ext_rep(r); for (m=1; m<=n; m++) stack_into_box(b, f, c); } c=ext_top(r); if (c!=min_quarterword) stack_into_box(b, f, c); depth(b)=w-height(b); } @ The width of an extensible character is the width of the repeatable module. If this module does not have positive height plus depth, we don't use any copies of it, otherwise we use as few as possible (in groups of two if there is a middle part). @<Compute the minimum suitable height, |w|, and...@>= c=ext_rep(r);u=height_plus_depth(f, c); w=0;q=char_info(f)(c);width(b)=char_width(f)(q)+char_italic(f)(q);@/ c=ext_bot(r);@+if (c!=min_quarterword) w=w+height_plus_depth(f, c); c=ext_mid(r);@+if (c!=min_quarterword) w=w+height_plus_depth(f, c); c=ext_top(r);@+if (c!=min_quarterword) w=w+height_plus_depth(f, c); n=0; if (u > 0) while (w < v) {@+w=w+u;incr(n); if (ext_mid(r)!=min_quarterword) w=w+u; } @ The next subroutine is much simpler; it is used for numerators and denominators of fractions as well as for displayed operators and their limits above and below. It takes a given box~|b| and changes it so that the new box is centered in a box of width~|w|. The centering is done by putting \.{\\hss} glue at the left and right of the list inside |b|, then packaging the new box; thus, the actual box might not really be centered, if it already contains infinite glue. The given box might contain a single character whose italic correction has been added to the width of the box; in this case a compensating kern is inserted. @p pointer rebox(pointer @!b, scaled @!w) {@+pointer p; /*temporary register for list manipulation*/ internal_font_number @!f; /*font in a one-character box*/ scaled @!v; /*width of a character without italic correction*/ if ((width(b)!=w)&&(list_ptr(b)!=null)) {@+if (type(b)==vlist_node) b=hpack(b, natural); p=list_ptr(b); if ((is_char_node(p))&&(link(p)==null)) {@+f=font(p);v=char_width(f)(char_info(f)(character(p))); if (v!=width(b)) link(p)=new_kern(width(b)-v); } free_node(b, box_node_size); b=new_glue(ss_glue);link(b)=p; while (link(p)!=null) p=link(p); link(p)=new_glue(ss_glue); return hpack(b, w, exactly); } else{@+width(b)=w;return b; } } @ Here is a subroutine that creates a new glue specification from another one that is expressed in `\.{mu}', given the value of the math unit. @d mu_mult(X) nx_plus_y(n, X, xn_over_d(X, f, 0200000)) @p pointer math_glue(pointer @!g, scaled @!m) {@+pointer p; /*the new glue specification*/ int @!n; /*integer part of |m|*/ scaled @!f; /*fraction part of |m|*/ n=x_over_n(m, 0200000);f=rem;@/ if (f < 0) {@+decr(n);f=f+0200000; } p=get_node(glue_spec_size); width(p)=mu_mult(width(g)); /*convert \.{mu} to \.{pt}*/ stretch_order(p)=stretch_order(g); if (stretch_order(p)==normal) stretch(p)=mu_mult(stretch(g)); else stretch(p)=stretch(g); shrink_order(p)=shrink_order(g); if (shrink_order(p)==normal) shrink(p)=mu_mult(shrink(g)); else shrink(p)=shrink(g); return p; } @ The |math_kern| subroutine removes |mu_glue| from a kern node, given the value of the math unit. @p void math_kern(pointer @!p, scaled @!m) {@+int @!n; /*integer part of |m|*/ scaled @!f; /*fraction part of |m|*/ if (subtype(p)==mu_glue) {@+n=x_over_n(m, 0200000);f=rem;@/ if (f < 0) {@+decr(n);f=f+0200000; } width(p)=mu_mult(width(p));subtype(p)=explicit; } } @ Sometimes it is necessary to destroy an mlist. The following subroutine empties the current list, assuming that |abs(mode)==mmode|. @p void flush_math(void) {@+flush_node_list(link(head));flush_node_list(incompleat_noad); link(head)=null;tail=head;incompleat_noad=null; } @* Typesetting math formulas. \TeX's most important routine for dealing with formulas is called |mlist_to_hlist|. After a formula has been scanned and represented as an mlist, this routine converts it to an hlist that can be placed into a box or incorporated into the text of a paragraph. There are three implicit parameters, passed in global variables: |cur_mlist| points to the first node or noad in the given mlist (and it might be |null|); |cur_style| is a style code; and |mlist_penalties| is |true| if penalty nodes for potential line breaks are to be inserted into the resulting hlist. After |mlist_to_hlist| has acted, |link(temp_head)| points to the translated hlist. Since mlists can be inside mlists, the procedure is recursive. And since this is not part of \TeX's inner loop, the program has been written in a manner that stresses compactness over efficiency. @^recursion@> @<Glob...@>= pointer @!cur_mlist; /*beginning of mlist to be translated*/ small_number @!cur_style; /*style code at current place in the list*/ small_number @!cur_size; /*size code corresponding to |cur_style|*/ scaled @!cur_mu; /*the math unit width corresponding to |cur_size|*/ bool @!mlist_penalties; /*should |mlist_to_hlist| insert penalties?*/ @ The recursion in |mlist_to_hlist| is due primarily to a subroutine called |clean_box| that puts a given noad field into a box using a given math style; |mlist_to_hlist| can call |clean_box|, which can call |mlist_to_hlist|. @^recursion@> The box returned by |clean_box| is ``clean'' in the sense that its |shift_amount| is zero. @p void mlist_to_hlist(void);@/ pointer clean_box(pointer @!p, small_number @!s) {@+ pointer q; /*beginning of a list to be boxed*/ small_number @!save_style; /*|cur_style| to be restored*/ pointer @!x; /*box to be returned*/ pointer @!r; /*temporary pointer*/ switch (math_type(p)) { case math_char: {@+cur_mlist=new_noad();mem[nucleus(cur_mlist)]=mem[p]; } @+break; case sub_box: {@+q=info(p);goto found; } case sub_mlist: cur_mlist=info(p);@+break; default:{@+q=new_null_box();goto found; } } @/ save_style=cur_style;cur_style=s;mlist_penalties=false;@/ mlist_to_hlist();q=link(temp_head); /*recursive call*/ cur_style=save_style; /*restore the style*/ @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; found: if (is_char_node(q)||(q==null)) x=hpack(q, natural); else if ((link(q)==null)&&(type(q) <= vlist_node)&&(shift_amount(q)==0)) x=q; /*it's already clean*/ else x=hpack(q, natural); @<Simplify a trivial box@>; return x; } @ Here we save memory space in a common case. @<Simplify a trivial box@>= q=list_ptr(x); if (is_char_node(q)) {@+r=link(q); if (r!=null) if (link(r)==null) if (!is_char_node(r)) if (type(r)==kern_node) /*unneeded italic correction*/ {@+free_node(r, small_node_size);link(q)=null; } } @ It is convenient to have a procedure that converts a |math_char| field to an ``unpacked'' form. The |fetch| routine sets |cur_f|, |cur_c|, and |cur_i| to the font code, character code, and character information bytes of a given noad field. It also takes care of issuing error messages for nonexistent characters; in such cases, |char_exists(cur_i)| will be |false| after |fetch| has acted, and the field will also have been reset to |empty|. @p void fetch(pointer @!a) /*unpack the |math_char| field |a|*/ {@+cur_c=character(a);cur_f=fam_fnt(fam(a)+cur_size); if (cur_f==null_font) @<Complain about an undefined family and set |cur_i| null@>@; else{@+if ((qo(cur_c) >= font_bc[cur_f])&&(qo(cur_c) <= font_ec[cur_f])) cur_i=char_info(cur_f)(cur_c); else cur_i=null_character; if (!(char_exists(cur_i))) {@+char_warning(cur_f, qo(cur_c)); math_type(a)=empty; } } } @ @<Complain about an undefined family...@>= {@+print_err("");print_size(cur_size);print_char(' '); print_int(fam(a));print_str(" is undefined (character "); print_ASCII(qo(cur_c));print_char(')'); help4("Somewhere in the math formula just ended, you used the")@/ ("stated character from an undefined font family. For example,")@/ ("plain TeX doesn't allow \\it or \\sl in subscripts. Proceed,")@/ ("and I'll try to forget that I needed that character."); error();cur_i=null_character;math_type(a)=empty; } @ The outputs of |fetch| are placed in global variables. @<Glob...@>= internal_font_number @!cur_f; /*the |font| field of a |math_char|*/ quarterword @!cur_c; /*the |character| field of a |math_char|*/ four_quarters @!cur_i; /*the |char_info| of a |math_char|, or a lig/kern instruction*/ @ We need to do a lot of different things, so |mlist_to_hlist| makes two passes over the given mlist. The first pass does most of the processing: It removes ``mu'' spacing from glue, it recursively evaluates all subsidiary mlists so that only the top-level mlist remains to be handled, it puts fractions and square roots and such things into boxes, it attaches subscripts and superscripts, and it computes the overall height and depth of the top-level mlist so that the size of delimiters for a |left_noad| and a |right_noad| will be known. The hlist resulting from each noad is recorded in that noad's |new_hlist| field, an integer field that replaces the |nucleus| or |thickness|. @^recursion@> The second pass eliminates all noads and inserts the correct glue and penalties between nodes. @d new_hlist(X) mem[nucleus(X)].i /*the translation of an mlist*/ @ Here is the overall plan of |mlist_to_hlist|, and the list of its local variables. @p@t\4@>@<Declare math construction procedures@>@; void mlist_to_hlist(void) {@+ pointer mlist; /*beginning of the given list*/ bool @!penalties; /*should penalty nodes be inserted?*/ small_number @!style; /*the given style*/ small_number @!save_style; /*holds |cur_style| during recursion*/ pointer @!q; /*runs through the mlist*/ pointer @!r; /*the most recent noad preceding |q|*/ small_number @!r_type; /*the |type| of noad |r|, or |op_noad| if |r==null|*/ small_number @!t; /*the effective |type| of noad |q| during the second pass*/ pointer @!p, @!x, @!y, @!z; /*temporary registers for list construction*/ int @!pen; /*a penalty to be inserted*/ small_number @!s; /*the size of a noad to be deleted*/ scaled @!max_h, @!max_d; /*maximum height and depth of the list translated so far*/ scaled @!delta; /*offset between subscript and superscript*/ mlist=cur_mlist;penalties=mlist_penalties; style=cur_style; /*tuck global parameters away as local variables*/ q=mlist;r=null;r_type=op_noad;max_h=0;max_d=0; @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; while (q!=null) @<Process node-or-noad |q| as much as possible in preparation for the second pass of |mlist_to_hlist|, then move to the next item in the mlist@>; @<Convert \(a)a final |bin_noad| to an |ord_noad|@>; @<Make a second pass over the mlist, removing all noads and inserting the proper spacing and penalties@>; } @ We use the fact that no character nodes appear in an mlist, hence the field |type(q)| is always present. @<Process node-or-noad...@>= {@+@<Do first-pass processing based on |type(q)|; |goto done_with_noad| if a noad has been fully processed, |goto check_dimensions| if it has been translated into |new_hlist(q)|, or |goto done_with_node| if a node has been fully processed@>; check_dimensions: z=hpack(new_hlist(q), natural); if (height(z) > max_h) max_h=height(z); if (depth(z) > max_d) max_d=depth(z); free_node(z, box_node_size); done_with_noad: r=q;r_type=type(r); done_with_node: q=link(q); } @ One of the things we must do on the first pass is change a |bin_noad| to an |ord_noad| if the |bin_noad| is not in the context of a binary operator. The values of |r| and |r_type| make this fairly easy. @<Do first-pass processing...@>= reswitch: delta=0; switch (type(q)) { case bin_noad: switch (r_type) { case bin_noad: case op_noad: case rel_noad: case open_noad: case punct_noad: case left_noad: {@+type(q)=ord_noad;goto reswitch; } default:do_nothing; } @+break; case rel_noad: case close_noad: case punct_noad: case right_noad: {@+@t@>@;@/ @<Convert \(a)a final |bin_noad| to an |ord_noad|@>; if (type(q)==right_noad) goto done_with_noad; } @+break; @t\4@>@<Cases for noads that can follow a |bin_noad|@>@; @t\4@>@<Cases for nodes that can appear in an mlist, after which we |goto done_with_node|@>@; default:confusion(@[@<|"mlist1"|@>@]); @:this can't happen mlist1}{\quad mlist1@> } @/ @<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>@; @ @<Convert \(a)a final |bin_noad| to an |ord_noad|@>= if (r_type==bin_noad) type(r)=ord_noad @ @<Cases for nodes that can appear in an mlist...@>= case style_node: {@+cur_style=subtype(q); @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; goto done_with_node; } case choice_node: @<Change this node to a style node followed by the correct choice, then |goto done_with_node|@>@; case ins_node: case mark_node: case adjust_node: case whatsit_node: case penalty_node: case disc_node: goto done_with_node; case rule_node: {@+if (height(q) > max_h) max_h=height(q); if (depth(q) > max_d) max_d=depth(q);goto done_with_node; } case glue_node: {@+@<Convert \(m)math glue to ordinary glue@>; goto done_with_node; } case kern_node: {@+math_kern(q, cur_mu);goto done_with_node; } @ @d choose_mlist(X) {@+p=X(q);X(q)=null;@+} @<Change this node to a style node...@>= {@+switch (cur_style/2) { case 0: choose_mlist(display_mlist)@;@+break; /*|display_style==0|*/ case 1: choose_mlist(text_mlist)@;@+break; /*|text_style==2|*/ case 2: choose_mlist(script_mlist)@;@+break; /*|script_style==4|*/ case 3: choose_mlist(script_script_mlist); /*|script_script_style==6|*/ } /*there are no other cases*/ flush_node_list(display_mlist(q)); flush_node_list(text_mlist(q)); flush_node_list(script_mlist(q)); flush_node_list(script_script_mlist(q));@/ type(q)=style_node;subtype(q)=cur_style;width(q)=0;depth(q)=0; if (p!=null) {@+z=link(q);link(q)=p; while (link(p)!=null) p=link(p); link(p)=z; } goto done_with_node; } @ Conditional math glue (`\.{\\nonscript}') results in a |glue_node| pointing to |zero_glue|, with |subtype(q)==cond_math_glue|; in such a case the node following will be eliminated if it is a glue or kern node and if the current size is different from |text_size|. Unconditional math glue (`\.{\\muskip}') is converted to normal glue by multiplying the dimensions by |cur_mu|. @!@:non_script_}{\.{\\nonscript} primitive@> @<Convert \(m)math glue to ordinary glue@>= if (subtype(q)==mu_glue) {@+x=glue_ptr(q); y=math_glue(x, cur_mu);delete_glue_ref(x);glue_ptr(q)=y; subtype(q)=normal; } else if ((cur_size!=text_size)&&(subtype(q)==cond_math_glue)) {@+p=link(q); if (p!=null) if ((type(p)==glue_node)||(type(p)==kern_node)) {@+link(q)=link(p);link(p)=null;flush_node_list(p); } } @ @<Cases for noads that can follow a |bin_noad|@>= case left_noad: goto done_with_noad; case fraction_noad: {@+make_fraction(q);goto check_dimensions; } case op_noad: {@+delta=make_op(q); if (subtype(q)==limits) goto check_dimensions; } @+break; case ord_noad: make_ord(q);@+break; case open_noad: case inner_noad: do_nothing;@+break; case radical_noad: make_radical(q);@+break; case over_noad: make_over(q);@+break; case under_noad: make_under(q);@+break; case accent_noad: make_math_accent(q);@+break; case vcenter_noad: make_vcenter(q);@+break; @ Most of the actual construction work of |mlist_to_hlist| is done by procedures with names like |make_fraction|, |make_radical|, etc. To illustrate the general setup of such procedures, let's begin with a couple of simple ones. @<Declare math...@>= void make_over(pointer @!q) {@+info(nucleus(q))=@| overbar(clean_box(nucleus(q), cramped_style(cur_style)),@| 3*default_rule_thickness, default_rule_thickness); math_type(nucleus(q))=sub_box; } @ @<Declare math...@>= void make_under(pointer @!q) {@+pointer p, @!x, @!y; /*temporary registers for box construction*/ scaled @!delta; /*overall height plus depth*/ x=clean_box(nucleus(q), cur_style); p=new_kern(3*default_rule_thickness);link(x)=p; link(p)=fraction_rule(default_rule_thickness); y=vpack(x, natural); delta=height(y)+depth(y)+default_rule_thickness; height(y)=height(x);depth(y)=delta-height(y); info(nucleus(q))=y;math_type(nucleus(q))=sub_box; } @ @<Declare math...@>= void make_vcenter(pointer @!q) {@+pointer v; /*the box that should be centered vertically*/ scaled @!delta; /*its height plus depth*/ v=info(nucleus(q)); if (type(v)!=vlist_node) confusion(@[@<|"vcenter"|@>@]); @:this can't happen vcenter}{\quad vcenter@> delta=height(v)+depth(v); height(v)=axis_height(cur_size)+half(delta); depth(v)=delta-height(v); } @ According to the rules in the \.{DVI} file specifications, we ensure alignment @^square roots@> between a square root sign and the rule above its nucleus by assuming that the baseline of the square-root symbol is the same as the bottom of the rule. The height of the square-root symbol will be the thickness of the rule, and the depth of the square-root symbol should exceed or equal the height-plus-depth of the nucleus plus a certain minimum clearance~|clr|. The symbol will be placed so that the actual clearance is |clr| plus half the excess. @<Declare math...@>= void make_radical(pointer @!q) {@+pointer x, @!y; /*temporary registers for box construction*/ scaled @!delta, @!clr; /*dimensions involved in the calculation*/ x=clean_box(nucleus(q), cramped_style(cur_style)); if (cur_style < text_style) /*display style*/ clr=default_rule_thickness+(abs(math_x_height(cur_size))/4); else{@+clr=default_rule_thickness;clr=clr+(abs(clr)/4); } y=var_delimiter(left_delimiter(q), cur_size, height(x)+depth(x)+clr+ default_rule_thickness); delta=depth(y)-(height(x)+depth(x)+clr); if (delta > 0) clr=clr+half(delta); /*increase the actual clearance*/ shift_amount(y)=-(height(x)+clr); link(y)=overbar(x, clr, height(y)); info(nucleus(q))=hpack(y, natural);math_type(nucleus(q))=sub_box; } @ Slants are not considered when placing accents in math mode. The accenter is centered over the accentee, and the accent width is treated as zero with respect to the size of the final box. @<Declare math...@>= void make_math_accent(pointer @!q) {@+ pointer p, @!x, @!y; /*temporary registers for box construction*/ int @!a; /*address of lig/kern instruction*/ quarterword @!c; /*accent character*/ internal_font_number @!f; /*its font*/ four_quarters @!i; /*its |char_info|*/ scaled @!s; /*amount to skew the accent to the right*/ scaled @!h; /*height of character being accented*/ scaled @!delta; /*space to remove between accent and accentee*/ scaled @!w; /*width of the accentee, not including sub/superscripts*/ fetch(accent_chr(q)); if (char_exists(cur_i)) {@+i=cur_i;c=cur_c;f=cur_f;@/ @<Compute the amount of skew@>; x=clean_box(nucleus(q), cramped_style(cur_style));w=width(x);h=height(x); @<Switch to a larger accent if available and appropriate@>; if (h < x_height(f)) delta=h;@+else delta=x_height(f); if ((math_type(supscr(q))!=empty)||(math_type(subscr(q))!=empty)) if (math_type(nucleus(q))==math_char) @<Swap the subscript and superscript into box |x|@>; y=char_box(f, c); shift_amount(y)=s+half(w-width(y)); width(y)=0;p=new_kern(-delta);link(p)=x;link(y)=p; y=vpack(y, natural);width(y)=width(x); if (height(y) < h) @<Make the height of box |y| equal to |h|@>; info(nucleus(q))=y; math_type(nucleus(q))=sub_box; } } @ @<Make the height of box |y|...@>= {@+p=new_kern(h-height(y));link(p)=list_ptr(y);list_ptr(y)=p; height(y)=h; } @ @<Switch to a larger accent if available and appropriate@>= loop@+{@+if (char_tag(i)!=list_tag) goto done; y=rem_byte(i); i=char_info(f)(y); if (!char_exists(i)) goto done; if (char_width(f)(i) > w) goto done; c=y; } done: @ @<Compute the amount of skew@>= s=0; if (math_type(nucleus(q))==math_char) {@+fetch(nucleus(q)); if (char_tag(cur_i)==lig_tag) {@+a=lig_kern_start(cur_f)(cur_i); cur_i=font_info[a].qqqq; if (skip_byte(cur_i) > stop_flag) {@+a=lig_kern_restart(cur_f)(cur_i); cur_i=font_info[a].qqqq; } loop@+{@+if (qo(next_char(cur_i))==skew_char[cur_f]) {@+if (op_byte(cur_i) >= kern_flag) if (skip_byte(cur_i) <= stop_flag) s=char_kern(cur_f)(cur_i); goto done1; } if (skip_byte(cur_i) >= stop_flag) goto done1; a=a+qo(skip_byte(cur_i))+1; cur_i=font_info[a].qqqq; } } } done1: @ @<Swap the subscript and superscript into box |x|@>= {@+flush_node_list(x);x=new_noad(); mem[nucleus(x)]=mem[nucleus(q)]; mem[supscr(x)]=mem[supscr(q)]; mem[subscr(x)]=mem[subscr(q)];@/ mem[supscr(q)].hh=empty_field; mem[subscr(q)].hh=empty_field;@/ math_type(nucleus(q))=sub_mlist;info(nucleus(q))=x; x=clean_box(nucleus(q), cur_style);delta=delta+height(x)-h;h=height(x); } @ The |make_fraction| procedure is a bit different because it sets |new_hlist(q)| directly rather than making a sub-box. @<Declare math...@>= void make_fraction(pointer @!q) {@+pointer p, @!v, @!x, @!y, @!z; /*temporary registers for box construction*/ scaled @!delta, @!delta1, @!delta2, @!shift_up, @!shift_down, @!clr; /*dimensions for box calculations*/ if (thickness(q)==default_code) thickness(q)=default_rule_thickness; @<Create equal-width boxes |x| and |z| for the numerator and denominator, and compute the default amounts |shift_up| and |shift_down| by which they are displaced from the baseline@>; if (thickness(q)==0) @<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>@; else@<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>; @<Construct a vlist box for the fraction, according to |shift_up| and |shift_down|@>; @<Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)| point to it@>; } @ @<Create equal-width boxes |x| and |z| for the numerator and denom...@>= x=clean_box(numerator(q), num_style(cur_style)); z=clean_box(denominator(q), denom_style(cur_style)); if (width(x) < width(z)) x=rebox(x, width(z)); else z=rebox(z, width(x)); if (cur_style < text_style) /*display style*/ {@+shift_up=num1(cur_size);shift_down=denom1(cur_size); } else{@+shift_down=denom2(cur_size); if (thickness(q)!=0) shift_up=num2(cur_size); else shift_up=num3(cur_size); } @ The numerator and denominator must be separated by a certain minimum clearance, called |clr| in the following program. The difference between |clr| and the actual clearance is |2 delta|. @<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>= {@+if (cur_style < text_style) clr=7*default_rule_thickness; else clr=3*default_rule_thickness; delta=half(clr-((shift_up-depth(x))-(height(z)-shift_down))); if (delta > 0) {@+shift_up=shift_up+delta; shift_down=shift_down+delta; } } @ In the case of a fraction line, the minimum clearance depends on the actual thickness of the line. @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>= {@+if (cur_style < text_style) clr=3*thickness(q); else clr=thickness(q); delta=half(thickness(q)); delta1=clr-((shift_up-depth(x))-(axis_height(cur_size)+delta)); delta2=clr-((axis_height(cur_size)-delta)-(height(z)-shift_down)); if (delta1 > 0) shift_up=shift_up+delta1; if (delta2 > 0) shift_down=shift_down+delta2; } @ @<Construct a vlist box for the fraction...@>= v=new_null_box();type(v)=vlist_node; height(v)=shift_up+height(x);depth(v)=depth(z)+shift_down; width(v)=width(x); /*this also equals |width(z)|*/ if (thickness(q)==0) {@+p=new_kern((shift_up-depth(x))-(height(z)-shift_down)); link(p)=z; } else{@+y=fraction_rule(thickness(q));@/ p=new_kern((axis_height(cur_size)-delta)-@|(height(z)-shift_down));@/ link(y)=p;link(p)=z;@/ p=new_kern((shift_up-depth(x))-(axis_height(cur_size)+delta)); link(p)=y; } link(x)=p;list_ptr(v)=x @ @<Put the \(f)fraction into a box with its delimiters...@>= if (cur_style < text_style) delta=delim1(cur_size); else delta=delim2(cur_size); x=var_delimiter(left_delimiter(q), cur_size, delta);link(x)=v;@/ z=var_delimiter(right_delimiter(q), cur_size, delta);link(v)=z;@/ new_hlist(q)=hpack(x, natural) @ If the nucleus of an |op_noad| is a single character, it is to be centered vertically with respect to the axis, after first being enlarged (via a character list in the font) if we are in display style. The normal convention for placing displayed limits is to put them above and below the operator in display style. The italic correction is removed from the character if there is a subscript and the limits are not being displayed. The |make_op| routine returns the value that should be used as an offset between subscript and superscript. After |make_op| has acted, |subtype(q)| will be |limits| if and only if the limits have been set above and below the operator. In that case, |new_hlist(q)| will already contain the desired final box. @<Declare math...@>= scaled make_op(pointer @!q) {@+scaled delta; /*offset between subscript and superscript*/ pointer @!p, @!v, @!x, @!y, @!z; /*temporary registers for box construction*/ quarterword @!c;@+four_quarters @!i; /*registers for character examination*/ scaled @!shift_up, @!shift_down; /*dimensions for box calculation*/ if ((subtype(q)==normal)&&(cur_style < text_style)) subtype(q)=limits; if (math_type(nucleus(q))==math_char) {@+fetch(nucleus(q)); if ((cur_style < text_style)&&(char_tag(cur_i)==list_tag)) /*make it larger*/ {@+c=rem_byte(cur_i);i=char_info(cur_f)(c); if (char_exists(i)) {@+cur_c=c;cur_i=i;character(nucleus(q))=c; } } delta=char_italic(cur_f)(cur_i);x=clean_box(nucleus(q), cur_style); if ((math_type(subscr(q))!=empty)&&(subtype(q)!=limits)) width(x)=width(x)-delta; /*remove italic correction*/ shift_amount(x)=half(height(x)-depth(x))-axis_height(cur_size); /*center vertically*/ math_type(nucleus(q))=sub_box;info(nucleus(q))=x; } else delta=0; if (subtype(q)==limits) @<Construct a box with limits above and below it, skewed by |delta|@>; return delta; } @ The following program builds a vlist box |v| for displayed limits. The width of the box is not affected by the fact that the limits may be skewed. @<Construct a box with limits above and below it...@>= {@+x=clean_box(supscr(q), sup_style(cur_style)); y=clean_box(nucleus(q), cur_style); z=clean_box(subscr(q), sub_style(cur_style)); v=new_null_box();type(v)=vlist_node;width(v)=width(y); if (width(x) > width(v)) width(v)=width(x); if (width(z) > width(v)) width(v)=width(z); x=rebox(x, width(v));y=rebox(y, width(v));z=rebox(z, width(v));@/ shift_amount(x)=half(delta);shift_amount(z)=-shift_amount(x); height(v)=height(y);depth(v)=depth(y); @<Attach the limits to |y| and adjust |height(v)|, |depth(v)| to account for their presence@>; new_hlist(q)=v; } @ We use |shift_up| and |shift_down| in the following program for the amount of glue between the displayed operator |y| and its limits |x| and |z|. The vlist inside box |v| will consist of |x| followed by |y| followed by |z|, with kern nodes for the spaces between and around them. @<Attach the limits to |y| and adjust |height(v)|, |depth(v)|...@>= if (math_type(supscr(q))==empty) {@+free_node(x, box_node_size);list_ptr(v)=y; } else{@+shift_up=big_op_spacing3-depth(x); if (shift_up < big_op_spacing1) shift_up=big_op_spacing1; p=new_kern(shift_up);link(p)=y;link(x)=p;@/ p=new_kern(big_op_spacing5);link(p)=x;list_ptr(v)=p; height(v)=height(v)+big_op_spacing5+height(x)+depth(x)+shift_up; } if (math_type(subscr(q))==empty) free_node(z, box_node_size); else{@+shift_down=big_op_spacing4-height(z); if (shift_down < big_op_spacing2) shift_down=big_op_spacing2; p=new_kern(shift_down);link(y)=p;link(p)=z;@/ p=new_kern(big_op_spacing5);link(z)=p; depth(v)=depth(v)+big_op_spacing5+height(z)+depth(z)+shift_down; } @ A ligature found in a math formula does not create a |ligature_node|, because there is no question of hyphenation afterwards; the ligature will simply be stored in an ordinary |char_node|, after residing in an |ord_noad|. The |math_type| is converted to |math_text_char| here if we would not want to apply an italic correction to the current character unless it belongs to a math font (i.e., a font with |space==0|). No boundary characters enter into these ligatures. @<Declare math...@>= void make_ord(pointer @!q) {@+ int a; /*address of lig/kern instruction*/ pointer @!p, @!r; /*temporary registers for list manipulation*/ restart: @t@>@;@/ if (math_type(subscr(q))==empty) if (math_type(supscr(q))==empty) if (math_type(nucleus(q))==math_char) {@+p=link(q); if (p!=null) if ((type(p) >= ord_noad)&&(type(p) <= punct_noad)) if (math_type(nucleus(p))==math_char) if (fam(nucleus(p))==fam(nucleus(q))) {@+math_type(nucleus(q))=math_text_char; fetch(nucleus(q)); if (char_tag(cur_i)==lig_tag) {@+a=lig_kern_start(cur_f)(cur_i); cur_c=character(nucleus(p)); cur_i=font_info[a].qqqq; if (skip_byte(cur_i) > stop_flag) {@+a=lig_kern_restart(cur_f)(cur_i); cur_i=font_info[a].qqqq; } loop@+{@+@<If instruction |cur_i| is a kern with |cur_c|, attach the kern after~|q|; or if it is a ligature with |cur_c|, combine noads |q| and~|p| appropriately; then |return| if the cursor has moved past a noad, or |goto restart|@>; if (skip_byte(cur_i) >= stop_flag) return; a=a+qo(skip_byte(cur_i))+1; cur_i=font_info[a].qqqq; } } } } } @ Note that a ligature between an |ord_noad| and another kind of noad is replaced by an |ord_noad|, when the two noads collapse into one. But we could make a parenthesis (say) change shape when it follows certain letters. Presumably a font designer will define such ligatures only when this convention makes sense. \chardef\?='174 % vertical line to indicate character retention @<If instruction |cur_i| is a kern with |cur_c|,...@>= if (next_char(cur_i)==cur_c) if (skip_byte(cur_i) <= stop_flag) if (op_byte(cur_i) >= kern_flag) {@+p=new_kern(char_kern(cur_f)(cur_i)); link(p)=link(q);link(q)=p;return; } else{@+check_interrupt; /*allow a way out of infinite ligature loop*/ switch (op_byte(cur_i)) { qi(case 1): qi(case 5): character(nucleus(q))=rem_byte(cur_i);@+break; /*\.{=:\?}, \.{=:\?>}*/ qi(case 2): qi(case 6): character(nucleus(p))=rem_byte(cur_i);@+break; /*\.{\?=:}, \.{\?=:>}*/ qi(case 3): qi(case 7): qi(case 11): {@+r=new_noad(); /*\.{\?=:\?}, \.{\?=:\?>}, \.{\?=:\?>>}*/ character(nucleus(r))=rem_byte(cur_i); fam(nucleus(r))=fam(nucleus(q));@/ link(q)=r;link(r)=p; if (op_byte(cur_i) < qi(11)) math_type(nucleus(r))=math_char; else math_type(nucleus(r))=math_text_char; /*prevent combination*/ } @+break; default:{@+link(q)=link(p); character(nucleus(q))=rem_byte(cur_i); /*\.{=:}*/ mem[subscr(q)]=mem[subscr(p)];mem[supscr(q)]=mem[supscr(p)];@/ free_node(p, noad_size); } } if (op_byte(cur_i) > qi(3)) return; math_type(nucleus(q))=math_char;goto restart; } @ When we get to the following part of the program, we have ``fallen through'' from cases that did not lead to |check_dimensions| or |done_with_noad| or |done_with_node|. Thus, |q|~points to a noad whose nucleus may need to be converted to an hlist, and whose subscripts and superscripts need to be appended if they are present. If |nucleus(q)| is not a |math_char|, the variable |delta| is the amount by which a superscript should be moved right with respect to a subscript when both are present. @^subscripts@> @^superscripts@> @<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>= switch (math_type(nucleus(q))) { case math_char: case math_text_char: @<Create a character node |p| for |nucleus(q)|, possibly followed by a kern node for the italic correction, and set |delta| to the italic correction if a subscript is present@>@;@+break; case empty: p=null;@+break; case sub_box: p=info(nucleus(q));@+break; case sub_mlist: {@+cur_mlist=info(nucleus(q));save_style=cur_style; mlist_penalties=false;mlist_to_hlist(); /*recursive call*/ @^recursion@> cur_style=save_style;@<Set up the values...@>; p=hpack(link(temp_head), natural); } @+break; default:confusion(@[@<|"mlist2"|@>@]); @:this can't happen mlist2}{\quad mlist2@> } @/ new_hlist(q)=p; if ((math_type(subscr(q))==empty)&&(math_type(supscr(q))==empty)) goto check_dimensions; make_scripts(q, delta) @ @<Create a character node |p| for |nucleus(q)|...@>= {@+fetch(nucleus(q)); if (char_exists(cur_i)) {@+delta=char_italic(cur_f)(cur_i);p=new_character(cur_f, qo(cur_c)); if ((math_type(nucleus(q))==math_text_char)&&(space(cur_f)!=0)) delta=0; /*no italic correction in mid-word of text font*/ if ((math_type(subscr(q))==empty)&&(delta!=0)) {@+link(p)=new_kern(delta);delta=0; } } else p=null; } @ The purpose of |make_scripts(q, delta)| is to attach the subscript and/or superscript of noad |q| to the list that starts at |new_hlist(q)|, given that the subscript and superscript aren't both empty. The superscript will appear to the right of the subscript by a given distance |delta|. We set |shift_down| and |shift_up| to the minimum amounts to shift the baseline of subscripts and superscripts based on the given nucleus. @<Declare math...@>= void make_scripts(pointer @!q, scaled @!delta) {@+pointer p, @!x, @!y, @!z; /*temporary registers for box construction*/ scaled @!shift_up, @!shift_down, @!clr; /*dimensions in the calculation*/ small_number @!t; /*subsidiary size code*/ p=new_hlist(q); if (is_char_node(p)) {@+shift_up=0;shift_down=0; } else{@+z=hpack(p, natural); if (cur_style < script_style) t=script_size;@+else t=script_script_size; shift_up=height(z)-sup_drop(t); shift_down=depth(z)+sub_drop(t); free_node(z, box_node_size); } if (math_type(supscr(q))==empty) @<Construct a subscript box |x| when there is no superscript@>@; else{@+@<Construct a superscript box |x|@>; if (math_type(subscr(q))==empty) shift_amount(x)=-shift_up; else@<Construct a sub/superscript combination box |x|, with the superscript offset by |delta|@>; } if (new_hlist(q)==null) new_hlist(q)=x; else{@+p=new_hlist(q); while (link(p)!=null) p=link(p); link(p)=x; } } @ When there is a subscript without a superscript, the top of the subscript should not exceed the baseline plus four-fifths of the x-height. @<Construct a subscript box |x| when there is no superscript@>= {@+x=clean_box(subscr(q), sub_style(cur_style)); width(x)=width(x)+script_space; if (shift_down < sub1(cur_size)) shift_down=sub1(cur_size); clr=height(x)-(abs(math_x_height(cur_size)*4)/5); if (shift_down < clr) shift_down=clr; shift_amount(x)=shift_down; } @ The bottom of a superscript should never descend below the baseline plus one-fourth of the x-height. @<Construct a superscript box |x|@>= {@+x=clean_box(supscr(q), sup_style(cur_style)); width(x)=width(x)+script_space; if (odd(cur_style)) clr=sup3(cur_size); else if (cur_style < text_style) clr=sup1(cur_size); else clr=sup2(cur_size); if (shift_up < clr) shift_up=clr; clr=depth(x)+(abs(math_x_height(cur_size))/4); if (shift_up < clr) shift_up=clr; } @ When both subscript and superscript are present, the subscript must be separated from the superscript by at least four times |default_rule_thickness|. If this condition would be violated, the subscript moves down, after which both subscript and superscript move up so that the bottom of the superscript is at least as high as the baseline plus four-fifths of the x-height. @<Construct a sub/superscript combination box |x|...@>= {@+y=clean_box(subscr(q), sub_style(cur_style)); width(y)=width(y)+script_space; if (shift_down < sub2(cur_size)) shift_down=sub2(cur_size); clr=4*default_rule_thickness- ((shift_up-depth(x))-(height(y)-shift_down)); if (clr > 0) {@+shift_down=shift_down+clr; clr=(abs(math_x_height(cur_size)*4)/5)-(shift_up-depth(x)); if (clr > 0) {@+shift_up=shift_up+clr; shift_down=shift_down-clr; } } shift_amount(x)=delta; /*superscript is |delta| to the right of the subscript*/ p=new_kern((shift_up-depth(x))-(height(y)-shift_down));link(x)=p;link(p)=y; x=vpack(x, natural);shift_amount(x)=shift_down; } @ We have now tied up all the loose ends of the first pass of |mlist_to_hlist|. The second pass simply goes through and hooks everything together with the proper glue and penalties. It also handles the |left_noad| and |right_noad| that might be present, since |max_h| and |max_d| are now known. Variable |p| points to a node at the current end of the final hlist. @<Make a second pass over the mlist,...@>= p=temp_head;link(p)=null;q=mlist;r_type=0;cur_style=style; @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; while (q!=null) {@+@<If node |q| is a style node, change the style and |goto delete_q|; otherwise if it is not a noad, put it into the hlist, advance |q|, and |goto done|; otherwise set |s| to the size of noad |q|, set |t| to the associated type (|ord_noad.. inner_noad|), and set |pen| to the associated penalty@>; @<Append inter-element spacing based on |r_type| and |t|@>; @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>; r_type=t; delete_q: r=q;q=link(q);free_node(r, s); done: ;} @ Just before doing the big |case| switch in the second pass, the program sets up default values so that most of the branches are short. @<If node |q| is a style node, change the style...@>= t=ord_noad;s=noad_size;pen=inf_penalty; switch (type(q)) { case op_noad: case open_noad: case close_noad: case punct_noad: case inner_noad: t=type(q);@+break; case bin_noad: {@+t=bin_noad;pen=bin_op_penalty; } @+break; case rel_noad: {@+t=rel_noad;pen=rel_penalty; } @+break; case ord_noad: case vcenter_noad: case over_noad: case under_noad: do_nothing;@+break; case radical_noad: s=radical_noad_size;@+break; case accent_noad: s=accent_noad_size;@+break; case fraction_noad: {@+t=inner_noad;s=fraction_noad_size; } @+break; case left_noad: case right_noad: t=make_left_right(q, style, max_d, max_h);@+break; case style_node: @<Change the current style and |goto delete_q|@>@; case whatsit_node: case penalty_node: case rule_node: case disc_node: case adjust_node: case ins_node: case mark_node: case glue_node: case kern_node: @t@>@;@/ {@+link(p)=q;p=q;q=link(q);link(p)=null;goto done; } default:confusion(@[@<|"mlist3"|@>@]); @:this can't happen mlist3}{\quad mlist3@> } @ The |make_left_right| function constructs a left or right delimiter of the required size and returns the value |open_noad| or |close_noad|. The |right_noad| and |left_noad| will both be based on the original |style|, so they will have consistent sizes. We use the fact that |right_noad-left_noad==close_noad-open_noad|. @<Declare math...@>= small_number make_left_right(pointer @!q, small_number @!style, scaled @!max_d, scaled @!max_h) {@+scaled delta, @!delta1, @!delta2; /*dimensions used in the calculation*/ if (style < script_style) cur_size=text_size; else cur_size=16*((style-text_style)/2); delta2=max_d+axis_height(cur_size); delta1=max_h+max_d-delta2; if (delta2 > delta1) delta1=delta2; /*|delta1| is max distance from axis*/ delta=(delta1/500)*delimiter_factor; delta2=delta1+delta1-delimiter_shortfall; if (delta < delta2) delta=delta2; new_hlist(q)=var_delimiter(delimiter(q), cur_size, delta); return type(q)-(left_noad-open_noad); /*|open_noad| or |close_noad|*/ } @ @<Change the current style and |goto delete_q|@>= {@+cur_style=subtype(q);s=style_node_size; @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; goto delete_q; } @ The inter-element spacing in math formulas depends on a $8\times8$ table that \TeX\ preloads as a 64-digit string. The elements of this string have the following significance: $$\vbox{\halign{#\hfil\cr \.0 means no space;\cr \.1 means a conditional thin space (\.{\\nonscript\\mskip\\thinmuskip});\cr \.2 means a thin space (\.{\\mskip\\thinmuskip});\cr \.3 means a conditional medium space (\.{\\nonscript\\mskip\\medmuskip});\cr \.4 means a conditional thick space (\.{\\nonscript\\mskip\\thickmuskip});\cr \.* means an impossible case.\cr}}$$ This is all pretty cryptic, but {\sl The \TeX book\/} explains what is supposed to happen, and the string makes it happen. @:TeXbook}{\sl The \TeX book@> A global variable |magic_offset| is computed so that if |a| and |b| are in the range |ord_noad dotdot inner_noad|, then |str_pool[a*8+b+magic_offset]| is the digit for spacing between noad types |a| and |b|. If \PASCAL\ had provided a good way to preload constant arrays, this part of the program would not have been so strange. @:PASCAL}{\PASCAL@> @d math_spacing_str @;@/ @t\hskip-35pt@> "math_spacing" @t$ \hskip-35pt$@> @<Glob...@>= int @!magic_offset; /*used to find inter-element spacing*/ @ @<Compute the magic offset@>= magic_offset=str_start[math_spacing]-9*ord_noad @ @<Append inter-element spacing based on |r_type| and |t|@>= if (r_type > 0) /*not the first noad*/ {@+switch (so(str_pool[r_type*8+t+magic_offset])) { case '0': x=0;@+break; case '1': if (cur_style < script_style) x=thin_mu_skip_code;@+else x=0;@+break; case '2': x=thin_mu_skip_code;@+break; case '3': if (cur_style < script_style) x=med_mu_skip_code;@+else x=0;@+break; case '4': if (cur_style < script_style) x=thick_mu_skip_code;@+else x=0;@+break; default:confusion(@[@<|"mlist4"|@>@]); @:this can't happen mlist4}{\quad mlist4@> } if (x!=0) {@+y=math_glue(glue_par(x), cur_mu); z=new_glue(y);glue_ref_count(y)=null;link(p)=z;p=z;@/ subtype(z)=x+1; /*store a symbolic subtype*/ } } @ We insert a penalty node after the hlist entries of noad |q| if |pen| is not an ``infinite'' penalty, and if the node immediately following |q| is not a penalty node or a |rel_noad| or absent entirely. @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>= if (new_hlist(q)!=null) {@+link(p)=new_hlist(q); @/do@+{p=link(p); }@+ while (!(link(p)==null)); } if (penalties) if (link(q)!=null) if (pen < inf_penalty) {@+r_type=type(link(q)); if (r_type!=penalty_node) if (r_type!=rel_noad) {@+z=new_penalty(pen);link(p)=z;p=z; } } @* Alignment. It's sort of a miracle whenever \.{\\halign} and \.{\\valign} work, because they cut across so many of the control structures of \TeX. Therefore the present page is probably not the best place for a beginner to start reading this program; it is better to master everything else first. Let us focus our thoughts on an example of what the input might be, in order to get some idea about how the alignment miracle happens. The example doesn't do anything useful, but it is sufficiently general to indicate all of the special cases that must be dealt with; please do not be disturbed by its apparent complexity and meaninglessness. $$\vbox{\halign{\.{#}\hfil\cr {}\\tabskip 2pt plus 3pt\cr {}\\halign to 300pt\{u1\#v1\&\cr \hskip 50pt\\tabskip 1pt plus 1fil u2\#v2\&\cr \hskip 50pt u3\#v3\\cr\cr \hskip 25pt a1\&\\omit a2\&\\vrule\\cr\cr \hskip 25pt \\noalign\{\\vskip 3pt\}\cr \hskip 25pt b1\\span b2\\cr\cr \hskip 25pt \\omit\&c2\\span\\omit\\cr\}\cr}}$$ Here's what happens: \yskip (0) When `\.{\\halign to 300pt\{}' is scanned, the |scan_spec| routine places the 300pt dimension onto the |save_stack|, and an |align_group| code is placed above it. This will make it possible to complete the alignment when the matching `\.\}' is found. (1) The preamble is scanned next. Macros in the preamble are not expanded, @^preamble@> except as part of a tabskip specification. For example, if \.{u2} had been a macro in the preamble above, it would have been expanded, since \TeX\ must look for `\.{minus...}' as part of the tabskip glue. A ``preamble list'' is constructed based on the user's preamble; in our case it contains the following seven items: $$\vbox{\halign{\.{#}\hfil\qquad&(#)\hfil\cr {}\\glue 2pt plus 3pt&the tabskip preceding column 1\cr {}\\alignrecord, width $-\infty$&preamble info for column 1\cr {}\\glue 2pt plus 3pt&the tabskip between columns 1 and 2\cr {}\\alignrecord, width $-\infty$&preamble info for column 2\cr {}\\glue 1pt plus 1fil&the tabskip between columns 2 and 3\cr {}\\alignrecord, width $-\infty$&preamble info for column 3\cr {}\\glue 1pt plus 1fil&the tabskip following column 3\cr}}$$ These ``alignrecord'' entries have the same size as an |unset_node|, since they will later be converted into such nodes. However, at the moment they have no |type| or |subtype| fields; they have |info| fields instead, and these |info| fields are initially set to the value |end_span|, for reasons explained below. Furthermore, the alignrecord nodes have no |height| or |depth| fields; these are renamed |u_part| and |v_part|, and they point to token lists for the templates of the alignment. For example, the |u_part| field in the first alignrecord points to the token list `\.{u1}', i.e., the template preceding the `\.\#' for column~1. (2) \TeX\ now looks at what follows the \.{\\cr} that ended the preamble. It is not `\.{\\noalign}' or `\.{\\omit}', so this input is put back to be read again, and the template `\.{u1}' is fed to the scanner. Just before reading `\.{u1}', \TeX\ goes into restricted horizontal mode. Just after reading `\.{u1}', \TeX\ will see `\.{a1}', and then (when the {\.\&} is sensed) \TeX\ will see `\.{v1}'. Then \TeX\ scans an |endv| token, indicating the end of a column. At this point an |unset_node| is created, containing the contents of the current hlist (i.e., `\.{u1a1v1}'). The natural width of this unset node replaces the |width| field of the alignrecord for column~1; in general, the alignrecords will record the maximum natural width that has occurred so far in a given column. (3) Since `\.{\\omit}' follows the `\.\&', the templates for column~2 are now bypassed. Again \TeX\ goes into restricted horizontal mode and makes an |unset_node| from the resulting hlist; but this time the hlist contains simply `\.{a2}'. The natural width of the new unset box is remembered in the |width| field of the alignrecord for column~2. (4) A third |unset_node| is created for column 3, using essentially the mechanism that worked for column~1; this unset box contains `\.{u3\\vrule v3}'. The vertical rule in this case has running dimensions that will later extend to the height and depth of the whole first row, since each |unset_node| in a row will eventually inherit the height and depth of its enclosing box. (5) The first row has now ended; it is made into a single unset box comprising the following seven items: $$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr {}\\glue 2pt plus 3pt\cr {}\\unsetbox for 1 column: u1a1v1\cr {}\\glue 2pt plus 3pt\cr {}\\unsetbox for 1 column: a2\cr {}\\glue 1pt plus 1fil\cr {}\\unsetbox for 1 column: u3\\vrule v3\cr {}\\glue 1pt plus 1fil\cr}}$$ The width of this unset row is unimportant, but it has the correct height and depth, so the correct baselineskip glue will be computed as the row is inserted into a vertical list. (6) Since `\.{\\noalign}' follows the current \.{\\cr}, \TeX\ appends additional material (in this case \.{\\vskip 3pt}) to the vertical list. While processing this material, \TeX\ will be in internal vertical mode, and |no_align_group| will be on |save_stack|. (7) The next row produces an unset box that looks like this: $$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr {}\\glue 2pt plus 3pt\cr {}\\unsetbox for 2 columns: u1b1v1u2b2v2\cr {}\\glue 1pt plus 1fil\cr {}\\unsetbox for 1 column: {\rm(empty)}\cr {}\\glue 1pt plus 1fil\cr}}$$ The natural width of the unset box that spans columns 1~and~2 is stored in a ``span node,'' which we will explain later; the |info| field of the alignrecord for column~1 now points to the new span node, and the |info| of the span node points to |end_span|. (8) The final row produces the unset box $$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr {}\\glue 2pt plus 3pt\cr {}\\unsetbox for 1 column: {\rm(empty)}\cr {}\\glue 2pt plus 3pt\cr {}\\unsetbox for 2 columns: u2c2v2\cr {}\\glue 1pt plus 1fil\cr}}$$ A new span node is attached to the alignrecord for column 2. (9) The last step is to compute the true column widths and to change all the unset boxes to hboxes, appending the whole works to the vertical list that encloses the \.{\\halign}. The rules for deciding on the final widths of each unset column box will be explained below. \yskip\noindent Note that as \.{\\halign} is being processed, we fearlessly give up control to the rest of \TeX. At critical junctures, an alignment routine is called upon to step in and do some little action, but most of the time these routines just lurk in the background. It's something like post-hypnotic suggestion. @ We have mentioned that alignrecords contain no |height| or |depth| fields. Their |glue_sign| and |glue_order| are pre-empted as well, since it is necessary to store information about what to do when a template ends. This information is called the |extra_info| field. @d u_part(X) mem[X+height_offset].i /*pointer to \<u_j> token list*/ @d v_part(X) mem[X+depth_offset].i /*pointer to \<v_j> token list*/ @d extra_info(X) info(X+list_offset) /*info to remember during template*/ @ Alignments can occur within alignments, so a small stack is used to access the alignrecord information. At each level we have a |preamble| pointer, indicating the beginning of the preamble list; a |cur_align| pointer, indicating the current position in the preamble list; a |cur_span| pointer, indicating the value of |cur_align| at the beginning of a sequence of spanned columns; a |cur_loop| pointer, indicating the tabskip glue before an alignrecord that should be copied next if the current list is extended; and the |align_state| variable, which indicates the nesting of braces so that \.{\\cr} and \.{\\span} and tab marks are properly intercepted. There also are pointers |cur_head| and |cur_tail| to the head and tail of a list of adjustments being moved out from horizontal mode to vertical~mode. The current values of these seven quantities appear in global variables; when they have to be pushed down, they are stored in 5-word nodes, and |align_ptr| points to the topmost such node. @d preamble link(align_head) /*the current preamble list*/ @d align_stack_node_size 5 /*number of |mem| words to save alignment states*/ @<Glob...@>= pointer @!cur_align; /*current position in preamble list*/ pointer @!cur_span; /*start of currently spanned columns in preamble list*/ pointer @!cur_loop; /*place to copy when extending a periodic preamble*/ pointer @!align_ptr; /*most recently pushed-down alignment stack node*/ pointer @!cur_head, @!cur_tail; /*adjustment list pointers*/ @ The |align_state| and |preamble| variables are initialized elsewhere. @<Set init...@>= align_ptr=null;cur_align=null;cur_span=null;cur_loop=null; cur_head=null;cur_tail=null; @ Alignment stack maintenance is handled by a pair of trivial routines called |push_alignment| and |pop_alignment|. @p void push_alignment(void) {@+pointer p; /*the new alignment stack node*/ p=get_node(align_stack_node_size); link(p)=align_ptr;info(p)=cur_align; llink(p)=preamble;rlink(p)=cur_span; mem[p+2].i=cur_loop;mem[p+3].i=align_state; info(p+4)=cur_head;link(p+4)=cur_tail; align_ptr=p; cur_head=get_avail(); } @# void pop_alignment(void) {@+pointer p; /*the top alignment stack node*/ free_avail(cur_head); p=align_ptr; cur_tail=link(p+4);cur_head=info(p+4); align_state=mem[p+3].i;cur_loop=mem[p+2].i; cur_span=rlink(p);preamble=llink(p); cur_align=info(p);align_ptr=link(p); free_node(p, align_stack_node_size); } @ \TeX\ has eight procedures that govern alignments: |init_align| and |fin_align| are used at the very beginning and the very end; |init_row| and |fin_row| are used at the beginning and end of individual rows; |init_span| is used at the beginning of a sequence of spanned columns (possibly involving only one column); |init_col| and |fin_col| are used at the beginning and end of individual columns; and |align_peek| is used after \.{\\cr} to see whether the next item is \.{\\noalign}. We shall consider these routines in the order they are first used during the course of a complete \.{\\halign}, namely |init_align|, |align_peek|, |init_row|, |init_span|, |init_col|, |fin_col|, |fin_row|, |fin_align|. @ When \.{\\halign} or \.{\\valign} has been scanned in an appropriate mode, \TeX\ calls |init_align|, whose task is to get everything off to a good start. This mostly involves scanning the preamble and putting its information into the preamble list. @^preamble@> @p@t\4@>@<Declare the procedure called |get_preamble_token|@>@t@>@/ void align_peek(void);@/ void normal_paragraph(void);@/ void init_align(void) {@+ pointer save_cs_ptr; /*|warning_index| value for error messages*/ pointer @!p; /*for short-term temporary use*/ save_cs_ptr=cur_cs; /*\.{\\halign} or \.{\\valign}, usually*/ push_alignment();align_state=-1000000; /*enter a new alignment level*/ @<Check for improper alignment in displayed math@>; push_nest(); /*enter a new semantic level*/ @<Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}@>; scan_spec(align_group, false);@/ @<Scan the preamble and record it in the |preamble| list@>; new_save_level(align_group); if (every_cr!=null) begin_token_list(every_cr, every_cr_text); align_peek(); /*look for \.{\\noalign} or \.{\\omit}*/ } @ In vertical modes, |prev_depth| already has the correct value. But if we are in |mmode| (displayed formula mode), we reach out to the enclosing vertical mode for the |prev_depth| value that produces the correct baseline calculations. @<Change current mode...@>= if (mode==mmode) {@+mode=-vmode;prev_depth=nest[nest_ptr-2].aux_field.sc; } else if (mode > 0) negate(mode) @ When \.{\\halign} is used as a displayed formula, there should be no other pieces of mlists present. @<Check for improper alignment in displayed math@>= if ((mode==mmode)&&((tail!=head)||(incompleat_noad!=null))) {@+print_err("Improper ");print_esc(@[@<|"halign"|@>@]);print_str(" inside $$'s"); @.Improper \\halign...@> help3("Displays can use special alignments (like \\eqalignno)")@/ ("only if nothing but the alignment itself is between $$'s.")@/ ("So I've deleted the formulas that preceded this alignment."); error();flush_math(); } @ @<Scan the preamble and record it in the |preamble| list@>= preamble=null;cur_align=align_head;cur_loop=null;scanner_status=aligning; warning_index=save_cs_ptr;align_state=-1000000; /*at this point, |cur_cmd==left_brace|*/ loop@+{@+@<Append the current tabskip glue to the preamble list@>; if (cur_cmd==car_ret) goto done; /*\.{\\cr} ends the preamble*/ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|, looking for changes in the tabskip glue; append an alignrecord to the preamble list@>; } done: scanner_status=normal @ @<Append the current tabskip glue to the preamble list@>= link(cur_align)=new_param_glue(tab_skip_code); cur_align=link(cur_align) @ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|...@>= @<Scan the template \<u_j>, putting the resulting token list in |hold_head|@>; link(cur_align)=new_null_box();cur_align=link(cur_align); /*a new alignrecord*/ info(cur_align)=end_span;width(cur_align)=null_flag; u_part(cur_align)=link(hold_head); @<Scan the template \<v_j>, putting the resulting token list in |hold_head|@>; v_part(cur_align)=link(hold_head) @ We enter `\.{\\span}' into |eqtb| with |tab_mark| as its command code, and with |span_code| as the command modifier. This makes \TeX\ interpret it essentially the same as an alignment delimiter like `\.\&', yet it is recognizably different when we need to distinguish it from a normal delimiter. It also turns out to be useful to give a special |cr_code| to `\.{\\cr}', and an even larger |cr_cr_code| to `\.{\\crcr}'. The end of a template is represented by two ``frozen'' control sequences called \.{\\endtemplate}. The first has the command code |end_template|, which is | > outer_call|, so it will not easily disappear in the presence of errors. The |get_x_token| routine converts the first into the second, which has |endv| as its command code. @d span_code 256 /*distinct from any character*/ @d cr_code 257 /*distinct from |span_code| and from any character*/ @d cr_cr_code (cr_code+1) /*this distinguishes \.{\\crcr} from \.{\\cr}*/ @d end_template_token cs_token_flag+frozen_end_template @<Put each of \TeX's primitives into the hash table@>= primitive(@[@<|"span"|@>@], tab_mark, span_code);@/ @!@:span_}{\.{\\span} primitive@> primitive(@[@<|"cr"|@>@], car_ret, cr_code); @!@:cr_}{\.{\\cr} primitive@> text(frozen_cr)=@[@<|"cr"|@>@];eqtb[frozen_cr]=eqtb[cur_val];@/ primitive(@[@<|"crcr"|@>@], car_ret, cr_cr_code); @!@:cr_cr_}{\.{\\crcr} primitive@> text(frozen_end_template)=@[@<|"endtemplate"|@>@];text(frozen_endv)=@[@<|"endtemplate"|@>@]; eq_type(frozen_endv)=endv;equiv(frozen_endv)=null_list; eq_level(frozen_endv)=level_one;@/ eqtb[frozen_end_template]=eqtb[frozen_endv]; eq_type(frozen_end_template)=end_template; @ @<Cases of |print_cmd_chr|...@>= case tab_mark: if (chr_code==span_code) print_esc(@[@<|"span"|@>@]); else chr_cmd("alignment tab character ")@;@+break; case car_ret: if (chr_code==cr_code) print_esc(@[@<|"cr"|@>@]); else print_esc(@[@<|"crcr"|@>@]);@+break; @ The preamble is copied directly, except that \.{\\tabskip} causes a change to the tabskip glue, thereby possibly expanding macros that immediately follow it. An appearance of \.{\\span} also causes such an expansion. Note that if the preamble contains `\.{\\global\\tabskip}', the `\.{\\global}' token survives in the preamble and the `\.{\\tabskip}' defines new tabskip glue (locally). @<Declare the procedure called |get_preamble_token|@>= void get_preamble_token(void) {@+ restart: get_token(); while ((cur_chr==span_code)&&(cur_cmd==tab_mark)) {@+get_token(); /*this token will be expanded once*/ if (cur_cmd > max_command) {@+expand();get_token(); } } if (cur_cmd==endv) fatal_error("(interwoven alignment preambles are not allowed)"); @.interwoven alignment preambles...@> if ((cur_cmd==assign_glue)&&(cur_chr==glue_base+tab_skip_code)) {@+scan_optional_equals();scan_glue(glue_val); if (global_defs > 0) geq_define(glue_base+tab_skip_code, glue_ref, cur_val); else eq_define(glue_base+tab_skip_code, glue_ref, cur_val); goto restart; } } @ Spaces are eliminated from the beginning of a template. @<Scan the template \<u_j>...@>= p=hold_head;link(p)=null; loop@+{@+get_preamble_token(); if (cur_cmd==mac_param) goto done1; if ((cur_cmd <= car_ret)&&(cur_cmd >= tab_mark)&&(align_state==-1000000)) if ((p==hold_head)&&(cur_loop==null)&&(cur_cmd==tab_mark) ) cur_loop=cur_align; else{@+print_err("Missing # inserted in alignment preamble"); @.Missing \# inserted...@> help3("There should be exactly one # between &'s, when an")@/ ("\\halign or \\valign is being set up. In this case you had")@/ ("none, so I've put one in; maybe that will work."); back_error();goto done1; } else if ((cur_cmd!=spacer)||(p!=hold_head)) {@+link(p)=get_avail();p=link(p);info(p)=cur_tok; } } done1: @ @<Scan the template \<v_j>...@>= p=hold_head;link(p)=null; loop@+{@+resume: get_preamble_token(); if ((cur_cmd <= car_ret)&&(cur_cmd >= tab_mark)&&(align_state==-1000000)) goto done2; if (cur_cmd==mac_param) {@+print_err("Only one # is allowed per tab"); @.Only one \# is allowed...@> help3("There should be exactly one # between &'s, when an")@/ ("\\halign or \\valign is being set up. In this case you had")@/ ("more than one, so I'm ignoring all but the first."); error();goto resume; } link(p)=get_avail();p=link(p);info(p)=cur_tok; } done2: link(p)=get_avail();p=link(p); info(p)=end_template_token /*put \.{\\endtemplate} at the end*/ @ The tricky part about alignments is getting the templates into the scanner at the right time, and recovering control when a row or column is finished. We usually begin a row after each \.{\\cr} has been sensed, unless that \.{\\cr} is followed by \.{\\noalign} or by the right brace that terminates the alignment. The |align_peek| routine is used to look ahead and do the right thing; it either gets a new row started, or gets a \.{\\noalign} started, or finishes off the alignment. @<Declare the procedure called |align_peek|@>= void align_peek(void) {@+ restart: align_state=1000000;@<Get the next non-blank non-call token@>; if (cur_cmd==no_align) {@+scan_left_brace();new_save_level(no_align_group); if (mode==-vmode) normal_paragraph(); } else if (cur_cmd==right_brace) fin_align(); else if ((cur_cmd==car_ret)&&(cur_chr==cr_cr_code)) goto restart; /*ignore \.{\\crcr}*/ else{@+init_row(); /*start a new row*/ init_col(); /*start a new column and replace what we peeked at*/ } } @ To start a row (i.e., a `row' that rhymes with `dough' but not with `bough'), we enter a new semantic level, copy the first tabskip glue, and change from internal vertical mode to restricted horizontal mode or vice versa. The |space_factor| and |prev_depth| are not used on this semantic level, but we clear them to zero just to be tidy. @p@t\4@>@<Declare the procedure called |init_span|@>@t@>@/ void init_row(void) {@+push_nest();mode=(-hmode-vmode)-mode; if (mode==-hmode) space_factor=0;@+else prev_depth=0; tail_append(new_glue(glue_ptr(preamble))); subtype(tail)=tab_skip_code+1;@/ cur_align=link(preamble);cur_tail=cur_head;init_span(cur_align); } @ The parameter to |init_span| is a pointer to the alignrecord where the next column or group of columns will begin. A new semantic level is entered, so that the columns will generate a list for subsequent packaging. @<Declare the procedure called |init_span|@>= void init_span(pointer @!p) {@+push_nest(); if (mode==-hmode) space_factor=1000; else{@+prev_depth=ignore_depth;normal_paragraph(); } cur_span=p; } @ When a column begins, we assume that |cur_cmd| is either |omit| or else the current token should be put back into the input until the \<u_j> template has been scanned. (Note that |cur_cmd| might be |tab_mark| or |car_ret|.) We also assume that |align_state| is approximately 1000000 at this time. We remain in the same mode, and start the template if it is called for. @p void init_col(void) {@+extra_info(cur_align)=cur_cmd; if (cur_cmd==omit) align_state=0; else{@+back_input();begin_token_list(u_part(cur_align), u_template); } /*now |align_state==1000000|*/ } @ The scanner sets |align_state| to zero when the \<u_j> template ends. When a subsequent \.{\\cr} or \.{\\span} or tab mark occurs with |align_state==0|, the scanner activates the following code, which fires up the \<v_j> template. We need to remember the |cur_chr|, which is either |cr_cr_code|, |cr_code|, |span_code|, or a character code, depending on how the column text has ended. This part of the program had better not be activated when the preamble to another alignment is being scanned, or when no alignment preamble is active. @<Insert the \(v)\<v_j>...@>= {@+if ((scanner_status==aligning)||(cur_align==null)) fatal_error("(interwoven alignment preambles are not allowed)"); @.interwoven alignment preambles...@> cur_cmd=extra_info(cur_align);extra_info(cur_align)=cur_chr; if (cur_cmd==omit) begin_token_list(omit_template, v_template); else begin_token_list(v_part(cur_align), v_template); align_state=1000000;goto restart; } @ The token list |omit_template| just referred to is a constant token list that contains the special control sequence \.{\\endtemplate} only. @<Initialize the special...@>= info(omit_template)=end_template_token; /*|link(omit_template)==null|*/ @ When the |endv| command at the end of a \<v_j> template comes through the scanner, things really start to happen; and it is the |fin_col| routine that makes them happen. This routine returns |true| if a row as well as a column has been finished. @p bool fin_col(void) {@+ pointer p; /*the alignrecord after the current one*/ pointer @!q, @!r; /*temporary pointers for list manipulation*/ pointer @!s; /*a new span node*/ pointer @!u; /*a new unset box*/ scaled @!w; /*natural width*/ glue_ord @!o; /*order of infinity*/ halfword @!n; /*span counter*/ if (cur_align==null) confusion(@[@<|"endv"|@>@]); q=link(cur_align);@+if (q==null) confusion(@[@<|"endv"|@>@]); @:this can't happen endv}{\quad endv@> if (align_state < 500000) fatal_error("(interwoven alignment preambles are not allowed)"); @.interwoven alignment preambles...@> p=link(q); @<If the preamble list has been traversed, check that the row has ended@>; if (extra_info(cur_align)!=span_code) {@+unsave();new_save_level(align_group);@/ @<Package an unset box for the current column and record its width@>; @<Copy the tabskip glue between columns@>; if (extra_info(cur_align) >= cr_code) {@+return true; } init_span(p); } align_state=1000000;@<Get the next non-blank non-call token@>; cur_align=p; init_col();return false; } @ @<If the preamble list has been traversed, check that the row has ended@>= if ((p==null)&&(extra_info(cur_align) < cr_code)) if (cur_loop!=null) @<Lengthen the preamble periodically@>@; else{@+print_err("Extra alignment tab has been changed to "); @.Extra alignment tab...@> print_esc(@[@<|"cr"|@>@]); help3("You have given more \\span or & marks than there were")@/ ("in the preamble to the \\halign or \\valign now in progress.")@/ ("So I'll assume that you meant to type \\cr instead."); extra_info(cur_align)=cr_code;error(); } @ @<Lengthen the preamble...@>= {@+link(q)=new_null_box();p=link(q); /*a new alignrecord*/ info(p)=end_span;width(p)=null_flag;cur_loop=link(cur_loop); @<Copy the templates from node |cur_loop| into node |p|@>; cur_loop=link(cur_loop); link(p)=new_glue(glue_ptr(cur_loop)); } @ @<Copy the templates from node |cur_loop| into node |p|@>= q=hold_head;r=u_part(cur_loop); while (r!=null) {@+link(q)=get_avail();q=link(q);info(q)=info(r);r=link(r); } link(q)=null;u_part(p)=link(hold_head); q=hold_head;r=v_part(cur_loop); while (r!=null) {@+link(q)=get_avail();q=link(q);info(q)=info(r);r=link(r); } link(q)=null;v_part(p)=link(hold_head) @ @<Copy the tabskip glue...@>= tail_append(new_glue(glue_ptr(link(cur_align)))); subtype(tail)=tab_skip_code+1 @ @<Package an unset...@>= {@+if (mode==-hmode) {@+adjust_tail=cur_tail;u=hpack(link(head), natural);w=width(u); cur_tail=adjust_tail;adjust_tail=null; } else{@+u=vpackage(link(head), natural, 0);w=height(u); } n=min_quarterword; /*this represents a span count of 1*/ if (cur_span!=cur_align) @<Update width entry for spanned columns@>@; else if (w > width(cur_align)) width(cur_align)=w; type(u)=unset_node;span_count(u)=n;@/ @<Determine the stretch order@>; glue_order(u)=o;glue_stretch(u)=total_stretch[o];@/ @<Determine the shrink order@>; glue_sign(u)=o;glue_shrink(u)=total_shrink[o];@/ pop_nest();link(tail)=u;tail=u; } @ A span node is a 2-word record containing |width|, |info|, and |link| fields. The |link| field is not really a link, it indicates the number of spanned columns; the |info| field points to a span node for the same starting column, having a greater extent of spanning, or to |end_span|, which has the largest possible |link| field; the |width| field holds the largest natural width corresponding to a particular set of spanned columns. A list of the maximum widths so far, for spanned columns starting at a given column, begins with the |info| field of the alignrecord for that column. @d span_node_size 2 /*number of |mem| words for a span node*/ @<Initialize the special list heads...@>= link(end_span)=max_quarterword+1;info(end_span)=null; @ @<Update width entry for spanned columns@>= {@+q=cur_span; @/do@+{incr(n);q=link(link(q)); }@+ while (!(q==cur_align)); if (n > max_quarterword) confusion(@[@<|"256 spans"|@>@]); /*this can happen, but won't*/ @^system dependencies@> @:this can't happen 256 spans}{\quad 256 spans@> q=cur_span;while (link(info(q)) < n) q=info(q); if (link(info(q)) > n) {@+s=get_node(span_node_size);info(s)=info(q);link(s)=n; info(q)=s;width(s)=w; } else if (width(info(q)) < w) width(info(q))=w; } @ At the end of a row, we append an unset box to the current vlist (for \.{\\halign}) or the current hlist (for \.{\\valign}). This unset box contains the unset boxes for the columns, separated by the tabskip glue. Everything will be set later. @p void fin_row(void) {@+pointer p; /*the new unset box*/ if (mode==-hmode) {@+p=hpack(link(head), natural); pop_nest();append_to_vlist(p); if (cur_head!=cur_tail) {@+link(tail)=link(cur_head);tail=cur_tail; } } else{@+p=vpack(link(head), natural);pop_nest(); link(tail)=p;tail=p;space_factor=1000; } type(p)=unset_node;glue_stretch(p)=0; if (every_cr!=null) begin_token_list(every_cr, every_cr_text); align_peek(); } /*note that |glue_shrink(p)==0| since |glue_shrink====shift_amount|*/ @ Finally, we will reach the end of the alignment, and we can breathe a sigh of relief that memory hasn't overflowed. All the unset boxes will now be set so that the columns line up, taking due account of spanned columns. @p void do_assignments(void);@/ void resume_after_display(void);@/ void build_page(void);@/ void fin_align(void) {@+pointer @!p, @!q, @!r, @!s, @!u, @!v; /*registers for the list operations*/ scaled @!t, @!w; /*width of column*/ scaled @!o; /*shift offset for unset boxes*/ halfword @!n; /*matching span amount*/ scaled @!rule_save; /*temporary storage for |overfull_rule|*/ memory_word @!aux_save; /*temporary storage for |aux|*/ if (cur_group!=align_group) confusion(@[@<|"align1"|@>@]); @:this can't happen align}{\quad align@> unsave(); /*that |align_group| was for individual entries*/ if (cur_group!=align_group) confusion(@[@<|"align0"|@>@]); unsave(); /*that |align_group| was for the whole alignment*/ if (nest[nest_ptr-1].mode_field==mmode) o=display_indent; else o=0; @<Go through the preamble list, determining the column widths and changing the alignrecords to dummy unset boxes@>; @<Package the preamble list, to determine the actual tabskip glue amounts, and let |p| point to this prototype box@>; @<Set the glue in all the unset boxes of the current list@>; flush_node_list(p);pop_alignment(); @<Insert the \(c)current list into its environment@>; } @/ @t\4@>@<Declare the procedure called |align_peek|@>@; @ It's time now to dismantle the preamble list and to compute the column widths. Let $w_{ij}$ be the maximum of the natural widths of all entries that span columns $i$ through $j$, inclusive. The alignrecord for column~$i$ contains $w_{ii}$ in its |width| field, and there is also a linked list of the nonzero $w_{ij}$ for increasing $j$, accessible via the |info| field; these span nodes contain the value $j-i+|min_quarterword|$ in their |link| fields. The values of $w_{ii}$ were initialized to |null_flag|, which we regard as $-\infty$. The final column widths are defined by the formula $$w_j=\max_{1\L i\L j}\biggl( w_{ij}-\sum_{i\L k<j}(t_k+w_k)\biggr),$$ where $t_k$ is the natural width of the tabskip glue between columns $k$ and~$k+1$. However, if $w_{ij}=-\infty$ for all |i| in the range |1 <= i <= j| (i.e., if every entry that involved column~|j| also involved column~|j+1|), we let $w_j=0$, and we zero out the tabskip glue after column~|j|. \TeX\ computes these values by using the following scheme: First $w_1=w_{11}$. Then replace $w_{2j}$ by $\max(w_{2j},w_{1j}-t_1-w_1)$, for all $j>1$. Then $w_2=w_{22}$. Then replace $w_{3j}$ by $\max(w_{3j},w_{2j}-t_2-w_2)$ for all $j>2$; and so on. If any $w_j$ turns out to be $-\infty$, its value is changed to zero and so is the next tabskip. @<Go through the preamble list,...@>= q=link(preamble); @/do@+{flush_list(u_part(q));flush_list(v_part(q)); p=link(link(q)); if (width(q)==null_flag) @<Nullify |width(q)| and the tabskip glue following this column@>; if (info(q)!=end_span) @<Merge the widths in the span nodes of |q| with those of |p|, destroying the span nodes of |q|@>; type(q)=unset_node;span_count(q)=min_quarterword;height(q)=0; depth(q)=0;glue_order(q)=normal;glue_sign(q)=normal; glue_stretch(q)=0;glue_shrink(q)=0;q=p; }@+ while (!(q==null)) @ @<Nullify |width(q)| and the tabskip glue following this column@>= {@+width(q)=0;r=link(q);s=glue_ptr(r); if (s!=zero_glue) {@+add_glue_ref(zero_glue);delete_glue_ref(s); glue_ptr(r)=zero_glue; } } @ Merging of two span-node lists is a typical exercise in the manipulation of linearly linked data structures. The essential invariant in the following |@/do@+{| loop is that we want to dispense with node |r|, in |q|'s list, and |u| is its successor; all nodes of |p|'s list up to and including |s| have been processed, and the successor of |s| matches |r| or precedes |r| or follows |r|, according as |link(r)==n| or |link(r) > n| or |link(r) < n|. @<Merge the widths...@>= {@+t=width(q)+width(glue_ptr(link(q))); r=info(q);s=end_span;info(s)=p;n=min_quarterword+1; @/do@+{width(r)=width(r)-t;u=info(r); while (link(r) > n) {@+s=info(s);n=link(info(s))+1; } if (link(r) < n) {@+info(r)=info(s);info(s)=r;decr(link(r));s=r; } else{@+if (width(r) > width(info(s))) width(info(s))=width(r); free_node(r, span_node_size); } r=u; }@+ while (!(r==end_span)); } @ Now the preamble list has been converted to a list of alternating unset boxes and tabskip glue, where the box widths are equal to the final column sizes. In case of \.{\\valign}, we change the widths to heights, so that a correct error message will be produced if the alignment is overfull or underfull. @<Package the preamble list...@>= save_ptr=save_ptr-2;pack_begin_line=-mode_line; if (mode==-vmode) {@+rule_save=overfull_rule; overfull_rule=0; /*prevent rule from being packaged*/ p=hpack(preamble, saved(1), saved(0));overfull_rule=rule_save; } else{@+q=link(preamble); @/do@+{height(q)=width(q);width(q)=0;q=link(link(q)); }@+ while (!(q==null)); p=vpack(preamble, saved(1), saved(0)); q=link(preamble); @/do@+{width(q)=height(q);height(q)=0;q=link(link(q)); }@+ while (!(q==null)); } pack_begin_line=0 @ @<Set the glue in all the unset...@>= q=link(head);s=head; while (q!=null) {@+if (!is_char_node(q)) if (type(q)==unset_node) @<Set the unset box |q| and the unset boxes in it@>@; else if (type(q)==rule_node) @<Make the running dimensions in rule |q| extend to the boundaries of the alignment@>; s=q;q=link(q); } @ @<Make the running dimensions in rule |q| extend...@>= {@+if (is_running(width(q))) width(q)=width(p); if (is_running(height(q))) height(q)=height(p); if (is_running(depth(q))) depth(q)=depth(p); if (o!=0) {@+r=link(q);link(q)=null;q=hpack(q, natural); shift_amount(q)=o;link(q)=r;link(s)=q; } } @ The unset box |q| represents a row that contains one or more unset boxes, depending on how soon \.{\\cr} occurred in that row. @<Set the unset box |q| and the unset boxes in it@>= {@+if (mode==-vmode) {@+type(q)=hlist_node;width(q)=width(p); } else{@+type(q)=vlist_node;height(q)=height(p); } glue_order(q)=glue_order(p);glue_sign(q)=glue_sign(p); glue_set(q)=glue_set(p);shift_amount(q)=o; r=link(list_ptr(q));s=link(list_ptr(p)); @/do@+{@<Set the glue in node |r| and change it from an unset node@>; r=link(link(r));s=link(link(s)); }@+ while (!(r==null)); } @ A box made from spanned columns will be followed by tabskip glue nodes and by empty boxes as if there were no spanning. This permits perfect alignment of subsequent entries, and it prevents values that depend on floating point arithmetic from entering into the dimensions of any boxes. @<Set the glue in node |r|...@>= n=span_count(r);t=width(s);w=t;u=hold_head; while (n > min_quarterword) {@+decr(n); @<Append tabskip glue and an empty box to list |u|, and update |s| and |t| as the prototype nodes are passed@>; } if (mode==-vmode) @<Make the unset node |r| into an |hlist_node| of width |w|, setting the glue as if the width were |t|@>@; else@<Make the unset node |r| into a |vlist_node| of height |w|, setting the glue as if the height were |t|@>; shift_amount(r)=0; if (u!=hold_head) /*append blank boxes to account for spanned nodes*/ {@+link(u)=link(r);link(r)=link(hold_head);r=u; } @ @<Append tabskip glue and an empty box to list |u|...@>= s=link(s);v=glue_ptr(s);link(u)=new_glue(v);u=link(u); subtype(u)=tab_skip_code+1;t=t+width(v); if (glue_sign(p)==stretching) {@+if (stretch_order(v)==glue_order(p)) t=t+round(float(glue_set(p))*stretch(v)); @^real multiplication@> } else if (glue_sign(p)==shrinking) {@+if (shrink_order(v)==glue_order(p)) t=t-round(float(glue_set(p))*shrink(v)); } s=link(s);link(u)=new_null_box();u=link(u);t=t+width(s); if (mode==-vmode) width(u)=width(s);@+else {@+type(u)=vlist_node;height(u)=width(s); } @ @<Make the unset node |r| into an |hlist_node| of width |w|...@>= {@+height(r)=height(q);depth(r)=depth(q); if (t==width(r)) {@+glue_sign(r)=normal;glue_order(r)=normal; set_glue_ratio_zero(glue_set(r)); } else if (t > width(r)) {@+glue_sign(r)=stretching; if (glue_stretch(r)==0) set_glue_ratio_zero(glue_set(r)); else glue_set(r)=unfloat((t-width(r))/(double)glue_stretch(r)); @^real division@> } else{@+glue_order(r)=glue_sign(r);glue_sign(r)=shrinking; if (glue_shrink(r)==0) set_glue_ratio_zero(glue_set(r)); else if ((glue_order(r)==normal)&&(width(r)-t > glue_shrink(r))) set_glue_ratio_one(glue_set(r)); else glue_set(r)=unfloat((width(r)-t)/(double)glue_shrink(r)); } width(r)=w;type(r)=hlist_node; } @ @<Make the unset node |r| into a |vlist_node| of height |w|...@>= {@+width(r)=width(q); if (t==height(r)) {@+glue_sign(r)=normal;glue_order(r)=normal; set_glue_ratio_zero(glue_set(r)); } else if (t > height(r)) {@+glue_sign(r)=stretching; if (glue_stretch(r)==0) set_glue_ratio_zero(glue_set(r)); else glue_set(r)=unfloat((t-height(r))/(double)glue_stretch(r)); @^real division@> } else{@+glue_order(r)=glue_sign(r);glue_sign(r)=shrinking; if (glue_shrink(r)==0) set_glue_ratio_zero(glue_set(r)); else if ((glue_order(r)==normal)&&(height(r)-t > glue_shrink(r))) set_glue_ratio_one(glue_set(r)); else glue_set(r)=unfloat((height(r)-t)/(double)glue_shrink(r)); } height(r)=w;type(r)=vlist_node; } @ We now have a completed alignment, in the list that starts at |head| and ends at |tail|. This list will be merged with the one that encloses it. (In case the enclosing mode is |mmode|, for displayed formulas, we will need to insert glue before and after the display; that part of the program will be deferred until we're more familiar with such operations.) In restricted horizontal mode, the |clang| part of |aux| is undefined; an over-cautious \PASCAL\ runtime system may complain about this. @^dirty \PASCAL@> @<Insert the \(c)current list into its environment@>= aux_save=aux;p=link(head);q=tail;pop_nest(); if (mode==mmode) @<Finish an alignment in a display@>@; else{@+aux=aux_save;link(tail)=p; if (p!=null) tail=q; if (mode==vmode) build_page(); } @* Breaking paragraphs into lines. We come now to what is probably the most interesting algorithm of \TeX: the mechanism for choosing the ``best possible'' breakpoints that yield the individual lines of a paragraph. \TeX's line-breaking algorithm takes a given horizontal list and converts it to a sequence of boxes that are appended to the current vertical list. In the course of doing this, it creates a special data structure containing three kinds of records that are not used elsewhere in \TeX. Such nodes are created while a paragraph is being processed, and they are destroyed afterwards; thus, the other parts of \TeX\ do not need to know anything about how line-breaking is done. The method used here is based on an approach devised by Michael F. Plass and @^Plass, Michael Frederick@> @^Knuth, Donald Ervin@> the author in 1977, subsequently generalized and improved by the same two people in 1980. A detailed discussion appears in {\sl SOFTWARE---Practice \AM\ Experience \bf11} (1981), 1119--1184, where it is shown that the line-breaking problem can be regarded as a special case of the problem of computing the shortest path in an acyclic network. The cited paper includes numerous examples and describes the history of line breaking as it has been practiced by printers through the ages. The present implementation adds two new ideas to the algorithm of 1980: Memory space requirements are considerably reduced by using smaller records for inactive nodes than for active ones, and arithmetic overflow is avoided by using ``delta distances'' instead of keeping track of the total distance from the beginning of the paragraph to the current point. @ The |line_break| procedure should be invoked only in horizontal mode; it leaves that mode and places its output into the current vlist of the enclosing vertical mode (or internal vertical mode). There is one explicit parameter: |final_widow_penalty| is the amount of additional penalty to be inserted before the final line of the paragraph. There are also a number of implicit parameters: The hlist to be broken starts at |link(head)|, and it is nonempty. The value of |prev_graf| in the enclosing semantic level tells where the paragraph should begin in the sequence of line numbers, in case hanging indentation or \.{\\parshape} is in use; |prev_graf| is zero unless this paragraph is being continued after a displayed formula. Other implicit parameters, such as the |par_shape_ptr| and various penalties to use for hyphenation, etc., appear in |eqtb|. After |line_break| has acted, it will have updated the current vlist and the value of |prev_graf|. Furthermore, the global variable |just_box| will point to the final box created by |line_break|, so that the width of this line can be ascertained when it is necessary to decide whether to use |above_display_skip| or |above_display_short_skip| before a displayed formula. @<Glob...@>= pointer @!just_box; /*the |hlist_node| for the last line of the new paragraph*/ @ Since |line_break| is a rather lengthy procedure---sort of a small world unto itself---we must build it up little by little, somewhat more cautiously than we have done with the simpler procedures of \TeX. Here is the general outline. @p@t\4@>@<Declare subprocedures for |line_break|@>@; void line_break(int @!final_widow_penalty) {@+ @<Local variables for line breaking@>@; pack_begin_line=mode_line; /*this is for over/underfull box messages*/ @<Get ready to start line breaking@>; @<Find optimal breakpoints@>; @<Break the paragraph at the chosen breakpoints, justify the resulting lines to the correct widths, and append them to the current vertical list@>; @<Clean up the memory by removing the break nodes@>; pack_begin_line=0; } @ The first task is to move the list from |head| to |temp_head| and go into the enclosing semantic level. We also append the \.{\\parfillskip} glue to the end of the paragraph, removing a space (or other glue node) if it was there, since spaces usually precede blank lines and instances of `\.{\$\$}'. The |par_fill_skip| is preceded by an infinite penalty, so it will never be considered as a potential breakpoint. This code assumes that a |glue_node| and a |penalty_node| occupy the same number of |mem|~words. @^data structure assumptions@> @<Get ready to start...@>= link(temp_head)=link(head); if (is_char_node(tail)) tail_append(new_penalty(inf_penalty))@; else if (type(tail)!=glue_node) tail_append(new_penalty(inf_penalty))@; else{@+type(tail)=penalty_node;delete_glue_ref(glue_ptr(tail)); flush_node_list(leader_ptr(tail));penalty(tail)=inf_penalty; } link(tail)=new_param_glue(par_fill_skip_code); init_cur_lang=prev_graf%0200000; init_l_hyf=prev_graf/020000000; init_r_hyf=(prev_graf/0200000)%0100; pop_nest(); @ When looking for optimal line breaks, \TeX\ creates a ``break node'' for each break that is {\sl feasible}, in the sense that there is a way to end a line at the given place without requiring any line to stretch more than a given tolerance. A break node is characterized by three things: the position of the break (which is a pointer to a |glue_node|, |math_node|, |penalty_node|, or |disc_node|); the ordinal number of the line that will follow this breakpoint; and the fitness classification of the line that has just ended, i.e., |tight_fit|, |decent_fit|, |loose_fit|, or |very_loose_fit|. @d tight_fit 3 /*fitness classification for lines shrinking 0.5 to 1.0 of their shrinkability*/ @d loose_fit 1 /*fitness classification for lines stretching 0.5 to 1.0 of their stretchability*/ @d very_loose_fit 0 /*fitness classification for lines stretching more than their stretchability*/ @d decent_fit 2 /*fitness classification for all other lines*/ @ The algorithm essentially determines the best possible way to achieve each feasible combination of position, line, and fitness. Thus, it answers questions like, ``What is the best way to break the opening part of the paragraph so that the fourth line is a tight line ending at such-and-such a place?'' However, the fact that all lines are to be the same length after a certain point makes it possible to regard all sufficiently large line numbers as equivalent, when the looseness parameter is zero, and this makes it possible for the algorithm to save space and time. An ``active node'' and a ``passive node'' are created in |mem| for each feasible breakpoint that needs to be considered. Active nodes are three words long and passive nodes are two words long. We need active nodes only for breakpoints near the place in the paragraph that is currently being examined, so they are recycled within a comparatively short time after they are created. @ An active node for a given breakpoint contains six fields: \yskip\hang|link| points to the next node in the list of active nodes; the last active node has |link==last_active|. \yskip\hang|break_node| points to the passive node associated with this breakpoint. \yskip\hang|line_number| is the number of the line that follows this breakpoint. \yskip\hang|fitness| is the fitness classification of the line ending at this breakpoint. \yskip\hang|type| is either |hyphenated| or |unhyphenated|, depending on whether this breakpoint is a |disc_node|. \yskip\hang|total_demerits| is the minimum possible sum of demerits over all lines leading from the beginning of the paragraph to this breakpoint. \yskip\noindent The value of |link(active)| points to the first active node on a linked list of all currently active nodes. This list is in order by |line_number|, except that nodes with |line_number > easy_line| may be in any order relative to each other. @d active_node_size 3 /*number of words in active nodes*/ @d fitness subtype /*|very_loose_fit dotdot tight_fit| on final line for this break*/ @d break_node rlink /*pointer to the corresponding passive node*/ @d line_number llink /*line that begins at this breakpoint*/ @d total_demerits(X) mem[X+2].i /*the quantity that \TeX\ minimizes*/ @d unhyphenated 0 /*the |type| of a normal active break node*/ @d hyphenated 1 /*the |type| of an active node that breaks at a |disc_node|*/ @d last_active active /*the active list ends where it begins*/ @ @<Initialize the special list heads...@>= type(last_active)=hyphenated;line_number(last_active)=max_halfword; subtype(last_active)=0; /*the |subtype| is never examined by the algorithm*/ @ The passive node for a given breakpoint contains only four fields: \yskip\hang|link| points to the passive node created just before this one, if any, otherwise it is |null|. \yskip\hang|cur_break| points to the position of this breakpoint in the horizontal list for the paragraph being broken. \yskip\hang|prev_break| points to the passive node that should precede this one in an optimal path to this breakpoint. \yskip\hang|serial| is equal to |n| if this passive node is the |n|th one created during the current pass. (This field is used only when printing out detailed statistics about the line-breaking calculations.) \yskip\noindent There is a global variable called |passive| that points to the most recently created passive node. Another global variable, |printed_node|, is used to help print out the paragraph when detailed information about the line-breaking computation is being displayed. @d passive_node_size 2 /*number of words in passive nodes*/ @d cur_break rlink /*in passive node, points to position of this breakpoint*/ @d prev_break llink /*points to passive node that should precede this one*/ @d serial info /*serial number for symbolic identification*/ @<Glob...@>= pointer @!passive; /*most recent node on passive list*/ pointer @!printed_node; /*most recent node that has been printed*/ halfword @!pass_number; /*the number of passive nodes allocated on this pass*/ @ The active list also contains ``delta'' nodes that help the algorithm compute the badness of individual lines. Such nodes appear only between two active nodes, and they have |type==delta_node|. If |p| and |r| are active nodes and if |q| is a delta node between them, so that |link(p)==q| and |link(q)==r|, then |q| tells the space difference between lines in the horizontal list that start after breakpoint |p| and lines that start after breakpoint |r|. In other words, if we know the length of the line that starts after |p| and ends at our current position, then the corresponding length of the line that starts after |r| is obtained by adding the amounts in node~|q|. A delta node contains six scaled numbers, since it must record the net change in glue stretchability with respect to all orders of infinity. The natural width difference appears in |mem[q+1].sc|; the stretch differences in units of pt, fil, fill, and filll appear in |mem[q+2 dotdot q+5].sc|; and the shrink difference appears in |mem[q+6].sc|. The |subtype| field of a delta node is not used. @d delta_node_size 7 /*number of words in a delta node*/ @d delta_node 2 /*|type| field in a delta node*/ @ As the algorithm runs, it maintains a set of six delta-like registers for the length of the line following the first active breakpoint to the current position in the given hlist. When it makes a pass through the active list, it also maintains a similar set of six registers for the length following the active breakpoint of current interest. A third set holds the length of an empty line (namely, the sum of \.{\\leftskip} and \.{\\rightskip}); and a fourth set is used to create new delta nodes. When we pass a delta node we want to do operations like $$\hbox{\ignorespaces|for k=1 to 6 do cur_active_width[k]=cur_active_width[k]+mem[q+k].sc|};$$ and we want to do this without the overhead of |for| loops. The |do_all_six| macro makes such six-tuples convenient. @d do_all_six(X) X(1);X(2);X(3);X(4);X(5);X(6) @<Glob...@>= scaled @!active_width0[6], *const @!active_width = @!active_width0-1; /*distance from first active node to~|cur_p|*/ scaled @!cur_active_width0[6], *const @!cur_active_width = @!cur_active_width0-1; /*distance from current active node*/ scaled @!background0[6], *const @!background = @!background0-1; /*length of an ``empty'' line*/ scaled @!break_width0[6], *const @!break_width = @!break_width0-1; /*length being computed after current break*/ @ Let's state the principles of the delta nodes more precisely and concisely, so that the following programs will be less obscure. For each legal breakpoint~|p| in the paragraph, we define two quantities $\alpha(p)$ and $\beta(p)$ such that the length of material in a line from breakpoint~|p| to breakpoint~|q| is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$. Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from the beginning of the paragraph to a point ``after'' a break at |p| and to a point ``before'' a break at |q|; and $\gamma$ is the width of an empty line, namely the length contributed by \.{\\leftskip} and \.{\\rightskip}. Suppose, for example, that the paragraph consists entirely of alternating boxes and glue skips; let the boxes have widths $x_1\ldots x_n$ and let the skips have widths $y_1\ldots y_n$, so that the paragraph can be represented by $x_1y_1\ldots x_ny_n$. Let $p_i$ be the legal breakpoint at $y_i$; then $\alpha(p_i)=x_1+y_1+\cdots+x_i+y_i$, and $\beta(p_i)= x_1+y_1+\cdots+x_i$. To check this, note that the length of material from $p_2$ to $p_5$, say, is $\gamma+x_3+y_3+x_4+y_4+x_5=\gamma+\beta(p_5) -\alpha(p_2)$. The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and shrinkability as well as a natural width. If we were to compute $\alpha(p)$ and $\beta(p)$ for each |p|, we would need multiple precision arithmetic, and the multiprecise numbers would have to be kept in the active nodes. \TeX\ avoids this problem by working entirely with relative differences or ``deltas.'' Suppose, for example, that the active list contains $a_1\,\delta_1\,a_2\,\delta_2\,a_3$, where the |a|'s are active breakpoints and the $\delta$'s are delta nodes. Then $\delta_1=\alpha(a_1)-\alpha(a_2)$ and $\delta_2=\alpha(a_2)-\alpha(a_3)$. If the line breaking algorithm is currently positioned at some other breakpoint |p|, the |active_width| array contains the value $\gamma+\beta(p)-\alpha(a_1)$. If we are scanning through the list of active nodes and considering a tentative line that runs from $a_2$ to~|p|, say, the |cur_active_width| array will contain the value $\gamma+\beta(p)-\alpha(a_2)$. Thus, when we move from $a_2$ to $a_3$, we want to add $\alpha(a_2)-\alpha(a_3)$ to |cur_active_width|; and this is just $\delta_2$, which appears in the active list between $a_2$ and $a_3$. The |background| array contains $\gamma$. The |break_width| array will be used to calculate values of new delta nodes when the active list is being updated. @ Glue nodes in a horizontal list that is being paragraphed are not supposed to include ``infinite'' shrinkability; that is why the algorithm maintains four registers for stretching but only one for shrinking. If the user tries to introduce infinite shrinkability, the shrinkability will be reset to finite and an error message will be issued. A boolean variable |no_shrink_error_yet| prevents this error message from appearing more than once per paragraph. @d check_shrinkage(X) if ((shrink_order(X)!=normal)&&(shrink(X)!=0)) {@+X=finite_shrink(X); } @<Glob...@>= bool @!no_shrink_error_yet; /*have we complained about infinite shrinkage?*/ @ @<Declare subprocedures for |line_break|@>= pointer finite_shrink(pointer @!p) /*recovers from infinite shrinkage*/ {@+pointer q; /*new glue specification*/ if (no_shrink_error_yet) {@+no_shrink_error_yet=false; print_err("Infinite glue shrinkage found in a paragraph"); @.Infinite glue shrinkage...@> help5("The paragraph just ended includes some glue that has")@/ ("infinite shrinkability, e.g., `\\hskip 0pt minus 1fil'.")@/ ("Such glue doesn't belong there---it allows a paragraph")@/ ("of any length to fit on one line. But it's safe to proceed,")@/ ("since the offensive shrinkability has been made finite."); error(); } q=new_spec(p);shrink_order(q)=normal; delete_glue_ref(p);return q; } @ @<Get ready to start...@>= no_shrink_error_yet=true;@/ check_shrinkage(left_skip);check_shrinkage(right_skip);@/ q=left_skip;r=right_skip;background[1]=width(q)+width(r);@/ background[2]=0;background[3]=0;background[4]=0;background[5]=0;@/ background[2+stretch_order(q)]=stretch(q);@/ background[2+stretch_order(r)]=@|background[2+stretch_order(r)]+stretch(r);@/ background[6]=shrink(q)+shrink(r); @ A pointer variable |cur_p| runs through the given horizontal list as we look for breakpoints. This variable is global, since it is used both by |line_break| and by its subprocedure |try_break|. Another global variable called |threshold| is used to determine the feasibility of individual lines: Breakpoints are feasible if there is a way to reach them without creating lines whose badness exceeds |threshold|. (The badness is compared to |threshold| before penalties are added, so that penalty values do not affect the feasibility of breakpoints, except that no break is allowed when the penalty is 10000 or more.) If |threshold| is 10000 or more, all legal breaks are considered feasible, since the |badness| function specified above never returns a value greater than~10000. Up to three passes might be made through the paragraph in an attempt to find at least one set of feasible breakpoints. On the first pass, we have |threshold==pretolerance| and |second_pass==final_pass==false|. If this pass fails to find a feasible solution, |threshold| is set to |tolerance|, |second_pass| is set |true|, and an attempt is made to hyphenate as many words as possible. If that fails too, we add |emergency_stretch| to the background stretchability and set |final_pass==true|. @<Glob...@>= pointer @!cur_p; /*the current breakpoint under consideration*/ bool @!second_pass; /*is this our second attempt to break this paragraph?*/ bool @!final_pass; /*is this our final attempt to break this paragraph?*/ int @!threshold; /*maximum badness on feasible lines*/ @ The heart of the line-breaking procedure is `|try_break|', a subroutine that tests if the current breakpoint |cur_p| is feasible, by running through the active list to see what lines of text can be made from active nodes to~|cur_p|. If feasible breaks are possible, new break nodes are created. If |cur_p| is too far from an active node, that node is deactivated. The parameter |pi| to |try_break| is the penalty associated with a break at |cur_p|; we have |pi==eject_penalty| if the break is forced, and |pi==inf_penalty| if the break is illegal. The other parameter, |break_type|, is set to |hyphenated| or |unhyphenated|, depending on whether or not the current break is at a |disc_node|. The end of a paragraph is also regarded as `|hyphenated|'; this case is distinguishable by the condition |cur_p==null|. @d copy_to_cur_active(X) cur_active_width[X]=active_width[X] @<Declare subprocedures for |line_break|@>= void try_break(int @!pi, small_number @!break_type) {@+ pointer r; /*runs through the active list*/ pointer @!prev_r; /*stays a step behind |r|*/ halfword @!old_l; /*maximum line number in current equivalence class of lines*/ bool @!no_break_yet; /*have we found a feasible break at |cur_p|?*/ @<Other local variables for |try_break|@>@; @<Make sure that |pi| is in the proper range@>; no_break_yet=true;prev_r=active;old_l=0; do_all_six(copy_to_cur_active); loop@+{@+resume: r=link(prev_r); @<If node |r| is of type |delta_node|, update |cur_active_width|, set |prev_r| and |prev_prev_r|, then |goto continue|@>; @<If a line number class has ended, create new active nodes for the best feasible breaks in that class; then |return| if |r=last_active|, otherwise compute the new |line_width|@>; @<Consider the demerits for a line from |r| to |cur_p|; deactivate node |r| if it should no longer be active; then |goto continue| if a line from |r| to |cur_p| is infeasible, otherwise record a new feasible break@>; } end: ; #ifdef @!STAT @<Update the value of |printed_node| for symbolic displays@>; #endif @; } @ @<Other local variables for |try_break|@>= pointer @!prev_prev_r; /*a step behind |prev_r|, if |type(prev_r)==delta_node|*/ pointer @!s; /*runs through nodes ahead of |cur_p|*/ pointer @!q; /*points to a new node being created*/ pointer @!v; /*points to a glue specification or a node ahead of |cur_p|*/ int @!t; /*node count, if |cur_p| is a discretionary node*/ internal_font_number @!f; /*used in character width calculation*/ halfword @!l; /*line number of current active node*/ bool @!node_r_stays_active; /*should node |r| remain in the active list?*/ scaled @!line_width; /*the current line will be justified to this width*/ uint8_t @!fit_class; /*possible fitness class of test line*/ halfword @!b; /*badness of test line*/ int @!d; /*demerits of test line*/ bool @!artificial_demerits; /*has |d| been forced to zero?*/ #ifdef @!STAT pointer @!save_link; /*temporarily holds value of |link(cur_p)|*/ #endif scaled @!shortfall; /*used in badness calculations*/ @ @<Make sure that |pi| is in the proper range@>= if (abs(pi) >= inf_penalty) if (pi > 0) goto end; /*this breakpoint is inhibited by infinite penalty*/ else pi=eject_penalty /*this breakpoint will be forced*/ @ The following code uses the fact that |type(last_active)!=delta_node|. @d update_width(X) @| cur_active_width[X]=cur_active_width[X]+mem[r+X].sc @<If node |r|...@>= @^inner loop@> if (type(r)==delta_node) {@+do_all_six(update_width); prev_prev_r=prev_r;prev_r=r;goto resume; } @ As we consider various ways to end a line at |cur_p|, in a given line number class, we keep track of the best total demerits known, in an array with one entry for each of the fitness classifications. For example, |minimal_demerits[tight_fit]| contains the fewest total demerits of feasible line breaks ending at |cur_p| with a |tight_fit| line; |best_place[tight_fit]| points to the passive node for the break before~|cur_p| that achieves such an optimum; and |best_pl_line[tight_fit]| is the |line_number| field in the active node corresponding to |best_place[tight_fit]|. When no feasible break sequence is known, the |minimal_demerits| entries will be equal to |awful_bad|, which is $2^{30}-1$. Another variable, |minimum_demerits|, keeps track of the smallest value in the |minimal_demerits| array. @d awful_bad 07777777777 /*more than a billion demerits*/ @<Global...@>= int @!minimal_demerits0[tight_fit-very_loose_fit+1], *const @!minimal_demerits = @!minimal_demerits0-very_loose_fit; /*best total demerits known for current line class and position, given the fitness*/ int @!minimum_demerits; /*best total demerits known for current line class and position*/ pointer @!best_place0[tight_fit-very_loose_fit+1], *const @!best_place = @!best_place0-very_loose_fit; /*how to achieve |minimal_demerits|*/ halfword @!best_pl_line0[tight_fit-very_loose_fit+1], *const @!best_pl_line = @!best_pl_line0-very_loose_fit; /*corresponding line number*/ @ @<Get ready to start...@>= minimum_demerits=awful_bad; minimal_demerits[tight_fit]=awful_bad; minimal_demerits[decent_fit]=awful_bad; minimal_demerits[loose_fit]=awful_bad; minimal_demerits[very_loose_fit]=awful_bad; @ The first part of the following code is part of \TeX's inner loop, so we don't want to waste any time. The current active node, namely node |r|, contains the line number that will be considered next. At the end of the list we have arranged the data structure so that |r==last_active| and |line_number(last_active) > old_l|. @^inner loop@> @<If a line number class...@>= {@+l=line_number(r); if (l > old_l) {@+ /*now we are no longer in the inner loop*/ if ((minimum_demerits < awful_bad)&&@| ((old_l!=easy_line)||(r==last_active))) @<Create new active nodes for the best feasible breaks just found@>; if (r==last_active) goto end; @<Compute the new line width@>; } } @ It is not necessary to create new active nodes having |minimal_demerits| greater than |minimum_demerits+abs(adj_demerits)|, since such active nodes will never be chosen in the final paragraph breaks. This observation allows us to omit a substantial number of feasible breakpoints from further consideration. @<Create new active nodes...@>= {@+if (no_break_yet) @<Compute the values of |break_width|@>; @<Insert a delta node to prepare for breaks at |cur_p|@>; if (abs(adj_demerits) >= awful_bad-minimum_demerits) minimum_demerits=awful_bad-1; else minimum_demerits=minimum_demerits+abs(adj_demerits); for (fit_class=very_loose_fit; fit_class<=tight_fit; fit_class++) {@+if (minimal_demerits[fit_class] <= minimum_demerits) @<Insert a new active node from |best_place[fit_class]| to |cur_p|@>; minimal_demerits[fit_class]=awful_bad; } minimum_demerits=awful_bad; @<Insert a delta node to prepare for the next active node@>; } @ When we insert a new active node for a break at |cur_p|, suppose this new node is to be placed just before active node |a|; then we essentially want to insert `$\delta\,|cur_p|\,\delta^\prime$' before |a|, where $\delta=\alpha(a)-\alpha(|cur_p|)$ and $\delta^\prime=\alpha(|cur_p|)-\alpha(a)$ in the notation explained above. The |cur_active_width| array now holds $\gamma+\beta(|cur_p|)-\alpha(a)$; so $\delta$ can be obtained by subtracting |cur_active_width| from the quantity $\gamma+\beta(|cur_p|)- \alpha(|cur_p|)$. The latter quantity can be regarded as the length of a line ``from |cur_p| to |cur_p|''; we call it the |break_width| at |cur_p|. The |break_width| is usually negative, since it consists of the background (which is normally zero) minus the width of nodes following~|cur_p| that are eliminated after a break. If, for example, node |cur_p| is a glue node, the width of this glue is subtracted from the background; and we also look ahead to eliminate all subsequent glue and penalty and kern and math nodes, subtracting their widths as well. Kern nodes do not disappear at a line break unless they are |explicit|. @d set_break_width_to_background(X) break_width[X]=background[X] @<Compute the values of |break...@>= {@+no_break_yet=false;do_all_six(set_break_width_to_background); s=cur_p; if (break_type > unhyphenated) if (cur_p!=null) @<Compute the discretionary |break_width| values@>; while (s!=null) {@+if (is_char_node(s)) goto done; switch (type(s)) { case glue_node: @<Subtract glue from |break_width|@>@;@+break; case penalty_node: do_nothing;@+break; case math_node: break_width[1]=break_width[1]-width(s);@+break; case kern_node: if (subtype(s)!=explicit) goto done; else break_width[1]=break_width[1]-width(s);@+break; default:goto done; } @/ s=link(s); } done: ;} @ @<Subtract glue from |break...@>= {@+v=glue_ptr(s);break_width[1]=break_width[1]-width(v); break_width[2+stretch_order(v)]=break_width[2+stretch_order(v)]-stretch(v); break_width[6]=break_width[6]-shrink(v); } @ When |cur_p| is a discretionary break, the length of a line ``from |cur_p| to |cur_p|'' has to be defined properly so that the other calculations work out. Suppose that the pre-break text at |cur_p| has length $l_0$, the post-break text has length $l_1$, and the replacement text has length |l|. Suppose also that |q| is the node following the replacement text. Then length of a line from |cur_p| to |q| will be computed as $\gamma+\beta(q)-\alpha(|cur_p|)$, where $\beta(q)=\beta(|cur_p|)-l_0+l$. The actual length will be the background plus $l_1$, so the length from |cur_p| to |cur_p| should be $\gamma+l_0+l_1-l$. If the post-break text of the discretionary is empty, a break may also discard~|q|; in that unusual case we subtract the length of~|q| and any other nodes that will be discarded after the discretionary break. The value of $l_0$ need not be computed, since |line_break| will put it into the global variable |disc_width| before calling |try_break|. @<Glob...@>= scaled @!disc_width; /*the length of discretionary material preceding a break*/ @ @<Compute the discretionary |break...@>= {@+t=replace_count(cur_p);v=cur_p;s=post_break(cur_p); while (t > 0) {@+decr(t);v=link(v); @<Subtract the width of node |v| from |break_width|@>; } while (s!=null) {@+@<Add the width of node |s| to |break_width|@>; s=link(s); } break_width[1]=break_width[1]+disc_width; if (post_break(cur_p)==null) s=link(v); /*nodes may be discardable after the break*/ } @ Replacement texts and discretionary texts are supposed to contain only character nodes, kern nodes, ligature nodes, and box or rule nodes. @<Subtract the width of node |v|...@>= if (is_char_node(v)) {@+f=font(v); break_width[1]=break_width[1]-char_width(f)(char_info(f)(character(v))); } else switch (type(v)) { case ligature_node: {@+f=font(lig_char(v));@/ break_width[1]=@|break_width[1]- char_width(f)(char_info(f)(character(lig_char(v)))); } @+break; case hlist_node: case vlist_node: case rule_node: case kern_node: break_width[1]=break_width[1]-width(v);@+break; default:confusion(@[@<|"disc1"|@>@]); @:this can't happen disc1}{\quad disc1@> } @ @<Add the width of node |s| to |b...@>= if (is_char_node(s)) {@+f=font(s); break_width[1]=@|break_width[1]+char_width(f)(char_info(f)(character(s))); } else switch (type(s)) { case ligature_node: {@+f=font(lig_char(s)); break_width[1]=break_width[1]+ char_width(f)(char_info(f)(character(lig_char(s)))); } @+break; case hlist_node: case vlist_node: case rule_node: case kern_node: break_width[1]=break_width[1]+width(s);@+break; default:confusion(@[@<|"disc2"|@>@]); @:this can't happen disc2}{\quad disc2@> } @ We use the fact that |type(active)!=delta_node|. @d convert_to_break_width(X) @| mem[prev_r+X].sc=@|@t\hskip10pt@>mem[prev_r+X].sc -cur_active_width[X]+break_width[X] @d store_break_width(X) active_width[X]=break_width[X] @d new_delta_to_break_width(X) @| mem[q+X].sc=break_width[X]-cur_active_width[X] @<Insert a delta node to prepare for breaks at |cur_p|@>= if (type(prev_r)==delta_node) /*modify an existing delta node*/ {@+do_all_six(convert_to_break_width); } else if (prev_r==active) /*no delta node needed at the beginning*/ {@+do_all_six(store_break_width); } else{@+q=get_node(delta_node_size);link(q)=r;type(q)=delta_node;@/ subtype(q)=0; /*the |subtype| is not used*/ do_all_six(new_delta_to_break_width); link(prev_r)=q;prev_prev_r=prev_r;prev_r=q; } @ When the following code is performed, we will have just inserted at least one active node before |r|, so |type(prev_r)!=delta_node|. @d new_delta_from_break_width(X) @|mem[q+X].sc= cur_active_width[X]-break_width[X] @<Insert a delta node to prepare for the next active node@>= if (r!=last_active) {@+q=get_node(delta_node_size);link(q)=r;type(q)=delta_node;@/ subtype(q)=0; /*the |subtype| is not used*/ do_all_six(new_delta_from_break_width); link(prev_r)=q;prev_prev_r=prev_r;prev_r=q; } @ When we create an active node, we also create the corresponding passive node. @<Insert a new active node from |best_place[fit_class]| to |cur_p|@>= {@+q=get_node(passive_node_size); link(q)=passive;passive=q;cur_break(q)=cur_p; #ifdef @!STAT incr(pass_number);serial(q)=pass_number; #endif @;@/ prev_break(q)=best_place[fit_class];@/ q=get_node(active_node_size);break_node(q)=passive; line_number(q)=best_pl_line[fit_class]+1; fitness(q)=fit_class;type(q)=break_type; total_demerits(q)=minimal_demerits[fit_class]; link(q)=r;link(prev_r)=q;prev_r=q; #ifdef @!STAT if (tracing_paragraphs > 0) @<Print a symbolic description of the new break node@>; #endif @;@/ } @ @<Print a symbolic description of the new break node@>= {@+print_nl("@@@@");print_int(serial(passive)); @.\AT!\AT!@> print_str(": line ");print_int(line_number(q)-1); print_char('.');print_int(fit_class); if (break_type==hyphenated) print_char('-'); print_str(" t=");print_int(total_demerits(q)); print_str(" -> @@@@"); if (prev_break(passive)==null) print_char('0'); else print_int(serial(prev_break(passive))); } @ The length of lines depends on whether the user has specified \.{\\parshape} or \.{\\hangindent}. If |par_shape_ptr| is not null, it points to a $(2n+1)$-word record in |mem|, where the |info| in the first word contains the value of |n|, and the other $2n$ words contain the left margins and line lengths for the first |n| lines of the paragraph; the specifications for line |n| apply to all subsequent lines. If |par_shape_ptr==null|, the shape of the paragraph depends on the value of |n==hang_after|; if |n >= 0|, hanging indentation takes place on lines |n+1|, |n+2|, \dots, otherwise it takes place on lines 1, \dots, $\vert n\vert$. When hanging indentation is active, the left margin is |hang_indent|, if |hang_indent >= 0|, else it is 0; the line length is $|hsize|-\vert|hang_indent|\vert$. The normal setting is |par_shape_ptr==null|, |hang_after==1|, and |hang_indent==0|. Note that if |hang_indent==0|, the value of |hang_after| is irrelevant. @^length of lines@> @^hanging indentation@> @<Glob...@>= halfword @!easy_line; /*line numbers | > easy_line| are equivalent in break nodes*/ halfword @!last_special_line; /*line numbers | > last_special_line| all have the same width*/ scaled @!first_width; /*the width of all lines | <= last_special_line|, if no \.{\\parshape} has been specified*/ scaled @!second_width; /*the width of all lines | > last_special_line|*/ scaled @!first_indent; /*left margin to go with |first_width|*/ scaled @!second_indent; /*left margin to go with |second_width|*/ @ We compute the values of |easy_line| and the other local variables relating to line length when the |line_break| procedure is initializing itself. @<Get ready to start...@>= if (par_shape_ptr==null) if (hang_indent==0) {@+last_special_line=0;second_width=hsize; second_indent=0; } else@<Set line length parameters in preparation for hanging indentation@>@; else{@+last_special_line=info(par_shape_ptr)-1; second_width=mem[par_shape_ptr+2*(last_special_line+1)].sc; second_indent=mem[par_shape_ptr+2*last_special_line+1].sc; } if (looseness==0) easy_line=last_special_line; else easy_line=max_halfword @ @<Set line length parameters in preparation for hanging indentation@>= {@+last_special_line=abs(hang_after); if (hang_after < 0) {@+first_width=hsize-abs(hang_indent); if (hang_indent >= 0) first_indent=hang_indent; else first_indent=0; second_width=hsize;second_indent=0; } else{@+first_width=hsize;first_indent=0; second_width=hsize-abs(hang_indent); if (hang_indent >= 0) second_indent=hang_indent; else second_indent=0; } } @ When we come to the following code, we have just encountered the first active node~|r| whose |line_number| field contains |l|. Thus we want to compute the length of the $l\mskip1mu$th line of the current paragraph. Furthermore, we want to set |old_l| to the last number in the class of line numbers equivalent to~|l|. @<Compute the new line width@>= if (l > easy_line) {@+line_width=second_width;old_l=max_halfword-1; } else{@+old_l=l; if (l > last_special_line) line_width=second_width; else if (par_shape_ptr==null) line_width=first_width; else line_width=mem[par_shape_ptr+2*l@,].sc; } @ The remaining part of |try_break| deals with the calculation of demerits for a break from |r| to |cur_p|. The first thing to do is calculate the badness, |b|. This value will always be between zero and |inf_bad+1|; the latter value occurs only in the case of lines from |r| to |cur_p| that cannot shrink enough to fit the necessary width. In such cases, node |r| will be deactivated. We also deactivate node~|r| when a break at~|cur_p| is forced, since future breaks must go through a forced break. @<Consider the demerits for a line from |r| to |cur_p|...@>= {@+artificial_demerits=false;@/ @^inner loop@> shortfall=line_width-cur_active_width[1]; /*we're this much too short*/ if (shortfall > 0) @<Set the value of |b| to the badness for stretching the line, and compute the corresponding |fit_class|@>@; else@<Set the value of |b| to the badness for shrinking the line, and compute the corresponding |fit_class|@>; if ((b > inf_bad)||(pi==eject_penalty)) @<Prepare to deactivate node~|r|, and |goto deactivate| unless there is a reason to consider lines of text from |r| to |cur_p|@>@; else{@+prev_r=r; if (b > threshold) goto resume; node_r_stays_active=true; } @<Record a new feasible break@>; if (node_r_stays_active) goto resume; /*|prev_r| has been set to |r|*/ deactivate: @<Deactivate node |r|@>; } @ When a line must stretch, the available stretchability can be found in the subarray |cur_active_width[2 dotdot 5]|, in units of points, fil, fill, and filll. The present section is part of \TeX's inner loop, and it is most often performed when the badness is infinite; therefore it is worth while to make a quick test for large width excess and small stretchability, before calling the |badness| subroutine. @^inner loop@> @<Set the value of |b| to the badness for stretching...@>= if ((cur_active_width[3]!=0)||(cur_active_width[4]!=0)||@| (cur_active_width[5]!=0)) {@+b=0;fit_class=decent_fit; /*infinite stretch*/ } else{@+if (shortfall > 7230584) if (cur_active_width[2] < 1663497) {@+b=inf_bad;fit_class=very_loose_fit;goto done1; } b=badness(shortfall, cur_active_width[2]); if (b > 12) if (b > 99) fit_class=very_loose_fit; else fit_class=loose_fit; else fit_class=decent_fit; done1: ; } @ Shrinkability is never infinite in a paragraph; we can shrink the line from |r| to |cur_p| by at most |cur_active_width[6]|. @<Set the value of |b| to the badness for shrinking...@>= {@+if (-shortfall > cur_active_width[6]) b=inf_bad+1; else b=badness(-shortfall, cur_active_width[6]); if (b > 12) fit_class=tight_fit;@+else fit_class=decent_fit; } @ During the final pass, we dare not lose all active nodes, lest we lose touch with the line breaks already found. The code shown here makes sure that such a catastrophe does not happen, by permitting overfull boxes as a last resort. This particular part of \TeX\ was a source of several subtle bugs before the correct program logic was finally discovered; readers who seek to ``improve'' \TeX\ should therefore think thrice before daring to make any changes here. @^overfull boxes@> @<Prepare to deactivate node~|r|, and |goto deactivate| unless...@>= {@+if (final_pass&&(minimum_demerits==awful_bad)&&@| (link(r)==last_active)&& (prev_r==active)) artificial_demerits=true; /*set demerits zero, this break is forced*/ else if (b > threshold) goto deactivate; node_r_stays_active=false; } @ When we get to this part of the code, the line from |r| to |cur_p| is feasible, its badness is~|b|, and its fitness classification is |fit_class|. We don't want to make an active node for this break yet, but we will compute the total demerits and record them in the |minimal_demerits| array, if such a break is the current champion among all ways to get to |cur_p| in a given line-number class and fitness class. @<Record a new feasible break@>= if (artificial_demerits) d=0; else@<Compute the demerits, |d|, from |r| to |cur_p|@>; #ifdef @!STAT if (tracing_paragraphs > 0) @<Print a symbolic description of this feasible break@>; #endif @;@/ d=d+total_demerits(r); /*this is the minimum total demerits from the beginning to |cur_p| via |r|*/ if (d <= minimal_demerits[fit_class]) {@+minimal_demerits[fit_class]=d; best_place[fit_class]=break_node(r);best_pl_line[fit_class]=l; if (d < minimum_demerits) minimum_demerits=d; } @ @<Print a symbolic description of this feasible break@>= {@+if (printed_node!=cur_p) @<Print the list between |printed_node| and |cur_p|, then set |printed_node:=cur_p|@>; print_nl("@@"); @.\AT!@> if (cur_p==null) print_esc(@[@<|"par"|@>@]); else if (type(cur_p)!=glue_node) {@+if (type(cur_p)==penalty_node) print_esc(@[@<|"penalty"|@>@]); else if (type(cur_p)==disc_node) print_esc(@[@<|"discretionary"|@>@]); else if (type(cur_p)==kern_node) print_esc(@[@<|"kern"|@>@]); else print_esc(@[@<|"math"|@>@]); } print_str(" via @@@@"); if (break_node(r)==null) print_char('0'); else print_int(serial(break_node(r))); print_str(" b="); if (b > inf_bad) print_char('*');@+else print_int(b); @.*\relax@> print_str(" p=");print_int(pi);print_str(" d="); if (artificial_demerits) print_char('*');@+else print_int(d); } @ @<Print the list between |printed_node| and |cur_p|...@>= {@+print_nl(""); if (cur_p==null) short_display(link(printed_node)); else{@+save_link=link(cur_p); link(cur_p)=null;print_nl("");short_display(link(printed_node)); link(cur_p)=save_link; } printed_node=cur_p; } @ When the data for a discretionary break is being displayed, we will have printed the |pre_break| and |post_break| lists; we want to skip over the third list, so that the discretionary data will not appear twice. The following code is performed at the very end of |try_break|. @<Update the value of |printed_node|...@>= if (cur_p==printed_node) if (cur_p!=null) if (type(cur_p)==disc_node) {@+t=replace_count(cur_p); while (t > 0) {@+decr(t);printed_node=link(printed_node); } } @ @<Compute the demerits, |d|, from |r| to |cur_p|@>= {@+d=line_penalty+b; if (abs(d) >= 10000) d=100000000;@+else d=d*d; if (pi!=0) if (pi > 0) d=d+pi*pi; else if (pi > eject_penalty) d=d-pi*pi; if ((break_type==hyphenated)&&(type(r)==hyphenated)) if (cur_p!=null) d=d+double_hyphen_demerits; else d=d+final_hyphen_demerits; if (abs(fit_class-fitness(r)) > 1) d=d+adj_demerits; } @ When an active node disappears, we must delete an adjacent delta node if the active node was at the beginning or the end of the active list, or if it was surrounded by delta nodes. We also must preserve the property that |cur_active_width| represents the length of material from |link(prev_r)| to~|cur_p|. @d combine_two_deltas(X) @|mem[prev_r+X].sc=mem[prev_r+X].sc+mem[r+X].sc @d downdate_width(X) @|cur_active_width[X]=cur_active_width[X]- mem[prev_r+X].sc @<Deactivate node |r|@>= link(prev_r)=link(r);free_node(r, active_node_size); if (prev_r==active) @<Update the active widths, since the first active node has been deleted@>@; else if (type(prev_r)==delta_node) {@+r=link(prev_r); if (r==last_active) {@+do_all_six(downdate_width); link(prev_prev_r)=last_active; free_node(prev_r, delta_node_size);prev_r=prev_prev_r; } else if (type(r)==delta_node) {@+do_all_six(update_width); do_all_six(combine_two_deltas); link(prev_r)=link(r);free_node(r, delta_node_size); } } @ The following code uses the fact that |type(last_active)!=delta_node|. If the active list has just become empty, we do not need to update the |active_width| array, since it will be initialized when an active node is next inserted. @d update_active(X) active_width[X]=active_width[X]+mem[r+X].sc @<Update the active widths,...@>= {@+r=link(active); if (type(r)==delta_node) {@+do_all_six(update_active); do_all_six(copy_to_cur_active); link(active)=link(r);free_node(r, delta_node_size); } } @* Breaking paragraphs into lines, continued. So far we have gotten a little way into the |line_break| routine, having covered its important |try_break| subroutine. Now let's consider the rest of the process. The main loop of |line_break| traverses the given hlist, starting at |link(temp_head)|, and calls |try_break| at each legal breakpoint. A variable called |auto_breaking| is set to true except within math formulas, since glue nodes are not legal breakpoints when they appear in formulas. The current node of interest in the hlist is pointed to by |cur_p|. Another variable, |prev_p|, is usually one step behind |cur_p|, but the real meaning of |prev_p| is this: If |type(cur_p)==glue_node| then |cur_p| is a legal breakpoint if and only if |auto_breaking| is true and |prev_p| does not point to a glue node, penalty node, explicit kern node, or math node. The following declarations provide for a few other local variables that are used in special calculations. @<Local variables for line breaking@>= bool @!auto_breaking; /*is node |cur_p| outside a formula?*/ pointer @!prev_p; /*helps to determine when glue nodes are breakpoints*/ pointer @!q, @!r, @!s, @!prev_s; /*miscellaneous nodes of temporary interest*/ internal_font_number @!f; /*used when calculating character widths*/ @ The `\ignorespaces|loop|\unskip' in the following code is performed at most thrice per call of |line_break|, since it is actually a pass over the entire paragraph. @<Find optimal breakpoints@>= threshold=pretolerance; if (threshold >= 0) { #ifdef @!STAT if (tracing_paragraphs > 0) {@+begin_diagnostic();print_nl("@@firstpass");@+} @; #endif @;@/ second_pass=false;final_pass=false; } else{@+threshold=tolerance;second_pass=true; final_pass=(emergency_stretch <= 0); #ifdef @!STAT if (tracing_paragraphs > 0) begin_diagnostic(); #endif @; } loop@+{@+if (threshold > inf_bad) threshold=inf_bad; if (second_pass) @<Initialize for hyphenating a paragraph@>; @<Create an active breakpoint representing the beginning of the paragraph@>; cur_p=link(temp_head);auto_breaking=true;@/ prev_p=cur_p; /*glue at beginning is not a legal breakpoint*/ while ((cur_p!=null)&&(link(active)!=last_active)) @<Call |try_break| if |cur_p| is a legal breakpoint; on the second pass, also try to hyphenate the next word, if |cur_p| is a glue node; then advance |cur_p| to the next node of the paragraph that could possibly be a legal breakpoint@>; if (cur_p==null) @<Try the final line break at the end of the paragraph, and |goto done| if the desired breakpoints have been found@>; @<Clean up the memory by removing the break nodes@>; if (!second_pass) { #ifdef @!STAT if (tracing_paragraphs > 0) print_nl("@@secondpass");@; #endif threshold=tolerance;second_pass=true;final_pass=(emergency_stretch <= 0); } /*if at first you don't succeed, \dots*/ else{ #ifdef @!STAT if (tracing_paragraphs > 0) print_nl("@@emergencypass");@; #endif background[2]=background[2]+emergency_stretch;final_pass=true; } } done: #ifdef @!STAT if (tracing_paragraphs > 0) {@+end_diagnostic(true);normalize_selector(); } #endif @ The active node that represents the starting point does not need a corresponding passive node. @d store_background(X) active_width[X]=background[X] @<Create an active breakpoint representing the beginning of the paragraph@>= q=get_node(active_node_size); type(q)=unhyphenated;fitness(q)=decent_fit; link(q)=last_active;break_node(q)=null; line_number(q)=prev_graf+1;total_demerits(q)=0;link(active)=q; do_all_six(store_background);@/ passive=null;printed_node=temp_head;pass_number=0; font_in_short_display=null_font @ @<Clean...@>= q=link(active); while (q!=last_active) {@+cur_p=link(q); if (type(q)==delta_node) free_node(q, delta_node_size); else free_node(q, active_node_size); q=cur_p; } q=passive; while (q!=null) {@+cur_p=link(q); free_node(q, passive_node_size); q=cur_p; } @ Here is the main switch in the |line_break| routine, where legal breaks are determined. As we move through the hlist, we need to keep the |active_width| array up to date, so that the badness of individual lines is readily calculated by |try_break|. It is convenient to use the short name |act_width| for the component of active width that represents real width as opposed to glue. @d act_width active_width[1] /*length from first active node to current node*/ @d kern_break {@+if (!is_char_node(link(cur_p))&&auto_breaking) if (type(link(cur_p))==glue_node) try_break(0, unhyphenated); act_width=act_width+width(cur_p); } @<Call |try_break| if |cur_p| is a legal breakpoint...@>= {@+if (is_char_node(cur_p)) @<Advance \(c)|cur_p| to the node following the present string of characters@>; switch (type(cur_p)) { case hlist_node: case vlist_node: case rule_node: act_width=act_width+width(cur_p);@+break; case whatsit_node: @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>@;@+break; case glue_node: {@+@<If node |cur_p| is a legal breakpoint, call |try_break|; then update the active widths by including the glue in |glue_ptr(cur_p)|@>; if (second_pass&&auto_breaking) @<Try to hyphenate the following word@>; } @+break; case kern_node: if (subtype(cur_p)==explicit) kern_break@; else act_width=act_width+width(cur_p);@+break; case ligature_node: {@+f=font(lig_char(cur_p)); act_width=act_width+char_width(f)(char_info(f)(character(lig_char(cur_p)))); } @+break; case disc_node: @<Try to break after a discretionary fragment, then |goto done5|@>@; case math_node: {@+auto_breaking=(subtype(cur_p)==after);kern_break; } @+break; case penalty_node: try_break(penalty(cur_p), unhyphenated);@+break; case mark_node: case ins_node: case adjust_node: do_nothing;@+break; default:confusion(@[@<|"paragraph"|@>@]); @:this can't happen paragraph}{\quad paragraph@> } @/ prev_p=cur_p;cur_p=link(cur_p); done5: ;} @ The code that passes over the characters of words in a paragraph is part of \TeX's inner loop, so it has been streamlined for speed. We use the fact that `\.{\\parfillskip}' glue appears at the end of each paragraph; it is therefore unnecessary to check if |link(cur_p)==null| when |cur_p| is a character node. @^inner loop@> @<Advance \(c)|cur_p| to the node following the present string...@>= {@+prev_p=cur_p; @/do@+{f=font(cur_p); act_width=act_width+char_width(f)(char_info(f)(character(cur_p))); cur_p=link(cur_p); }@+ while (!(!is_char_node(cur_p))); } @ When node |cur_p| is a glue node, we look at |prev_p| to see whether or not a breakpoint is legal at |cur_p|, as explained above. @<If node |cur_p| is a legal breakpoint, call...@>= if (auto_breaking) {@+if (is_char_node(prev_p)) try_break(0, unhyphenated); else if (precedes_break(prev_p)) try_break(0, unhyphenated); else if ((type(prev_p)==kern_node)&&(subtype(prev_p)!=explicit)) try_break(0, unhyphenated); } check_shrinkage(glue_ptr(cur_p));q=glue_ptr(cur_p); act_width=act_width+width(q);@| active_width[2+stretch_order(q)]=@| active_width[2+stretch_order(q)]+stretch(q);@/ active_width[6]=active_width[6]+shrink(q) @ The following code knows that discretionary texts contain only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes. @<Try to break after a discretionary fragment...@>= {@+s=pre_break(cur_p);disc_width=0; if (s==null) try_break(ex_hyphen_penalty, hyphenated); else{@+@/do@+{@<Add the width of node |s| to |disc_width|@>; s=link(s); }@+ while (!(s==null)); act_width=act_width+disc_width; try_break(hyphen_penalty, hyphenated); act_width=act_width-disc_width; } r=replace_count(cur_p);s=link(cur_p); while (r > 0) {@+@<Add the width of node |s| to |act_width|@>; decr(r);s=link(s); } prev_p=cur_p;cur_p=s;goto done5; } @ @<Add the width of node |s| to |disc_width|@>= if (is_char_node(s)) {@+f=font(s); disc_width=disc_width+char_width(f)(char_info(f)(character(s))); } else switch (type(s)) { case ligature_node: {@+f=font(lig_char(s)); disc_width=disc_width+ char_width(f)(char_info(f)(character(lig_char(s)))); } @+break; case hlist_node: case vlist_node: case rule_node: case kern_node: disc_width=disc_width+width(s);@+break; default:confusion(@[@<|"disc3"|@>@]); @:this can't happen disc3}{\quad disc3@> } @ @<Add the width of node |s| to |act_width|@>= if (is_char_node(s)) {@+f=font(s); act_width=act_width+char_width(f)(char_info(f)(character(s))); } else switch (type(s)) { case ligature_node: {@+f=font(lig_char(s)); act_width=act_width+ char_width(f)(char_info(f)(character(lig_char(s)))); } @+break; case hlist_node: case vlist_node: case rule_node: case kern_node: act_width=act_width+width(s);@+break; default:confusion(@[@<|"disc4"|@>@]); @:this can't happen disc4}{\quad disc4@> } @ The forced line break at the paragraph's end will reduce the list of breakpoints so that all active nodes represent breaks at |cur_p==null|. On the first pass, we insist on finding an active node that has the correct ``looseness.'' On the final pass, there will be at least one active node, and we will match the desired looseness as well as we can. The global variable |best_bet| will be set to the active node for the best way to break the paragraph, and a few other variables are used to help determine what is best. @<Glob...@>= pointer @!best_bet; /*use this passive node and its predecessors*/ int @!fewest_demerits; /*the demerits associated with |best_bet|*/ halfword @!best_line; /*line number following the last line of the new paragraph*/ int @!actual_looseness; /*the difference between |line_number(best_bet)| and the optimum |best_line|*/ int @!line_diff; /*the difference between the current line number and the optimum |best_line|*/ @ @<Try the final line break at the end of the paragraph...@>= {@+try_break(eject_penalty, hyphenated); if (link(active)!=last_active) {@+@<Find an active node with fewest demerits@>; if (looseness==0) goto done; @<Find the best active node for the desired looseness@>; if ((actual_looseness==looseness)||final_pass) goto done; } } @ @<Find an active node...@>= r=link(active);fewest_demerits=awful_bad; @/do@+{if (type(r)!=delta_node) if (total_demerits(r) < fewest_demerits) {@+fewest_demerits=total_demerits(r);best_bet=r; } r=link(r); }@+ while (!(r==last_active)); best_line=line_number(best_bet) @ The adjustment for a desired looseness is a slightly more complicated version of the loop just considered. Note that if a paragraph is broken into segments by displayed equations, each segment will be subject to the looseness calculation, independently of the other segments. @<Find the best active node...@>= {@+r=link(active);actual_looseness=0; @/do@+{if (type(r)!=delta_node) {@+line_diff=line_number(r)-best_line; if (((line_diff < actual_looseness)&&(looseness <= line_diff))||@| ((line_diff > actual_looseness)&&(looseness >= line_diff))) {@+best_bet=r;actual_looseness=line_diff; fewest_demerits=total_demerits(r); } else if ((line_diff==actual_looseness)&&@| (total_demerits(r) < fewest_demerits)) {@+best_bet=r;fewest_demerits=total_demerits(r); } } r=link(r); }@+ while (!(r==last_active)); best_line=line_number(best_bet); } @ Once the best sequence of breakpoints has been found (hurray), we call on the procedure |post_line_break| to finish the remainder of the work. (By introducing this subprocedure, we are able to keep |line_break| from getting extremely long.) @<Break the paragraph at the chosen...@>= post_line_break(final_widow_penalty) @ The total number of lines that will be set by |post_line_break| is |best_line-prev_graf-1|. The last breakpoint is specified by |break_node(best_bet)|, and this passive node points to the other breakpoints via the |prev_break| links. The finishing-up phase starts by linking the relevant passive nodes in forward order, changing |prev_break| to |next_break|. (The |next_break| fields actually reside in the same memory space as the |prev_break| fields did, but we give them a new name because of their new significance.) Then the lines are justified, one by one. @d next_break prev_break /*new name for |prev_break| after links are reversed*/ @<Declare subprocedures for |line_break|@>= void post_line_break(int @!final_widow_penalty) {@+ pointer q, @!r, @!s; /*temporary registers for list manipulation*/ bool @!disc_break; /*was the current break at a discretionary node?*/ bool @!post_disc_break; /*and did it have a nonempty post-break part?*/ scaled @!cur_width; /*width of line number |cur_line|*/ scaled @!cur_indent; /*left margin of line number |cur_line|*/ quarterword @!t; /*used for replacement counts in discretionary nodes*/ int @!pen; /*use when calculating penalties between lines*/ halfword @!cur_line; /*the current line number being justified*/ @<Reverse the links of the relevant passive nodes, setting |cur_p| to the first breakpoint@>; cur_line=prev_graf+1; @/do@+{@<Justify the line ending at breakpoint |cur_p|, and append it to the current vertical list, together with associated penalties and other insertions@>; incr(cur_line);cur_p=next_break(cur_p); if (cur_p!=null) if (!post_disc_break) @<Prune unwanted nodes at the beginning of the next line@>; }@+ while (!(cur_p==null)); if ((cur_line!=best_line)||(link(temp_head)!=null)) confusion(@[@<|"line breaking"|@>@]); @:this can't happen line breaking}{\quad line breaking@> prev_graf=best_line-1; } @ The job of reversing links in a list is conveniently regarded as the job of taking items off one stack and putting them on another. In this case we take them off a stack pointed to by |q| and having |prev_break| fields; we put them on a stack pointed to by |cur_p| and having |next_break| fields. Node |r| is the passive node being moved from stack to stack. @<Reverse the links of the relevant passive nodes...@>= q=break_node(best_bet);cur_p=null; @/do@+{r=q;q=prev_break(q);next_break(r)=cur_p;cur_p=r; }@+ while (!(q==null)) @ Glue and penalty and kern and math nodes are deleted at the beginning of a line, except in the anomalous case that the node to be deleted is actually one of the chosen breakpoints. Otherwise the pruning done here is designed to match the lookahead computation in |try_break|, where the |break_width| values are computed for non-discretionary breakpoints. @<Prune unwanted nodes at the beginning of the next line@>= {@+r=temp_head; loop@+{@+q=link(r); if (q==cur_break(cur_p)) goto done1; /*|cur_break(cur_p)| is the next breakpoint*/ /*now |q| cannot be |null|*/ if (is_char_node(q)) goto done1; if (non_discardable(q)) goto done1; if (type(q)==kern_node) if (subtype(q)!=explicit) goto done1; r=q; /*now |type(q)==glue_node|, |kern_node|, |math_node| or |penalty_node|*/ } done1: if (r!=temp_head) {@+link(r)=null;flush_node_list(link(temp_head)); link(temp_head)=q; } } @ The current line to be justified appears in a horizontal list starting at |link(temp_head)| and ending at |cur_break(cur_p)|. If |cur_break(cur_p)| is a glue node, we reset the glue to equal the |right_skip| glue; otherwise we append the |right_skip| glue at the right. If |cur_break(cur_p)| is a discretionary node, we modify the list so that the discretionary break is compulsory, and we set |disc_break| to |true|. We also append the |left_skip| glue at the left of the line, unless it is zero. @<Justify the line ending at breakpoint |cur_p|, and append it...@>= @<Modify the end of the line to reflect the nature of the break and to include \.{\\rightskip}; also set the proper value of |disc_break|@>; @<Put the \(l)\.{\\leftskip} glue at the left and detach this line@>; @<Call the packaging subroutine, setting |just_box| to the justified box@>; @<Append the new box to the current vertical list, followed by the list of special nodes taken out of the box by the packager@>; @<Append a penalty node, if a nonzero penalty is appropriate@>@; @ At the end of the following code, |q| will point to the final node on the list about to be justified. @<Modify the end of the line...@>= q=cur_break(cur_p);disc_break=false;post_disc_break=false; if (q!=null) /*|q| cannot be a |char_node|*/ if (type(q)==glue_node) {@+delete_glue_ref(glue_ptr(q)); glue_ptr(q)=right_skip; subtype(q)=right_skip_code+1;add_glue_ref(right_skip); goto done; } else{@+if (type(q)==disc_node) @<Change discretionary to compulsory and set |disc_break:=true|@>@; else if ((type(q)==math_node)||(type(q)==kern_node)) width(q)=0; } else{@+q=temp_head; while (link(q)!=null) q=link(q); } @<Put the \(r)\.{\\rightskip} glue after node |q|@>; done: @ @<Change discretionary to compulsory...@>= {@+t=replace_count(q); @<Destroy the |t| nodes following |q|, and make |r| point to the following node@>; if (post_break(q)!=null) @<Transplant the post-break list@>; if (pre_break(q)!=null) @<Transplant the pre-break list@>; link(q)=r;disc_break=true; } @ @<Destroy the |t| nodes following |q|...@>= if (t==0) r=link(q); else{@+r=q; while (t > 1) {@+r=link(r);decr(t); } s=link(r); r=link(s);link(s)=null; flush_node_list(link(q));replace_count(q)=0; } @ We move the post-break list from inside node |q| to the main list by re\-attaching it just before the present node |r|, then resetting |r|. @<Transplant the post-break list@>= {@+s=post_break(q); while (link(s)!=null) s=link(s); link(s)=r;r=post_break(q);post_break(q)=null;post_disc_break=true; } @ We move the pre-break list from inside node |q| to the main list by re\-attaching it just after the present node |q|, then resetting |q|. @<Transplant the pre-break list@>= {@+s=pre_break(q);link(q)=s; while (link(s)!=null) s=link(s); pre_break(q)=null;q=s; } @ @<Put the \(r)\.{\\rightskip} glue after node |q|@>= r=new_param_glue(right_skip_code);link(r)=link(q);link(q)=r;q=r @ The following code begins with |q| at the end of the list to be justified. It ends with |q| at the beginning of that list, and with |link(temp_head)| pointing to the remainder of the paragraph, if any. @<Put the \(l)\.{\\leftskip} glue at the left...@>= r=link(q);link(q)=null;q=link(temp_head);link(temp_head)=r; if (left_skip!=zero_glue) {@+r=new_param_glue(left_skip_code); link(r)=q;q=r; } @ @<Append the new box to the current vertical list...@>= append_to_vlist(just_box); if (adjust_head!=adjust_tail) {@+link(tail)=link(adjust_head);tail=adjust_tail; } adjust_tail=null @ Now |q| points to the hlist that represents the current line of the paragraph. We need to compute the appropriate line width, pack the line into a box of this size, and shift the box by the appropriate amount of indentation. @<Call the packaging subroutine...@>= if (cur_line > last_special_line) {@+cur_width=second_width;cur_indent=second_indent; } else if (par_shape_ptr==null) {@+cur_width=first_width;cur_indent=first_indent; } else{@+cur_width=mem[par_shape_ptr+2*cur_line].sc; cur_indent=mem[par_shape_ptr+2*cur_line-1].sc; } adjust_tail=adjust_head;just_box=hpack(q, cur_width, exactly); shift_amount(just_box)=cur_indent @ Penalties between the lines of a paragraph come from club and widow lines, from the |inter_line_penalty| parameter, and from lines that end at discretionary breaks. Breaking between lines of a two-line paragraph gets both club-line and widow-line penalties. The local variable |pen| will be set to the sum of all relevant penalties for the current line, except that the final line is never penalized. @<Append a penalty node, if a nonzero penalty is appropriate@>= if (cur_line+1!=best_line) {@+pen=inter_line_penalty; if (cur_line==prev_graf+1) pen=pen+club_penalty; if (cur_line+2==best_line) pen=pen+final_widow_penalty; if (disc_break) pen=pen+broken_penalty; if (pen!=0) {@+r=new_penalty(pen); link(tail)=r;tail=r; } } @* Pre-hyphenation. When the line-breaking routine is unable to find a feasible sequence of breakpoints, it makes a second pass over the paragraph, attempting to hyphenate the hyphenatable words. The goal of hyphenation is to insert discretionary material into the paragraph so that there are more potential places to break. The general rules for hyphenation are somewhat complex and technical, because we want to be able to hyphenate words that are preceded or followed by punctuation marks, and because we want the rules to work for languages other than English. We also must contend with the fact that hyphens might radically alter the ligature and kerning structure of a word. A sequence of characters will be considered for hyphenation only if it belongs to a ``potentially hyphenatable part'' of the current paragraph. This is a sequence of nodes $p_0p_1\ldots p_m$ where $p_0$ is a glue node, $p_1\ldots p_{m-1}$ are either character or ligature or whatsit or implicit kern nodes, and $p_m$ is a glue or penalty or insertion or adjust or mark or whatsit or explicit kern node. (Therefore hyphenation is disabled by boxes, math formulas, and discretionary nodes already inserted by the user.) The ligature nodes among $p_1\ldots p_{m-1}$ are effectively expanded into the original non-ligature characters; the kern nodes and whatsits are ignored. Each character |c| is now classified as either a nonletter (if |lc_code(c)==0|), a lowercase letter (if |lc_code(c)==c|), or an uppercase letter (otherwise); an uppercase letter is treated as if it were |lc_code(c)| for purposes of hyphenation. The characters generated by $p_1\ldots p_{m-1}$ may begin with nonletters; let $c_1$ be the first letter that is not in the middle of a ligature. Whatsit nodes preceding $c_1$ are ignored; a whatsit found after $c_1$ will be the terminating node $p_m$. All characters that do not have the same font as $c_1$ will be treated as nonletters. The |hyphen_char| for that font must be between 0 and 255, otherwise hyphenation will not be attempted. \TeX\ looks ahead for as many consecutive letters $c_1\ldots c_n$ as possible; however, |n| must be less than 64, so a character that would otherwise be $c_{64}$ is effectively not a letter. Furthermore $c_n$ must not be in the middle of a ligature. In this way we obtain a string of letters $c_1\ldots c_n$ that are generated by nodes $p_a\ldots p_b$, where |1 <= a <= b+1 <= m|. If |n >= l_hyf+r_hyf|, this string qualifies for hyphenation; however, |uc_hyph| must be positive, if $c_1$ is uppercase. The hyphenation process takes place in three stages. First, the candidate sequence $c_1\ldots c_n$ is found; then potential positions for hyphens are determined by referring to hyphenation tables; and finally, the nodes $p_a\ldots p_b$ are replaced by a new sequence of nodes that includes the discretionary breaks found. Fortunately, we do not have to do all this calculation very often, because of the way it has been taken out of \TeX's inner loop. For example, when the second edition of the author's 700-page book {\sl Seminumerical Algorithms} was typeset by \TeX, only about 1.2 hyphenations needed to be @^Knuth, Donald Ervin@> tried per paragraph, since the line breaking algorithm needed to use two passes on only about 5 per cent of the paragraphs. @<Initialize for hyphenating...@>= { #ifdef @!INIT if (trie_not_ready) init_trie(); #endif @;@/ cur_lang=init_cur_lang;l_hyf=init_l_hyf;r_hyf=init_r_hyf; } @ The letters $c_1\ldots c_n$ that are candidates for hyphenation are placed into an array called |hc|; the number |n| is placed into |hn|; pointers to nodes $p_{a-1}$ and~$p_b$ in the description above are placed into variables |ha| and |hb|; and the font number is placed into |hf|. @<Glob...@>= uint16_t @!hc[66]; /*word to be hyphenated*/ small_number @!hn; /*the number of positions occupied in |hc|*/ pointer @!ha, @!hb; /*nodes |ha dotdot hb| should be replaced by the hyphenated result*/ internal_font_number @!hf; /*font number of the letters in |hc|*/ uint16_t @!hu[64]; /*like |hc|, before conversion to lowercase*/ int @!hyf_char; /*hyphen character of the relevant font*/ ASCII_code @!cur_lang, @!init_cur_lang; /*current hyphenation table of interest*/ int @!l_hyf, @!r_hyf, @!init_l_hyf, @!init_r_hyf; /*limits on fragment sizes*/ halfword @!hyf_bchar; /*boundary character after $c_n$*/ @ Hyphenation routines need a few more local variables. @<Local variables for line...@>= small_number @!j; /*an index into |hc| or |hu|*/ uint8_t @!c; /*character being considered for hyphenation*/ @ When the following code is activated, the |line_break| procedure is in its second pass, and |cur_p| points to a glue node. @<Try to hyphenate...@>= {@+prev_s=cur_p;s=link(prev_s); if (s!=null) {@+@<Skip to node |ha|, or |goto done1| if no hyphenation should be attempted@>; if (l_hyf+r_hyf > 63) goto done1; @<Skip to node |hb|, putting letters into |hu| and |hc|@>; @<Check that the nodes following |hb| permit hyphenation and that at least |l_hyf+r_hyf| letters have been found, otherwise |goto done1|@>; hyphenate(); } done1: ;} @ @<Declare subprocedures for |line_break|@>= @t\4@>@<Declare the function called |reconstitute|@>@; void hyphenate(void) {@+ @<Local variables for hyphenation@>@; @<Find hyphen locations for the word in |hc|, or |return|@>; @<If no hyphens were found, |return|@>; @<Replace nodes |ha..hb| by a sequence of nodes that includes the discretionary hyphens@>; } @ The first thing we need to do is find the node |ha| just before the first letter. @<Skip to node |ha|, or |goto done1|...@>= loop@+{@+if (is_char_node(s)) {@+c=qo(character(s));hf=font(s); } else if (type(s)==ligature_node) if (lig_ptr(s)==null) goto resume; else{@+q=lig_ptr(s);c=qo(character(q));hf=font(q); } else if ((type(s)==kern_node)&&(subtype(s)==normal)) goto resume; else if (type(s)==whatsit_node) {@+@<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>; goto resume; } else goto done1; if (lc_code(c)!=0) if ((lc_code(c)==c)||(uc_hyph > 0)) goto done2; else goto done1; resume: prev_s=s;s=link(prev_s); } done2: hyf_char=hyphen_char[hf]; if (hyf_char < 0) goto done1; if (hyf_char > 255) goto done1; ha=prev_s @ The word to be hyphenated is now moved to the |hu| and |hc| arrays. @<Skip to node |hb|, putting letters...@>= hn=0; loop@+{@+if (is_char_node(s)) {@+if (font(s)!=hf) goto done3; hyf_bchar=character(s);c=qo(hyf_bchar); if (lc_code(c)==0) goto done3; if (hn==63) goto done3; hb=s;incr(hn);hu[hn]=c;hc[hn]=lc_code(c);hyf_bchar=non_char; } else if (type(s)==ligature_node) @<Move the characters of a ligature node to |hu| and |hc|; but |goto done3| if they are not all letters@>@; else if ((type(s)==kern_node)&&(subtype(s)==normal)) {@+hb=s; hyf_bchar=font_bchar[hf]; } else goto done3; s=link(s); } done3: @ We let |j| be the index of the character being stored when a ligature node is being expanded, since we do not want to advance |hn| until we are sure that the entire ligature consists of letters. Note that it is possible to get to |done3| with |hn==0| and |hb| not set to any value. @<Move the characters of a ligature node to |hu| and |hc|...@>= {@+if (font(lig_char(s))!=hf) goto done3; j=hn;q=lig_ptr(s);@+if (q > null) hyf_bchar=character(q); while (q > null) {@+c=qo(character(q)); if (lc_code(c)==0) goto done3; if (j==63) goto done3; incr(j);hu[j]=c;hc[j]=lc_code(c);@/ q=link(q); } hb=s;hn=j; if (odd(subtype(s))) hyf_bchar=font_bchar[hf];@+else hyf_bchar=non_char; } @ @<Check that the nodes following |hb| permit hyphenation...@>= if (hn < l_hyf+r_hyf) goto done1; /*|l_hyf| and |r_hyf| are | >= 1|*/ loop@+{@+if (!(is_char_node(s))) switch (type(s)) { case ligature_node: do_nothing;@+break; case kern_node: if (subtype(s)!=normal) goto done4;@+break; case whatsit_node: case glue_node: case penalty_node: case ins_node: case adjust_node: case mark_node: goto done4; default:goto done1; } s=link(s); } done4: @* Post-hyphenation. If a hyphen may be inserted between |hc[j]| and |hc[j+1]|, the hyphenation procedure will set |hyf[j]| to some small odd number. But before we look at \TeX's hyphenation procedure, which is independent of the rest of the line-breaking algorithm, let us consider what we will do with the hyphens it finds, since it is better to work on this part of the program before forgetting what |ha| and |hb|, etc., are all about. @<Glob...@>= uint8_t @!hyf[65]; /*odd values indicate discretionary hyphens*/ pointer @!init_list; /*list of punctuation characters preceding the word*/ bool @!init_lig; /*does |init_list| represent a ligature?*/ bool @!init_lft; /*if so, did the ligature involve a left boundary?*/ @ @<Local variables for hyphenation@>= int @!i, @!j, @!l; /*indices into |hc| or |hu|*/ pointer @!q, @!r, @!s; /*temporary registers for list manipulation*/ halfword @!bchar; /*right boundary character of hyphenated word, or |non_char|*/ @ \TeX\ will never insert a hyphen that has fewer than \.{\\lefthyphenmin} letters before it or fewer than \.{\\righthyphenmin} after it; hence, a short word has comparatively little chance of being hyphenated. If no hyphens have been found, we can save time by not having to make any changes to the paragraph. @<If no hyphens were found, |return|@>= for (j=l_hyf; j<=hn-r_hyf; j++) if (odd(hyf[j])) goto found1; return; found1: @ If hyphens are in fact going to be inserted, \TeX\ first deletes the subsequence of nodes between |ha| and~|hb|. An attempt is made to preserve the effect that implicit boundary characters and punctuation marks had on ligatures inside the hyphenated word, by storing a left boundary or preceding character in |hu[0]| and by storing a possible right boundary in |bchar|. We set |j=0| if |hu[0]| is to be part of the reconstruction; otherwise |j=1|. The variable |s| will point to the tail of the current hlist, and |q| will point to the node following |hb|, so that things can be hooked up after we reconstitute the hyphenated word. @<Replace nodes |ha..hb| by a sequence of nodes...@>= q=link(hb);link(hb)=null;r=link(ha);link(ha)=null;bchar=hyf_bchar; if (is_char_node(ha)) if (font(ha)!=hf) goto found2; else{@+init_list=ha;init_lig=false;hu[0]=qo(character(ha)); } else if (type(ha)==ligature_node) if (font(lig_char(ha))!=hf) goto found2; else{@+init_list=lig_ptr(ha);init_lig=true;init_lft=(subtype(ha) > 1); hu[0]=qo(character(lig_char(ha))); if (init_list==null) if (init_lft) {@+hu[0]=256;init_lig=false; } /*in this case a ligature will be reconstructed from scratch*/ free_node(ha, small_node_size); } else{@+ /*no punctuation found; look for left boundary*/ if (!is_char_node(r)) if (type(r)==ligature_node) if (subtype(r) > 1) goto found2; j=1;s=ha;init_list=null;goto common_ending; } s=cur_p; /*we have |cur_p!=ha| because |type(cur_p)==glue_node|*/ while (link(s)!=ha) s=link(s); j=0;goto common_ending; found2: s=ha;j=0;hu[0]=256;init_lig=false;init_list=null; common_ending: flush_node_list(r); @<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>; flush_list(init_list) @ We must now face the fact that the battle is not over, even though the {\def\!{\kern-1pt}% hyphens have been found: The process of reconstituting a word can be nontrivial because ligatures might change when a hyphen is present. {\sl The \TeX book\/} discusses the difficulties of the word ``difficult'', and the discretionary material surrounding a hyphen can be considerably more complex than that. Suppose \.{abcdef} is a word in a font for which the only ligatures are \.{b\!c}, \.{c\!d}, \.{d\!e}, and \.{e\!f}. If this word permits hyphenation between \.b and \.c, the two patterns with and without hyphenation are $\.a\,\.b\,\.-\,\.{c\!d}\,\.{e\!f}$ and $\.a\,\.{b\!c}\,\.{d\!e}\,\.f$. Thus the insertion of a hyphen might cause effects to ripple arbitrarily far into the rest of the word. A further complication arises if additional hyphens appear together with such rippling, e.g., if the word in the example just given could also be hyphenated between \.c and \.d; \TeX\ avoids this by simply ignoring the additional hyphens in such weird cases.} Still further complications arise in the presence of ligatures that do not delete the original characters. When punctuation precedes the word being hyphenated, \TeX's method is not perfect under all possible scenarios, because punctuation marks and letters can propagate information back and forth. For example, suppose the original pre-hyphenation pair \.{*a} changes to \.{*y} via a \.{\?=:} ligature, which changes to \.{xy} via a \.{=:\?} ligature; if $p_{a-1}=\.x$ and $p_a=\.y$, the reconstitution procedure isn't smart enough to obtain \.{xy} again. In such cases the font designer should include a ligature that goes from \.{xa} to \.{xy}. @ The processing is facilitated by a subroutine called |reconstitute|. Given a string of characters $x_j\ldots x_n$, there is a smallest index $m\ge j$ such that the ``translation'' of $x_j\ldots x_n$ by ligatures and kerning has the form $y_1\ldots y_t$ followed by the translation of $x_{m+1}\ldots x_n$, where $y_1\ldots y_t$ is some nonempty sequence of character, ligature, and kern nodes. We call $x_j\ldots x_m$ a ``cut prefix'' of $x_j\ldots x_n$. For example, if $x_1x_2x_3=\.{fly}$, and if the font contains `fl' as a ligature and a kern between `fl' and `y', then $m=2$, $t=2$, and $y_1$ will be a ligature node for `fl' followed by an appropriate kern node~$y_2$. In the most common case, $x_j$~forms no ligature with $x_{j+1}$ and we simply have $m=j$, $y_1=x_j$. If $m<n$ we can repeat the procedure on $x_{m+1}\ldots x_n$ until the entire translation has been found. The |reconstitute| function returns the integer $m$ and puts the nodes $y_1\ldots y_t$ into a linked list starting at |link(hold_head)|, getting the input $x_j\ldots x_n$ from the |hu| array. If $x_j=256$, we consider $x_j$ to be an implicit left boundary character; in this case |j| must be strictly less than~|n|. There is a parameter |bchar|, which is either 256 or an implicit right boundary character assumed to be present just following~$x_n$. (The value |hu[n+1]| is never explicitly examined, but the algorithm imagines that |bchar| is there.) If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]| is odd and such that the result of |reconstitute| would have been different if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed| to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero. A special convention is used in the case |j==0|: Then we assume that the translation of |hu[0]| appears in a special list of charnodes starting at |init_list|; moreover, if |init_lig| is |true|, then |hu[0]| will be a ligature character, involving a left boundary if |init_lft| is |true|. This facility is provided for cases when a hyphenated word is preceded by punctuation (like single or double quotes) that might affect the translation of the beginning of the word. @<Glob...@>= small_number @!hyphen_passed; /*first hyphen in a ligature, if any*/ @ @<Declare the function called |reconstitute|@>= small_number reconstitute(small_number @!j, small_number @!n, halfword @!bchar, halfword @!hchar) {@+ pointer @!p; /*temporary register for list manipulation*/ pointer @!t; /*a node being appended to*/ four_quarters @!q; /*character information or a lig/kern instruction*/ halfword @!cur_rh; /*hyphen character for ligature testing*/ halfword @!test_char; /*hyphen or other character for ligature testing*/ scaled @!w; /*amount of kerning*/ font_index @!k; /*position of current lig/kern instruction*/ hyphen_passed=0;t=hold_head;w=0;link(hold_head)=null; /*at this point |ligature_present==lft_hit==rt_hit==false|*/ @<Set up data structures with the cursor following position |j|@>; resume: @<If there's a ligature or kern at the cursor position, update the data structures, possibly advancing~|j|; continue until the cursor moves@>; @<Append a ligature and/or kern to the translation; |goto continue| if the stack of inserted ligatures is nonempty@>; return j; } @ The reconstitution procedure shares many of the global data structures by which \TeX\ has processed the words before they were hyphenated. There is an implied ``cursor'' between characters |cur_l| and |cur_r|; these characters will be tested for possible ligature activity. If |ligature_present| then |cur_l| is a ligature character formed from the original characters following |cur_q| in the current translation list. There is a ``ligature stack'' between the cursor and character |j+1|, consisting of pseudo-ligature nodes linked together by their |link| fields. This stack is normally empty unless a ligature command has created a new character that will need to be processed later. A pseudo-ligature is a special node having a |character| field that represents a potential ligature and a |lig_ptr| field that points to a |char_node| or is |null|. We have $$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack > null|;\cr \hbox{|qi(hu[j+1])|},&if |lig_stack==null| and |j < n|;\cr bchar,&if |lig_stack==null| and |j==n|.\cr}$$ @<Glob...@>= halfword @!cur_l, @!cur_r; /*characters before and after the cursor*/ pointer @!cur_q; /*where a ligature should be detached*/ pointer @!lig_stack; /*unfinished business to the right of the cursor*/ bool @!ligature_present; /*should a ligature node be made for |cur_l|?*/ bool @!lft_hit, @!rt_hit; /*did we hit a ligature with a boundary character?*/ @ @d append_charnode_to_t(X) {@+link(t)=get_avail();t=link(t); font(t)=hf;character(t)=X; } @d set_cur_r {@+if (j < n) cur_r=qi(hu[j+1]);@+else cur_r=bchar; if (odd(hyf[j])) cur_rh=hchar;@+else cur_rh=non_char; } @<Set up data structures with the cursor following position |j|@>= cur_l=qi(hu[j]);cur_q=t; if (j==0) {@+ligature_present=init_lig;p=init_list; if (ligature_present) lft_hit=init_lft; while (p > null) {@+append_charnode_to_t(character(p));p=link(p); } } else if (cur_l < non_char) append_charnode_to_t(cur_l); lig_stack=null;set_cur_r @ We may want to look at the lig/kern program twice, once for a hyphen and once for a normal letter. (The hyphen might appear after the letter in the program, so we'd better not try to look for both at once.) @<If there's a ligature or kern at the cursor position, update...@>= if (cur_l==non_char) {@+k=bchar_label[hf]; if (k==non_address) goto done;@+else q=font_info[k].qqqq; } else{@+q=char_info(hf)(cur_l); if (char_tag(q)!=lig_tag) goto done; k=lig_kern_start(hf)(q);q=font_info[k].qqqq; if (skip_byte(q) > stop_flag) {@+k=lig_kern_restart(hf)(q);q=font_info[k].qqqq; } } /*now |k| is the starting address of the lig/kern program*/ if (cur_rh < non_char) test_char=cur_rh;@+else test_char=cur_r; loop@+{@+if (next_char(q)==test_char) if (skip_byte(q) <= stop_flag) if (cur_rh < non_char) {@+hyphen_passed=j;hchar=non_char;cur_rh=non_char; goto resume; } else{@+if (hchar < non_char) if (odd(hyf[j])) {@+hyphen_passed=j;hchar=non_char; } if (op_byte(q) < kern_flag) @<Carry out a ligature replacement, updating the cursor structure and possibly advancing~|j|; |goto continue| if the cursor doesn't advance, otherwise |goto done|@>; w=char_kern(hf)(q);goto done; /*this kern will be inserted below*/ } if (skip_byte(q) >= stop_flag) if (cur_rh==non_char) goto done; else{@+cur_rh=non_char;goto resume; } k=k+qo(skip_byte(q))+1;q=font_info[k].qqqq; } done: @ @d wrap_lig(X) if (ligature_present) {@+p=new_ligature(hf, cur_l, link(cur_q)); if (lft_hit) {@+subtype(p)=2;lft_hit=false; } if (X) if (lig_stack==null) {@+incr(subtype(p));rt_hit=false; } link(cur_q)=p;t=p;ligature_present=false; } @d pop_lig_stack {@+if (lig_ptr(lig_stack) > null) {@+link(t)=lig_ptr(lig_stack); /*this is a charnode for |hu[j+1]|*/ t=link(t);incr(j); } p=lig_stack;lig_stack=link(p);free_node(p, small_node_size); if (lig_stack==null) set_cur_r@;@+else cur_r=character(lig_stack); } /*if |lig_stack| isn't |null| we have |cur_rh==non_char|*/ @<Append a ligature and/or kern to the translation...@>= wrap_lig(rt_hit); if (w!=0) {@+link(t)=new_kern(w);t=link(t);w=0; } if (lig_stack > null) {@+cur_q=t;cur_l=character(lig_stack);ligature_present=true; pop_lig_stack;goto resume; } @ @<Carry out a ligature replacement, updating the cursor structure...@>= {@+if (cur_l==non_char) lft_hit=true; if (j==n) if (lig_stack==null) rt_hit=true; check_interrupt; /*allow a way out in case there's an infinite ligature loop*/ switch (op_byte(q)) { qi(case 1): qi(case 5): {@+cur_l=rem_byte(q); /*\.{=:\?}, \.{=:\?>}*/ ligature_present=true; } @+break; qi(case 2): qi(case 6): {@+cur_r=rem_byte(q); /*\.{\?=:}, \.{\?=:>}*/ if (lig_stack > null) character(lig_stack)=cur_r; else{@+lig_stack=new_lig_item(cur_r); if (j==n) bchar=non_char; else{@+p=get_avail();lig_ptr(lig_stack)=p; character(p)=qi(hu[j+1]);font(p)=hf; } } } @+break; qi(case 3): {@+cur_r=rem_byte(q); /*\.{\?=:\?}*/ p=lig_stack;lig_stack=new_lig_item(cur_r);link(lig_stack)=p; } @+break; qi(case 7): qi(case 11): {@+wrap_lig(false); /*\.{\?=:\?>}, \.{\?=:\?>>}*/ cur_q=t;cur_l=rem_byte(q);ligature_present=true; } @+break; default:{@+cur_l=rem_byte(q);ligature_present=true; /*\.{=:}*/ if (lig_stack > null) pop_lig_stack@; else if (j==n) goto done; else{@+append_charnode_to_t(cur_r);incr(j);set_cur_r; } } } if (op_byte(q) > qi(4)) if (op_byte(q)!=qi(7)) goto done; goto resume; } @ Okay, we're ready to insert the potential hyphenations that were found. When the following program is executed, we want to append the word |hu[1 dotdot hn]| after node |ha|, and node |q| should be appended to the result. During this process, the variable |i| will be a temporary index into |hu|; the variable |j| will be an index to our current position in |hu|; the variable |l| will be the counterpart of |j|, in a discretionary branch; the variable |r| will point to new nodes being created; and we need a few new local variables: @<Local variables for hyph...@>= pointer @!major_tail, @!minor_tail; /*the end of lists in the main and discretionary branches being reconstructed*/ ASCII_code @!c; /*character temporarily replaced by a hyphen*/ uint8_t @!c_loc; /*where that character came from*/ int @!r_count; /*replacement count for discretionary*/ pointer @!hyf_node; /*the hyphen, if it exists*/ @ When the following code is performed, |hyf[0]| and |hyf[hn]| will be zero. @<Reconstitute nodes for the hyphenated word...@>= @/do@+{l=j;j=reconstitute(j, hn, bchar, qi(hyf_char))+1; if (hyphen_passed==0) {@+link(s)=link(hold_head); while (link(s) > null) s=link(s); if (odd(hyf[j-1])) {@+l=j;hyphen_passed=j-1;link(hold_head)=null; } } if (hyphen_passed > 0) @<Create and append a discretionary node as an alternative to the unhyphenated word, and continue to develop both branches until they become equivalent@>; }@+ while (!(j > hn)); link(s)=q @ In this repeat loop we will insert another discretionary if |hyf[j-1]| is odd, when both branches of the previous discretionary end at position |j-1|. Strictly speaking, we aren't justified in doing this, because we don't know that a hyphen after |j-1| is truly independent of those branches. But in almost all applications we would rather not lose a potentially valuable hyphenation point. (Consider the word `difficult', where the letter `c' is in position |j|.) @d advance_major_tail {@+major_tail=link(major_tail);incr(r_count); } @<Create and append a discretionary node as an alternative...@>= @/do@+{r=get_node(small_node_size); link(r)=link(hold_head);type(r)=disc_node; major_tail=r;r_count=0; while (link(major_tail) > null) advance_major_tail; i=hyphen_passed;hyf[i]=0; @<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>; @<Put the \(c)characters |hu[i+1..]| into |post_break(r)|, appending to this list and to |major_tail| until synchronization has been achieved@>; @<Move pointer |s| to the end of the current list, and set |replace_count(r)| appropriately@>; hyphen_passed=j-1;link(hold_head)=null; }@+ while (!(!odd(hyf[j-1]))) @ The new hyphen might combine with the previous character via ligature or kern. At this point we have |l-1 <= i < j| and |i < hn|. @<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>= minor_tail=null;pre_break(r)=null;hyf_node=new_character(hf, hyf_char); if (hyf_node!=null) {@+incr(i);c=hu[i];hu[i]=hyf_char;free_avail(hyf_node); } while (l <= i) {@+l=reconstitute(l, i, font_bchar[hf], non_char)+1; if (link(hold_head) > null) {@+if (minor_tail==null) pre_break(r)=link(hold_head); else link(minor_tail)=link(hold_head); minor_tail=link(hold_head); while (link(minor_tail) > null) minor_tail=link(minor_tail); } } if (hyf_node!=null) {@+hu[i]=c; /*restore the character in the hyphen position*/ l=i;decr(i); } @ The synchronization algorithm begins with |l==i+1 <= j|. @<Put the \(c)characters |hu[i+1..]| into |post_break(r)|...@>= minor_tail=null;post_break(r)=null;c_loc=0; if (bchar_label[hf]!=non_address) /*put left boundary at beginning of new line*/ {@+decr(l);c=hu[l];c_loc=l;hu[l]=256; } while (l < j) {@+@/do@+{l=reconstitute(l, hn, bchar, non_char)+1; if (c_loc > 0) {@+hu[c_loc]=c;c_loc=0; } if (link(hold_head) > null) {@+if (minor_tail==null) post_break(r)=link(hold_head); else link(minor_tail)=link(hold_head); minor_tail=link(hold_head); while (link(minor_tail) > null) minor_tail=link(minor_tail); } }@+ while (!(l >= j)); while (l > j) @<Append characters of |hu[j..]| to |major_tail|, advancing~|j|@>; } @ @<Append characters of |hu[j..]|...@>= {@+j=reconstitute(j, hn, bchar, non_char)+1; link(major_tail)=link(hold_head); while (link(major_tail) > null) advance_major_tail; } @ Ligature insertion can cause a word to grow exponentially in size. Therefore we must test the size of |r_count| here, even though the hyphenated text was at most 63 characters long. @<Move pointer |s| to the end of the current list...@>= if (r_count > 127) /*we have to forget the discretionary hyphen*/ {@+link(s)=link(r);link(r)=null;flush_node_list(r); } else{@+link(s)=r;replace_count(r)=r_count; } s=major_tail @* Hyphenation. When a word |hc[1 dotdot hn]| has been set up to contain a candidate for hyphenation, \TeX\ first looks to see if it is in the user's exception dictionary. If not, hyphens are inserted based on patterns that appear within the given word, using an algorithm due to Frank~M. Liang. @^Liang, Franklin Mark@> Let's consider Liang's method first, since it is much more interesting than the exception-lookup routine. The algorithm begins by setting |hyf[j]| to zero for all |j|, and invalid characters are inserted into |hc[0]| and |hc[hn+1]| to serve as delimiters. Then a reasonably fast method is used to see which of a given set of patterns occurs in the word |hc[0 dotdot(hn+1)]|. Each pattern $p_1\ldots p_k$ of length |k| has an associated sequence of |k+1| numbers $n_0\ldots n_k$; and if the pattern occurs in |hc[(j+1)dotdot(j+k)]|, \TeX\ will set |hyf[j+i]=@tmax@>(hyf[j+i],@t$n_i$@>)| for |0 <= i <= k|. After this has been done for each pattern that occurs, a discretionary hyphen will be inserted between |hc[j]| and |hc[j+1]| when |hyf[j]| is odd, as we have already seen. The set of patterns $p_1\ldots p_k$ and associated numbers $n_0\ldots n_k$ depends, of course, on the language whose words are being hyphenated, and on the degree of hyphenation that is desired. A method for finding appropriate |p|'s and |n|'s, from a given dictionary of words and acceptable hyphenations, is discussed in Liang's Ph.D. thesis (Stanford University, 1983); \TeX\ simply starts with the patterns and works from there. @ The patterns are stored in a compact table that is also efficient for retrieval, using a variant of ``trie memory'' [cf.\ {\sl The Art of Computer Programming \bf3} (1973), 481--505]. We can find each pattern $p_1\ldots p_k$ by letting $z_0$ be one greater than the relevant language index and then, for |1 <= i <= k|, setting |@t$z_i$@>=trie_link@t$(z_{i-1})+p_i$@>|; the pattern will be identified by the number $z_k$. Since all the pattern information is packed together into a single |trie_link| array, it is necessary to prevent confusion between the data from inequivalent patterns, so another table is provided such that |trie_char@t$(z_i)=p_i$@>| for all |i|. There is also a table |trie_op|$(z_k)$ to identify the numbers $n_0\ldots n_k$ associated with $p_1\ldots p_k$. Comparatively few different number sequences $n_0\ldots n_k$ actually occur, since most of the |n|'s are generally zero. Therefore the number sequences are encoded in such a way that |trie_op|$(z_k)$ is only one byte long. If |trie_op(@t$z_k$@>)!=min_quarterword|, when $p_1\ldots p_k$ has matched the letters in |hc[(l-k+1)dotdot l@,]| of language |t|, we perform all of the required operations for this pattern by carrying out the following little program: Set |v=trie_op(@t$z_k$@>)|. Then set |v=v+op_start[t]|, |hyf[l-hyf_distance[v]]=@tmax@>(hyf[l-hyf_distance[v]], hyf_num[v])|, and |v=hyf_next[v]|; repeat, if necessary, until |v==min_quarterword|. @<Types...@>= typedef uint16_t trie_pointer; /*an index into |trie|*/ @ @d trie_link(X) trie[X].rh /*``downward'' link in a trie*/ @d trie_char(X) trie[X].b1 /*character matched at this trie location*/ @d trie_op(X) trie[X].b0 /*program for hyphenation at this trie location*/ @<Glob...@>= two_halves @!trie[trie_size+1]; /*|trie_link|, |trie_char|, |trie_op|*/ small_number @!hyf_distance0[trie_op_size], *const @!hyf_distance = @!hyf_distance0-1; /*position |k-j| of $n_j$*/ small_number @!hyf_num0[trie_op_size], *const @!hyf_num = @!hyf_num0-1; /*value of $n_j$*/ quarterword @!hyf_next0[trie_op_size], *const @!hyf_next = @!hyf_next0-1; /*continuation code*/ uint16_t @!op_start[256]; /*offset for current language*/ @ @<Local variables for hyph...@>= trie_pointer @!z; /*an index into |trie|*/ int @!v; /*an index into |hyf_distance|, etc.*/ @ Assuming that these auxiliary tables have been set up properly, the hyphenation algorithm is quite short. In the following code we set |hc[hn+2]| to the impossible value 256, in order to guarantee that |hc[hn+3]| will never be fetched. @<Find hyphen locations for the word in |hc|...@>= for (j=0; j<=hn; j++) hyf[j]=0; @<Look for the word |hc[1..hn]| in the exception table, and |goto found| (with |hyf| containing the hyphens) if an entry is found@>; if (trie_char(cur_lang+1)!=qi(cur_lang)) return; /*no patterns for |cur_lang|*/ hc[0]=0;hc[hn+1]=0;hc[hn+2]=256; /*insert delimiters*/ for (j=0; j<=hn-r_hyf+1; j++) {@+z=trie_link(cur_lang+1)+hc[j];l=j; while (hc[l]==qo(trie_char(z))) {@+if (trie_op(z)!=min_quarterword) @<Store \(m)maximum values in the |hyf| table@>; incr(l);z=trie_link(z)+hc[l]; } } found: for (j=0; j<=l_hyf-1; j++) hyf[j]=0; for (j=0; j<=r_hyf-1; j++) hyf[hn-j]=0 @ @<Store \(m)maximum values in the |hyf| table@>= {@+v=trie_op(z); @/do@+{v=v+op_start[cur_lang];i=l-hyf_distance[v]; if (hyf_num[v] > hyf[i]) hyf[i]=hyf_num[v]; v=hyf_next[v]; }@+ while (!(v==min_quarterword)); } @ The exception table that is built by \TeX's \.{\\hyphenation} primitive is organized as an ordered hash table [cf.\ Amble and Knuth, {\sl The Computer @^Amble, Ole@> @^Knuth, Donald Ervin@> Journal\/ \bf17} (1974), 135--142] using linear probing. If $\alpha$ and $\beta$ are words, we will say that $\alpha<\beta$ if $\vert\alpha\vert< \vert\beta\vert$ or if $\vert\alpha\vert=\vert\beta\vert$ and $\alpha$ is lexicographically smaller than $\beta$. (The notation $\vert \alpha\vert$ stands for the length of $\alpha$.) The idea of ordered hashing is to arrange the table so that a given word $\alpha$ can be sought by computing a hash address $h=h(\alpha)$ and then looking in table positions |h|, |h-1|, \dots, until encountering the first word $\L\alpha$. If this word is different from $\alpha$, we can conclude that $\alpha$ is not in the table. The words in the table point to lists in |mem| that specify hyphen positions in their |info| fields. The list for $c_1\ldots c_n$ contains the number |k| if the word $c_1\ldots c_n$ has a discretionary hyphen between $c_k$ and $c_{k+1}$. @<Types...@>= typedef uint16_t hyph_pointer; /*an index into the ordered hash table*/ @ @<Glob...@>= str_number @!hyph_word[hyph_size+1]; /*exception words*/ pointer @!hyph_list[hyph_size+1]; /*lists of hyphen positions*/ hyph_pointer @!hyph_count; /*the number of words in the exception dictionary*/ @ @<Local variables for init...@>= hyph_pointer @!z; /*runs through the exception dictionary*/ @ @<Set init...@>= for (z=0; z<=hyph_size; z++) {@+hyph_word[z]=0;hyph_list[z]=null; } hyph_count=0; @ The algorithm for exception lookup is quite simple, as soon as we have a few more local variables to work with. @<Local variables for hyph...@>= hyph_pointer @!h; /*an index into |hyph_word| and |hyph_list|*/ str_number @!k; /*an index into |str_start|*/ pool_pointer @!u; /*an index into |str_pool|*/ @ First we compute the hash code |h|, then we search until we either find the word or we don't. Words from different languages are kept separate by appending the language code to the string. @<Look for the word |hc[1...@>= h=hc[1];incr(hn);hc[hn]=cur_lang; for (j=2; j<=hn; j++) h=(h+h+hc[j])%hyph_size; loop@+{@+@<If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|, |goto not_found|; but if the two strings are equal, set |hyf| to the hyphen positions and |goto found|@>; if (h > 0) decr(h);@+else h=hyph_size; } not_found: decr(hn) @ @<If the string |hyph_word[h]| is less than \(hc)...@>= k=hyph_word[h];if (k==0) goto not_found; if (length(k) < hn) goto not_found; if (length(k)==hn) {@+j=1;u=str_start[k]; @/do@+{if (so(str_pool[u]) < hc[j]) goto not_found; if (so(str_pool[u]) > hc[j]) goto done; incr(j);incr(u); }@+ while (!(j > hn)); @<Insert hyphens as specified in |hyph_list[h]|@>; decr(hn);goto found; } done: @ @<Insert hyphens as specified...@>= s=hyph_list[h]; while (s!=null) {@+hyf[info(s)]=1;s=link(s); } @ @<Search |hyph_list| for pointers to |p|@>= for (q=0; q<=hyph_size; q++) {@+if (hyph_list[q]==p) {@+print_nl("HYPH(");print_int(q);print_char(')'); } } @ We have now completed the hyphenation routine, so the |line_break| procedure is finished at last. Since the hyphenation exception table is fresh in our minds, it's a good time to deal with the routine that adds new entries to it. When \TeX\ has scanned `\.{\\hyphenation}', it calls on a procedure named |new_hyph_exceptions| to do the right thing. @d set_cur_lang if (language <= 0) cur_lang=0; else if (language > 255) cur_lang=0; else cur_lang=language @p void new_hyph_exceptions(void) /*enters new exceptions*/ {@+ uint8_t n; /*length of current word; not always a |small_number|*/ int @!j; /*an index into |hc|*/ hyph_pointer @!h; /*an index into |hyph_word| and |hyph_list|*/ str_number @!k; /*an index into |str_start|*/ pointer @!p; /*head of a list of hyphen positions*/ pointer @!q; /*used when creating a new node for list |p|*/ str_number @!s, @!t; /*strings being compared or stored*/ pool_pointer @!u, @!v; /*indices into |str_pool|*/ scan_left_brace(); /*a left brace must follow \.{\\hyphenation}*/ set_cur_lang; @<Enter as many hyphenation exceptions as are listed, until coming to a right brace; then |return|@>; } @ @<Enter as many...@>= n=0;p=null; loop@+{@+get_x_token(); reswitch: switch (cur_cmd) { case letter: case other_char: case char_given: @<Append a new letter or hyphen@>@;@+break; case char_num: {@+scan_char_num();cur_chr=cur_val;cur_cmd=char_given; goto reswitch; } case spacer: case right_brace: {@+if (n > 1) @<Enter a hyphenation exception@>; if (cur_cmd==right_brace) return; n=0;p=null; } @+break; default:@<Give improper \.{\\hyphenation} error@>@; } } @ @<Give improper \.{\\hyph...@>= {@+print_err("Improper ");print_esc(@[@<|"hyphenation"|@>@]); @.Improper \\hyphenation...@> print_str(" will be flushed"); help2("Hyphenation exceptions must contain only letters")@/ ("and hyphens. But continue; I'll forgive and forget."); error(); } @ @<Append a new letter or hyphen@>= if (cur_chr=='-') @<Append the value |n| to list |p|@>@; else{@+if (lc_code(cur_chr)==0) {@+print_err("Not a letter"); @.Not a letter@> help2("Letters in \\hyphenation words must have \\lccode>0.")@/ ("Proceed; I'll ignore the character I just read."); error(); } else if (n < 63) {@+incr(n);hc[n]=lc_code(cur_chr); } } @ @<Append the value |n| to list |p|@>= {@+if (n < 63) {@+q=get_avail();link(q)=p;info(q)=n;p=q; } } @ @<Enter a hyphenation exception@>= {@+incr(n);hc[n]=cur_lang;str_room(n);h=0; for (j=1; j<=n; j++) {@+h=(h+h+hc[j])%hyph_size; append_char(hc[j]); } s=make_string(); @<Insert the \(p)pair |(s,p)| into the exception table@>; } @ @<Insert the \(p)pair |(s,p)|...@>= if (hyph_count==hyph_size) overflow("exception dictionary", hyph_size); @:TeX capacity exceeded exception dictionary}{\quad exception dictionary@> incr(hyph_count); while (hyph_word[h]!=0) {@+@<If the string |hyph_word[h]| is less than \(or)or equal to |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|@>; if (h > 0) decr(h);@+else h=hyph_size; } hyph_word[h]=s;hyph_list[h]=p @ @<If the string |hyph_word[h]| is less than \(or)...@>= k=hyph_word[h]; if (length(k) < length(s)) goto found; if (length(k) > length(s)) goto not_found; u=str_start[k];v=str_start[s]; @/do@+{if (str_pool[u] < str_pool[v]) goto found; if (str_pool[u] > str_pool[v]) goto not_found; incr(u);incr(v); }@+ while (!(u==str_start[k+1])); found: q=hyph_list[h];hyph_list[h]=p;p=q;@/ t=hyph_word[h];hyph_word[h]=s;s=t; not_found: @* Initializing the hyphenation tables. The trie for \TeX's hyphenation algorithm is built from a sequence of patterns following a \.{\\patterns} specification. Such a specification is allowed only in \.{INITEX}, since the extra memory for auxiliary tables and for the initialization program itself would only clutter up the production version of \TeX\ with a lot of deadwood. The first step is to build a trie that is linked, instead of packed into sequential storage, so that insertions are readily made. After all patterns have been processed, \.{INITEX} compresses the linked trie by identifying common subtries. Finally the trie is packed into the efficient sequential form that the hyphenation algorithm actually uses. @<Declare subprocedures for |line_break|@>= #ifdef @!INIT @<Declare procedures for preprocessing hyphenation patterns@>@; #endif @ Before we discuss trie building in detail, let's consider the simpler problem of creating the |hyf_distance|, |hyf_num|, and |hyf_next| arrays. Suppose, for example, that \TeX\ reads the pattern `\.{ab2cde1}'. This is a pattern of length 5, with $n_0\ldots n_5=0\,0\,2\,0\,0\,1$ in the notation above. We want the corresponding |trie_op| code |v| to have |hyf_distance[v]==3|, |hyf_num[v]==2|, and |hyf_next[v]==@t$v^\prime$@>|, where the auxiliary |trie_op| code $v^\prime$ has |hyf_distance[@t$v^\prime$@>]==0|, |hyf_num[@t$v^\prime$@>]==1|, and |hyf_next[@t$v^\prime$@>]==min_quarterword|. \TeX\ computes an appropriate value |v| with the |new_trie_op| subroutine below, by setting $$\hbox{|@t$v^\prime$@>=new_trie_op(0, 1, min_quarterword)|,\qquad |v=new_trie_op(3, 2,@t$v^\prime$@>)|.}$$ This subroutine looks up its three parameters in a special hash table, assigning a new value only if these three have not appeared before for the current language. The hash table is called |trie_op_hash|, and the number of entries it contains is |trie_op_ptr|. @<Glob...@>= #ifdef @!INIT uint16_t @!trie_op_hash0[trie_op_size+trie_op_size+1], *const @!trie_op_hash = @!trie_op_hash0+trie_op_size; /*trie op codes for quadruples*/ quarterword @!trie_used[256]; /*largest opcode used so far for this language*/ ASCII_code @!trie_op_lang0[trie_op_size], *const @!trie_op_lang = @!trie_op_lang0-1; /*language part of a hashed quadruple*/ quarterword @!trie_op_val0[trie_op_size], *const @!trie_op_val = @!trie_op_val0-1; /*opcode corresponding to a hashed quadruple*/ uint16_t @!trie_op_ptr; /*number of stored ops so far*/ #endif @ It's tempting to remove the |overflow| stops in the following procedure; |new_trie_op| could return |min_quarterword| (thereby simply ignoring part of a hyphenation pattern) instead of aborting the job. However, that would lead to different hyphenation results on different installations of \TeX\ using the same patterns. The |overflow| stops are necessary for portability of patterns. @<Declare procedures for preprocessing hyph...@>= quarterword new_trie_op(small_number @!d, small_number @!n, quarterword @!v) {@+ int16_t h; /*trial hash location*/ quarterword @!u; /*trial op code*/ uint16_t @!l; /*pointer to stored data*/ h=abs(n+313*d+361*v+1009*cur_lang)%(trie_op_size+trie_op_size) -trie_op_size; loop@+{@+l=trie_op_hash[h]; if (l==0) /*empty position found for a new op*/ {@+if (trie_op_ptr==trie_op_size) overflow("pattern memory ops", trie_op_size); u=trie_used[cur_lang]; if (u==max_quarterword) overflow("pattern memory ops per language", max_quarterword-min_quarterword); incr(trie_op_ptr);incr(u);trie_used[cur_lang]=u; hyf_distance[trie_op_ptr]=d; hyf_num[trie_op_ptr]=n;hyf_next[trie_op_ptr]=v; trie_op_lang[trie_op_ptr]=cur_lang;trie_op_hash[h]=trie_op_ptr; trie_op_val[trie_op_ptr]=u;return u; } if ((hyf_distance[l]==d)&&(hyf_num[l]==n)&&(hyf_next[l]==v) &&(trie_op_lang[l]==cur_lang)) {@+return trie_op_val[l]; } if (h > -trie_op_size) decr(h);@+else h=trie_op_size; } } @ After |new_trie_op| has compressed the necessary opcode information, plenty of information is available to unscramble the data into the final form needed by our hyphenation algorithm. @<Sort \(t)the hyphenation op tables into proper order@>= op_start[0]=-min_quarterword; for (j=1; j<=255; j++) op_start[j]=op_start[j-1]+qo(trie_used[j-1]); for (j=1; j<=trie_op_ptr; j++) trie_op_hash[j]=op_start[trie_op_lang[j]]+trie_op_val[j]; /*destination*/ for (j=1; j<=trie_op_ptr; j++) while (trie_op_hash[j] > j) {@+k=trie_op_hash[j];@/ t=hyf_distance[k];hyf_distance[k]=hyf_distance[j];hyf_distance[j]=t;@/ t=hyf_num[k];hyf_num[k]=hyf_num[j];hyf_num[j]=t;@/ t=hyf_next[k];hyf_next[k]=hyf_next[j];hyf_next[j]=t;@/ trie_op_hash[j]=trie_op_hash[k];trie_op_hash[k]=k; } @ Before we forget how to initialize the data structures that have been mentioned so far, let's write down the code that gets them started. @<Initialize table entries...@>= for (k=-trie_op_size; k<=trie_op_size; k++) trie_op_hash[k]=0; for (k=0; k<=255; k++) trie_used[k]=min_quarterword; trie_op_ptr=0; @ The linked trie that is used to preprocess hyphenation patterns appears in several global arrays. Each node represents an instruction of the form ``if you see character |c|, then perform operation |o|, move to the next character, and go to node |l|; otherwise go to node |r|.'' The four quantities |c|, |o|, |l|, and |r| are stored in four arrays |trie_c|, |trie_o|, |trie_l|, and |trie_r|. The root of the trie is |trie_l[0]|, and the number of nodes is |trie_ptr|. Null trie pointers are represented by zero. To initialize the trie, we simply set |trie_l[0]| and |trie_ptr| to zero. We also set |trie_c[0]| to some arbitrary value, since the algorithm may access it. The algorithms maintain the condition $$\hbox{|trie_c[trie_r[z]] > trie_c[z]|\qquad whenever |z!=0| and |trie_r[z]!=0|};$$ in other words, sibling nodes are ordered by their |c| fields. @d trie_root trie_l[0] /*root of the linked trie*/ @<Glob...@>= #ifdef @!INIT packed_ASCII_code @!trie_c[trie_size+1]; /*characters to match*/ @t\hskip10pt@>quarterword @!trie_o[trie_size+1]; /*operations to perform*/ @t\hskip10pt@>trie_pointer @!trie_l[trie_size+1]; /*left subtrie links*/ @t\hskip10pt@>trie_pointer @!trie_r[trie_size+1]; /*right subtrie links*/ @t\hskip10pt@>trie_pointer @!trie_ptr; /*the number of nodes in the trie*/ @t\hskip10pt@>trie_pointer @!trie_hash[trie_size+1]; /*used to identify equivalent subtries*/ #endif @ Let us suppose that a linked trie has already been constructed. Experience shows that we can often reduce its size by recognizing common subtries; therefore another hash table is introduced for this purpose, somewhat similar to |trie_op_hash|. The new hash table will be initialized to zero. The function |trie_node(p)| returns |p| if |p| is distinct from other nodes that it has seen, otherwise it returns the number of the first equivalent node that it has seen. Notice that we might make subtries equivalent even if they correspond to patterns for different languages, in which the trie ops might mean quite different things. That's perfectly all right. @<Declare procedures for preprocessing hyph...@>= trie_pointer trie_node(trie_pointer @!p) /*converts to a canonical form*/ {@+ trie_pointer h; /*trial hash location*/ trie_pointer @!q; /*trial trie node*/ h=abs(trie_c[p]+1009*trie_o[p]+@| 2718*trie_l[p]+3142*trie_r[p])%trie_size; loop@+{@+q=trie_hash[h]; if (q==0) {@+trie_hash[h]=p;return p; } if ((trie_c[q]==trie_c[p])&&(trie_o[q]==trie_o[p])&&@| (trie_l[q]==trie_l[p])&&(trie_r[q]==trie_r[p])) {@+return q; } if (h > 0) decr(h);@+else h=trie_size; } } @ A neat recursive procedure is now able to compress a trie by traversing it and applying |trie_node| to its nodes in ``bottom up'' fashion. We will compress the entire trie by clearing |trie_hash| to zero and then saying `|trie_root=compress_trie(trie_root)|'. @^recursion@> @<Declare procedures for preprocessing hyph...@>= trie_pointer compress_trie(trie_pointer @!p) {@+if (p==0) return 0; else{@+trie_l[p]=compress_trie(trie_l[p]); trie_r[p]=compress_trie(trie_r[p]); return trie_node(p); } } @ The compressed trie will be packed into the |trie| array using a ``top-down first-fit'' procedure. This is a little tricky, so the reader should pay close attention: The |trie_hash| array is cleared to zero again and renamed |trie_ref| for this phase of the operation; later on, |trie_ref[p]| will be nonzero only if the linked trie node |p| is the smallest character in a family and if the characters |c| of that family have been allocated to locations |trie_ref[p]+c| in the |trie| array. Locations of |trie| that are in use will have |trie_link==0|, while the unused holes in |trie| will be doubly linked with |trie_link| pointing to the next larger vacant location and |trie_back| pointing to the next smaller one. This double linking will have been carried out only as far as |trie_max|, where |trie_max| is the largest index of |trie| that will be needed. To save time at the low end of the trie, we maintain array entries |trie_min[c]| pointing to the smallest hole that is greater than~|c|. Another array |trie_taken| tells whether or not a given location is equal to |trie_ref[p]| for some |p|; this array is used to ensure that distinct nodes in the compressed trie will have distinct |trie_ref| entries. @d trie_ref trie_hash /*where linked trie families go into |trie|*/ @d trie_back(X) trie[X].lh /*backward links in |trie| holes*/ @<Glob...@>= #ifdef @!INIT bool @!trie_taken0[trie_size], *const @!trie_taken = @!trie_taken0-1; /*does a family start here?*/ @t\hskip10pt@>trie_pointer @!trie_min[256]; /*the first possible slot for each character*/ @t\hskip10pt@>trie_pointer @!trie_max; /*largest location used in |trie|*/ @t\hskip10pt@>bool @!trie_not_ready; /*is the trie still in linked form?*/ #endif @ Each time \.{\\patterns} appears, it contributes further patterns to the future trie, which will be built only when hyphenation is attempted or when a format file is dumped. The boolean variable |trie_not_ready| will change to |false| when the trie is compressed; this will disable further patterns. @<Initialize table entries...@>= trie_not_ready=true;trie_root=0;trie_c[0]=si(0);trie_ptr=0; @ Here is how the trie-compression data structures are initialized. If storage is tight, it would be possible to overlap |trie_op_hash|, |trie_op_lang|, and |trie_op_val| with |trie|, |trie_hash|, and |trie_taken|, because we finish with the former just before we need the latter. @<Get ready to compress the trie@>= @<Sort \(t)the hyphenation...@>; for (p=0; p<=trie_size; p++) trie_hash[p]=0; trie_root=compress_trie(trie_root); /*identify equivalent subtries*/ for (p=0; p<=trie_ptr; p++) trie_ref[p]=0; for (p=0; p<=255; p++) trie_min[p]=p+1; trie_link(0)=1;trie_max=0 @ The |first_fit| procedure finds the smallest hole |z| in |trie| such that a trie family starting at a given node |p| will fit into vacant positions starting at |z|. If |c==trie_c[p]|, this means that location |z-c| must not already be taken by some other family, and that |z-c+@t$c^\prime$@>| must be vacant for all characters $c^\prime$ in the family. The procedure sets |trie_ref[p]| to |z-c| when the first fit has been found. @<Declare procedures for preprocessing hyph...@>= void first_fit(trie_pointer @!p) /*packs a family into |trie|*/ {@+ trie_pointer h; /*candidate for |trie_ref[p]|*/ trie_pointer @!z; /*runs through holes*/ trie_pointer @!q; /*runs through the family starting at |p|*/ ASCII_code @!c; /*smallest character in the family*/ trie_pointer @!l, @!r; /*left and right neighbors*/ uint16_t @!ll; /*upper limit of |trie_min| updating*/ c=so(trie_c[p]); z=trie_min[c]; /*get the first conceivably good hole*/ loop@+{@+h=z-c;@/ @<Ensure that |trie_max>=h+256|@>; if (trie_taken[h]) goto not_found; @<If all characters of the family fit relative to |h|, then |goto found|,\30\ otherwise |goto not_found|@>; not_found: z=trie_link(z); /*move to the next hole*/ } found: @<Pack the family into |trie| relative to |h|@>; } @ By making sure that |trie_max| is at least |h+256|, we can be sure that |trie_max > z|, since |h==z-c|. It follows that location |trie_max| will never be occupied in |trie|, and we will have |trie_max >= trie_link(z)|. @<Ensure that |trie_max>=h+256|@>= if (trie_max < h+256) {@+if (trie_size <= h+256) overflow("pattern memory", trie_size); @:TeX capacity exceeded pattern memory}{\quad pattern memory@> @/do@+{incr(trie_max);trie_taken[trie_max]=false; trie_link(trie_max)=trie_max+1;trie_back(trie_max)=trie_max-1; }@+ while (!(trie_max==h+256)); } @ @<If all characters of the family fit relative to |h|...@>= q=trie_r[p]; while (q > 0) {@+if (trie_link(h+so(trie_c[q]))==0) goto not_found; q=trie_r[q]; } goto found @ @<Pack the family into |trie| relative to |h|@>= trie_taken[h]=true;trie_ref[p]=h;q=p; @/do@+{z=h+so(trie_c[q]);l=trie_back(z);r=trie_link(z); trie_back(r)=l;trie_link(l)=r;trie_link(z)=0; if (l < 256) {@+if (z < 256) ll=z;@+else ll=256; @/do@+{trie_min[l]=r;incr(l); }@+ while (!(l==ll)); } q=trie_r[q]; }@+ while (!(q==0)) @ To pack the entire linked trie, we use the following recursive procedure. @^recursion@> @<Declare procedures for preprocessing hyph...@>= void trie_pack(trie_pointer @!p) /*pack subtries of a family*/ {@+trie_pointer q; /*a local variable that need not be saved on recursive calls*/ @/do@+{q=trie_l[p]; if ((q > 0)&&(trie_ref[q]==0)) {@+first_fit(q);trie_pack(q); } p=trie_r[p]; }@+ while (!(p==0)); } @ When the whole trie has been allocated into the sequential table, we must go through it once again so that |trie| contains the correct information. Null pointers in the linked trie will be represented by the value~0, which properly implements an ``empty'' family. @<Move the data into |trie|@>= h.rh=0;h.b0=min_quarterword;h.b1=min_quarterword; /*|trie_link=0|, |trie_op=min_quarterword|, |trie_char=qi(0)|*/ if (trie_root==0) /*no patterns were given*/ {@+for (r=0; r<=256; r++) trie[r]=h; trie_max=256; } else{@+trie_fix(trie_root); /*this fixes the non-holes in |trie|*/ r=0; /*now we will zero out all the holes*/ @/do@+{s=trie_link(r);trie[r]=h;r=s; }@+ while (!(r > trie_max)); } trie_char(0)=qi('?'); /*make |trie_char(c)!=c| for all |c|*/ @ The fixing-up procedure is, of course, recursive. Since the linked trie usually has overlapping subtries, the same data may be moved several times; but that causes no harm, and at most as much work is done as it took to build the uncompressed trie. @^recursion@> @<Declare procedures for preprocessing hyph...@>= void trie_fix(trie_pointer @!p) /*moves |p| and its siblings into |trie|*/ {@+trie_pointer q; /*a local variable that need not be saved on recursive calls*/ ASCII_code @!c; /*another one that need not be saved*/ trie_pointer @!z; /*|trie| reference; this local variable must be saved*/ z=trie_ref[p]; @/do@+{q=trie_l[p];c=so(trie_c[p]); trie_link(z+c)=trie_ref[q];trie_char(z+c)=qi(c);trie_op(z+c)=trie_o[p]; if (q > 0) trie_fix(q); p=trie_r[p]; }@+ while (!(p==0)); } @ Now let's go back to the easier problem, of building the linked trie. When \.{INITEX} has scanned the `\.{\\patterns}' control sequence, it calls on |new_patterns| to do the right thing. @<Declare procedures for preprocessing hyph...@>= void new_patterns(void) /*initializes the hyphenation pattern data*/ {@+ uint8_t k, @!l; /*indices into |hc| and |hyf|; not always in |small_number| range*/ bool @!digit_sensed; /*should the next digit be treated as a letter?*/ quarterword @!v; /*trie op code*/ trie_pointer @!p, @!q; /*nodes of trie traversed during insertion*/ bool @!first_child; /*is |p==trie_l[q]|?*/ ASCII_code @!c; /*character being inserted*/ if (trie_not_ready) {@+set_cur_lang;scan_left_brace(); /*a left brace must follow \.{\\patterns}*/ @<Enter all of the patterns into a linked trie, until coming to a right brace@>; } else{@+print_err("Too late for ");print_esc(@[@<|"patterns"|@>@]); help1("All patterns must be given before typesetting begins."); error();link(garbage)=scan_toks(false, false);flush_list(def_ref); } } @ Novices are not supposed to be using \.{\\patterns}, so the error messages are terse. (Note that all error messages appear in \TeX's string pool, even if they are used only by \.{INITEX}.) @<Enter all of the patterns into a linked trie...@>= k=0;hyf[0]=0;digit_sensed=false; loop@+{@+get_x_token(); switch (cur_cmd) { case letter: case other_char: @<Append a new letter or a hyphen level@>@;@+break; case spacer: case right_brace: {@+if (k > 0) @<Insert a new pattern into the linked trie@>; if (cur_cmd==right_brace) goto done; k=0;hyf[0]=0;digit_sensed=false; } @+break; default:{@+print_err("Bad ");print_esc(@[@<|"patterns"|@>@]); @.Bad \\patterns@> help1("(See Appendix H.)");error(); } } } done: @ @<Append a new letter or a hyphen level@>= if (digit_sensed||(cur_chr < '0')||(cur_chr > '9')) {@+if (cur_chr=='.') cur_chr=0; /*edge-of-word delimiter*/ else{@+cur_chr=lc_code(cur_chr); if (cur_chr==0) {@+print_err("Nonletter"); @.Nonletter@> help1("(See Appendix H.)");error(); } } if (k < 63) {@+incr(k);hc[k]=cur_chr;hyf[k]=0;digit_sensed=false; } } else if (k < 63) {@+hyf[k]=cur_chr-'0';digit_sensed=true; } @ When the following code comes into play, the pattern $p_1\ldots p_k$ appears in |hc[1 dotdot k]|, and the corresponding sequence of numbers $n_0\ldots n_k$ appears in |hyf[0 dotdot k]|. @<Insert a new pattern into the linked trie@>= {@+@<Compute the trie op code, |v|, and set |l:=0|@>; q=0;hc[0]=cur_lang; while (l <= k) {@+c=hc[l];incr(l);p=trie_l[q];first_child=true; while ((p > 0)&&(c > so(trie_c[p]))) {@+q=p;p=trie_r[q];first_child=false; } if ((p==0)||(c < so(trie_c[p]))) @<Insert a new trie node between |q| and |p|, and make |p| point to it@>; q=p; /*now node |q| represents $p_1\ldots p_{l-1}$*/ } if (trie_o[q]!=min_quarterword) {@+print_err("Duplicate pattern"); @.Duplicate pattern@> help1("(See Appendix H.)");error(); } trie_o[q]=v; } @ @<Insert a new trie node between |q| and |p|...@>= {@+if (trie_ptr==trie_size) overflow("pattern memory", trie_size); @:TeX capacity exceeded pattern memory}{\quad pattern memory@> incr(trie_ptr);trie_r[trie_ptr]=p;p=trie_ptr;trie_l[p]=0; if (first_child) trie_l[q]=p;@+else trie_r[q]=p; trie_c[p]=si(c);trie_o[p]=min_quarterword; } @ @<Compute the trie op code, |v|...@>= if (hc[1]==0) hyf[0]=0; if (hc[k]==0) hyf[k]=0; l=k;v=min_quarterword; loop@+{@+if (hyf[l]!=0) v=new_trie_op(k-l, hyf[l], v); if (l > 0) decr(l);@+else goto done1; } done1: @ Finally we put everything together: Here is how the trie gets to its final, efficient form. The following packing routine is rigged so that the root of the linked tree gets mapped into location 1 of |trie|, as required by the hyphenation algorithm. This happens because the first call of |first_fit| will ``take'' location~1. @<Declare procedures for preprocessing hyphenation patterns@>= void init_trie(void) {@+trie_pointer @!p; /*pointer for initialization*/ int @!j, @!k, @!t; /*all-purpose registers for initialization*/ trie_pointer @!r, @!s; /*used to clean up the packed |trie|*/ two_halves @!h; /*template used to zero out |trie|'s holes*/ @<Get ready to compress the trie@>; if (trie_root!=0) {@+first_fit(trie_root);trie_pack(trie_root); } @<Move the data into |trie|@>; trie_not_ready=false; } @* Breaking vertical lists into pages. The |vsplit| procedure, which implements \TeX's \.{\\vsplit} operation, is considerably simpler than |line_break| because it doesn't have to worry about hyphenation, and because its mission is to discover a single break instead of an optimum sequence of breakpoints. But before we get into the details of |vsplit|, we need to consider a few more basic things. @ A subroutine called |prune_page_top| takes a pointer to a vlist and returns a pointer to a modified vlist in which all glue, kern, and penalty nodes have been deleted before the first box or rule node. However, the first box or rule is actually preceded by a newly created glue node designed so that the topmost baseline will be at distance |split_top_skip| from the top, whenever this is possible without backspacing. In this routine and those that follow, we make use of the fact that a vertical list contains no character nodes, hence the |type| field exists for each node in the list. @^data structure assumptions@> @p pointer prune_page_top(pointer @!p) /*adjust top after page break*/ {@+pointer prev_p; /*lags one step behind |p|*/ pointer @!q; /*temporary variable for list manipulation*/ prev_p=temp_head;link(temp_head)=p; while (p!=null) switch (type(p)) { case hlist_node: case vlist_node: case rule_node: @<Insert glue for |split_top_skip| and set~|p:=null|@>@;@+break; case whatsit_node: case mark_node: case ins_node: {@+prev_p=p;p=link(prev_p); } @+break; case glue_node: case kern_node: case penalty_node: {@+q=p;p=link(q);link(q)=null; link(prev_p)=p;flush_node_list(q); } @+break; default:confusion(@[@<|"pruning"|@>@]); @:this can't happen pruning}{\quad pruning@> } return link(temp_head); } @ @<Insert glue for |split_top_skip|...@>= {@+q=new_skip_param(split_top_skip_code);link(prev_p)=q;link(q)=p; /*now |temp_ptr==glue_ptr(q)|*/ if (width(temp_ptr) > height(p)) width(temp_ptr)=width(temp_ptr)-height(p); else width(temp_ptr)=0; p=null; } @ The next subroutine finds the best place to break a given vertical list so as to obtain a box of height~|h|, with maximum depth~|d|. A pointer to the beginning of the vertical list is given, and a pointer to the optimum breakpoint is returned. The list is effectively followed by a forced break, i.e., a penalty node with the |eject_penalty|; if the best break occurs at this artificial node, the value |null| is returned. An array of six |scaled| distances is used to keep track of the height from the beginning of the list to the current place, just as in |line_break|. In fact, we use one of the same arrays, only changing its name to reflect its new significance. @d active_height active_width /*new name for the six distance variables*/ @d cur_height active_height[1] /*the natural height*/ @d set_height_zero(X) active_height[X]=0 /*initialize the height to zero*/ @# @p pointer vert_break(pointer @!p, scaled @!h, scaled @!d) /*finds optimum page break*/ {@+ pointer prev_p; /*if |p| is a glue node, |type(prev_p)| determines whether |p| is a legal breakpoint*/ pointer @!q, @!r; /*glue specifications*/ int @!pi; /*penalty value*/ int @!b; /*badness at a trial breakpoint*/ int @!least_cost; /*the smallest badness plus penalties found so far*/ pointer @!best_place; /*the most recent break that leads to |least_cost|*/ scaled @!prev_dp; /*depth of previous box in the list*/ small_number @!t; /*|type| of the node following a kern*/ prev_p=p; /*an initial glue node is not a legal breakpoint*/ least_cost=awful_bad;do_all_six(set_height_zero);prev_dp=0; loop@+{@+@<If node |p| is a legal breakpoint, check if this break is the best known, and |goto done| if |p| is null or if the page-so-far is already too full to accept more stuff@>; prev_p=p;p=link(prev_p); } done: return best_place; } @ A global variable |best_height_plus_depth| will be set to the natural size of the box that corresponds to the optimum breakpoint found by |vert_break|. (This value is used by the insertion-splitting algorithm of the page builder.) @<Glob...@>= scaled @!best_height_plus_depth; /*height of the best box, without stretching or shrinking*/ @ A subtle point to be noted here is that the maximum depth~|d| might be negative, so |cur_height| and |prev_dp| might need to be corrected even after a glue or kern node. @<If node |p| is a legal breakpoint, check...@>= if (p==null) pi=eject_penalty; else@<Use node |p| to update the current height and depth measurements; if this node is not a legal breakpoint, |goto not_found| or |update_heights|, otherwise set |pi| to the associated penalty at the break@>; @<Check if node |p| is a new champion breakpoint; then \(go)|goto done| if |p| is a forced break or if the page-so-far is already too full@>; if ((type(p) < glue_node)||(type(p) > kern_node)) goto not_found; update_heights: @<Update the current height and depth measurements with respect to a glue or kern node~|p|@>; not_found: if (prev_dp > d) {@+cur_height=cur_height+prev_dp-d; prev_dp=d; } @ @<Use node |p| to update the current height and depth measurements...@>= switch (type(p)) { case hlist_node: case vlist_node: case rule_node: {@+@t@>@;@/ cur_height=cur_height+prev_dp+height(p);prev_dp=depth(p); goto not_found; } case whatsit_node: @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>; case glue_node: if (precedes_break(prev_p)) pi=0; else goto update_heights;@+break; case kern_node: {@+if (link(p)==null) t=penalty_node; else t=type(link(p)); if (t==glue_node) pi=0;@+else goto update_heights; } @+break; case penalty_node: pi=penalty(p);@+break; case mark_node: case ins_node: goto not_found; default:confusion(@[@<|"vertbreak"|@>@]); @:this can't happen vertbreak}{\quad vertbreak@> } @ @d deplorable 100000 /*more than |inf_bad|, but less than |awful_bad|*/ @<Check if node |p| is a new champion breakpoint; then \(go)...@>= if (pi < inf_penalty) {@+@<Compute the badness, |b|, using |awful_bad| if the box is too full@>; if (b < awful_bad) if (pi <= eject_penalty) b=pi; else if (b < inf_bad) b=b+pi; else b=deplorable; if (b <= least_cost) {@+best_place=p;least_cost=b; best_height_plus_depth=cur_height+prev_dp; } if ((b==awful_bad)||(pi <= eject_penalty)) goto done; } @ @<Compute the badness, |b|, using |awful_bad| if the box is too full@>= if (cur_height < h) if ((active_height[3]!=0)||(active_height[4]!=0)|| (active_height[5]!=0)) b=0; else b=badness(h-cur_height, active_height[2]); else if (cur_height-h > active_height[6]) b=awful_bad; else b=badness(cur_height-h, active_height[6]) @ Vertical lists that are subject to the |vert_break| procedure should not contain infinite shrinkability, since that would permit any amount of information to ``fit'' on one page. @<Update the current height and depth measurements with...@>= if (type(p)==kern_node) q=p; else{@+q=glue_ptr(p); active_height[2+stretch_order(q)]=@| active_height[2+stretch_order(q)]+stretch(q);@/ active_height[6]=active_height[6]+shrink(q); if ((shrink_order(q)!=normal)&&(shrink(q)!=0)) {@+@t@>@;@/ print_err("Infinite glue shrinkage found in box being split");@/ @.Infinite glue shrinkage...@> help4("The box you are \\vsplitting contains some infinitely")@/ ("shrinkable glue, e.g., `\\vss' or `\\vskip 0pt minus 1fil'.")@/ ("Such glue doesn't belong there; but you can safely proceed,")@/ ("since the offensive shrinkability has been made finite."); error();r=new_spec(q);shrink_order(r)=normal;delete_glue_ref(q); glue_ptr(p)=r;q=r; } } cur_height=cur_height+prev_dp+width(q);prev_dp=0 @ Now we are ready to consider |vsplit| itself. Most of its work is accomplished by the two subroutines that we have just considered. Given the number of a vlist box |n|, and given a desired page height |h|, the |vsplit| function finds the best initial segment of the vlist and returns a box for a page of height~|h|. The remainder of the vlist, if any, replaces the original box, after removing glue and penalties and adjusting for |split_top_skip|. Mark nodes in the split-off box are used to set the values of |split_first_mark| and |split_bot_mark|; we use the fact that |split_first_mark==null| if and only if |split_bot_mark==null|. The original box becomes ``void'' if and only if it has been entirely extracted. The extracted box is ``void'' if and only if the original box was void (or if it was, erroneously, an hlist box). @p pointer vsplit(eight_bits @!n, scaled @!h) /*extracts a page of height |h| from box |n|*/ {@+ pointer v; /*the box to be split*/ pointer p; /*runs through the vlist*/ pointer q; /*points to where the break occurs*/ v=box(n); if (split_first_mark!=null) {@+delete_token_ref(split_first_mark);split_first_mark=null; delete_token_ref(split_bot_mark);split_bot_mark=null; } @<Dispense with trivial cases of void or bad boxes@>; q=vert_break(list_ptr(v), h, split_max_depth); @<Look at all the marks in nodes before the break, and set the final link to |null| at the break@>; q=prune_page_top(q);p=list_ptr(v);free_node(v, box_node_size); if (q==null) box(n)=null; /*the |eq_level| of the box stays the same*/ else box(n)=vpack(q, natural); return vpackage(p, h, exactly, split_max_depth); } @ @<Dispense with trivial cases of void or bad boxes@>= if (v==null) {@+return null; } if (type(v)!=vlist_node) {@+print_err("");print_esc(@[@<|"vsplit"|@>@]);print_str(" needs a "); print_esc(@[@<|"vbox"|@>@]); @:vsplit_}{\.{\\vsplit needs a \\vbox}@> help2("The box you are trying to split is an \\hbox.")@/ ("I can't split such a box, so I'll leave it alone."); error();return null; } @ It's possible that the box begins with a penalty node that is the ``best'' break, so we must be careful to handle this special case correctly. @<Look at all the marks...@>= p=list_ptr(v); if (p==q) list_ptr(v)=null; else loop@+{@+if (type(p)==mark_node) if (split_first_mark==null) {@+split_first_mark=mark_ptr(p); split_bot_mark=split_first_mark; token_ref_count(split_first_mark)=@| token_ref_count(split_first_mark)+2; } else{@+delete_token_ref(split_bot_mark); split_bot_mark=mark_ptr(p); add_token_ref(split_bot_mark); } if (link(p)==q) {@+link(p)=null;goto done; } p=link(p); } done: @* The page builder. When \TeX\ appends new material to its main vlist in vertical mode, it uses a method something like |vsplit| to decide where a page ends, except that the calculations are done ``on line'' as new items come in. The main complication in this process is that insertions must be put into their boxes and removed from the vlist, in a more-or-less optimum manner. We shall use the term ``current page'' for that part of the main vlist that is being considered as a candidate for being broken off and sent to the user's output routine. The current page starts at |link(page_head)|, and it ends at |page_tail|. We have |page_head==page_tail| if this list is empty. @^current page@> Utter chaos would reign if the user kept changing page specifications while a page is being constructed, so the page builder keeps the pertinent specifications frozen as soon as the page receives its first box or insertion. The global variable |page_contents| is |empty| when the current page contains only mark nodes and content-less whatsit nodes; it is |inserts_only| if the page contains only insertion nodes in addition to marks and whatsits. Glue nodes, kern nodes, and penalty nodes are discarded until a box or rule node appears, at which time |page_contents| changes to |box_there|. As soon as |page_contents| becomes non-|empty|, the current |vsize| and |max_depth| are squirreled away into |page_goal| and |page_max_depth|; the latter values will be used until the page has been forwarded to the user's output routine. The \.{\\topskip} adjustment is made when |page_contents| changes to |box_there|. Although |page_goal| starts out equal to |vsize|, it is decreased by the scaled natural height-plus-depth of the insertions considered so far, and by the \.{\\skip} corrections for those insertions. Therefore it represents the size into which the non-inserted material should fit, assuming that all insertions in the current page have been made. The global variables |best_page_break| and |least_page_cost| correspond respectively to the local variables |best_place| and |least_cost| in the |vert_break| routine that we have already studied; i.e., they record the location and value of the best place currently known for breaking the current page. The value of |page_goal| at the time of the best break is stored in |best_size|. @d inserts_only 1 /*|page_contents| when an insert node has been contributed, but no boxes*/ @d box_there 2 /*|page_contents| when a box or rule has been contributed*/ @<Glob...@>= pointer @!page_tail; /*the final node on the current page*/ uint8_t @!page_contents; /*what is on the current page so far?*/ scaled @!page_max_depth; /*maximum box depth on page being built*/ pointer @!best_page_break; /*break here to get the best page known so far*/ int @!least_page_cost; /*the score for this currently best page*/ scaled @!best_size; /*its |page_goal|*/ @ The page builder has another data structure to keep track of insertions. This is a list of four-word nodes, starting and ending at |page_ins_head|. That is, the first element of the list is node |r@t$_1$@>==link(page_ins_head)|; node $r_j$ is followed by |r@t$_{j+1}$@>==link(r@t$_j$@>)|; and if there are |n| items we have |r@t$_{n+1}$@>==page_ins_head|. The |subtype| field of each node in this list refers to an insertion number; for example, `\.{\\insert 250}' would correspond to a node whose |subtype| is |qi(250)| (the same as the |subtype| field of the relevant |ins_node|). These |subtype| fields are in increasing order, and |subtype(page_ins_head)== qi(255)|, so |page_ins_head| serves as a convenient sentinel at the end of the list. A record is present for each insertion number that appears in the current page. The |type| field in these nodes distinguishes two possibilities that might occur as we look ahead before deciding on the optimum page break. If |type(r)==inserting|, then |height(r)| contains the total of the height-plus-depth dimensions of the box and all its inserts seen so far. If |type(r)==split_up|, then no more insertions will be made into this box, because at least one previous insertion was too big to fit on the current page; |broken_ptr(r)| points to the node where that insertion will be split, if \TeX\ decides to split it, |broken_ins(r)| points to the insertion node that was tentatively split, and |height(r)| includes also the natural height plus depth of the part that would be split off. In both cases, |last_ins_ptr(r)| points to the last |ins_node| encountered for box |qo(subtype(r))| that would be at least partially inserted on the next page; and |best_ins_ptr(r)| points to the last such |ins_node| that should actually be inserted, to get the page with minimum badness among all page breaks considered so far. We have |best_ins_ptr(r)==null| if and only if no insertion for this box should be made to produce this optimum page. The data structure definitions here use the fact that the |@!height| field appears in the fourth word of a box node. @^data structure assumptions@> @d page_ins_node_size 4 /*number of words for a page insertion node*/ @d inserting 0 /*an insertion class that has not yet overflowed*/ @d split_up 1 /*an overflowed insertion class*/ @d broken_ptr(X) link(X+1) /*an insertion for this class will break here if anywhere*/ @d broken_ins(X) info(X+1) /*this insertion might break at |broken_ptr|*/ @d last_ins_ptr(X) link(X+2) /*the most recent insertion for this |subtype|*/ @d best_ins_ptr(X) info(X+2) /*the optimum most recent insertion*/ @<Initialize the special list heads...@>= subtype(page_ins_head)=qi(255); type(page_ins_head)=split_up;link(page_ins_head)=page_ins_head; @ An array |page_so_far| records the heights and depths of everything on the current page. This array contains six |scaled| numbers, like the similar arrays already considered in |line_break| and |vert_break|; and it also contains |page_goal| and |page_depth|, since these values are all accessible to the user via |set_page_dimen| commands. The value of |page_so_far[1]| is also called |page_total|. The stretch and shrink components of the \.{\\skip} corrections for each insertion are included in |page_so_far|, but the natural space components of these corrections are not, since they have been subtracted from |page_goal|. The variable |page_depth| records the depth of the current page; it has been adjusted so that it is at most |page_max_depth|. The variable |last_glue| points to the glue specification of the most recent node contributed from the contribution list, if this was a glue node; otherwise |last_glue==max_halfword|. (If the contribution list is nonempty, however, the value of |last_glue| is not necessarily accurate.) The variables |last_penalty| and |last_kern| are similar. And finally, |insert_penalties| holds the sum of the penalties associated with all split and floating insertions. @d page_goal page_so_far[0] /*desired height of information on page being built*/ @d page_total page_so_far[1] /*height of the current page*/ @d page_shrink page_so_far[6] /*shrinkability of the current page*/ @d page_depth page_so_far[7] /*depth of the current page*/ @<Glob...@>= scaled @!page_so_far[8]; /*height and glue of the current page*/ pointer @!last_glue; /*used to implement \.{\\lastskip}*/ int @!last_penalty; /*used to implement \.{\\lastpenalty}*/ scaled @!last_kern; /*used to implement \.{\\lastkern}*/ int @!insert_penalties; /*sum of the penalties for held-over insertions*/ @ @<Put each...@>= primitive(@[@<|"pagegoal"|@>@], set_page_dimen, 0); @!@:page_goal_}{\.{\\pagegoal} primitive@> primitive(@[@<|"pagetotal"|@>@], set_page_dimen, 1); @!@:page_total_}{\.{\\pagetotal} primitive@> primitive(@[@<|"pagestretch"|@>@], set_page_dimen, 2); @!@:page_stretch_}{\.{\\pagestretch} primitive@> primitive(@[@<|"pagefilstretch"|@>@], set_page_dimen, 3); @!@:page_fil_stretch_}{\.{\\pagefilstretch} primitive@> primitive(@[@<|"pagefillstretch"|@>@], set_page_dimen, 4); @!@:page_fill_stretch_}{\.{\\pagefillstretch} primitive@> primitive(@[@<|"pagefilllstretch"|@>@], set_page_dimen, 5); @!@:page_filll_stretch_}{\.{\\pagefilllstretch} primitive@> primitive(@[@<|"pageshrink"|@>@], set_page_dimen, 6); @!@:page_shrink_}{\.{\\pageshrink} primitive@> primitive(@[@<|"pagedepth"|@>@], set_page_dimen, 7); @!@:page_depth_}{\.{\\pagedepth} primitive@> @ @<Cases of |print_cmd_chr|...@>= case set_page_dimen: switch (chr_code) { case 0: print_esc(@[@<|"pagegoal"|@>@]);@+break; case 1: print_esc(@[@<|"pagetotal"|@>@]);@+break; case 2: print_esc(@[@<|"pagestretch"|@>@]);@+break; case 3: print_esc(@[@<|"pagefilstretch"|@>@]);@+break; case 4: print_esc(@[@<|"pagefillstretch"|@>@]);@+break; case 5: print_esc(@[@<|"pagefilllstretch"|@>@]);@+break; case 6: print_esc(@[@<|"pageshrink"|@>@]);@+break; default:print_esc(@[@<|"pagedepth"|@>@]); } @+break; @ @d print_plus_end(X) print_str(X);@+} @d print_plus(X) if (page_so_far[X]!=0) {@+print_str(" plus ");print_scaled(page_so_far[X]);print_plus_end @p void print_totals(void) {@+print_scaled(page_total); print_plus(2)(""); print_plus(3)("fil"); print_plus(4)("fill"); print_plus(5)("filll"); if (page_shrink!=0) {@+print_str(" minus ");print_scaled(page_shrink); } } @ @<Show the status of the current page@>= if (page_head!=page_tail) {@+print_nl("### current page:"); if (output_active) print_str(" (held over for next output)"); @.held over for next output@> show_box(link(page_head)); if (page_contents > empty) {@+print_nl("total height ");print_totals(); @:total_height}{\.{total height}@> print_nl(" goal height ");print_scaled(page_goal); @.goal height@> r=link(page_ins_head); while (r!=page_ins_head) {@+print_ln();print_esc(@[@<|"insert"|@>@]);t=qo(subtype(r)); print_int(t);print_str(" adds "); if (count(t)==1000) t=height(r); else t=x_over_n(height(r), 1000)*count(t); print_scaled(t); if (type(r)==split_up) {@+q=page_head;t=0; @/do@+{q=link(q); if ((type(q)==ins_node)&&(subtype(q)==subtype(r))) incr(t); }@+ while (!(q==broken_ins(r))); print_str(", #");print_int(t);print_str(" might split"); } r=link(r); } } } @ Here is a procedure that is called when the |page_contents| is changing from |empty| to |inserts_only| or |box_there|. @d set_page_so_far_zero(X) page_so_far[X]=0 @p void freeze_page_specs(small_number @!s) {@+page_contents=s; page_goal=vsize;page_max_depth=max_depth; page_depth=0;do_all_six(set_page_so_far_zero); least_page_cost=awful_bad; #ifdef @!STAT if (tracing_pages > 0) {@+begin_diagnostic(); print_nl("%% goal height=");print_scaled(page_goal); @.goal height@> print_str(", max depth=");print_scaled(page_max_depth); end_diagnostic(false); } @; #endif @;@/ } @ Pages are built by appending nodes to the current list in \TeX's vertical mode, which is at the outermost level of the semantic nest. This vlist is split into two parts; the ``current page'' that we have been talking so much about already, and the ``contribution list'' that receives new nodes as they are created. The current page contains everything that the page builder has accounted for in its data structures, as described above, while the contribution list contains other things that have been generated by other parts of \TeX\ but have not yet been seen by the page builder. The contribution list starts at |link(contrib_head)|, and it ends at the current node in \TeX's vertical mode. When \TeX\ has appended new material in vertical mode, it calls the procedure |build_page|, which tries to catch up by moving nodes from the contribution list to the current page. This procedure will succeed in its goal of emptying the contribution list, unless a page break is discovered, i.e., unless the current page has grown to the point where the optimum next page break has been determined. In the latter case, the nodes after the optimum break will go back onto the contribution list, and control will effectively pass to the user's output routine. We make |type(page_head)==glue_node|, so that an initial glue node on the current page will not be considered a valid breakpoint. @<Initialize the special list...@>= type(page_head)=glue_node;subtype(page_head)=normal; @ The global variable |output_active| is true during the time the user's output routine is driving \TeX. @<Glob...@>= bool @!output_active; /*are we in the midst of an output routine?*/ @ @<Set init...@>= output_active=false;insert_penalties=0; @ The page builder is ready to start a fresh page if we initialize the following state variables. (However, the page insertion list is initialized elsewhere.) @<Start a new current page@>= page_contents=empty;page_tail=page_head;link(page_head)=null;@/ last_glue=max_halfword;last_penalty=0;last_kern=0; page_depth=0;page_max_depth=0 @ At certain times box 255 is supposed to be void (i.e., |null|), or an insertion box is supposed to be ready to accept a vertical list. If not, an error message is printed, and the following subroutine flushes the unwanted contents, reporting them to the user. @p void box_error(eight_bits @!n) {@+error();begin_diagnostic(); print_nl("The following box has been deleted:"); @.The following...deleted@> show_box(box(n));end_diagnostic(true); flush_node_list(box(n));box(n)=null; } @ The following procedure guarantees that a given box register does not contain an \.{\\hbox}. @p void ensure_vbox(eight_bits @!n) {@+pointer p; /*the box register contents*/ p=box(n); if (p!=null) if (type(p)==hlist_node) {@+print_err("Insertions can only be added to a vbox"); @.Insertions can only...@> help3("Tut tut: You're trying to \\insert into a")@/ ("\\box register that now contains an \\hbox.")@/ ("Proceed, and I'll discard its present contents."); box_error(n); } } @ \TeX\ is not always in vertical mode at the time |build_page| is called; the current mode reflects what \TeX\ should return to, after the contribution list has been emptied. A call on |build_page| should be immediately followed by `|goto big_switch|', which is \TeX's central control point. @p@t\4@>@<Declare the procedure called |fire_up|@>@;@/ void build_page(void) /*append contributions to the current page*/ {@+ pointer p; /*the node being appended*/ pointer @!q, @!r; /*nodes being examined*/ int @!b, @!c; /*badness and cost of current page*/ int @!pi; /*penalty to be added to the badness*/ uint8_t @!n; /*insertion box number*/ scaled @!delta, @!h, @!w; /*sizes used for insertion calculations*/ if ((link(contrib_head)==null)||output_active) return; @/do@+{resume: p=link(contrib_head);@/ @<Update the values of |last_glue|, |last_penalty|, and |last_kern|@>; @<Move node |p| to the current page; if it is time for a page break, put the nodes following the break back onto the contribution list, and |return| to the user's output routine if there is one@>; }@+ while (!(link(contrib_head)==null)); @<Make the contribution list empty by setting its tail to |contrib_head|@>; } @ @d contrib_tail nest[0].tail_field /*tail of the contribution list*/ @<Make the contribution list empty...@>= if (nest_ptr==0) tail=contrib_head; /*vertical mode*/ else contrib_tail=contrib_head /*other modes*/ @ @<Update the values of |last_glue|...@>= if (last_glue!=max_halfword) delete_glue_ref(last_glue); last_penalty=0;last_kern=0; if (type(p)==glue_node) {@+last_glue=glue_ptr(p);add_glue_ref(last_glue); } else{@+last_glue=max_halfword; if (type(p)==penalty_node) last_penalty=penalty(p); else if (type(p)==kern_node) last_kern=width(p); } @ The code here is an example of a many-way switch into routines that merge together in different places. Some people call this unstructured programming, but the author doesn't see much wrong with it, as long as @^Knuth, Donald Ervin@> the various labels have a well-understood meaning. @<Move node |p| to the current page;...@>= @<If the current page is empty and node |p| is to be deleted, |goto done1|; otherwise use node |p| to update the state of the current page; if this node is an insertion, |goto contribute|; otherwise if this node is not a legal breakpoint, |goto contribute| or |update_heights|; otherwise set |pi| to the penalty associated with this breakpoint@>; @<Check if node |p| is a new champion breakpoint; then \(if)if it is time for a page break, prepare for output, and either fire up the user's output routine and |return| or ship out the page and |goto done|@>; if ((type(p) < glue_node)||(type(p) > kern_node)) goto contribute; update_heights: @<Update the current page measurements with respect to the glue or kern specified by node~|p|@>; contribute: @<Make sure that |page_max_depth| is not exceeded@>; @<Link node |p| into the current page and |goto done|@>; done1: @<Recycle node |p|@>; done: @ @<Link node |p| into the current page and |goto done|@>= link(page_tail)=p;page_tail=p; link(contrib_head)=link(p);link(p)=null;goto done @ @<Recycle node |p|@>= link(contrib_head)=link(p);link(p)=null;flush_node_list(p) @ The title of this section is already so long, it seems best to avoid making it more accurate but still longer, by mentioning the fact that a kern node at the end of the contribution list will not be contributed until we know its successor. @<If the current page is empty...@>= switch (type(p)) { case hlist_node: case vlist_node: case rule_node: if (page_contents < box_there) @<Initialize the current page, insert the \.{\\topskip} glue ahead of |p|, and |goto continue|@>@; else@<Prepare to move a box or rule node to the current page, then |goto contribute|@>@;@+break; case whatsit_node: @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>; case glue_node: if (page_contents < box_there) goto done1; else if (precedes_break(page_tail)) pi=0; else goto update_heights;@+break; case kern_node: if (page_contents < box_there) goto done1; else if (link(p)==null) return; else if (type(link(p))==glue_node) pi=0; else goto update_heights;@+break; case penalty_node: if (page_contents < box_there) goto done1;@+else pi=penalty(p);@+break; case mark_node: goto contribute; case ins_node: @<Append an insertion to the current page and |goto contribute|@>@; default:confusion(@[@<|"page"|@>@]); @:this can't happen page}{\quad page@> } @ @<Initialize the current page, insert the \.{\\topskip} glue...@>= {@+if (page_contents==empty) freeze_page_specs(box_there); else page_contents=box_there; q=new_skip_param(top_skip_code); /*now |temp_ptr==glue_ptr(q)|*/ if (width(temp_ptr) > height(p)) width(temp_ptr)=width(temp_ptr)-height(p); else width(temp_ptr)=0; link(q)=p;link(contrib_head)=q;goto resume; } @ @<Prepare to move a box or rule node to the current page...@>= {@+page_total=page_total+page_depth+height(p); page_depth=depth(p); goto contribute; } @ @<Make sure that |page_max_depth| is not exceeded@>= if (page_depth > page_max_depth) {@+page_total=@| page_total+page_depth-page_max_depth;@/ page_depth=page_max_depth; } @ @<Update the current page measurements with respect to the glue...@>= if (type(p)==kern_node) q=p; else{@+q=glue_ptr(p); page_so_far[2+stretch_order(q)]=@| page_so_far[2+stretch_order(q)]+stretch(q);@/ page_shrink=page_shrink+shrink(q); if ((shrink_order(q)!=normal)&&(shrink(q)!=0)) {@+@t@>@;@/ print_err("Infinite glue shrinkage found on current page");@/ @.Infinite glue shrinkage...@> help4("The page about to be output contains some infinitely")@/ ("shrinkable glue, e.g., `\\vss' or `\\vskip 0pt minus 1fil'.")@/ ("Such glue doesn't belong there; but you can safely proceed,")@/ ("since the offensive shrinkability has been made finite."); error(); r=new_spec(q);shrink_order(r)=normal;delete_glue_ref(q); glue_ptr(p)=r;q=r; } } page_total=page_total+page_depth+width(q);page_depth=0 @ @<Check if node |p| is a new champion breakpoint; then \(if)...@>= if (pi < inf_penalty) {@+@<Compute the badness, |b|, of the current page, using |awful_bad| if the box is too full@>; if (b < awful_bad) if (pi <= eject_penalty) c=pi; else if (b < inf_bad) c=b+pi+insert_penalties; else c=deplorable; else c=b; if (insert_penalties >= 10000) c=awful_bad; #ifdef @!STAT if (tracing_pages > 0) @<Display the page break cost@>; #endif @;@/ if (c <= least_page_cost) {@+best_page_break=p;best_size=page_goal; least_page_cost=c; r=link(page_ins_head); while (r!=page_ins_head) {@+best_ins_ptr(r)=last_ins_ptr(r); r=link(r); } } if ((c==awful_bad)||(pi <= eject_penalty)) {@+fire_up(p); /*output the current page at the best place*/ if (output_active) return; /*user's output routine will act*/ goto done; /*the page has been shipped out by default output routine*/ } } @ @<Display the page break cost@>= {@+begin_diagnostic();print_nl("%"); print_str(" t=");print_totals();@/ print_str(" g=");print_scaled(page_goal);@/ print_str(" b="); if (b==awful_bad) print_char('*');@+else print_int(b); @.*\relax@> print_str(" p=");print_int(pi); print_str(" c="); if (c==awful_bad) print_char('*');@+else print_int(c); if (c <= least_page_cost) print_char('#'); end_diagnostic(false); } @ @<Compute the badness, |b|, of the current page...@>= if (page_total < page_goal) if ((page_so_far[3]!=0)||(page_so_far[4]!=0)||@| (page_so_far[5]!=0)) b=0; else b=badness(page_goal-page_total, page_so_far[2]); else if (page_total-page_goal > page_shrink) b=awful_bad; else b=badness(page_total-page_goal, page_shrink) @ @<Append an insertion to the current page and |goto contribute|@>= {@+if (page_contents==empty) freeze_page_specs(inserts_only); n=subtype(p);r=page_ins_head; while (n >= subtype(link(r))) r=link(r); n=qo(n); if (subtype(r)!=qi(n)) @<Create a page insertion node with |subtype(r)=qi(n)|, and include the glue correction for box |n| in the current page state@>; if (type(r)==split_up) insert_penalties=insert_penalties+float_cost(p); else{@+last_ins_ptr(r)=p; delta=page_goal-page_total-page_depth+page_shrink; /*this much room is left if we shrink the maximum*/ if (count(n)==1000) h=height(p); else h=x_over_n(height(p), 1000)*count(n); /*this much room is needed*/ if (((h <= 0)||(h <= delta))&&(height(p)+height(r) <= dimen(n))) {@+page_goal=page_goal-h;height(r)=height(r)+height(p); } else@<Find the best way to split the insertion, and change |type(r)| to |split_up|@>; } goto contribute; } @ We take note of the value of \.{\\skip} |n| and the height plus depth of \.{\\box}~|n| only when the first \.{\\insert}~|n| node is encountered for a new page. A user who changes the contents of \.{\\box}~|n| after that first \.{\\insert}~|n| had better be either extremely careful or extremely lucky, or both. @<Create a page insertion node...@>= {@+q=get_node(page_ins_node_size);link(q)=link(r);link(r)=q;r=q; subtype(r)=qi(n);type(r)=inserting;ensure_vbox(n); if (box(n)==null) height(r)=0; else height(r)=height(box(n))+depth(box(n)); best_ins_ptr(r)=null;@/ q=skip(n); if (count(n)==1000) h=height(r); else h=x_over_n(height(r), 1000)*count(n); page_goal=page_goal-h-width(q);@/ page_so_far[2+stretch_order(q)]=@|page_so_far[2+stretch_order(q)]+stretch(q);@/ page_shrink=page_shrink+shrink(q); if ((shrink_order(q)!=normal)&&(shrink(q)!=0)) {@+print_err("Infinite glue shrinkage inserted from ");print_esc(@[@<|"skip"|@>@]); @.Infinite glue shrinkage...@> print_int(n); help3("The correction glue for page breaking with insertions")@/ ("must have finite shrinkability. But you may proceed,")@/ ("since the offensive shrinkability has been made finite."); error(); } } @ Here is the code that will split a long footnote between pages, in an emergency. The current situation deserves to be recapitulated: Node |p| is an insertion into box |n|; the insertion will not fit, in its entirety, either because it would make the total contents of box |n| greater than \.{\\dimen} |n|, or because it would make the incremental amount of growth |h| greater than the available space |delta|, or both. (This amount |h| has been weighted by the insertion scaling factor, i.e., by \.{\\count} |n| over 1000.) Now we will choose the best way to break the vlist of the insertion, using the same criteria as in the \.{\\vsplit} operation. @<Find the best way to split the insertion...@>= {@+if (count(n) <= 0) w=max_dimen; else{@+w=page_goal-page_total-page_depth; if (count(n)!=1000) w=x_over_n(w, count(n))*1000; } if (w > dimen(n)-height(r)) w=dimen(n)-height(r); q=vert_break(ins_ptr(p), w, depth(p)); height(r)=height(r)+best_height_plus_depth; #ifdef @!STAT if (tracing_pages > 0) @<Display the insertion split cost@>; #endif @;@/ if (count(n)!=1000) best_height_plus_depth=x_over_n(best_height_plus_depth, 1000)*count(n); page_goal=page_goal-best_height_plus_depth; type(r)=split_up;broken_ptr(r)=q;broken_ins(r)=p; if (q==null) insert_penalties=insert_penalties+eject_penalty; else if (type(q)==penalty_node) insert_penalties=insert_penalties+penalty(q); } @ @<Display the insertion split cost@>= {@+begin_diagnostic();print_nl("% split");print_int(n); @.split@> print_str(" to ");print_scaled(w); print_char(',');print_scaled(best_height_plus_depth);@/ print_str(" p="); if (q==null) print_int(eject_penalty); else if (type(q)==penalty_node) print_int(penalty(q)); else print_char('0'); end_diagnostic(false); } @ When the page builder has looked at as much material as could appear before the next page break, it makes its decision. The break that gave minimum badness will be used to put a completed ``page'' into box 255, with insertions appended to their other boxes. We also set the values of |top_mark|, |first_mark|, and |bot_mark|. The program uses the fact that |bot_mark!=null| implies |first_mark!=null|; it also knows that |bot_mark==null| implies |top_mark==first_mark==null|. The |fire_up| subroutine prepares to output the current page at the best place; then it fires up the user's output routine, if there is one, or it simply ships out the page. There is one parameter, |c|, which represents the node that was being contributed to the page when the decision to force an output was made. @<Declare the procedure called |fire_up|@>= void fire_up(pointer @!c) {@+ pointer p, @!q, @!r, @!s; /*nodes being examined and/or changed*/ pointer @!prev_p; /*predecessor of |p|*/ uint8_t @!n; /*insertion box number*/ bool @!wait; /*should the present insertion be held over?*/ int @!save_vbadness; /*saved value of |vbadness|*/ scaled @!save_vfuzz; /*saved value of |vfuzz|*/ pointer @!save_split_top_skip; /*saved value of |split_top_skip|*/ @<Set the value of |output_penalty|@>; if (bot_mark!=null) {@+if (top_mark!=null) delete_token_ref(top_mark); top_mark=bot_mark;add_token_ref(top_mark); delete_token_ref(first_mark);first_mark=null; } @<Put the \(o)optimal current page into box 255, update |first_mark| and |bot_mark|, append insertions to their boxes, and put the remaining nodes back on the contribution list@>; if ((top_mark!=null)&&(first_mark==null)) {@+first_mark=top_mark;add_token_ref(top_mark); } if (output_routine!=null) if (dead_cycles >= max_dead_cycles) @<Explain that too many dead cycles have occurred in a row@>@; else@<Fire up the user's output routine and |return|@>; @<Perform the default output routine@>; } @ @<Set the value of |output_penalty|@>= if (type(best_page_break)==penalty_node) {@+geq_word_define(int_base+output_penalty_code, penalty(best_page_break)); penalty(best_page_break)=inf_penalty; } else geq_word_define(int_base+output_penalty_code, inf_penalty) @ As the page is finally being prepared for output, pointer |p| runs through the vlist, with |prev_p| trailing behind; pointer |q| is the tail of a list of insertions that are being held over for a subsequent page. @<Put the \(o)optimal current page into box 255...@>= if (c==best_page_break) best_page_break=null; /*|c| not yet linked in*/ @<Ensure that box 255 is empty before output@>; insert_penalties=0; /*this will count the number of insertions held over*/ save_split_top_skip=split_top_skip; if (holding_inserts <= 0) @<Prepare all the boxes involved in insertions to act as queues@>; q=hold_head;link(q)=null;prev_p=page_head;p=link(prev_p); while (p!=best_page_break) {@+if (type(p)==ins_node) {@+if (holding_inserts <= 0) @<Either insert the material specified by node |p| into the appropriate box, or hold it for the next page; also delete node |p| from the current page@>; } else if (type(p)==mark_node) @<Update the values of |first_mark| and |bot_mark|@>; prev_p=p;p=link(prev_p); } split_top_skip=save_split_top_skip; @<Break the current page at node |p|, put it in box~255, and put the remaining nodes on the contribution list@>; @<Delete \(t)the page-insertion nodes@>@; @ @<Ensure that box 255 is empty before output@>= if (box(255)!=null) {@+print_err("");print_esc(@[@<|"box"|@>@]);print_str("255 is not void"); @:box255}{\.{\\box255 is not void}@> help2("You shouldn't use \\box255 except in \\output routines.")@/ ("Proceed, and I'll discard its present contents."); box_error(255); } @ @<Update the values of |first_mark| and |bot_mark|@>= {@+if (first_mark==null) {@+first_mark=mark_ptr(p); add_token_ref(first_mark); } if (bot_mark!=null) delete_token_ref(bot_mark); bot_mark=mark_ptr(p);add_token_ref(bot_mark); } @ When the following code is executed, the current page runs from node |link(page_head)| to node |prev_p|, and the nodes from |p| to |page_tail| are to be placed back at the front of the contribution list. Furthermore the heldover insertions appear in a list from |link(hold_head)| to |q|; we will put them into the current page list for safekeeping while the user's output routine is active. We might have |q==hold_head|; and |p==null| if and only if |prev_p==page_tail|. Error messages are suppressed within |vpackage|, since the box might appear to be overfull or underfull simply because the stretch and shrink from the \.{\\skip} registers for inserts are not actually present in the box. @<Break the current page at node |p|, put it...@>= if (p!=null) {@+if (link(contrib_head)==null) if (nest_ptr==0) tail=page_tail; else contrib_tail=page_tail; link(page_tail)=link(contrib_head); link(contrib_head)=p; link(prev_p)=null; } save_vbadness=vbadness;vbadness=inf_bad; save_vfuzz=vfuzz;vfuzz=max_dimen; /*inhibit error messages*/ box(255)=vpackage(link(page_head), best_size, exactly, page_max_depth); vbadness=save_vbadness;vfuzz=save_vfuzz; if (last_glue!=max_halfword) delete_glue_ref(last_glue); @<Start a new current page@>; /*this sets |last_glue=max_halfword|*/ if (q!=hold_head) {@+link(page_head)=link(hold_head);page_tail=q; } @ If many insertions are supposed to go into the same box, we want to know the position of the last node in that box, so that we don't need to waste time when linking further information into it. The |last_ins_ptr| fields of the page insertion nodes are therefore used for this purpose during the packaging phase. @<Prepare all the boxes involved in insertions to act as queues@>= {@+r=link(page_ins_head); while (r!=page_ins_head) {@+if (best_ins_ptr(r)!=null) {@+n=qo(subtype(r));ensure_vbox(n); if (box(n)==null) box(n)=new_null_box(); p=box(n)+list_offset; while (link(p)!=null) p=link(p); last_ins_ptr(r)=p; } r=link(r); } } @ @<Delete \(t)the page-insertion nodes@>= r=link(page_ins_head); while (r!=page_ins_head) {@+q=link(r);free_node(r, page_ins_node_size);r=q; } link(page_ins_head)=page_ins_head @ We will set |best_ins_ptr=null| and package the box corresponding to insertion node~|r|, just after making the final insertion into that box. If this final insertion is `|split_up|', the remainder after splitting and pruning (if any) will be carried over to the next page. @<Either insert the material specified by node |p| into...@>= {@+r=link(page_ins_head); while (subtype(r)!=subtype(p)) r=link(r); if (best_ins_ptr(r)==null) wait=true; else{@+wait=false;s=last_ins_ptr(r);link(s)=ins_ptr(p); if (best_ins_ptr(r)==p) @<Wrap up the box specified by node |r|, splitting node |p| if called for; set |wait:=true| if node |p| holds a remainder after splitting@>@; else{@+while (link(s)!=null) s=link(s); last_ins_ptr(r)=s; } } @<Either append the insertion node |p| after node |q|, and remove it from the current page, or delete |node(p)|@>; } @ @<Wrap up the box specified by node |r|, splitting node |p| if...@>= {@+if (type(r)==split_up) if ((broken_ins(r)==p)&&(broken_ptr(r)!=null)) {@+while (link(s)!=broken_ptr(r)) s=link(s); link(s)=null; split_top_skip=split_top_ptr(p); ins_ptr(p)=prune_page_top(broken_ptr(r)); if (ins_ptr(p)!=null) {@+temp_ptr=vpack(ins_ptr(p), natural); height(p)=height(temp_ptr)+depth(temp_ptr); free_node(temp_ptr, box_node_size);wait=true; } } best_ins_ptr(r)=null; n=qo(subtype(r)); temp_ptr=list_ptr(box(n)); free_node(box(n), box_node_size); box(n)=vpack(temp_ptr, natural); } @ @<Either append the insertion node |p|...@>= link(prev_p)=link(p);link(p)=null; if (wait) {@+link(q)=p;q=p;incr(insert_penalties); } else{@+delete_glue_ref(split_top_ptr(p)); free_node(p, ins_node_size); } p=prev_p @ The list of heldover insertions, running from |link(page_head)| to |page_tail|, must be moved to the contribution list when the user has specified no output routine. @<Perform the default output routine@>= {@+if (link(page_head)!=null) {@+if (link(contrib_head)==null) if (nest_ptr==0) tail=page_tail;@+else contrib_tail=page_tail; else link(page_tail)=link(contrib_head); link(contrib_head)=link(page_head); link(page_head)=null;page_tail=page_head; } ship_out(box(255));box(255)=null; } @ @<Explain that too many dead cycles have occurred in a row@>= {@+print_err("Output loop---");print_int(dead_cycles); @.Output loop...@> print_str(" consecutive dead cycles"); help3("I've concluded that your \\output is awry; it never does a")@/ ("\\shipout, so I'm shipping \\box255 out myself. Next time")@/ ("increase \\maxdeadcycles if you want me to be more patient!");error(); } @ @<Fire up the user's output routine and |return|@>= {@+output_active=true; incr(dead_cycles); push_nest();mode=-vmode;prev_depth=ignore_depth;mode_line=-line; begin_token_list(output_routine, output_text); new_save_level(output_group);normal_paragraph(); scan_left_brace(); return; } @ When the user's output routine finishes, it has constructed a vlist in internal vertical mode, and \TeX\ will do the following: @<Resume the page builder after an output routine has come to an end@>= {@+if ((loc!=null)|| ((token_type!=output_text)&&(token_type!=backed_up))) @<Recover from an unbalanced output routine@>; end_token_list(); /*conserve stack space in case more outputs are triggered*/ end_graf();unsave();output_active=false;insert_penalties=0;@/ @<Ensure that box 255 is empty after output@>; if (tail!=head) /*current list goes after heldover insertions*/ {@+link(page_tail)=link(head); page_tail=tail; } if (link(page_head)!=null) /*and both go before heldover contributions*/ {@+if (link(contrib_head)==null) contrib_tail=page_tail; link(page_tail)=link(contrib_head); link(contrib_head)=link(page_head); link(page_head)=null;page_tail=page_head; } pop_nest();build_page(); } @ @<Recover from an unbalanced output routine@>= {@+print_err("Unbalanced output routine"); @.Unbalanced output routine@> help2("Your sneaky output routine has problematic {'s and/or }'s.")@/ ("I can't handle that very well; good luck.");error(); @/do@+{get_token(); }@+ while (!(loc==null)); } /*loops forever if reading from a file, since |null==min_halfword <= 0|*/ @ @<Ensure that box 255 is empty after output@>= if (box(255)!=null) {@+print_err("Output routine didn't use all of "); print_esc(@[@<|"box"|@>@]);print_int(255); @.Output routine didn't use...@> help3("Your \\output commands should empty \\box255,")@/ ("e.g., by saying `\\shipout\\box255'.")@/ ("Proceed; I'll discard its present contents."); box_error(255); } @* The chief executive. We come now to the |main_control| routine, which contains the master switch that causes all the various pieces of \TeX\ to do their things, in the right order. In a sense, this is the grand climax of the program: It applies all the tools that we have worked so hard to construct. In another sense, this is the messiest part of the program: It necessarily refers to other pieces of code all over the place, so that a person can't fully understand what is going on without paging back and forth to be reminded of conventions that are defined elsewhere. We are now at the hub of the web, the central nervous system that touches most of the other parts and ties them together. @^brain@> The structure of |main_control| itself is quite simple. There's a label called |big_switch|, at which point the next token of input is fetched using |get_x_token|. Then the program branches at high speed into one of about 100 possible directions, based on the value of the current mode and the newly fetched command code; the sum |abs(mode)+cur_cmd| indicates what to do next. For example, the case `|vmode+letter|' arises when a letter occurs in vertical mode (or internal vertical mode); this case leads to instructions that initialize a new paragraph and enter horizontal mode. The big |case| statement that contains this multiway switch has been labeled |reswitch|, so that the program can |goto reswitch| when the next token has already been fetched. Most of the cases are quite short; they call an ``action procedure'' that does the work for that case, and then they either |goto reswitch| or they ``fall through'' to the end of the |case| statement, which returns control back to |big_switch|. Thus, |main_control| is not an extremely large procedure, in spite of the multiplicity of things it must do; it is small enough to be handled by \PASCAL\ compilers that put severe restrictions on procedure size. @!@^action procedure@> One case is singled out for special treatment, because it accounts for most of \TeX's activities in typical applications. The process of reading simple text and converting it into |char_node| records, while looking for ligatures and kerns, is part of \TeX's ``inner loop''; the whole program runs efficiently when its inner loop is fast, so this part has been written with particular care. @ We shall concentrate first on the inner loop of |main_control|, deferring consideration of the other cases until later. @^inner loop@> @p@t\4@>@<Declare action procedures for use by |main_control|@>@; @t\4@>@<Declare the procedure called |handle_right_brace|@>@; void main_control(void) /*governs \TeX's activities*/ {@+ int @!t; /*general-purpose temporary variable*/ if (every_job!=null) begin_token_list(every_job, every_job_text); big_switch: get_x_token();@/ reswitch: @<Give diagnostic information, if requested@>; switch (abs(mode)+cur_cmd) { case hmode+letter: case hmode+other_char: case hmode+char_given: goto main_loop; case hmode+char_num: {@+scan_char_num();cur_chr=cur_val;goto main_loop;@+} case hmode+no_boundary: {@+get_x_token(); if ((cur_cmd==letter)||(cur_cmd==other_char)||(cur_cmd==char_given)|| (cur_cmd==char_num)) cancel_boundary=true; goto reswitch; } case hmode+spacer: if (space_factor==1000) goto append_normal_space; else app_space();@+break; case hmode+ex_space: case mmode+ex_space: goto append_normal_space; @t\4@>@<Cases of |main_control| that are not part of the inner loop@>@; } /*of the big |case| statement*/ goto big_switch; main_loop: @<Append character |cur_chr| and the following characters (if~any) to the current hlist in the current font; |goto reswitch| when a non-character has been fetched@>; append_normal_space: @<Append a normal inter-word space to the current list, then |goto big_switch|@>; } @ When a new token has just been fetched at |big_switch|, we have an ideal place to monitor \TeX's activity. @^debugging@> @<Give diagnostic information, if requested@>= if (interrupt!=0) if (OK_to_interrupt) {@+back_input();check_interrupt;goto big_switch; } #ifdef @!DEBUG if (panicking) check_mem(false);@; #endif if (tracing_commands > 0) show_cur_cmd_chr() @ The following part of the program was first written in a structured manner, according to the philosophy that ``premature optimization is the root of all evil.'' Then it was rearranged into pieces of spaghetti so that the most common actions could proceed with little or no redundancy. The original unoptimized form of this algorithm resembles the |reconstitute| procedure, which was described earlier in connection with hyphenation. Again we have an implied ``cursor'' between characters |cur_l| and |cur_r|. The main difference is that the |lig_stack| can now contain a charnode as well as pseudo-ligatures; that stack is now usually nonempty, because the next character of input (if any) has been appended to it. In |main_control| we have $$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack > null|;\cr |font_bchar[cur_font]|,&otherwise;\cr}$$ except when |character(lig_stack)==font_false_bchar[cur_font]|. Several additional global variables are needed. @<Glob...@>= internal_font_number @!main_f; /*the current font*/ four_quarters @!main_i; /*character information bytes for |cur_l|*/ four_quarters @!main_j; /*ligature/kern command*/ font_index @!main_k; /*index into |font_info|*/ pointer @!main_p; /*temporary register for list manipulation*/ int @!main_s; /*space factor value*/ halfword @!bchar; /*right boundary character of current font, or |non_char|*/ halfword @!false_bchar; /*nonexistent character matching |bchar|, or |non_char|*/ bool @!cancel_boundary; /*should the left boundary be ignored?*/ bool @!ins_disc; /*should we insert a discretionary node?*/ @ The boolean variables of the main loop are normally false, and always reset to false before the loop is left. That saves us the extra work of initializing each time. @<Set init...@>= ligature_present=false;cancel_boundary=false;lft_hit=false;rt_hit=false; ins_disc=false; @ We leave the |space_factor| unchanged if |sf_code(cur_chr)==0|; otherwise we set it equal to |sf_code(cur_chr)|, except that it should never change from a value less than 1000 to a value exceeding 1000. The most common case is |sf_code(cur_chr)==1000|, so we want that case to be fast. The overall structure of the main loop is presented here. Some program labels are inside the individual sections. @^inner loop@> @d adjust_space_factor @t@>@;@/ main_s=sf_code(cur_chr); if (main_s==1000) space_factor=1000; else if (main_s < 1000) {@+if (main_s > 0) space_factor=main_s; } else if (space_factor < 1000) space_factor=1000; else space_factor=main_s @<Append character |cur_chr|...@>= adjust_space_factor;@/ main_f=cur_font; bchar=font_bchar[main_f];false_bchar=font_false_bchar[main_f]; if (mode > 0) if (language!=clang) fix_language(); fast_get_avail(lig_stack);font(lig_stack)=main_f;cur_l=qi(cur_chr); character(lig_stack)=cur_l;@/ cur_q=tail; if (cancel_boundary) {@+cancel_boundary=false;main_k=non_address; } else main_k=bchar_label[main_f]; if (main_k==non_address) goto main_loop_move2; /*no left boundary processing*/ cur_r=cur_l;cur_l=non_char; goto main_lig_loop1; /*begin with cursor after left boundary*/ @# main_loop_wrapup: @<Make a ligature node, if |ligature_present|; insert a null discretionary, if appropriate@>; main_loop_move: @<If the cursor is immediately followed by the right boundary, |goto reswitch|; if it's followed by an invalid character, |goto big_switch|; otherwise move the cursor one step to the right and |goto main_lig_loop|@>; main_loop_lookahead: @<Look ahead for another character, or leave |lig_stack| empty if there's none there@>; main_lig_loop: @<If there's a ligature/kern command relevant to |cur_l| and |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|@>; main_loop_move_lig: @<Move the cursor past a pseudo-ligature, then |goto main_loop_lookahead| or |main_lig_loop|@>@; @ If |link(cur_q)| is nonnull when |wrapup| is invoked, |cur_q| points to the list of characters that were consumed while building the ligature character~|cur_l|. A discretionary break is not inserted for an explicit hyphen when we are in restricted horizontal mode. In particular, this avoids putting discretionary nodes inside of other discretionaries. @^inner loop@> @d pack_lig(X) /*the parameter is either |rt_hit| or |false|*/ {@+main_p=new_ligature(main_f, cur_l, link(cur_q)); if (lft_hit) {@+subtype(main_p)=2;lft_hit=false; } if (X) if (lig_stack==null) {@+incr(subtype(main_p));rt_hit=false; } link(cur_q)=main_p;tail=main_p;ligature_present=false; } @d wrapup(X) if (cur_l < non_char) {@+if (link(cur_q) > null) if (character(tail)==qi(hyphen_char[main_f])) ins_disc=true; if (ligature_present) pack_lig(X); if (ins_disc) {@+ins_disc=false; if (mode > 0) tail_append(new_disc()); } } @<Make a ligature node, if |ligature_present|;...@>= wrapup(rt_hit) @ @<If the cursor is immediately followed by the right boundary...@>= @^inner loop@> if (lig_stack==null) goto reswitch; cur_q=tail;cur_l=character(lig_stack); main_loop_move1: if (!is_char_node(lig_stack)) goto main_loop_move_lig; main_loop_move2: if ((cur_chr < font_bc[main_f])||(cur_chr > font_ec[main_f])) {@+char_warning(main_f, cur_chr);free_avail(lig_stack);goto big_switch; } main_i=char_info(main_f)(cur_l); if (!char_exists(main_i)) {@+char_warning(main_f, cur_chr);free_avail(lig_stack);goto big_switch; } link(tail)=lig_stack;tail=lig_stack /*|main_loop_lookahead| is next*/ @ Here we are at |main_loop_move_lig|. When we begin this code we have |cur_q==tail| and |cur_l==character(lig_stack)|. @<Move the cursor past a pseudo-ligature...@>= main_p=lig_ptr(lig_stack); if (main_p > null) tail_append(main_p); /*append a single character*/ temp_ptr=lig_stack;lig_stack=link(temp_ptr); free_node(temp_ptr, small_node_size); main_i=char_info(main_f)(cur_l);ligature_present=true; if (lig_stack==null) if (main_p > null) goto main_loop_lookahead; else cur_r=bchar; else cur_r=character(lig_stack); goto main_lig_loop @ The result of \.{\\char} can participate in a ligature or kern, so we must look ahead for it. @<Look ahead for another character...@>= get_next(); /*set only |cur_cmd| and |cur_chr|, for speed*/ if (cur_cmd==letter) goto main_loop_lookahead1; if (cur_cmd==other_char) goto main_loop_lookahead1; if (cur_cmd==char_given) goto main_loop_lookahead1; x_token(); /*now expand and set |cur_cmd|, |cur_chr|, |cur_tok|*/ if (cur_cmd==letter) goto main_loop_lookahead1; if (cur_cmd==other_char) goto main_loop_lookahead1; if (cur_cmd==char_given) goto main_loop_lookahead1; if (cur_cmd==char_num) {@+scan_char_num();cur_chr=cur_val;goto main_loop_lookahead1; } if (cur_cmd==no_boundary) bchar=non_char; cur_r=bchar;lig_stack=null;goto main_lig_loop; main_loop_lookahead1: adjust_space_factor; fast_get_avail(lig_stack);font(lig_stack)=main_f; cur_r=qi(cur_chr);character(lig_stack)=cur_r; if (cur_r==false_bchar) cur_r=non_char /*this prevents spurious ligatures*/ @ Even though comparatively few characters have a lig/kern program, several of the instructions here count as part of \TeX's inner loop, since a @^inner loop@> potentially long sequential search must be performed. For example, tests with Computer Modern Roman showed that about 40 per cent of all characters actually encountered in practice had a lig/kern program, and that about four lig/kern commands were investigated for every such character. At the beginning of this code we have |main_i==char_info(main_f)(cur_l)|. @<If there's a ligature/kern command...@>= if (char_tag(main_i)!=lig_tag) goto main_loop_wrapup; if (cur_r==non_char) goto main_loop_wrapup; main_k=lig_kern_start(main_f)(main_i);main_j=font_info[main_k].qqqq; if (skip_byte(main_j) <= stop_flag) goto main_lig_loop2; main_k=lig_kern_restart(main_f)(main_j); main_lig_loop1: main_j=font_info[main_k].qqqq; main_lig_loop2: if (next_char(main_j)==cur_r) if (skip_byte(main_j) <= stop_flag) @<Do ligature or kern command, returning to |main_lig_loop| or |main_loop_wrapup| or |main_loop_move|@>; if (skip_byte(main_j)==qi(0)) incr(main_k); else{@+if (skip_byte(main_j) >= stop_flag) goto main_loop_wrapup; main_k=main_k+qo(skip_byte(main_j))+1; } goto main_lig_loop1 @ When a ligature or kern instruction matches a character, we know from |read_font_info| that the character exists in the font, even though we haven't verified its existence in the normal way. This section could be made into a subroutine, if the code inside |main_control| needs to be shortened. \chardef\?='174 % vertical line to indicate character retention @<Do ligature or kern command...@>= {@+if (op_byte(main_j) >= kern_flag) {@+wrapup(rt_hit); tail_append(new_kern(char_kern(main_f)(main_j)));goto main_loop_move; } if (cur_l==non_char) lft_hit=true; else if (lig_stack==null) rt_hit=true; check_interrupt; /*allow a way out in case there's an infinite ligature loop*/ switch (op_byte(main_j)) { qi(case 1): qi(case 5): {@+cur_l=rem_byte(main_j); /*\.{=:\?}, \.{=:\?>}*/ main_i=char_info(main_f)(cur_l);ligature_present=true; } @+break; qi(case 2): qi(case 6): {@+cur_r=rem_byte(main_j); /*\.{\?=:}, \.{\?=:>}*/ if (lig_stack==null) /*right boundary character is being consumed*/ {@+lig_stack=new_lig_item(cur_r);bchar=non_char; } else if (is_char_node(lig_stack)) /*|link(lig_stack)==null|*/ {@+main_p=lig_stack;lig_stack=new_lig_item(cur_r); lig_ptr(lig_stack)=main_p; } else character(lig_stack)=cur_r; } @+break; qi(case 3): {@+cur_r=rem_byte(main_j); /*\.{\?=:\?}*/ main_p=lig_stack;lig_stack=new_lig_item(cur_r); link(lig_stack)=main_p; } @+break; qi(case 7): qi(case 11): {@+wrapup(false); /*\.{\?=:\?>}, \.{\?=:\?>>}*/ cur_q=tail;cur_l=rem_byte(main_j); main_i=char_info(main_f)(cur_l);ligature_present=true; } @+break; default:{@+cur_l=rem_byte(main_j);ligature_present=true; /*\.{=:}*/ if (lig_stack==null) goto main_loop_wrapup; else goto main_loop_move1; } } if (op_byte(main_j) > qi(4)) if (op_byte(main_j)!=qi(7)) goto main_loop_wrapup; if (cur_l < non_char) goto main_lig_loop; main_k=bchar_label[main_f];goto main_lig_loop1; } @ The occurrence of blank spaces is almost part of \TeX's inner loop, @^inner loop@> since we usually encounter about one space for every five non-blank characters. Therefore |main_control| gives second-highest priority to ordinary spaces. When a glue parameter like \.{\\spaceskip} is set to `\.{0pt}', we will see to it later that the corresponding glue specification is precisely |zero_glue|, not merely a pointer to some specification that happens to be full of zeroes. Therefore it is simple to test whether a glue parameter is zero or~not. @<Append a normal inter-word space...@>= if (space_skip==zero_glue) {@+@<Find the glue specification, |main_p|, for text spaces in the current font@>; temp_ptr=new_glue(main_p); } else temp_ptr=new_param_glue(space_skip_code); link(tail)=temp_ptr;tail=temp_ptr; goto big_switch @ Having |font_glue| allocated for each text font saves both time and memory. If any of the three spacing parameters are subsequently changed by the use of \.{\\fontdimen}, the |find_font_dimen| procedure deallocates the |font_glue| specification allocated here. @<Find the glue specification...@>= {@+main_p=font_glue[cur_font]; if (main_p==null) {@+main_p=new_spec(zero_glue);main_k=param_base[cur_font]+space_code; width(main_p)=font_info[main_k].sc; /*that's |space(cur_font)|*/ stretch(main_p)=font_info[main_k+1].sc; /*and |space_stretch(cur_font)|*/ shrink(main_p)=font_info[main_k+2].sc; /*and |space_shrink(cur_font)|*/ font_glue[cur_font]=main_p; } } @ @<Declare act...@>= void app_space(void) /*handle spaces when |space_factor!=1000|*/ {@+pointer @!q; /*glue node*/ if ((space_factor >= 2000)&&(xspace_skip!=zero_glue)) q=new_param_glue(xspace_skip_code); else{@+if (space_skip!=zero_glue) main_p=space_skip; else@<Find the glue specification...@>; main_p=new_spec(main_p); @<Modify the glue specification in |main_p| according to the space factor@>; q=new_glue(main_p);glue_ref_count(main_p)=null; } link(tail)=q;tail=q; } @ @<Modify the glue specification in |main_p| according to the space factor@>= if (space_factor >= 2000) width(main_p)=width(main_p)+extra_space(cur_font); stretch(main_p)=xn_over_d(stretch(main_p), space_factor, 1000); shrink(main_p)=xn_over_d(shrink(main_p), 1000, space_factor) @ Whew---that covers the main loop. We can now proceed at a leisurely pace through the other combinations of possibilities. @d any_mode(X) case vmode+X: case hmode+X: case mmode+X /*for mode-independent commands*/ @<Cases of |main_control| that are not part of the inner loop@>= any_mode(relax): case vmode+spacer: case mmode+spacer: case mmode+no_boundary: do_nothing; any_mode(ignore_spaces): {@+@<Get the next non-blank non-call...@>; goto reswitch; } case vmode+stop: if (its_all_over()) return;@+break; /*this is the only way out*/ @t\4@>@<Forbidden cases detected in |main_control|@>@+@, any_mode(mac_param): report_illegal_case();@+break; @<Math-only cases in non-math modes, or vice versa@>: insert_dollar_sign();@+break; @t\4@>@<Cases of |main_control| that build boxes and lists@>@; @t\4@>@<Cases of |main_control| that don't depend on |mode|@>@; @t\4@>@<Cases of |main_control| that are for extensions to \TeX@>@; @ Here is a list of cases where the user has probably gotten into or out of math mode by mistake. \TeX\ will insert a dollar sign and rescan the current token. @d non_math(X) case vmode+X: case hmode+X @<Math-only cases in non-math modes...@>= non_math(sup_mark): non_math(sub_mark): non_math(math_char_num): non_math(math_given): non_math(math_comp): non_math(delim_num): non_math(left_right): non_math(above): non_math(radical): non_math(math_style): non_math(math_choice): non_math(vcenter): non_math(non_script): non_math(mkern): non_math(limit_switch): non_math(mskip): non_math(math_accent): case mmode+endv: case mmode+par_end: case mmode+stop: case mmode+vskip: case mmode+un_vbox: case mmode+valign: case mmode+hrule @ @<Declare action...@>= void insert_dollar_sign(void) {@+back_input();cur_tok=math_shift_token+'$'; print_err("Missing $ inserted"); @.Missing \$ inserted@> help2("I've inserted a begin-math/end-math symbol since I think")@/ ("you left one out. Proceed, with fingers crossed.");ins_error(); } @ When erroneous situations arise, \TeX\ usually issues an error message specific to the particular error. For example, `\.{\\noalign}' should not appear in any mode, since it is recognized by the |align_peek| routine in all of its legitimate appearances; a special error message is given when `\.{\\noalign}' occurs elsewhere. But sometimes the most appropriate error message is simply that the user is not allowed to do what he or she has attempted. For example, `\.{\\moveleft}' is allowed only in vertical mode, and `\.{\\lower}' only in non-vertical modes. Such cases are enumerated here and in the other sections referred to under `See also \dots.' @<Forbidden cases...@>= case vmode+vmove: case hmode+hmove: case mmode+hmove: any_mode(last_item): @ The `|you_cant|' procedure prints a line saying that the current command is illegal in the current mode; it identifies these things symbolically. @<Declare action...@>= void you_cant(void) {@+print_err("You can't use `"); @.You can't use x in y mode@> print_cmd_chr(cur_cmd, cur_chr); print_str("' in ");print_mode(mode); } @ @<Declare act...@>= void report_illegal_case(void) {@+you_cant(); help4("Sorry, but I'm not programmed to handle this case;")@/ ("I'll just pretend that you didn't ask for it.")@/ ("If you're in the wrong mode, you might be able to")@/ ("return to the right one by typing `I}' or `I$' or `I\\par'.");@/ error(); } @ Some operations are allowed only in privileged modes, i.e., in cases that |mode > 0|. The |privileged| function is used to detect violations of this rule; it issues an error message and returns |false| if the current |mode| is negative. @<Declare act...@>= bool privileged(void) {@+if (mode > 0) return true; else{@+report_illegal_case();return false; } } @ Either \.{\\dump} or \.{\\end} will cause |main_control| to enter the endgame, since both of them have `|stop|' as their command code. @<Put each...@>= primitive(@[@<|"end"|@>@], stop, 0);@/ @!@:end_}{\.{\\end} primitive@> primitive(@[@<|"dump"|@>@], stop, 1);@/ @!@:dump_}{\.{\\dump} primitive@> @ @<Cases of |print_cmd_chr|...@>= case stop: if (chr_code==1) print_esc(@[@<|"dump"|@>@]);@+else print_esc(@[@<|"end"|@>@]);@+break; @ We don't want to leave |main_control| immediately when a |stop| command is sensed, because it may be necessary to invoke an \.{\\output} routine several times before things really grind to a halt. (The output routine might even say `\.{\\gdef\\end\{...\}}', to prolong the life of the job.) Therefore |its_all_over| is |true| only when the current page and contribution list are empty, and when the last output was not a ``dead cycle.'' @<Declare act...@>= bool its_all_over(void) /*do this when \.{\\end} or \.{\\dump} occurs*/ {@+ if (privileged()) {@+if ((page_head==page_tail)&&(head==tail)&&(dead_cycles==0)) {@+return true; } back_input(); /*we will try to end again after ejecting residual material*/ tail_append(new_null_box()); width(tail)=hsize; tail_append(new_glue(fill_glue)); tail_append(new_penalty(-010000000000));@/ build_page(); /*append \.{\\hbox to \\hsize\{\}\\vfill\\penalty-'10000000000}*/ } return false; } @* Building boxes and lists. The most important parts of |main_control| are concerned with \TeX's chief mission of box-making. We need to control the activities that put entries on vlists and hlists, as well as the activities that convert those lists into boxes. All of the necessary machinery has already been developed; it remains for us to ``push the buttons'' at the right times. @ As an introduction to these routines, let's consider one of the simplest cases: What happens when `\.{\\hrule}' occurs in vertical mode, or `\.{\\vrule}' in horizontal mode or math mode? The code in |main_control| is short, since the |scan_rule_spec| routine already does most of what is required; thus, there is no need for a special action procedure. Note that baselineskip calculations are disabled after a rule in vertical mode, by setting |prev_depth=ignore_depth|. @<Cases of |main_control| that build...@>= case vmode+hrule: case hmode+vrule: case mmode+vrule: {@+tail_append(scan_rule_spec()); if (abs(mode)==vmode) prev_depth=ignore_depth; else if (abs(mode)==hmode) space_factor=1000; } @+break; @ The processing of things like \.{\\hskip} and \.{\\vskip} is slightly more complicated. But the code in |main_control| is very short, since it simply calls on the action routine |append_glue|. Similarly, \.{\\kern} activates |append_kern|. @<Cases of |main_control| that build...@>= case vmode+vskip: case hmode+hskip: case mmode+hskip: case mmode+mskip: append_glue();@+break; any_mode(kern): case mmode+mkern: append_kern();@+break; @ The |hskip| and |vskip| command codes are used for control sequences like \.{\\hss} and \.{\\vfil} as well as for \.{\\hskip} and \.{\\vskip}. The difference is in the value of |cur_chr|. @d fil_code 0 /*identifies \.{\\hfil} and \.{\\vfil}*/ @d fill_code 1 /*identifies \.{\\hfill} and \.{\\vfill}*/ @d ss_code 2 /*identifies \.{\\hss} and \.{\\vss}*/ @d fil_neg_code 3 /*identifies \.{\\hfilneg} and \.{\\vfilneg}*/ @d skip_code 4 /*identifies \.{\\hskip} and \.{\\vskip}*/ @d mskip_code 5 /*identifies \.{\\mskip}*/ @<Put each...@>= primitive(@[@<|"hskip"|@>@], hskip, skip_code);@/ @!@:hskip_}{\.{\\hskip} primitive@> primitive(@[@<|"hfil"|@>@], hskip, fil_code); @!@:hfil_}{\.{\\hfil} primitive@> primitive(@[@<|"hfill"|@>@], hskip, fill_code);@/ @!@:hfill_}{\.{\\hfill} primitive@> primitive(@[@<|"hss"|@>@], hskip, ss_code); @!@:hss_}{\.{\\hss} primitive@> primitive(@[@<|"hfilneg"|@>@], hskip, fil_neg_code);@/ @!@:hfil_neg_}{\.{\\hfilneg} primitive@> primitive(@[@<|"vskip"|@>@], vskip, skip_code);@/ @!@:vskip_}{\.{\\vskip} primitive@> primitive(@[@<|"vfil"|@>@], vskip, fil_code); @!@:vfil_}{\.{\\vfil} primitive@> primitive(@[@<|"vfill"|@>@], vskip, fill_code);@/ @!@:vfill_}{\.{\\vfill} primitive@> primitive(@[@<|"vss"|@>@], vskip, ss_code); @!@:vss_}{\.{\\vss} primitive@> primitive(@[@<|"vfilneg"|@>@], vskip, fil_neg_code);@/ @!@:vfil_neg_}{\.{\\vfilneg} primitive@> primitive(@[@<|"mskip"|@>@], mskip, mskip_code);@/ @!@:mskip_}{\.{\\mskip} primitive@> primitive(@[@<|"kern"|@>@], kern, explicit); @!@:kern_}{\.{\\kern} primitive@> primitive(@[@<|"mkern"|@>@], mkern, mu_glue);@/ @!@:mkern_}{\.{\\mkern} primitive@> @ @<Cases of |print_cmd_chr|...@>= case hskip: switch (chr_code) { case skip_code: print_esc(@[@<|"hskip"|@>@]);@+break; case fil_code: print_esc(@[@<|"hfil"|@>@]);@+break; case fill_code: print_esc(@[@<|"hfill"|@>@]);@+break; case ss_code: print_esc(@[@<|"hss"|@>@]);@+break; default:print_esc(@[@<|"hfilneg"|@>@]); } @+break; case vskip: switch (chr_code) { case skip_code: print_esc(@[@<|"vskip"|@>@]);@+break; case fil_code: print_esc(@[@<|"vfil"|@>@]);@+break; case fill_code: print_esc(@[@<|"vfill"|@>@]);@+break; case ss_code: print_esc(@[@<|"vss"|@>@]);@+break; default:print_esc(@[@<|"vfilneg"|@>@]); } @+break; case mskip: print_esc(@[@<|"mskip"|@>@]);@+break; case kern: print_esc(@[@<|"kern"|@>@]);@+break; case mkern: print_esc(@[@<|"mkern"|@>@]);@+break; @ All the work relating to glue creation has been relegated to the following subroutine. It does not call |build_page|, because it is used in at least one place where that would be a mistake. @<Declare action...@>= void append_glue(void) {@+small_number s; /*modifier of skip command*/ s=cur_chr; switch (s) { case fil_code: cur_val=fil_glue;@+break; case fill_code: cur_val=fill_glue;@+break; case ss_code: cur_val=ss_glue;@+break; case fil_neg_code: cur_val=fil_neg_glue;@+break; case skip_code: scan_glue(glue_val);@+break; case mskip_code: scan_glue(mu_val); } /*now |cur_val| points to the glue specification*/ tail_append(new_glue(cur_val)); if (s >= skip_code) {@+decr(glue_ref_count(cur_val)); if (s > skip_code) subtype(tail)=mu_glue; } } @ @<Declare act...@>= void append_kern(void) {@+quarterword s; /*|subtype| of the kern node*/ s=cur_chr;scan_dimen(s==mu_glue, false, false); tail_append(new_kern(cur_val));subtype(tail)=s; } @ Many of the actions related to box-making are triggered by the appearance of braces in the input. For example, when the user says `\.{\\hbox} \.{to} \.{100pt\{$\langle\,\hbox{hlist}\,\rangle$\}}' in vertical mode, the information about the box size (100pt, |exactly|) is put onto |save_stack| with a level boundary word just above it, and |cur_group=adjusted_hbox_group|; \TeX\ enters restricted horizontal mode to process the hlist. The right brace eventually causes |save_stack| to be restored to its former state, at which time the information about the box size (100pt, |exactly|) is available once again; a box is packaged and we leave restricted horizontal mode, appending the new box to the current list of the enclosing mode (in this case to the current list of vertical mode), followed by any vertical adjustments that were removed from the box by |hpack|. The next few sections of the program are therefore concerned with the treatment of left and right curly braces. @ If a left brace occurs in the middle of a page or paragraph, it simply introduces a new level of grouping, and the matching right brace will not have such a drastic effect. Such grouping affects neither the mode nor the current list. @<Cases of |main_control| that build...@>= non_math(left_brace): new_save_level(simple_group);@+break; any_mode(begin_group): new_save_level(semi_simple_group);@+break; any_mode(end_group): if (cur_group==semi_simple_group) unsave(); else off_save();@+break; @ We have to deal with errors in which braces and such things are not properly nested. Sometimes the user makes an error of commission by inserting an extra symbol, but sometimes the user makes an error of omission. \TeX\ can't always tell one from the other, so it makes a guess and tries to avoid getting into a loop. The |off_save| routine is called when the current group code is wrong. It tries to insert something into the user's input that will help clean off the top level. @<Declare act...@>= void off_save(void) {@+pointer p; /*inserted token*/ if (cur_group==bottom_level) @<Drop current token and complain that it was unmatched@>@; else{@+back_input();p=get_avail();link(temp_head)=p; print_err("Missing "); @<Prepare to insert a token that matches |cur_group|, and print what it is@>; print_str(" inserted");ins_list(link(temp_head)); help5("I've inserted something that you may have forgotten.")@/ ("(See the <inserted text> above.)")@/ ("With luck, this will get me unwedged. But if you")@/ ("really didn't forget anything, try typing `2' now; then")@/ ("my insertion and my current dilemma will both disappear."); error(); } } @ At this point, |link(temp_head)==p|, a pointer to an empty one-word node. @<Prepare to insert a token that matches |cur_group|...@>= switch (cur_group) { case semi_simple_group: {@+info(p)=cs_token_flag+frozen_end_group; print_esc(@[@<|"endgroup"|@>@]); @.Missing \\endgroup inserted@> } @+break; case math_shift_group: {@+info(p)=math_shift_token+'$';print_char('$'); @.Missing \$ inserted@> } @+break; case math_left_group: {@+info(p)=cs_token_flag+frozen_right;link(p)=get_avail(); p=link(p);info(p)=other_token+'.';print_esc(@[@<|"right."|@>@]); @.Missing \\right\hbox{.} inserted@> @^null delimiter@> } @+break; default:{@+info(p)=right_brace_token+'}';print_char('}'); @.Missing \} inserted@> } } @ @<Drop current token and complain that it was unmatched@>= {@+print_err("Extra ");print_cmd_chr(cur_cmd, cur_chr); @.Extra x@> help1("Things are pretty mixed up, but I think the worst is over.");@/ error(); } @ The routine for a |right_brace| character branches into many subcases, since a variety of things may happen, depending on |cur_group|. Some types of groups are not supposed to be ended by a right brace; error messages are given in hopes of pinpointing the problem. Most branches of this routine will be filled in later, when we are ready to understand them; meanwhile, we must prepare ourselves to deal with such errors. @<Cases of |main_control| that build...@>= any_mode(right_brace): handle_right_brace();@+break; @ @<Declare the procedure called |handle_right_brace|@>= void handle_right_brace(void) {@+pointer p, @!q; /*for short-term use*/ scaled @!d; /*holds |split_max_depth| in |insert_group|*/ int @!f; /*holds |floating_penalty| in |insert_group|*/ switch (cur_group) { case simple_group: unsave();@+break; case bottom_level: {@+print_err("Too many }'s"); @.Too many \}'s@> help2("You've closed more groups than you opened.")@/ ("Such booboos are generally harmless, so keep going.");error(); } @+break; case semi_simple_group: case math_shift_group: case math_left_group: extra_right_brace();@+break; @t\4@>@<Cases of |handle_right_brace| where a |right_brace| triggers a delayed action@>@; default:confusion(@[@<|"rightbrace"|@>@]); @:this can't happen rightbrace}{\quad rightbrace@> } } @ @<Declare act...@>= void extra_right_brace(void) {@+print_err("Extra }, or forgotten "); @.Extra \}, or forgotten x@> switch (cur_group) { case semi_simple_group: print_esc(@[@<|"endgroup"|@>@]);@+break; case math_shift_group: print_char('$');@+break; case math_left_group: print_esc(@[@<|"right"|@>@]); } @/ help5("I've deleted a group-closing symbol because it seems to be")@/ ("spurious, as in `$x}$'. But perhaps the } is legitimate and")@/ ("you forgot something else, as in `\\hbox{$x}'. In such cases")@/ ("the way to recover is to insert both the forgotten and the")@/ ("deleted material, e.g., by typing `I$}'.");error(); incr(align_state); } @ Here is where we clear the parameters that are supposed to revert to their default values after every paragraph and when internal vertical mode is entered. @<Declare act...@>= void normal_paragraph(void) {@+if (looseness!=0) eq_word_define(int_base+looseness_code, 0); if (hang_indent!=0) eq_word_define(dimen_base+hang_indent_code, 0); if (hang_after!=1) eq_word_define(int_base+hang_after_code, 1); if (par_shape_ptr!=null) eq_define(par_shape_loc, shape_ref, null); } @ Now let's turn to the question of how \.{\\hbox} is treated. We actually need to consider also a slightly larger context, since constructions like `\.{\\setbox3=}\penalty0\.{\\hbox...}' and `\.{\\leaders}\penalty0\.{\\hbox...}' and `\.{\\lower3.8pt\\hbox...}' are supposed to invoke quite different actions after the box has been packaged. Conversely, constructions like `\.{\\setbox3=}' can be followed by a variety of different kinds of boxes, and we would like to encode such things in an efficient way. In other words, there are two problems: to represent the context of a box, and to represent its type. The first problem is solved by putting a ``context code'' on the |save_stack|, just below the two entries that give the dimensions produced by |scan_spec|. The context code is either a (signed) shift amount, or it is a large integer | >= box_flag|, where |box_flag==@t$2^{30}$@>|. Codes |box_flag| through |box_flag+255| represent `\.{\\setbox0}' through `\.{\\setbox255}'; codes |box_flag+256| through |box_flag+511| represent `\.{\\global\\setbox0}' through `\.{\\global\\setbox255}'; code |box_flag+512| represents `\.{\\shipout}'; and codes |box_flag+513| through |box_flag+515| represent `\.{\\leaders}', `\.{\\cleaders}', and `\.{\\xleaders}'. The second problem is solved by giving the command code |make_box| to all control sequences that produce a box, and by using the following |chr_code| values to distinguish between them: |box_code|, |copy_code|, |last_box_code|, |vsplit_code|, |vtop_code|, |vtop_code+vmode|, and |vtop_code+hmode|, where the latter two are used to denote \.{\\vbox} and \.{\\hbox}, respectively. @d box_flag 010000000000 /*context code for `\.{\\setbox0}'*/ @d ship_out_flag box_flag+512 /*context code for `\.{\\shipout}'*/ @d leader_flag box_flag+513 /*context code for `\.{\\leaders}'*/ @d box_code 0 /*|chr_code| for `\.{\\box}'*/ @d copy_code 1 /*|chr_code| for `\.{\\copy}'*/ @d last_box_code 2 /*|chr_code| for `\.{\\lastbox}'*/ @d vsplit_code 3 /*|chr_code| for `\.{\\vsplit}'*/ @d vtop_code 4 /*|chr_code| for `\.{\\vtop}'*/ @<Put each...@>= primitive(@[@<|"moveleft"|@>@], hmove, 1); @!@:move_left_}{\.{\\moveleft} primitive@> primitive(@[@<|"moveright"|@>@], hmove, 0);@/ @!@:move_right_}{\.{\\moveright} primitive@> primitive(@[@<|"raise"|@>@], vmove, 1); @!@:raise_}{\.{\\raise} primitive@> primitive(@[@<|"lower"|@>@], vmove, 0); @!@:lower_}{\.{\\lower} primitive@> @# primitive(@[@<|"box"|@>@], make_box, box_code); @!@:box_}{\.{\\box} primitive@> primitive(@[@<|"copy"|@>@], make_box, copy_code); @!@:copy_}{\.{\\copy} primitive@> primitive(@[@<|"lastbox"|@>@], make_box, last_box_code); @!@:last_box_}{\.{\\lastbox} primitive@> primitive(@[@<|"vsplit"|@>@], make_box, vsplit_code); @!@:vsplit_}{\.{\\vsplit} primitive@> primitive(@[@<|"vtop"|@>@], make_box, vtop_code);@/ @!@:vtop_}{\.{\\vtop} primitive@> primitive(@[@<|"vbox"|@>@], make_box, vtop_code+vmode); @!@:vbox_}{\.{\\vbox} primitive@> primitive(@[@<|"hbox"|@>@], make_box, vtop_code+hmode);@/ @!@:hbox_}{\.{\\hbox} primitive@> primitive(@[@<|"shipout"|@>@], leader_ship, a_leaders-1); /*|ship_out_flag==leader_flag-1|*/ @!@:ship_out_}{\.{\\shipout} primitive@> primitive(@[@<|"leaders"|@>@], leader_ship, a_leaders); @!@:leaders_}{\.{\\leaders} primitive@> primitive(@[@<|"cleaders"|@>@], leader_ship, c_leaders); @!@:c_leaders_}{\.{\\cleaders} primitive@> primitive(@[@<|"xleaders"|@>@], leader_ship, x_leaders); @!@:x_leaders_}{\.{\\xleaders} primitive@> @ @<Cases of |print_cmd_chr|...@>= case hmove: if (chr_code==1) print_esc(@[@<|"moveleft"|@>@]);@+else print_esc(@[@<|"moveright"|@>@]);@+break; case vmove: if (chr_code==1) print_esc(@[@<|"raise"|@>@]);@+else print_esc(@[@<|"lower"|@>@]);@+break; case make_box: switch (chr_code) { case box_code: print_esc(@[@<|"box"|@>@]);@+break; case copy_code: print_esc(@[@<|"copy"|@>@]);@+break; case last_box_code: print_esc(@[@<|"lastbox"|@>@]);@+break; case vsplit_code: print_esc(@[@<|"vsplit"|@>@]);@+break; case vtop_code: print_esc(@[@<|"vtop"|@>@]);@+break; case vtop_code+vmode: print_esc(@[@<|"vbox"|@>@]);@+break; default:print_esc(@[@<|"hbox"|@>@]); } @+break; case leader_ship: if (chr_code==a_leaders) print_esc(@[@<|"leaders"|@>@]); else if (chr_code==c_leaders) print_esc(@[@<|"cleaders"|@>@]); else if (chr_code==x_leaders) print_esc(@[@<|"xleaders"|@>@]); else print_esc(@[@<|"shipout"|@>@]);@+break; @ Constructions that require a box are started by calling |scan_box| with a specified context code. The |scan_box| routine verifies that a |make_box| command comes next and then it calls |begin_box|. @<Cases of |main_control| that build...@>= case vmode+hmove: case hmode+vmove: case mmode+vmove: {@+t=cur_chr; scan_normal_dimen; if (t==0) scan_box(cur_val);@+else scan_box(-cur_val); } @+break; any_mode(leader_ship): scan_box(leader_flag-a_leaders+cur_chr);@+break; any_mode(make_box): begin_box(0);@+break; @ The global variable |cur_box| will point to a newly made box. If the box is void, we will have |cur_box==null|. Otherwise we will have |type(cur_box)==hlist_node| or |vlist_node| or |rule_node|; the |rule_node| case can occur only with leaders. @<Glob...@>= pointer @!cur_box; /*box to be placed into its context*/ @ The |box_end| procedure does the right thing with |cur_box|, if |box_context| represents the context as explained above. @<Declare act...@>= void box_end(int @!box_context) {@+pointer p; /*|ord_noad| for new box in math mode*/ if (box_context < box_flag) @<Append box |cur_box| to the current list, shifted by |box_context|@>@; else if (box_context < ship_out_flag) @<Store \(c)|cur_box| in a box register@>; else if (cur_box!=null) if (box_context > ship_out_flag) @<Append a new leader node that uses |cur_box|@>@; else ship_out(cur_box); } @ The global variable |adjust_tail| will be non-null if and only if the current box might include adjustments that should be appended to the current vertical list. @<Append box |cur_box| to the current...@>= {@+if (cur_box!=null) {@+shift_amount(cur_box)=box_context; if (abs(mode)==vmode) {@+append_to_vlist(cur_box); if (adjust_tail!=null) {@+if (adjust_head!=adjust_tail) {@+link(tail)=link(adjust_head);tail=adjust_tail; } adjust_tail=null; } if (mode > 0) build_page(); } else{@+if (abs(mode)==hmode) space_factor=1000; else{@+p=new_noad(); math_type(nucleus(p))=sub_box; info(nucleus(p))=cur_box;cur_box=p; } link(tail)=cur_box;tail=cur_box; } } } @ @<Store \(c)|cur_box| in a box register@>= if (box_context < box_flag+256) eq_define(box_base-box_flag+box_context, box_ref, cur_box); else geq_define(box_base-box_flag-256+box_context, box_ref, cur_box) @ @<Append a new leader node...@>= {@+@<Get the next non-blank non-relax...@>; if (((cur_cmd==hskip)&&(abs(mode)!=vmode))||@| ((cur_cmd==vskip)&&(abs(mode)==vmode))) {@+append_glue();subtype(tail)=box_context-(leader_flag-a_leaders); leader_ptr(tail)=cur_box; } else{@+print_err("Leaders not followed by proper glue"); @.Leaders not followed by...@> help3("You should say `\\leaders <box or rule><hskip or vskip>'.")@/ ("I found the <box or rule>, but there's no suitable")@/ ("<hskip or vskip>, so I'm ignoring these leaders.");back_error(); flush_node_list(cur_box); } } @ Now that we can see what eventually happens to boxes, we can consider the first steps in their creation. The |begin_box| routine is called when |box_context| is a context specification, |cur_chr| specifies the type of box desired, and |cur_cmd==make_box|. @<Declare act...@>= void begin_box(int @!box_context) {@+ pointer @!p, @!q; /*run through the current list*/ int @!m; /*the length of a replacement list*/ halfword @!k; /*0 or |vmode| or |hmode|*/ eight_bits @!n; /*a box number*/ switch (cur_chr) { case box_code: {@+scan_eight_bit_int();cur_box=box(cur_val); box(cur_val)=null; /*the box becomes void, at the same level*/ } @+break; case copy_code: {@+scan_eight_bit_int();cur_box=copy_node_list(box(cur_val)); } @+break; case last_box_code: @<If the current list ends with a box node, delete it from the list and make |cur_box| point to it; otherwise set |cur_box:=null|@>@;@+break; case vsplit_code: @<Split off part of a vertical box, make |cur_box| point to it@>@;@+break; default:@<Initiate the construction of an hbox or vbox, then |return|@>@; } @/ box_end(box_context); /*in simple cases, we use the box immediately*/ } @ Note that the condition |!is_char_node(tail)| implies that |head!=tail|, since |head| is a one-word node. @<If the current list ends with a box node, delete it...@>= {@+cur_box=null; if (abs(mode)==mmode) {@+you_cant();help1("Sorry; this \\lastbox will be void.");error(); } else if ((mode==vmode)&&(head==tail)) {@+you_cant(); help2("Sorry...I usually can't take things from the current page.")@/ ("This \\lastbox will therefore be void.");error(); } else{@+if (!is_char_node(tail)) if ((type(tail)==hlist_node)||(type(tail)==vlist_node)) @<Remove the last box, unless it's part of a discretionary@>; } } @ @<Remove the last box...@>= {@+q=head; @/do@+{p=q; if (!is_char_node(q)) if (type(q)==disc_node) {@+for (m=1; m<=replace_count(q); m++) p=link(p); if (p==tail) goto done; } q=link(p); }@+ while (!(q==tail)); cur_box=tail;shift_amount(cur_box)=0; tail=p;link(p)=null; done: ;} @ Here we deal with things like `\.{\\vsplit 13 to 100pt}'. @<Split off part of a vertical box, make |cur_box| point to it@>= {@+scan_eight_bit_int();n=cur_val; if (!scan_keyword(@[@<|"to"|@>@])) @.to@> {@+print_err("Missing `to' inserted"); @.Missing `to' inserted@> help2("I'm working on `\\vsplit<box number> to <dimen>';")@/ ("will look for the <dimen> next.");error(); } scan_normal_dimen; cur_box=vsplit(n, cur_val); } @ Here is where we enter restricted horizontal mode or internal vertical mode, in order to make a box. @<Initiate the construction of an hbox or vbox, then |return|@>= {@+k=cur_chr-vtop_code;saved(0)=box_context; if (k==hmode) if ((box_context < box_flag)&&(abs(mode)==vmode)) scan_spec(adjusted_hbox_group, true); else scan_spec(hbox_group, true); else{@+if (k==vmode) scan_spec(vbox_group, true); else{@+scan_spec(vtop_group, true);k=vmode; } normal_paragraph(); } push_nest();mode=-k; if (k==vmode) {@+prev_depth=ignore_depth; if (every_vbox!=null) begin_token_list(every_vbox, every_vbox_text); } else{@+space_factor=1000; if (every_hbox!=null) begin_token_list(every_hbox, every_hbox_text); } return; } @ @<Declare act...@>= void scan_box(int @!box_context) /*the next input should specify a box or perhaps a rule*/ {@+@<Get the next non-blank non-relax...@>; if (cur_cmd==make_box) begin_box(box_context); else if ((box_context >= leader_flag)&&((cur_cmd==hrule)||(cur_cmd==vrule))) {@+cur_box=scan_rule_spec();box_end(box_context); } else{@+@t@>@;@/ print_err("A <box> was supposed to be here");@/ @.A <box> was supposed to...@> help3("I was expecting to see \\hbox or \\vbox or \\copy or \\box or")@/ ("something like that. So you might find something missing in")@/ ("your output. But keep trying; you can fix this later.");back_error(); } } @ When the right brace occurs at the end of an \.{\\hbox} or \.{\\vbox} or \.{\\vtop} construction, the |package| routine comes into action. We might also have to finish a paragraph that hasn't ended. @<Cases of |handle...@>= case hbox_group: package(0);@+break; case adjusted_hbox_group: {@+adjust_tail=adjust_head;package(0); } @+break; case vbox_group: {@+end_graf();package(0); } @+break; case vtop_group: {@+end_graf();package(vtop_code); } @+break; @ @<Declare action...@>= void package(small_number @!c) {@+scaled h; /*height of box*/ pointer @!p; /*first node in a box*/ scaled @!d; /*max depth*/ d=box_max_depth;unsave();save_ptr=save_ptr-3; if (mode==-hmode) cur_box=hpack(link(head), saved(2), saved(1)); else{@+cur_box=vpackage(link(head), saved(2), saved(1), d); if (c==vtop_code) @<Readjust the height and depth of |cur_box|, for \.{\\vtop}@>; } pop_nest();box_end(saved(0)); } @ The height of a `\.{\\vtop}' box is inherited from the first item on its list, if that item is an |hlist_node|, |vlist_node|, or |rule_node|; otherwise the \.{\\vtop} height is zero. @<Readjust the height...@>= {@+h=0;p=list_ptr(cur_box); if (p!=null) if (type(p) <= rule_node) h=height(p); depth(cur_box)=depth(cur_box)-h+height(cur_box);height(cur_box)=h; } @ A paragraph begins when horizontal-mode material occurs in vertical mode, or when the paragraph is explicitly started by `\.{\\indent}' or `\.{\\noindent}'. @<Put each...@>= primitive(@[@<|"indent"|@>@], start_par, 1); @!@:indent_}{\.{\\indent} primitive@> primitive(@[@<|"noindent"|@>@], start_par, 0); @!@:no_indent_}{\.{\\noindent} primitive@> @ @<Cases of |print_cmd_chr|...@>= case start_par: if (chr_code==0) print_esc(@[@<|"noindent"|@>@]);@+else print_esc(@[@<|"indent"|@>@]);@+break; @ @<Cases of |main_control| that build...@>= case vmode+start_par: new_graf(cur_chr > 0);@+break; case vmode+letter: case vmode+other_char: case vmode+char_num: case vmode+char_given: case vmode+math_shift: case vmode+un_hbox: case vmode+vrule: case vmode+accent: case vmode+discretionary: case vmode+hskip: case vmode+valign: case vmode+ex_space: case vmode+no_boundary: @t@>@;@/ {@+back_input();new_graf(true); } @+break; @ @<Declare act...@>= small_number norm_min(int @!h) {@+if (h <= 0) return 1;@+else if (h >= 63) return 63;@+ else return h; } @# void new_graf(bool @!indented) {@+prev_graf=0; if ((mode==vmode)||(head!=tail)) tail_append(new_param_glue(par_skip_code)); push_nest();mode=hmode;space_factor=1000;set_cur_lang;clang=cur_lang; prev_graf=(norm_min(left_hyphen_min)*0100+norm_min(right_hyphen_min)) *0200000+cur_lang; if (indented) {@+tail=new_null_box();link(head)=tail;width(tail)=par_indent;@+ } if (every_par!=null) begin_token_list(every_par, every_par_text); if (nest_ptr==1) build_page(); /*put |par_skip| glue on current page*/ } @ @<Cases of |main_control| that build...@>= case hmode+start_par: case mmode+start_par: indent_in_hmode();@+break; @ @<Declare act...@>= void indent_in_hmode(void) {@+pointer p, @!q; if (cur_chr > 0) /*\.{\\indent}*/ {@+p=new_null_box();width(p)=par_indent; if (abs(mode)==hmode) space_factor=1000; else{@+q=new_noad();math_type(nucleus(q))=sub_box; info(nucleus(q))=p;p=q; } tail_append(p); } } @ A paragraph ends when a |par_end| command is sensed, or when we are in horizontal mode when reaching the right brace of vertical-mode routines like \.{\\vbox}, \.{\\insert}, or \.{\\output}. @<Cases of |main_control| that build...@>= case vmode+par_end: {@+normal_paragraph(); if (mode > 0) build_page(); } @+break; case hmode+par_end: {@+if (align_state < 0) off_save(); /*this tries to recover from an alignment that didn't end properly*/ end_graf(); /*this takes us to the enclosing mode, if |mode > 0|*/ if (mode==vmode) build_page(); } @+break; case hmode+stop: case hmode+vskip: case hmode+hrule: case hmode+un_vbox: case hmode+halign: head_for_vmode();@+break; @ @<Declare act...@>= void head_for_vmode(void) {@+if (mode < 0) if (cur_cmd!=hrule) off_save(); else{@+print_err("You can't use `"); print_esc(@[@<|"hrule"|@>@]);print_str("' here except with leaders"); @.You can't use \\hrule...@> help2("To put a horizontal rule in an hbox or an alignment,")@/ ("you should use \\leaders or \\hrulefill (see The TeXbook)."); error(); } else{@+back_input();cur_tok=par_token;back_input();token_type=inserted; } } @ @<Declare act...@>= void end_graf(void) {@+if (mode==hmode) {@+if (head==tail) pop_nest(); /*null paragraphs are ignored*/ else line_break(widow_penalty); normal_paragraph(); error_count=0; } } @ Insertion and adjustment and mark nodes are constructed by the following pieces of the program. @<Cases of |main_control| that build...@>= any_mode(insert): case hmode+vadjust: case mmode+vadjust: begin_insert_or_adjust();@+break; any_mode(mark): make_mark();@+break; @ @<Forbidden...@>= case vmode+vadjust: @ @<Declare act...@>= void begin_insert_or_adjust(void) {@+if (cur_cmd==vadjust) cur_val=255; else{@+scan_eight_bit_int(); if (cur_val==255) {@+print_err("You can't ");print_esc(@[@<|"insert"|@>@]);print_int(255); @.You can't \\insert255@> help1("I'm changing to \\insert0; box 255 is special."); error();cur_val=0; } } saved(0)=cur_val;incr(save_ptr); new_save_level(insert_group);scan_left_brace();normal_paragraph(); push_nest();mode=-vmode;prev_depth=ignore_depth; } @ @<Cases of |handle...@>= case insert_group: {@+end_graf();q=split_top_skip;add_glue_ref(q); d=split_max_depth;f=floating_penalty;unsave();decr(save_ptr); /*now |saved(0)| is the insertion number, or 255 for |vadjust|*/ p=vpack(link(head), natural);pop_nest(); if (saved(0) < 255) {@+tail_append(get_node(ins_node_size)); type(tail)=ins_node;subtype(tail)=qi(saved(0)); height(tail)=height(p)+depth(p);ins_ptr(tail)=list_ptr(p); split_top_ptr(tail)=q;depth(tail)=d;float_cost(tail)=f; } else{@+tail_append(get_node(small_node_size)); type(tail)=adjust_node;@/ subtype(tail)=0; /*the |subtype| is not used*/ adjust_ptr(tail)=list_ptr(p);delete_glue_ref(q); } free_node(p, box_node_size); if (nest_ptr==0) build_page(); } @+break; case output_group: @<Resume the page builder...@>@;@+break; @ @<Declare act...@>= void make_mark(void) {@+pointer p; /*new node*/ p=scan_toks(false, true);p=get_node(small_node_size); type(p)=mark_node;subtype(p)=0; /*the |subtype| is not used*/ mark_ptr(p)=def_ref;link(tail)=p;tail=p; } @ Penalty nodes get into a list via the |break_penalty| command. @^penalties@> @<Cases of |main_control| that build...@>= any_mode(break_penalty): append_penalty();@+break; @ @<Declare action...@>= void append_penalty(void) {@+scan_int();tail_append(new_penalty(cur_val)); if (mode==vmode) build_page(); } @ The |remove_item| command removes a penalty, kern, or glue node if it appears at the tail of the current list, using a brute-force linear scan. Like \.{\\lastbox}, this command is not allowed in vertical mode (except internal vertical mode), since the current list in vertical mode is sent to the page builder. But if we happen to be able to implement it in vertical mode, we do. @<Cases of |main_control| that build...@>= any_mode(remove_item): delete_last();@+break; @ When |delete_last| is called, |cur_chr| is the |type| of node that will be deleted, if present. @<Declare action...@>= void delete_last(void) {@+ pointer @!p, @!q; /*run through the current list*/ int @!m; /*the length of a replacement list*/ if ((mode==vmode)&&(tail==head)) @<Apologize for inability to do the operation now, unless \.{\\unskip} follows non-glue@>@; else{@+if (!is_char_node(tail)) if (type(tail)==cur_chr) {@+q=head; @/do@+{p=q; if (!is_char_node(q)) if (type(q)==disc_node) {@+for (m=1; m<=replace_count(q); m++) p=link(p); if (p==tail) return; } q=link(p); }@+ while (!(q==tail)); link(p)=null;flush_node_list(tail);tail=p; } } } @ @<Apologize for inability to do the operation...@>= {@+if ((cur_chr!=glue_node)||(last_glue!=max_halfword)) {@+you_cant(); help2("Sorry...I usually can't take things from the current page.")@/ ("Try `I\\vskip-\\lastskip' instead."); if (cur_chr==kern_node) help_line[0]= ("Try `I\\kern-\\lastkern' instead."); else if (cur_chr!=glue_node) help_line[0]=@| ("Perhaps you can make the output routine do it."); error(); } } @ @<Put each...@>= primitive(@[@<|"unpenalty"|@>@], remove_item, penalty_node);@/ @!@:un_penalty_}{\.{\\unpenalty} primitive@> primitive(@[@<|"unkern"|@>@], remove_item, kern_node);@/ @!@:un_kern_}{\.{\\unkern} primitive@> primitive(@[@<|"unskip"|@>@], remove_item, glue_node);@/ @!@:un_skip_}{\.{\\unskip} primitive@> primitive(@[@<|"unhbox"|@>@], un_hbox, box_code);@/ @!@:un_hbox_}{\.{\\unhbox} primitive@> primitive(@[@<|"unhcopy"|@>@], un_hbox, copy_code);@/ @!@:un_hcopy_}{\.{\\unhcopy} primitive@> primitive(@[@<|"unvbox"|@>@], un_vbox, box_code);@/ @!@:un_vbox_}{\.{\\unvbox} primitive@> primitive(@[@<|"unvcopy"|@>@], un_vbox, copy_code);@/ @!@:un_vcopy_}{\.{\\unvcopy} primitive@> @ @<Cases of |print_cmd_chr|...@>= case remove_item: if (chr_code==glue_node) print_esc(@[@<|"unskip"|@>@]); else if (chr_code==kern_node) print_esc(@[@<|"unkern"|@>@]); else print_esc(@[@<|"unpenalty"|@>@]);@+break; case un_hbox: if (chr_code==copy_code) print_esc(@[@<|"unhcopy"|@>@]); else print_esc(@[@<|"unhbox"|@>@]);@+break; case un_vbox: if (chr_code==copy_code) print_esc(@[@<|"unvcopy"|@>@]); else print_esc(@[@<|"unvbox"|@>@]);@+break; @ The |un_hbox| and |un_vbox| commands unwrap one of the 256 current boxes. @<Cases of |main_control| that build...@>= case vmode+un_vbox: case hmode+un_hbox: case mmode+un_hbox: unpackage();@+break; @ @<Declare act...@>= void unpackage(void) {@+ pointer p; /*the box*/ uint8_t @!c; /*should we copy?*/ c=cur_chr;scan_eight_bit_int();p=box(cur_val); if (p==null) return; if ((abs(mode)==mmode)||((abs(mode)==vmode)&&(type(p)!=vlist_node))||@| ((abs(mode)==hmode)&&(type(p)!=hlist_node))) {@+print_err("Incompatible list can't be unboxed"); @.Incompatible list...@> help3("Sorry, Pandora. (You sneaky devil.)")@/ ("I refuse to unbox an \\hbox in vertical mode or vice versa.")@/ ("And I can't open any boxes in math mode.");@/ error();return; } if (c==copy_code) link(tail)=copy_node_list(list_ptr(p)); else{@+link(tail)=list_ptr(p);box(cur_val)=null; free_node(p, box_node_size); } while (link(tail)!=null) tail=link(tail); } @ @<Forbidden...@>=case vmode+ital_corr: @ Italic corrections are converted to kern nodes when the |ital_corr| command follows a character. In math mode the same effect is achieved by appending a kern of zero here, since italic corrections are supplied later. @<Cases of |main_control| that build...@>= case hmode+ital_corr: append_italic_correction();@+break; case mmode+ital_corr: tail_append(new_kern(0))@;@+break; @ @<Declare act...@>= void append_italic_correction(void) {@+ pointer p; /*|char_node| at the tail of the current list*/ internal_font_number @!f; /*the font in the |char_node|*/ if (tail!=head) {@+if (is_char_node(tail)) p=tail; else if (type(tail)==ligature_node) p=lig_char(tail); else return; f=font(p); tail_append(new_kern(char_italic(f)(char_info(f)(character(p))))); subtype(tail)=explicit; } } @ Discretionary nodes are easy in the common case `\.{\\-}', but in the general case we must process three braces full of items. @<Put each...@>= primitive('-', discretionary, 1); @!@:Single-character primitives -}{\quad\.{\\-}@> primitive(@[@<|"discretionary"|@>@], discretionary, 0); @!@:discretionary_}{\.{\\discretionary} primitive@> @ @<Cases of |print_cmd_chr|...@>= case discretionary: if (chr_code==1) print_esc('-');@+else print_esc(@[@<|"discretionary"|@>@]);@+break; @ @<Cases of |main_control| that build...@>= case hmode+discretionary: case mmode+discretionary: append_discretionary();@+break; @ The space factor does not change when we append a discretionary node, but it starts out as 1000 in the subsidiary lists. @<Declare act...@>= void append_discretionary(void) {@+int c; /*hyphen character*/ tail_append(new_disc()); if (cur_chr==1) {@+c=hyphen_char[cur_font]; if (c >= 0) if (c < 256) pre_break(tail)=new_character(cur_font, c); } else{@+incr(save_ptr);saved(-1)=0;new_save_level(disc_group); scan_left_brace();push_nest();mode=-hmode;space_factor=1000; } } @ The three discretionary lists are constructed somewhat as if they were hboxes. A~subroutine called |build_discretionary| handles the transitions. (This is sort of fun.) @<Cases of |handle...@>= case disc_group: build_discretionary();@+break; @ @<Declare act...@>= void build_discretionary(void) {@+ pointer p, @!q; /*for link manipulation*/ int @!n; /*length of discretionary list*/ unsave(); @<Prune the current list, if necessary, until it contains only |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|, and |ligature_node| items; set |n| to the length of the list, and set |q| to the list's tail@>; p=link(head);pop_nest(); switch (saved(-1)) { case 0: pre_break(tail)=p;@+break; case 1: post_break(tail)=p;@+break; case 2: @<Attach list |p| to the current list, and record its length; then finish up and |return|@>; } /*there are no other cases*/ incr(saved(-1));new_save_level(disc_group);scan_left_brace(); push_nest();mode=-hmode;space_factor=1000; } @ @<Attach list |p| to the current...@>= {@+if ((n > 0)&&(abs(mode)==mmode)) {@+print_err("Illegal math ");print_esc(@[@<|"discretionary"|@>@]); @.Illegal math \\disc...@> help2("Sorry: The third part of a discretionary break must be")@/ ("empty, in math formulas. I had to delete your third part."); flush_node_list(p);n=0;error(); } else link(tail)=p; if (n <= max_quarterword) replace_count(tail)=n; else{@+print_err("Discretionary list is too long"); @.Discretionary list is too long@> help2("Wow---I never thought anybody would tweak me here.")@/ ("You can't seriously need such a huge discretionary list?"); error(); } if (n > 0) tail=q; decr(save_ptr);return; } @ During this loop, |p==link(q)| and there are |n| items preceding |p|. @<Prune the current list, if necessary...@>= q=head;p=link(q);n=0; while (p!=null) {@+if (!is_char_node(p)) if (type(p) > rule_node) if (type(p)!=kern_node) if (type(p)!=ligature_node) {@+print_err("Improper discretionary list"); @.Improper discretionary list@> help1("Discretionary lists must contain only boxes and kerns.");@/ error(); begin_diagnostic(); print_nl("The following discretionary sublist has been deleted:"); @.The following...deleted@> show_box(p); end_diagnostic(true); flush_node_list(p);link(q)=null;goto done; } q=p;p=link(q);incr(n); } done: @ We need only one more thing to complete the horizontal mode routines, namely the \.{\\accent} primitive. @<Cases of |main_control| that build...@>= case hmode+accent: make_accent();@+break; @ The positioning of accents is straightforward but tedious. Given an accent of width |a|, designed for characters of height |x| and slant |s|; and given a character of width |w|, height |h|, and slant |t|: We will shift the accent down by |x-h|, and we will insert kern nodes that have the effect of centering the accent over the character and shifting the accent to the right by $\delta={1\over2}(w-a)+h\cdot t-x\cdot s$. If either character is absent from the font, we will simply use the other, without shifting. @<Declare act...@>= void make_accent(void) {@+double s, @!t; /*amount of slant*/ pointer @!p, @!q, @!r; /*character, box, and kern nodes*/ internal_font_number @!f; /*relevant font*/ scaled @!a, @!h, @!x, @!w, @!delta; /*heights and widths, as explained above*/ four_quarters @!i; /*character information*/ scan_char_num();f=cur_font;p=new_character(f, cur_val); if (p!=null) {@+x=x_height(f);s=slant(f)/float_constant(65536); @^real division@> a=char_width(f)(char_info(f)(character(p)));@/ do_assignments();@/ @<Create a character node |q| for the next character, but set |q:=null| if problems arise@>; if (q!=null) @<Append the accent with appropriate kerns, then set |p:=q|@>; link(tail)=p;tail=p;space_factor=1000; } } @ @<Create a character node |q| for the next...@>= q=null;f=cur_font; if ((cur_cmd==letter)||(cur_cmd==other_char)||(cur_cmd==char_given)) q=new_character(f, cur_chr); else if (cur_cmd==char_num) {@+scan_char_num();q=new_character(f, cur_val); } else back_input() @ The kern nodes appended here must be distinguished from other kerns, lest they be wiped away by the hyphenation algorithm or by a previous line break. The two kerns are computed with (machine-dependent) |double| arithmetic, but their sum is machine-independent; the net effect is machine-independent, because the user cannot remove these nodes nor access them via \.{\\lastkern}. @<Append the accent with appropriate kerns...@>= {@+t=slant(f)/float_constant(65536); @^real division@> i=char_info(f)(character(q)); w=char_width(f)(i);h=char_height(f)(height_depth(i)); if (h!=x) /*the accent must be shifted up or down*/ {@+p=hpack(p, natural);shift_amount(p)=x-h; } delta=round((w-a)/float_constant(2)+h*t-x*s); @^real multiplication@> @^real addition@> r=new_kern(delta);subtype(r)=acc_kern;link(tail)=r;link(r)=p; tail=new_kern(-a-delta);subtype(tail)=acc_kern;link(p)=tail;p=q; } @ When `\.{\\cr}' or `\.{\\span}' or a tab mark comes through the scanner into |main_control|, it might be that the user has foolishly inserted one of them into something that has nothing to do with alignment. But it is far more likely that a left brace or right brace has been omitted, since |get_next| takes actions appropriate to alignment only when `\.{\\cr}' or `\.{\\span}' or tab marks occur with |align_state==0|. The following program attempts to make an appropriate recovery. @<Cases of |main_control| that build...@>= any_mode(car_ret): any_mode(tab_mark): align_error();@+break; any_mode(no_align): no_align_error();@+break; any_mode(omit): omit_error();@+break; @ @<Declare act...@>= void align_error(void) {@+if (abs(align_state) > 2) @<Express consternation over the fact that no alignment is in progress@>@; else{@+back_input(); if (align_state < 0) {@+print_err("Missing { inserted"); @.Missing \{ inserted@> incr(align_state);cur_tok=left_brace_token+'{'; } else{@+print_err("Missing } inserted"); @.Missing \} inserted@> decr(align_state);cur_tok=right_brace_token+'}'; } help3("I've put in what seems to be necessary to fix")@/ ("the current column of the current alignment.")@/ ("Try to go on, since this might almost work.");ins_error(); } } @ @<Express consternation...@>= {@+print_err("Misplaced ");print_cmd_chr(cur_cmd, cur_chr); @.Misplaced \&@> @.Misplaced \\span@> @.Misplaced \\cr@> if (cur_tok==tab_token+'&') {@+help6("I can't figure out why you would want to use a tab mark")@/ ("here. If you just want an ampersand, the remedy is")@/ ("simple: Just type `I\\&' now. But if some right brace")@/ ("up above has ended a previous alignment prematurely,")@/ ("you're probably due for more error messages, and you")@/ ("might try typing `S' now just to see what is salvageable."); } else{@+help5("I can't figure out why you would want to use a tab mark")@/ ("or \\cr or \\span just now. If something like a right brace")@/ ("up above has ended a previous alignment prematurely,")@/ ("you're probably due for more error messages, and you")@/ ("might try typing `S' now just to see what is salvageable."); } error(); } @ The help messages here contain a little white lie, since \.{\\noalign} and \.{\\omit} are allowed also after `\.{\\noalign\{...\}}'. @<Declare act...@>= void no_align_error(void) {@+print_err("Misplaced ");print_esc(@[@<|"noalign"|@>@]); @.Misplaced \\noalign@> help2("I expect to see \\noalign only after the \\cr of")@/ ("an alignment. Proceed, and I'll ignore this case.");error(); } void omit_error(void) {@+print_err("Misplaced ");print_esc(@[@<|"omit"|@>@]); @.Misplaced \\omit@> help2("I expect to see \\omit only after tab marks or the \\cr of")@/ ("an alignment. Proceed, and I'll ignore this case.");error(); } @ We've now covered most of the abuses of \.{\\halign} and \.{\\valign}. Let's take a look at what happens when they are used correctly. @<Cases of |main_control| that build...@>= case vmode+halign: case hmode+valign: init_align();@+break; case mmode+halign: if (privileged()) if (cur_group==math_shift_group) init_align(); else off_save();@+break; case vmode+endv: case hmode+endv: do_endv();@+break; @ An |align_group| code is supposed to remain on the |save_stack| during an entire alignment, until |fin_align| removes it. A devious user might force an |endv| command to occur just about anywhere; we must defeat such hacks. @<Declare act...@>= void do_endv(void) {@+base_ptr=input_ptr;input_stack[base_ptr]=cur_input; while ((input_stack[base_ptr].index_field!=v_template)&& (input_stack[base_ptr].loc_field==null)&& (input_stack[base_ptr].state_field==token_list)) decr(base_ptr); if ((input_stack[base_ptr].index_field!=v_template)|| (input_stack[base_ptr].loc_field!=null)|| (input_stack[base_ptr].state_field!=token_list)) fatal_error("(interwoven alignment preambles are not allowed)"); @.interwoven alignment preambles...@> if (cur_group==align_group) {@+end_graf(); if (fin_col()) fin_row(); } else off_save(); } @ @<Cases of |handle_right_brace|...@>= case align_group: {@+back_input();cur_tok=cs_token_flag+frozen_cr; print_err("Missing ");print_esc(@[@<|"cr"|@>@]);print_str(" inserted"); @.Missing \\cr inserted@> help1("I'm guessing that you meant to end an alignment here."); ins_error(); } @+break; @ @<Cases of |handle_right_brace|...@>= case no_align_group: {@+end_graf();unsave();align_peek(); } @+break; @ Finally, \.{\\endcsname} is not supposed to get through to |main_control|. @<Cases of |main_control| that build...@>= any_mode(end_cs_name): cs_error();@+break; @ @<Declare act...@>= void cs_error(void) {@+print_err("Extra ");print_esc(@[@<|"endcsname"|@>@]); @.Extra \\endcsname@> help1("I'm ignoring this, since I wasn't doing a \\csname."); error(); } @* Building math lists. The routines that \TeX\ uses to create mlists are similar to those we have just seen for the generation of hlists and vlists. But it is necessary to make ``noads'' as well as nodes, so the reader should review the discussion of math mode data structures before trying to make sense out of the following program. Here is a little routine that needs to be done whenever a subformula is about to be processed. The parameter is a code like |math_group|. @<Declare act...@>= void push_math(group_code @!c) {@+push_nest();mode=-mmode;incompleat_noad=null;new_save_level(c); } @ We get into math mode from horizontal mode when a `\.\$' (i.e., a |math_shift| character) is scanned. We must check to see whether this `\.\$' is immediately followed by another, in case display math mode is called for. @<Cases of |main_control| that build...@>= case hmode+math_shift: init_math();@+break; @ @<Declare act...@>= void init_math(void) {@+ scaled w; /*new or partial |pre_display_size|*/ scaled @!l; /*new |display_width|*/ scaled @!s; /*new |display_indent|*/ pointer @!p; /*current node when calculating |pre_display_size|*/ pointer @!q; /*glue specification when calculating |pre_display_size|*/ internal_font_number @!f; /*font in current |char_node|*/ int @!n; /*scope of paragraph shape specification*/ scaled @!v; /*|w| plus possible glue amount*/ scaled @!d; /*increment to |v|*/ get_token(); /*|get_x_token| would fail on \.{\\ifmmode}\thinspace!*/ if ((cur_cmd==math_shift)&&(mode > 0)) @<Go into display math mode@>@; else{@+back_input();@<Go into ordinary math mode@>; } } @ @<Go into ordinary math mode@>= {@+push_math(math_shift_group);eq_word_define(int_base+cur_fam_code,-1); if (every_math!=null) begin_token_list(every_math, every_math_text); } @ We get into ordinary math mode from display math mode when `\.{\\eqno}' or `\.{\\leqno}' appears. In such cases |cur_chr| will be 0 or~1, respectively; the value of |cur_chr| is placed onto |save_stack| for safe keeping. @<Cases of |main_control| that build...@>= case mmode+eq_no: if (privileged()) if (cur_group==math_shift_group) start_eq_no(); else off_save();@+break; @ @<Put each...@>= primitive(@[@<|"eqno"|@>@], eq_no, 0); @!@:eq_no_}{\.{\\eqno} primitive@> primitive(@[@<|"leqno"|@>@], eq_no, 1); @!@:leq_no_}{\.{\\leqno} primitive@> @ When \TeX\ is in display math mode, |cur_group==math_shift_group|, so it is not necessary for the |start_eq_no| procedure to test for this condition. @<Declare act...@>= void start_eq_no(void) {@+saved(0)=cur_chr;incr(save_ptr); @<Go into ordinary math mode@>; } @ @<Cases of |print_cmd_chr|...@>= case eq_no: if (chr_code==1) print_esc(@[@<|"leqno"|@>@]);@+else print_esc(@[@<|"eqno"|@>@]);@+break; @ @<Forbidden...@>=non_math(eq_no): @ When we enter display math mode, we need to call |line_break| to process the partial paragraph that has just been interrupted by the display. Then we can set the proper values of |display_width| and |display_indent| and |pre_display_size|. @<Go into display math mode@>= {@+if (head==tail) /*`\.{\\noindent\$\$}' or `\.{\$\${ }\$\$}'*/ {@+pop_nest();w=-max_dimen; } else{@+line_break(display_widow_penalty);@/ @<Calculate the natural width, |w|, by which the characters of the final line extend to the right of the reference point, plus two ems; or set |w:=max_dimen| if the non-blank information on that line is affected by stretching or shrinking@>; } /*now we are in vertical mode, working on the list that will contain the display*/ @<Calculate the length, |l|, and the shift amount, |s|, of the display lines@>; push_math(math_shift_group);mode=mmode; eq_word_define(int_base+cur_fam_code,-1);@/ eq_word_define(dimen_base+pre_display_size_code, w); eq_word_define(dimen_base+display_width_code, l); eq_word_define(dimen_base+display_indent_code, s); if (every_display!=null) begin_token_list(every_display, every_display_text); if (nest_ptr==1) build_page(); } @ @<Calculate the natural width, |w|, by which...@>= v=shift_amount(just_box)+2*quad(cur_font);w=-max_dimen; p=list_ptr(just_box); while (p!=null) {@+@<Let |d| be the natural width of node |p|; if the node is ``visible,'' |goto found|; if the node is glue that stretches or shrinks, set |v:=max_dimen|@>; if (v < max_dimen) v=v+d; goto not_found; found: if (v < max_dimen) {@+v=v+d;w=v; } else{@+w=max_dimen;goto done; } not_found: p=link(p); } done: @ @<Let |d| be the natural width of node |p|...@>= reswitch: if (is_char_node(p)) {@+f=font(p);d=char_width(f)(char_info(f)(character(p))); goto found; } switch (type(p)) { case hlist_node: case vlist_node: case rule_node: {@+d=width(p);goto found; } case ligature_node: @<Make node |p| look like a |char_node|...@>@; case kern_node: case math_node: d=width(p);@+break; case glue_node: @<Let |d| be the natural width of this glue; if stretching or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders@>@;@+break; case whatsit_node: @<Let |d| be the width of the whatsit |p|@>;@+break; default:d=0; } @ We need to be careful that |w|, |v|, and |d| do not depend on any |glue_set| values, since such values are subject to system-dependent rounding. System-dependent numbers are not allowed to infiltrate parameters like |pre_display_size|, since \TeX82 is supposed to make the same decisions on all machines. @<Let |d| be the natural width of this glue...@>= {@+q=glue_ptr(p);d=width(q); if (glue_sign(just_box)==stretching) {@+if ((glue_order(just_box)==stretch_order(q))&&@| (stretch(q)!=0)) v=max_dimen; } else if (glue_sign(just_box)==shrinking) {@+if ((glue_order(just_box)==shrink_order(q))&&@| (shrink(q)!=0)) v=max_dimen; } if (subtype(p) >= a_leaders) goto found; } @ A displayed equation is considered to be three lines long, so we calculate the length and offset of line number |prev_graf+2|. @<Calculate the length, |l|,...@>= if (par_shape_ptr==null) if ((hang_indent!=0)&&@| (((hang_after >= 0)&&(prev_graf+2 > hang_after))||@| (prev_graf+1 < -hang_after))) {@+l=hsize-abs(hang_indent); if (hang_indent > 0) s=hang_indent;@+else s=0; } else{@+l=hsize;s=0; } else{@+n=info(par_shape_ptr); if (prev_graf+2 >= n) p=par_shape_ptr+2*n; else p=par_shape_ptr+2*(prev_graf+2); s=mem[p-1].sc;l=mem[p].sc; } @ Subformulas of math formulas cause a new level of math mode to be entered, on the semantic nest as well as the save stack. These subformulas arise in several ways: (1)~A left brace by itself indicates the beginning of a subformula that will be put into a box, thereby freezing its glue and preventing line breaks. (2)~A subscript or superscript is treated as a subformula if it is not a single character; the same applies to the nucleus of things like \.{\\underline}. (3)~The \.{\\left} primitive initiates a subformula that will be terminated by a matching \.{\\right}. The group codes placed on |save_stack| in these three cases are |math_group|, |math_group|, and |math_left_group|, respectively. Here is the code that handles case (1); the other cases are not quite as trivial, so we shall consider them later. @<Cases of |main_control| that build...@>= case mmode+left_brace: {@+tail_append(new_noad()); back_input();scan_math(nucleus(tail)); } @+break; @ Recall that the |nucleus|, |subscr|, and |supscr| fields in a noad are broken down into subfields called |math_type| and either |info| or |(fam, character)|. The job of |scan_math| is to figure out what to place in one of these principal fields; it looks at the subformula that comes next in the input, and places an encoding of that subformula into a given word of |mem|. @d fam_in_range ((cur_fam >= 0)&&(cur_fam < 16)) @<Declare act...@>= void scan_math(pointer @!p) {@+ int c; /*math character code*/ restart: @<Get the next non-blank non-relax...@>; reswitch: switch (cur_cmd) { case letter: case other_char: case char_given: {@+c=ho(math_code(cur_chr)); if (c==0100000) {@+@<Treat |cur_chr| as an active character@>; goto restart; } } @+break; case char_num: {@+scan_char_num();cur_chr=cur_val;cur_cmd=char_given; goto reswitch; } case math_char_num: {@+scan_fifteen_bit_int();c=cur_val; } @+break; case math_given: c=cur_chr;@+break; case delim_num: {@+scan_twenty_seven_bit_int();c=cur_val/010000; } @+break; default:@<Scan a subformula enclosed in braces and |return|@>@; } @/ math_type(p)=math_char;character(p)=qi(c%256); if ((c >= var_code)&&fam_in_range) fam(p)=cur_fam; else fam(p)=(c/256)%16; } @ An active character that is an |outer_call| is allowed here. @<Treat |cur_chr|...@>= {@+cur_cs=cur_chr+active_base; cur_cmd=eq_type(cur_cs);cur_chr=equiv(cur_cs); x_token();back_input(); } @ The pointer |p| is placed on |save_stack| while a complex subformula is being scanned. @<Scan a subformula...@>= {@+back_input();scan_left_brace();@/ saved(0)=p;incr(save_ptr);push_math(math_group);return; } @ The simplest math formula is, of course, `\.{\${ }\$}', when no noads are generated. The next simplest cases involve a single character, e.g., `\.{\$x\$}'. Even though such cases may not seem to be very interesting, the reader can perhaps understand how happy the author was when `\.{\$x\$}' was first properly typeset by \TeX. The code in this section was used. @^Knuth, Donald Ervin@> @<Cases of |main_control| that build...@>= case mmode+letter: case mmode+other_char: case mmode+char_given: set_math_char(ho(math_code(cur_chr)));@+break; case mmode+char_num: {@+scan_char_num();cur_chr=cur_val; set_math_char(ho(math_code(cur_chr))); } @+break; case mmode+math_char_num: {@+scan_fifteen_bit_int();set_math_char(cur_val); } @+break; case mmode+math_given: set_math_char(cur_chr);@+break; case mmode+delim_num: {@+scan_twenty_seven_bit_int(); set_math_char(cur_val/010000); } @+break; @ The |set_math_char| procedure creates a new noad appropriate to a given math code, and appends it to the current mlist. However, if the math code is sufficiently large, the |cur_chr| is treated as an active character and nothing is appended. @<Declare act...@>= void set_math_char(int @!c) {@+pointer p; /*the new noad*/ if (c >= 0100000) @<Treat |cur_chr|...@>@; else{@+p=new_noad();math_type(nucleus(p))=math_char; character(nucleus(p))=qi(c%256); fam(nucleus(p))=(c/256)%16; if (c >= var_code) {@+if (fam_in_range) fam(nucleus(p))=cur_fam; type(p)=ord_noad; } else type(p)=ord_noad+(c/010000); link(tail)=p;tail=p; } } @ Primitive math operators like \.{\\mathop} and \.{\\underline} are given the command code |math_comp|, supplemented by the noad type that they generate. @<Put each...@>= primitive(@[@<|"mathord"|@>@], math_comp, ord_noad); @!@:math_ord_}{\.{\\mathord} primitive@> primitive(@[@<|"mathop"|@>@], math_comp, op_noad); @!@:math_op_}{\.{\\mathop} primitive@> primitive(@[@<|"mathbin"|@>@], math_comp, bin_noad); @!@:math_bin_}{\.{\\mathbin} primitive@> primitive(@[@<|"mathrel"|@>@], math_comp, rel_noad); @!@:math_rel_}{\.{\\mathrel} primitive@> primitive(@[@<|"mathopen"|@>@], math_comp, open_noad); @!@:math_open_}{\.{\\mathopen} primitive@> primitive(@[@<|"mathclose"|@>@], math_comp, close_noad); @!@:math_close_}{\.{\\mathclose} primitive@> primitive(@[@<|"mathpunct"|@>@], math_comp, punct_noad); @!@:math_punct_}{\.{\\mathpunct} primitive@> primitive(@[@<|"mathinner"|@>@], math_comp, inner_noad); @!@:math_inner_}{\.{\\mathinner} primitive@> primitive(@[@<|"underline"|@>@], math_comp, under_noad); @!@:underline_}{\.{\\underline} primitive@> primitive(@[@<|"overline"|@>@], math_comp, over_noad);@/ @!@:overline_}{\.{\\overline} primitive@> primitive(@[@<|"displaylimits"|@>@], limit_switch, normal); @!@:display_limits_}{\.{\\displaylimits} primitive@> primitive(@[@<|"limits"|@>@], limit_switch, limits); @!@:limits_}{\.{\\limits} primitive@> primitive(@[@<|"nolimits"|@>@], limit_switch, no_limits); @!@:no_limits_}{\.{\\nolimits} primitive@> @ @<Cases of |print_cmd_chr|...@>= case math_comp: switch (chr_code) { case ord_noad: print_esc(@[@<|"mathord"|@>@]);@+break; case op_noad: print_esc(@[@<|"mathop"|@>@]);@+break; case bin_noad: print_esc(@[@<|"mathbin"|@>@]);@+break; case rel_noad: print_esc(@[@<|"mathrel"|@>@]);@+break; case open_noad: print_esc(@[@<|"mathopen"|@>@]);@+break; case close_noad: print_esc(@[@<|"mathclose"|@>@]);@+break; case punct_noad: print_esc(@[@<|"mathpunct"|@>@]);@+break; case inner_noad: print_esc(@[@<|"mathinner"|@>@]);@+break; case under_noad: print_esc(@[@<|"underline"|@>@]);@+break; default:print_esc(@[@<|"overline"|@>@]); } @+break; case limit_switch: if (chr_code==limits) print_esc(@[@<|"limits"|@>@]); else if (chr_code==no_limits) print_esc(@[@<|"nolimits"|@>@]); else print_esc(@[@<|"displaylimits"|@>@]);@+break; @ @<Cases of |main_control| that build...@>= case mmode+math_comp: {@+tail_append(new_noad()); type(tail)=cur_chr;scan_math(nucleus(tail)); } @+break; case mmode+limit_switch: math_limit_switch();@+break; @ @<Declare act...@>= void math_limit_switch(void) {@+ if (head!=tail) if (type(tail)==op_noad) {@+subtype(tail)=cur_chr;return; } print_err("Limit controls must follow a math operator"); @.Limit controls must follow...@> help1("I'm ignoring this misplaced \\limits or \\nolimits command.");error(); } @ Delimiter fields of noads are filled in by the |scan_delimiter| routine. The first parameter of this procedure is the |mem| address where the delimiter is to be placed; the second tells if this delimiter follows \.{\\radical} or not. @<Declare act...@>= void scan_delimiter(pointer @!p, bool @!r) {@+if (r) scan_twenty_seven_bit_int(); else{@+@<Get the next non-blank non-relax...@>; switch (cur_cmd) { case letter: case other_char: cur_val=del_code(cur_chr);@+break; case delim_num: scan_twenty_seven_bit_int();@+break; default:cur_val=-1; } } if (cur_val < 0) @<Report that an invalid delimiter code is being changed to null; set~|cur_val:=0|@>; small_fam(p)=(cur_val/04000000)%16; small_char(p)=qi((cur_val/010000)%256); large_fam(p)=(cur_val/256)%16; large_char(p)=qi(cur_val%256); } @ @<Report that an invalid delimiter...@>= {@+print_err("Missing delimiter (. inserted)"); @.Missing delimiter...@> help6("I was expecting to see something like `(' or `\\{' or")@/ ("`\\}' here. If you typed, e.g., `{' instead of `\\{', you")@/ ("should probably delete the `{' by typing `1' now, so that")@/ ("braces don't get unbalanced. Otherwise just proceed.")@/ ("Acceptable delimiters are characters whose \\delcode is")@/ ("nonnegative, or you can use `\\delimiter <delimiter code>'."); back_error();cur_val=0; } @ @<Cases of |main_control| that build...@>= case mmode+radical: math_radical();@+break; @ @<Declare act...@>= void math_radical(void) {@+tail_append(get_node(radical_noad_size)); type(tail)=radical_noad;subtype(tail)=normal; mem[nucleus(tail)].hh=empty_field; mem[subscr(tail)].hh=empty_field; mem[supscr(tail)].hh=empty_field; scan_delimiter(left_delimiter(tail), true);scan_math(nucleus(tail)); } @ @<Cases of |main_control| that build...@>= case mmode+accent: case mmode+math_accent: math_ac();@+break; @ @<Declare act...@>= void math_ac(void) {@+if (cur_cmd==accent) @<Complain that the user should have said \.{\\mathaccent}@>; tail_append(get_node(accent_noad_size)); type(tail)=accent_noad;subtype(tail)=normal; mem[nucleus(tail)].hh=empty_field; mem[subscr(tail)].hh=empty_field; mem[supscr(tail)].hh=empty_field; math_type(accent_chr(tail))=math_char; scan_fifteen_bit_int(); character(accent_chr(tail))=qi(cur_val%256); if ((cur_val >= var_code)&&fam_in_range) fam(accent_chr(tail))=cur_fam; else fam(accent_chr(tail))=(cur_val/256)%16; scan_math(nucleus(tail)); } @ @<Complain that the user should have said \.{\\mathaccent}@>= {@+print_err("Please use ");print_esc(@[@<|"mathaccent"|@>@]); print_str(" for accents in math mode"); @.Please use \\mathaccent...@> help2("I'm changing \\accent to \\mathaccent here; wish me luck.")@/ ("(Accents are not the same in formulas as they are in text.)"); error(); } @ @<Cases of |main_control| that build...@>= case mmode+vcenter: {@+scan_spec(vcenter_group, false);normal_paragraph(); push_nest();mode=-vmode;prev_depth=ignore_depth; if (every_vbox!=null) begin_token_list(every_vbox, every_vbox_text); } @+break; @ @<Cases of |handle...@>= case vcenter_group: {@+end_graf();unsave();save_ptr=save_ptr-2; p=vpack(link(head), saved(1), saved(0));pop_nest(); tail_append(new_noad());type(tail)=vcenter_noad; math_type(nucleus(tail))=sub_box;info(nucleus(tail))=p; } @+break; @ The routine that inserts a |style_node| holds no surprises. @<Put each...@>= primitive(@[@<|"displaystyle"|@>@], math_style, display_style); @!@:display_style_}{\.{\\displaystyle} primitive@> primitive(@[@<|"textstyle"|@>@], math_style, text_style); @!@:text_style_}{\.{\\textstyle} primitive@> primitive(@[@<|"scriptstyle"|@>@], math_style, script_style); @!@:script_style_}{\.{\\scriptstyle} primitive@> primitive(@[@<|"scriptscriptstyle"|@>@], math_style, script_script_style); @!@:script_script_style_}{\.{\\scriptscriptstyle} primitive@> @ @<Cases of |print_cmd_chr|...@>= case math_style: print_style(chr_code);@+break; @ @<Cases of |main_control| that build...@>= case mmode+math_style: tail_append(new_style(cur_chr))@;@+break; case mmode+non_script: {@+tail_append(new_glue(zero_glue)); subtype(tail)=cond_math_glue; } @+break; case mmode+math_choice: append_choices();@+break; @ The routine that scans the four mlists of a \.{\\mathchoice} is very much like the routine that builds discretionary nodes. @<Declare act...@>= void append_choices(void) {@+tail_append(new_choice());incr(save_ptr);saved(-1)=0; push_math(math_choice_group);scan_left_brace(); } @ @<Cases of |handle_right_brace|...@>= case math_choice_group: build_choices();@+break; @ @<Declare act...@>= @t\4@>@<Declare the function called |fin_mlist|@>@t@>@;@/ void build_choices(void) {@+ pointer p; /*the current mlist*/ unsave();p=fin_mlist(null); switch (saved(-1)) { case 0: display_mlist(tail)=p;@+break; case 1: text_mlist(tail)=p;@+break; case 2: script_mlist(tail)=p;@+break; case 3: {@+script_script_mlist(tail)=p;decr(save_ptr);return; } } /*there are no other cases*/ incr(saved(-1));push_math(math_choice_group);scan_left_brace(); } @ Subscripts and superscripts are attached to the previous nucleus by the @^superscripts@>@^subscripts@> action procedure called |sub_sup|. We use the facts that |sub_mark==sup_mark+1| and |subscr(p)==supscr(p)+1|. @<Cases of |main_control| that build...@>= case mmode+sub_mark: case mmode+sup_mark: sub_sup();@+break; @ @<Declare act...@>= void sub_sup(void) {@+small_number t; /*type of previous sub/superscript*/ pointer @!p; /*field to be filled by |scan_math|*/ t=empty;p=null; if (tail!=head) if (scripts_allowed(tail)) {@+p=supscr(tail)+cur_cmd-sup_mark; /*|supscr| or |subscr|*/ t=math_type(p); } if ((p==null)||(t!=empty)) @<Insert a dummy noad to be sub/superscripted@>; scan_math(p); } @ @<Insert a dummy...@>= {@+tail_append(new_noad()); p=supscr(tail)+cur_cmd-sup_mark; /*|supscr| or |subscr|*/ if (t!=empty) {@+if (cur_cmd==sup_mark) {@+print_err("Double superscript"); @.Double superscript@> help1("I treat `x^1^2' essentially like `x^1{}^2'."); } else{@+print_err("Double subscript"); @.Double subscript@> help1("I treat `x_1_2' essentially like `x_1{}_2'."); } error(); } } @ An operation like `\.{\\over}' causes the current mlist to go into a state of suspended animation: |incompleat_noad| points to a |fraction_noad| that contains the mlist-so-far as its numerator, while the denominator is yet to come. Finally when the mlist is finished, the denominator will go into the incompleat fraction noad, and that noad will become the whole formula, unless it is surrounded by `\.{\\left}' and `\.{\\right}' delimiters. @d above_code 0 /* `\.{\\above}' */ @d over_code 1 /* `\.{\\over}' */ @d atop_code 2 /* `\.{\\atop}' */ @d delimited_code 3 /* `\.{\\abovewithdelims}', etc.*/ @<Put each...@>= primitive(@[@<|"above"|@>@], above, above_code);@/ @!@:above_}{\.{\\above} primitive@> primitive(@[@<|"over"|@>@], above, over_code);@/ @!@:over_}{\.{\\over} primitive@> primitive(@[@<|"atop"|@>@], above, atop_code);@/ @!@:atop_}{\.{\\atop} primitive@> primitive(@[@<|"abovewithdelims"|@>@], above, delimited_code+above_code);@/ @!@:above_with_delims_}{\.{\\abovewithdelims} primitive@> primitive(@[@<|"overwithdelims"|@>@], above, delimited_code+over_code);@/ @!@:over_with_delims_}{\.{\\overwithdelims} primitive@> primitive(@[@<|"atopwithdelims"|@>@], above, delimited_code+atop_code); @!@:atop_with_delims_}{\.{\\atopwithdelims} primitive@> @ @<Cases of |print_cmd_chr|...@>= case above: switch (chr_code) { case over_code: print_esc(@[@<|"over"|@>@]);@+break; case atop_code: print_esc(@[@<|"atop"|@>@]);@+break; case delimited_code+above_code: print_esc(@[@<|"abovewithdelims"|@>@]);@+break; case delimited_code+over_code: print_esc(@[@<|"overwithdelims"|@>@]);@+break; case delimited_code+atop_code: print_esc(@[@<|"atopwithdelims"|@>@]);@+break; default:print_esc(@[@<|"above"|@>@]); } @+break; @ @<Cases of |main_control| that build...@>= case mmode+above: math_fraction();@+break; @ @<Declare act...@>= void math_fraction(void) {@+small_number c; /*the type of generalized fraction we are scanning*/ c=cur_chr; if (incompleat_noad!=null) @<Ignore the fraction operation and complain about this ambiguous case@>@; else{@+incompleat_noad=get_node(fraction_noad_size); type(incompleat_noad)=fraction_noad; subtype(incompleat_noad)=normal; math_type(numerator(incompleat_noad))=sub_mlist; info(numerator(incompleat_noad))=link(head); mem[denominator(incompleat_noad)].hh=empty_field; mem[left_delimiter(incompleat_noad)].qqqq=null_delimiter; mem[right_delimiter(incompleat_noad)].qqqq=null_delimiter;@/ link(head)=null;tail=head; @<Use code |c| to distinguish between generalized fractions@>; } } @ @<Use code |c|...@>= if (c >= delimited_code) {@+scan_delimiter(left_delimiter(incompleat_noad), false); scan_delimiter(right_delimiter(incompleat_noad), false); } switch (c%delimited_code) { case above_code: {@+scan_normal_dimen; thickness(incompleat_noad)=cur_val; } @+break; case over_code: thickness(incompleat_noad)=default_code;@+break; case atop_code: thickness(incompleat_noad)=0; } /*there are no other cases*/ @ @<Ignore the fraction...@>= {@+if (c >= delimited_code) {@+scan_delimiter(garbage, false);scan_delimiter(garbage, false); } if (c%delimited_code==above_code) scan_normal_dimen; print_err("Ambiguous; you need another { and }"); @.Ambiguous...@> help3("I'm ignoring this fraction specification, since I don't")@/ ("know whether a construction like `x \\over y \\over z'")@/ ("means `{x \\over y} \\over z' or `x \\over {y \\over z}'."); error(); } @ At the end of a math formula or subformula, the |fin_mlist| routine is called upon to return a pointer to the newly completed mlist, and to pop the nest back to the enclosing semantic level. The parameter to |fin_mlist|, if not null, points to a |right_noad| that ends the current mlist; this |right_noad| has not yet been appended. @<Declare the function called |fin_mlist|@>= pointer fin_mlist(pointer @!p) {@+pointer q; /*the mlist to return*/ if (incompleat_noad!=null) @<Compleat the incompleat noad@>@; else{@+link(tail)=p;q=link(head); } pop_nest();return q; } @ @<Compleat...@>= {@+math_type(denominator(incompleat_noad))=sub_mlist; info(denominator(incompleat_noad))=link(head); if (p==null) q=incompleat_noad; else{@+q=info(numerator(incompleat_noad)); if (type(q)!=left_noad) confusion(@[@<|"right"|@>@]); @:this can't happen right}{\quad right@> info(numerator(incompleat_noad))=link(q); link(q)=incompleat_noad;link(incompleat_noad)=p; } } @ Now at last we're ready to see what happens when a right brace occurs in a math formula. Two special cases are simplified here: Braces are effectively removed when they surround a single Ord without sub/superscripts, or when they surround an accent that is the nucleus of an Ord atom. @<Cases of |handle...@>= case math_group: {@+unsave();decr(save_ptr);@/ math_type(saved(0))=sub_mlist;p=fin_mlist(null);info(saved(0))=p; if (p!=null) if (link(p)==null) if (type(p)==ord_noad) {@+if (math_type(subscr(p))==empty) if (math_type(supscr(p))==empty) {@+mem[saved(0)].hh=mem[nucleus(p)].hh; free_node(p, noad_size); } } else if (type(p)==accent_noad) if (saved(0)==nucleus(tail)) if (type(tail)==ord_noad) @<Replace the tail of the list by |p|@>; } @+break; @ @<Replace the tail...@>= {@+q=head;while (link(q)!=tail) q=link(q); link(q)=p;free_node(tail, noad_size);tail=p; } @ We have dealt with all constructions of math mode except `\.{\\left}' and `\.{\\right}', so the picture is completed by the following sections of the program. @<Put each...@>= primitive(@[@<|"left"|@>@], left_right, left_noad); @!@:left_}{\.{\\left} primitive@> primitive(@[@<|"right"|@>@], left_right, right_noad); @!@:right_}{\.{\\right} primitive@> text(frozen_right)=@[@<|"right"|@>@];eqtb[frozen_right]=eqtb[cur_val]; @ @<Cases of |print_cmd_chr|...@>= case left_right: if (chr_code==left_noad) print_esc(@[@<|"left"|@>@]); else print_esc(@[@<|"right"|@>@]);@+break; @ @<Cases of |main_control| that build...@>= case mmode+left_right: math_left_right();@+break; @ @<Declare act...@>= void math_left_right(void) {@+small_number t; /*|left_noad| or |right_noad|*/ pointer @!p; /*new noad*/ t=cur_chr; if ((t==right_noad)&&(cur_group!=math_left_group)) @<Try to recover from mismatched \.{\\right}@>@; else{@+p=new_noad();type(p)=t; scan_delimiter(delimiter(p), false); if (t==left_noad) {@+push_math(math_left_group);link(head)=p;tail=p; } else{@+p=fin_mlist(p);unsave(); /*end of |math_left_group|*/ tail_append(new_noad());type(tail)=inner_noad; math_type(nucleus(tail))=sub_mlist; info(nucleus(tail))=p; } } } @ @<Try to recover from mismatch...@>= {@+if (cur_group==math_shift_group) {@+scan_delimiter(garbage, false); print_err("Extra ");print_esc(@[@<|"right"|@>@]); @.Extra \\right.@> help1("I'm ignoring a \\right that had no matching \\left."); error(); } else off_save(); } @ Here is the only way out of math mode. @<Cases of |main_control| that build...@>= case mmode+math_shift: if (cur_group==math_shift_group) after_math(); else off_save();@+break; @ @<Declare act...@>= void after_math(void) {@+bool l; /*`\.{\\leqno}' instead of `\.{\\eqno}'*/ bool @!danger; /*not enough symbol fonts are present*/ int @!m; /*|mmode| or |-mmode|*/ pointer @!p; /*the formula*/ pointer @!a; /*box containing equation number*/ @<Local variables for finishing a displayed formula@>@; danger=false; @<Check that the necessary fonts for math symbols are present; if not, flush the current math lists and set |danger:=true|@>; m=mode;l=false;p=fin_mlist(null); /*this pops the nest*/ if (mode==-m) /*end of equation number*/ {@+@<Check that another \.\$ follows@>; cur_mlist=p;cur_style=text_style;mlist_penalties=false; mlist_to_hlist();a=hpack(link(temp_head), natural); unsave();decr(save_ptr); /*now |cur_group==math_shift_group|*/ if (saved(0)==1) l=true; danger=false; @<Check that the necessary fonts for math symbols are present; if not, flush the current math lists and set |danger:=true|@>; m=mode;p=fin_mlist(null); } else a=null; if (m < 0) @<Finish math in text@>@; else{@+if (a==null) @<Check that another \.\$ follows@>; @<Finish displayed math@>; } } @ @<Check that the necessary fonts...@>= if ((font_params[fam_fnt(2+text_size)] < total_mathsy_params)||@| (font_params[fam_fnt(2+script_size)] < total_mathsy_params)||@| (font_params[fam_fnt(2+script_script_size)] < total_mathsy_params)) {@+print_err("Math formula deleted: Insufficient symbol fonts");@/ @.Math formula deleted...@> help3("Sorry, but I can't typeset math unless \\textfont 2")@/ ("and \\scriptfont 2 and \\scriptscriptfont 2 have all")@/ ("the \\fontdimen values needed in math symbol fonts."); error();flush_math();danger=true; } else if ((font_params[fam_fnt(3+text_size)] < total_mathex_params)||@| (font_params[fam_fnt(3+script_size)] < total_mathex_params)||@| (font_params[fam_fnt(3+script_script_size)] < total_mathex_params)) {@+print_err("Math formula deleted: Insufficient extension fonts");@/ help3("Sorry, but I can't typeset math unless \\textfont 3")@/ ("and \\scriptfont 3 and \\scriptscriptfont 3 have all")@/ ("the \\fontdimen values needed in math extension fonts."); error();flush_math();danger=true; } @ The |unsave| is done after everything else here; hence an appearance of `\.{\\mathsurround}' inside of `\.{\$...\$}' affects the spacing at these particular \.\$'s. This is consistent with the conventions of `\.{\$\$...\$\$}', since `\.{\\abovedisplayskip}' inside a display affects the space above that display. @<Finish math in text@>= {@+tail_append(new_math(math_surround, before)); cur_mlist=p;cur_style=text_style;mlist_penalties=(mode > 0);mlist_to_hlist(); link(tail)=link(temp_head); while (link(tail)!=null) tail=link(tail); tail_append(new_math(math_surround, after)); space_factor=1000;unsave(); } @ \TeX\ gets to the following part of the program when the first `\.\$' ending a display has been scanned. @<Check that another \.\$ follows@>= {@+get_x_token(); if (cur_cmd!=math_shift) {@+print_err("Display math should end with $$"); @.Display math...with \$\$@> help2("The `$' that I just saw supposedly matches a previous `$$'.")@/ ("So I shall assume that you typed `$$' both times."); back_error(); } } @ We have saved the worst for last: The fussiest part of math mode processing occurs when a displayed formula is being centered and placed with an optional equation number. @<Local variables for finishing...@>= pointer @!b; /*box containing the equation*/ scaled @!w; /*width of the equation*/ scaled @!z; /*width of the line*/ scaled @!e; /*width of equation number*/ scaled @!q; /*width of equation number plus space to separate from equation*/ scaled @!d; /*displacement of equation in the line*/ scaled @!s; /*move the line right this much*/ small_number @!g1, @!g2; /*glue parameter codes for before and after*/ pointer @!r; /*kern node used to position the display*/ pointer @!t; /*tail of adjustment list*/ @ At this time |p| points to the mlist for the formula; |a| is either |null| or it points to a box containing the equation number; and we are in vertical mode (or internal vertical mode). @<Finish displayed math@>= cur_mlist=p;cur_style=display_style;mlist_penalties=false; mlist_to_hlist();p=link(temp_head);@/ adjust_tail=adjust_head;b=hpack(p, natural);p=list_ptr(b); t=adjust_tail;adjust_tail=null;@/ w=width(b);z=display_width;s=display_indent; if ((a==null)||danger) {@+e=0;q=0; } else{@+e=width(a);q=e+math_quad(text_size); } if (w+q > z) @<Squeeze the equation as much as possible; if there is an equation number that should go on a separate line by itself, set~|e:=0|@>; @<Determine the displacement, |d|, of the left edge of the equation, with respect to the line size |z|, assuming that |l=false|@>; @<Append the glue or equation number preceding the display@>; @<Append the display and perhaps also the equation number@>; @<Append the glue or equation number following the display@>; resume_after_display() @ @<Declare act...@>= void resume_after_display(void) {@+if (cur_group!=math_shift_group) confusion(@[@<|"display"|@>@]); @:this can't happen display}{\quad display@> unsave();prev_graf=prev_graf+3; push_nest();mode=hmode;space_factor=1000;set_cur_lang;clang=cur_lang; prev_graf=(norm_min(left_hyphen_min)*0100+norm_min(right_hyphen_min)) *0200000+cur_lang; @<Scan an optional space@>; if (nest_ptr==1) build_page(); } @ The user can force the equation number to go on a separate line by causing its width to be zero. @<Squeeze the equation as much as possible...@>= {@+if ((e!=0)&&((w-total_shrink[normal]+q <= z)||@| (total_shrink[fil]!=0)||(total_shrink[fill]!=0)|| (total_shrink[filll]!=0))) {@+free_node(b, box_node_size); b=hpack(p, z-q, exactly); } else{@+e=0; if (w > z) {@+free_node(b, box_node_size); b=hpack(p, z, exactly); } } w=width(b); } @ We try first to center the display without regard to the existence of the equation number. If that would make it too close (where ``too close'' means that the space between display and equation number is less than the width of the equation number), we either center it in the remaining space or move it as far from the equation number as possible. The latter alternative is taken only if the display begins with glue, since we assume that the user put glue there to control the spacing precisely. @<Determine the displacement, |d|, of the left edge of the equation...@>= d=half(z-w); if ((e > 0)&&(d < 2*e)) /*too close*/ {@+d=half(z-w-e); if (p!=null) if (!is_char_node(p)) if (type(p)==glue_node) d=0; } @ If the equation number is set on a line by itself, either before or after the formula, we append an infinite penalty so that no page break will separate the display from its number; and we use the same size and displacement for all three potential lines of the display, even though `\.{\\parshape}' may specify them differently. @<Append the glue or equation number preceding the display@>= tail_append(new_penalty(pre_display_penalty));@/ if ((d+s <= pre_display_size)||l) /*not enough clearance*/ {@+g1=above_display_skip_code;g2=below_display_skip_code; } else{@+g1=above_display_short_skip_code; g2=below_display_short_skip_code; } if (l&&(e==0)) /*it follows that |type(a)==hlist_node|*/ {@+shift_amount(a)=s;append_to_vlist(a); tail_append(new_penalty(inf_penalty)); } else tail_append(new_param_glue(g1)) @ @<Append the display and perhaps also the equation number@>= if (e!=0) {@+r=new_kern(z-w-e-d); if (l) {@+link(a)=r;link(r)=b;b=a;d=0; } else{@+link(b)=r;link(r)=a; } b=hpack(b, natural); } shift_amount(b)=s+d;append_to_vlist(b) @ @<Append the glue or equation number following the display@>= if ((a!=null)&&(e==0)&&!l) {@+tail_append(new_penalty(inf_penalty)); shift_amount(a)=s+z-width(a); append_to_vlist(a); g2=0; } if (t!=adjust_head) /*migrating material comes after equation number*/ {@+link(tail)=link(adjust_head);tail=t; } tail_append(new_penalty(post_display_penalty)); if (g2 > 0) tail_append(new_param_glue(g2)) @ When \.{\\halign} appears in a display, the alignment routines operate essentially as they do in vertical mode. Then the following program is activated, with |p| and |q| pointing to the beginning and end of the resulting list, and with |aux_save| holding the |prev_depth| value. @<Finish an alignment in a display@>= {@+do_assignments(); if (cur_cmd!=math_shift) @<Pontificate about improper alignment in display@>@; else@<Check that another \.\$ follows@>; pop_nest(); tail_append(new_penalty(pre_display_penalty)); tail_append(new_param_glue(above_display_skip_code)); link(tail)=p; if (p!=null) tail=q; tail_append(new_penalty(post_display_penalty)); tail_append(new_param_glue(below_display_skip_code)); prev_depth=aux_save.sc;resume_after_display(); } @ @<Pontificate...@>= {@+print_err("Missing $$ inserted"); @.Missing {\$\$} inserted@> help2("Displays can use special alignments (like \\eqalignno)")@/ ("only if nothing but the alignment itself is between $$'s."); back_error(); } @* Mode-independent processing. The long |main_control| procedure has now been fully specified, except for certain activities that are independent of the current mode. These activities do not change the current vlist or hlist or mlist; if they change anything, it is the value of a parameter or the meaning of a control sequence. Assignments to values in |eqtb| can be global or local. Furthermore, a control sequence can be defined to be `\.{\\long}' or `\.{\\outer}', and it might or might not be expanded. The prefixes `\.{\\global}', `\.{\\long}', and `\.{\\outer}' can occur in any order. Therefore we assign binary numeric codes, making it possible to accumulate the union of all specified prefixes by adding the corresponding codes. (\PASCAL's |set| operations could also have been used.) @<Put each...@>= primitive(@[@<|"long"|@>@], prefix, 1); @!@:long_}{\.{\\long} primitive@> primitive(@[@<|"outer"|@>@], prefix, 2); @!@:outer_}{\.{\\outer} primitive@> primitive(@[@<|"global"|@>@], prefix, 4); @!@:global_}{\.{\\global} primitive@> primitive(@[@<|"def"|@>@], def, 0); @!@:def_}{\.{\\def} primitive@> primitive(@[@<|"gdef"|@>@], def, 1); @!@:gdef_}{\.{\\gdef} primitive@> primitive(@[@<|"edef"|@>@], def, 2); @!@:edef_}{\.{\\edef} primitive@> primitive(@[@<|"xdef"|@>@], def, 3); @!@:xdef_}{\.{\\xdef} primitive@> @ @<Cases of |print_cmd_chr|...@>= case prefix: if (chr_code==1) print_esc(@[@<|"long"|@>@]); else if (chr_code==2) print_esc(@[@<|"outer"|@>@]); else print_esc(@[@<|"global"|@>@]);@+break; case def: if (chr_code==0) print_esc(@[@<|"def"|@>@]); else if (chr_code==1) print_esc(@[@<|"gdef"|@>@]); else if (chr_code==2) print_esc(@[@<|"edef"|@>@]); else print_esc(@[@<|"xdef"|@>@]);@+break; @ Every prefix, and every command code that might or might not be prefixed, calls the action procedure |prefixed_command|. This routine accumulates a sequence of prefixes until coming to a non-prefix, then it carries out the command. @<Cases of |main_control| that don't...@>= any_mode(toks_register): any_mode(assign_toks): any_mode(assign_int): any_mode(assign_dimen): any_mode(assign_glue): any_mode(assign_mu_glue): any_mode(assign_font_dimen): any_mode(assign_font_int): any_mode(set_aux): any_mode(set_prev_graf): any_mode(set_page_dimen): any_mode(set_page_int): any_mode(set_box_dimen): any_mode(set_shape): any_mode(def_code): any_mode(def_family): any_mode(set_font): any_mode(def_font): any_mode(register): any_mode(advance): any_mode(multiply): any_mode(divide): any_mode(prefix): any_mode(let): any_mode(shorthand_def): any_mode(read_to_cs): any_mode(def): any_mode(set_box): any_mode(hyph_data): any_mode(set_interaction): prefixed_command();@+break; @ If the user says, e.g., `\.{\\global\\global}', the redundancy is silently accepted. @<Declare act...@>= @t\4@>@<Declare subprocedures for |prefixed_command|@>@t@>@;@/ void prefixed_command(void) {@+ small_number a; /*accumulated prefix codes so far*/ internal_font_number @!f; /*identifies a font*/ int @!j; /*index into a \.{\\parshape} specification*/ font_index @!k; /*index into |font_info|*/ pointer @!p, @!q; /*for temporary short-term use*/ int @!n; /*ditto*/ bool @!e; /*should a definition be expanded? or was \.{\\let} not done?*/ a=0; while (cur_cmd==prefix) {@+if (!odd(a/cur_chr)) a=a+cur_chr; @<Get the next non-blank non-relax...@>; if (cur_cmd <= max_non_prefixed_command) @<Discard erroneous prefixes and |return|@>; } @<Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant@>; @<Adjust \(f)for the setting of \.{\\globaldefs}@>; switch (cur_cmd) { @t\4@>@<Assignments@>@; default:confusion(@[@<|"prefix"|@>@]); @:this can't happen prefix}{\quad prefix@> } done: @<Insert a token saved by \.{\\afterassignment}, if any@>; } @ @<Discard erroneous...@>= {@+print_err("You can't use a prefix with `"); @.You can't use a prefix with x@> print_cmd_chr(cur_cmd, cur_chr);print_char('\''); help1("I'll pretend you didn't say \\long or \\outer or \\global."); back_error();return; } @ @<Discard the prefixes...@>= if ((cur_cmd!=def)&&(a%4!=0)) {@+print_err("You can't use `");print_esc(@[@<|"long"|@>@]);print_str("' or `"); print_esc(@[@<|"outer"|@>@]);print_str("' with `"); @.You can't use \\long...@> print_cmd_chr(cur_cmd, cur_chr);print_char('\''); help1("I'll pretend you didn't say \\long or \\outer here."); error(); } @ The previous routine does not have to adjust |a| so that |a%4==0|, since the following routines test for the \.{\\global} prefix as follows. @d global (a >= 4) @d define(X, Y, Z) if (global) geq_define(X, Y, Z);@+else eq_define(X, Y, Z) @d word_define(X, Y) if (global) geq_word_define(X, Y);@+else eq_word_define(X, Y) @<Adjust \(f)for the setting of \.{\\globaldefs}@>= if (global_defs!=0) if (global_defs < 0) {@+if (global) a=a-4; } else{@+if (!global) a=a+4; } @ When a control sequence is to be defined, by \.{\\def} or \.{\\let} or something similar, the |get_r_token| routine will substitute a special control sequence for a token that is not redefinable. @<Declare subprocedures for |prefixed_command|@>= void get_r_token(void) {@+ restart: @/do@+{get_token(); }@+ while (!(cur_tok!=space_token)); if ((cur_cs==0)||(cur_cs > frozen_control_sequence)) {@+print_err("Missing control sequence inserted"); @.Missing control...@> help5("Please don't say `\\def cs{...}', say `\\def\\cs{...}'.")@/ ("I've inserted an inaccessible control sequence so that your")@/ ("definition will be completed without mixing me up too badly.")@/ ("You can recover graciously from this error, if you're")@/ ("careful; see exercise 27.2 in The TeXbook."); @:TeXbook}{\sl The \TeX book@> if (cur_cs==0) back_input(); cur_tok=cs_token_flag+frozen_protection;ins_error();goto restart; } } @ @<Initialize table entries...@>= text(frozen_protection)=@[@<|"inaccessible"|@>@]; @ Here's an example of the way many of the following routines operate. (Unfortunately, they aren't all as simple as this.) @<Assignments@>= case set_font: define(cur_font_loc, data, cur_chr);@+break; @ When a |def| command has been scanned, |cur_chr| is odd if the definition is supposed to be global, and |cur_chr >= 2| if the definition is supposed to be expanded. @<Assignments@>= case def: {@+if (odd(cur_chr)&&!global&&(global_defs >= 0)) a=a+4; e=(cur_chr >= 2);get_r_token();p=cur_cs; q=scan_toks(true, e);define(p, call+(a%4), def_ref); } @+break; @ Both \.{\\let} and \.{\\futurelet} share the command code |let|. @<Put each...@>= primitive(@[@<|"let"|@>@], let, normal);@/ @!@:let_}{\.{\\let} primitive@> primitive(@[@<|"futurelet"|@>@], let, normal+1);@/ @!@:future_let_}{\.{\\futurelet} primitive@> @ @<Cases of |print_cmd_chr|...@>= case let: if (chr_code!=normal) print_esc(@[@<|"futurelet"|@>@]);@+else print_esc(@[@<|"let"|@>@]);@+break; @ @<Assignments@>= case let: {@+n=cur_chr; get_r_token();p=cur_cs; if (n==normal) {@+@/do@+{get_token(); }@+ while (!(cur_cmd!=spacer)); if (cur_tok==other_token+'=') {@+get_token(); if (cur_cmd==spacer) get_token(); } } else{@+get_token();q=cur_tok;get_token();back_input(); cur_tok=q;back_input(); /*look ahead, then back up*/ } /*note that |back_input| doesn't affect |cur_cmd|, |cur_chr|*/ if (cur_cmd >= call) add_token_ref(cur_chr); define(p, cur_cmd, cur_chr); } @+break; @ A \.{\\chardef} creates a control sequence whose |cmd| is |char_given|; a \.{\\mathchardef} creates a control sequence whose |cmd| is |math_given|; and the corresponding |chr| is the character code or math code. A \.{\\countdef} or \.{\\dimendef} or \.{\\skipdef} or \.{\\muskipdef} creates a control sequence whose |cmd| is |assign_int| or \dots\ or |assign_mu_glue|, and the corresponding |chr| is the |eqtb| location of the internal register in question. @d char_def_code 0 /*|shorthand_def| for \.{\\chardef}*/ @d math_char_def_code 1 /*|shorthand_def| for \.{\\mathchardef}*/ @d count_def_code 2 /*|shorthand_def| for \.{\\countdef}*/ @d dimen_def_code 3 /*|shorthand_def| for \.{\\dimendef}*/ @d skip_def_code 4 /*|shorthand_def| for \.{\\skipdef}*/ @d mu_skip_def_code 5 /*|shorthand_def| for \.{\\muskipdef}*/ @d toks_def_code 6 /*|shorthand_def| for \.{\\toksdef}*/ @<Put each...@>= primitive(@[@<|"chardef"|@>@], shorthand_def, char_def_code);@/ @!@:char_def_}{\.{\\chardef} primitive@> primitive(@[@<|"mathchardef"|@>@], shorthand_def, math_char_def_code);@/ @!@:math_char_def_}{\.{\\mathchardef} primitive@> primitive(@[@<|"countdef"|@>@], shorthand_def, count_def_code);@/ @!@:count_def_}{\.{\\countdef} primitive@> primitive(@[@<|"dimendef"|@>@], shorthand_def, dimen_def_code);@/ @!@:dimen_def_}{\.{\\dimendef} primitive@> primitive(@[@<|"skipdef"|@>@], shorthand_def, skip_def_code);@/ @!@:skip_def_}{\.{\\skipdef} primitive@> primitive(@[@<|"muskipdef"|@>@], shorthand_def, mu_skip_def_code);@/ @!@:mu_skip_def_}{\.{\\muskipdef} primitive@> primitive(@[@<|"toksdef"|@>@], shorthand_def, toks_def_code);@/ @!@:toks_def_}{\.{\\toksdef} primitive@> @ @<Cases of |print_cmd_chr|...@>= case shorthand_def: switch (chr_code) { case char_def_code: print_esc(@[@<|"chardef"|@>@]);@+break; case math_char_def_code: print_esc(@[@<|"mathchardef"|@>@]);@+break; case count_def_code: print_esc(@[@<|"countdef"|@>@]);@+break; case dimen_def_code: print_esc(@[@<|"dimendef"|@>@]);@+break; case skip_def_code: print_esc(@[@<|"skipdef"|@>@]);@+break; case mu_skip_def_code: print_esc(@[@<|"muskipdef"|@>@]);@+break; default:print_esc(@[@<|"toksdef"|@>@]); } @+break; case char_given: {@+print_esc(@[@<|"char"|@>@]);print_hex(chr_code); } @+break; case math_given: {@+print_esc(@[@<|"mathchar"|@>@]);print_hex(chr_code); } @+break; @ We temporarily define |p| to be |relax|, so that an occurrence of |p| while scanning the definition will simply stop the scanning instead of producing an ``undefined control sequence'' error or expanding the previous meaning. This allows, for instance, `\.{\\chardef\\foo=123\\foo}'. @<Assignments@>= case shorthand_def: {@+n=cur_chr;get_r_token();p=cur_cs;define(p, relax, 256); scan_optional_equals(); switch (n) { case char_def_code: {@+scan_char_num();define(p, char_given, cur_val); } @+break; case math_char_def_code: {@+scan_fifteen_bit_int();define(p, math_given, cur_val); } @+break; default:{@+scan_eight_bit_int(); switch (n) { case count_def_code: define(p, assign_int, count_base+cur_val);@+break; case dimen_def_code: define(p, assign_dimen, scaled_base+cur_val);@+break; case skip_def_code: define(p, assign_glue, skip_base+cur_val);@+break; case mu_skip_def_code: define(p, assign_mu_glue, mu_skip_base+cur_val);@+break; case toks_def_code: define(p, assign_toks, toks_base+cur_val); } /*there are no other cases*/ } } } @+break; @ @<Assignments@>= case read_to_cs: {@+scan_int();n=cur_val; if (!scan_keyword(@[@<|"to"|@>@])) @.to@> {@+print_err("Missing `to' inserted"); @.Missing `to'...@> help2("You should have said `\\read<number> to \\cs'.")@/ ("I'm going to look for the \\cs now.");error(); } get_r_token(); p=cur_cs;read_toks(n, p);define(p, call, cur_val); } @+break; @ The token-list parameters, \.{\\output} and \.{\\everypar}, etc., receive their values in the following way. (For safety's sake, we place an enclosing pair of braces around an \.{\\output} list.) @<Assignments@>= case toks_register: case assign_toks: {@+q=cur_cs; if (cur_cmd==toks_register) {@+scan_eight_bit_int();p=toks_base+cur_val; } else p=cur_chr; /*|p==every_par_loc| or |output_routine_loc| or \dots*/ scan_optional_equals(); @<Get the next non-blank non-relax non-call token@>; if (cur_cmd!=left_brace) @<If the right-hand side is a token parameter or token register, finish the assignment and |goto done|@>; back_input();cur_cs=q;q=scan_toks(false, false); if (link(def_ref)==null) /*empty list: revert to the default*/ {@+define(p, undefined_cs, null);free_avail(def_ref); } else{@+if (p==output_routine_loc) /*enclose in curlies*/ {@+link(q)=get_avail();q=link(q); info(q)=right_brace_token+'}'; q=get_avail();info(q)=left_brace_token+'{'; link(q)=link(def_ref);link(def_ref)=q; } define(p, call, def_ref); } } @+break; @ @<If the right-hand side is a token parameter...@>= {@+if (cur_cmd==toks_register) {@+scan_eight_bit_int();cur_cmd=assign_toks;cur_chr=toks_base+cur_val; } if (cur_cmd==assign_toks) {@+q=equiv(cur_chr); if (q==null) define(p, undefined_cs, null); else{@+add_token_ref(q);define(p, call, q); } goto done; } } @ Similar routines are used to assign values to the numeric parameters. @<Assignments@>= case assign_int: {@+p=cur_chr;scan_optional_equals();scan_int(); word_define(p, cur_val); } @+break; case assign_dimen: {@+p=cur_chr;scan_optional_equals(); scan_normal_dimen;word_define(p, cur_val); } @+break; case assign_glue: case assign_mu_glue: {@+p=cur_chr;n=cur_cmd;scan_optional_equals(); if (n==assign_mu_glue) scan_glue(mu_val);@+else scan_glue(glue_val); trap_zero_glue(); define(p, glue_ref, cur_val); } @+break; @ When a glue register or parameter becomes zero, it will always point to |zero_glue| because of the following procedure. (Exception: The tabskip glue isn't trapped while preambles are being scanned.) @<Declare subprocedures for |prefixed_command|@>= void trap_zero_glue(void) {@+if ((width(cur_val)==0)&&(stretch(cur_val)==0)&&(shrink(cur_val)==0)) {@+add_glue_ref(zero_glue); delete_glue_ref(cur_val);cur_val=zero_glue; } } @ The various character code tables are changed by the |def_code| commands, and the font families are declared by |def_family|. @<Put each...@>= primitive(@[@<|"catcode"|@>@], def_code, cat_code_base); @!@:cat_code_}{\.{\\catcode} primitive@> primitive(@[@<|"mathcode"|@>@], def_code, math_code_base); @!@:math_code_}{\.{\\mathcode} primitive@> primitive(@[@<|"lccode"|@>@], def_code, lc_code_base); @!@:lc_code_}{\.{\\lccode} primitive@> primitive(@[@<|"uccode"|@>@], def_code, uc_code_base); @!@:uc_code_}{\.{\\uccode} primitive@> primitive(@[@<|"sfcode"|@>@], def_code, sf_code_base); @!@:sf_code_}{\.{\\sfcode} primitive@> primitive(@[@<|"delcode"|@>@], def_code, del_code_base); @!@:del_code_}{\.{\\delcode} primitive@> primitive(@[@<|"textfont"|@>@], def_family, math_font_base); @!@:text_font_}{\.{\\textfont} primitive@> primitive(@[@<|"scriptfont"|@>@], def_family, math_font_base+script_size); @!@:script_font_}{\.{\\scriptfont} primitive@> primitive(@[@<|"scriptscriptfont"|@>@], def_family, math_font_base+script_script_size); @!@:script_script_font_}{\.{\\scriptscriptfont} primitive@> @ @<Cases of |print_cmd_chr|...@>= case def_code: if (chr_code==cat_code_base) print_esc(@[@<|"catcode"|@>@]); else if (chr_code==math_code_base) print_esc(@[@<|"mathcode"|@>@]); else if (chr_code==lc_code_base) print_esc(@[@<|"lccode"|@>@]); else if (chr_code==uc_code_base) print_esc(@[@<|"uccode"|@>@]); else if (chr_code==sf_code_base) print_esc(@[@<|"sfcode"|@>@]); else print_esc(@[@<|"delcode"|@>@]);@+break; case def_family: print_size(chr_code-math_font_base);@+break; @ The different types of code values have different legal ranges; the following program is careful to check each case properly. @<Assignments@>= case def_code: {@+@<Let |n| be the largest legal code value, based on |cur_chr|@>; p=cur_chr;scan_char_num();p=p+cur_val;scan_optional_equals(); scan_int(); if (((cur_val < 0)&&(p < del_code_base))||(cur_val > n)) {@+print_err("Invalid code (");print_int(cur_val); @.Invalid code@> if (p < del_code_base) print_str("), should be in the range 0.."); else print_str("), should be at most "); print_int(n); help1("I'm going to use 0 instead of that illegal code value.");@/ error();cur_val=0; } if (p < math_code_base) define(p, data, cur_val); else if (p < del_code_base) define(p, data, hi(cur_val)); else word_define(p, cur_val); } @+break; @ @<Let |n| be the largest...@>= if (cur_chr==cat_code_base) n=max_char_code; else if (cur_chr==math_code_base) n=0100000; else if (cur_chr==sf_code_base) n=077777; else if (cur_chr==del_code_base) n=077777777; else n=255 @ @<Assignments@>= case def_family: {@+p=cur_chr;scan_four_bit_int();p=p+cur_val; scan_optional_equals();scan_font_ident();define(p, data, cur_val); } @+break; @ Next we consider changes to \TeX's numeric registers. @<Assignments@>= case register: case advance: case multiply: case divide: do_register_command(a);@+break; @ We use the fact that |register < advance < multiply < divide|. @<Declare subprocedures for |prefixed_command|@>= void do_register_command(small_number @!a) {@+ pointer l, @!q, @!r, @!s; /*for list manipulation*/ uint8_t @!p; /*type of register involved*/ q=cur_cmd; @<Compute the register location |l| and its type |p|; but |return| if invalid@>; if (q==register) scan_optional_equals(); else if (scan_keyword(@[@<|"by"|@>@])) do_nothing; /*optional `\.{by}'*/ @.by@> arith_error=false; if (q < multiply) @<Compute result of |register| or |advance|, put it in |cur_val|@>@; else@<Compute result of |multiply| or |divide|, put it in |cur_val|@>; if (arith_error) {@+print_err("Arithmetic overflow"); @.Arithmetic overflow@> help2("I can't carry out that multiplication or division,")@/ ("since the result is out of range."); if (p >= glue_val) delete_glue_ref(cur_val); error();return; } if (p < glue_val) word_define(l, cur_val); else{@+trap_zero_glue();define(l, glue_ref, cur_val); } } @ Here we use the fact that the consecutive codes |int_val dotdot mu_val| and |assign_int dotdot assign_mu_glue| correspond to each other nicely. @<Compute the register location |l| and its type |p|...@>= {@+if (q!=register) {@+get_x_token(); if ((cur_cmd >= assign_int)&&(cur_cmd <= assign_mu_glue)) {@+l=cur_chr;p=cur_cmd-assign_int;goto found; } if (cur_cmd!=register) {@+print_err("You can't use `");print_cmd_chr(cur_cmd, cur_chr); @.You can't use x after ...@> print_str("' after ");print_cmd_chr(q, 0); help1("I'm forgetting what you said and not changing anything."); error();return; } } p=cur_chr;scan_eight_bit_int(); switch (p) { case int_val: l=cur_val+count_base;@+break; case dimen_val: l=cur_val+scaled_base;@+break; case glue_val: l=cur_val+skip_base;@+break; case mu_val: l=cur_val+mu_skip_base; } /*there are no other cases*/ } found: @ @<Compute result of |register| or |advance|...@>= if (p < glue_val) {@+if (p==int_val) scan_int();@+else scan_normal_dimen; if (q==advance) cur_val=cur_val+eqtb[l].i; } else{@+scan_glue(p); if (q==advance) @<Compute the sum of two glue specs@>; } @ @<Compute the sum of two glue specs@>= {@+q=new_spec(cur_val);r=equiv(l); delete_glue_ref(cur_val); width(q)=width(q)+width(r); if (stretch(q)==0) stretch_order(q)=normal; if (stretch_order(q)==stretch_order(r)) stretch(q)=stretch(q)+stretch(r); else if ((stretch_order(q) < stretch_order(r))&&(stretch(r)!=0)) {@+stretch(q)=stretch(r);stretch_order(q)=stretch_order(r); } if (shrink(q)==0) shrink_order(q)=normal; if (shrink_order(q)==shrink_order(r)) shrink(q)=shrink(q)+shrink(r); else if ((shrink_order(q) < shrink_order(r))&&(shrink(r)!=0)) {@+shrink(q)=shrink(r);shrink_order(q)=shrink_order(r); } cur_val=q; } @ @<Compute result of |multiply| or |divide|...@>= {@+scan_int(); if (p < glue_val) if (q==multiply) if (p==int_val) cur_val=mult_integers(eqtb[l].i, cur_val); else cur_val=nx_plus_y(eqtb[l].i, cur_val, 0); else cur_val=x_over_n(eqtb[l].i, cur_val); else{@+s=equiv(l);r=new_spec(s); if (q==multiply) {@+width(r)=nx_plus_y(width(s), cur_val, 0); stretch(r)=nx_plus_y(stretch(s), cur_val, 0); shrink(r)=nx_plus_y(shrink(s), cur_val, 0); } else{@+width(r)=x_over_n(width(s), cur_val); stretch(r)=x_over_n(stretch(s), cur_val); shrink(r)=x_over_n(shrink(s), cur_val); } cur_val=r; } } @ The processing of boxes is somewhat different, because we may need to scan and create an entire box before we actually change the value of the old one. @<Assignments@>= case set_box: {@+scan_eight_bit_int(); if (global) n=256+cur_val;@+else n=cur_val; scan_optional_equals(); if (set_box_allowed) scan_box(box_flag+n); else{@+print_err("Improper ");print_esc(@[@<|"setbox"|@>@]); @.Improper \\setbox@> help2("Sorry, \\setbox is not allowed after \\halign in a display,")@/ ("or between \\accent and an accented character.");error(); } } @+break; @ The |space_factor| or |prev_depth| settings are changed when a |set_aux| command is sensed. Similarly, |prev_graf| is changed in the presence of |set_prev_graf|, and |dead_cycles| or |insert_penalties| in the presence of |set_page_int|. These definitions are always global. When some dimension of a box register is changed, the change isn't exactly global; but \TeX\ does not look at the \.{\\global} switch. @<Assignments@>= case set_aux: alter_aux();@+break; case set_prev_graf: alter_prev_graf();@+break; case set_page_dimen: alter_page_so_far();@+break; case set_page_int: alter_integer();@+break; case set_box_dimen: alter_box_dimen();@+break; @ @<Declare subprocedures for |prefixed_command|@>= void alter_aux(void) {@+halfword c; /*|hmode| or |vmode|*/ if (cur_chr!=abs(mode)) report_illegal_case(); else{@+c=cur_chr;scan_optional_equals(); if (c==vmode) {@+scan_normal_dimen;prev_depth=cur_val; } else{@+scan_int(); if ((cur_val <= 0)||(cur_val > 32767)) {@+print_err("Bad space factor"); @.Bad space factor@> help1("I allow only values in the range 1..32767 here."); int_error(cur_val); } else space_factor=cur_val; } } } @ @<Declare subprocedures for |prefixed_command|@>= void alter_prev_graf(void) {@+uint8_t p; /*index into |nest|*/ nest[nest_ptr]=cur_list;p=nest_ptr; while (abs(nest[p].mode_field)!=vmode) decr(p); scan_optional_equals();scan_int(); if (cur_val < 0) {@+print_err("Bad ");print_esc(@[@<|"prevgraf"|@>@]); @.Bad \\prevgraf@> help1("I allow only nonnegative values here."); int_error(cur_val); } else{@+nest[p].pg_field=cur_val;cur_list=nest[nest_ptr]; } } @ @<Declare subprocedures for |prefixed_command|@>= void alter_page_so_far(void) {@+uint8_t c; /*index into |page_so_far|*/ c=cur_chr;scan_optional_equals();scan_normal_dimen; page_so_far[c]=cur_val; } @ @<Declare subprocedures for |prefixed_command|@>= void alter_integer(void) {@+uint8_t c; /*0 for \.{\\deadcycles}, 1 for \.{\\insertpenalties}*/ c=cur_chr;scan_optional_equals();scan_int(); if (c==0) dead_cycles=cur_val; else insert_penalties=cur_val; } @ @<Declare subprocedures for |prefixed_command|@>= void alter_box_dimen(void) {@+small_number c; /*|width_offset| or |height_offset| or |depth_offset|*/ eight_bits @!b; /*box number*/ c=cur_chr;scan_eight_bit_int();b=cur_val;scan_optional_equals(); scan_normal_dimen; if (box(b)!=null) mem[box(b)+c].sc=cur_val; } @ Paragraph shapes are set up in the obvious way. @<Assignments@>= case set_shape: {@+scan_optional_equals();scan_int();n=cur_val; if (n <= 0) p=null; else{@+p=get_node(2*n+1);info(p)=n; for (j=1; j<=n; j++) {@+scan_normal_dimen; mem[p+2*j-1].sc=cur_val; /*indentation*/ scan_normal_dimen; mem[p+2*j].sc=cur_val; /*width*/ } } define(par_shape_loc, shape_ref, p); } @+break; @ Here's something that isn't quite so obvious. It guarantees that |info(par_shape_ptr)| can hold any positive~|n| for which |get_node(2*n+1)| doesn't overflow the memory capacity. @<Check the ``constant''...@>= if (2*max_halfword < mem_top-mem_min) bad=41; @ New hyphenation data is loaded by the |hyph_data| command. @<Put each...@>= primitive(@[@<|"hyphenation"|@>@], hyph_data, 0); @!@:hyphenation_}{\.{\\hyphenation} primitive@> primitive(@[@<|"patterns"|@>@], hyph_data, 1); @!@:patterns_}{\.{\\patterns} primitive@> @ @<Cases of |print_cmd_chr|...@>= case hyph_data: if (chr_code==1) print_esc(@[@<|"patterns"|@>@]); else print_esc(@[@<|"hyphenation"|@>@]);@+break; @ @<Assignments@>= case hyph_data: if (cur_chr==1) { #ifdef @!INIT new_patterns();goto done;@; #endif print_err("Patterns can be loaded only by INITEX"); @.Patterns can be...@> help0;error(); @/do@+{get_token();}@+ while (!(cur_cmd==right_brace)); /*flush the patterns*/ return; } else{@+new_hyph_exceptions();goto done; } @+break; @ All of \TeX's parameters are kept in |eqtb| except the font information, the interaction mode, and the hyphenation tables; these are strictly global. @<Assignments@>= case assign_font_dimen: {@+find_font_dimen(true);k=cur_val; scan_optional_equals();scan_normal_dimen;font_info[k].sc=cur_val; } @+break; case assign_font_int: {@+n=cur_chr;scan_font_ident();f=cur_val; scan_optional_equals();scan_int(); if (n==0) hyphen_char[f]=cur_val;@+else skew_char[f]=cur_val; } @+break; @ @<Put each...@>= primitive(@[@<|"hyphenchar"|@>@], assign_font_int, 0); @!@:hyphen_char_}{\.{\\hyphenchar} primitive@> primitive(@[@<|"skewchar"|@>@], assign_font_int, 1); @!@:skew_char_}{\.{\\skewchar} primitive@> @ @<Cases of |print_cmd_chr|...@>= case assign_font_int: if (chr_code==0) print_esc(@[@<|"hyphenchar"|@>@]); else print_esc(@[@<|"skewchar"|@>@]);@+break; @ Here is where the information for a new font gets loaded. @<Assignments@>= case def_font: new_font(a);@+break; @ @<Declare subprocedures for |prefixed_command|@>= void new_font(small_number @!a) {@+ pointer u; /*user's font identifier*/ scaled @!s; /*stated ``at'' size, or negative of scaled magnification*/ int @!f; /*runs through existing fonts*/ str_number @!t; /*name for the frozen font identifier*/ uint8_t @!old_setting; /*holds |selector| setting*/ str_number @!flushable_string; /*string not yet referenced*/ if (job_name==0) open_log_file(); /*avoid confusing \.{texput} with the font name*/ @.texput@> get_r_token();u=cur_cs; if (u >= hash_base) t=text(u); else if (u >= single_base) if (u==null_cs) t=@[@<|"FONT"|@>@];@+else t=u-single_base; else{@+old_setting=selector;selector=new_string; print_str("FONT");print(u-active_base);selector=old_setting; @.FONTx@> str_room(1);t=make_string(); } define(u, set_font, null_font);scan_optional_equals();scan_file_name(); @<Scan the font size specification@>; @<If this font has already been loaded, set |f| to the internal font number and |goto common_ending|@>; f=read_font_info(u, cur_name, cur_area, s); common_ending: equiv(u)=f;eqtb[font_id_base+f]=eqtb[u];font_id_text(f)=t; } @ @<Scan the font size specification@>= name_in_progress=true; /*this keeps |cur_name| from being changed*/ if (scan_keyword(@[@<|"at"|@>@])) @<Put the \(p)(positive) `at' size into |s|@>@; @.at@> else if (scan_keyword(@[@<|"scaled"|@>@])) @.scaled@> {@+scan_int();s=-cur_val; if ((cur_val <= 0)||(cur_val > 32768)) {@+print_err("Illegal magnification has been changed to 1000");@/ @.Illegal magnification...@> help1("The magnification ratio must be between 1 and 32768."); int_error(cur_val);s=-1000; } } else s=-1000; name_in_progress=false @ @<Put the \(p)(positive) `at' size into |s|@>= {@+scan_normal_dimen;s=cur_val; if ((s <= 0)||(s >= 01000000000)) {@+print_err("Improper `at' size ("); print_scaled(s);print_str("pt), replaced by 10pt"); @.Improper `at' size...@> help2("I can only handle fonts at positive sizes that are")@/ ("less than 2048pt, so I've changed what you said to 10pt."); error();s=10*unity; } } @ When the user gives a new identifier to a font that was previously loaded, the new name becomes the font identifier of record. Font names `\.{xyz}' and `\.{XYZ}' are considered to be different. @<If this font has already been loaded...@>= flushable_string=str_ptr-1; for (f=font_base+1; f<=font_ptr; f++) if (str_eq_str(font_name[f], cur_name)&&str_eq_str(font_area[f], cur_area)) {@+if (cur_name==flushable_string) {@+flush_string;cur_name=font_name[f]; } if (s > 0) {@+if (s==font_size[f]) goto common_ending; } else if (font_size[f]==xn_over_d(font_dsize[f],-s, 1000)) goto common_ending; } @ @<Cases of |print_cmd_chr|...@>= case set_font: {@+print_str("select font ");slow_print(font_name[chr_code]); if (font_size[chr_code]!=font_dsize[chr_code]) {@+print_str(" at ");print_scaled(font_size[chr_code]); print_str("pt"); } } @+break; @ @<Put each...@>= primitive(@[@<|"batchmode"|@>@], set_interaction, batch_mode); @!@:batch_mode_}{\.{\\batchmode} primitive@> primitive(@[@<|"nonstopmode"|@>@], set_interaction, nonstop_mode); @!@:nonstop_mode_}{\.{\\nonstopmode} primitive@> primitive(@[@<|"scrollmode"|@>@], set_interaction, scroll_mode); @!@:scroll_mode_}{\.{\\scrollmode} primitive@> primitive(@[@<|"errorstopmode"|@>@], set_interaction, error_stop_mode); @!@:error_stop_mode_}{\.{\\errorstopmode} primitive@> @ @<Cases of |print_cmd_chr|...@>= case set_interaction: switch (chr_code) { case batch_mode: print_esc(@[@<|"batchmode"|@>@]);@+break; case nonstop_mode: print_esc(@[@<|"nonstopmode"|@>@]);@+break; case scroll_mode: print_esc(@[@<|"scrollmode"|@>@]);@+break; default:print_esc(@[@<|"errorstopmode"|@>@]); } @+break; @ @<Assignments@>= case set_interaction: new_interaction();@+break; @ @<Declare subprocedures for |prefixed_command|@>= void new_interaction(void) {@+print_ln(); interaction=cur_chr; @<Initialize the print |selector| based on |interaction|@>; if (log_opened) selector=selector+2; } @ The \.{\\afterassignment} command puts a token into the global variable |after_token|. This global variable is examined just after every assignment has been performed. @<Glob...@>= halfword @!after_token; /*zero, or a saved token*/ @ @<Set init...@>= after_token=0; @ @<Cases of |main_control| that don't...@>= any_mode(after_assignment): {@+get_token();after_token=cur_tok; } @+break; @ @<Insert a token saved by \.{\\afterassignment}, if any@>= if (after_token!=0) {@+cur_tok=after_token;back_input();after_token=0; } @ Here is a procedure that might be called `Get the next non-blank non-relax non-call non-assignment token'. @<Declare act...@>= void do_assignments(void) {@+ loop{@+@<Get the next non-blank non-relax...@>; if (cur_cmd <= max_non_prefixed_command) return; set_box_allowed=false;prefixed_command();set_box_allowed=true; } } @ @<Cases of |main_control| that don't...@>= any_mode(after_group): {@+get_token();save_for_after(cur_tok); } @+break; @ Files for \.{\\read} are opened and closed by the |in_stream| command. @<Put each...@>= primitive(@[@<|"openin"|@>@], in_stream, 1); @!@:open_in_}{\.{\\openin} primitive@> primitive(@[@<|"closein"|@>@], in_stream, 0); @!@:close_in_}{\.{\\closein} primitive@> @ @<Cases of |print_cmd_chr|...@>= case in_stream: if (chr_code==0) print_esc(@[@<|"closein"|@>@]); else print_esc(@[@<|"openin"|@>@]);@+break; @ @<Cases of |main_control| that don't...@>= any_mode(in_stream): open_or_close_in();@+break; @ @<Declare act...@>= void open_or_close_in(void) {@+uint8_t c; /*1 for \.{\\openin}, 0 for \.{\\closein}*/ uint8_t @!n; /*stream number*/ c=cur_chr;scan_four_bit_int();n=cur_val; if (read_open[n]!=closed) {@+a_close(&read_file[n]);read_open[n]=closed; } if (c!=0) {@+scan_optional_equals();scan_file_name(); if (cur_ext==empty_string) cur_ext=@[@<|".tex"|@>@]; pack_cur_name; if (a_open_in(&read_file[n])) read_open[n]=just_open; } } @ The user can issue messages to the terminal, regardless of the current mode. @<Cases of |main_control| that don't...@>= any_mode(message): issue_message();@+break; @ @<Put each...@>= primitive(@[@<|"message"|@>@], message, 0); @!@:message_}{\.{\\message} primitive@> primitive(@[@<|"errmessage"|@>@], message, 1); @!@:err_message_}{\.{\\errmessage} primitive@> @ @<Cases of |print_cmd_chr|...@>= case message: if (chr_code==0) print_esc(@[@<|"message"|@>@]); else print_esc(@[@<|"errmessage"|@>@]);@+break; @ @<Declare act...@>= void issue_message(void) {@+uint8_t old_setting; /*holds |selector| setting*/ uint8_t @!c; /*identifies \.{\\message} and \.{\\errmessage}*/ str_number @!s; /*the message*/ c=cur_chr;link(garbage)=scan_toks(false, true); old_setting=selector;selector=new_string; token_show(def_ref);selector=old_setting; flush_list(def_ref); str_room(1);s=make_string(); if (c==0) @<Print string |s| on the terminal@>@; else@<Print string |s| as an error message@>; flush_string; } @ @<Print string |s| on the terminal@>= {@+if (term_offset+length(s) > max_print_line-2) print_ln(); else if ((term_offset > 0)||(file_offset > 0)) print_char(' '); slow_print(s);update_terminal; } @ If \.{\\errmessage} occurs often in |scroll_mode|, without user-defined \.{\\errhelp}, we don't want to give a long help message each time. So we give a verbose explanation only once. @<Glob...@>= bool @!long_help_seen; /*has the long \.{\\errmessage} help been used?*/ @ @<Set init...@>=long_help_seen=false; @ @<Print string |s| as an error message@>= {@+print_err("");slow_print(s); if (err_help!=null) use_err_help=true; else if (long_help_seen) help1("(That was another \\errmessage.)")@; else{@+if (interaction < error_stop_mode) long_help_seen=true; help4("This error message was generated by an \\errmessage")@/ ("command, so I can't give any explicit help.")@/ ("Pretend that you're Hercule Poirot: Examine all clues,")@/ @^Poirot, Hercule@> ("and deduce the truth by order and method."); } error();use_err_help=false; } @ The |error| routine calls on |give_err_help| if help is requested from the |err_help| parameter. @p void give_err_help(void) {@+token_show(err_help); } @ The \.{\\uppercase} and \.{\\lowercase} commands are implemented by building a token list and then changing the cases of the letters in it. @<Cases of |main_control| that don't...@>= any_mode(case_shift): shift_case();@+break; @ @<Put each...@>= primitive(@[@<|"lowercase"|@>@], case_shift, lc_code_base); @!@:lowercase_}{\.{\\lowercase} primitive@> primitive(@[@<|"uppercase"|@>@], case_shift, uc_code_base); @!@:uppercase_}{\.{\\uppercase} primitive@> @ @<Cases of |print_cmd_chr|...@>= case case_shift: if (chr_code==lc_code_base) print_esc(@[@<|"lowercase"|@>@]); else print_esc(@[@<|"uppercase"|@>@]);@+break; @ @<Declare act...@>= void shift_case(void) {@+pointer b; /*|lc_code_base| or |uc_code_base|*/ pointer @!p; /*runs through the token list*/ halfword @!t; /*token*/ eight_bits @!c; /*character code*/ b=cur_chr;p=scan_toks(false, false);p=link(def_ref); while (p!=null) {@+@<Change the case of the token in |p|, if a change is appropriate@>; p=link(p); } back_list(link(def_ref));free_avail(def_ref); /*omit reference count*/ } @ When the case of a |chr_code| changes, we don't change the |cmd|. We also change active characters, using the fact that |cs_token_flag+active_base| is a multiple of~256. @^data structure assumptions@> @<Change the case of the token in |p|, if a change is appropriate@>= t=info(p); if (t < cs_token_flag+single_base) {@+c=t%256; if (equiv(b+c)!=0) info(p)=t-c+equiv(b+c); } @ We come finally to the last pieces missing from |main_control|, namely the `\.{\\show}' commands that are useful when debugging. @<Cases of |main_control| that don't...@>= any_mode(xray): show_whatever();@+break; @ @d show_code 0 /* \.{\\show} */ @d show_box_code 1 /* \.{\\showbox} */ @d show_the_code 2 /* \.{\\showthe} */ @d show_lists 3 /* \.{\\showlists} */ @<Put each...@>= primitive(@[@<|"show"|@>@], xray, show_code); @!@:show_}{\.{\\show} primitive@> primitive(@[@<|"showbox"|@>@], xray, show_box_code); @!@:show_box_}{\.{\\showbox} primitive@> primitive(@[@<|"showthe"|@>@], xray, show_the_code); @!@:show_the_}{\.{\\showthe} primitive@> primitive(@[@<|"showlists"|@>@], xray, show_lists); @!@:show_lists_}{\.{\\showlists} primitive@> @ @<Cases of |print_cmd_chr|...@>= case xray: switch (chr_code) { case show_box_code: print_esc(@[@<|"showbox"|@>@]);@+break; case show_the_code: print_esc(@[@<|"showthe"|@>@]);@+break; case show_lists: print_esc(@[@<|"showlists"|@>@]);@+break; default:print_esc(@[@<|"show"|@>@]); } @+break; @ @<Declare act...@>= void show_whatever(void) {@+ pointer p; /*tail of a token list to show*/ switch (cur_chr) { case show_lists: {@+begin_diagnostic();show_activities(); } @+break; case show_box_code: @<Show the current contents of a box@>@;@+break; case show_code: @<Show the current meaning of a token, then |goto common_ending|@>@; default:@<Show the current value of some parameter or register, then |goto common_ending|@>@; } @/ @<Complete a potentially long \.{\\show} command@>; common_ending: if (interaction < error_stop_mode) {@+help0;decr(error_count); } else if (tracing_online > 0) {@+@t@>@;@/ help3("This isn't an error message; I'm just \\showing something.")@/ ("Type `I\\show...' to show more (e.g., \\show\\cs,")@/ ("\\showthe\\count10, \\showbox255, \\showlists)."); } else{@+@t@>@;@/ help5("This isn't an error message; I'm just \\showing something.")@/ ("Type `I\\show...' to show more (e.g., \\show\\cs,")@/ ("\\showthe\\count10, \\showbox255, \\showlists).")@/ ("And type `I\\tracingonline=1\\show...' to show boxes and")@/ ("lists on your terminal as well as in the transcript file."); } error(); } @ @<Show the current meaning of a token...@>= {@+get_token(); if (interaction==error_stop_mode) wake_up_terminal; print_nl("> "); if (cur_cs!=0) {@+sprint_cs(cur_cs);print_char('='); } print_meaning();goto common_ending; } @ @<Cases of |print_cmd_chr|...@>= case undefined_cs: print_str("undefined");@+break; case call: print_str("macro");@+break; case long_call: print_esc(@[@<|"long macro"|@>@]);@+break; case outer_call: print_esc(@[@<|"outer macro"|@>@]);@+break; case long_outer_call: {@+print_esc(@[@<|"long"|@>@]);print_esc(@[@<|"outer macro"|@>@]); } @+break; case end_template: print_esc(@[@<|"outer endtemplate"|@>@]);@+break; @ @<Show the current contents of a box@>= {@+scan_eight_bit_int();begin_diagnostic(); print_nl("> \\box");print_int(cur_val);print_char('='); if (box(cur_val)==null) print_str("void"); else show_box(box(cur_val)); } @ @<Show the current value of some parameter...@>= {@+p=the_toks(); if (interaction==error_stop_mode) wake_up_terminal; print_nl("> ");token_show(temp_head); flush_list(link(temp_head));goto common_ending; } @ @<Complete a potentially long \.{\\show} command@>= end_diagnostic(true);print_err("OK"); @.OK@> if (selector==term_and_log) if (tracing_online <= 0) {@+selector=term_only;print_str(" (see the transcript file)"); selector=term_and_log; } @* Dumping and undumping the tables. After \.{INITEX} has seen a collection of fonts and macros, it can write all the necessary information on an auxiliary file so that production versions of \TeX\ are able to initialize their memory at high speed. The present section of the program takes care of such output and input. We shall consider simultaneously the processes of storing and restoring, so that the inverse relation between them is clear. @.INITEX@> The global variable |format_ident| is a string that is printed right after the |banner| line when \TeX\ is ready to start. For \.{INITEX} this string says simply `\.{(INITEX)}'; for other versions of \TeX\ it says, for example, `\.{(preloaded format=plain 1982.11.19)}', showing the year, month, and day that the format file was created. We have |format_ident==0| before \TeX's tables are loaded. @<Glob...@>= str_number @!format_ident; @ @<Set init...@>= format_ident=0; @ @<Initialize table entries...@>= format_ident=@[@<|" (INITEX)"|@>@]; @ @<Declare act...@>= #ifdef @!INIT void store_fmt_file(void) {@+ int j, @!k, @!l; /*all-purpose indices*/ pointer @!p, @!q; /*all-purpose pointers*/ int @!x; /*something to dump*/ four_quarters @!w; /*four ASCII codes*/ @<If dumping is not allowed, abort@>; @<Create the |format_ident|, open the format file, and inform the user that dumping has begun@>; @<Dump constants for consistency check@>; @<Dump the string pool@>; @<Dump the dynamic memory@>; @<Dump the table of equivalents@>; @<Dump the font information@>; @<Dump the hyphenation tables@>; @<Dump a couple more things and the closing check word@>; @<Close the format file@>; } #endif @ Corresponding to the procedure that dumps a format file, we have a function that reads one in. The function returns |false| if the dumped format is incompatible with the present \TeX\ table sizes, etc. @d too_small(X) {@+wake_up_terminal; wterm_ln("---! Must increase the %s", X); @.Must increase the x@> goto bad_fmt; } @p@t\4@>@<Declare the function called |open_fmt_file|@>@; bool load_fmt_file(void) {@+ int j, @!k; /*all-purpose indices*/ pointer @!p, @!q; /*all-purpose pointers*/ int @!x; /*something undumped*/ four_quarters @!w; /*four ASCII codes*/ @<Undump constants for consistency check@>; @<Undump the string pool@>; @<Undump the dynamic memory@>; @<Undump the table of equivalents@>; @<Undump the font information@>; @<Undump the hyphenation tables@>; @<Undump a couple more things and the closing check word@>; return true; /*it worked!*/ bad_fmt: wake_up_terminal; wterm_ln("(Fatal format file error; I'm stymied)"); @.Fatal format file error@> return false; } @ The user is not allowed to dump a format file unless |save_ptr==0|. This condition implies that |cur_level==level_one|, hence the |xeq_level| array is constant and it need not be dumped. @<If dumping is not allowed, abort@>= if (save_ptr!=0) {@+print_err("You can't dump inside a group"); @.You can't dump...@> help1("`{...\\dump}' is a no-no.");succumb; } @ Format files consist of |memory_word| items, and we use the following macros to dump words of different types: @d dump_wd(X) {@+fmt_file.d=X;put(fmt_file);@+} @d dump_int(X) {@+fmt_file.d.i=X;put(fmt_file);@+} @d dump_hh(X) {@+fmt_file.d.hh=X;put(fmt_file);@+} @d dump_qqqq(X) {@+fmt_file.d.qqqq=X;put(fmt_file);@+} @<Glob...@>= word_file @!fmt_file; /*for input or output of format information*/ @ The inverse macros are slightly more complicated, since we need to check the range of the values we are reading in. We say `|undump(a)(b)(x)|' to read an integer value |x| that is supposed to be in the range |a <= x <= b|. @d undump_wd(X) {@+get(fmt_file);X=fmt_file.d;@+} @d undump_int(X) {@+get(fmt_file);X=fmt_file.d.i;@+} @d undump_hh(X) {@+get(fmt_file);X=fmt_file.d.hh;@+} @d undump_qqqq(X) {@+get(fmt_file);X=fmt_file.d.qqqq;@+} @d undump_end_end(X) X=x;@+} @d undump_end(X) (x > X)) goto bad_fmt;@+else undump_end_end @d undump(X) {@+undump_int(x);if ((x < X)||undump_end @d undump_size_end_end(X) too_small(X)@;@+else undump_end_end @d undump_size_end(X) if (x > X) undump_size_end_end @d undump_size(X) {@+undump_int(x); if (x < X) goto bad_fmt;undump_size_end @ The next few sections of the program should make it clear how we use the dump/undump macros. @<Dump constants for consistency check@>= dump_int(0);@/ dump_int(mem_bot);@/ dump_int(mem_top);@/ dump_int(eqtb_size);@/ dump_int(hash_prime);@/ dump_int(hyph_size) @ Sections of a \.{WEB} program that are ``commented out'' still contribute strings to the string pool; therefore \.{INITEX} and \TeX\ will have the same strings. (And it is, of course, a good thing that they do.) @.WEB@> @^string pool@> @<Undump constants for consistency check@>= x=fmt_file.d.i; if (x!=0) goto bad_fmt; /*check that strings are the same*/ undump_int(x); if (x!=mem_bot) goto bad_fmt; undump_int(x); if (x!=mem_top) goto bad_fmt; undump_int(x); if (x!=eqtb_size) goto bad_fmt; undump_int(x); if (x!=hash_prime) goto bad_fmt; undump_int(x); if (x!=hyph_size) goto bad_fmt @ @d dump_four_ASCII w.b0=qi(so(str_pool[k]));w.b1=qi(so(str_pool[k+1])); w.b2=qi(so(str_pool[k+2]));w.b3=qi(so(str_pool[k+3])); dump_qqqq(w) @<Dump the string pool@>= dump_int(pool_ptr); dump_int(str_ptr); for (k=0; k<=str_ptr; k++) dump_int(str_start[k]); k=0; while (k+4 < pool_ptr) {@+dump_four_ASCII;k=k+4; } k=pool_ptr-4;dump_four_ASCII; print_ln();print_int(str_ptr);print_str(" strings of total length "); print_int(pool_ptr) @ @d undump_four_ASCII undump_qqqq(w); str_pool[k]=si(qo(w.b0));str_pool[k+1]=si(qo(w.b1)); str_pool[k+2]=si(qo(w.b2));str_pool[k+3]=si(qo(w.b3)) @<Undump the string pool@>= undump_size(0)(pool_size)("string pool size")(pool_ptr); undump_size(0)(max_strings)("max strings")(str_ptr); for (k=0; k<=str_ptr; k++) undump(0)(pool_ptr)(str_start[k]); k=0; while (k+4 < pool_ptr) {@+undump_four_ASCII;k=k+4; } k=pool_ptr-4;undump_four_ASCII; init_str_ptr=str_ptr;init_pool_ptr=pool_ptr @ By sorting the list of available spaces in the variable-size portion of |mem|, we are usually able to get by without having to dump very much of the dynamic memory. We recompute |var_used| and |dyn_used|, so that \.{INITEX} dumps valid information even when it has not been gathering statistics. @<Dump the dynamic memory@>= sort_avail();var_used=0; dump_int(lo_mem_max);dump_int(rover); p=mem_bot;q=rover;x=0; @/do@+{for (k=p; k<=q+1; k++) dump_wd(mem[k]); x=x+q+2-p;var_used=var_used+q-p; p=q+node_size(q);q=rlink(q); }@+ while (!(q==rover)); var_used=var_used+lo_mem_max-p;dyn_used=mem_end+1-hi_mem_min;@/ for (k=p; k<=lo_mem_max; k++) dump_wd(mem[k]); x=x+lo_mem_max+1-p; dump_int(hi_mem_min);dump_int(avail); for (k=hi_mem_min; k<=mem_end; k++) dump_wd(mem[k]); x=x+mem_end+1-hi_mem_min; p=avail; while (p!=null) {@+decr(dyn_used);p=link(p); } dump_int(var_used);dump_int(dyn_used); print_ln();print_int(x); print_str(" memory locations dumped; current usage is "); print_int(var_used);print_char('&');print_int(dyn_used) @ @<Undump the dynamic memory@>= undump(lo_mem_stat_max+1000)(hi_mem_stat_min-1)(lo_mem_max); undump(lo_mem_stat_max+1)(lo_mem_max)(rover); p=mem_bot;q=rover; @/do@+{for (k=p; k<=q+1; k++) undump_wd(mem[k]); p=q+node_size(q); if ((p > lo_mem_max)||((q >= rlink(q))&&(rlink(q)!=rover))) goto bad_fmt; q=rlink(q); }@+ while (!(q==rover)); for (k=p; k<=lo_mem_max; k++) undump_wd(mem[k]); if (mem_min < mem_bot-2) /*make more low memory available*/ {@+p=llink(rover);q=mem_min+1; link(mem_min)=null;info(mem_min)=null; /*we don't use the bottom word*/ rlink(p)=q;llink(rover)=q;@/ rlink(q)=rover;llink(q)=p;link(q)=empty_flag; node_size(q)=mem_bot-q; } undump(lo_mem_max+1)(hi_mem_stat_min)(hi_mem_min); undump(null)(mem_top)(avail);mem_end=mem_top; for (k=hi_mem_min; k<=mem_end; k++) undump_wd(mem[k]); undump_int(var_used);undump_int(dyn_used) @ @<Dump the table of equivalents@>= @<Dump regions 1 to 4 of |eqtb|@>; @<Dump regions 5 and 6 of |eqtb|@>; dump_int(par_loc);dump_int(write_loc);@/ @<Dump the hash table@>@; @ @<Undump the table of equivalents@>= @<Undump regions 1 to 6 of |eqtb|@>; undump(hash_base)(frozen_control_sequence)(par_loc); par_token=cs_token_flag+par_loc;@/ undump(hash_base)(frozen_control_sequence)(write_loc);@/ @<Undump the hash table@>@; @ The table of equivalents usually contains repeated information, so we dump it in compressed form: The sequence of $n+2$ values $(n,x_1,\ldots,x_n,m)$ in the format file represents $n+m$ consecutive entries of |eqtb|, with |m| extra copies of $x_n$, namely $(x_1,\ldots,x_n,x_n,\ldots,x_n)$. @<Dump regions 1 to 4 of |eqtb|@>= k=active_base; @/do@+{j=k; while (j < int_base-1) {@+if ((equiv(j)==equiv(j+1))&&(eq_type(j)==eq_type(j+1))&&@| (eq_level(j)==eq_level(j+1))) goto found1; incr(j); } l=int_base;goto done1; /*|j==int_base-1|*/ found1: incr(j);l=j; while (j < int_base-1) {@+if ((equiv(j)!=equiv(j+1))||(eq_type(j)!=eq_type(j+1))||@| (eq_level(j)!=eq_level(j+1))) goto done1; incr(j); } done1: dump_int(l-k); while (k < l) {@+dump_wd(eqtb[k]);incr(k); } k=j+1;dump_int(k-l); }@+ while (!(k==int_base)) @ @<Dump regions 5 and 6 of |eqtb|@>= @/do@+{j=k; while (j < eqtb_size) {@+if (eqtb[j].i==eqtb[j+1].i) goto found2; incr(j); } l=eqtb_size+1;goto done2; /*|j==eqtb_size|*/ found2: incr(j);l=j; while (j < eqtb_size) {@+if (eqtb[j].i!=eqtb[j+1].i) goto done2; incr(j); } done2: dump_int(l-k); while (k < l) {@+dump_wd(eqtb[k]);incr(k); } k=j+1;dump_int(k-l); }@+ while (!(k > eqtb_size)) @ @<Undump regions 1 to 6 of |eqtb|@>= k=active_base; @/do@+{undump_int(x); if ((x < 1)||(k+x > eqtb_size+1)) goto bad_fmt; for (j=k; j<=k+x-1; j++) undump_wd(eqtb[j]); k=k+x; undump_int(x); if ((x < 0)||(k+x > eqtb_size+1)) goto bad_fmt; for (j=k; j<=k+x-1; j++) eqtb[j]=eqtb[k-1]; k=k+x; }@+ while (!(k > eqtb_size)) @ A different scheme is used to compress the hash table, since its lower region is usually sparse. When |text(p)!=0| for |p <= hash_used|, we output two words, |p| and |hash[p]|. The hash table is, of course, densely packed for |p >= hash_used|, so the remaining entries are output in a~block. @<Dump the hash table@>= dump_int(hash_used);cs_count=frozen_control_sequence-1-hash_used; for (p=hash_base; p<=hash_used; p++) if (text(p)!=0) {@+dump_int(p);dump_hh(hash[p]);incr(cs_count); } for (p=hash_used+1; p<=undefined_control_sequence-1; p++) dump_hh(hash[p]); dump_int(cs_count);@/ print_ln();print_int(cs_count);print_str(" multiletter control sequences") @ @<Undump the hash table@>= undump(hash_base)(frozen_control_sequence)(hash_used);p=hash_base-1; @/do@+{undump(p+1)(hash_used)(p);undump_hh(hash[p]); }@+ while (!(p==hash_used)); for (p=hash_used+1; p<=undefined_control_sequence-1; p++) undump_hh(hash[p]); undump_int(cs_count) @ @<Dump the font information@>= dump_int(fmem_ptr); for (k=0; k<=fmem_ptr-1; k++) dump_wd(font_info[k]); dump_int(font_ptr); for (k=null_font; k<=font_ptr; k++) @<Dump the array info for internal font number |k|@>; print_ln();print_int(fmem_ptr-7);print_str(" words of font info for "); print_int(font_ptr-font_base);print_str(" preloaded font"); if (font_ptr!=font_base+1) print_char('s') @ @<Undump the font information@>= undump_size(7)(font_mem_size)("font mem size")(fmem_ptr); for (k=0; k<=fmem_ptr-1; k++) undump_wd(font_info[k]); undump_size(font_base)(font_max)("font max")(font_ptr); for (k=null_font; k<=font_ptr; k++) @<Undump the array info for internal font number |k|@>@; @ @<Dump the array info for internal font number |k|@>= {@+dump_qqqq(font_check[k]); dump_int(font_size[k]); dump_int(font_dsize[k]); dump_int(font_params[k]);@/ dump_int(hyphen_char[k]); dump_int(skew_char[k]);@/ dump_int(font_name[k]); dump_int(font_area[k]);@/ dump_int(font_bc[k]); dump_int(font_ec[k]);@/ dump_int(char_base[k]); dump_int(width_base[k]); dump_int(height_base[k]);@/ dump_int(depth_base[k]); dump_int(italic_base[k]); dump_int(lig_kern_base[k]);@/ dump_int(kern_base[k]); dump_int(exten_base[k]); dump_int(param_base[k]);@/ dump_int(font_glue[k]);@/ dump_int(bchar_label[k]); dump_int(font_bchar[k]); dump_int(font_false_bchar[k]);@/ print_nl("\\font");print_esc(font_id_text(k));print_char('='); print_file_name(font_name[k], font_area[k], empty_string); if (font_size[k]!=font_dsize[k]) {@+print_str(" at ");print_scaled(font_size[k]);print_str("pt"); } } @ @<Undump the array info for internal font number |k|@>= {@+undump_qqqq(font_check[k]);@/ undump_int(font_size[k]); undump_int(font_dsize[k]); undump(min_halfword)(max_halfword)(font_params[k]);@/ undump_int(hyphen_char[k]); undump_int(skew_char[k]);@/ undump(0)(str_ptr)(font_name[k]); undump(0)(str_ptr)(font_area[k]);@/ undump(0)(255)(font_bc[k]); undump(0)(255)(font_ec[k]);@/ undump_int(char_base[k]); undump_int(width_base[k]); undump_int(height_base[k]);@/ undump_int(depth_base[k]); undump_int(italic_base[k]); undump_int(lig_kern_base[k]);@/ undump_int(kern_base[k]); undump_int(exten_base[k]); undump_int(param_base[k]);@/ undump(min_halfword)(lo_mem_max)(font_glue[k]);@/ undump(0)(fmem_ptr-1)(bchar_label[k]); undump(min_quarterword)(non_char)(font_bchar[k]); undump(min_quarterword)(non_char)(font_false_bchar[k]); } @ @<Dump the hyphenation tables@>= dump_int(hyph_count); for (k=0; k<=hyph_size; k++) if (hyph_word[k]!=0) {@+dump_int(k);dump_int(hyph_word[k]);dump_int(hyph_list[k]); } print_ln();print_int(hyph_count);print_str(" hyphenation exception"); if (hyph_count!=1) print_char('s'); if (trie_not_ready) init_trie(); dump_int(trie_max); for (k=0; k<=trie_max; k++) dump_hh(trie[k]); dump_int(trie_op_ptr); for (k=1; k<=trie_op_ptr; k++) {@+dump_int(hyf_distance[k]); dump_int(hyf_num[k]); dump_int(hyf_next[k]); } print_nl("Hyphenation trie of length ");print_int(trie_max); @.Hyphenation trie...@> print_str(" has ");print_int(trie_op_ptr);print_str(" op"); if (trie_op_ptr!=1) print_char('s'); print_str(" out of ");print_int(trie_op_size); for (k=255; k>=0; k--) if (trie_used[k] > min_quarterword) {@+print_nl(" ");print_int(qo(trie_used[k])); print_str(" for language ");print_int(k); dump_int(k);dump_int(qo(trie_used[k])); } @ Only ``nonempty'' parts of |op_start| need to be restored. @<Undump the hyphenation tables@>= undump(0)(hyph_size)(hyph_count); for (k=1; k<=hyph_count; k++) {@+undump(0)(hyph_size)(j); undump(0)(str_ptr)(hyph_word[j]); undump(min_halfword)(max_halfword)(hyph_list[j]); } undump_size(0)(trie_size)("trie size")(j); #ifdef @!INIT trie_max=j; #endif for (k=0; k<=j; k++) undump_hh(trie[k]); undump_size(0)(trie_op_size)("trie op size")(j); #ifdef @!INIT trie_op_ptr=j; #endif for (k=1; k<=j; k++) {@+undump(0)(63)(hyf_distance[k]); /*a |small_number|*/ undump(0)(63)(hyf_num[k]); undump(min_quarterword)(max_quarterword)(hyf_next[k]); } #ifdef @!INIT for (k=0; k<=255; k++) trie_used[k]=min_quarterword; #endif @;@/ k=256; while (j > 0) {@+undump(0)(k-1)(k);undump(1)(j)(x); #ifdef @!INIT trie_used[k]=qi(x); #endif @;@/ j=j-x;op_start[k]=qo(j); } #ifdef @!INIT trie_not_ready=false #endif @ We have already printed a lot of statistics, so we set |tracing_stats=0| to prevent them from appearing again. @<Dump a couple more things and the closing check word@>= dump_int(interaction);dump_int(format_ident);dump_int(69069); tracing_stats=0 @ @<Undump a couple more things and the closing check word@>= undump(batch_mode)(error_stop_mode)(interaction); undump(0)(str_ptr)(format_ident); undump_int(x); if ((x!=69069)||eof(fmt_file)) goto bad_fmt @ @<Create the |format_ident|...@>= selector=new_string; print_str(" (preloaded format=");print(job_name);print_char(' '); print_int(year);print_char('.'); print_int(month);print_char('.');print_int(day);print_char(')'); if (interaction==batch_mode) selector=log_only; else selector=term_and_log; str_room(1); format_ident=make_string(); pack_job_name(format_extension); while (!w_open_out(&fmt_file)) prompt_file_name("format file name", format_extension); print_nl("Beginning to dump on file "); @.Beginning to dump...@> slow_print(w_make_name_string(&fmt_file));flush_string; print_nl("");slow_print(format_ident) @ @<Close the format file@>= w_close(&fmt_file) @* The main program. This is it: the part of \TeX\ that executes all those procedures we have written. Well---almost. Let's leave space for a few more routines that we may have forgotten. @p@<Last-minute procedures@>@; @ We have noted that there are two versions of \TeX82. One, called \.{INITEX}, @.INITEX@> has to be run first; it initializes everything from scratch, without reading a format file, and it has the capability of dumping a format file. The other one is called `\.{VIRTEX}'; it is a ``virgin'' program that needs @.VIRTEX@> to input a format file in order to get started. \.{VIRTEX} typically has more memory capacity than \.{INITEX}, because it does not need the space consumed by the auxiliary hyphenation tables and the numerous calls on |primitive|, etc. The \.{VIRTEX} program cannot read a format file instantaneously, of course; the best implementations therefore allow for production versions of \TeX\ that not only avoid the loading routine for \PASCAL\ object code, they also have a format file pre-loaded. This is impossible to do if we stick to standard \PASCAL; but there is a simple way to fool many systems into avoiding the initialization, as follows:\quad(1)~We declare a global integer variable called |ready_already|. The probability is negligible that this variable holds any particular value like 314159 when \.{VIRTEX} is first loaded.\quad(2)~After we have read in a format file and initialized everything, we set |ready_already=314159|.\quad(3)~Soon \.{VIRTEX} will print `\.*', waiting for more input; and at this point we interrupt the program and save its core image in some form that the operating system can reload speedily.\quad(4)~When that core image is activated, the program starts again at the beginning; but now |ready_already==314159| and all the other global variables have their initial values too. The former chastity has vanished! In other words, if we allow ourselves to test the condition |ready_already==314159|, before |ready_already| has been assigned a value, we can avoid the lengthy initialization. Dirty tricks rarely pay off so handsomely. @^dirty \PASCAL@> @^system dependencies@> On systems that allow such preloading, the standard program called \.{TeX} should be the one that has \.{plain} format preloaded, since that agrees with {\sl The \TeX book}. Other versions, e.g., \.{AmSTeX}, should also @:TeXbook}{\sl The \TeX book@> @.AmSTeX@> @.plain@> be provided for commonly used formats. @<Glob...@>= int @!ready_already; /*a sacrifice of purity for economy*/ @ Now this is really it: \TeX\ starts and ends here. The initial test involving |ready_already| should be deleted if the \PASCAL\ runtime system is smart enough to detect such a ``mistake.'' @^system dependencies@> @p int main(void) {@! /*|start_here|*/ history=fatal_error_stop; /*in case we quit during initialization*/ t_open_out; /*open the terminal for output*/ if (ready_already==314159) goto start_of_TEX; @<Check the ``constant'' values...@>@; if (bad > 0) {@+wterm_ln("Ouch---my internal constants have been clobbered!" "---case %d", bad); @.Ouch...clobbered@> exit(0); } initialize(); /*set global variables to their starting values*/ #ifdef @!INIT init_prim(); /*call |primitive| for each primitive*/ init_str_ptr=str_ptr;init_pool_ptr=pool_ptr;fix_date_and_time(); #endif ready_already=314159; start_of_TEX: @<Initialize the output routines@>; @<Get the first line of input and prepare to start@>; history=spotless; /*ready to go!*/ main_control(); /*come to life*/ final_cleanup(); /*prepare for death*/ close_files_and_terminate(); ready_already=0; return 0; } @ Here we do whatever is needed to complete \TeX's job gracefully on the local operating system. The code here might come into play after a fatal error; it must therefore consist entirely of ``safe'' operations that cannot produce error messages. For example, it would be a mistake to call |str_room| or |make_string| at this time, because a call on |overflow| might lead to an infinite loop. @^system dependencies@> Actually there's one way to get error messages, via |prepare_mag|; but that can't cause infinite recursion. @^recursion@> This program doesn't bother to close the input files that may still be open. @<Last-minute...@>= void close_files_and_terminate(void) {@+int k; /*all-purpose index*/ @<Finish the extensions@>; #ifdef @!STAT if (tracing_stats > 0) @<Output statistics about this job@>;@; #endif wake_up_terminal;@<Finish the \.{DVI} file@>; if (log_opened) {@+wlog_cr;a_close(&log_file);selector=selector-2; if (selector==term_only) {@+print_nl("Transcript written on "); @.Transcript written...@> slow_print(log_name);print_char('.'); } } } @ The present section goes directly to the log file instead of using |print| commands, because there's no need for these strings to take up |str_pool| memory when a non-{\bf stat} version of \TeX\ is being used. @<Output statistics...@>= if (log_opened) {@+wlog_ln(" "); wlog_ln("Here is how much of TeX's memory you used:"); @.Here is how much...@> wlog(" %d string", str_ptr-init_str_ptr); if (str_ptr!=init_str_ptr+1) wlog( "s" ); wlog_ln( " out of %d", max_strings-init_str_ptr);@/ wlog_ln( " %d string characters out of %d", pool_ptr-init_pool_ptr, pool_size-init_pool_ptr);@/ wlog_ln(" %d words of memory out of %d", lo_mem_max-mem_min+mem_end-hi_mem_min+2,@| mem_end+1-mem_min);@/ wlog_ln(" %d multiletter control sequences out of %d", cs_count, hash_size);@/ wlog(" %d words of font info for %d font", fmem_ptr, font_ptr-font_base); if (font_ptr!=font_base+1) wlog("s"); wlog_ln( ", out of %d for %d", font_mem_size, font_max-font_base);@/ wlog(" %d hyphenation exception", hyph_count); if (hyph_count!=1) wlog("s"); wlog_ln( " out of %d", hyph_size);@/ wlog_ln(" %di,%dn,%dp,%db,%ds stack positions out of %di,%dn,%dp,%db,%ds", max_in_stack, max_nest_stack,@| max_param_stack,@| max_buf_stack+1,@| max_save_stack+6,@| stack_size, nest_size, param_size, buf_size, save_size ); } @ We get to the |final_cleanup| routine when \.{\\end} or \.{\\dump} has been scanned and |its_all_over|\kern-2pt. @<Last-minute...@>= void final_cleanup(void) {@+ small_number c; /*0 for \.{\\end}, 1 for \.{\\dump}*/ c=cur_chr; if (job_name==0) open_log_file(); while (input_ptr > 0) if (state==token_list) end_token_list();@+else end_file_reading(); while (open_parens > 0) {@+print_str(" )");decr(open_parens); } if (cur_level > level_one) {@+print_nl("(");print_esc(@[@<|"end occurred "|@>@]); print_str("inside a group at level "); @:end_}{\.{(\\end occurred...)}@> print_int(cur_level-level_one);print_char(')'); } while (cond_ptr!=null) {@+print_nl("(");print_esc(@[@<|"end occurred "|@>@]); print_str("when ");print_cmd_chr(if_test, cur_if); if (if_line!=0) {@+print_str(" on line ");print_int(if_line); } print_str(" was incomplete)"); if_line=if_line_field(cond_ptr); cur_if=subtype(cond_ptr);temp_ptr=cond_ptr; cond_ptr=link(cond_ptr);free_node(temp_ptr, if_node_size); } if (history!=spotless) if (((history==warning_issued)||(interaction < error_stop_mode))) if (selector==term_and_log) {@+selector=term_only; print_nl("(see the transcript file for additional information)"); @.see the transcript file...@> selector=term_and_log; } if (c==1) { #ifdef @!INIT for (c=top_mark_code; c<=split_bot_mark_code; c++) if (cur_mark[c]!=null) delete_token_ref(cur_mark[c]); if (last_glue!=max_halfword) delete_glue_ref(last_glue); store_fmt_file();return; #endif print_nl("(\\dump is performed only by INITEX)");return; @:dump_}{\.{\\dump...only by INITEX}@> } } @ @<Last-minute...@>= #ifdef @!INIT void init_prim(void) /*initialize all the primitives*/ {@+no_new_control_sequence=false; @<Put each...@>; no_new_control_sequence=true; } #endif @ When we begin the following code, \TeX's tables may still contain garbage; the strings might not even be present. Thus we must proceed cautiously to get bootstrapped in. But when we finish this part of the program, \TeX\ is ready to call on the |main_control| routine to do its work. @<Get the first line...@>= {@+@<Initialize the input routines@>; if ((format_ident==0)||(buffer[loc]=='&')) {@+if (format_ident!=0) initialize(); /*erase preloaded format*/ if (!open_fmt_file()) exit(0); if (!load_fmt_file()) {@+w_close(&fmt_file);exit(0); } w_close(&fmt_file); while ((loc < limit)&&(buffer[loc]==' ')) incr(loc); } if (end_line_char_inactive) decr(limit); else buffer[limit]=end_line_char; fix_date_and_time();@/ @<Compute the magic offset@>; @<Initialize the print |selector|...@>; if ((loc < limit)&&(cat_code(buffer[loc])!=escape)) start_input(); /*\.{\\input} assumed*/ } @* Debugging. Once \TeX\ is working, you should be able to diagnose most errors with the \.{\\show} commands and other diagnostic features. But for the initial stages of debugging, and for the revelation of really deep mysteries, you can compile \TeX\ with a few more aids, including the \PASCAL\ runtime checks and its debugger. An additional routine called |debug_help| will also come into play when you type `\.D' after an error message; |debug_help| also occurs just before a fatal error causes \TeX\ to succumb. @^debugging@> @^system dependencies@> The interface to |debug_help| is primitive, but it is good enough when used with a \PASCAL\ debugger that allows you to set breakpoints and to read variables and change their values. After getting the prompt `\.{debug \#}', you type either a negative number (this exits |debug_help|), or zero (this goes to a location where you can set a breakpoint, thereby entering into dialog with the \PASCAL\ debugger), or a positive number |m| followed by an argument |n|. The meaning of |m| and |n| will be clear from the program below. (If |m==13|, there is an additional argument, |l|.) @.debug \#@> @<Last-minute...@>= #ifdef @!DEBUG void debug_help(void) /*routine to display various things*/ {@+ int k, @!l, @!m, @!n; loop{@+wake_up_terminal; print_nl("debug # (-1 to exit):");update_terminal; @.debug \#@> if (fscanf(term_in.f," %d",&m)<1 || m < 0) return; else if (m==0) {@+goto breakpoint; /*go to every label at least once*/ breakpoint: m=0;/*'BREAKPOINT'*/ } else{@+fscanf(term_in.f," %d",&n); switch (m) { @t\4@>@<Numbered cases for |debug_help|@>@; default:print_str("?"); } } } } #endif @ @<Numbered cases...@>= case 1: print_word(mem[n]);@+break; /*display |mem[n]| in all forms*/ case 2: print_int(info(n));@+break; case 3: print_int(link(n));@+break; case 4: print_word(eqtb[n]);@+break; case 5: print_word(font_info[n]);@+break; case 6: print_word(save_stack[n]);@+break; case 7: show_box(n);@+break; /*show a box, abbreviated by |show_box_depth| and |show_box_breadth|*/ case 8: {@+breadth_max=10000;depth_threshold=pool_size-pool_ptr-10; show_node_list(n); /*show a box in its entirety*/ } @+break; case 9: show_token_list(n, null, 1000);@+break; case 10: slow_print(n);@+break; case 11: check_mem(n > 0);@+break; /*check wellformedness; print new busy locations if |n > 0|*/ case 12: search_mem(n);@+break; /*look for pointers to |n|*/ case 13: {@+fscanf(term_in.f," %d",&l);print_cmd_chr(n, l); } @+break; case 14: for (k=0; k<=n; k++) print(buffer[k]);@+break; case 15: {@+font_in_short_display=null_font;short_display(n); } @+break; case 16: panicking=!panicking;@+break; @* Extensions. The program above includes a bunch of ``hooks'' that allow further capabilities to be added without upsetting \TeX's basic structure. Most of these hooks are concerned with ``whatsit'' nodes, which are intended to be used for special purposes; whenever a new extension to \TeX\ involves a new kind of whatsit node, a corresponding change needs to be made to the routines below that deal with such nodes, but it will usually be unnecessary to make many changes to the other parts of this program. In order to demonstrate how extensions can be made, we shall treat `\.{\\write}', `\.{\\openout}', `\.{\\closeout}', `\.{\\immediate}', `\.{\\special}', and `\.{\\setlanguage}' as if they were extensions. These commands are actually primitives of \TeX, and they should appear in all implementations of the system; but let's try to imagine that they aren't. Then the program below illustrates how a person could add them. Sometimes, of course, an extension will require changes to \TeX\ itself; no system of hooks could be complete enough for all conceivable extensions. The features associated with `\.{\\write}' are almost all confined to the following paragraphs, but there are small parts of the |print_ln| and |print_char| procedures that were introduced specifically to \.{\\write} characters. Furthermore one of the token lists recognized by the scanner is a |write_text|; and there are a few other miscellaneous places where we have already provided for some aspect of \.{\\write}. The goal of a \TeX\ extender should be to minimize alterations to the standard parts of the program, and to avoid them completely if possible. He or she should also be quite sure that there's no easy way to accomplish the desired goals with the standard features that \TeX\ already has. ``Think thrice before extending,'' because that may save a lot of work, and it will also keep incompatible extensions of \TeX\ from proliferating. @^system dependencies@> @^extensions to \TeX@> @ First let's consider the format of whatsit nodes that are used to represent the data associated with \.{\\write} and its relatives. Recall that a whatsit has |type==whatsit_node|, and the |subtype| is supposed to distinguish different kinds of whatsits. Each node occupies two or more words; the exact number is immaterial, as long as it is readily determined from the |subtype| or other data. We shall introduce five |subtype| values here, corresponding to the control sequences \.{\\openout}, \.{\\write}, \.{\\closeout}, \.{\\special}, and \.{\\setlanguage}. The second word of I/O whatsits has a |write_stream| field that identifies the write-stream number (0 to 15, or 16 for out-of-range and positive, or 17 for out-of-range and negative). In the case of \.{\\write} and \.{\\special}, there is also a field that points to the reference count of a token list that should be sent. In the case of \.{\\openout}, we need three words and three auxiliary subfields to hold the string numbers for name, area, and extension. @d write_node_size 2 /*number of words in a write/whatsit node*/ @d open_node_size 3 /*number of words in an open/whatsit node*/ @d open_node 0 /*|subtype| in whatsits that represent files to \.{\\openout}*/ @d write_node 1 /*|subtype| in whatsits that represent things to \.{\\write}*/ @d close_node 2 /*|subtype| in whatsits that represent streams to \.{\\closeout}*/ @d special_node 3 /*|subtype| in whatsits that represent \.{\\special} things*/ @d language_node 4 /*|subtype| in whatsits that change the current language*/ @d what_lang(X) link(X+1) /*language number, in the range |0 dotdot 255|*/ @d what_lhm(X) type(X+1) /*minimum left fragment, in the range |1 dotdot 63|*/ @d what_rhm(X) subtype(X+1) /*minimum right fragment, in the range |1 dotdot 63|*/ @d write_tokens(X) link(X+1) /*reference count of token list to write*/ @d write_stream(X) info(X+1) /*stream number (0 to 17)*/ @d open_name(X) link(X+1) /*string number of file name to open*/ @d open_area(X) info(X+2) /*string number of file area for |open_name|*/ @d open_ext(X) link(X+2) /*string number of file extension for |open_name|*/ @ The sixteen possible \.{\\write} streams are represented by the |write_file| array. The |j|th file is open if and only if |write_open[j]==true|. The last two streams are special; |write_open[16]| represents a stream number greater than 15, while |write_open[17]| represents a negative stream number, and both of these variables are always |false|. @<Glob...@>= alpha_file @!write_file[16]; bool @!write_open[18]; @ @<Set init...@>= for (k=0; k<=17; k++) write_open[k]=false; @ Extensions might introduce new command codes; but it's best to use |extension| with a modifier, whenever possible, so that |main_control| stays the same. @d immediate_code 4 /*command modifier for \.{\\immediate}*/ @d set_language_code 5 /*command modifier for \.{\\setlanguage}*/ @<Put each...@>= primitive(@[@<|"openout"|@>@], extension, open_node);@/ @!@:open_out_}{\.{\\openout} primitive@> primitive(@[@<|"write"|@>@], extension, write_node);write_loc=cur_val;@/ @!@:write_}{\.{\\write} primitive@> primitive(@[@<|"closeout"|@>@], extension, close_node);@/ @!@:close_out_}{\.{\\closeout} primitive@> primitive(@[@<|"special"|@>@], extension, special_node);@/ @!@:special_}{\.{\\special} primitive@> primitive(@[@<|"immediate"|@>@], extension, immediate_code);@/ @!@:immediate_}{\.{\\immediate} primitive@> primitive(@[@<|"setlanguage"|@>@], extension, set_language_code);@/ @!@:set_language_}{\.{\\setlanguage} primitive@> @ The variable |write_loc| just introduced is used to provide an appropriate error message in case of ``runaway'' write texts. @<Glob...@>= pointer @!write_loc; /*|eqtb| address of \.{\\write}*/ @ @<Cases of |print_cmd_chr|...@>= case extension: switch (chr_code) { case open_node: print_esc(@[@<|"openout"|@>@]);@+break; case write_node: print_esc(@[@<|"write"|@>@]);@+break; case close_node: print_esc(@[@<|"closeout"|@>@]);@+break; case special_node: print_esc(@[@<|"special"|@>@]);@+break; case immediate_code: print_esc(@[@<|"immediate"|@>@]);@+break; case set_language_code: print_esc(@[@<|"setlanguage"|@>@]);@+break; default:print_str("[unknown extension!]"); } @+break; @ When an |extension| command occurs in |main_control|, in any mode, the |do_extension| routine is called. @<Cases of |main_control| that are for extensions...@>= any_mode(extension): do_extension(); @ @<Declare act...@>= @t\4@>@<Declare procedures needed in |do_extension|@>@; void do_extension(void) {@+int @!k; /*all-purpose integer*/ pointer @!p; /*all-purpose pointer*/ switch (cur_chr) { case open_node: @<Implement \.{\\openout}@>@;@+break; case write_node: @<Implement \.{\\write}@>@;@+break; case close_node: @<Implement \.{\\closeout}@>@;@+break; case special_node: @<Implement \.{\\special}@>@;@+break; case immediate_code: @<Implement \.{\\immediate}@>@;@+break; case set_language_code: @<Implement \.{\\setlanguage}@>@;@+break; default:confusion(@[@<|"ext1"|@>@]); @:this can't happen ext1}{\quad ext1@> } } @ Here is a subroutine that creates a whatsit node having a given |subtype| and a given number of words. It initializes only the first word of the whatsit, and appends it to the current list. @<Declare procedures needed in |do_extension|@>= void new_whatsit(small_number @!s, small_number @!w) {@+pointer p; /*the new node*/ p=get_node(w);type(p)=whatsit_node;subtype(p)=s; link(tail)=p;tail=p; } @ The next subroutine uses |cur_chr| to decide what sort of whatsit is involved, and also inserts a |write_stream| number. @<Declare procedures needed in |do_ext...@>= void new_write_whatsit(small_number @!w) {@+new_whatsit(cur_chr, w); if (w!=write_node_size) scan_four_bit_int(); else{@+scan_int(); if (cur_val < 0) cur_val=17; else if (cur_val > 15) cur_val=16; } write_stream(tail)=cur_val; } @ @<Implement \.{\\openout}@>= {@+new_write_whatsit(open_node_size); scan_optional_equals();scan_file_name();@/ open_name(tail)=cur_name;open_area(tail)=cur_area;open_ext(tail)=cur_ext; } @ When `\.{\\write 12\{...\}}' appears, we scan the token list `\.{\{...\}}' without expanding its macros; the macros will be expanded later when this token list is rescanned. @<Implement \.{\\write}@>= {@+k=cur_cs;new_write_whatsit(write_node_size);@/ cur_cs=k;p=scan_toks(false, false);write_tokens(tail)=def_ref; } @ @<Implement \.{\\closeout}@>= {@+new_write_whatsit(write_node_size);write_tokens(tail)=null; } @ When `\.{\\special\{...\}}' appears, we expand the macros in the token list as in \.{\\xdef} and \.{\\mark}. @<Implement \.{\\special}@>= {@+new_whatsit(special_node, write_node_size);write_stream(tail)=null; p=scan_toks(false, true);write_tokens(tail)=def_ref; } @ Each new type of node that appears in our data structure must be capable of being displayed, copied, destroyed, and so on. The routines that we need for write-oriented whatsits are somewhat like those for mark nodes; other extensions might, of course, involve more subtlety here. @<Basic printing...@>= void print_write_whatsit(str_number @!s, pointer @!p) {@+print_esc(s); if (write_stream(p) < 16) print_int(write_stream(p)); else if (write_stream(p)==16) print_char('*'); @.*\relax@> else print_char('-'); } @ @<Display the whatsit...@>= switch (subtype(p)) { case open_node: {@+print_write_whatsit(@[@<|"openout"|@>@], p); print_char('=');print_file_name(open_name(p), open_area(p), open_ext(p)); } @+break; case write_node: {@+print_write_whatsit(@[@<|"write"|@>@], p); print_mark(write_tokens(p)); } @+break; case close_node: print_write_whatsit(@[@<|"closeout"|@>@], p);@+break; case special_node: {@+print_esc(@[@<|"special"|@>@]); print_mark(write_tokens(p)); } @+break; case language_node: {@+print_esc(@[@<|"setlanguage"|@>@]); print_int(what_lang(p));print_str(" (hyphenmin "); print_int(what_lhm(p));print_char(','); print_int(what_rhm(p));print_char(')'); } @+break; default:print_str("whatsit?"); } @ @<Make a partial copy of the whatsit...@>= switch (subtype(p)) { case open_node: {@+r=get_node(open_node_size);words=open_node_size; } @+break; case write_node: case special_node: {@+r=get_node(write_node_size); add_token_ref(write_tokens(p));words=write_node_size; } @+break; case close_node: case language_node: {@+r=get_node(small_node_size); words=small_node_size; } @+break; default:confusion(@[@<|"ext2"|@>@]); @:this can't happen ext2}{\quad ext2@> } @ @<Wipe out the whatsit...@>= {@+switch (subtype(p)) { case open_node: free_node(p, open_node_size);@+break; case write_node: case special_node: {@+delete_token_ref(write_tokens(p)); free_node(p, write_node_size);goto done; } case close_node: case language_node: free_node(p, small_node_size);@+break; default:confusion(@[@<|"ext3"|@>@]); @:this can't happen ext3}{\quad ext3@> } @/ goto done; } @ @<Incorporate a whatsit node into a vbox@>=do_nothing @ @<Incorporate a whatsit node into an hbox@>=do_nothing @ @<Let |d| be the width of the whatsit |p|@>=d=0 @ @d adv_past(X) @+if (subtype(X)==language_node) {@+cur_lang=what_lang(X);l_hyf=what_lhm(X);r_hyf=what_rhm(X);@+} @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>=@+ adv_past(cur_p) @ @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>=@+ adv_past(s) @ @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>= goto contribute @ @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>= goto not_found @ @<Output the whatsit node |p| in a vlist@>= out_what(p) @ @<Output the whatsit node |p| in an hlist@>= out_what(p) @ After all this preliminary shuffling, we come finally to the routines that actually send out the requested data. Let's do \.{\\special} first (it's easier). @<Declare procedures needed in |hlist_out|, |vlist_out|@>= void special_out(pointer @!p) {@+uint8_t old_setting; /*holds print |selector|*/ int @!k; /*index into |str_pool|*/ synch_h;synch_v;@/ old_setting=selector;selector=new_string; show_token_list(link(write_tokens(p)), null, pool_size-pool_ptr); selector=old_setting; str_room(1); if (cur_length < 256) {@+dvi_out(xxx1);dvi_out(cur_length); } else{@+dvi_out(xxx4);dvi_four(cur_length); } for (k=str_start[str_ptr]; k<=pool_ptr-1; k++) dvi_out(so(str_pool[k])); pool_ptr=str_start[str_ptr]; /*erase the string*/ } @ To write a token list, we must run it through \TeX's scanner, expanding macros and \.{\\the} and \.{\\number}, etc. This might cause runaways, if a delimited macro parameter isn't matched, and runaways would be extremely confusing since we are calling on \TeX's scanner in the middle of a \.{\\shipout} command. Therefore we will put a dummy control sequence as a ``stopper,'' right after the token list. This control sequence is artificially defined to be \.{\\outer}. @:end_write_}{\.{\\endwrite}@> @<Initialize table...@>= text(end_write)=@[@<|"endwrite"|@>@];eq_level(end_write)=level_one; eq_type(end_write)=outer_call;equiv(end_write)=null; @ @<Declare procedures needed in |hlist_out|, |vlist_out|@>= void write_out(pointer @!p) {@+uint8_t old_setting; /*holds print |selector|*/ int @!old_mode; /*saved |mode|*/ small_number @!j; /*write stream number*/ pointer @!q, @!r; /*temporary variables for list manipulation*/ @<Expand macros in the token list and make |link(def_ref)| point to the result@>; old_setting=selector;j=write_stream(p); if (write_open[j]) selector=j; else{@+ /*write to the terminal if file isn't open*/ if ((j==17)&&(selector==term_and_log)) selector=log_only; print_nl(""); } token_show(def_ref);print_ln(); flush_list(def_ref);selector=old_setting; } @ The final line of this routine is slightly subtle; at least, the author didn't think about it until getting burnt! There is a used-up token list @^Knuth, Donald Ervin@> on the stack, namely the one that contained |end_write_token|. (We insert this artificial `\.{\\endwrite}' to prevent runaways, as explained above.) If it were not removed, and if there were numerous writes on a single page, the stack would overflow. @d end_write_token cs_token_flag+end_write @<Expand macros in the token list and...@>= q=get_avail();info(q)=right_brace_token+'}';@/ r=get_avail();link(q)=r;info(r)=end_write_token;ins_list(q);@/ begin_token_list(write_tokens(p), write_text);@/ q=get_avail();info(q)=left_brace_token+'{';ins_list(q); /*now we're ready to scan `\.\{$\langle\,$token list$\,\rangle$\.{\} \\endwrite}'*/ old_mode=mode;mode=0; /*disable \.{\\prevdepth}, \.{\\spacefactor}, \.{\\lastskip}, \.{\\prevgraf}*/ cur_cs=write_loc;q=scan_toks(false, true); /*expand macros, etc.*/ get_token();@+if (cur_tok!=end_write_token) @<Recover from an unbalanced write command@>; mode=old_mode; end_token_list() /*conserve stack space*/ @ @<Recover from an unbalanced write command@>= {@+print_err("Unbalanced write command"); @.Unbalanced write...@> help2("On this page there's a \\write with fewer real {'s than }'s.")@/ ("I can't handle that very well; good luck.");error(); @/do@+{get_token(); }@+ while (!(cur_tok==end_write_token)); } @ The |out_what| procedure takes care of outputting whatsit nodes for |vlist_out| and |hlist_out|\kern-.3pt. @<Declare procedures needed in |hlist_out|, |vlist_out|@>= void out_what(pointer @!p) {@+small_number j; /*write stream number*/ switch (subtype(p)) { case open_node: case write_node: case close_node: @<Do some work that has been queued up for \.{\\write}@>@;@+break; case special_node: special_out(p);@+break; case language_node: do_nothing;@+break; default:confusion(@[@<|"ext4"|@>@]); @:this can't happen ext4}{\quad ext4@> } } @ We don't implement \.{\\write} inside of leaders. (The reason is that the number of times a leader box appears might be different in different implementations, due to machine-dependent rounding in the glue calculations.) @^leaders@> @<Do some work that has been queued up...@>= if (!doing_leaders) {@+j=write_stream(p); if (subtype(p)==write_node) write_out(p); else{@+if (write_open[j]) a_close(&write_file[j]); if (subtype(p)==close_node) write_open[j]=false; else if (j < 16) {@+cur_name=open_name(p);cur_area=open_area(p); cur_ext=open_ext(p); if (cur_ext==empty_string) cur_ext=@[@<|".tex"|@>@]; pack_cur_name; while (!a_open_out(&write_file[j])) prompt_file_name("output file name",@[@<|".tex"|@>@]); write_open[j]=true; } } } @ The presence of `\.{\\immediate}' causes the |do_extension| procedure to descend to one level of recursion. Nothing happens unless \.{\\immediate} is followed by `\.{\\openout}', `\.{\\write}', or `\.{\\closeout}'. @^recursion@> @<Implement \.{\\immediate}@>= {@+get_x_token(); if ((cur_cmd==extension)&&(cur_chr <= close_node)) {@+p=tail;do_extension(); /*append a whatsit node*/ out_what(tail); /*do the action immediately*/ flush_node_list(tail);tail=p;link(p)=null; } else back_input(); } @ The \.{\\language} extension is somewhat different. We need a subroutine that comes into play when a character of a non-|clang| language is being appended to the current paragraph. @<Declare action...@>= void fix_language(void) {@+ASCII_code @!l; /*the new current language*/ if (language <= 0) l=0; else if (language > 255) l=0; else l=language; if (l!=clang) {@+new_whatsit(language_node, small_node_size); what_lang(tail)=l;clang=l;@/ what_lhm(tail)=norm_min(left_hyphen_min); what_rhm(tail)=norm_min(right_hyphen_min); } } @ @<Implement \.{\\setlanguage}@>= if (abs(mode)!=hmode) report_illegal_case(); else{@+new_whatsit(language_node, small_node_size); scan_int(); if (cur_val <= 0) clang=0; else if (cur_val > 255) clang=0; else clang=cur_val; what_lang(tail)=clang; what_lhm(tail)=norm_min(left_hyphen_min); what_rhm(tail)=norm_min(right_hyphen_min); } @ @<Finish the extensions@>= for (k=0; k<=15; k++) if (write_open[k]) a_close(&write_file[k]) @* System-dependent changes. This section should be replaced, if necessary, by any special modifications of the program that are necessary to make \TeX\ work at a particular installation. It is usually best to design your change file so that all changes to previous sections preserve the section numbering; then everybody's version will be consistent with the published program. More extensive changes, which introduce new sections, can be inserted here; then only the index itself will get a new section number. @^system dependencies@> @ Appendix: Replacement of the string pool file. @d str_0_255 "^^@@^^A^^B^^C^^D^^E^^F^^G^^H^^I^^J^^K^^L^^M^^N^^O"@/ "^^P^^Q^^R^^S^^T^^U^^V^^W^^X^^Y^^Z^^[^^\\^^]^^^^^_"@/ " !\"#$%&'()*+,-./"@/ "0123456789:;<=>?"@/ "@@ABCDEFGHIJKLMNO"@/ "PQRSTUVWXYZ[\\]^_"@/ "`abcdefghijklmno"@/ "pqrstuvwxyz{|}~^^?"@/ "^^80^^81^^82^^83^^84^^85^^86^^87^^88^^89^^8a^^8b^^8c^^8d^^8e^^8f"@/ "^^90^^91^^92^^93^^94^^95^^96^^97^^98^^99^^9a^^9b^^9c^^9d^^9e^^9f"@/ "^^a0^^a1^^a2^^a3^^a4^^a5^^a6^^a7^^a8^^a9^^aa^^ab^^ac^^ad^^ae^^af"@/ "^^b0^^b1^^b2^^b3^^b4^^b5^^b6^^b7^^b8^^b9^^ba^^bb^^bc^^bd^^be^^bf"@/ "^^c0^^c1^^c2^^c3^^c4^^c5^^c6^^c7^^c8^^c9^^ca^^cb^^cc^^cd^^ce^^cf"@/ "^^d0^^d1^^d2^^d3^^d4^^d5^^d6^^d7^^d8^^d9^^da^^db^^dc^^dd^^de^^df"@/ "^^e0^^e1^^e2^^e3^^e4^^e5^^e6^^e7^^e8^^e9^^ea^^eb^^ec^^ed^^ee^^ef"@/ "^^f0^^f1^^f2^^f3^^f4^^f5^^f6^^f7^^f8^^f9^^fa^^fb^^fc^^fd^^fe^^ff"@/ @d str_start_0_255 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45,@/ 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,@/ 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,@/ 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127,@/ 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143,@/ 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159,@/ 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175,@/ 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191,@/ 194, 198, 202, 206, 210, 214, 218, 222, 226, 230, 234, 238, 242, 246, 250, 254,@/ 258, 262, 266, 270, 274, 278, 282, 286, 290, 294, 298, 302, 306, 310, 314, 318,@/ 322, 326, 330, 334, 338, 342, 346, 350, 354, 358, 362, 366, 370, 374, 378, 382,@/ 386, 390, 394, 398, 402, 406, 410, 414, 418, 422, 426, 430, 434, 438, 442, 446,@/ 450, 454, 458, 462, 466, 470, 474, 478, 482, 486, 490, 494, 498, 502, 506, 510,@/ 514, 518, 522, 526, 530, 534, 538, 542, 546, 550, 554, 558, 562, 566, 570, 574,@/ 578, 582, 586, 590, 594, 598, 602, 606, 610, 614, 618, 622, 626, 630, 634, 638,@/ 642, 646, 650, 654, 658, 662, 666, 670, 674, 678, 682, 686, 690, 694, 698, 702,@/ @ @d str_256 "???" @<|"???"|@>=@+256 @ @d str_257 "m2d5c2l5x2v5i" @<|"m2d5c2l5x2v5i"|@>=@+257 @ @d str_258 "batchmode" @<|"batchmode"|@>=@+258 @ @d str_259 "nonstopmode" @<|"nonstopmode"|@>=@+259 @ @d str_260 "scrollmode" @<|"scrollmode"|@>=@+260 @ @d str_261 "CLOBBERED." @<|"CLOBBERED."|@>=@+261 @ @d str_262 "fil" @<|"fil"|@>=@+262 @ @d str_263 "unset" @<|"unset"|@>=@+263 @ @d str_264 "rule(" @<|"rule("|@>=@+264 @ @d str_265 "insert" @<|"insert"|@>=@+265 @ @d str_266 "glue" @<|"glue"|@>=@+266 @ @d str_267 "nonscript" @<|"nonscript"|@>=@+267 @ @d str_268 "mskip" @<|"mskip"|@>=@+268 @ @d str_269 "mu" @<|"mu"|@>=@+269 @ @d str_270 "" @d empty_string 270 @ @d str_271 "kern" @<|"kern"|@>=@+271 @ @d str_272 "mkern" @<|"mkern"|@>=@+272 @ @d str_273 "math" @<|"math"|@>=@+273 @ @d str_274 "penalty " @<|"penalty "|@>=@+274 @ @d str_275 "discretionary" @<|"discretionary"|@>=@+275 @ @d str_276 "mark" @<|"mark"|@>=@+276 @ @d str_277 "vadjust" @<|"vadjust"|@>=@+277 @ @d str_278 "flushing" @<|"flushing"|@>=@+278 @ @d str_279 "copying" @<|"copying"|@>=@+279 @ @d str_280 "lineskip" @<|"lineskip"|@>=@+280 @ @d str_281 "baselineskip" @<|"baselineskip"|@>=@+281 @ @d str_282 "parskip" @<|"parskip"|@>=@+282 @ @d str_283 "abovedisplayskip" @<|"abovedisplayskip"|@>=@+283 @ @d str_284 "belowdisplayskip" @<|"belowdisplayskip"|@>=@+284 @ @d str_285 "abovedisplayshortskip" @<|"abovedisplayshortskip"|@>=@+285 @ @d str_286 "belowdisplayshortskip" @<|"belowdisplayshortskip"|@>=@+286 @ @d str_287 "leftskip" @<|"leftskip"|@>=@+287 @ @d str_288 "rightskip" @<|"rightskip"|@>=@+288 @ @d str_289 "topskip" @<|"topskip"|@>=@+289 @ @d str_290 "splittopskip" @<|"splittopskip"|@>=@+290 @ @d str_291 "tabskip" @<|"tabskip"|@>=@+291 @ @d str_292 "spaceskip" @<|"spaceskip"|@>=@+292 @ @d str_293 "xspaceskip" @<|"xspaceskip"|@>=@+293 @ @d str_294 "parfillskip" @<|"parfillskip"|@>=@+294 @ @d str_295 "thinmuskip" @<|"thinmuskip"|@>=@+295 @ @d str_296 "medmuskip" @<|"medmuskip"|@>=@+296 @ @d str_297 "thickmuskip" @<|"thickmuskip"|@>=@+297 @ @d str_298 "skip" @<|"skip"|@>=@+298 @ @d str_299 "muskip" @<|"muskip"|@>=@+299 @ @d str_300 "pt" @<|"pt"|@>=@+300 @ @d str_301 "output" @<|"output"|@>=@+301 @ @d str_302 "everypar" @<|"everypar"|@>=@+302 @ @d str_303 "everymath" @<|"everymath"|@>=@+303 @ @d str_304 "everydisplay" @<|"everydisplay"|@>=@+304 @ @d str_305 "everyhbox" @<|"everyhbox"|@>=@+305 @ @d str_306 "everyvbox" @<|"everyvbox"|@>=@+306 @ @d str_307 "everyjob" @<|"everyjob"|@>=@+307 @ @d str_308 "everycr" @<|"everycr"|@>=@+308 @ @d str_309 "errhelp" @<|"errhelp"|@>=@+309 @ @d str_310 "toks" @<|"toks"|@>=@+310 @ @d str_311 "parshape" @<|"parshape"|@>=@+311 @ @d str_312 "box" @<|"box"|@>=@+312 @ @d str_313 "textfont" @<|"textfont"|@>=@+313 @ @d str_314 "scriptfont" @<|"scriptfont"|@>=@+314 @ @d str_315 "scriptscriptfont" @<|"scriptscriptfont"|@>=@+315 @ @d str_316 "catcode" @<|"catcode"|@>=@+316 @ @d str_317 "lccode" @<|"lccode"|@>=@+317 @ @d str_318 "uccode" @<|"uccode"|@>=@+318 @ @d str_319 "sfcode" @<|"sfcode"|@>=@+319 @ @d str_320 "mathcode" @<|"mathcode"|@>=@+320 @ @d str_321 "pretolerance" @<|"pretolerance"|@>=@+321 @ @d str_322 "tolerance" @<|"tolerance"|@>=@+322 @ @d str_323 "linepenalty" @<|"linepenalty"|@>=@+323 @ @d str_324 "hyphenpenalty" @<|"hyphenpenalty"|@>=@+324 @ @d str_325 "exhyphenpenalty" @<|"exhyphenpenalty"|@>=@+325 @ @d str_326 "clubpenalty" @<|"clubpenalty"|@>=@+326 @ @d str_327 "widowpenalty" @<|"widowpenalty"|@>=@+327 @ @d str_328 "displaywidowpenalty" @<|"displaywidowpenalty"|@>=@+328 @ @d str_329 "brokenpenalty" @<|"brokenpenalty"|@>=@+329 @ @d str_330 "binoppenalty" @<|"binoppenalty"|@>=@+330 @ @d str_331 "relpenalty" @<|"relpenalty"|@>=@+331 @ @d str_332 "predisplaypenalty" @<|"predisplaypenalty"|@>=@+332 @ @d str_333 "postdisplaypenalty" @<|"postdisplaypenalty"|@>=@+333 @ @d str_334 "interlinepenalty" @<|"interlinepenalty"|@>=@+334 @ @d str_335 "doublehyphendemerits" @<|"doublehyphendemerits"|@>=@+335 @ @d str_336 "finalhyphendemerits" @<|"finalhyphendemerits"|@>=@+336 @ @d str_337 "adjdemerits" @<|"adjdemerits"|@>=@+337 @ @d str_338 "mag" @<|"mag"|@>=@+338 @ @d str_339 "delimiterfactor" @<|"delimiterfactor"|@>=@+339 @ @d str_340 "looseness" @<|"looseness"|@>=@+340 @ @d str_341 "time" @<|"time"|@>=@+341 @ @d str_342 "day" @<|"day"|@>=@+342 @ @d str_343 "month" @<|"month"|@>=@+343 @ @d str_344 "year" @<|"year"|@>=@+344 @ @d str_345 "showboxbreadth" @<|"showboxbreadth"|@>=@+345 @ @d str_346 "showboxdepth" @<|"showboxdepth"|@>=@+346 @ @d str_347 "hbadness" @<|"hbadness"|@>=@+347 @ @d str_348 "vbadness" @<|"vbadness"|@>=@+348 @ @d str_349 "pausing" @<|"pausing"|@>=@+349 @ @d str_350 "tracingonline" @<|"tracingonline"|@>=@+350 @ @d str_351 "tracingmacros" @<|"tracingmacros"|@>=@+351 @ @d str_352 "tracingstats" @<|"tracingstats"|@>=@+352 @ @d str_353 "tracingparagraphs" @<|"tracingparagraphs"|@>=@+353 @ @d str_354 "tracingpages" @<|"tracingpages"|@>=@+354 @ @d str_355 "tracingoutput" @<|"tracingoutput"|@>=@+355 @ @d str_356 "tracinglostchars" @<|"tracinglostchars"|@>=@+356 @ @d str_357 "tracingcommands" @<|"tracingcommands"|@>=@+357 @ @d str_358 "tracingrestores" @<|"tracingrestores"|@>=@+358 @ @d str_359 "uchyph" @<|"uchyph"|@>=@+359 @ @d str_360 "outputpenalty" @<|"outputpenalty"|@>=@+360 @ @d str_361 "maxdeadcycles" @<|"maxdeadcycles"|@>=@+361 @ @d str_362 "hangafter" @<|"hangafter"|@>=@+362 @ @d str_363 "floatingpenalty" @<|"floatingpenalty"|@>=@+363 @ @d str_364 "globaldefs" @<|"globaldefs"|@>=@+364 @ @d str_365 "fam" @<|"fam"|@>=@+365 @ @d str_366 "escapechar" @<|"escapechar"|@>=@+366 @ @d str_367 "defaulthyphenchar" @<|"defaulthyphenchar"|@>=@+367 @ @d str_368 "defaultskewchar" @<|"defaultskewchar"|@>=@+368 @ @d str_369 "endlinechar" @<|"endlinechar"|@>=@+369 @ @d str_370 "newlinechar" @<|"newlinechar"|@>=@+370 @ @d str_371 "language" @<|"language"|@>=@+371 @ @d str_372 "lefthyphenmin" @<|"lefthyphenmin"|@>=@+372 @ @d str_373 "righthyphenmin" @<|"righthyphenmin"|@>=@+373 @ @d str_374 "holdinginserts" @<|"holdinginserts"|@>=@+374 @ @d str_375 "errorcontextlines" @<|"errorcontextlines"|@>=@+375 @ @d str_376 "count" @<|"count"|@>=@+376 @ @d str_377 "delcode" @<|"delcode"|@>=@+377 @ @d str_378 "parindent" @<|"parindent"|@>=@+378 @ @d str_379 "mathsurround" @<|"mathsurround"|@>=@+379 @ @d str_380 "lineskiplimit" @<|"lineskiplimit"|@>=@+380 @ @d str_381 "hsize" @<|"hsize"|@>=@+381 @ @d str_382 "vsize" @<|"vsize"|@>=@+382 @ @d str_383 "maxdepth" @<|"maxdepth"|@>=@+383 @ @d str_384 "splitmaxdepth" @<|"splitmaxdepth"|@>=@+384 @ @d str_385 "boxmaxdepth" @<|"boxmaxdepth"|@>=@+385 @ @d str_386 "hfuzz" @<|"hfuzz"|@>=@+386 @ @d str_387 "vfuzz" @<|"vfuzz"|@>=@+387 @ @d str_388 "delimitershortfall" @<|"delimitershortfall"|@>=@+388 @ @d str_389 "nulldelimiterspace" @<|"nulldelimiterspace"|@>=@+389 @ @d str_390 "scriptspace" @<|"scriptspace"|@>=@+390 @ @d str_391 "predisplaysize" @<|"predisplaysize"|@>=@+391 @ @d str_392 "displaywidth" @<|"displaywidth"|@>=@+392 @ @d str_393 "displayindent" @<|"displayindent"|@>=@+393 @ @d str_394 "overfullrule" @<|"overfullrule"|@>=@+394 @ @d str_395 "hangindent" @<|"hangindent"|@>=@+395 @ @d str_396 "hoffset" @<|"hoffset"|@>=@+396 @ @d str_397 "voffset" @<|"voffset"|@>=@+397 @ @d str_398 "emergencystretch" @<|"emergencystretch"|@>=@+398 @ @d str_399 "dimen" @<|"dimen"|@>=@+399 @ @d str_400 "notexpanded:" @<|"notexpanded:"|@>=@+400 @ @d str_401 "csname" @<|"csname"|@>=@+401 @ @d str_402 "endcsname" @<|"endcsname"|@>=@+402 @ @d str_403 "IMPOSSIBLE." @<|"IMPOSSIBLE."|@>=@+403 @ @d str_404 "NONEXISTENT." @<|"NONEXISTENT."|@>=@+404 @ @d str_405 "accent" @<|"accent"|@>=@+405 @ @d str_406 "advance" @<|"advance"|@>=@+406 @ @d str_407 "afterassignment" @<|"afterassignment"|@>=@+407 @ @d str_408 "aftergroup" @<|"aftergroup"|@>=@+408 @ @d str_409 "begingroup" @<|"begingroup"|@>=@+409 @ @d str_410 "char" @<|"char"|@>=@+410 @ @d str_411 "delimiter" @<|"delimiter"|@>=@+411 @ @d str_412 "divide" @<|"divide"|@>=@+412 @ @d str_413 "endgroup" @<|"endgroup"|@>=@+413 @ @d str_414 "expandafter" @<|"expandafter"|@>=@+414 @ @d str_415 "font" @<|"font"|@>=@+415 @ @d str_416 "fontdimen" @<|"fontdimen"|@>=@+416 @ @d str_417 "halign" @<|"halign"|@>=@+417 @ @d str_418 "hrule" @<|"hrule"|@>=@+418 @ @d str_419 "ignorespaces" @<|"ignorespaces"|@>=@+419 @ @d str_420 "mathaccent" @<|"mathaccent"|@>=@+420 @ @d str_421 "mathchar" @<|"mathchar"|@>=@+421 @ @d str_422 "mathchoice" @<|"mathchoice"|@>=@+422 @ @d str_423 "multiply" @<|"multiply"|@>=@+423 @ @d str_424 "noalign" @<|"noalign"|@>=@+424 @ @d str_425 "noboundary" @<|"noboundary"|@>=@+425 @ @d str_426 "noexpand" @<|"noexpand"|@>=@+426 @ @d str_427 "omit" @<|"omit"|@>=@+427 @ @d str_428 "penalty" @<|"penalty"|@>=@+428 @ @d str_429 "prevgraf" @<|"prevgraf"|@>=@+429 @ @d str_430 "radical" @<|"radical"|@>=@+430 @ @d str_431 "read" @<|"read"|@>=@+431 @ @d str_432 "relax" @<|"relax"|@>=@+432 @ @d str_433 "setbox" @<|"setbox"|@>=@+433 @ @d str_434 "the" @<|"the"|@>=@+434 @ @d str_435 "valign" @<|"valign"|@>=@+435 @ @d str_436 "vcenter" @<|"vcenter"|@>=@+436 @ @d str_437 "vrule" @<|"vrule"|@>=@+437 @ @d str_438 "curlevel" @<|"curlevel"|@>=@+438 @ @d str_439 "retaining" @<|"retaining"|@>=@+439 @ @d str_440 "restoring" @<|"restoring"|@>=@+440 @ @d str_441 "ETC." @<|"ETC."|@>=@+441 @ @d str_442 "BAD." @<|"BAD."|@>=@+442 @ @d str_443 "write" @<|"write"|@>=@+443 @ @d str_444 "par" @<|"par"|@>=@+444 @ @d str_445 "input" @<|"input"|@>=@+445 @ @d str_446 "endinput" @<|"endinput"|@>=@+446 @ @d str_447 "topmark" @<|"topmark"|@>=@+447 @ @d str_448 "firstmark" @<|"firstmark"|@>=@+448 @ @d str_449 "botmark" @<|"botmark"|@>=@+449 @ @d str_450 "splitfirstmark" @<|"splitfirstmark"|@>=@+450 @ @d str_451 "splitbotmark" @<|"splitbotmark"|@>=@+451 @ @d str_452 "spacefactor" @<|"spacefactor"|@>=@+452 @ @d str_453 "prevdepth" @<|"prevdepth"|@>=@+453 @ @d str_454 "deadcycles" @<|"deadcycles"|@>=@+454 @ @d str_455 "insertpenalties" @<|"insertpenalties"|@>=@+455 @ @d str_456 "wd" @<|"wd"|@>=@+456 @ @d str_457 "ht" @<|"ht"|@>=@+457 @ @d str_458 "dp" @<|"dp"|@>=@+458 @ @d str_459 "lastpenalty" @<|"lastpenalty"|@>=@+459 @ @d str_460 "lastkern" @<|"lastkern"|@>=@+460 @ @d str_461 "lastskip" @<|"lastskip"|@>=@+461 @ @d str_462 "inputlineno" @<|"inputlineno"|@>=@+462 @ @d str_463 "badness" @<|"badness"|@>=@+463 @ @d str_464 "true" @<|"true"|@>=@+464 @ @d str_465 "em" @<|"em"|@>=@+465 @ @d str_466 "ex" @<|"ex"|@>=@+466 @ @d str_467 "in" @<|"in"|@>=@+467 @ @d str_468 "pc" @<|"pc"|@>=@+468 @ @d str_469 "cm" @<|"cm"|@>=@+469 @ @d str_470 "mm" @<|"mm"|@>=@+470 @ @d str_471 "bp" @<|"bp"|@>=@+471 @ @d str_472 "dd" @<|"dd"|@>=@+472 @ @d str_473 "cc" @<|"cc"|@>=@+473 @ @d str_474 "sp" @<|"sp"|@>=@+474 @ @d str_475 "plus" @<|"plus"|@>=@+475 @ @d str_476 "minus" @<|"minus"|@>=@+476 @ @d str_477 "width" @<|"width"|@>=@+477 @ @d str_478 "height" @<|"height"|@>=@+478 @ @d str_479 "depth" @<|"depth"|@>=@+479 @ @d str_480 "number" @<|"number"|@>=@+480 @ @d str_481 "romannumeral" @<|"romannumeral"|@>=@+481 @ @d str_482 "string" @<|"string"|@>=@+482 @ @d str_483 "meaning" @<|"meaning"|@>=@+483 @ @d str_484 "fontname" @<|"fontname"|@>=@+484 @ @d str_485 "jobname" @<|"jobname"|@>=@+485 @ @d str_486 "if" @<|"if"|@>=@+486 @ @d str_487 "ifcat" @<|"ifcat"|@>=@+487 @ @d str_488 "ifnum" @<|"ifnum"|@>=@+488 @ @d str_489 "ifdim" @<|"ifdim"|@>=@+489 @ @d str_490 "ifodd" @<|"ifodd"|@>=@+490 @ @d str_491 "ifvmode" @<|"ifvmode"|@>=@+491 @ @d str_492 "ifhmode" @<|"ifhmode"|@>=@+492 @ @d str_493 "ifmmode" @<|"ifmmode"|@>=@+493 @ @d str_494 "ifinner" @<|"ifinner"|@>=@+494 @ @d str_495 "ifvoid" @<|"ifvoid"|@>=@+495 @ @d str_496 "ifhbox" @<|"ifhbox"|@>=@+496 @ @d str_497 "ifvbox" @<|"ifvbox"|@>=@+497 @ @d str_498 "ifx" @<|"ifx"|@>=@+498 @ @d str_499 "ifeof" @<|"ifeof"|@>=@+499 @ @d str_500 "iftrue" @<|"iftrue"|@>=@+500 @ @d str_501 "iffalse" @<|"iffalse"|@>=@+501 @ @d str_502 "ifcase" @<|"ifcase"|@>=@+502 @ @d str_503 "fi" @<|"fi"|@>=@+503 @ @d str_504 "or" @<|"or"|@>=@+504 @ @d str_505 "else" @<|"else"|@>=@+505 @ @d str_506 "TeXinputs/" @d TEX_area 506 @ @d str_507 "TeXfonts/" @d TEX_font_area 507 @ @d str_508 ".fmt" @d format_extension 508 @ @d str_509 ".log" @<|".log"|@>=@+509 @ @d str_510 ".dvi" @<|".dvi"|@>=@+510 @ @d str_511 "input file name" @<|"input file name"|@>=@+511 @ @d str_512 ".tex" @<|".tex"|@>=@+512 @ @d str_513 "texput" @<|"texput"|@>=@+513 @ @d str_514 "nullfont" @<|"nullfont"|@>=@+514 @ @d str_515 ".tfm" @<|".tfm"|@>=@+515 @ @d str_516 "vlistout" @<|"vlistout"|@>=@+516 @ @d str_517 "to" @<|"to"|@>=@+517 @ @d str_518 "spread" @<|"spread"|@>=@+518 @ @d str_519 "vpack" @<|"vpack"|@>=@+519 @ @d str_520 "displaystyle" @<|"displaystyle"|@>=@+520 @ @d str_521 "textstyle" @<|"textstyle"|@>=@+521 @ @d str_522 "scriptstyle" @<|"scriptstyle"|@>=@+522 @ @d str_523 "scriptscriptstyle" @<|"scriptscriptstyle"|@>=@+523 @ @d str_524 "mathord" @<|"mathord"|@>=@+524 @ @d str_525 "mathop" @<|"mathop"|@>=@+525 @ @d str_526 "mathbin" @<|"mathbin"|@>=@+526 @ @d str_527 "mathrel" @<|"mathrel"|@>=@+527 @ @d str_528 "mathopen" @<|"mathopen"|@>=@+528 @ @d str_529 "mathclose" @<|"mathclose"|@>=@+529 @ @d str_530 "mathpunct" @<|"mathpunct"|@>=@+530 @ @d str_531 "mathinner" @<|"mathinner"|@>=@+531 @ @d str_532 "overline" @<|"overline"|@>=@+532 @ @d str_533 "underline" @<|"underline"|@>=@+533 @ @d str_534 "left" @<|"left"|@>=@+534 @ @d str_535 "right" @<|"right"|@>=@+535 @ @d str_536 "limits" @<|"limits"|@>=@+536 @ @d str_537 "nolimits" @<|"nolimits"|@>=@+537 @ @d str_538 "fraction, thickness " @<|"fraction, thickness "|@>=@+538 @ @d str_539 "mlist1" @<|"mlist1"|@>=@+539 @ @d str_540 "mlist2" @<|"mlist2"|@>=@+540 @ @d str_541 "mlist3" @<|"mlist3"|@>=@+541 @ @d str_542 "0234000122*4000133**3**344*0400400*000000234000111*1111112341011" @d math_spacing 542 @ @d str_543 "mlist4" @<|"mlist4"|@>=@+543 @ @d str_544 "span" @<|"span"|@>=@+544 @ @d str_545 "cr" @<|"cr"|@>=@+545 @ @d str_546 "crcr" @<|"crcr"|@>=@+546 @ @d str_547 "endtemplate" @<|"endtemplate"|@>=@+547 @ @d str_548 "endv" @<|"endv"|@>=@+548 @ @d str_549 "256 spans" @<|"256 spans"|@>=@+549 @ @d str_550 "align1" @<|"align1"|@>=@+550 @ @d str_551 "align0" @<|"align0"|@>=@+551 @ @d str_552 "disc1" @<|"disc1"|@>=@+552 @ @d str_553 "disc2" @<|"disc2"|@>=@+553 @ @d str_554 "paragraph" @<|"paragraph"|@>=@+554 @ @d str_555 "disc3" @<|"disc3"|@>=@+555 @ @d str_556 "disc4" @<|"disc4"|@>=@+556 @ @d str_557 "line breaking" @<|"line breaking"|@>=@+557 @ @d str_558 "hyphenation" @<|"hyphenation"|@>=@+558 @ @d str_559 "patterns" @<|"patterns"|@>=@+559 @ @d str_560 "pruning" @<|"pruning"|@>=@+560 @ @d str_561 "vertbreak" @<|"vertbreak"|@>=@+561 @ @d str_562 "vsplit" @<|"vsplit"|@>=@+562 @ @d str_563 "vbox" @<|"vbox"|@>=@+563 @ @d str_564 "pagegoal" @<|"pagegoal"|@>=@+564 @ @d str_565 "pagetotal" @<|"pagetotal"|@>=@+565 @ @d str_566 "pagestretch" @<|"pagestretch"|@>=@+566 @ @d str_567 "pagefilstretch" @<|"pagefilstretch"|@>=@+567 @ @d str_568 "pagefillstretch" @<|"pagefillstretch"|@>=@+568 @ @d str_569 "pagefilllstretch" @<|"pagefilllstretch"|@>=@+569 @ @d str_570 "pageshrink" @<|"pageshrink"|@>=@+570 @ @d str_571 "pagedepth" @<|"pagedepth"|@>=@+571 @ @d str_572 "page" @<|"page"|@>=@+572 @ @d str_573 "end" @<|"end"|@>=@+573 @ @d str_574 "dump" @<|"dump"|@>=@+574 @ @d str_575 "hskip" @<|"hskip"|@>=@+575 @ @d str_576 "hfil" @<|"hfil"|@>=@+576 @ @d str_577 "hfill" @<|"hfill"|@>=@+577 @ @d str_578 "hss" @<|"hss"|@>=@+578 @ @d str_579 "hfilneg" @<|"hfilneg"|@>=@+579 @ @d str_580 "vskip" @<|"vskip"|@>=@+580 @ @d str_581 "vfil" @<|"vfil"|@>=@+581 @ @d str_582 "vfill" @<|"vfill"|@>=@+582 @ @d str_583 "vss" @<|"vss"|@>=@+583 @ @d str_584 "vfilneg" @<|"vfilneg"|@>=@+584 @ @d str_585 "right." @<|"right."|@>=@+585 @ @d str_586 "rightbrace" @<|"rightbrace"|@>=@+586 @ @d str_587 "moveleft" @<|"moveleft"|@>=@+587 @ @d str_588 "moveright" @<|"moveright"|@>=@+588 @ @d str_589 "raise" @<|"raise"|@>=@+589 @ @d str_590 "lower" @<|"lower"|@>=@+590 @ @d str_591 "copy" @<|"copy"|@>=@+591 @ @d str_592 "lastbox" @<|"lastbox"|@>=@+592 @ @d str_593 "vtop" @<|"vtop"|@>=@+593 @ @d str_594 "hbox" @<|"hbox"|@>=@+594 @ @d str_595 "shipout" @<|"shipout"|@>=@+595 @ @d str_596 "leaders" @<|"leaders"|@>=@+596 @ @d str_597 "cleaders" @<|"cleaders"|@>=@+597 @ @d str_598 "xleaders" @<|"xleaders"|@>=@+598 @ @d str_599 "indent" @<|"indent"|@>=@+599 @ @d str_600 "noindent" @<|"noindent"|@>=@+600 @ @d str_601 "unpenalty" @<|"unpenalty"|@>=@+601 @ @d str_602 "unkern" @<|"unkern"|@>=@+602 @ @d str_603 "unskip" @<|"unskip"|@>=@+603 @ @d str_604 "unhbox" @<|"unhbox"|@>=@+604 @ @d str_605 "unhcopy" @<|"unhcopy"|@>=@+605 @ @d str_606 "unvbox" @<|"unvbox"|@>=@+606 @ @d str_607 "unvcopy" @<|"unvcopy"|@>=@+607 @ @d str_608 "eqno" @<|"eqno"|@>=@+608 @ @d str_609 "leqno" @<|"leqno"|@>=@+609 @ @d str_610 "displaylimits" @<|"displaylimits"|@>=@+610 @ @d str_611 "above" @<|"above"|@>=@+611 @ @d str_612 "over" @<|"over"|@>=@+612 @ @d str_613 "atop" @<|"atop"|@>=@+613 @ @d str_614 "abovewithdelims" @<|"abovewithdelims"|@>=@+614 @ @d str_615 "overwithdelims" @<|"overwithdelims"|@>=@+615 @ @d str_616 "atopwithdelims" @<|"atopwithdelims"|@>=@+616 @ @d str_617 "display" @<|"display"|@>=@+617 @ @d str_618 "long" @<|"long"|@>=@+618 @ @d str_619 "outer" @<|"outer"|@>=@+619 @ @d str_620 "global" @<|"global"|@>=@+620 @ @d str_621 "def" @<|"def"|@>=@+621 @ @d str_622 "gdef" @<|"gdef"|@>=@+622 @ @d str_623 "edef" @<|"edef"|@>=@+623 @ @d str_624 "xdef" @<|"xdef"|@>=@+624 @ @d str_625 "prefix" @<|"prefix"|@>=@+625 @ @d str_626 "inaccessible" @<|"inaccessible"|@>=@+626 @ @d str_627 "let" @<|"let"|@>=@+627 @ @d str_628 "futurelet" @<|"futurelet"|@>=@+628 @ @d str_629 "chardef" @<|"chardef"|@>=@+629 @ @d str_630 "mathchardef" @<|"mathchardef"|@>=@+630 @ @d str_631 "countdef" @<|"countdef"|@>=@+631 @ @d str_632 "dimendef" @<|"dimendef"|@>=@+632 @ @d str_633 "skipdef" @<|"skipdef"|@>=@+633 @ @d str_634 "muskipdef" @<|"muskipdef"|@>=@+634 @ @d str_635 "toksdef" @<|"toksdef"|@>=@+635 @ @d str_636 "by" @<|"by"|@>=@+636 @ @d str_637 "hyphenchar" @<|"hyphenchar"|@>=@+637 @ @d str_638 "skewchar" @<|"skewchar"|@>=@+638 @ @d str_639 "FONT" @<|"FONT"|@>=@+639 @ @d str_640 "at" @<|"at"|@>=@+640 @ @d str_641 "scaled" @<|"scaled"|@>=@+641 @ @d str_642 "errorstopmode" @<|"errorstopmode"|@>=@+642 @ @d str_643 "openin" @<|"openin"|@>=@+643 @ @d str_644 "closein" @<|"closein"|@>=@+644 @ @d str_645 "message" @<|"message"|@>=@+645 @ @d str_646 "errmessage" @<|"errmessage"|@>=@+646 @ @d str_647 "lowercase" @<|"lowercase"|@>=@+647 @ @d str_648 "uppercase" @<|"uppercase"|@>=@+648 @ @d str_649 "show" @<|"show"|@>=@+649 @ @d str_650 "showbox" @<|"showbox"|@>=@+650 @ @d str_651 "showthe" @<|"showthe"|@>=@+651 @ @d str_652 "showlists" @<|"showlists"|@>=@+652 @ @d str_653 "long macro" @<|"long macro"|@>=@+653 @ @d str_654 "outer macro" @<|"outer macro"|@>=@+654 @ @d str_655 "outer endtemplate" @<|"outer endtemplate"|@>=@+655 @ @d str_656 " (INITEX)" @<|" (INITEX)"|@>=@+656 @ @d str_657 "end occurred " @<|"end occurred "|@>=@+657 @ @d str_658 "openout" @<|"openout"|@>=@+658 @ @d str_659 "closeout" @<|"closeout"|@>=@+659 @ @d str_660 "special" @<|"special"|@>=@+660 @ @d str_661 "immediate" @<|"immediate"|@>=@+661 @ @d str_662 "setlanguage" @<|"setlanguage"|@>=@+662 @ @d str_663 "ext1" @<|"ext1"|@>=@+663 @ @d str_664 "ext2" @<|"ext2"|@>=@+664 @ @d str_665 "ext3" @<|"ext3"|@>=@+665 @ @d str_666 "endwrite" @<|"endwrite"|@>=@+666 @ @d str_667 "ext4" @<|"ext4"|@>=@+667 @ All the above strings together make up the string pool. @<|str_pool| initialization@>= str_0_255 str_256 str_257 str_258 str_259 str_260 str_261 str_262 str_263@/ str_264 str_265 str_266 str_267 str_268 str_269 str_270 str_271@/ str_272 str_273 str_274 str_275 str_276 str_277 str_278 str_279@/ str_280 str_281 str_282 str_283 str_284 str_285 str_286 str_287@/ str_288 str_289 str_290 str_291 str_292 str_293 str_294 str_295@/ str_296 str_297 str_298 str_299 str_300 str_301 str_302 str_303@/ str_304 str_305 str_306 str_307 str_308 str_309 str_310 str_311@/ str_312 str_313 str_314 str_315 str_316 str_317 str_318 str_319@/ str_320 str_321 str_322 str_323 str_324 str_325 str_326 str_327@/ str_328 str_329 str_330 str_331 str_332 str_333 str_334 str_335@/ str_336 str_337 str_338 str_339 str_340 str_341 str_342 str_343@/ str_344 str_345 str_346 str_347 str_348 str_349 str_350 str_351@/ str_352 str_353 str_354 str_355 str_356 str_357 str_358 str_359@/ str_360 str_361 str_362 str_363 str_364 str_365 str_366 str_367@/ str_368 str_369 str_370 str_371 str_372 str_373 str_374 str_375@/ str_376 str_377 str_378 str_379 str_380 str_381 str_382 str_383@/ str_384 str_385 str_386 str_387 str_388 str_389 str_390 str_391@/ str_392 str_393 str_394 str_395 str_396 str_397 str_398 str_399@/ str_400 str_401 str_402 str_403 str_404 str_405 str_406 str_407@/ str_408 str_409 str_410 str_411 str_412 str_413 str_414 str_415@/ str_416 str_417 str_418 str_419 str_420 str_421 str_422 str_423@/ str_424 str_425 str_426 str_427 str_428 str_429 str_430 str_431@/ str_432 str_433 str_434 str_435 str_436 str_437 str_438 str_439@/ str_440 str_441 str_442 str_443 str_444 str_445 str_446 str_447@/ str_448 str_449 str_450 str_451 str_452 str_453 str_454 str_455@/ str_456 str_457 str_458 str_459 str_460 str_461 str_462 str_463@/ str_464 str_465 str_466 str_467 str_468 str_469 str_470 str_471@/ str_472 str_473 str_474 str_475 str_476 str_477 str_478 str_479@/ str_480 str_481 str_482 str_483 str_484 str_485 str_486 str_487@/ str_488 str_489 str_490 str_491 str_492 str_493 str_494 str_495@/ str_496 str_497 str_498 str_499 str_500 str_501 str_502 str_503@/ str_504 str_505 str_506 str_507 str_508 str_509 str_510 str_511@/ str_512 str_513 str_514 str_515 str_516 str_517 str_518 str_519@/ str_520 str_521 str_522 str_523 str_524 str_525 str_526 str_527@/ str_528 str_529 str_530 str_531 str_532 str_533 str_534 str_535@/ str_536 str_537 str_538 str_539 str_540 str_541 str_542 str_543@/ str_544 str_545 str_546 str_547 str_548 str_549 str_550 str_551@/ str_552 str_553 str_554 str_555 str_556 str_557 str_558 str_559@/ str_560 str_561 str_562 str_563 str_564 str_565 str_566 str_567@/ str_568 str_569 str_570 str_571 str_572 str_573 str_574 str_575@/ str_576 str_577 str_578 str_579 str_580 str_581 str_582 str_583@/ str_584 str_585 str_586 str_587 str_588 str_589 str_590 str_591@/ str_592 str_593 str_594 str_595 str_596 str_597 str_598 str_599@/ str_600 str_601 str_602 str_603 str_604 str_605 str_606 str_607@/ str_608 str_609 str_610 str_611 str_612 str_613 str_614 str_615@/ str_616 str_617 str_618 str_619 str_620 str_621 str_622 str_623@/ str_624 str_625 str_626 str_627 str_628 str_629 str_630 str_631@/ str_632 str_633 str_634 str_635 str_636 str_637 str_638 str_639@/ str_640 str_641 str_642 str_643 str_644 str_645 str_646 str_647@/ str_648 str_649 str_650 str_651 str_652 str_653 str_654 str_655@/ str_656 str_657 str_658 str_659 str_660 str_661 str_662 str_663@/ str_664 str_665 str_666 str_667 @ @<|str_start| initialization@>= str_start_0_255 str_start_256, str_start_257, str_start_258, str_start_259, str_start_260, str_start_261, str_start_262, str_start_263, str_start_264, str_start_265, str_start_266, str_start_267, str_start_268, str_start_269, str_start_270, str_start_271, str_start_272, str_start_273, str_start_274, str_start_275, str_start_276, str_start_277, str_start_278, str_start_279, str_start_280, str_start_281, str_start_282, str_start_283, str_start_284, str_start_285, str_start_286, str_start_287, str_start_288, str_start_289, str_start_290, str_start_291, str_start_292, str_start_293, str_start_294, str_start_295, str_start_296, str_start_297, str_start_298, str_start_299, str_start_300, str_start_301, str_start_302, str_start_303, str_start_304, str_start_305, str_start_306, str_start_307, str_start_308, str_start_309, str_start_310, str_start_311, str_start_312, str_start_313, str_start_314, str_start_315, str_start_316, str_start_317, str_start_318, str_start_319, str_start_320, str_start_321, str_start_322, str_start_323, str_start_324, str_start_325, str_start_326, str_start_327, str_start_328, str_start_329, str_start_330, str_start_331, str_start_332, str_start_333, str_start_334, str_start_335, str_start_336, str_start_337, str_start_338, str_start_339, str_start_340, str_start_341, str_start_342, str_start_343, str_start_344, str_start_345, str_start_346, str_start_347, str_start_348, str_start_349, str_start_350, str_start_351, str_start_352, str_start_353, str_start_354, str_start_355, str_start_356, str_start_357, str_start_358, str_start_359, str_start_360, str_start_361, str_start_362, str_start_363, str_start_364, str_start_365, str_start_366, str_start_367, str_start_368, str_start_369, str_start_370, str_start_371, str_start_372, str_start_373, str_start_374, str_start_375, str_start_376, str_start_377, str_start_378, str_start_379, str_start_380, str_start_381, str_start_382, str_start_383, str_start_384, str_start_385, str_start_386, str_start_387, str_start_388, str_start_389, str_start_390, str_start_391, str_start_392, str_start_393, str_start_394, str_start_395, str_start_396, str_start_397, str_start_398, str_start_399, str_start_400, str_start_401, str_start_402, str_start_403, str_start_404, str_start_405, str_start_406, str_start_407, str_start_408, str_start_409, str_start_410, str_start_411, str_start_412, str_start_413, str_start_414, str_start_415, str_start_416, str_start_417, str_start_418, str_start_419, str_start_420, str_start_421, str_start_422, str_start_423, str_start_424, str_start_425, str_start_426, str_start_427, str_start_428, str_start_429, str_start_430, str_start_431, str_start_432, str_start_433, str_start_434, str_start_435, str_start_436, str_start_437, str_start_438, str_start_439, str_start_440, str_start_441, str_start_442, str_start_443, str_start_444, str_start_445, str_start_446, str_start_447, str_start_448, str_start_449, str_start_450, str_start_451, str_start_452, str_start_453, str_start_454, str_start_455, str_start_456, str_start_457, str_start_458, str_start_459, str_start_460, str_start_461, str_start_462, str_start_463, str_start_464, str_start_465, str_start_466, str_start_467, str_start_468, str_start_469, str_start_470, str_start_471, str_start_472, str_start_473, str_start_474, str_start_475, str_start_476, str_start_477, str_start_478, str_start_479, str_start_480, str_start_481, str_start_482, str_start_483, str_start_484, str_start_485, str_start_486, str_start_487, str_start_488, str_start_489, str_start_490, str_start_491, str_start_492, str_start_493, str_start_494, str_start_495, str_start_496, str_start_497, str_start_498, str_start_499, str_start_500, str_start_501, str_start_502, str_start_503, str_start_504, str_start_505, str_start_506, str_start_507, str_start_508, str_start_509, str_start_510, str_start_511, str_start_512, str_start_513, str_start_514, str_start_515, str_start_516, str_start_517, str_start_518, str_start_519, str_start_520, str_start_521, str_start_522, str_start_523, str_start_524, str_start_525, str_start_526, str_start_527, str_start_528, str_start_529, str_start_530, str_start_531, str_start_532, str_start_533, str_start_534, str_start_535, str_start_536, str_start_537, str_start_538, str_start_539, str_start_540, str_start_541, str_start_542, str_start_543, str_start_544, str_start_545, str_start_546, str_start_547, str_start_548, str_start_549, str_start_550, str_start_551, str_start_552, str_start_553, str_start_554, str_start_555, str_start_556, str_start_557, str_start_558, str_start_559, str_start_560, str_start_561, str_start_562, str_start_563, str_start_564, str_start_565, str_start_566, str_start_567, str_start_568, str_start_569, str_start_570, str_start_571, str_start_572, str_start_573, str_start_574, str_start_575, str_start_576, str_start_577, str_start_578, str_start_579, str_start_580, str_start_581, str_start_582, str_start_583, str_start_584, str_start_585, str_start_586, str_start_587, str_start_588, str_start_589, str_start_590, str_start_591, str_start_592, str_start_593, str_start_594, str_start_595, str_start_596, str_start_597, str_start_598, str_start_599, str_start_600, str_start_601, str_start_602, str_start_603, str_start_604, str_start_605, str_start_606, str_start_607, str_start_608, str_start_609, str_start_610, str_start_611, str_start_612, str_start_613, str_start_614, str_start_615, str_start_616, str_start_617, str_start_618, str_start_619, str_start_620, str_start_621, str_start_622, str_start_623, str_start_624, str_start_625, str_start_626, str_start_627, str_start_628, str_start_629, str_start_630, str_start_631, str_start_632, str_start_633, str_start_634, str_start_635, str_start_636, str_start_637, str_start_638, str_start_639, str_start_640, str_start_641, str_start_642, str_start_643, str_start_644, str_start_645, str_start_646, str_start_647, str_start_648, str_start_649, str_start_650, str_start_651, str_start_652, str_start_653, str_start_654, str_start_655, str_start_656, str_start_657, str_start_658, str_start_659, str_start_660, str_start_661, str_start_662, str_start_663, str_start_664, str_start_665, str_start_666, str_start_667, str_start_668 @ We still need to define the start locations of the strings. @<prepare for string pool initialization@>= typedef enum { str_start_256=sizeof(str_0_255)-1, str_start_257=str_start_256+sizeof(str_256)-1,@/ str_start_258=str_start_257+sizeof(str_257)-1,@/ str_start_259=str_start_258+sizeof(str_258)-1,@/ str_start_260=str_start_259+sizeof(str_259)-1,@/ str_start_261=str_start_260+sizeof(str_260)-1,@/ str_start_262=str_start_261+sizeof(str_261)-1,@/ str_start_263=str_start_262+sizeof(str_262)-1,@/ str_start_264=str_start_263+sizeof(str_263)-1,@/ str_start_265=str_start_264+sizeof(str_264)-1,@/ str_start_266=str_start_265+sizeof(str_265)-1,@/ str_start_267=str_start_266+sizeof(str_266)-1,@/ str_start_268=str_start_267+sizeof(str_267)-1,@/ str_start_269=str_start_268+sizeof(str_268)-1,@/ str_start_270=str_start_269+sizeof(str_269)-1,@/ str_start_271=str_start_270+sizeof(str_270)-1,@/ str_start_272=str_start_271+sizeof(str_271)-1,@/ str_start_273=str_start_272+sizeof(str_272)-1,@/ str_start_274=str_start_273+sizeof(str_273)-1,@/ str_start_275=str_start_274+sizeof(str_274)-1,@/ str_start_276=str_start_275+sizeof(str_275)-1,@/ str_start_277=str_start_276+sizeof(str_276)-1,@/ str_start_278=str_start_277+sizeof(str_277)-1,@/ str_start_279=str_start_278+sizeof(str_278)-1,@/ str_start_280=str_start_279+sizeof(str_279)-1,@/ str_start_281=str_start_280+sizeof(str_280)-1,@/ str_start_282=str_start_281+sizeof(str_281)-1,@/ str_start_283=str_start_282+sizeof(str_282)-1,@/ str_start_284=str_start_283+sizeof(str_283)-1,@/ str_start_285=str_start_284+sizeof(str_284)-1,@/ str_start_286=str_start_285+sizeof(str_285)-1,@/ str_start_287=str_start_286+sizeof(str_286)-1,@/ str_start_288=str_start_287+sizeof(str_287)-1,@/ str_start_289=str_start_288+sizeof(str_288)-1,@/ str_start_290=str_start_289+sizeof(str_289)-1,@/ str_start_291=str_start_290+sizeof(str_290)-1,@/ str_start_292=str_start_291+sizeof(str_291)-1,@/ str_start_293=str_start_292+sizeof(str_292)-1,@/ str_start_294=str_start_293+sizeof(str_293)-1,@/ str_start_295=str_start_294+sizeof(str_294)-1,@/ str_start_296=str_start_295+sizeof(str_295)-1,@/ str_start_297=str_start_296+sizeof(str_296)-1,@/ str_start_298=str_start_297+sizeof(str_297)-1,@/ str_start_299=str_start_298+sizeof(str_298)-1,@/ str_start_300=str_start_299+sizeof(str_299)-1,@/ str_start_301=str_start_300+sizeof(str_300)-1,@/ str_start_302=str_start_301+sizeof(str_301)-1,@/ str_start_303=str_start_302+sizeof(str_302)-1,@/ str_start_304=str_start_303+sizeof(str_303)-1,@/ str_start_305=str_start_304+sizeof(str_304)-1,@/ str_start_306=str_start_305+sizeof(str_305)-1,@/ str_start_307=str_start_306+sizeof(str_306)-1,@/ str_start_308=str_start_307+sizeof(str_307)-1,@/ str_start_309=str_start_308+sizeof(str_308)-1,@/ str_start_310=str_start_309+sizeof(str_309)-1,@/ str_start_311=str_start_310+sizeof(str_310)-1,@/ str_start_312=str_start_311+sizeof(str_311)-1,@/ str_start_313=str_start_312+sizeof(str_312)-1,@/ str_start_314=str_start_313+sizeof(str_313)-1,@/ str_start_315=str_start_314+sizeof(str_314)-1,@/ str_start_316=str_start_315+sizeof(str_315)-1,@/ str_start_317=str_start_316+sizeof(str_316)-1,@/ str_start_318=str_start_317+sizeof(str_317)-1,@/ str_start_319=str_start_318+sizeof(str_318)-1,@/ str_start_320=str_start_319+sizeof(str_319)-1,@/ str_start_321=str_start_320+sizeof(str_320)-1,@/ str_start_322=str_start_321+sizeof(str_321)-1,@/ str_start_323=str_start_322+sizeof(str_322)-1,@/ str_start_324=str_start_323+sizeof(str_323)-1,@/ str_start_325=str_start_324+sizeof(str_324)-1,@/ str_start_326=str_start_325+sizeof(str_325)-1,@/ str_start_327=str_start_326+sizeof(str_326)-1,@/ str_start_328=str_start_327+sizeof(str_327)-1,@/ str_start_329=str_start_328+sizeof(str_328)-1,@/ str_start_330=str_start_329+sizeof(str_329)-1,@/ str_start_331=str_start_330+sizeof(str_330)-1,@/ str_start_332=str_start_331+sizeof(str_331)-1,@/ str_start_333=str_start_332+sizeof(str_332)-1,@/ str_start_334=str_start_333+sizeof(str_333)-1,@/ str_start_335=str_start_334+sizeof(str_334)-1,@/ str_start_336=str_start_335+sizeof(str_335)-1,@/ str_start_337=str_start_336+sizeof(str_336)-1,@/ str_start_338=str_start_337+sizeof(str_337)-1,@/ str_start_339=str_start_338+sizeof(str_338)-1,@/ str_start_340=str_start_339+sizeof(str_339)-1,@/ str_start_341=str_start_340+sizeof(str_340)-1,@/ str_start_342=str_start_341+sizeof(str_341)-1,@/ str_start_343=str_start_342+sizeof(str_342)-1,@/ str_start_344=str_start_343+sizeof(str_343)-1,@/ str_start_345=str_start_344+sizeof(str_344)-1,@/ str_start_346=str_start_345+sizeof(str_345)-1,@/ str_start_347=str_start_346+sizeof(str_346)-1,@/ str_start_348=str_start_347+sizeof(str_347)-1,@/ str_start_349=str_start_348+sizeof(str_348)-1,@/ str_start_350=str_start_349+sizeof(str_349)-1,@/ str_start_351=str_start_350+sizeof(str_350)-1,@/ str_start_352=str_start_351+sizeof(str_351)-1,@/ str_start_353=str_start_352+sizeof(str_352)-1,@/ str_start_354=str_start_353+sizeof(str_353)-1,@/ str_start_355=str_start_354+sizeof(str_354)-1,@/ str_start_356=str_start_355+sizeof(str_355)-1,@/ str_start_357=str_start_356+sizeof(str_356)-1,@/ str_start_358=str_start_357+sizeof(str_357)-1,@/ str_start_359=str_start_358+sizeof(str_358)-1,@/ str_start_360=str_start_359+sizeof(str_359)-1,@/ str_start_361=str_start_360+sizeof(str_360)-1,@/ str_start_362=str_start_361+sizeof(str_361)-1,@/ str_start_363=str_start_362+sizeof(str_362)-1,@/ str_start_364=str_start_363+sizeof(str_363)-1,@/ str_start_365=str_start_364+sizeof(str_364)-1,@/ str_start_366=str_start_365+sizeof(str_365)-1,@/ str_start_367=str_start_366+sizeof(str_366)-1,@/ str_start_368=str_start_367+sizeof(str_367)-1,@/ str_start_369=str_start_368+sizeof(str_368)-1,@/ str_start_370=str_start_369+sizeof(str_369)-1,@/ str_start_371=str_start_370+sizeof(str_370)-1,@/ str_start_372=str_start_371+sizeof(str_371)-1,@/ str_start_373=str_start_372+sizeof(str_372)-1,@/ str_start_374=str_start_373+sizeof(str_373)-1,@/ str_start_375=str_start_374+sizeof(str_374)-1,@/ str_start_376=str_start_375+sizeof(str_375)-1,@/ str_start_377=str_start_376+sizeof(str_376)-1,@/ str_start_378=str_start_377+sizeof(str_377)-1,@/ str_start_379=str_start_378+sizeof(str_378)-1,@/ str_start_380=str_start_379+sizeof(str_379)-1,@/ str_start_381=str_start_380+sizeof(str_380)-1,@/ str_start_382=str_start_381+sizeof(str_381)-1,@/ str_start_383=str_start_382+sizeof(str_382)-1,@/ str_start_384=str_start_383+sizeof(str_383)-1,@/ str_start_385=str_start_384+sizeof(str_384)-1,@/ str_start_386=str_start_385+sizeof(str_385)-1,@/ str_start_387=str_start_386+sizeof(str_386)-1,@/ str_start_388=str_start_387+sizeof(str_387)-1,@/ str_start_389=str_start_388+sizeof(str_388)-1,@/ str_start_390=str_start_389+sizeof(str_389)-1,@/ str_start_391=str_start_390+sizeof(str_390)-1,@/ str_start_392=str_start_391+sizeof(str_391)-1,@/ str_start_393=str_start_392+sizeof(str_392)-1,@/ str_start_394=str_start_393+sizeof(str_393)-1,@/ str_start_395=str_start_394+sizeof(str_394)-1,@/ str_start_396=str_start_395+sizeof(str_395)-1,@/ str_start_397=str_start_396+sizeof(str_396)-1,@/ str_start_398=str_start_397+sizeof(str_397)-1,@/ str_start_399=str_start_398+sizeof(str_398)-1,@/ str_start_400=str_start_399+sizeof(str_399)-1,@/ str_start_401=str_start_400+sizeof(str_400)-1,@/ str_start_402=str_start_401+sizeof(str_401)-1,@/ str_start_403=str_start_402+sizeof(str_402)-1,@/ str_start_404=str_start_403+sizeof(str_403)-1,@/ str_start_405=str_start_404+sizeof(str_404)-1,@/ str_start_406=str_start_405+sizeof(str_405)-1,@/ str_start_407=str_start_406+sizeof(str_406)-1,@/ str_start_408=str_start_407+sizeof(str_407)-1,@/ str_start_409=str_start_408+sizeof(str_408)-1,@/ str_start_410=str_start_409+sizeof(str_409)-1,@/ str_start_411=str_start_410+sizeof(str_410)-1,@/ str_start_412=str_start_411+sizeof(str_411)-1,@/ str_start_413=str_start_412+sizeof(str_412)-1,@/ str_start_414=str_start_413+sizeof(str_413)-1,@/ str_start_415=str_start_414+sizeof(str_414)-1,@/ str_start_416=str_start_415+sizeof(str_415)-1,@/ str_start_417=str_start_416+sizeof(str_416)-1,@/ str_start_418=str_start_417+sizeof(str_417)-1,@/ str_start_419=str_start_418+sizeof(str_418)-1,@/ str_start_420=str_start_419+sizeof(str_419)-1,@/ str_start_421=str_start_420+sizeof(str_420)-1,@/ str_start_422=str_start_421+sizeof(str_421)-1,@/ str_start_423=str_start_422+sizeof(str_422)-1,@/ str_start_424=str_start_423+sizeof(str_423)-1,@/ str_start_425=str_start_424+sizeof(str_424)-1,@/ str_start_426=str_start_425+sizeof(str_425)-1,@/ str_start_427=str_start_426+sizeof(str_426)-1,@/ str_start_428=str_start_427+sizeof(str_427)-1,@/ str_start_429=str_start_428+sizeof(str_428)-1,@/ str_start_430=str_start_429+sizeof(str_429)-1,@/ str_start_431=str_start_430+sizeof(str_430)-1,@/ str_start_432=str_start_431+sizeof(str_431)-1,@/ str_start_433=str_start_432+sizeof(str_432)-1,@/ str_start_434=str_start_433+sizeof(str_433)-1,@/ str_start_435=str_start_434+sizeof(str_434)-1,@/ str_start_436=str_start_435+sizeof(str_435)-1,@/ str_start_437=str_start_436+sizeof(str_436)-1,@/ str_start_438=str_start_437+sizeof(str_437)-1,@/ str_start_439=str_start_438+sizeof(str_438)-1,@/ str_start_440=str_start_439+sizeof(str_439)-1,@/ str_start_441=str_start_440+sizeof(str_440)-1,@/ str_start_442=str_start_441+sizeof(str_441)-1,@/ str_start_443=str_start_442+sizeof(str_442)-1,@/ str_start_444=str_start_443+sizeof(str_443)-1,@/ str_start_445=str_start_444+sizeof(str_444)-1,@/ str_start_446=str_start_445+sizeof(str_445)-1,@/ str_start_447=str_start_446+sizeof(str_446)-1,@/ str_start_448=str_start_447+sizeof(str_447)-1,@/ str_start_449=str_start_448+sizeof(str_448)-1,@/ str_start_450=str_start_449+sizeof(str_449)-1,@/ str_start_451=str_start_450+sizeof(str_450)-1,@/ str_start_452=str_start_451+sizeof(str_451)-1,@/ str_start_453=str_start_452+sizeof(str_452)-1,@/ str_start_454=str_start_453+sizeof(str_453)-1,@/ str_start_455=str_start_454+sizeof(str_454)-1,@/ str_start_456=str_start_455+sizeof(str_455)-1,@/ str_start_457=str_start_456+sizeof(str_456)-1,@/ str_start_458=str_start_457+sizeof(str_457)-1,@/ str_start_459=str_start_458+sizeof(str_458)-1,@/ str_start_460=str_start_459+sizeof(str_459)-1,@/ str_start_461=str_start_460+sizeof(str_460)-1,@/ str_start_462=str_start_461+sizeof(str_461)-1,@/ str_start_463=str_start_462+sizeof(str_462)-1,@/ str_start_464=str_start_463+sizeof(str_463)-1,@/ str_start_465=str_start_464+sizeof(str_464)-1,@/ str_start_466=str_start_465+sizeof(str_465)-1,@/ str_start_467=str_start_466+sizeof(str_466)-1,@/ str_start_468=str_start_467+sizeof(str_467)-1,@/ str_start_469=str_start_468+sizeof(str_468)-1,@/ str_start_470=str_start_469+sizeof(str_469)-1,@/ str_start_471=str_start_470+sizeof(str_470)-1,@/ str_start_472=str_start_471+sizeof(str_471)-1,@/ str_start_473=str_start_472+sizeof(str_472)-1,@/ str_start_474=str_start_473+sizeof(str_473)-1,@/ str_start_475=str_start_474+sizeof(str_474)-1,@/ str_start_476=str_start_475+sizeof(str_475)-1,@/ str_start_477=str_start_476+sizeof(str_476)-1,@/ str_start_478=str_start_477+sizeof(str_477)-1,@/ str_start_479=str_start_478+sizeof(str_478)-1,@/ str_start_480=str_start_479+sizeof(str_479)-1,@/ str_start_481=str_start_480+sizeof(str_480)-1,@/ str_start_482=str_start_481+sizeof(str_481)-1,@/ str_start_483=str_start_482+sizeof(str_482)-1,@/ str_start_484=str_start_483+sizeof(str_483)-1,@/ str_start_485=str_start_484+sizeof(str_484)-1,@/ str_start_486=str_start_485+sizeof(str_485)-1,@/ str_start_487=str_start_486+sizeof(str_486)-1,@/ str_start_488=str_start_487+sizeof(str_487)-1,@/ str_start_489=str_start_488+sizeof(str_488)-1,@/ str_start_490=str_start_489+sizeof(str_489)-1,@/ str_start_491=str_start_490+sizeof(str_490)-1,@/ str_start_492=str_start_491+sizeof(str_491)-1,@/ str_start_493=str_start_492+sizeof(str_492)-1,@/ str_start_494=str_start_493+sizeof(str_493)-1,@/ str_start_495=str_start_494+sizeof(str_494)-1,@/ str_start_496=str_start_495+sizeof(str_495)-1,@/ str_start_497=str_start_496+sizeof(str_496)-1,@/ str_start_498=str_start_497+sizeof(str_497)-1,@/ str_start_499=str_start_498+sizeof(str_498)-1,@/ str_start_500=str_start_499+sizeof(str_499)-1,@/ str_start_501=str_start_500+sizeof(str_500)-1,@/ str_start_502=str_start_501+sizeof(str_501)-1,@/ str_start_503=str_start_502+sizeof(str_502)-1,@/ str_start_504=str_start_503+sizeof(str_503)-1,@/ str_start_505=str_start_504+sizeof(str_504)-1,@/ str_start_506=str_start_505+sizeof(str_505)-1,@/ str_start_507=str_start_506+sizeof(str_506)-1,@/ str_start_508=str_start_507+sizeof(str_507)-1,@/ str_start_509=str_start_508+sizeof(str_508)-1,@/ str_start_510=str_start_509+sizeof(str_509)-1,@/ str_start_511=str_start_510+sizeof(str_510)-1,@/ str_start_512=str_start_511+sizeof(str_511)-1,@/ str_start_513=str_start_512+sizeof(str_512)-1,@/ str_start_514=str_start_513+sizeof(str_513)-1,@/ str_start_515=str_start_514+sizeof(str_514)-1,@/ str_start_516=str_start_515+sizeof(str_515)-1,@/ str_start_517=str_start_516+sizeof(str_516)-1,@/ str_start_518=str_start_517+sizeof(str_517)-1,@/ str_start_519=str_start_518+sizeof(str_518)-1,@/ str_start_520=str_start_519+sizeof(str_519)-1,@/ str_start_521=str_start_520+sizeof(str_520)-1,@/ str_start_522=str_start_521+sizeof(str_521)-1,@/ str_start_523=str_start_522+sizeof(str_522)-1,@/ str_start_524=str_start_523+sizeof(str_523)-1,@/ str_start_525=str_start_524+sizeof(str_524)-1,@/ str_start_526=str_start_525+sizeof(str_525)-1,@/ str_start_527=str_start_526+sizeof(str_526)-1,@/ str_start_528=str_start_527+sizeof(str_527)-1,@/ str_start_529=str_start_528+sizeof(str_528)-1,@/ str_start_530=str_start_529+sizeof(str_529)-1,@/ str_start_531=str_start_530+sizeof(str_530)-1,@/ str_start_532=str_start_531+sizeof(str_531)-1,@/ str_start_533=str_start_532+sizeof(str_532)-1,@/ str_start_534=str_start_533+sizeof(str_533)-1,@/ str_start_535=str_start_534+sizeof(str_534)-1,@/ str_start_536=str_start_535+sizeof(str_535)-1,@/ str_start_537=str_start_536+sizeof(str_536)-1,@/ str_start_538=str_start_537+sizeof(str_537)-1,@/ str_start_539=str_start_538+sizeof(str_538)-1,@/ str_start_540=str_start_539+sizeof(str_539)-1,@/ str_start_541=str_start_540+sizeof(str_540)-1,@/ str_start_542=str_start_541+sizeof(str_541)-1,@/ str_start_543=str_start_542+sizeof(str_542)-1,@/ str_start_544=str_start_543+sizeof(str_543)-1,@/ str_start_545=str_start_544+sizeof(str_544)-1,@/ str_start_546=str_start_545+sizeof(str_545)-1,@/ str_start_547=str_start_546+sizeof(str_546)-1,@/ str_start_548=str_start_547+sizeof(str_547)-1,@/ str_start_549=str_start_548+sizeof(str_548)-1,@/ str_start_550=str_start_549+sizeof(str_549)-1,@/ str_start_551=str_start_550+sizeof(str_550)-1,@/ str_start_552=str_start_551+sizeof(str_551)-1,@/ str_start_553=str_start_552+sizeof(str_552)-1,@/ str_start_554=str_start_553+sizeof(str_553)-1,@/ str_start_555=str_start_554+sizeof(str_554)-1,@/ str_start_556=str_start_555+sizeof(str_555)-1,@/ str_start_557=str_start_556+sizeof(str_556)-1,@/ str_start_558=str_start_557+sizeof(str_557)-1,@/ str_start_559=str_start_558+sizeof(str_558)-1,@/ str_start_560=str_start_559+sizeof(str_559)-1,@/ str_start_561=str_start_560+sizeof(str_560)-1,@/ str_start_562=str_start_561+sizeof(str_561)-1,@/ str_start_563=str_start_562+sizeof(str_562)-1,@/ str_start_564=str_start_563+sizeof(str_563)-1,@/ str_start_565=str_start_564+sizeof(str_564)-1,@/ str_start_566=str_start_565+sizeof(str_565)-1,@/ str_start_567=str_start_566+sizeof(str_566)-1,@/ str_start_568=str_start_567+sizeof(str_567)-1,@/ str_start_569=str_start_568+sizeof(str_568)-1,@/ str_start_570=str_start_569+sizeof(str_569)-1,@/ str_start_571=str_start_570+sizeof(str_570)-1,@/ str_start_572=str_start_571+sizeof(str_571)-1,@/ str_start_573=str_start_572+sizeof(str_572)-1,@/ str_start_574=str_start_573+sizeof(str_573)-1,@/ str_start_575=str_start_574+sizeof(str_574)-1,@/ str_start_576=str_start_575+sizeof(str_575)-1,@/ str_start_577=str_start_576+sizeof(str_576)-1,@/ str_start_578=str_start_577+sizeof(str_577)-1,@/ str_start_579=str_start_578+sizeof(str_578)-1,@/ str_start_580=str_start_579+sizeof(str_579)-1,@/ str_start_581=str_start_580+sizeof(str_580)-1,@/ str_start_582=str_start_581+sizeof(str_581)-1,@/ str_start_583=str_start_582+sizeof(str_582)-1,@/ str_start_584=str_start_583+sizeof(str_583)-1,@/ str_start_585=str_start_584+sizeof(str_584)-1,@/ str_start_586=str_start_585+sizeof(str_585)-1,@/ str_start_587=str_start_586+sizeof(str_586)-1,@/ str_start_588=str_start_587+sizeof(str_587)-1,@/ str_start_589=str_start_588+sizeof(str_588)-1,@/ str_start_590=str_start_589+sizeof(str_589)-1,@/ str_start_591=str_start_590+sizeof(str_590)-1,@/ str_start_592=str_start_591+sizeof(str_591)-1,@/ str_start_593=str_start_592+sizeof(str_592)-1,@/ str_start_594=str_start_593+sizeof(str_593)-1,@/ str_start_595=str_start_594+sizeof(str_594)-1,@/ str_start_596=str_start_595+sizeof(str_595)-1,@/ str_start_597=str_start_596+sizeof(str_596)-1,@/ str_start_598=str_start_597+sizeof(str_597)-1,@/ str_start_599=str_start_598+sizeof(str_598)-1,@/ str_start_600=str_start_599+sizeof(str_599)-1,@/ str_start_601=str_start_600+sizeof(str_600)-1,@/ str_start_602=str_start_601+sizeof(str_601)-1,@/ str_start_603=str_start_602+sizeof(str_602)-1,@/ str_start_604=str_start_603+sizeof(str_603)-1,@/ str_start_605=str_start_604+sizeof(str_604)-1,@/ str_start_606=str_start_605+sizeof(str_605)-1,@/ str_start_607=str_start_606+sizeof(str_606)-1,@/ str_start_608=str_start_607+sizeof(str_607)-1,@/ str_start_609=str_start_608+sizeof(str_608)-1,@/ str_start_610=str_start_609+sizeof(str_609)-1,@/ str_start_611=str_start_610+sizeof(str_610)-1,@/ str_start_612=str_start_611+sizeof(str_611)-1,@/ str_start_613=str_start_612+sizeof(str_612)-1,@/ str_start_614=str_start_613+sizeof(str_613)-1,@/ str_start_615=str_start_614+sizeof(str_614)-1,@/ str_start_616=str_start_615+sizeof(str_615)-1,@/ str_start_617=str_start_616+sizeof(str_616)-1,@/ str_start_618=str_start_617+sizeof(str_617)-1,@/ str_start_619=str_start_618+sizeof(str_618)-1,@/ str_start_620=str_start_619+sizeof(str_619)-1,@/ str_start_621=str_start_620+sizeof(str_620)-1,@/ str_start_622=str_start_621+sizeof(str_621)-1,@/ str_start_623=str_start_622+sizeof(str_622)-1,@/ str_start_624=str_start_623+sizeof(str_623)-1,@/ str_start_625=str_start_624+sizeof(str_624)-1,@/ str_start_626=str_start_625+sizeof(str_625)-1,@/ str_start_627=str_start_626+sizeof(str_626)-1,@/ str_start_628=str_start_627+sizeof(str_627)-1,@/ str_start_629=str_start_628+sizeof(str_628)-1,@/ str_start_630=str_start_629+sizeof(str_629)-1,@/ str_start_631=str_start_630+sizeof(str_630)-1,@/ str_start_632=str_start_631+sizeof(str_631)-1,@/ str_start_633=str_start_632+sizeof(str_632)-1,@/ str_start_634=str_start_633+sizeof(str_633)-1,@/ str_start_635=str_start_634+sizeof(str_634)-1,@/ str_start_636=str_start_635+sizeof(str_635)-1,@/ str_start_637=str_start_636+sizeof(str_636)-1,@/ str_start_638=str_start_637+sizeof(str_637)-1,@/ str_start_639=str_start_638+sizeof(str_638)-1,@/ str_start_640=str_start_639+sizeof(str_639)-1,@/ str_start_641=str_start_640+sizeof(str_640)-1,@/ str_start_642=str_start_641+sizeof(str_641)-1,@/ str_start_643=str_start_642+sizeof(str_642)-1,@/ str_start_644=str_start_643+sizeof(str_643)-1,@/ str_start_645=str_start_644+sizeof(str_644)-1,@/ str_start_646=str_start_645+sizeof(str_645)-1,@/ str_start_647=str_start_646+sizeof(str_646)-1,@/ str_start_648=str_start_647+sizeof(str_647)-1,@/ str_start_649=str_start_648+sizeof(str_648)-1,@/ str_start_650=str_start_649+sizeof(str_649)-1,@/ str_start_651=str_start_650+sizeof(str_650)-1,@/ str_start_652=str_start_651+sizeof(str_651)-1,@/ str_start_653=str_start_652+sizeof(str_652)-1,@/ str_start_654=str_start_653+sizeof(str_653)-1,@/ str_start_655=str_start_654+sizeof(str_654)-1,@/ str_start_656=str_start_655+sizeof(str_655)-1,@/ str_start_657=str_start_656+sizeof(str_656)-1,@/ str_start_658=str_start_657+sizeof(str_657)-1,@/ str_start_659=str_start_658+sizeof(str_658)-1,@/ str_start_660=str_start_659+sizeof(str_659)-1,@/ str_start_661=str_start_660+sizeof(str_660)-1,@/ str_start_662=str_start_661+sizeof(str_661)-1,@/ str_start_663=str_start_662+sizeof(str_662)-1,@/ str_start_664=str_start_663+sizeof(str_663)-1,@/ str_start_665=str_start_664+sizeof(str_664)-1,@/ str_start_666=str_start_665+sizeof(str_665)-1,@/ str_start_667=str_start_666+sizeof(str_666)-1,@/ str_start_668=str_start_667+sizeof(str_667)-1,@/ str_start_end } str_starts; @ @<|pool_ptr| initialization@>= str_start_668 @ @<|str_ptr| initialization@>= 668 @* Index. Here is where you can find all uses of each identifier in the program, with underlined entries pointing to where the identifier was defined. If the identifier is only one letter long, however, you get to see only the underlined entries. {\sl All references are to section numbers instead of page numbers.} This index also lists error messages and other aspects of the program that you might want to look up some day. For example, the entry for ``system dependencies'' lists all sections that should receive special attention from people who are installing \TeX\ in a new operating environment. A list of various things that can't happen appears under ``this can't happen''. Approximately 40 sections are listed under ``inner loop''; these account for about 60\pct! of \TeX's running time, exclusive of input and output.