The Genius of Donald Knuth: Typesetting with Boxes and Glue

Today is the 70th birthday of Donald Knuth.

If you don't know who Knuth is, then you're not a programmer. If you're a programmer and you don't know who Knuth is, well... I have no idea what rock you've been hiding under, but you should probably be fired.

Knuth is one of the most famous and accomplished people in the computer science community. He's done all sorts of great research, published a set of definitive textbooks on computer algorithms, and of particular interest to me, implemented a brilliant, hideous, beautiful, godawful piece of software called TeX.

When I went to grad school, instead of being a teaching assistant, I worked for the department as a sysadmin. I ended up spending close to six years doing almost nothing but technical support for TeX, which explains my mixed emotions about it in the description above.

So what is TeX?

When Knuth was writing his papers and books, one of the problems that he constantly encountered was having the typesetters screw up the math. Mathematicians use a lot of weird symbols, and the people who typeset the books mostly didn't know math. So they'd mess up all sorts of things - sometimes in very serious ways. So after he'd gotten sufficiently sick of dealing with that, he sat down and wrote a typesetting program - which became TeX.

TeX was one of the first major markup languages. The idea was that you'd write the text of
your document, and mixed into the text, you'd include commands that described how you'd like
things to be typeset. Then you'd run the whole shebang through a TeX processor, and it would
generate the perfectly typeset results. TeX rapidly became the way that technical
papers were written, and it remains the dominant typesetting system for technical work all the
way to the present day. TeX is a brilliant system. There's a darned good reason why it remains
so dominant 30 years after Knuth wrote the original version.

TeX is more than just a typesetting system. It's a full-fledged programming language. It is absolutely Turing complete - as a proof of that, a lunatic by the name of Andrew Greene wrote a complete, usable BASIC interpreter in TeX! It's arguably insane to have a Turing complete programming language for a task like typesetting. But it actually makes sense. As I've pointed out before, it's actually very easy for a programming system to be Turing complete. Basically, if you can do iteration and arithmetic, and you've got no absolute limit on storage, you're pretty much guaranteed to be Turing complete. And attractive typesetting needs to do things like iteration - to lay out the pieces of stuff - and it needs to be able to do arithmetic, because it's got to be able to compute the positions on the page of the different typesetting elements. Since Knuth gave it string-valued variables, without placing a limit on how much stuff you could put into a string, it was pretty much inevitable that it would be Turing complete.
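
To make the "iteration and arithmetic" point concrete, here is a minimal plain TeX sketch that sums the integers from 1 to 10. The register names \total and \idx are made up for this example; \newcount, \advance, \loop and \repeat are standard plain TeX.

```tex
% Plain TeX sketch: iteration plus arithmetic on count registers.
% \total and \idx are hypothetical names chosen for this example.
\newcount\total
\newcount\idx
\total=0
\idx=1
\loop
  \advance\total by \idx   % arithmetic: add the current index to the sum
  \advance\idx by 1        % step the index
\ifnum\idx<11 \repeat      % iteration: go around again while \idx is below 11
The sum is \the\total.
\bye
```

Run through plain TeX, this should typeset the running total after the loop ends. It is a toy, but the same two ingredients are what the layout machinery leans on.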

There are two fundamental ideas behind TeX. As a language, it's based on macro expansion. You can define symbols, and describe replacements for those symbols. When a symbol is encountered, it's replaced by its value. (usually...)

For example, you could write a TeX macro that replaced a piece of text with
two copies, and then use it like the following:

\def\double#1{#1 #1}
\double{Foo}

That would create, as its result, the text "Foo Foo" typeset on a page.

There's a ton of control structure oriented towards managing just when things
do their macro-expansion. So, for example, you can alter the order in which
things are expanded using something called "expandafter". Expandafter takes the next
two tokens, sets the first one aside, expands the second (replacing it with its expansion), and then goes back to the first, allowing it
to use part of the expansion of the second as its parameter. Here's an example
from the BASIC interpreter:

\def\strlen#1{\strtmp-2% don't count " " \iw tokens
  \expandafter\if\stringP #1\let\next\strIter\strIter #1\iw\fi}

This basically says "Expand \stringP, and then do the if". Since stringP takes a parameter,
that means, roughly, compute "\stringP #1" (where #1 is a parameter to "strlen"), and
then use the result of that as the first parameter to the "if". So this says "If #1 is a string, then compute its length using strIter".
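
A smaller case may make the effect of expandafter easier to see. This is a plain TeX sketch of my own, not code from the BASIC interpreter; \first, \word and \stop are hypothetical names, and \stop is used only as an argument delimiter, so it never needs a definition.

```tex
% Plain TeX sketch. \first, \word and \stop are hypothetical names for this example.
% \first keeps its first argument and discards everything up to the delimiter \stop.
\def\first#1#2\stop{#1}
\def\word{Knuth}

% Without \expandafter: #1 is the single token \word, which then expands to "Knuth".
\first\word\stop

% With \expandafter: \word is expanded before \first reads its arguments,
% so \first sees "Knuth\stop", #1 is "K", and only "K" is typeset.
\expandafter\first\word\stop
\bye
```

The two calls differ only in when \word gets expanded, which is exactly the kind of timing control that expandafter exists for.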

Just looking at this tiny fragment, you should be able to see that TeX could make
for a prize-winning entry as a pathological language.

The other main idea of TeX is the boxes-and-glue model of typesetting. All of that crazy macro stuff generally ends up by producing two kinds of things: boxes, which are things that can be drawn on a page, and glue, which is invisible stretchy stuff that sticks boxes together. This is the part of TeX that is amazingly, gloriously, magnificently brilliant. It's an extremely simple model which is capable of doing extremely complex things.

The idea of typesetting in TeX is that you go through the document, expanding the macros, which results in a ton of boxes and glue. The boxes have all different sizes, and you want to put them together to produce something attractive. Glue makes that work. Glue attaches boxes together: it defines how big the space between them should be, and how much that space can be stretched or compressed. (How the boxes line up, for example along a shared baseline, comes from reference points carried by the boxes themselves.) Page layout is really just tension relaxation: find the arrangement of boxes which produces the smallest overall tension, within the constraints imposed by the glue.
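
Here is a minimal plain TeX sketch of what glue looks like in practice. The dimensions are arbitrary values picked for illustration; \hbox, \hskip and \hfil are standard TeX primitives.

```tex
% Plain TeX sketch of glue between boxes. All dimensions are arbitrary example values.
% This glue has a natural width of 1cm, 2cm of stretchability, and 0.5cm of shrinkability.
% Forcing the box to be 3cm wide makes TeX stretch the glue beyond its natural width.
\hbox to 3cm{A\hskip 1cm plus 2cm minus 0.5cm B}

% \hfil is glue with zero natural width and infinite stretchability: the two \hfil
% share the leftover space equally, so the A-B gap and the B-C gap come out the same.
\hbox to 3cm{A\hfil B\hfil C}
\bye
```

In a real paragraph you rarely write glue by hand like this. The spaces between words are glue supplied by the font, and TeX picks line breaks that keep the total stretching and shrinking as small as it can.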

The result of that varies, depending on the skill of the person who set the basic constraints that determine how the glue tensions work. You can create astonishingly beautiful typeset text - a skilled TeX user can work out the constraints to produce a result that's the aesthetic equal of the work of the very best human typesetter. You can also create truly astonishingly godawful stuff, on a par with some of what we've seen on the web.

But on the whole, it's been a great thing. Pick up any conference proceedings
from the last 20 years, in the fields of math, computer science, physics, or chemistry (among numerous others), and you'll see the results of TeX layout. Pick up a book published by Springer-Verlag, and it's almost certainly typeset by TeX. Look at Greg Chaitin's books - every one was written using TeX. Look at any typeset equation in pretty much any published source, from websites to conference proceedings, to journals, to textbooks. If the equation looks really good, if everything is in exactly the right place, and every symbol is correctly drawn in relation to everything else - odds are, it was generated by TeX. Even hardcore Microsoft Word users generally use something TeX-based for doing equations.

I've got a love-hate relationship with TeX. It's a tough system to master, and it's
amazing how badly many people misuse it. So as a guy who had to work doing technical support
of a bunch of people who didn't understand it, but were using it to write their papers and dissertations, I dealt with more than my fair share of frustration caused by some of the weird things that TeX can do. But looking at it as an engineer, and looking at it as a user myself, and realizing when it was written, I have to conclude that it's one of the best pieces of software ever written. I don't know of any software other than TeX implemented in the 1970s that remains absolutely and unquestionably dominant in its domain. And the glue-and-boxes model of text layout was a piece of absolute genius - one of the most masterful examples of capturing an extremely complex problem using an extremely simple model. It's beautiful. And it's typical of the kind of thing that Knuth does.

Happy Birthday, Dr. Knuth, and many happy returns!

I've always found it ironic that TeX makes it pretty difficult to typeset code, particularly in figures and sub-figures. What would your six (plus) years of expertise recommend?

Happy Birthday indeed Dr. Knuth and many, many happy returns.

A small quibble

Pick up any conference proceedings from the last 20 years, in the fields of math, computer science, physics, or chemistry (among numerous others), and you'll see the results of TeX layout

most of it was actually written in LaTeX.

For non-programmers Knuth also wrote the wonderful Surreal Numbers which is exactly that, an introduction to Conway's surreal numbers.

I generally use LaTeX, so the incantations would be somewhat
different for plain TeX. But the basic idea is that I wrote code preprocessing tools that translated the code into a tabbing environment contained in a minipage.

So, for example, if I started with something like

(define (fact n)
  (if (= n 0)
    1
    (* n (fact (- n 1)))))

I would use a script to translate it into:

\begin{minipage}{4in}\tt
\begin{tabbing}
xx\=xx\=xx\=xx\=xx\=\kill
(define (fact n) \\
\> (if (= n 0) \\
\> \> 1 \\
\> \> (* n (fact (- n 1)))))
\end{tabbing}
\end{minipage}

Thony:

LaTeX is implemented as a ton of incredibly hairy macros written in TeX. It's still TeX. It has the advantage that
it's easier to write, and the most common formats are pre-written for you as style files. But it's still TeX.

"Look at any typeset equation in pretty much any published source, from websites to conference proceedings, to journals, to textbooks."

One of the unfortunate side-effects of the prevalence of TeX and its growing use by publishers is their incredible laziness. A couple of times I've had them ask me to fix a formatting problem in a submission, such as line spacing. It's a one-character change in the LaTeX code, but instead of doing it themselves, they require me to log in and follow their tedious resubmission process!

Yes, I completely agree with you. But I want to ask Knuth whether, when he was writing TeX, he was able to see that it would be the most influential work of his. I never pick up a math book that is not typeset in TeX; non-TeX books are completely horrible to read! I think it would be fair to say that Knuth's largest contribution to the science community was TeX, and then his The Art of Computer Programming series.

Having been a combination math student and unix geek, I actually started using LaTeX before college---I like the output so much more than Microsoft Word or the like that I even do English papers in it. At some point, I just threw away my word processor (well, almost. People persist in thinking that the Linux/Mac guy can open up Word files for some reason).

I find it pretty easy to do anything that doesn't require much customization in LaTeX, although some equations still force me to look up commands. Other than the simplest of macros, however, I've never had much luck trying to do customization in it. Fortunately, the popular packages seem to cover most of what I need anyway.

By Matthew L. (not verified) on 10 Jan 2008 #permalink

According to Knuth, what really drove him to create TeX was being told by his publisher that the second edition of volume I of TAOCP could not be typeset in the same nice format as the first edition. So he sat himself down, learned a lot about typesetting mathematics, and we are all benefiting, almost 30 years later.

By Bobby The Programmer (not verified) on 10 Jan 2008 #permalink

My brother, damn his squidgy undeserving hide, actually got to work for Knuth. How much better if it could have been me!!!

Anyway, I would merely like to note in passing -- frogs ARE Turing compatible, I demonstrated this independently.

By Luna_the_cat (not verified) on 10 Jan 2008 #permalink

I think that what made your life so difficult all those years ago is that while TeX is a great program, the TeXbook is a terrible way to learn it - at least for any non-computer-sciencey person. I started on TeX about 20 years ago as an astrophysics research student trying to write papers. The TeXbook had me fairly flummoxed and most of my colleagues - some of the finest mathematical minds in Britain - found it equally impenetrable. In those pre-web days getting technical help wasn't nearly as easy as it is now so this was a serious hurdle, especially as our computer guy was, obviously unlike your good self, almost infamously unapproachable.

csrster: I am currently an undergraduate, working on my secondary ed credentials in mathematics (thus, I am a math major). Last semester, a professor of mine recommended that I learn TeX in order to typeset my homework (my handwriting is sh*t, so I was doing everything in AppleWorks, with the equation editor, which is close, but not quite right). So I picked up a copy of Knuth's TeXbook. I agree that it is dense, and I certainly haven't gotten more than a basic level of understanding out of it. However, no one has been able to recommend a better place to go for a better understanding of TeX.

Is there any chance that you might be able to offer some guidance?

xander

Xander:

The best advice is to get away from plain TeX, and switch to LaTeX. LaTeX is a lot easier to use for most things, and has a lot of extra macros that make it easier to set complex equations. The latest edition of Lamport's book on LaTeX, plus the LaTeX Companion are pretty accessible.

I once helped Donald Knuth get an overhead projector working.

On the other hand, at a post-commencement celebration, my friend Jerome once grabbed a sandwich that Knuth was clearly on his way to taking from the platter. (It might even have been the last sandwich on the platter.)

I'm not sure how to set up that karmic equation (but I'd definitely use TeX to render it).

My whole Modern Algebra class had to learn LaTeX (or some variant of TeX) last semester when we collectively failed an exam. Our retake was to be take-home and LaTeX format. It does make things look really pretty, and I haven't found it ultra-hard, although at one point I was making some matrices and that was a bit of a pain.

I think I know what you mean by boxes, but anyone have an example of what's meant by "glue"? That might help me and possibly some of my classmates, because we're still a little unsure of how to manage all the formatting and stuff.

MCC wrote: "The latest edition of Lamport's book on LaTeX, plus the LaTeX Companion are pretty accessible."

The lesson I've also learned is to get a simple working LaTeX file from a friend that contains all the basics (equations, sections, figures). Just tweaking a simple file and seeing what happens is one of the best ways to figure out what's going on - just like for any programming language.

The one thing I don't like in *TeX is the complexity of adding images, Excel sheets and/or drawings.

Word/OpenOffice have an _awesome_ feature - in-place editing using OLE subsystem.

By Alex Besogonov (not verified) on 10 Jan 2008 #permalink

Or my collaboration with John Nash!

I've only known one person who did substantial work in TeX, rather than LaTeX. That was my group theory professor, Daniel Freedman. In our "group theory for physicists" class, the problem sets were written in TeX, since he needed some features (elaborate control of superscript and subscript placement, I think) which LaTeX made more difficult. The lecture notes were written in classic professorial scrawl and photocopied on a stochastic Xerox machine.

Maple's ability to create LaTeX markup makes writing it one hell of a lot easier; that is a fantastic feature.

By Paul Carpenter (not verified) on 10 Jan 2008 #permalink

I think anyone comparing TeX with OpenOffice or its expensive brother is strange.
On the other hand, it would be nice to have a SpTeX (LaTeX for spreadsheets - just macros for people like me who aren't good at tables).
Could we all ask google to accept tex as a format in google documents?

a stochastic Xerox machine

Ah, the predecessor to the modern bug-, virus-, and worm-infested Windows machine. How reliably technology advances!

By Torbjörn Lars… (not verified) on 10 Jan 2008 #permalink

I started using LaTeX for my first papers and for my thesis. I have no idea how most of it works, as the method mentioned above (find someone else's working file and modify it for your own purposes) has served me well so far...

By Stephen Wells (not verified) on 10 Jan 2008 #permalink

Aw, c'mon. If you are going to do real typesetting, you'll use troff!

:-)

I think one of the hardest parts of my PhD was modifying the LaTeX style file for dissertations at my school. The style file in question had been circulating around the physics department for years but the last person to do anything to it was long gone and it was just sitting on his old department web-page. Of course the year I was finally finishing the grad division decided to revamp the dissertation formatting guidelines. I could have done without that few days of LaTeX hacking at that point in my life.

The best advice is to get away from plain TeX, and switch to LaTeX. LaTeX is a lot easier to use for most things, and has a lot of extra macros that make it easier to set complex equations. The latest edition of Lamport's book on LaTeX, plus the LaTeX Companion are pretty accessible.

Also if people are coming from a word processor to LaTeX, the program WinEdt can provide a nice stepping stone. It's really well integrated with MikTeX on Windows and provides a nice interface for editing LaTeX documents.

for us true software eng geeks, (La)TeX has an additional bonus: it can be easily and efficiently revision controlled. binary word processor formats tend to be incompressible blobs for which version control brings no benefit over simple backups, whereas text-based markup formats like TeX can actually generate readable diffs when version controlled.

By Nomen Nescio (not verified) on 10 Jan 2008 #permalink

I remember one of the first programs I downloaded from the nascent internet was a viewer to handle LaTeX files in windows. Followed by a program that could handle the .tar extension ...
Somehow that experience was not conducive to convince me of the need to figure out any more about that program, which is why I'm still stuck with Office Equation editor 3.0.

I refuse to use powerpoint to do slide presentations.

Pdflatex and the acrobat reader work great.

By Chris Noble (not verified) on 10 Jan 2008 #permalink

The question that should be asked of D.E.K. is, "Why does TeX NOT support the use of
Potrzebies as units?"

I have asked my students to write him a letter (you know pencil and paper) and get a definitive answer. Even an offer of extra credit has not helped. Maybe I should pay the one who does $2.56 + postage.

Been using latex for years, and that, along with emacs/auctex, whizzytex, and jabref, is my main text writing environment for nearly everything. It's a pleasure.

And my slightly odd use for latex: whenever I need to learn some gnarly mathematics -- really well -- I always summarize the topic in latex. Because the output is so drop-dead gorgeous, I feel compelled to make the content 100% correct and lucid!

By anonymous (not verified) on 10 Jan 2008 #permalink

Nice post, thanks for sharing. I had no idea about the boxes and glue model behind TeX. Do you know of any other site that mentions this?

Sorry if I sound like a bore, but you misspelled "weird" a couple of times.

By anonymous (not verified) on 10 Jan 2008 #permalink

If you're a programmer and you don't know who Knuth is, well... I have no idea what rock you've been hiding under, but you should probably be fired.

Could you send this recommendation to my superiors, please? I want to get away from doing tech support....

Just curious, what do you think of Texinfo? I think it's *also* a set of TeX macros, though maybe not as hairy as LaTeX?

xander: I'm the last person you should ask for TeX help. For my recent master's thesis (did I mention, I am _now_ a computer-sciencey type) I used LaTeX and found Wilkins' "Getting Started With LaTeX" to be enormously helpful (http://www.maths.tcd.ie/~dwilkins/LaTeXPrimer/). It's pretty ancient, but still covers the basics well.

Even beyond the typesetting system itself, TeX provided a lingua franca for sending mathematics via ASCII. Whenever I'm writing an email involving mathematics, I end up writing the equations in TeX source code. Since all mathematicians and physicists have a subset of a TeX compiler installed in their heads, this is a supremely efficient way of getting mathematics back and forth by email.

I like using TeX (and LaTeX) because all I need is a text editor and a command-line. None of this wussy GUI stuff. :-)

Actually, there is a nice WYSIWYM* frontend for LaTeX called LyX, it makes it a bit easier to use (and you can embed raw TeX into your document also, which is good for custom headers and such).

*What You See Is What You Mean

For comments #32 and #34, you can define a potrzebie in TeX:

\newdimen\potrzebie
\potrzebie=2.263mm

or LaTeX:

\newlength{\potrzebie}
\setlength{\potrzebie}{2.263mm}

That can be used the same as other length commands, like \textwidth. As stated by those living in Lansing and a few other cities around the country, TEX RULES!!!

Oops. Meant Houghton, not Lansing.

TeXnicCenter is the best graphical environment for TeX I've found on Windows, and unlike WinEDT, it's completely free.

I've always believed that TeX is the programming language of the devil. Seriously, if you were Satan, and you wanted to cause maximum suffering and insanity among programmers, how would you do it? C++ is simply too easy to debug. No one will make you program in APL. Perl 6 will simply never be finished. And deliberately bad languages like INTERCAL are easily avoided and can just be laughed at. If you really want to cause suffering, you have to do more than just design an evil programming language: you have to give people a reason to program in it. Hence, TeX, the world's best typesetting engine tied to the world's most diabolical programming language. Diabolical in the sense that the quantity Suffering_per_programmer × Number_of_programmers is maximized. Programming in TeX combines the worst features of stack languages (with several different stacks) with macro languages.

I've found tex's great advantage to be PORTABILITY between my different machines and co-authors. Also, it handles large documents (e.g. a thesis) much better than Word (at least it did in 1994).

My one beef is that I'd like to change the font, and this seems nearly impossible to do.

Aw, c'mon. If you are going to do real typesetting, you'll use troff!

:-)

Don't laugh! I used to be a LaTeX user, and I found it so impossible to customize that I switched to (g)roff. I think almost everything about it is easier to use than LaTeX (including typesetting mathematics) and the quality of the typesetting is not that much poorer.

I was aware of Knuth from Robert X Cringely's Accidental Empires, but I've never used *TeX. However, having seen a little source I see it was probably the inspiration for Lilypond.

By John Ferguson (not verified) on 13 Jan 2008 #permalink

John:

More than just the inspiration. Lilypond uses tex for typesetting. They created a ton of special fonts for typesetting music, and TeX does the work of generating the output.

Hi - thought you might be interested in this blog posting I wrote about Donald a few months ago. Good if you have some free time on your hands!

Excerpt:

The Peoples Archive is a fantastic resource of many different people telling their life stories in front of a camera. All the stories are also transcripted for easy reading. What makes this a cut above any television interview you may have seen is the depth and length of the interviews. Donald Knuth, for instance, tells his life story in 97 parts. Most interviews last at least 2 hours and many over 4.

I don't know of any other software other than TeX implemented in the 1970s

TeX wasn't frozen until 1989; in the 70's we were using troff.

Knuth also took a little 10 year break to develop METAFONT so he could design his own fonts, since he didn't find any of the existing ones satisfactory to his tastes (i.e., perfect).

Some other fun facts about Knuth:

He's a devout Christian, and wrote 3:16 Bible Texts Illuminated.

Versions of TeX are numbered 3, 3.1, 3.14, ... 3.141592

He offers a reward for the very rare event of finding an error in his books and programs. The initial reward for finding a bug in TeX or METAFONT was $2.56; the amount doubled each time, reaching $327.68 (this doubling scheme would have sent any normal error-prone human being to the poor house).

He hasn't had an email address for 8 years, doesn't make appointments with visitors, travel to conferences, or accept speaking engagements; this gives him more time for his life's work, The Art of Computer Programming, in 7 volumes, of which he has published 1-3 and parts of volume 4, which will be published in at least 3 subvolumes. Volume 5 is in progress and is estimated to be ready in 2015. After he completes volume 5, he plans to update volumes 1-3. He only plans to continue to volumes 6 and 7, which are more specialized, if the material is still current and hasn't been published by other authors.

He plays piano and organ.

He has a Chinese name, 高德纳.

By truth machine (not verified) on 14 Jan 2008 #permalink

Good to see a balanced opinion of TeX. I love it for the ingenuity it reveals about its creator, and for the utility it has provided the world, but I find it frustrating that the general appearance of the input, and the design of its programming language, are so awful. It's basically just so low-level (even LaTeX, to my mind). I wish someone would dress the same power up in some Ruby (an elegant programming language) clothing.

Nowadays I happily use MathType in Word for my equation needs, and am happy with that. If I was writing long and serious dissertations, I'd seriously consider LaTeX, but...

BTW, no one has mentioned ConTeXt, which looks very interesting as a more presentation-friendly layer on top of TeX.

I'm not a programmer, but I do work in math publishing (with SIAM, mostly putting together the SIAM Review), so I guess I owe a lot to Mr. Knuth. I'd hate to have to typeset math by hand, that's for sure! Some mathematicians are better at writing LaTeX code than others, but it beats working with a Word document any day. And after 9 years, I'm still learning LaTeX tips and shortcuts, mostly from our more clever authors.

I have earned a good portion of my living since about 1991 by producing technical books with TeX. Most of those books were produced with a macro package I wrote to replace LaTeX. I certainly owe Don Knuth a debt of gratitude.

TeX is indeed fairly high on the funkeroo scale for programming languages. But it is not Perl, and so I'm happy.

~~ Paul

By Paul C. Anagno… (not verified) on 17 Jan 2008 #permalink

The only problem is the poor CJK support, and there are two rival systems of Chinese LaTeX - Simplified Chinese and Traditional Chinese versions!!!

Thanks, Mark, for a fun and enlightening post.
Donald Knuth is indeed a god, and I hope he had a great birthday.
Your disquisition on boxes and glue hit me like a hammer: I never realized it until you explained it so well, but the ideal of typesetting beauty in TeX is a minimum of ugliness, and what TeX does is solve a variational problem.
Heavy, man.
Well, that's enough epiphany for one day. You have probably already noticed this, but some of the most beautifully presented mathematics in blog posts may be found over at Tamino's place, and one reason is that he uses LaTeX for the web.

As a typesetter who traveled from hot type to cold type to desktop and has a CS degree, I found Knuth's book on typesetting fun; it started me thinking about what I was doing in print.

'Glue' made me think of what H&J really meant and how you would implement it.

One of the things lost is the random variation in a hot type font. An 'f' used several times would vary slightly each time - each would be a different 'f' molded in lead. Each would have slightly different age, dents and wear, which your eye and brain would notice just underneath consciousness. I always thought that these slight variations were interesting to your eye - like when a filling falls out and your tongue cannot keep from exploring the hole.

I loved Knuth's computer books.

It's basically just so low-level (even LaTeX, to my mind). I wish someone would dress the same power up in some Ruby (an elegant programming language) clothing.

I think something like Curl is the way to go (unfortunately it's proprietary).

By truth machine (not verified) on 26 Jan 2008 #permalink

I really shudder to think of how far back the typewritten journals and books of the 60s-80s set the field of mathematics. It's just impossible to read that stuff!

Also, are you familiar with asymptote? It's a vector-graphics language with the box-and-glue approach. Quite nice, really. Better than faking things in photoshop, which is how I used to do it.

Reading about people like this you can't but admire them! First, their brain. Second, the use this brain has brought to humanity.

By IT job search (not verified) on 04 Feb 2010 #permalink

To Whom It May Concern:

I am writing my first of the ten math books I am planning to publish within three years time frame. And I would appreciate your courtesy and kindness in helping me to find the most flexible and the easiest math editing software that I can possibly use.

Your help would be well appreciated.

Paul Sinclair
psinclair@live.com
Home Phone: 604-253-6237
iphone:1(778)928-7128

By Paul Sinclair (not verified) on 23 May 2010 #permalink