[tz] [PATCH v2 3/4] zic.8: Use correct escape sequences instead of special characters

G. Branden Robinson g.branden.robinson at gmail.com
Tue Dec 13 23:24:07 UTC 2022


[dropping Alex and Geoff Clare of TOG, but keeping mailing lists because
Paul corrected me on a significant point and I won't have anyone
claiming I don't own up to my mistakes; still, length warning: 189
lines]

Hi Paul,

...finally getting back to this, with belated thanks.

At 2022-11-26T16:12:25-0800, Paul Eggert wrote:
> On 2022-11-26 13:19, G. Branden Robinson wrote:
> > I would attach scans of Tables I and II from "NROFF/TROFF User's
> > Manual", the version dated 1976, published with Volume 2 of the Unix
> > Programmer's Manual (1979)
> 
> Thanks for looking into this. It took me a trip down memory lane as I
> believe I was the first person to submit a computer-typeset PhD thesis
> to UCLA.

Cheers!

> I used 7th Edition Unix troff along with the C/A/T phototypesetter
> that was troff's main target in the 1970s. (As an aside, the C/A/T was
> why stderr was invented; see Diomidis Spinellis's "The Birth of
> Standard Error" 2013-12-11 <https://www.spinellis.gr/blog/20131211/>.)

I'll bet a lot of readers didn't know that one, but I did, and when I
found out about it via the TUHS list I was so tickled that I added a
link to groff's Texinfo manual.

  standard error stream.  The notation then serves to identify the
  output stream and does not necessarily mean that an error has
  occurred.@footnote{Unix and related operating systems distinguish
  standard output and standard error streams @emph{because} of
  @code{troff}:@:
  @uref{https://minnie.tuhs.org/pipermail/tuhs/2013-December/006113.html}.}

> Solaris 10 /usr/bin/troff is largely unchanged from 1970s troff, and
> supports \(ga but none of the other escapes you mention, I expect
> because they were not present in the Bell Labs special font version 4
> and Commercial II that Unix assumed on the C/A/T.

I admit to some shock here.  The 1976 version of Ossanna's nroff/troff
manual, CSTR #54, explicitly documents--

--wait, no it doesn't.

<blinks>

[Some UTF-8 follows, because it's essential to the discussion of
glyph/character repertoire.]

Apparently I outright hallucinated the presence of \(ha and \(ti in
"Table II: Input Naming Conventions for ’, ‘, and — and for Non-ASCII
Special Characters".  \(ga is there like you said but \(ha and \(ti are
not.  I managed to sustain this delusion despite acquiring a paper copy
of the HRW 1983 printing of both volumes of the Version 7 Unix
Programmer's Manual (typeset with the C/A/T itself), and reading it with
especially loving attention to the troff material.  By God, I told
myself, I'll figure this stuff out.

Hrm.  Vexing.

Lest some readers think this is a ridiculous thing to have gotten wrong,
permit me to quote one of the paragraphs interstitially present in
"Table II"'s 2 tables spread over 2 pages.  Times--and the Times
font--were very different in 1973, when the Bell Labs CSRC took delivery
of the C/A/T.

"The ASCII characters @, #, ", ’, ‘, <, >, \, {, }, ˜, ˆ, and _ exist
_only_ on the special font and are printed as a 1-em space if that font
is not mounted."

So why did I use so much non-Basic Latin Unicode to quote a list of
_ASCII_ characters from the CSTR #54 document?  Because that's what they
_look like_.  Some material in the groff_char(7) man page speaks to it.

History
    A consideration of the typefaces originally available to AT&T nroff
    and troff illuminates many conventions that one might regard as
    idiosyncratic fifty years afterward.  (See section “History” of
    roff(7) for more context.)  The face used by the Teletype Model 37
    terminals of the Murray Hill Unix Room was based on ASCII, but
    assigned multiple meanings to several code points, as suggested by
    that standard.  Decimal 34 (") served as a dieresis accent and
    neutral double quotation mark; decimal 39 (') as an acute accent,
    apostrophe, and closing (right) single quotation mark; decimal 45
    (-) as a hyphen and a minus sign; decimal 94 (^) as a circumflex
    accent and caret; decimal 96 (`) as a grave accent and opening
    (left) single quotation mark; and decimal 126 (~) as a tilde accent
    and (with a half‐line motion) swung dash.  The Model 37 bore an
    optional extended character set offering upright Greek letters and
    several mathematical symbols; these were documented as early as the
    kbd(VII) man page of the (First Edition) Unix Programmer’s Manual.

    At the time Graphic Systems delivered the C/A/T phototypesetter to
    AT&T, the ASCII character set was not considered a standard basis
    for a glyph repertoire by traditional typographers.  In the stock
    Times roman, italic, and bold styles available, several ASCII
    characters were not present at all, nor was most of the Teletype’s
    extended character set.  AT&T commissioned a “special” font to
    ensure no loss of repertoire.

(Nit: one character, the broken bar ¦, got lost anyway.  I guess no one
missed it.)

> The source code of 7th Edition Unix troff agrees with Solaris 10
> behavior here, and this also agrees with 7th Edition Unix
> /usr/doc/troff/table2 which documents \(ga but none of the other
> escapes you mentioned. I'm a bit surprised that the printed manuals
> you mention disagree with 7th Edition Unix,

Imagine how surprised I was when I found I had deceived myself!  Usually
my vision sucks this badly only when reviewing my _own_ work.

None of these three appear in the 1992 revision of CSTR #54 (revised by
Kernighan and documenting device-independent troff extensions).  I would
say they are GNU extensions, but two others that one might impugn with
such a descriptor, \(aq and \(dq, appear (along with \(ga) in
Documenter's Workbench (DWB) troff 3.3 font descriptions for its
PostScript driver,[1] which I have no reason to believe isn't about 10
years older than that version of CSTR #54.  Device-independent troff
made it easy to specify your own special character names; people did.

> but anyway it doesn't matter all that much since Solaris 10 is what it
> is.

Agreed.  And even though someone could have added special character
aliases of "ASCII" glyphs in Solaris's font description files 30+ years
ago, they didn't.  Perhaps the reason was a feeling that nothing good
ever came from GNU; a more likely explanation to me is a dedication of
religious intensity to the principle of inertia, similarly to why
Solaris kept the World's Worst Bourne Shell implementation, compliant
with no published standard ever, as /bin/sh for something like 30 years.

(Think I'm kidding?  https://www.in-ulm.de/~mascheck/bourne/segv.html )

> In other words, on Solaris 10 if I take this file 'foo':
> 
> 	.nf
> 	default font
> 	aq |\(aq| |'|
> 	ga |\(ga| |`|
> 	ha |\(ha| |^|
> 	ti |\(ti| |~|
> 	.ft CW
> 	CW font
> 	aq |\(aq| |'|
> 	ga |\(ga| |`|
> 	ha |\(ha| |^|
> 	ti |\(ti| |~|
> 
> and run the shell command:
> 
>    /usr/bin/troff foo | /usr/lib/lp/postscript/dpost >foo.ps
> 
> I get the attached file foo.ps, and 'evince' says only \(ga works and
> even there it's barely usable in the default font, as shown in the
> attached screenshot foo.png of 'evince' displaying foo.ps.

Right.  With the undefinedness of \(ha and \(ti as well as \(aq now
clear to me, nothing about your output surprises me.
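
For anyone who wants to audit a man page for the same problem, here is a
rough sketch in Python.  It is not part of either patch.  The function
name find_nonportable_escapes and the set name UNDEFINED_ON_SOLARIS10
are made up for illustration, and the set holds only the three escapes
this thread reports as undefined on Solaris 10; it makes no claim about
any others.  The matching is deliberately simple and will miss an
escape that directly follows an escaped backslash.

```python
import re

# Escapes this thread reports as undefined in Solaris 10 /usr/bin/troff.
# \(ga is left out because it is reported to work there.
UNDEFINED_ON_SOLARIS10 = {"aq", "ha", "ti"}

# Match the two-character special character form \(xx.  The lookbehind
# skips a "(" whose backslash is itself preceded by a backslash; this is
# a simplification, not a full roff escape parser.
SPECIAL_CHAR = re.compile(r"(?<!\\)\\\((..)")


def find_nonportable_escapes(roff_source):
    """Return (line_number, escape) pairs for escapes in the set above."""
    hits = []
    for lineno, line in enumerate(roff_source.splitlines(), start=1):
        for match in SPECIAL_CHAR.finditer(line):
            name = match.group(1)
            if name in UNDEFINED_ON_SOLARIS10:
                hits.append((lineno, "\\(" + name))
    return hits


if __name__ == "__main__":
    # A cut-down version of the test file 'foo' quoted above.
    sample = "\n".join([
        ".nf",
        r"aq |\(aq| |'|",
        r"ga |\(ga| |`|",
        r"ha |\(ha| |^|",
        r"ti |\(ti| |~|",
    ])
    for lineno, escape in find_nonportable_escapes(sample):
        print(f"line {lineno}: {escape} is not portable to Solaris 10 troff")
```

Run against the sample, it flags the aq, ha, and ti lines and stays
quiet about ga, which matches what the foo.ps output showed.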

> > .ie \n(.g .q \f(CR!$%&\(aq()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP .
> > .el .ie t .q \f(CW!$%&'()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP .
> > .    el   .q !$%&'()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti .
> 
> With Solaris 10 in mind, in the second line of your proposed code the
> \f(CW...\fP and the \(ga are OK but the \(ha and \(ti are dubious
> so I installed the attached patch instead.

Quite sensible.  As we discussed elsewhere, Solaris troff is scheduled
for retirement in January 2024, with groff 1.22.3 as its successor.
Though old, that groff release certainly supports \(aq, \(ha, and \(ti.

Thank you again for knocking the scales off my eyes here.

Regards,
Branden

[1] https://github.com/n-t-roff/DWB3.3/blob/master/postscript/devopost/R