[tz] [PATCH v2 3/4] zic.8: Use correct escape sequences instead of special characters

G. Branden Robinson g.branden.robinson at gmail.com
Sat Nov 26 21:19:47 UTC 2022


Hi Paul,

At 2022-11-25T18:31:02-0800, Paul Eggert wrote:
> On 2022-11-23 10:43, Paul Eggert wrote:
> > I installed that
> Further testing showed that the installed patch doesn't work with
> traditional troff, which doesn't support groff escape sequences like
> \(aq.

I think this patch goes too far in the retrograde direction.

\(xx, where xx is any two characters, is not a groff extension.  It
comes from Ossanna troff all the way back in the mid-1970s.

It is a special character escape sequence; a groff way of spelling it
is \[xxx] where xxx can be of any nonzero length (but cannot contain a
closing square bracket).

The repertoire of supported special character identifiers varies by
implementation and, after Kernighan's rewrite of troff circa 1980 for
device-independence, by output device.  Nevertheless, for
portability/backward compatibility, a set of them is very widely
supported.  These include three that your patch takes out, \(ha, \(ga,
and \(ti.  Replacing these with ASCII characters will _not_ produce
correct typography on typesetting output devices.
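To make the substitution concrete, here is a minimal sketch in Python. The helper `roff_escape` and its `groff` flag are hypothetical and for illustration only; they are not part of groff or TZDB. The sketch rewrites a run of ASCII punctuation into roff input using the special character escapes named above. It assumes \(aq is available only under groff, matching the `.ie \n(.g` branch of the snippet later in this message, and that the target formatter supports \(ha, \(ga, and \(ti.

```python
# Hypothetical helper, for illustration only: not part of groff or TZDB.
# Maps ASCII characters that would typeset as the wrong glyph (or, for the
# backslash, that roff would interpret as an escape) to roff escapes.
PORTABLE_ESCAPES = {
    "\\": r"\e",   # print the escape character (normally a backslash)
    "^": r"\(ha",  # ASCII circumflex ("hat"), not a circumflex accent
    "`": r"\(ga",  # grave accent
    "~": r"\(ti",  # ASCII tilde, not a tilde accent
}


def roff_escape(text: str, groff: bool) -> str:
    """Return text rewritten for roff input.

    If groff is true, the neutral apostrophe is spelled \\(aq, which
    traditional troff lacks; otherwise it is left as a plain "'".
    """
    out = []
    for ch in text:
        if ch == "'" and groff:
            out.append(r"\(aq")
        else:
            # Characters without an entry pass through unchanged.
            out.append(PORTABLE_ESCAPES.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    reserved = "!$%&'()*,/:;<=>?@[\\]^`{|}~"
    print(roff_escape(reserved, groff=True))
    print(roff_escape(reserved, groff=False))
```

Applied to the zic.8 character set with groff=True, it yields the string used in the first branch of the snippet; with groff=False, the apostrophe stays a plain ' as in the other two branches.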

I would attach scans of Tables I and II from "NROFF/TROFF User's
Manual", the version dated 1976, published with Volume 2 of the Unix
Programmer's Manual (1979), and reprinted by Holt, Rinehart and Winston
in 1983, but the linux-man list rejects all attachments bigger than a
breadbox, so I will ask for your trust (or you can ask me for them privately).

Those tables illustrate the glyph repertoire of Ossanna troff and the
special character identifiers that were implemented.

groff_char(7) from groff 1.22.4 and earlier marks the special character
identifiers you can expect to be portable (with "***" in its listings),
and for 1.23 I have added a "History" section to the page which
addresses most of the thousand questions I've asked over the past few
years while trying to learn this stuff.  I'll put that in a footnote.[1]

> To fix this I installed the equivalent of the attached further patch to
> TZDB.

I therefore propose the following snippet instead, also taking into
account Solaris 10 troff's poor handling of unsupported font selections
in nroff.

.q + .
To allow for future extensions,
an unquoted name should not contain characters from the set
.ie \n(.g .q \f(CR!$%&\(aq()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP .
.el .ie t .q \f(CW!$%&'()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti\fP .
.    el   .q !$%&'()*,/:;<=>?@[\e]\(ha\(ga{|}\(ti .
.TP
.B FROM
Gives the first year in which the rule applies.

What do you think?

Regards,
Branden

[1] (Much UTF-8 follows.)

History
    A consideration of the typefaces originally available to AT&T nroff
    and troff illuminates many conventions that one might regard as
    idiosyncratic fifty years afterward.  (See section “History” of
    roff(7) for more context.)  The face used by the Teletype Model 37
    terminals of the Murray Hill Unix Room was based on ASCII, but
    assigned multiple meanings to several code points, as suggested by
    that standard.  Decimal 34 (") served as a dieresis accent and
    neutral double quotation mark; decimal 39 (') as an acute accent,
    apostrophe, and closing (right) single quotation mark; decimal 45
    (-) as a hyphen and a minus sign; decimal 94 (^) as a circumflex
    accent and caret; decimal 96 (`) as a grave accent and opening
    (left) single quotation mark; and decimal 126 (~) as a tilde accent
    and (with a half‐line motion) swung dash.  The Model 37 bore an
    optional extended character set offering upright Greek letters and
    several mathematical symbols; these were documented as early as the
    kbd(VII) man page of the (First Edition) Unix Programmer’s Manual.

    At the time Graphic Systems delivered the C/A/T phototypesetter to
    AT&T, the ASCII character set was not considered a standard basis
    for a glyph repertoire by traditional typographers.  In the stock
    Times roman, italic, and bold styles available, several ASCII
    characters were not present at all, nor was most of the Teletype’s
    extended character set.  AT&T commissioned a “special” font to
    ensure no loss of repertoire.

    A representation of the coverage of the C/A/T’s text fonts follows.
    The glyph resembling an underscore is a baseline rule, and that
    resembling a vertical line is a box rule.  In italics, the box rule
    was not slanted.  We also observe that the hyphen and minus sign
    were already “de‐unified” by the fonts provided; a decision whither
    to map an input “-” therefore had to be taken.

           ┌────────────────────────────────────────────────────┐
           │A B C D E F G H I J K L M N O P Q R S T U V W X Y Z │
           │a b c d e f g h i j k l m n o p q r s t u v w x y z │
           │0 1 2 3 4 5 6 7 8 9 fi fl ffi ffl                   │
           │! $ % & ( ) ‘ ’ * + - . , / : ; = ? [ ] │           │
           │• □ — ‐ _ ¼ ½ ¾ ° † ′ ¢ ® ©                         │
           └────────────────────────────────────────────────────┘

    The special font supplied the missing ASCII and Teletype extended
    glyphs, among several others.  The plus, minus, and equals signs
    appeared in the special font despite availability in text fonts “to
    insulate the appearance of equations from the choice of standard
    [read: text] fonts”—a priority since troff was turned to the task of
    mathematical typesetting as soon as it was developed.

    We note that AT&T took the opportunity to de‐unify the
    apostrophe/right single quotation mark from the acute accent (a
    choice ISO later duplicated in its 8859 series of standards).  A
    slash intended to be mirror‐symmetric with the backslash was also
    included, as was the Bell System logo; we do not attempt to depict
    the latter.

        ┌──────────────────────────────────────────────────────────┐
        │α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ σ ς τ υ ϕ χ ψ ω         │
        │Γ Δ Θ Λ Ξ Π Σ Υ Φ Ψ Ω                                     │
        │" ´ \ ^ _ ` ~ / < > { } # @ + − = ∗                       │
        │≥ ≤ ≡ ≈ ∼ ≠ ↑ ↓ ← → × ÷ ± ∞ ∂ ∇ ¬ ∫ ∝ √ ‾ ∪ ∩ ⊂ ⊃ ⊆ ⊇ ∅ ∈ │
        │§ ‡ ☜ ☞ | ○ ⎧ ⎩ ⎫ ⎭ ⎨ ⎬ ⎪ ⌊ ⌋ ⌈ ⌉                         │
        └──────────────────────────────────────────────────────────┘

    One ASCII character as rendered by the Model 37 was apparently
    abandoned.  That device printed decimal 124 (|) as a broken vertical
    line, like Unicode U+00A6 (¦).  No equivalent was available on the
    C/A/T; the box rule \[br], brace vertical extension \[bv], and “or”
    operator \[or] were used as contextually appropriate.

    Devices supported by AT&T device‐independent troff exhibited some
    differences in glyph detail.  For example, on the Autologic APS‐5
    phototypesetter, the square \(sq became filled in the Times bold
    face.

[In the formatted page, the lowercase Greek letters in the last boxed
table above render in italics where feasible; italics are not possible
in a plain-text email.]