big-picture comments on your proposed extensions to ISO C <time.h>

Tue Sep 29 19:56:18 UTC 1998

(I hope you don't mind, Paul, if I reply on tz.)

Paul Eggert wrote on 1998-09-29 01:34 UTC:
> Here are some big-picture comments on your proposed extensions to ISO
> C <time.h>.  I've hacked up a copy of your HTML proposal along these
> lines, if you're interested in reading it.  (I still don't have a
> revised rationale, though.)
> 
> It's very hard work to come up with a good spec!  After trying my hand
> at it, I respect the work you've done so far; the following comments
> are in the spirit of improving it to be as useful as possible.
> 
> I have some more detailed comments once you've had time to digest
> these bigger-picture comments, but first things first.....
> 
> * Terminology proposal: let's rename `xtime' to `stime' (short for
>   ``standard C time''), uniformly.  The rest of my discussion assumes
>   this new terminology.

Hm, see my separate posting on the many different renaming proposals
that I received.

> * struct stime is inconvenient for programmers.  It's hard to portably
>   subtract such values, much less do other arithmetic on them.  Even
>   comparing them is error-prone.  This structure's members are derived
>   from a similar structure in POSIX.1, but the POSIX.1 structure was
>   predicated on not having integer types longer than 32 bits.
>   
>   Instead, let's just use a signed integer count of the number of time
>   intervals since the epoch. We can define a type stime_t for this
>   integer, and a stime_t macro STIMES_PER_SEC giving the number of
>   time intervals per second.  This makes time arithmetic much, much
>   easier.

I am afraid I strongly disagree here. I think my current approach
is more robust, functional, and therefore preferable. The reason: Most C
9X implementations will not provide any type with more than 64 bits. If
we take a 64-bit type, then you can have only nanosecond resolution over
an interval of 58 years. This is unacceptable; even the notorious Y2K
COBOL programs fail only after 99 years. Note that this is independent
of whether you use a 64-bit int or a 64-bit float type: 64 bits are just
not enough for a high-resolution timestamp.

About the "convenience" argument in general (and I'll come back to this
for Antoine's xtime_get return value critique): Compared to modern
programming languages with their variable length arrays, exceptions,
garbage collectors, controlled name spaces, etc. C is and will always be
a comparatively simple and quite inconvenient early-1970s language. If
you want convenience, look for another language than C. The interesting
thing about C these days is mostly that it is ubiquitously implemented.
Most programming environments for more modern and comfortable languages
are built on top of the existing C API. If you read the C++, Java, and
Ada95 standards, then it becomes embarrassingly obvious that the
abilities of their standard libraries have been limited by the existing
abilities of the C library. It is therefore extremely important to get
the functionality of the C API right. Comfort should certainly come
after functional criteria.

I do not understand why it should be difficult to portably add and
subtract struct xtime values. The new <stdint.h> offers reasonably good
support here. The algorithm is trivial: Before any arithmetic, you first
check whether there is a leap second in one of the arguments and abort
if there is (because arithmetic is ill-defined in this case). Then you
just add sec and nsec separately and adjust for the nsec overflow. Note
that non-leap nsec values do not overflow the nsec type, because 2 * 1e9
< 2**31. It all fits so nicely together with nanoseconds, I couldn't
think of a more appropriate representation. The POSIX.1b people
certainly made a good choice here.
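
As a minimal sketch of the algorithm just described (not code from the
proposal): the struct name xtime_sketch, the function names, and the
error convention are made up for illustration, and I assume that a
timestamp inside a leap second is marked by nsec >= 1e9 and that nsec is
never negative.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for struct xtime: seconds plus nanoseconds. */
struct xtime_sketch {
    int_fast64_t sec;
    int_fast32_t nsec;
};

/* Assumption: nsec in [1e9, 2e9) marks a timestamp inside a leap second. */
static int in_leap_second(struct xtime_sketch t)
{
    return t.nsec >= NSEC_PER_SEC;
}

/* Stores a + b in *sum and returns 0; returns -1 if either argument
   lies inside a leap second, where arithmetic is ill-defined. */
static int xtime_add(struct xtime_sketch a, struct xtime_sketch b,
                     struct xtime_sketch *sum)
{
    if (in_leap_second(a) || in_leap_second(b))
        return -1;
    sum->sec = a.sec + b.sec;
    sum->nsec = a.nsec + b.nsec;      /* below 2e9, so it fits in 31 bits */
    if (sum->nsec >= NSEC_PER_SEC) {  /* carry into the seconds field */
        sum->nsec -= NSEC_PER_SEC;
        sum->sec += 1;
    }
    return 0;
}

/* Stores a - b in *diff and returns 0; same leap-second rule as above. */
static int xtime_sub(struct xtime_sketch a, struct xtime_sketch b,
                     struct xtime_sketch *diff)
{
    if (in_leap_second(a) || in_leap_second(b))
        return -1;
    diff->sec = a.sec - b.sec;
    diff->nsec = a.nsec - b.nsec;
    if (diff->nsec < 0) {             /* borrow from the seconds field */
        diff->nsec += NSEC_PER_SEC;
        diff->sec -= 1;
    }
    return 0;
}
```

What a caller does when -1 comes back (round to the previous second,
reject the input, and so on) is exactly the application-dependent
decision that cannot be made inside the library.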

In programming languages such as C++ or Ada where operators like "+" and
"-" can be overloaded, the authors of bindings to the xtime API will
certainly add the corresponding overloadings, and then xtime is as easy
to use as a float type. If you insist, we could add macros or functions
for struct xtime arithmetic, but I think it is better to let users do
this themselves to make sure they have thought about the leap second
aspects of arithmetic, which are certainly application dependent.
(Again, the lack of an exception mechanism in C prevents us from making
an interface robust and comfortable at the same time, so we stay on the
robust side and leave the comfort to languages with proper mechanisms
for it.)

How would you represent leap seconds in your stime_t arithmetic type?
This was, after all, besides the resolution concerns, the major reason
for getting rid of time_t and its idea of using an arithmetic type.

> * We should pass timestamps by value whenever possible, instead of by
>   reference.  Passing timestamps by reference makes the program more
>   error-prone and (these days) slows it down.  This is particularly
>   important for stime_get.

That is an interesting point. Actually, I have never done any
measurements on whether passing a 96-bit struct by reference is more
efficient than passing it by value with the usual C compilers. Does
anyone have any numbers on this for typical 32-bit processors?

For stime_get however this is not an option: the return value is
certainly required for comfortable error checking. Other programming
languages with exceptions could probably return the time value here, but
in C we have no exceptions and therefore (as xtime_get can fail), we
will have to use it in some form of if statement with an explicit
exception handler.
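
To illustrate the calling pattern, here is a sketch that uses the C11
function timespec_get as a stand-in, since it has the same shape (time
delivered through a pointer, status delivered as the int return value).
It is not xtime_get itself, and the wrapper name read_utc_clock is made
up for this example.

```c
#include <stdio.h>
#include <time.h>

/* Reads the UTC clock into *out. Returns 0 on success and -1 on
   failure; the failure branch is the explicit "exception handler". */
static int read_utc_clock(struct timespec *out)
{
    /* timespec_get returns the requested base on success, 0 on failure. */
    if (timespec_get(out, TIME_UTC) != TIME_UTC) {
        fprintf(stderr, "UTC clock not available\n");
        return -1;
    }
    return 0;
}
```

If the function returned the timestamp by value instead, the failure
would have to be signalled through some reserved timestamp value, which
is easy to forget to check.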

> * We should constrain the implementation to support times through a
>   reasonable upper bound (e.g. 9999 AD, or perhaps something a bit
>   sooner since we'll have to reform our calendar well before then).
>   But we shouldn't insist on astronomical timescales; that places
>   an undue burden on the implementation.

What would this burden be? We know that 32-bit is too small, and the
next best size is 64-bit words or with nanoseconds a 96-bit struct. The
last non-power-of-two machine that I worked on was dumped >10 years ago
and never had a C compiler. These are of academic interest. There is no
practical intermediate size between a useful range and a 64-bit second
counter.

I think it is much more of a burden for the application programmer in
the end to leave the time encoding and epoch undefined.

> * We should specify better what happens before the epoch for TIME_UTC
>   and TIME_TAI.

Actually, I would suggest that we specify the functionality of a

  xtime_make(&xtp, &tmptr, NULL);

call completely by providing an example of a correct implementation in
the standard (should be possible in less than 50 lines). Real code is
much clearer here than any pseudo-mathematical specification. I started
to add such example code to my web page a few days ago, but was
interrupted and have not yet gotten around to finishing it. Volunteers
are welcome (getting the leap year formula correct for negative years
in particular might be a bit of a brain teaser). Such example code
resolves many ambiguities and should also address your concern here. As
far as TIME_TAI is concerned, I
do not expect *ANY* implementation to support xtime_conv with pre-1972
TIME_TAI values. The other functions do not care about the difference
between TIME_TAI and TIME_UTC anyway, and the semantics of the TIME_UTC
clock relative to the Gregorian calendar will be specified in the
example code.
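
A rough sketch of what the core of such example code might look like
follows. It is not the text intended for the standard. The function
names are made up, and I assume an epoch of 1970-01-01 00:00:00, the
proleptic Gregorian calendar with astronomical year numbering (year 0 is
a leap year), no leap seconds in the count, and already-valid struct tm
fields. The trick for negative years is to use floor division rather
than C's truncating division.

```c
#include <stdint.h>
#include <time.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date.
   y is the astronomical year (may be <= 0), m is 1..12, d is 1..31. */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    int64_t era, yoe, doy, doe;

    if (m <= 2)
        y -= 1;                            /* count years from 1 March */
    era = (y >= 0 ? y : y - 399) / 400;    /* floor(y / 400) */
    yoe = y - era * 400;                   /* year of era, 0..399 */
    doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* 0..365 */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  /* day of era */
    return era * 146097 + doe - 719468;    /* 719468 = 0000-03-01 to epoch */
}

/* Seconds since the assumed epoch for a valid UTC broken-down time. */
static int64_t utc_seconds_from_tm(const struct tm *tm)
{
    int64_t days = days_from_civil((int64_t)tm->tm_year + 1900,
                                   tm->tm_mon + 1, tm->tm_mday);
    return days * 86400 + tm->tm_hour * 3600L + tm->tm_min * 60
           + tm->tm_sec;
}
```

Because every 400-year era has exactly 146097 days, dates before year 0
need no special cases once the era is computed with floor division.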

> * We shouldn't insist on a particular type like `int_fast64_t' for
>   timestamps.  I assume that this type was chosen to encourage
>   portability,

I chose it for range and efficiency, and fortunately C provides me with
exactly what I had in mind in int_fast64_t. I am not sure what type of
portability you are referring to.

>   but it's not truly portable, since int_fast64_t might
>   have more than 64 bits.

Please explain what the problem would be there. I don't see any.

>   Also, as mentioned above, it's way way
>   overkill for almost all applications.  It's better to keep the
>   timestamp type a bit more abstract, like stime_t, and to place
>   constraints on it as needed, as described above.

The abstractness of types like time_t in ISO C 89 was not introduced
because it was considered good and beautiful design. On the contrary,
it was a necessary hack because the ISO C standard had to be backwards
compatible with a few strange C implementations. In this
respect, ISO C 89 was a step backwards compared to K&R C. (Special
thanks to Nick Maclaren <nmm1 at cus.cam.ac.uk> for providing me with the
historic background on this.) To my big surprise, many people today
start to admire the hacky type mess that ISO C had to introduce because
of bad PC compilers as glorious and wonderful abstractness, having
completely forgotten the original reasons for all the awful
type-uncertainties of ISO C 89.

> * The stime_get interface is confusing, since it has several
>   operational functions.  It's better to separate them out into one C
>   function for each operational function; e.g. we should have a
>   separate function stime_getres to get the resolution, instead of
>   passing a flag to stime_get to have it return the resolution instead
>   of the current time.  Similarly, we should have a separate function
>   that gives us an error bound on the clock, rather than have a
>   TIME_SYNC flag.

I considered defining several functions, and then deliberately packed
all these functions into a single one. The main reason is that this
packing of functionality allows implementors to add more functionality
by just giving additional option and return bits a meaning, without
cluttering up the namespace in non-portable ways.

My proposed xtime_get interface allows you to add additional clocks to
be requested (UT1 is one example I mentioned in a note in the proposal),
or allows additional data to be requested (like is the clock coming in
over a trusted channel, when was the last connection to this clock,
estimated error interval size, etc.). I prefer to have a single more
universal function that provides a clear way of adding additional
functionality in a portable way over a set of functions that leave no
room for extensions. It is always easy to add another bit, but it is
difficult to add another function to an implementation in a
binary-compatible way (think of shared libraries and similar issues).
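
The following toy sketch shows the extension mechanism I have in mind.
It is not the proposed interface: all names, bit values, and the return
convention are invented for illustration, and the SK_ prefix is only
there to avoid clashing with anything in <time.h>. A request with a bit
the implementation does not know simply fails, so a portable program can
probe for an extension at run time without a new symbol in the library.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical request bits. */
#define SK_CLOCK_MASK  0x00ff  /* low bits select the clock */
#define SK_TIME_UTC    0x0001
#define SK_RESOLUTION  0x0100  /* option: report resolution, not time */
#define SK_KNOWN_BITS  (SK_CLOCK_MASK | SK_RESOLUTION)

struct xtime_sketch {
    int_fast64_t sec;
    int_fast32_t nsec;
};

/* Returns the request on success, 0 if any part of it is unsupported. */
static int xtime_get_sketch(struct xtime_sketch *out, int request)
{
    time_t now;

    if (request & ~SK_KNOWN_BITS)
        return 0;                /* option bit unknown to this version */
    if ((request & SK_CLOCK_MASK) != SK_TIME_UTC)
        return 0;                /* only one clock in this toy version */
    if (request & SK_RESOLUTION) {
        out->sec = 1;            /* time() ticks in whole seconds */
        out->nsec = 0;
        return request;
    }
    now = time(NULL);
    if (now == (time_t)-1)
        return 0;
    /* Assumption: time_t counts seconds since the epoch, as in POSIX. */
    out->sec = (int_fast64_t)now;
    out->nsec = 0;
    return request;
}
```

A later version could give meaning to, say, bit 0x0200 without changing
the function's signature or adding an entry point to a shared library.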

> * We should require at least POSIX-style time zone specifications for
>   tz_prep, and should suggest Olson-style.  As things stand, there's
>   little that portable code can do with that function.

Requiring POSIX-style time zone specifications as a minimum
functionality would be OK for me, but so would leaving the string
completely application-defined and just mentioning POSIX as one possible
implementation. I don't know whether the Olson-style ones are
standardizable unless we get some sort of official ISO registry for time
zone specifications. As I have pointed out before, I don't like the
continent prefix in the Olson-style names, and therefore I do not want
to make this particular syntax immortal in an ISO standard (I much
prefer just ":Paris" over "Europe/Paris").

Note that the tzstring will usually be directly entered by the user (say
via a config file or via an environment variable), and the portable
application does not have to be aware of the syntax used on this system.

> * We should codify the convention that getenv("TZ") returns the
>   user-preferred time zone.  A null tzstring should expand to a
>   system-defined time zone.  This functionality is in practice and is
>   useful e.g. in mailer software, where you may or may not want to
>   allow the user to specify the time zone.

OK, that sounds like a good idea.

> * stime_make still has the old mktime problem that the function can't
>   distinguish a request for ``3 days after Feb 28'' from a request for
>   ``1 month before Mar 31''.  It's silly that mktime thinks that 1
>   month before Mar 31 is Mar 3 (or 2, if it's a leap year).  We should
>   fix this.

I think I fixed this very nicely by not requiring xtime_make to handle
*any* invalid time representations. If you want 3 days after Feb 28,
then you just add 3 * 86400 to the sec field. The extremely ugly hacks
with mktime overflows and underflows have become obsolete by defining
the encoding of the time representation to allow direct arithmetic. It
is very difficult to define mktime overflow behavior nicely, so why
bother if there is no need?
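
As a small illustration of that arithmetic (the struct and function
names are again made up, and I assume that sec counts exactly 86400 per
calendar day, i.e. leap seconds are not counted):

```c
#include <stdint.h>

/* Hypothetical stand-in for struct xtime. */
struct xtime_sketch {
    int_fast64_t sec;
    int_fast32_t nsec;
};

/* Returns t shifted by a whole number of days (may be negative). */
static struct xtime_sketch xtime_add_days(struct xtime_sketch t, int days)
{
    t.sec += (int_fast64_t)days * 86400;
    return t;
}
```

For example, 1998-02-28 00:00:00 UTC is sec = 888624000 under these
assumptions, and adding three days gives 888883200, which is 1998-03-03
00:00:00 UTC. No normalization of out-of-range month or day fields is
involved at any point.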

> * The proposal for strfstime still involves magic; e.g. how can the
>   strftime easily determine the time zone abbreviation?

strfxtime gets the timezone object passed as a parameter. All
information can be stored in there and accessed directly. No need to
extend struct tm.

>   I'd rather
>   have a strftime that could in principle be written by the user; it's
>   common practice to write augmented strftime implementations, which
>   grind their teeth over questions like these, and it'd be better if
>   we let people write such functions cleanly.

If you need access to the zone name, then simply use strfxtime to access
it. I don't understand why it should be stored in any struct tm
extension if we have a function to read it.

>   This will involve a new
>   type `struct stm' that contains extra members so that no more magic
>   is needed.  Something like this:
> 
> 	struct stm {
> 	  stime_t year;
> 	  int month;
> 	  int month_day;
> 	  int hour;
> 	  int minute;
> 	  int second;
> 	  stime_t utc_offset;
> 	  stime_t dst_offset;
> 	  const char *zone;
> 	  const char *zone_description;
> 	};

No, please no additional structs. The existing one is good enough to
represent and handle a full broken-down time. All other information is
fully accessible via strfxtime. I consider the mixture of broken-down
time and timezone information conceptually dubious. For me a time zone
is a function that maps broken-down times onto UTC, and not just
auxiliary data on a broken-down time. The existing tm_isdst field seems
to me to be sufficient (although not optimal) to handle ambiguities.

> * We shouldn't have a separate error function just for stime. Instead,
>   functions that report errors should yield an error number, which can
>   be passed as an argument to strerror. This will simplify the interface
>   (e.g. we don't need to worry about LC_MESSAGES).

That's perhaps worth thinking about. It depends on how detailed you want
to make these error messages, and whether a single number can carry all
information. Timezone strings can be rather tricky to get right and a
comfortable diagnostic might be useful here.

Thanks for your comments.

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org,  home page: <http://www.cl.cam.ac.uk/~mgk25/>