draft C9x <time.h> (followup to Clive Feather's 18 Jun comments)

Wed Jun 24 20:07:48 UTC 1998

   Date: Thu, 18 Jun 1998 17:32:40 +0100
   From: "Clive D.W. Feather" <clive at on-the-train.demon.co.uk>

   This explains, I hope, why there are no reentrant versions of the
   functions in <time.h> - no-one expressed any desire to have them
   added (or if they did, they didn't do anything about it).

It's fair to say that the C9x deliberations on <time.h> have not been
publicized well outside the committee.  The public C9x draft was the
first I'd heard of it, and I try to follow time zone programming
issues fairly closely.  On this particular issue, there is
considerable expertise outside the committee, and I hope that the
committee will be open to careful criticisms by outside experts.

There is a widely recognized need for ``reentrant'' versions of
localtime, gmtime, etc.  The need is so widely recognized that these
functions are now in POSIX.1.  It would be a shame for C9x to omit
them -- there's nothing POSIX-specific about them.

(I quoted ``reentrant'' above because the POSIX.1 functions aren't
really reentrant if you are changing locales or time zones on the fly,
but that's a larger subject and I presume that C9x won't be able to
address it.)
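
As a minimal sketch of what the POSIX.1 interface buys, the following
uses gmtime_r, which writes into a caller-supplied struct tm instead of
shared static storage, so two results can be held at once.  The helper
name same_utc_day is hypothetical, and a POSIX host is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: returns 1 if a and b fall on the same UTC
   calendar day, 0 if not, and -1 if either conversion fails.
   Each gmtime_r call fills its own struct tm, so the second call
   cannot clobber the result of the first. */
int same_utc_day(time_t a, time_t b)
{
    struct tm ta, tb;

    if (!gmtime_r(&a, &ta) || !gmtime_r(&b, &tb))
        return -1;
    return ta.tm_year == tb.tm_year && ta.tm_yday == tb.tm_yday;
}
```

With plain gmtime, the two calls would return pointers to the same
static object, and the comparison would be meaningless unless the
first result were copied out by hand.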

   7.16.1 para 3:

   Replacing tm_zone with tm_gmtoff (except that it ought to be called
   tm_utcoff)

True, but (alas) the name `tm_gmtoff' is in common use, and it's
consistent with the name `gmtime'.

   The tm_ext and tm_extlen members are not a kludge, but rather an anti-
   kludge! ...  WG14 felt the adopted proposal was optimal.

I'm afraid that the only opinions that I've seen (which are not from
WG14) are that the tm_ext and tm_extlen members are a kludge.  They
don't reflect existing practice, and there's no precedent for them in
other parts of the standard.  They are an experiment invented by the
committee, and it's quite likely that they won't work well in
practice.

Let me give an example problem that might arise with these new
members.  I assume that the implementation can allocate storage and
assign its address to tm_ext; surely this is part of the point of
tm_ext.  Can it later free that storage?  If so, then how can an
application ever copy struct tmx values returned by the
implementation, since their tm_ext members might be invalid?  If not,
then won't there be problems either with garbage collection or reused
static storage?

The storage allocation problem becomes worse if we have reentrant
functions like localtime_r, since then the static storage solution is
not even feasible and we are forced to deal with garbage collection
issues.

For this reason alone, tm_ext should go.
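
The copying problem can be sketched as follows.  Everything here is
hypothetical: struct tmx_sketch is a cut-down stand-in for the draft's
struct tmx, and sketch_fill and sketch_release merely imitate an
implementation that allocates and later frees the tm_ext storage.  The
draft does not say that implementations behave this way; the point is
only that it does not rule it out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-down stand-in for the draft's struct tmx. */
struct tmx_sketch {
    int tm_hour, tm_min, tm_sec;
    void *tm_ext;        /* storage owned by the "implementation" */
    size_t tm_extlen;
};

/* Imitates an implementation that allocates extension storage. */
int sketch_fill(struct tmx_sketch *t)
{
    t->tm_hour = t->tm_min = t->tm_sec = 0;
    t->tm_extlen = 16;
    t->tm_ext = calloc(1, t->tm_extlen);
    return t->tm_ext ? 0 : -1;
}

/* Imitates an implementation that later reclaims that storage. */
void sketch_release(struct tmx_sketch *t)
{
    free(t->tm_ext);
    t->tm_ext = NULL;
    t->tm_extlen = 0;
}

/* Returns 1 if an ordinary struct assignment leaves the copy sharing
   the original's tm_ext storage, -1 if allocation fails. */
int copy_aliases_ext(void)
{
    struct tmx_sketch a, b;
    int aliased;

    if (sketch_fill(&a) != 0)
        return -1;
    b = a;                              /* shallow copy */
    aliased = (b.tm_ext == a.tm_ext);
    sketch_release(&a);
    /* b.tm_ext now points at freed storage and must not be used. */
    return aliased;
}
```

An application has no portable way to make a deep copy, since it does
not know the layout of whatever tm_ext points to.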

You mentioned that the committee considered other possible approaches;
what were they, and what objections did the committee have to them?
Perhaps I can suggest improvements on them that would overcome the
objections.

   7.16.2.3 para 4:

   I don't understand your objection. The paragraph you cite means, in
   effect:

       the normalization process shall not alter a broken-down time
       that was generated by the normalization process

But the standard also requires that the second call to mktime must
return the same time_t value as the first call.  That is what makes
the requirement unrealistic.

I see that I should have given more details about the problem (and I
apologize for not being clearer in my earlier comments).  Here's an
example.  Suppose I am in Sri Lanka, and invoke mktime on the
equivalent of 1996-10-26 00:15:00 with tm_isdst==0.  There are two
distinct valid time_t values for this input, since Sri Lanka moved the
clock back from 00:30 to 00:00 that day, permanently.  There is no way
to disambiguate these two time_t values with tm_isdst, since both
times are standard time.  Therefore, mktime should be entitled to
return either time_t value.

On examples like these, at least one mktime implementation (mine :-)
can return different time_t values for the same input at different
times during the execution of the program, since it uses a cache to
improve performance.  It's unreasonable for the standard to disallow
this performance improvement.
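
The ambiguity can be sketched with a toy model of the Sri Lanka
transition.  The names and constants below are assumptions made for
illustration: the offset before the change is taken to be +0630
(inferred from the clock moving back 30 minutes to +0600), which puts
the switch at 1996-10-25 18:00:00 UTC, and time is counted in POSIX
seconds without leap seconds.

```c
/* Assumed model of the transition; see the caveats above. */
#define SWITCH_UTC 846266400LL  /* 1996-10-25 18:00:00 UTC */
#define OFF_BEFORE 23400LL      /* +0630, in seconds */
#define OFF_AFTER  21600LL      /* +0600, in seconds */

/* `local` is a local clock reading expressed as seconds since the
   epoch as if it were UTC.  Stores every UTC instant that displays
   as `local` into out[] and returns how many there are (0, 1 or 2). */
int local_to_utc(long long local, long long out[2])
{
    int n = 0;
    long long t;

    t = local - OFF_BEFORE;     /* candidate under the old offset */
    if (t < SWITCH_UTC)
        out[n++] = t;
    t = local - OFF_AFTER;      /* candidate under the new offset */
    if (t >= SWITCH_UTC)
        out[n++] = t;
    return n;
}
```

For 1996-10-26 00:15:00 local time the function finds two instants,
1800 seconds apart, and both are standard time, so tm_isdst cannot
choose between them.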

   To normalize a struct tm, the implementation should do the equivalent of:
   - copy it to a struct tmx
   - set the additional fields and tm_isdst as described
   - normalize the result
   - copy the relevant fields back, except for tm_isdst which is just set
     to negative, zero, or positive as needed.

But the last step of this process loses information, and the next
invocation of mktime cannot reasonably be expected to intuit the lost
information, as shown in the example above.

   7.16.2.4 para 3:

   If tm_isdst is negative the zone information is not available, so the
   implementation should assume that "local time" has allowed for any DST
   in effect - in this case, 7.16.2.6 will set X2 to 0 in the algorithm.

Thanks for the clarification.  Perhaps you could add some text to the
standard about this?

   7.16.2.6 para 1:

   A negative tm_isdst, in both struct tm and struct tmx, means
   "unavailable". If POSIX.1 tries to give it a different meaning, POSIX.1
   is broken.

I wasn't referring to tm_isdst's value when I wrote ``negative
daylight-saving time''; I was referring to a daylight-saving UTC
offset that is less than the standard time UTC offset.  I don't know
that this has ever happened in practice (I've often heard rumors about
it but none of them have ever checked out), but I'm leery that draft
C9x disallows it, since POSIX.1 does allow it.

   Note that you never need to handle negative DST: take all the
   UTC offsets that the locale uses, and make the most negative the base
   time with all others being DST (in other words, replace negative
   "summer" time with positive "winter" time).

But then tm_isdst can be nonzero even when daylight-saving time is not
in effect.  E.g. under your proposal, tm_isdst should be nonzero now
in Sri Lanka (even though Sri Lanka does not now observe
daylight-saving time), because Sri Lanka's UTC offset now (+0600) is
greater than some UTC offset that it has had in the past (e.g. +0520).

If this is what is really meant by struct tmx's tm_isdst member, then
the member's name is misleading.  Its name should be
`tm_offset_from_most_negative_historical_gmtoff' or something like
that.  But frankly, I don't see how such a member's contents would be
useful in practice, and I think users should avoid it entirely.

   7.16.2.6 para 2:

   These limits were chosen to allow calculations to be done in longs
   without having to make excessive effort to avoid overflow.

Even with no limits, a calculation shouldn't need excessive effort; it
should need only a relatively small sanity check near the end.

   Your statement that you can't calculate today's date is wrong:

       tm_mday = time_t_now / 86400;
       tm_sec  = time_t_now % 86400;

My Unix host has leap second support, so that method doesn't work.

   However, I do understand your concern. I have no objection in principle
   to changing these limits....

Good; let's remove them.

   7.16.2.6 para 3:

   (1) The algorithm assumes that "day" means 86400 seconds at all
   times

This behavior is not reasonable in the presence of leap seconds.  It's
not what users will expect or want.  If I'm writing an accounting
application and ask for 1 day after midnight December 31, I don't want
mktime to return December 31 merely because that day happens to have a
leap second!

Also, this behavior is not consistent with the other well-established
properties of mktime.  If I am at the start of a month and add 1
month, mktime returns the start of the next month, regardless of how
many days are in the current month.  Similarly, if I am at the start
of a minute and add 1 minute, mktime should return the start of the
next minute, regardless of how many seconds are in a minute.
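
The month behavior referred to above can be seen with standard mktime.
This is a small sketch; the helper name add_months is hypothetical.

```c
#include <time.h>

/* Hypothetical helper: add `months` to a broken-down local time and
   let mktime normalize the result.  Returns 0 on success, -1 if
   mktime cannot represent the time. */
int add_months(struct tm *tm, int months)
{
    tm->tm_mon += months;
    tm->tm_isdst = -1;          /* let mktime decide about DST */
    return mktime(tm) == (time_t)-1 ? -1 : 0;
}
```

Starting from noon on January 1, adding one month yields February 1,
and adding one more yields March 1, even though the two intervals
differ in length (31 days and 28 days in 1998).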

   If you can construct an alternative algorithm I'll be pleased to
   look at it.

There's no simple answer to this (which is why I don't think it ought
to be in the standard).  But since you asked, one way to do it is to
normalize seconds-per-minute in a way similar to days-per-month.  This
is done in my free version of mktime.  You can get a copy of a recent
version from the latest GNU Emacs source code
(<ftp://ftp.gnu.org/pub/gnu/emacs-20.2.tar.gz>; see the file
src/mktime.c) or from recent GNU/Linux C library sources.

   (2) S and D *are* determined - while they are implementation-defined in
   some circumstances (those when X1 and X2 are used), the implementation
   should only be able to pick one value for any given unambiguous input
   and environment.

In order to choose X1 and X2, the implementation (in general) must
know S and/or D.  For example, how can the implementation choose the
number of leap seconds to insert (X1) until it knows which day we're
talking about (D)?  So the definition looks circular to me.

It sounds like you're trying to break the circularity by saying that
the implementation consults an oracle to choose X1 and X2.  But in
that case, more explanation is needed.

For more about this please see (5) below.

   (3) It is C code (as should be clear from the typeface). There is no
   possibility of overflow if the limits of paragraph 2 are kept to,

Yes there is.  For example, suppose tm_hour == INT_MAX && INT_MAX ==
32767.  Then tm_hour*3600 overflows, even though tm_hour satisfies the
limits of paragraph 2.  This is just the first example I found; there
are others.
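
The overflow can be checked without needing a 16-bit host.  In this
sketch the helper name hours_to_seconds is hypothetical, and the
int_max parameter stands in for the INT_MAX of the implementation
being imagined (it must not exceed the real INT_MAX).

```c
/* Hypothetical helper: computes hours * 3600 only if the product
   fits in an int whose largest value is int_max.  Returns 1 and
   stores the product in *out on success; returns 0 if the
   multiplication would overflow such an int. */
int hours_to_seconds(int hours, int int_max, int *out)
{
    int limit = int_max / 3600;     /* largest safe magnitude */

    if (hours > limit || hours < -limit)
        return 0;
    *out = hours * 3600;
    return 1;
}
```

With int_max set to 32767, the largest safe value of tm_hour is 9, so
tm_hour == 32767 is far outside what the draft's C expression can
evaluate in int arithmetic.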

Also, if you remove the limits (as suggested by the comments to
7.16.2.6 para 2 above), then the overflow problem becomes worse if you
assume the spec is written in C.

If there is to be a detailed spec like this at all (which I'm not yet
convinced of, considering its problems), I suspect that it will be
much easier to write it using mathematical arithmetic than using C
arithmetic; and if done well, it won't be any harder to implement.

   and there are no promotions or conversions happening as far as I
   can tell.

The pseudocode doesn't declare the types of SS, M, Y, Z, D, or S.
However, I presume that `int' won't suffice due to potential overflow
problems like the one discussed above; and in that case, there will be
promotions or conversions.

(This problem is another argument for using mathematical notation
instead of C here.)

   (5) I wrestled with this problem myself and failed to come up with
   a good answer....  can you find a better way of expressing it ?

Sorry, I can't think of a simple change that will correct the problem.
The only avenue that I can think of is that the section could be
reformulated as a _constraint_ on S and D, not as a way of
_determining_ S and D.  That is, S and D would not be uniquely
determined by the inputs.  But this will require some real thought.

If time is pressing, I suggest removing this section completely.  If
that's not acceptable, perhaps you can put in some vague English that
expresses the intent.

   (6) Yes, an error does seem to have crept into the definition of D. The
   first line should read:

       D = Y * 365 + QUOT(Z,400) * 97 + REM(Z,400) / 4 - REM(Z,400) / 100 +

This response doesn't address my criticism that the definition of D is
unmotivated.  You wrote in response to (5) that ``Effectively D and S
are the TAI date and time since some epoch,'' but you didn't say what
the epoch is, nor did you say what ``effectively'' means.  Please add
comments so that it's absolutely clear what D and S are, so that other
people can check your work.  My comments on this section proposed a
(slightly different) definition for D in which the epoch is
0000-03-01; this change doesn't need to be adopted as-is, but whatever
definition _is_ adopted ought to be explained.
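
To show the kind of commented definition being asked for, here is a
sketch of a day count with an explicitly stated epoch of 0000-03-01 in
the proleptic Gregorian calendar.  It is a well-known formulation, not
the draft's expression for D and not necessarily the definition
proposed in the earlier comments; the function names are hypothetical,
and month is assumed to be in the range 1 to 12.

```c
/* Division rounding toward negative infinity; b must be positive. */
static long long floor_div(long long a, long long b)
{
    return a / b - (a % b < 0);
}

/* Days from 0000-03-01 to year-month-day (proleptic Gregorian). */
long long days_since_0000_03_01(long long year, int month, int day)
{
    long long whole_years;
    int day_of_year;

    /* Count years from March, so that the leap day is the last day
       of the counting year: January and February become months 13
       and 14 of the previous year. */
    if (month <= 2) {
        year -= 1;
        month += 12;
    }
    /* 365 days per year plus one leap day for every 4th year,
       except every 100th, but including every 400th. */
    whole_years = 365 * year + floor_div(year, 4)
                - floor_div(year, 100) + floor_div(year, 400);
    /* Days before this month, counting from March 1, plus the
       zero-based day of the month. */
    day_of_year = (153 * (month - 3) + 2) / 5 + (day - 1);
    return whole_years + day_of_year;
}
```

With the epoch and each term spelled out, a reader can check the
result against known dates; for example, 1970-01-01 is day 719468.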

I currently have the impression that nobody other than yourself has
ever completely understood the definition as it now stands.  This is
an unsatisfactory state of affairs.

   The rest of the expression is correct (including the Y at the start).

Why is it Y and not Z?

   7.16.3.5:

   The zonetime function is the struct tmx version of localtime and gmtime,
   but instead of offering only 2 choices of zone it offers any zone.

No, zonetime doesn't offer _any_ time zone; it offers only local times
that can be characterized by a single, invariant UTC offset.  In
practice, local times usually cannot be characterized this way,
because they involve daylight-saving time, or historical changes to
the underlying standard UTC offset, or both.

For example, I can't use zonetime to determine the broken-down time
for `*timer' in London's time zone, because London observes
daylight-saving time (and also because London's standard UTC offset
has not always been zero).  This means that `zonetime' is of only
limited use in practical applications.  It hardly seems worth adding
to the standard.
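
The limitation can be sketched without the draft's zonetime at all.
The helper below is hypothetical: it converts with one fixed UTC
offset by shifting the time_t value and calling the POSIX.1 gmtime_r,
which assumes that time_t counts seconds.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: hour of day at a single fixed UTC offset,
   or -1 if the conversion fails. */
int hour_at_fixed_offset(time_t t, long offset_seconds)
{
    struct tm tm;
    time_t shifted = t + offset_seconds;

    if (!gmtime_r(&shifted, &tm))
        return -1;
    return tm.tm_hour;
}
```

For London, 12:00 UTC on 1998-01-15 (884865600) needs an offset of 0,
while 12:00 UTC on 1998-07-15 (900504000) needs 3600 to give the
correct local hour of 13.  The caller must already know which offset
applies to each instant, which is exactly the knowledge a time zone
interface is supposed to supply.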

   7.16.1 para 2:

   The limits of 14400 are correct. This allows you to adjust by up to 10
   days (not 1 day) in an unnormalized time without risking accidentally
   using _LOCALTIME. Ideally _LOCALTIME would be something like INT_MAX or
   LONG_MIN.

OK; but please add a footnote to this effect, as it's confusing otherwise.

   7.16.2.6 para 3:

   The macros can't hit overflow with the limits of paragraph 2.

Yes they can, because those limits are in terms of LONG_MAX, but
sometimes the arguments of the macros are int, not long.  (Also, as
mentioned earlier, if the limits are removed, the macros can overflow
even if int and long are the same size; but all this is irrelevant if
we're using mathematical notation, not C.)

Finally, I didn't see any response to my comment (quoted below) for
section 7.16.3.6 paragraph 5; did I miss something?

	``If this value is outside the normal range, the characters stored
	are unspecified.''  What is the ``normal range''?  The range as
	output by localtime, the range of the Gregorian calendar, or
	the limits as specified in 7.16.2.6?