[tz] proposed changes for Win32 and a improved mktime() algorithm

Tue May 9 22:14:39 UTC 2017

    Date:        Tue, 9 May 2017 14:44:33 +0000
    From:        Kees Dekker <Kees.Dekker at infor.com>
    Message-ID:  <DM5PR02MB2809951912FF7B2DE1A24AD5E9EF0 at DM5PR02MB2809.namprd02.prod.outlook.com>

  | 2.       Adding a faster (and more efficient) algorithm for mktime().
  |	     Instead of the binary search another approach is used,

I think that kind of thing is fine for a system's implementation of
mktime(), but I don't think it is best for the reference implementation.

The big advantage of the binary search algorithm, is that it moves *all*
assumptions about the rules about the conversions into localtime, so
they exist only in one place, and need to be updated only in one place.

For example, ...
+		lDays	+= 1;			/* year 0 is a leap year */

Is it indeed?   Is there even a year 0?   For sure, in the current
implementation, we extend the Gregorian calendar way back into pre-history,
and assume that years all had 365.25 days, and days all had 86400
seconds, even when we know that neither was true ... we extend that
assumption back before the earth was formed, when there was nothing
rotating and revolving around the sun, back even before there was a
sun to rotate around, as if that all makes some kind of sense - which
it doesn't.

While fixing this in a way that would have made contemporary sense is
hard, if not impossible, we could at least do better with pre-Gregorian
human era dates - we could have the timezone data include the date/time
at which the Gregorian calendar came into use in a particular zone, and
some indication of what algorithm preceded it, and when that one came into
being, etc, and for times before we have any idea what times were called
we could just adopt stardates ...

If we do any of that, it would be nice to have to write the code just once,
in localtime(), and not have to also invent its inverse in mktime().

  | 3.       Adding a possibility not to use putenv().

That is certainly a worthwhile goal, though I would prefer to solve it
by introducing a variant of tzset() which takes a timezone spec (anything
which can be a value of getenv("TZ") as an arg, and returns a void *
(pointer to an internal tzlib data struct) (it could be given some other
type name, as long as internal details of its representation are not
available) and then a parallel set of functions to localtime, mktime,
etc which take this additional pointer as an extra arg, and use that
zone.

This allows parallel use of many different zones, without needing to be
constantly switching between them, with all the costs that  involves.

We could perhaps call the tzset() variant, tzalloc() the data type it
returns (rather than void *) a timezone_t, and add tzfree() to go with it
of course, and the parallel functions could be localtime_rz() ctime_rz()
mktime_z() ...

And just maybe, all of this is already implemented and available...

If the reason for doing it the other way is to avoid needing to change
some application's localtime() (etc) calls, that can easily be handled
by macros...

#define tzset()			(_my_tz = tzalloc(getenv("TZ")))
#define	localtime(t)		(localtime_rz(_my_tz, (t), &_my_static_tm))
/* ... */

  | 4.       Solved some platform (e.g. HPUX) specific errors/warnings
  | (initializing + some fixed char buffers).

Assuming that this is the reason for ...

-	static const char	wday_name[][3] = {
+	static const char	wday_name[][4] = {

Then IMO that system is just broken, it has always been legal to provide
a string where the non-null chars exactly fill the space provided, in
which case the terminating \0 is simply omitted from the initialisation.

If some system has broken that, it should be made to suffer, we should
not be working around it.

  | I'm not sure whether the way how I provide this patch is the
  | recommended way.

It is not me who has to deal with it, but the patch format itself looked
fine to me.

I would suggest re-formatting the code into the style of the existing
reference implementation, even if it is not your (or yor system's) preferred
coding (and naming) style, if you expect the patches to be adopted though.

A couple of the changes perplex me though (minor issues, but ...)

Why:

-# ifdef USG_COMPAT
+# if defined(USG_COMPAT)

When I first saw that, I assumed perhaps some system had deleted support for
the (very old now, and no longer really needed) #ifdef, in which case that
change would make sense, but then just after there is ...

+#ifdef _MSC_VER

where a new #ifdef is added, so that leaves the reason for the earlier
change unexplained?

+/* a lot of stuff does not exist on Windows/Visual Studio */
+#define DEFAULT_FOR_UNIX (!_MSC_VER)
+
 #ifndef HAVE_LINK
-#define HAVE_LINK		1
+#define HAVE_LINK		DEFAULT_FOR_UNIX

This kind of change should just be handled by pre-defining the  HAVE_XXX
symbols (with 0 values), that's why the #ifndef is there.

kre