Proposed 64-bit changes

Paul Eggert eggert at CS.UCLA.EDU
Mon Apr 25 18:57:44 UTC 2005


"Olson, Arthur David (NIH/NCI)" <olsona at dc37a.nci.nih.gov> writes:

> 	1. the transition times are 64 bits rather than 32 bits, doubling
> the size.
> 	2. About 400 years of transitions are recorded rather than about
> 100, quadrupling the size.
> The combination of the two considerations means that the new data takes about
> 8 times as much space as the old, and the total is about 9 times as much as
> the old.

Ah, thanks, that explains it.  I didn't know about (2).  How about
if we document this?  Here's a proposed patch to the Theory file
that explains this and also addresses some other issues that I
noticed when I reread that file:

  * Update references to POSIX, etc.

  * Document that the tz code does not yet support the quoted time
    zone abbreviation syntax required by POSIX starting in 2001.

  * Add an example of a POSIX TZ setting.
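
For anyone who wants to check the numbers, here is a rough sketch in
Python (not part of the patch).  The function names are made up for
illustration, and the size estimate counts only transition data: it
assumes the old 32-bit block is kept as-is and ignores headers,
abbreviations and leap-second records.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def time_t_range(bits):
    """Return the (earliest, latest) UTC instants of a signed `bits`-wide time stamp.

    Only useful for small widths such as 32: a 64-bit range is far
    outside what Python's datetime can represent.
    """
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return EPOCH + timedelta(seconds=lowest), EPOCH + timedelta(seconds=highest)

def size_ratio(width_factor=2, transition_factor=4):
    """Rough ratio of new file size to old, counting transition data only."""
    old_block = 1                                  # 32-bit data set, kept for compatibility
    new_block = width_factor * transition_factor   # added 64-bit data set
    return old_block + new_block

earliest, latest = time_t_range(32)
print(earliest.isoformat(), latest.isoformat())
print(size_ratio())
```

The first line printed should match the 32-bit range quoted in the
patch, and the second is the "about 9 times" figure: one part old
data plus eight parts new.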

--- Theory	2004/05/27 16:00:30	2004.1
+++ Theory	2005/04/25 18:55:19	2004.1.0.1
@@ -12,26 +12,31 @@
 
 ----- Time and date functions -----
 
-These time and date functions are upwards compatible with POSIX.1,
+These time and date functions are mostly upwards compatible with POSIX,
 an international standard for UNIX-like systems.
-As of this writing, the current edition of POSIX.1 is:
+As of this writing, the current edition of POSIX is:
 
-  Information technology --Portable Operating System Interface (POSIX (R))
-  -- Part 1: System Application Program Interface (API) [C Language]
-  ISO/IEC 9945-1:1996
-  ANSI/IEEE Std 1003.1, 1996 Edition
-  1996-07-12
+  Standard for Information technology
+  -- Portable Operating System Interface (POSIX (R))
+  -- System Interfaces
+  IEEE Std 1003.1, 2004 Edition
+  <http://www.opengroup.org/online-pubs?DOC=7999959899>
+  <http://www.opengroup.org/pubs/catalog/t041.htm>
+
+Currently the only POSIX feature not implemented is quoted time zone
+abbreviations, e.g., TZ='<UTC-10>10' for a time zone 10 hours behind
+UTC whose abbreviation is "UTC-10".
 
-POSIX.1 has the following properties and limitations.
+POSIX has the following properties and limitations.
 
-*	In POSIX.1, time display in a process is controlled by the
-	environment variable TZ.  Unfortunately, the POSIX.1 TZ string takes
+*	In POSIX, time display in a process is controlled by the
+	environment variable TZ.  Unfortunately, the POSIX TZ string takes
 	a form that is hard to describe and is error-prone in practice.
-	Also, POSIX.1 TZ strings can't deal with other (for example, Israeli)
+	Also, POSIX TZ strings can't deal with other (for example, Israeli)
 	daylight saving time rules, or situations where more than two
 	time zone abbreviations are used in an area.
 
-	The POSIX.1 TZ string takes the following form:
+	The POSIX TZ string takes the following form:
 
 		stdoffset[dst[offset],date[/time],date[/time]]
 
@@ -40,6 +45,9 @@ POSIX.1 has the following properties and
 	std and dst
 		are 3 or more characters specifying the standard
 		and daylight saving time (DST) zone names.
+		Starting with POSIX.1-2001, std and dst may also be
+		in a quoted form like "<UTC+10>"; this allows
+		"+" and "-" in the names.
 	offset
 		is of the form `[-]hh:[mm[:ss]]' and specifies the
 		offset west of UTC.  The default DST offset is one hour
@@ -61,15 +69,25 @@ POSIX.1 has the following properties and
 			where week 1 is the first week in which day d appears,
 			and `5' stands for the last week in which day d appears
 			(which may be either the 4th or 5th week).
+
+	Here is an example POSIX TZ string, for US Pacific time using rules
+	appropriate from 1987 through at least 2005:
 
-*	In POSIX.1, when a TZ value like "EST5EDT" is parsed,
-	typically the current US DST rules are used,
+		TZ='PST8PDT,M4.1.0/02:00,M10.5.0/02:00'
+
+	This POSIX TZ string is hard to remember, and mishandles time stamps
+	before 1987.  With this package you can use this instead:
+
+		TZ='America/Los_Angeles'
+
+*	POSIX does not define the exact meaning of TZ values like "EST5EDT".
+	Typically the current US DST rules are used to interpret such values,
 	but this means that the US DST rules are compiled into each program
 	that does time conversion.  This means that when US time conversion
 	rules change (as in the United States in 1987), all programs that
 	do time conversion must be recompiled to ensure proper results.
 
-*	In POSIX.1, there's no tamper-proof way for a process to learn the
+*	In POSIX, there's no tamper-proof way for a process to learn the
 	system's best idea of local wall clock.  (This is important for
 	applications that an administrator wants used only at certain times--
 	without regard to whether the user has fiddled the "TZ" environment
@@ -78,9 +96,9 @@ POSIX.1 has the following properties and
 	daylight saving time shifts--as might be required to limit phone
 	calls to off-peak hours.)
 
-*	POSIX.1 requires that systems ignore leap seconds.
+*	POSIX requires that systems ignore leap seconds.
 
-These are the extensions that have been made to the POSIX.1 functions:
+These are the extensions that have been made to the POSIX functions:
 
 *	The "TZ" environment variable is used in generating the name of a file
 	from which time zone information is read (or is interpreted a la
@@ -108,7 +126,7 @@ These are the extensions that have been 
 *	To handle places where more than two time zone abbreviations are used,
 	the functions "localtime" and "gmtime" set tzname[tmp->tm_isdst]
 	(where "tmp" is the value the function returns) to the time zone
-	abbreviation to be used.  This differs from POSIX.1, where the elements
+	abbreviation to be used.  This differs from POSIX, where the elements
 	of tzname are only changed as a result of calls to tzset.
 
 *	Since the "TZ" environment variable can now be used to control time
@@ -136,6 +154,18 @@ These are the extensions that have been 
 
 Points of interest to folks with other systems:
 
+*	In 2005 this package started generating time zone information files
+	containing two sets of data.  The first set uses 32-bit time stamps
+	and covers times from 1901-12-13 20:45:52 through 2038-01-19
+	03:14:07 UTC; it is for backward compatibility with older versions of
+	this and other libraries.  The second set uses 64-bit time stamps
+	and contains about 400 years of transition times, which are
+	extrapolated into the indefinite future; it is for newer libraries,
+	typically on hosts with 64-bit time stamps.  New files are
+	approximately nine times the size of the old, because the added data
+	set contains about four times as many transitions, and its time
+	stamps are twice as wide.
+
 *	This package is already part of many POSIX-compliant hosts,
 	including BSD, HP, Linux, Network Appliance, SCO, SGI, and Sun.
 	On such hosts, the primary use of this package
@@ -173,9 +203,9 @@ Hewlett Packard, offer a wider selection
 beyond those provided here.  The absence of such functions from this package
 is not meant to discourage the development, standardization, or use of such
 functions.  Rather, their absence reflects the decision to make this package
-contain valid extensions to POSIX.1, to ensure its broad
-acceptability.  If more powerful time conversion functions can be standardized,
-so much the better.
+contain valid extensions to POSIX, to ensure its broad acceptability.  If
+more powerful time conversion functions can be standardized, so much the
+better.
 
 
 ----- Names of time zone rule files -----
@@ -277,7 +307,7 @@ and `Factory' (see the file `factory').
 ----- Time zone abbreviations -----
 
 When this package is installed, it generates time zone abbreviations
-like `EST' to be compatible with human tradition and POSIX.1.
+like `EST' to be compatible with human tradition and POSIX.
 Here are the general rules used for choosing time zone abbreviations,
 in decreasing order of importance:
 
@@ -292,17 +322,16 @@ in decreasing order of importance:
 		preferred "ChST", so the rule has been relaxed.
 
 		This rule guarantees that all abbreviations could have
-		been specified by a POSIX.1 TZ string.  POSIX.1
+		been specified by a POSIX TZ string.  POSIX
 		requires at least three characters for an
-		abbreviation.  POSIX.1-1996 says that an abbreviation
+		abbreviation.  POSIX through 2000 says that an abbreviation
 		cannot start with ':', and cannot contain ',', '-',
-		'+', NUL, or a digit.  Draft 7 of POSIX 1003.1-200x
-		changes this rule to say that an abbreviation can
-		contain only '-', '+', and alphanumeric characters in
-		the current locale.  To be portable to both sets of
+		'+', NUL, or a digit.  POSIX from 2001 on changes this
+		rule to say that an abbreviation can contain only '-', '+',
+		and alphanumeric characters from the portable character set
+		in the current locale.  To be portable to both sets of
 		rules, an abbreviation must therefore use only ASCII
-		letters, as these are the only letters that are
-		alphabetic in all locales.
+		letters.
 
 	Use abbreviations that are in common use among English-speakers,
 		e.g. `EST' for Eastern Standard Time in North America.
@@ -343,10 +372,10 @@ abbreviations like `EST'; this avoids th
 Calendrical issues are a bit out of scope for a time zone database,
 but they indicate the sort of problems that we would run into if we
 extended the time zone database further into the past.  An excellent
-resource in this area is Nachum Dershowitz and Edward M. Reingold,
-<a href="http://emr.cs.uiuc.edu/home/reingold/calendar-book/index.shtml">
-Calendrical Calculations
-</a>, Cambridge University Press (1997).  Other information and
+resource in this area is Edward M. Reingold and Nachum Dershowitz,
+<a href="http://emr.cs.uiuc.edu/home/reingold/calendar-book/second-edition/">
+Calendrical Calculations: The Millennium Edition
+</a>, Cambridge University Press (2001).  Other information and
 sources are given below.  They sometimes disagree.
 
 
@@ -546,7 +575,7 @@ Sources:
 
 Michael Allison and Robert Schmunk,
 "Technical Notes on Mars Solar Time as Adopted by the Mars24 Sunclock"
-<http://www.giss.nasa.gov/tools/mars24/help/notes.html> (2004-03-15).
+<http://www.giss.nasa.gov/tools/mars24/help/notes.html> (2004-07-30).
 
 Jia-Rui Chong, "Workdays Fit for a Martian", Los Angeles Times
 (2004-01-14), pp A1, A20-A21.
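
To illustrate how the Mm.w.d rules in the example TZ string above
resolve to dates, here is a minimal sketch in Python.  It is not how
the tz code does it; the function names are hypothetical, and it
handles only the Mm.w.d form, ignoring the Jn and n forms, the
/time suffix, and the std/dst names and offsets.

```python
import calendar
from datetime import date

def posix_m_rule_date(year, rule):
    """Resolve a POSIX 'Mm.w.d' rule (e.g. 'M4.1.0') to a date in `year`.

    m is the month (1-12), w the week (1-5, where 5 means "last"),
    and d the day of the week (0 = Sunday ... 6 = Saturday).
    """
    if not rule.startswith("M"):
        raise ValueError("only the Mm.w.d form is handled in this sketch")
    month, week, weekday = (int(part) for part in rule[1:].split("."))
    if not (1 <= month <= 12 and 1 <= week <= 5 and 0 <= weekday <= 6):
        raise ValueError("rule field out of range: " + rule)
    # date.weekday() counts Monday as 0; convert to POSIX's Sunday-is-0.
    first_dow = (date(year, month, 1).weekday() + 1) % 7
    day = 1 + (weekday - first_dow) % 7      # first occurrence of that weekday
    day += 7 * (week - 1)
    days_in_month = calendar.monthrange(year, month)[1]
    while day > days_in_month:               # week 5 falls back to the last occurrence
        day -= 7
    return date(year, month, day)

def transition_dates(tz, year):
    """Return (DST start, DST end) dates for a 'std..dst,rule,rule' TZ string."""
    _, start, end = tz.split(",")
    return tuple(posix_m_rule_date(year, r.split("/")[0]) for r in (start, end))

print(transition_dates("PST8PDT,M4.1.0/02:00,M10.5.0/02:00", 2005))
```

For 2005 this gives the first Sunday in April and the last Sunday in
October, which is what the US rules of that period call for.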


