[tz] input needed on creation of a new sub-package for raw zone data

Brian Inglis Brian.Inglis at SystematicSw.ab.ca
Tue May 23 15:57:05 UTC 2017

On 2017-05-23 02:29, Paul Eggert wrote:
>> We are planning to ship a new subpackage for users who want to
>> have access to the raw zone data files e.g. leapseconds
> This is a good idea overall; thanks. Here are some comments and
> suggestions for improvement.
> First, as a terminology issue, we need a better name than "raw zone
> data". The files we're talking about are ordinary text files, and "raw"
> has the wrong connotation for text. Also, the package name
> "tzdata-zonedata" is repetitive and somewhat confusing. Instead, how
> about a package name like "tzdata-info" or "tzdata-src" or something
> like that?

Or tzdata-source, although Debian packagers may balk at that usage,
as RH packagers balk at tzdata-src.

>> Just as an example we would ship the following files:
> The LICENSE file conveys misleading information for the files in
> question, as they are all public domain, so let's not install it. Of
> course if you want to install all the source files as a package, then
> LICENSE should be included along with all the other files in the tzdb
> tarball; but as I understand it, the goal here is to install only the
> data source.

A LICENSE file is almost mandatory nowadays for packaging consideration
and, for the avoidance of doubt, this one states that all the files are
PD, with code exceptions.

>> africa
>> antarctica
>> asia
>> australasia
>> europe
>> northamerica
>> southamerica
>> pacificnew
>> etcetera
>> backward
>> systemv
>> factory
>> backzone
> The installed source data should match the installed binary data, so the
> above list of files needs to be adjusted to match what's installed as
> binary data. For example, by default 'backzone' should be omitted since
> its data items are normally not installed.

Packagers may use any of back{ward,zone} and zone{,1970}.tab when
generating their binary packages from the reference distribution,
depending on their policy decisions and the tradeoff between space and
backward compatibility with earlier releases.
The corresponding tzdata distribution source packages can be installed
by those who want one-for-one source.
This (sub-)package is for those who want only the source data for other
uses, as implied by the suggested approach.

> Also, that's a long list of file names. I would rather not propagate
> implementation details like this list into the installation directory.
> Although the intent may be that "the raw zone data format may change",
> in practice what happens is that people depend on the format. So we
> might as well use a simple format rather than a complicated one; see
> below for a specific proposal.
>> iso3166.tab
>> zone1970.tab
>> zone.tab.
> These files are already installed, and installing copies of them in a
> different directory would lead to operational problems. How about if we
> just leave them where they already are?

Do we know whether all or any of these are installed with all binary
distributions? This proposed package is effectively a data-only
developer package for those who do not use the reference distribution
code and who, for various reasons, may not want the source code on
their systems.

>> leapseconds
>> leap-seconds.list

> We need not and probably should not ship two text files that contain the
> same leap-second info in different representations. As we're considering
> removing leap-seconds.list anyway, let's just install 'leapseconds' and
> skip leap-seconds.list.

Some distributions ship neither, and Debian, for example, ships only the
original source file leap-seconds.list, from what I can find. In conformance with
NTP crypto file guidelines, which is why this file is generated and how
it is intended to be propagated, the canonical name is
where <timestamp> is the NTP time stamp for the file generation time,
allowing for checking whether this is the latest proventic generation,
soft linked to a generic name, in this case leap-seconds.list.
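
To illustrate the time stamp check, here is a minimal Python sketch. It
assumes the stamp is a plain count of seconds since 1900-01-01 UTC (the
NTP era 0 convention also used inside leap-seconds.list). The helper
names and the `prefix` parameter are hypothetical; the prefix is left to
the caller because the exact file name pattern is not spelled out above.

```python
from datetime import datetime, timedelta, timezone

# NTP time stamps count seconds from 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_to_utc(stamp):
    """Convert an NTP time stamp (seconds since 1900) to an aware UTC datetime."""
    return NTP_EPOCH + timedelta(seconds=stamp)


def latest_stamped(names, prefix):
    """Return (stamp, name) for the largest numeric suffix after prefix, or None.

    Names without the prefix, or with a non-numeric suffix (such as a
    generic ".list" link), are skipped.
    """
    stamped = []
    for name in names:
        suffix = name[len(prefix):] if name.startswith(prefix) else ""
        if suffix.isdecimal():
            stamped.append((int(suffix), name))
    return max(stamped) if stamped else None
```

A consumer could compare the stamp of the file it holds against the
result of latest_stamped() to decide whether the generic link needs to
be repointed.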

>> version
> I would rather that we didn't recommend installing this file in the tzdb
> source, as that would be a maintenance hassle and anyway the file is not
> needed to generate the binary data. Similarly, I don't think the
> installation directory's name should contain the tzdb version number, as
> others have proposed. Versioning should be an independent aspect of
> operations, and it should not be our job.

It is the packager's job to ensure that some indication of version is
available, and that indication is now in the version file.
It is probably desirable for intended users (and necessary for
packagers) to allow multiple releases to be installed simultaneously,
with symlinks like ...-latest, or without any suffix, used operationally
by admins, packagers, developers, or users to designate the currently
preferred release.
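
As a rough sketch of the side-by-side layout, the following Python
assumes one directory per release under a common base directory, named
with the four-digit-year-plus-letter form (2017a, 2017b, ...), and a
link called "latest". The layout, the link name, and the function names
are all assumptions for illustration, not anything a packager has
committed to.

```python
import os
import re

# Release names such as "2017b": four-digit year plus lowercase letter(s).
_RELEASE = re.compile(r"^(\d{4})([a-z]+)$")


def release_key(name):
    """Sort key for a release name, or None if the name is not a release."""
    m = _RELEASE.match(name)
    if not m:
        return None
    # Shorter letter suffixes sort first, so "2017z" precedes "2017aa".
    return (int(m.group(1)), len(m.group(2)), m.group(2))


def latest_release(names):
    """Return the newest release name among names, or None if there is none."""
    keyed = [(release_key(n), n) for n in names if release_key(n)]
    return max(keyed)[1] if keyed else None


def point_latest(base_dir, link_name="latest"):
    """Repoint base_dir/link_name at the newest release directory."""
    target = latest_release(os.listdir(base_dir))
    if target is None:
        return None
    link = os.path.join(base_dir, link_name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)   # relative link, so the tree can be relocated
    os.replace(tmp, link)     # rename over the old link in one step
    return target
```

Building the new link under a temporary name and renaming it over the
old one means readers never see a missing link while it is switched.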

> With the above in mind, here's a simpler proposal: We optionally install
> two text files: 'leapseconds' and a new file 'tzdata.zi' containing the
> parts of asia, australasia, etc. that are actually used to create the
> binary data.
> The idea is that 'zic tzdata.zi' exactly re-creates the installed binary
> data files, and that 'zic -l leapseconds tzdata.zi' does the same for
> data with leap seconds. Programs that want text rather than binary data
> can read tzdata.zi (and optionally, 'leapseconds'). Because tzdata.zi
> uses the documented zic format, third-party tools can parse it. (".zi"
> stands for "zoneinfo": ".zi" is to zic as ".c" is to cc.)
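
To make the "third-party tools can parse it" point concrete, here is a
simplified Python sketch of reading zic-format text. It relies on two
properties of the documented format: keywords such as Zone and Link may
be abbreviated to a prefix, and a Zone line that carries an UNTIL field
is followed by continuation lines. The function name is hypothetical,
Rule lines are skipped, and '#' inside quoted fields is not handled, so
treat this as an illustration rather than a full parser.

```python
def parse_zi(text):
    """Collect zone names and links from zic-format text (simplified)."""
    zones = []
    links = {}
    in_zone = False  # True while the next data line is a Zone continuation
    for raw in text.splitlines():
        # Drop comments, then split on whitespace (quoting is ignored here).
        fields = raw.split("#", 1)[0].split()
        if not fields:
            continue
        if in_zone:
            # Continuation line: STDOFF RULES FORMAT [UNTIL...].
            # Another continuation follows only if UNTIL is present.
            in_zone = len(fields) > 3
            continue
        keyword = fields[0].lower()
        if "zone".startswith(keyword):
            # Zone NAME STDOFF RULES FORMAT [UNTIL...]
            zones.append(fields[1])
            in_zone = len(fields) > 5
        elif "link".startswith(keyword):
            # Link TARGET LINK-NAME
            links[fields[2]] = fields[1]
        # Rule lines and anything else are ignored in this sketch.
    return zones, links
```

Calling parse_zi() on the contents of a zic input file yields the list
of zone names and a mapping from each link name to its target.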

Packagers prefer to distribute source files as is, and as the intent of
this (sub-)package is presumably to allow users to develop other
products based on the data files, or possibly further subset them, as
for embedded distributions, original file names, sizes, and timestamps
ensure that no files or data are missing from the package.
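
A minimal Python sketch of that kind of completeness check follows. It
compares only file names and sizes against an expected manifest; the
manifest format (a name-to-size mapping) and the function names are
assumptions for illustration, and time stamps or checksums could be
added in the same way.

```python
import os


def manifest(directory):
    """Map each regular file name in directory to its size in bytes."""
    return {
        entry.name: entry.stat().st_size
        for entry in os.scandir(directory)
        if entry.is_file()
    }


def missing_or_changed(expected, directory):
    """Return sorted names from expected that are absent or differ in size."""
    actual = manifest(directory)
    return sorted(
        name for name, size in expected.items() if actual.get(name) != size
    )
```

An empty result from missing_or_changed() indicates that every file in
the expected manifest is present with its original size.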

> We can install these two text files by default into the same directory
> as the already-installed text files iso3166.tab, zone1970.tab, and
> zone.tab.

As all files are currently available in the original and distribution
source packages corresponding to the tzdata binary packages, installing
only the source data files into a distinct directory implies that the
requesters have other requirements, sufficient for a major vendor
distributor to plan on releasing separate packages.

These could be used by downstream language packagers e.g. ghc, java,
dotnet, mono, python, ruby, etc. in their ...-tzdata-... packages, as
well as by embedded distribution or application packagers, e.g. Oracle,
etc., who may have to maintain a strictly documented long term audit
trail from original source data to selected source data to generated
binary data, to meet standards for financial and government systems
and applications.

Requests about how to handle some of these requirements have been posted
on the list.

Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
