Generalized standard units scheme

Generalized standard units codes are derived as follows:

The 32-bit units codes have a binary representation. The high bit (bit 31) of the combined code is always clear, indicating a standard units system. A 32-bit units code is composed of two parts, or subunits.

The lower word denotes what is called the primary units.

The upper word denotes what is called the secondary units.
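
As a minimal sketch of this split (Python is used here purely for illustration; the function names are hypothetical and not part of UED):

```python
def split_units_code(code):
    """Return the (primary, secondary) 16-bit subunits of a 32-bit units code."""
    primary = code & 0xFFFF            # lower word: primary units
    secondary = (code >> 16) & 0xFFFF  # upper word: secondary units
    return primary, secondary


def is_standard(code):
    """Bit 31 clear indicates a standard units system."""
    return (code & 0x80000000) == 0


primary, secondary = split_units_code(0x20005000)
print(hex(primary), hex(secondary), is_standard(0x20005000))  # 0x5000 0x2000 True
```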

The scheme is extensible, i.e. if a variable or unit is not represented in the standard definition, the programmer can derive their own non-standard variable and units codes. (The standard units codes only support SI units; American/English units can be represented with a non-standard definition.)

Units binary format

The primary and secondary subunits have the same format.

There are two types of subunits format: simple enumeration and metric.

The subunit codes are grouped into a number of value ranges:
0x0000-0x00FF  Special or non-dimensional units, i.e. unitless, generic unit, count, percent, decimal percent, index, adjustment.
0x1000-0x1AFF  Temporal, i.e. calendar time, time of day, time counts; can be used for the time step.
0x0100-0x0FFF  Metric units, encoded as described below.
0x2000-0xFFFF  Metric units, encoded as described below.

With the exception of Seconds, the Special and Temporal subunits are simple enumerations and do not have the exponential encoding of the remaining metric subunits.
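
The range grouping can be sketched as a small classifier. This is an illustration only: the function name and the returned labels are hypothetical, the Seconds exception is not handled because its code is not given here, and values that fall in none of the listed ranges are reported as "unassigned".

```python
def subunit_class(subunit):
    """Classify a 16-bit standard subunit code by the value ranges listed above."""
    if 0x0000 <= subunit <= 0x00FF:
        return "special"      # simple enumeration, non-dimensional
    if 0x1000 <= subunit <= 0x1AFF:
        return "temporal"     # simple enumeration, time related
    if 0x0100 <= subunit <= 0x0FFF or 0x2000 <= subunit <= 0xFFFF:
        return "metric"       # quality class plus base 10 exponent
    return "unassigned"       # not covered by any listed range


print(subunit_class(0x0603))  # metric
```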

The metric bit layout is such that the metric exponent is encoded in the units code.

    1 1 1 1 1 1
    5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|base quality |d|s|base 10 exp|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

In the metric units, the upper byte enumerates the unit's physical quality class and the lower byte encodes the exponent. In the exponent byte, bit 6 denotes the sign of the exponent (bit 7, the most significant bit, marks a denominator). This metric coding scheme allows exponents from 10⁻⁶³ to 10⁶³, so you can easily encode yottagrams [Would that be to measure the weight of the universe? :-) ] (and yoctograms).

Bit 15
This bit will always be 0 indicating standard units.
Bits 8-14
These bits are one of the international standard base or derived qualities. Essentially this corresponds to the quantity being measured.
Bit 7
0 indicates this component of the unit appears in the numerator.
1 indicates it appears in the denominator.
Bit 6
This is the sign of the base 10 exponent in bits 0-5.
Bits 0-5
The base 10 exponent of the units.
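
The field layout can be illustrated with a small decoder. This is a sketch under the assumption that the exponent magnitude occupies bits 0-5, as drawn in the bit layout; `MetricSubunit` and `decode_metric_subunit` are hypothetical names.

```python
from typing import NamedTuple


class MetricSubunit(NamedTuple):
    standard: bool      # bit 15 clear
    quality: int        # bits 8-14, physical quality class
    denominator: bool   # bit 7
    exponent: int       # bit 6 (sign) applied to bits 0-5 (magnitude)


def decode_metric_subunit(subunit):
    """Unpack a 16-bit metric subunit code into its fields."""
    magnitude = subunit & 0x3F
    negative = bool(subunit & 0x40)
    return MetricSubunit(
        standard=(subunit & 0x8000) == 0,
        quality=(subunit >> 8) & 0x7F,
        denominator=bool(subunit & 0x80),
        exponent=-magnitude if negative else magnitude,
    )


print(decode_metric_subunit(0x0643))
```

For 0x0643 the decoder reports quality 0x06 (printed as 6), a numerator, and an exponent of -3.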

The codification of the exponent is as follows. Note that the coding of the exponent allows representation of metric units from 10⁻⁶³ to 10⁶³, while SI only has prefix names for some of the exponents from 10⁻²⁴ to 10²⁴.
unsigned        signed
 0                0
 1  deka         -1  deci
 2  hecto        -2  centi
 3  kilo         -3  milli
 6  mega         -6  micro
 9  giga         -9  nano
12  tera        -12  pico
15  peta        -15  femto
18  exa         -18  atto
21  zetta       -21  zepto
24  yotta       -24  yocto
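
The prefix table maps directly onto a dictionary lookup. In this sketch the names `SI_PREFIXES` and `prefix_for_exponent` are hypothetical; exponents without an SI prefix name simply return `None`.

```python
# Base 10 exponent -> SI prefix name, following the table above.
SI_PREFIXES = {
    0: "",
    1: "deka", 2: "hecto", 3: "kilo", 6: "mega", 9: "giga",
    12: "tera", 15: "peta", 18: "exa", 21: "zetta", 24: "yotta",
    -1: "deci", -2: "centi", -3: "milli", -6: "micro", -9: "nano",
    -12: "pico", -15: "femto", -18: "atto", -21: "zepto", -24: "yocto",
}


def prefix_for_exponent(exponent):
    """Return the SI prefix for a base 10 exponent, or None if it has no name."""
    return SI_PREFIXES.get(exponent)


print(prefix_for_exponent(3), prefix_for_exponent(-6), prefix_for_exponent(4))
# kilo micro None
```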

Example

For example, the enumerated physical quality code (byte) for mass is 0x06, so grams of mass is encoded as 0x0600 (grams being the base unit), kilograms of mass as 0x0603 (10³), and milligrams as 0x0643 (10⁻³).

Note that the MKS system designates the kilogram as the base measure for mass.
This codification system uses grams so that automatic naming stays consistent, without a special case.
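
The mass example can be reproduced with a small encoder. This is only a sketch: `encode_metric_subunit` is a hypothetical helper, and the exponent range of ±63 is assumed from the six exponent bits in the layout.

```python
MASS = 0x06  # physical quality code for mass, as given in the example above


def encode_metric_subunit(quality, exponent=0, denominator=False):
    """Build a 16-bit standard metric subunit code."""
    if not 0 <= quality <= 0x7F:
        raise ValueError("quality must fit in bits 8-14")
    if not -63 <= exponent <= 63:
        raise ValueError("exponent magnitude must fit in bits 0-5")
    code = quality << 8
    if denominator:
        code |= 0x80          # bit 7: unit appears in the denominator
    if exponent < 0:
        code |= 0x40          # bit 6: sign of the exponent
    return code | abs(exponent)


for name, exp in (("gram", 0), ("kilogram", 3), ("milligram", -3)):
    print(name, f"0x{encode_metric_subunit(MASS, exp):04X}")
# gram 0x0600
# kilogram 0x0603
# milligram 0x0643
```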

Composed units

The upper word is used in a composed units code, where it is either a second numerator or a denominator, e.g. newton meter or kg/ha.

In composite units the second subunit is stored in the upper 16 bits (shifted left). The most significant bit of the exponent byte indicates that the second unit is in the denominator; otherwise it is presumed to be in the numerator as well. So kg/ha would be coded as 0x2A820603 (land area is 0x2A, so hectare is 0x2A02 and per hectare is 0x2A82 = 0x2A02 + 0x0080).
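
A sketch of the composition step, using the land area quality code 0x2A and the kilogram code 0x0603 given in this document. `compose_units` is a hypothetical helper name.

```python
DENOMINATOR_BIT = 0x0080  # bit 7 of a subunit

KILOGRAM = 0x0603  # mass (0x06), exponent 3
HECTARE = 0x2A02   # land area (0x2A), exponent 2


def compose_units(primary, secondary, secondary_in_denominator=False):
    """Combine two 16-bit subunits into one 32-bit units code."""
    if secondary_in_denominator:
        secondary |= DENOMINATOR_BIT
    return ((secondary & 0xFFFF) << 16) | (primary & 0xFFFF)


kg_per_ha = compose_units(KILOGRAM, HECTARE, secondary_in_denominator=True)
print(f"0x{kg_per_ha:08X}")  # 0x2A820603
```

Setting bit 7 is the same as adding 0x0080 here because that bit is clear in the numerator form of the subunit.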

Unfortunately, more complex numerator/denominator combinations cannot be coded with only 32 bits, so something like acceleration (m/s/s) would be coded as m/s_s, where s_s has its own units code. I have not encountered many units that are such composites, so this is a good compromise.

In UED, the time step is stored with the dataset and the dataset is qualified by the time, so the units do not need to be qualified by the time step. That is, rates should not be used for data records where a value is specified for a time step.

In general, when composing units codes, do not compose the units with the time step. That is, don't use a denominator secondary subunit that would be redundant for the type of record; a rate unit might not be in order. For example, don't use mm/day as the units for daily precipitation: the day is already implicit in the time step of the data record, so you should simply use mm of depth of precipitation. Rates would be used for something like wind speed, which is an average daily value in m/sec.

Notes:
Primary and secondary numerators and denominators refer to the placement of the symbols for the units of measure. For example: in kg/m², kg is the numerator and m² is the denominator. In N m (newton meters), N is the primary numerator and m is the secondary numerator. In 1/m, the numerator is the unit quantity and m is the denominator.

Example

In this example the binary coding for kg/cm² (mass per rectangular area) is as follows:
    3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1  1 1 1 1 1 1
    1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6  5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0 1 0 1 0 0 0|1|0|0 0 0 0 1 0||0|0 0 0 0 1 1 0|0|0|0 0 0 0 1 1|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ^ \rect. area_/ ^ ^\___Centi___/ ^ \___mass____/ ^ ^\___kilo___/
    |               | |              |               | |
    Standard        | unsigned exp.  Standard        | unsigned exponent
                    Denominator                      Numerator

A unit such as the joule (work) has the same code as newton meter (both subunits are numerators). Newton = 0x5000 and meter = 0x2000, so joule or newton meter is 0x20005000.
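
To cross-check the diagram, the bit fields can be read back into hexadecimal exactly as drawn. This is a sketch only: the bit strings are transcribed from the diagram, and `bits_to_int` is a hypothetical helper.

```python
def bits_to_int(bits):
    """Convert a space-separated string of binary fields to an integer."""
    return int(bits.replace(" ", ""), 2)


# Fields per word: standard | base quality | denominator | sign | exponent
UPPER_WORD = "0 0101000 1 0 000010"  # secondary subunit, as drawn
LOWER_WORD = "0 0000110 0 0 000011"  # primary subunit, as drawn

code = (bits_to_int(UPPER_WORD) << 16) | bits_to_int(LOWER_WORD)
print(f"0x{code:08X}")  # 0x28820603

# Newton meter: meter (0x2000) in the upper word, newton (0x5000) in the lower.
joule = (0x2000 << 16) | 0x5000
print(f"0x{joule:08X}")  # 0x20005000
```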

Advantages of this units scheme

Fast conversion functions

I provide a convert function that returns the converted target value given the source value, source units and target units. With this coding scheme, I only need a single table lookup for the base units, and then use the exponent part to scale the conversion appropriately.
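
The exponent scaling part of such a conversion might look like the sketch below. This is not the UED convert function: the helper names are hypothetical, only simple (non-composed) metric subunits of the same base quality are handled, and the table lookup between different base units is omitted because that table is not given here.

```python
def base_quality(subunit):
    """Physical quality class, bits 8-14."""
    return (subunit >> 8) & 0x7F


def metric_exponent(subunit):
    """Signed base 10 exponent: bit 6 is the sign, bits 0-5 the magnitude."""
    magnitude = subunit & 0x3F
    return -magnitude if subunit & 0x40 else magnitude


def convert_metric(value, source, target):
    """Rescale a value between two metric subunits of the same base quality."""
    if base_quality(source) != base_quality(target):
        raise ValueError("this sketch only rescales within one base quality")
    return value * 10.0 ** (metric_exponent(source) - metric_exponent(target))


print(convert_metric(2.5, 0x0603, 0x0600))  # 2.5 kg in grams: 2500.0
```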

Simple regular label generation

This also allows easy generation of a subunit's label or abbreviation with a simple lookup for the base units (e.g. to get the word grams), which is then prepended with the respective prefix (milli, kilo, etc.). Composed units get appended with "per" (for the denominator) and the second subunit's description.
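
A sketch of such a label generator follows. The base unit table only holds the three quality codes that appear in this document (mass 0x06, meter 0x20, newton 0x50); the function names and the use of a zero upper word to mean "no second subunit" are assumptions made for illustration.

```python
PREFIXES = {
    0: "", 1: "deka", 2: "hecto", 3: "kilo", 6: "mega", 9: "giga",
    -1: "deci", -2: "centi", -3: "milli", -6: "micro", -9: "nano",
}
BASE_UNIT_NAMES = {0x06: "gram", 0x20: "meter", 0x50: "newton"}


def subunit_label(subunit):
    """Prefix plus base unit name for one metric subunit."""
    magnitude = subunit & 0x3F
    exponent = -magnitude if subunit & 0x40 else magnitude
    return PREFIXES[exponent] + BASE_UNIT_NAMES[(subunit >> 8) & 0x7F]


def units_label(code):
    """Label for a 32-bit units code, joining composed subunits."""
    primary = code & 0xFFFF
    secondary = (code >> 16) & 0xFFFF
    label = subunit_label(primary)
    if secondary:
        joiner = " per " if secondary & 0x80 else " "
        label += joiner + subunit_label(secondary)
    return label


print(units_label(0x0643))      # milligram
print(units_label(0x20005000))  # newton meter
print(units_label(0x20800603))  # kilogram per meter
```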

I have distinguished the length units (meter) into different subqualities, i.e. meters depth, meters height, meters altitude, so that the label generator will label these units appropriately. I also intend to use these distinguished qualities for future units conversions, e.g. to take three values of length, width and depth and automatically produce units of volume.

English (US customary, British customary and Imperial) units

For non standard units systems the bit layout can have any format. By default, UED presumes that nonstandard units encountered in the database will be American customary units (English units). As other non-standard units are encountered a record will be provided in the database to identify the non-standard system.

Examples of other non-standard systems might include ancient/historical or cultural units of measure (e.g. cubits, stones, etc.), or units specific to a xxxx, such as dollars per capita, monetary units, population, etc.

For the English units the encoding is similar to the metric encoding, but the lower byte has a different interpretation.

    1 1 1 1 1 1
    5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|base quality |d|I|var| enum  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

As with metric units, the upper byte enumerates the unit physical quality class.

Bit 15
This bit will always be 1 indicating nonstandard units.
Bits 8-14
These bits are one of the international standard base or derived qualities. Essentially this corresponds to the quantity being measured. (This matches the metric units encoding.)
Bit 7
0 indicates this component of the unit appears in the numerator.
1 indicates it appears in the denominator.
Bit 6
Imperial units

0 Indicates US customary units.
1 Indicates Imperial or English customary units or other units only used in England.

In the case of weight (base quality = 08h), since both the US and Imperial systems use Avoirdupois and Troy weights, bit 6 does not represent Imperial units. The exception is the long and short hundredweight and ton. In all other weight cases, bit 6 in combination with bits 4-5 represents archaic weight systems in use prior to the existence of the US.

Bits 4-5
These bits are used to denote variations in units or a different system. The coding depends on the base physical quality being measured.

For a physical quality that has no unit system variations, these bits will be 00 (or they may be used as an extension of the enumeration if needed in the future).
Weight (base quality = 08h)
Bit 6  Bits 4-5  System                          Special measurement
0      00        Avoirdupois
0      01        Troy
0      10        Wool                            For the measurement of wool. Only the pound, clove, stone and tod were relevant wool units.
0      11        Undefined                       Reserved for future use.
1      00        Imperial Avoirdupois            For British (long) tons and hundredweight.
1      01        Libra mercatoria (trade pound)  Also known as the London pound, this was 7200 grains (i.e. 15 troy ounces). It died out around the middle of the 14th century. One London stone was 12½ London pounds.
1      10        Tower                           The tower pound was used for weighing coins, and was 5400 grains. The tower pound was abolished in 1527. The name may be from Tower Hill, the site of the royal mint. This number of grains comes from the traditional weight of an English silver penny of 22½ grains (Troy, or grains of barley, the same as 30 grains of wheat), and 240 pennies to the pound.
1      11        Undefined
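
The weight table can be sketched as a lookup keyed on bit 6 and bits 4-5. The names `WEIGHT_SYSTEMS` and `weight_system` are hypothetical, and the sample codes in the usage lines are constructed only to exercise the bit fields; they are not taken from an actual UED enumeration.

```python
# (bit 6, bits 4-5) -> weight system, following the table above.
WEIGHT_SYSTEMS = {
    (0, 0b00): "Avoirdupois",
    (0, 0b01): "Troy",
    (0, 0b10): "Wool",
    (0, 0b11): "Undefined",
    (1, 0b00): "Imperial Avoirdupois",
    (1, 0b01): "Libra mercatoria (trade pound)",
    (1, 0b10): "Tower",
    (1, 0b11): "Undefined",
}
WEIGHT_QUALITY = 0x08


def weight_system(subunit):
    """Return the weight system named by a nonstandard weight subunit code."""
    if not subunit & 0x8000:
        raise ValueError("bit 15 is clear: this is a standard (metric) subunit")
    if (subunit >> 8) & 0x7F != WEIGHT_QUALITY:
        raise ValueError("base quality is not weight (08h)")
    bit6 = (subunit >> 6) & 0x1
    variation = (subunit >> 4) & 0x3
    return WEIGHT_SYSTEMS[(bit6, variation)]


print(weight_system(0x8810))  # Troy
print(weight_system(0x8860))  # Tower
```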

Bit 0-4
This is the enumerated unit of measure.

Because the English system generally has no logical system for relating scales of units, no attempt has been made to encode the unit of measure. However, generally, the smaller the unit of measure, the smaller the code. Because there are 32 possible codes, the units have been assigned at staggered values to allow additional nonstandard codes to be inserted in the enumeration in the future without having to re-enumerate. This allows the units to be sorted and compared.




Example

Temporal units

The time units are used in UED as units of measure, to distinguish the time step, and to time stamp data records. (UED allows you to store data at any time step; a time stamp identifies the starting date of a series of numbers for a given variable, so a record can be time stamped with the start date while the time step specifies the interval for each value.)