NetCDF Operators and Utilities

Russ Rew

Steve Emmerson

This draft document describes an initial set of netCDF operator and
utility programs.

NetCDF operators read one or more netCDF input files and write a netCDF
output file.  NetCDF utilities read one or more netCDF files and
produce some other kind of output (e.g. a CDL file or graphics), or
read some other kind of input (e.g. a CDL file) and write netCDF
files.  The notion of a netCDF operator is derived from the analogous
concept in Raymond's (1988) Candis system, and the specific operators
proposed here are similar in function to some of the operators
specified in Dave Raymond's Proposal for NetCDF Algebra, 1989.

1.0  General Conventions for Programs

Each program will accept, as appropriate, positional parameters naming
one or more input netCDF files and one output netCDF file.

If an output file does not already exist, a netCDF operator will create
it. If the output file exists, it will be overwritten only if the
operator completes successfully. Similarly, if the output file is the
same as one of the input files, the input file will be overwritten only
on successful completion of the operator. Although this behaves as if
the operation is being done "in place," the output is typically written
to a temporary file, which is then renamed to the specified output file
name on completion. Specifying an output file to be the same as one of
the input files is a potentially unsafe operation, and should be
avoided unless the input file can be easily recreated.
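
The write-to-temporary-then-rename behavior described above can be
sketched as follows. This is a minimal illustration, not the
operators' actual implementation; the helper name write_atomically and
the produce callback are hypothetical.

```python
import os
import tempfile


def write_atomically(outfile, produce):
    """Run produce(tmp_path); rename onto outfile only if it succeeds.

    `produce` is a hypothetical callback standing in for an operator
    that writes its complete result to the path it is given.
    """
    out_dir = os.path.dirname(os.path.abspath(outfile))
    # Create the temporary file next to the target so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    os.close(fd)
    try:
        produce(tmp_path)
        os.replace(tmp_path, outfile)  # replaces any existing outfile
    except BaseException:
        # On failure the existing output (or input) file is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the rename happens only after produce returns, a failed
operation leaves any pre-existing output file as it was.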

The netCDF programs use the concept of coordinate variables. A
coordinate variable is a one-dimensional variable having the same name
as its dimension. The purpose of coordinate variables is to store
coordinate values for each grid point of a dimension. Thus, coordinate
variables indicate the grid over which data are defined. Some
netCDF programs assume that a dimension has an associated coordinate
variable with monotonic (but not necessarily increasing or
evenly-spaced) values as a function of dimension index, so that values
may be interpolated or compared. Such programs will not work well with
coordinate systems that have internal discontinuities, e.g.
longitudes across the International Date Line. The ncsort operator can
be used to make non-monotonic coordinates monotonic.
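
As an illustration of the monotonicity assumption, the following
sketch tests whether a list of coordinate values is monotonic in
either direction. The function name is_monotonic and its strict
parameter are hypothetical, introduced only for this example.

```python
def is_monotonic(values, strict=False):
    """True if values never change direction (increasing or decreasing).

    With strict=True, repeated values are not allowed either.
    """
    pairs = list(zip(values, values[1:]))
    if strict:
        return (all(a < b for a, b in pairs)
                or all(a > b for a, b in pairs))
    return (all(a <= b for a, b in pairs)
            or all(a >= b for a, b in pairs))


# Longitudes crossing the International Date Line are not monotonic.
dateline = [170.0, 175.0, 180.0, -175.0, -170.0]
print(is_monotonic(dateline))
```

A decreasing coordinate such as pressure levels from 1000 down to 100
passes the test, since the values need not be increasing.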

As a consequence of using coordinate variables, netCDF programs use
generalized dimensional coordinates. A dimensional coordinate can be
specified as either an integer dimension index, or, if an associated
coordinate variable exists, as the corresponding floating-point value
of the coordinate variable. If a dimension has coordinate variables in
all input files, then dimensional coordinates may be given to a
program as either integer indices or floating-point values;
otherwise, only integer indices may be used.

Variables, dimensions, and attributes are specified on command lines by
name. Programs that require dimension values may provide ways to
specify these values either in terms of the corresponding coordinate
variable values or in terms of zero-based indices. In such cases, a
convention used by the netCDF programs is that coordinate values should
be specified using real notation with a decimal point required in the
value, whereas dimension indices are specified using integer notation
without a decimal point. Note that this convention is only to
differentiate coordinate values from dimension indices, and is
independent of the actual type of netCDF coordinate variables, if any.
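
The decimal-point convention can be sketched as a small parser. The
function name parse_dim_coordinate and the ("value", ...) and
("index", ...) tags are hypothetical; only the rule itself comes from
the text above.

```python
def parse_dim_coordinate(text):
    """Classify a command-line dimensional coordinate.

    A decimal point marks a coordinate value; its absence marks a
    zero-based dimension index.
    """
    if "." in text:
        return ("value", float(text))
    return ("index", int(text))


print(parse_dim_coordinate("10.0"))  # coordinate value 10.0
print(parse_dim_coordinate("10"))    # dimension index 10
```

Under this rule, 10 and 10.0 select different things even when the
coordinate variable itself is stored as an integer type.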

In general, the global attribute history is used to keep a record of
the operations that have been performed to generate a file. The history
attribute will be constructed from input histories and the program
invocation line. If there are multiple input files, their history
attributes are merged on the output before appending the invocation of
the command that merges them. This default behavior can be overridden
by specifying -h on the command line, in which case no history
attribute is created for the output file.

Other variable and global attributes are preserved in the output file
when practical and meaningful. In the case of multiple input files with
global attributes having the same name, only a single instance of the
attribute is output if the input attribute values are identical;
otherwise multiple instances of the global attributes suffixed with _0,
_1, and so on are output, where the appended numerals specify the
position of the input file in the invocation line, as recorded in the
invocation string in the history attribute.
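
The merging rule for same-named global attributes might look like the
sketch below. It assumes that the first input file receives the suffix
_0; the function name merge_global_attributes and the use of plain
dictionaries for attributes are hypothetical.

```python
def merge_global_attributes(files_attrs):
    """Merge global attributes from several input files.

    files_attrs is a list of dicts, one per input file, in
    command-line order (assumed to be numbered from zero).
    """
    names = []
    for attrs in files_attrs:
        for name in attrs:
            if name not in names:
                names.append(name)
    merged = {}
    for name in names:
        holders = [(i, attrs[name])
                   for i, attrs in enumerate(files_attrs) if name in attrs]
        values = [v for _, v in holders]
        if all(v == values[0] for v in values):
            merged[name] = values[0]           # identical: keep one copy
        else:
            for i, v in holders:               # differing: suffix by position
                merged[f"{name}_{i}"] = v
    return merged


print(merge_global_attributes([{"title": "A", "source": "x"},
                               {"title": "B", "source": "x"}]))
```

Here the two differing titles survive as title_0 and title_1, while
the identical source attribute is written once.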

References to fill values are to variable-specific values of the
_FillValue attribute, when one exists, otherwise to the type-specific
fill values defined in the file netcdf.h.  Fill values and valid_range
attributes (when they exist in the netCDF file) are properly
propagated.

The units attribute is used by programs in deriving units for output
variables and determining commensurability for operations on input
variables.

See the Deferred section at the end of this document for a description
of other planned conventions that may eventually be supported by netCDF
programs.

2.0  Selectors

These operators can be used to select a subset of data from a netCDF
file.

2.1  ncextr

 ncextr  [-v var1[,...]]
	 [-g att1[,...]] [-x gv] [-e var,vatt1[,...]] [...] 
	 [-i var,vatt1[,...]] [...] [-chs] infile outfile

Extracts a subset of variables and global attributes from the input
file and puts them in the output file. The options -v and -g specify,
respectively, the set of variables and global-attributes to be
included in or excluded from the output. Not specifying these options
is equivalent to specifying all associated items in the input file. The
-x option indicates that the specified set of variables (v) or
global-attributes (g) is to be excluded from (rather than included in)
the output: all items not in the respective set will be copied to the
output.  The -e option specifies that the attributes vatt1, etc. of
variable var should not be copied.  This option may be specified more
than once.  The -i option specifies that the attributes vatt1, etc. of
variable var should be copied if the variable itself will appear in the
output.  This option may be specified more than once.

If the -c option is specified, coordinate variables associated with the
dimensions of the extracted variables are also included in the output
(even if such a variable is explicitly excluded).

The -h option specifies that the output netCDF is not to contain a
history global-attribute.

The -s option causes the program to run silently: no warning messages
are emitted.

Dimensions that are not used by any of the extracted variables do not
appear in the output file; thus invoking ncextr with no variable or
attribute lists will copy the input to the output except that unneeded
dimensions will be deleted.

This operator is the inverse of ncmerge.

2.2  nccut

 nccut  -d name,[min][,[max]] [-d ...]  infile outfile

Selects a subspace of the input file where the ranges on dimensions
are given by the associated min and max values. A half-open range is
specified by omitting either the min or max parameter but including the
separating comma. The limit is then the maximum or minimum possible in
the unspecified direction. A cross-section at a specific coordinate is
extracted by specifying only the min limit and omitting a trailing
comma. Dimensions not mentioned are passed with no reduction in range.
The dimensionality of variables is not reduced (in the case of a
cross-section, the size of the constant dimension will be one).

If values of a coordinate-variable are used to specify a range or
cross-section, then the coordinate variable must be monotonic (values
either increasing or decreasing). In this case, command-line values
need not exactly match coordinate values for the specified dimension.
Ranges are determined by seeking the first coordinate value to occur in
the closed range [min,max] and including all subsequent values until
one falls outside the range. The coordinate value for a cross-section
is the coordinate-variable value closest to the specified value and
must lie within the range of coordinate-variable values.
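
The range and cross-section rules for coordinate values can be
sketched as follows. The function names select_range and
select_cross_section are hypothetical, and the sketch works on a plain
list of coordinate values rather than on a netCDF file.

```python
def select_range(coords, lo=None, hi=None):
    """Indices of the first run of coordinates inside [lo, hi].

    Omitting lo or hi gives a half-open range.
    """
    selected = []
    for i, c in enumerate(coords):
        inside = (lo is None or c >= lo) and (hi is None or c <= hi)
        if inside:
            selected.append(i)
        elif selected:
            break  # first value outside the range ends the selection
    return selected


def select_cross_section(coords, value):
    """Index of the coordinate closest to value (must be within range)."""
    if not min(coords) <= value <= max(coords):
        raise ValueError("value lies outside the coordinate range")
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


z = [0.0, 2.5, 5.0, 7.5, 10.0]
print(select_range(z, 2.0, 8.0))     # limits need not match grid values
print(select_cross_section(z, 6.0))  # nearest grid point
```

The same functions work for a decreasing coordinate, because the scan
only asks whether each value lies inside the closed range.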

This operator is the inverse of ncpaste.

2.3  ncrdim

 ncrdim -d name,[min][,[max]] [-d ...]   infile outfile

Reduces the dimensionality of variables that use the specified
dimensions by averaging the variables over dimension ranges or by
taking cross-sections at single dimensional values. Such dimensions
do not appear in the output. The syntax of and constraints on
dimension-range and cross-section specifications are the same as for
the nccut operator.

Missing data is ignored.

If a dimension has a coordinate-variable, then those values are used to
weight the average using the trapezoidal rule.
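
One way to realize a trapezoidal weighting is sketched below. Each
point is weighted by half the distance to each of its neighbors, and
missing values are skipped together with their weights, which is a
simplifying assumption of this sketch. The function names and the fill
parameter are hypothetical.

```python
def trapezoid_weights(coords):
    """Per-point weights equivalent to the trapezoidal rule."""
    n = len(coords)
    if n == 1:
        return [1.0]
    weights = []
    for i in range(n):
        left = abs(coords[i] - coords[i - 1]) if i > 0 else 0.0
        right = abs(coords[i + 1] - coords[i]) if i < n - 1 else 0.0
        weights.append(0.5 * (left + right))
    return weights


def weighted_average(values, coords, fill=None):
    """Coordinate-weighted mean of values, ignoring entries equal to fill."""
    pairs = [(v, w) for v, w in zip(values, trapezoid_weights(coords))
             if v != fill]
    total = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total


# A linear profile 2 + 2x sampled at uneven points 0, 1, 3 has mean 5.
print(weighted_average([2.0, 4.0, 8.0], [0.0, 1.0, 3.0]))
```

With evenly spaced coordinates the interior weights are equal and the
end points count half, which is the familiar trapezoidal pattern.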

Affected variables will each have attributes attached to them
specifying the dimensional reduction. Each reduction dimension will
cause an identically named attribute to be created whose value will be
equal to the mean coordinate value for that reduction interval.

This operator is the inverse of ncappend.

2.4  ncthin

 ncthin -d name,step[,start] [-d ...] infile outfile

Thins the data by selecting every step-th grid point of dimension name,
beginning at point start. If start is specified using a coordinate
value, then the coordinate variable must be monotonic. Only one
specification per dimension is allowed. start defaults to the first
point. Thus, -d x,2,10.0 -d y,3,1 keeps every second point of
dimension x (starting with the first point whose x coordinate variable
is greater than or equal to 10.0) and every third point of dimension y
(starting with the second point). Dimensions which are not mentioned,
or which are specified with a thinning factor of 1 and start at the
first point, are not affected.
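
The index arithmetic behind thinning can be sketched as follows. The
function names are hypothetical, and the coordinate lookup assumes an
increasing coordinate variable.

```python
def thin_indices(size, step, start=0):
    """Zero-based indices kept when taking every step-th point."""
    return list(range(start, size, step))


def start_index_from_coordinate(coords, start_value):
    """First index whose coordinate is >= start_value.

    Assumes coords is increasing; a decreasing coordinate would need
    the comparison reversed.
    """
    for i, c in enumerate(coords):
        if c >= start_value:
            return i
    raise ValueError("no coordinate reaches the requested start value")


x = [0.0, 4.0, 8.0, 12.0, 16.0, 20.0]
start = start_index_from_coordinate(x, 10.0)
print(thin_indices(len(x), 2, start))  # every second point from x >= 10.0
print(thin_indices(7, 3, 1))           # every third point from index 1
```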

2.5  ncisocut

 ncisocut -d dimname -v test_var -x test_val1[,...]
	-t file [-max|-min] infile outfile

Inverts the dependent/independent relationship between a variable and
one of its dimensions by linearly interpolating an independent
dimension to coordinates of a dependent variable. If the independent
dimension has a coordinate variable, then that variable must be
strictly monotonic. Inversion parameters can be specified using one of
two methods:  the -d, -v, and -x options, or the -t option, but not
both.

In the first method, dimname is the independent dimension which will be
interpolated to the test_val1[,...] coordinates of dependent variable
test_var. If only one test value is specified, the dimensionality is
reduced by one.

In the second method, file is a template netCDF file which is
structurally identical to infile except that the dependent/independent
relationship of a variable and a dimension has been reversed. The new,
independent, coordinate variable in file (corresponding to the old,
dependent variable in infile) should have values at those positions to
which the old, independent dimension of infile should be interpolated,
supplanting whatever values exist in the template file. Fill values in
the template file supplant those in infile.

If the dependent/independent relationship isn't strictly invertible,
-min and -max specify, respectively, that the interpolated value should
be the least or greatest possible one.

As an example, values of meteorological fields on isentropic surfaces
(surfaces of equal potential temperature) can be obtained by invoking
ncisocut with dimname equal to the pressure or height dimension,
test_var equal to the name of the potential temperature field, and
test_val1,test_val2,... equal to the desired values of the potential
temperature.
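
For a single column of data, the inversion can be sketched as a linear
search for the positions at which the dependent variable crosses a
test value. The function name interpolate_to_value and the pick
parameter (standing in for -min and -max) are hypothetical, and the
sample numbers are illustrative only.

```python
def interpolate_to_value(dim_coords, var_values, target, pick="min"):
    """Coordinate along the dimension where var_values equals target.

    Uses linear interpolation between neighboring points. If several
    crossings exist, pick selects the least ("min") or greatest
    ("max") one. Returns None when target is never reached.
    """
    crossings = []
    for i in range(len(var_values) - 1):
        v0, v1 = var_values[i], var_values[i + 1]
        if v0 == v1:
            if v0 == target:
                crossings.extend([dim_coords[i], dim_coords[i + 1]])
            continue
        if min(v0, v1) <= target <= max(v0, v1):
            frac = (target - v0) / (v1 - v0)
            crossings.append(
                dim_coords[i] + frac * (dim_coords[i + 1] - dim_coords[i]))
    if not crossings:
        return None
    return min(crossings) if pick == "min" else max(crossings)


pressure = [1000.0, 850.0, 700.0, 500.0]   # independent dimension
theta = [290.0, 300.0, 310.0, 330.0]       # potential temperature
print(interpolate_to_value(pressure, theta, 305.0))
```

Applying the same interpolation fraction to the other fields in the
column would give their values on the chosen isentropic surface.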

2.6  ncselect

 ncselect [-o] condition infile outfile

Selects data that satisfies a specifiable condition. Variables for
which the condition is meaningful have values that satisfy the
condition copied to the output file. Condition is a relational
expression involving dimensions, variables, and constants. Acceptable
conditions include expressions like rec > 20 for an unlimited rec
dimension and T < 5.0 && z < w for variables T, z, and w. The condition
must be quoted if it involves shell metacharacters such as < or >. The
-o option specifies that only variables affected by the condition shall
be copied (the default is to also copy all variables for which the
condition is not meaningful).

3.0  Combiners

These operators accept multiple input files from which data is combined
in various ways to form an output file. Some of these use the concept
of conformable netCDF files, which are files that have the same
structure (variables, dimensions, and attributes) but possibly
different data and attribute values.

3.1  ncappend

 ncappend [-d dimname] infile1 infile2 ... outfile

Laminates conformable netCDF files by creating a record dimension,
dimname, increasing the dimensionality of each input variable by one
for the new record dimension. If the output file does not already
exist, it is created and will have a record dimension named dimname.
If the input variables each have an attribute named dimname, and that
value is the same for all variables in a file, then its value is used
for the record dimension coordinate variable; otherwise, the file
position is used. It is permissible to specify as the record dimension
name the name of a scalar variable in the input files, in which case
this variable will become a coordinate variable with values taken from
each input file, in the order in which they occur on the command line.
All input files must have the same structure.

If the output file exists, then the -d option is unnecessary: the data
in the input files is appended to it using the existing record
dimension. If dimname is given, then it must be the same as the
unlimited dimension name in the output file. Every input file must have
the same structure as each record of the output file.

This operator is the inverse of ncrdim.

3.2  nccat

 nccat [-d dimname] infile1 infile2 ... outfile

Merges conformable netCDF files by concatenation in the specified
dimension (or in the unlimited dimension if no dimension is specified).
The input files must have the same structure except that they may have
different sizes along the concatenation dimension. It often makes sense
to follow an nccat operation with an ncsort operation along the
concatenation dimension, to obtain an output file with monotonic
coordinate values along that dimension.

3.3  ncmerge

 ncmerge [-s suf1[,...]] infile1 infile2 ... outfile

Merges possibly dissimilar netCDF files into a single file. Variable
and global attribute name clashes are avoided by appending numeric
suffixes corresponding to input-file command-line position.
Alternatively, a list of suffixes may be specified after the -s option.
In this case, one suffix must be specified for each input file, and the
suffixes must be unique.

If clashes occur in dimension names, consistency is checked in sizes
and values of the coordinate variables. If there is consistency, only
one set of coordinate variables and dimensions is kept. Unlimited
dimensions after the first encountered one are turned into ordinary
dimensions. Otherwise, automatic suffixes are added as for other
variables. In this way variables from dissimilar files defined over
identical spaces may be reasonably combined.

This operator is for exchanging "kitchen sink" information.

3.4  ncpaste

 ncpaste infile1  [infile2 ...] outfile

Forms the union of the input netCDF files. Variables having the same
name and coordinate positions are merged, with valid data values in
later input files supplanting those in earlier ones.

This operator is the inverse of nccut.

3.5  ncjoin

ncjoin infile1  [infile2 ...] outfile

Forms the natural join of the input files. The join is formed using as
key columns those dimensions which are common to all input files. Such
dimensions are called join-dimensions. Key values for each
join-dimension are taken from the dimension's coordinate variable if
it exists in all input files; otherwise, the dimension index is used.
Variables not defined over join-dimensions will not appear in the
output. Note that it is possible to join multi-dimensional elements.
For example, vectors of temperatures from a T(p,t) array can be joined
with 2-D arrays of heights from a z(x,y,t) array via their common time
dimension, i.e.

4.0  Mathematical Operators

This category contains a variety of different operators.

4.1  ncinterp

 ncinterp infile1 infile2 outfile

Linearly interpolates data in the first netCDF input file to the grid
point locations of the second. Only common dimensions (i.e. ones with
the same name in both files) are used.  Variables with no common
dimensions are ignored. If a common dimension has a coordinate
variable in both files, then the values of the coordinate variable are
used in the interpolation; otherwise, the dimension index is used.

4.2  ncbarne

ncbarne [-r dim,rad] [-s rad] dev-file pts-file out-file

This program performs a Barnes analysis in a Euclidean coordinate
system. dev-file contains the deviation data (i.e. perturbations from
a mean or first-guess field). pts-file specifies the output points at
which to estimate the deviation field and may contain other data as
well. The analysis dimensions are determined from the intersection of
the weighting-function dimensions and the one-dimensional variables
in both dev-file and pts-file (NB: variables in pts-file
corresponding to the analysis dimensions may be either coordinate
variables or regular variables). The analysis variables are those in
dev-file that depend on one or more of the analysis dimensions.
out-file will contain the results of the analysis. It will be a copy of
pts-file together with the analysis variables of dev-file and their
associated attributes. In addition, a variable that estimates the
degrees-of-freedom of the estimate at each output point will be
created for each analysis variable. Its name will be that of the
associated analysis variable with the suffix `_df'.

There are constraints on the netCDF data-types. The variables in both
dev-file and pts-file corresponding to the analysis dimensions must be
of type NC_FLOAT, as must the variables in dev-file corresponding to
the analysis variables.

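The value at an output point is a normalized weighted average of the
neighboring deviations:

    \hat{d} = \frac{\sum_i w_i d_i}{\sum_i w_i}

where d_i is the i-th neighboring deviation and w_i is its
unnormalized weight.

The shape of the unnormalized weighting-function at an output point is
a function of r_i^2, where r_i^2 is the square of the distance of the
deviation datum from the output point in normalized distance units:

    r_i^2 = \sum_k \left( \frac{x_{ik} - x_k}{R_k} \right)^2

where k ranges over the analysis dimensions, x_{ik} and x_k are the
coordinates of the datum and of the output point, and R_k is the
length-scale for dimension k.

The degrees-of-freedom is estimated by the sum of the squares of the
normalized weights:

    \mathrm{df} = \sum_i \left( \frac{w_i}{\sum_j w_j} \right)^2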

The -r option specifies the length-scales of the weighting-function. It
is specified once for each analysis dimension.

The -s option specifies the maximum search radius about an output point
in normalized distance units. The default value is 2.
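
The estimate at a single output point might be computed as in the
sketch below. The Gaussian weight exp(-r^2) is an assumption made for
this example only, since this document does not give the functional
form of the weighting-function; the function name barnes_estimate and
its argument layout are likewise hypothetical.

```python
import math


def barnes_estimate(point, data, scales, search_radius=2.0):
    """Weighted-average deviation and degrees-of-freedom at one point.

    point  -- coordinates of the output point, one per analysis dimension
    data   -- list of (coords, deviation) pairs from the deviation file
    scales -- length-scale per analysis dimension (the -r values)
    search_radius -- cutoff in normalized distance units (the -s value)
    """
    weights, devs = [], []
    for coords, dev in data:
        r2 = sum(((c - p) / s) ** 2
                 for c, p, s in zip(coords, point, scales))
        if r2 <= search_radius ** 2:
            weights.append(math.exp(-r2))  # ASSUMED weight shape
            devs.append(dev)
    total = sum(weights)
    if total == 0.0:
        return None, 0.0                   # no data within the search radius
    norm = [w / total for w in weights]
    value = sum(w * d for w, d in zip(norm, devs))
    dof = sum(w * w for w in norm)         # sum of squared normalized weights
    return value, dof


obs = [((1.0, 0.0), 1.0), ((-1.0, 0.0), 3.0), ((5.0, 0.0), 100.0)]
print(barnes_estimate((0.0, 0.0), obs, scales=(1.0, 1.0)))
```

In the usage example the third observation lies beyond the default
search radius of 2 and does not contribute to the estimate.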

4.3  ncconvl

 ncconvl infile kernel outfile

Convolves a netCDF data file with a netCDF kernel file. Convolution
occurs over the dimensions common to both the data and the kernel. Such
dimensions are called convolution dimensions. Each
coordinate-variable corresponding to a convolution dimension must be
strictly monotonic and evenly spaced. Such coordinate-variables that
exist in both the data and kernel must also agree in grid spacing and
monotonicity.

4.4  ncarith

 ncarith result=expression ... infile outfile

Performs point by point arithmetic. Each expression is in the form of a
Standard C language expression (or some subset, if this is too hard).
The variable on the left side of the equals sign may or may not already
exist in the file. If it doesn't, it is created with a dimensionality
that is the union of the dimensionality of all the input variables. If
the output variable is already defined, then it must have this
dimensionality. Constants and function calls may occur in each
expression. Desired functions are, at a minimum, sin, cos, tan, atan2,
exp, log, log10, and pow, as defined in the Standard C math library.
Coordinate variables must not be changed.

4.5  nccalc

  nccalc -r | -i -v var1[,...] -d dim1[,...]
		-r res1[,...] infile outfile

Performs calculus operations on variables var1,.... Either integrals
(-i) or derivatives (-r) are done for each variable over the specified
dimensions dim1,.... The results are stored in new variables with names
res1,....

5.0  Miscellaneous

5.1  ncstat

 ncstat [-d dim1[,...]] [-s stat1[,...]] infile outfile

Creates a netCDF file that contains a summary of an input netCDF file
over the specified dimensions, dim1[,...], or over all dimensions if
the -d option is not used. The specified summary operations include
min, mean, max, and fillcount for the minimum, average, and maximum
values and for a count of the number of fill values of each variable
over the specified dimensions. If no operations are specified with the
-s option, all statistics are assumed. Variables that do not involve
the selected dimensions are copied to the output. Other variables that
use the specified dimensions are replaced by summary variables with
reduced dimensionality.

As an example, if the input contains a variable named var(i,j,k) and -d
i,k is specified, then the output netCDF file will have the variables
var_max(j), var_min(j), var_mean(j), and var_fillcount(j).

The *_max, *_min, and *_mean variables inherit all the attributes of
the variable from which they are derived; the first two also inherit
the same type. The *_mean variable is of type float unless the original
variable is of type double, in which case it is also of type double.
The *_fillcount variable is always of type long and inherits no
attributes from the original variable.

If no dimensions are specified (i.e. all dimensions are selected) then
the output netCDF file will have four scalar variables for each
variable, var, in the input: var_max, var_min, var_mean, and
var_fillcount, except that scalar variables in the input are merely
copied. Fill values do not participate in the computation of minimums,
maximums, or means.
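
The summary computation can be sketched for a two-dimensional variable
var(i,j) reduced over i, giving one value per j. The function name
summarize_over_first_dim is hypothetical, the data is held in nested
lists rather than a netCDF file, and writing the fill value when a
column holds no valid data is an assumption of this sketch.

```python
def summarize_over_first_dim(var, fill):
    """Reduce var[i][j] over i, excluding fill values from the statistics."""
    out = {"min": [], "max": [], "mean": [], "fillcount": []}
    for column in zip(*var):               # one column per remaining index j
        valid = [v for v in column if v != fill]
        out["fillcount"].append(len(column) - len(valid))
        # ASSUMPTION: a column with no valid data yields the fill value.
        out["min"].append(min(valid) if valid else fill)
        out["max"].append(max(valid) if valid else fill)
        out["mean"].append(sum(valid) / len(valid) if valid else fill)
    return out


FILL = -999.0
print(summarize_over_first_dim([[1.0, FILL], [3.0, 4.0]], FILL))
```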

5.2  ncsort

 ncsort [-v name] [...] [-ru] infile outfile

Sorts data according to the value of variables (including coordinate
variables). Numeric sort variables must be one-dimensional. Character
sort variables may be two-dimensional, in which case they will behave
like one-dimensional string variables for the purpose of sorting. If no
variables are specified, then records are sorted along the unlimited
dimension. If a given variable does not exist, but a dimension of the
same name does, then the variable is taken to be the dimension index;
hence, an increasing sort on such a variable will merely copy its input
to output. By default, data is sorted in order of increasing values of
the named variables, with the first variable being the primary sort
key, the second variable being the secondary sort key, etc. This
order is changed to decreasing by the -r option. If the -u option is
present, then duplicate values of the sort dimension (together with the
associated data) are not copied to the output, resulting in a strictly
monotonic variable. String variables are sorted using the native
collating sequence of the operating system.
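
The ordering rules can be sketched as a function that returns the
record indices in output order. The name sort_order and its reverse
and unique parameters (standing in for -r and -u) are hypothetical;
treating a duplicate as a repeated combination of all sort keys is an
assumption of this sketch.

```python
def sort_order(keys, reverse=False, unique=False):
    """Record indices in sorted order.

    keys is a list of equal-length lists of sort-variable values,
    primary key first.
    """
    n = len(keys[0])

    def key_of(i):
        return tuple(k[i] for k in keys)

    order = sorted(range(n), key=key_of, reverse=reverse)
    if unique:
        kept, seen = [], set()
        for i in order:
            if key_of(i) not in seen:      # keep the first of each duplicate
                seen.add(key_of(i))
                kept.append(i)
        order = kept
    return order


time = [3.0, 1.0, 2.0, 1.0]
print(sort_order([time]))               # increasing
print(sort_order([time], unique=True))  # duplicates dropped
```

Every variable defined over the sort dimension would then be rewritten
using the same index order.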

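Some netCDF operators require that certain variables be monotonic or
even strictly monotonic. ncsort, with the -u option where strict
monotonicity is needed, can be used to prepare input for such
operators.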

5.3  ncrename

 ncrename [-d old,new] [-d ...]
	  [-v old,new] [-v ...] [-a old,new] [-a ...] infile outfile

Renames dimensions, variables, and attributes in a netCDF file. Each
object that has a name in the list of old names is renamed using the
corresponding name in the list of new names. All the new names must be
unique. Every old name must exist in the input file, unless the name is
preceded by the character `.'. If the output file is the same as the
input file, renaming is done "in-place". Note that this will still
incur the overhead of copying the whole file if any of the new names
are longer than the old names.

Note that renaming a dimension to the name of a dependent variable can
be used to invert the relationship between an independent coordinate
variable and a dependent variable. In this case, the named dependent
variable must be one-dimensional and should have no missing values.
Such a variable will become a coordinate variable.

5.4  ncorder

 ncorder [-n] -d dim1,... infile outfile

Changes the order of dimensions in all variables to that specified on
the command-line (the leftmost command-line dimension varies least
rapidly). All dimensions must be specified. Variables that are
defined using more than one dimension, and in an order that differs
from the command-line order, will undergo a structural reorganization.
The new first dimension will become the new unlimited dimension, unless
the -n option is present, in which case the output file will have no
unlimited dimension.

5.5  ncunpack

 ncunpack infile outfile

Uncompresses limited-precision floating-point data which is packed as
small integers using the scale_factor and add_offset attributes. The
type of the resulting unpacked variable is determined by the highest
type of its scale_factor and add_offset attributes.

If a _FillValue attribute exists for a packed variable, then it is
propagated, through unpacking, into a _FillValue for the expanded
type.

This is the inverse of the ncpack operator.

5.6  ncpack

 ncpack [-b var1[,...]] [-s var1[,...]] infile outfile

Compresses floating-point data into small integers. If no variable
lists are specified, all floating-point and double-precision variables
are packed into 8-bit, unsigned integers.  Otherwise, only the
specified variables are packed into either 8-bit (-b) or 16-bit (-s)
unsigned integers. For each packed variable, the scale_factor and
add_offset attributes are used. If these attributes exist in the input
file, then their values control the packing; otherwise, they are
automatically created in the output file and have the same type as the
unpacked variable data.
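
The packing arithmetic can be sketched using the usual netCDF
convention that unpacked = packed * scale_factor + add_offset. How
the operators would choose scale_factor and add_offset automatically
is not specified here; the min/max-based choice in compute_packing is
an assumption, all function names are hypothetical, and fill values
are ignored for brevity.

```python
def compute_packing(values, nbits=8):
    """Choose scale_factor and add_offset spanning the data range.

    ASSUMPTION: the smallest value maps to 0 and the largest to
    2**nbits - 1.
    """
    lo, hi = min(values), max(values)
    scale_factor = (hi - lo) / (2 ** nbits - 1) or 1.0  # constant data
    return scale_factor, lo


def pack(values, scale_factor, add_offset):
    """Convert floating-point values to small unsigned integers."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    """Recover approximate floating-point values."""
    return [p * scale_factor + add_offset for p in packed]


data = [10.0, 12.5, 137.5]
scale, offset = compute_packing(data, nbits=8)
print(pack(data, scale, offset))
print(unpack(pack(data, scale, offset), scale, offset))
```

Values that do not fall exactly on the packed grid are recovered only
to within half of scale_factor, which is the precision lost by
packing.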

If a _FillValue attribute exists for a variable, then it is propagated,
through packing, into a _FillValue for the compressed variable.

This is the inverse of the ncunpack operator.

5.7  nccmp

 nccmp [-r releps] [-a abseps] infile1 infile2

Compares two netCDF files and reports any differences. If releps is
specified, then floating point numbers, x and y (in infile1 and
infile2, respectively) whose relative difference exceeds releps will
be considered different. If abseps is specified, then floating point
numbers whose absolute difference |x - y| exceeds abseps will be
considered different. If both releps and abseps are specified,
only x, y pairs that satisfy both criteria will be considered
different. If neither releps nor abseps is specified, all value
differences will be reported.
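
The way the two tolerances combine can be sketched as below. The
exact relative criterion is not given in this document; measuring the
difference against the larger of |x| and |y| is an assumption of this
sketch, and the function name differs is hypothetical.

```python
def differs(x, y, releps=None, abseps=None):
    """Decide whether x and y count as different.

    With no tolerance, any difference counts. With both tolerances, a
    pair must fail both tests to be reported.
    """
    if releps is None and abseps is None:
        return x != y
    diff = abs(x - y)
    # ASSUMPTION: relative difference is scaled by the larger magnitude.
    rel_bad = releps is None or diff > releps * max(abs(x), abs(y))
    abs_bad = abseps is None or diff > abseps
    return rel_bad and abs_bad


print(differs(100.0, 100.5, abseps=0.1))               # absolute test only
print(differs(100.0, 100.5, releps=0.01, abseps=0.1))  # both must fail
```

Requiring both criteria lets a relative tolerance govern large values
while an absolute tolerance suppresses reports on values near zero.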

5.8  ncmkgrd

 ncmkgrd -d name,npts -v name,start,stop,step [...] outfile

Generates a netCDF file containing a grid. For each dimension name,
either the number of grid points, npts, or a coordinate-variable
specification may be given, but not both. The dimension resulting from
a coordinate-variable specification will have start as its first
coordinate value, step as the grid-point spacing, and the last
coordinate value will be the one closest to stop from start in steps of
step. If stop is less than start, then step should be negative.
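
Generating coordinate values from a start, stop, and step
specification can be sketched as follows. The function name
make_coordinate is hypothetical, and rejecting a step whose sign
points away from stop is a choice made for this sketch.

```python
def make_coordinate(start, stop, step):
    """Coordinate values from start toward stop in increments of step.

    The last value is the multiple of step from start that lies
    closest to stop.
    """
    if step == 0 or (stop - start) * step < 0:
        raise ValueError("step must be nonzero and point from start to stop")
    npts = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(npts)]


print(make_coordinate(0.0, 1.0, 0.25))
print(make_coordinate(10.0, 0.0, -2.5))  # decreasing grid, negative step
```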

6.0  Deferred

View variables will be supported. View variables are like symbolic
links, in that they have no associated data but instead have a special
string-valued _Def attribute, giving the definition for a variable in
terms of a hyperslab subset of another variable in the same or a
different file. (Alternatively, if view variables are supported as part
of the netCDF interface rather than at this layer, their definition
will be part of their creation.) View variables may rename variables,
reorder their dimensions, or give names to variable cross-sections.
For example,

var:_Def = "foo:othervar(lat=(-125.0,-75.0),lon=40.0,
	    lvl=(850,400))";

defines a two-dimensional view variable var(lat,lvl) in terms of a
cross-section of a three-dimensional variable othervar in another file
named foo using dimension names associated with the real source
variable (which must not clash with local dimension names). Variable
attributes and data are inherited from the real variable, except that a
view variable may define additional attributes and may override the
attributes of its associated real variable. When a view variable is
output it is fully instantiated, with data copied into the output
netCDF file and appropriate output dimensions created as necessary.

7.0  References

1.  Fahle, J., TeraScan Applications Programming Interface,
    SeaSpace, San Diego, California, 1989.

2.  Raymond, D. J., Proposal for NetCDF Algebra, Unidata memo,
    1989.

3.  Raymond, D. J., "A C Language-Based Modular System for
    Analyzing and Displaying Gridded Numerical Data," Journal of
    Atmospheric and Oceanic Technology, 5, 501-511, 1988.