repair - attempt to repair existing or candidate RDB datafiles
repair [options] file ...
Options may be abbreviated.
- -blank
-
Remove leading and trailing blank characters from data fields.
- -cCOL_DELIM
-
Specify an alternate column delimiter. This is ignored if the
-exist, -Exist or -w options are specified. See also -C.
- -CCOL_DELIM
-
Specify an alternate column delimiter. This is ignored if the
-exist, -Exist, or -w options are specified. The delimiter is
treated as a Perl regular expression. See also -c.
- -dN
-
Delete the first N lines of each input file. Use this to
remove extra lines before the actual header.
- -D
-
Convert the D's in FORTRAN double precision output to E's.
- -exist
-
The file(s) are existing rdbtables. Instead of generating
new definition lines, the current ones are used as starting
values.
- -Exist
-
Just like -exist except the data width check is not done.
- -f
-
The file is not an rdb table, but the first valid line (after -d
and/or -k processing) contains the column names.
- -help
-
Print help information and exit.
- -kWORD
-
The first line of each input file containing WORD is the first valid
line of the file. The match is performed after any lines are skipped
via the -d option. WORD is a Perl regular expression.
- -l
-
Remove leading white space from each line in the input file
before processing. Most useful with the -w option.
- -mfixed format data
-
The data are in fixed format (i.e. each column occupies a fixed number
of characters). This option specifies data to be used to split the
data into columns. It is used in conjunction with the -M option.
The presence of this option (not -M) indicates that the data are in
a fixed format.
- -Mfixed format type
-
This flag indicates how a fixed format input file will be split
into columns. It specifies the meaning of the value of the
-m option. The default value is mask.
The -blank option may be useful in conjunction with this option.
- mask
-
The value of the -m option is the name of a file
which contains a mask indicating which characters belong in which
column. The mask is a single record, the same length as a record in
the data file. Contiguous characters for a column should have the
same mask value; repair looks for changes in the value to determine
the start and end of each column. Characters which should be ignored
should have values
x or X. Given the following data and mask,
000000.8 1.2 +252551 18 -36.0 108.6
112233334444X5667788X99X11111x22222
the resultant data columns would be
'00' '00' '00.8' ' 1.2' '+' '25' '25' '51' '18'
'-36.0' '108.6'
(the apostrophes are just for showing the contents of the columns).
Note that the last line in the mask file is used for the mask,
so you may put data on prior records for alignment.
- range
-
The value of the -m option is the name of a file which contains the
range of character positions in each input record which define the
columns.
A character position range consists of one or two numbers, separated
by a blank or the - character. Character positions start at one
(not zero). A single number indicates a single character; two numbers
indicate the start and end of the range (not the start and length).
Ranges may be separated by commas or placed on separate lines. Ranges
may not overlap. For example:
1-2, 4-9, 11, 13, 15-22
23 44
45
- -nWORD
-
Fields which equate identically to the string WORD are treated as
null (empty) fields.
- -NWORD
-
Fields which match the Perl regular expression WORD are treated as
null (empty) fields.
- -oFILE
-
Write the output to FILE instead of the automatically generated
filename. If more than one input file is processed, this option
is ignored.
- -quiet
-
Suppress normal output messages.
- -rFILE
-
Use the template file FILE to replace the existing header.
Template files are in the format used by headchg. Column
definition checks are performed.
- -RFILE
-
Use the header in the rdb file FILE to replace the existing
header. Comments in FILE are ignored. Column definition checks
are performed.
- -sXXX
-
Use XXX as the suffix on new files instead of .rdb.
- -tFILE
-
The input files have no header information. Use the template file
FILE for header and definition data. Template files are in the
format used by headchg. Column definition checks are not
performed, but column widths are determined, resulting in two passes
through the file. To avoid the width determination, use the -W
option in conjunction with this one.
- -TFILE
-
The input files have no header information. Use the rdb file FILE
for header and definition data. Comments in FILE are
ignored. Column definition checks are not performed, but column
widths are determined, resulting in two passes through the
file. To avoid the width determination, use the -W option
in conjunction with this one.
- -w
-
Split the input file on whitespace, rather than tabs.
- -W
-
Do not determine the column widths. This is usually used in
conjunction with -t or -T to ensure that only one pass is made
through the file.
- -zf
-
Zap the first column. This is useful when converting data into rdb
format where there's an extra column separator at the beginning of
each row. This has no effect if -exist or -Exist is specified.
- -zl
-
Zap the last column. This is useful when converting data into rdb
format where there's an extra column separator at the end of
each row. This has no effect if -exist or -Exist is specified.
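The mask and range forms of the -M option described above can be
illustrated with a short Python sketch. It is not repair's own code
(repair is written in Perl); the function names are hypothetical, and
the handling of malformed range entries is an assumption. The data,
mask, and range strings are the ones used in the option descriptions.

```python
import re

def columns_from_mask(mask):
    """Return 0-based (start, end) slices, one per run of identical
    mask characters. Runs of 'x' or 'X' are skipped."""
    spans = []
    start = 0
    for i in range(1, len(mask) + 1):
        # A column ends where the mask value changes or the mask ends.
        if i == len(mask) or mask[i] != mask[start]:
            if mask[start] not in "xX":
                spans.append((start, i))
            start = i
    return spans

def columns_from_ranges(text):
    """Parse 1-based inclusive ranges such as '1-2, 4-9, 11' or '23 44'.
    Entries are separated by commas or newlines."""
    spans = []
    for line in text.splitlines():
        for item in line.split(","):
            numbers = [int(n) for n in re.findall(r"\d+", item)]
            if not numbers:
                continue
            if len(numbers) > 2:
                # Assumption: more than two numbers in one entry is an error.
                raise ValueError(f"bad range: {item!r}")
            first, last = numbers[0], numbers[-1]
            # Convert the 1-based inclusive range to a Python slice.
            spans.append((first - 1, last))
    return spans

def split_fixed(record, spans):
    """Cut one fixed-format record into its columns."""
    return [record[a:b] for a, b in spans]

data = "000000.8 1.2 +252551 18 -36.0 108.6"
mask = "112233334444X5667788X99X11111x22222"
print(split_fixed(data, columns_from_mask(mask)))
print(columns_from_ranges("1-2, 4-9, 11, 13, 15-22\n23 44\n45"))
```

Both parsers reduce their input to the same list of slices, so a single
splitting routine serves either format.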
repair is used to either import data into an rdb file (rewriting
the data into the rdb format) or check the consistency of an existing
rdb file. Consistency checks include determining the type of data in
a column and the column's maximum width, both of which are written to
the table header. Leading and trailing spaces in the data fields are
ignored when determining if a column is numeric. Fields are added as
necessary to rows (null), or to the header (DUM1, DUM2, ...) to
make the table structure valid.
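As a rough illustration of the checks just described, the Python sketch
below pads a table and then reports each column's maximum width and
whether it looks numeric. It is not repair's implementation: the
function names and the NUMERIC pattern are assumptions, as are the
choices to skip empty fields when testing for numbers and to include
surrounding spaces in the measured width.

```python
import re

# Assumption: a "number" is an optionally signed decimal with an
# optional exponent. repair's actual test may differ.
NUMERIC = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

def pad_table(names, rows):
    """Add null fields to short rows and DUM1, DUM2, ... to a short header."""
    width = max([len(names)] + [len(r) for r in rows])
    extra = width - len(names)
    names = names + [f"DUM{i}" for i in range(1, extra + 1)]
    rows = [r + [""] * (width - len(r)) for r in rows]
    return names, rows

def describe_columns(rows, ncols):
    """Return a (max_width, is_numeric) pair for each column."""
    result = []
    for c in range(ncols):
        values = [r[c] for r in rows]
        max_width = max((len(v) for v in values), default=0)
        # Leading and trailing spaces are ignored for the numeric test.
        stripped = [v.strip(" ") for v in values]
        non_empty = [v for v in stripped if v]
        is_numeric = bool(non_empty) and all(
            NUMERIC.fullmatch(v) for v in non_empty
        )
        result.append((max_width, is_numeric))
    return result

names, rows = pad_table(["a", "b"], [["1", " 2.5 ", "x"], ["10"]])
print(names)
print(describe_columns(rows, len(names)))
```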
repair writes out a new file with the extension .rdb replacing
whatever extension the original file had; the original file is
untouched. When processing existing rdb files, the .rdb suffix is
appended to the original file name. The new rdbtables will be in the
current directory (even if the input files are not).
You can override the automatic determination of the output file name
with the -o option. If there is more than one file to be
processed, the -o option is ignored.
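The naming rules above can be summarized in a small Python sketch. The
function names are hypothetical, and several details are assumptions
rather than documented behavior: that the "extension" is everything
from the last dot onward, that a file without an extension simply gains
the suffix, and that the suffix string includes its leading dot.

```python
import os

def output_name(path, existing=False, suffix=".rdb"):
    """Derive the output file name for one input file."""
    # Output is written to the current directory, so drop any directory part.
    base = os.path.basename(path)
    if existing:
        # Existing rdb files: the suffix is appended to the original name.
        return base + suffix
    # Imported files: the suffix replaces the original extension.
    root, _ext = os.path.splitext(base)
    return root + suffix

def choose_outputs(inputs, o_file=None, existing=False, suffix=".rdb"):
    """Apply the -o override, which only counts for a single input file."""
    if o_file is not None and len(inputs) == 1:
        return [o_file]
    return [output_name(p, existing, suffix) for p in inputs]

print(output_name("/data/run1.txt"))
print(output_name("/data/run1.rdb", existing=True))
print(choose_outputs(["a.txt", "b.dat"], o_file="out.rdb"))
```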
If there is garbage at the front of the file to be repaired, the -d
and -k options allow it to be skipped. The options work on both
imported and existing rdb files.
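The order in which -d and -k are applied matters, as the following
Python sketch shows. The function name is hypothetical, Python's
regular expressions stand in for Perl's, and returning nothing when no
line matches is an assumption about a case the text does not describe.

```python
import re

def skip_garbage(lines, d=0, k=None):
    """Drop the first d lines, then everything before the first line
    that matches the regular expression k."""
    lines = lines[d:]              # -d is applied first
    if k is None:
        return lines
    pattern = re.compile(k)
    for i, line in enumerate(lines):
        if pattern.search(line):   # -k then picks the first valid line
            return lines[i:]
    return []                      # assumption: no match, nothing valid

lines = ["junk", "# RA banner", "more junk", "RA\tDEC", "1\t2"]
print(skip_garbage(lines, k="RA"))
print(skip_garbage(lines, d=2, k="RA"))
```

In this example the first call stops at the banner line, while skipping
two lines first lets the match land on the real header.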
If lines contain leading whitespace which should be removed, use
-l. This removes tabs as well as spaces.
Leading and trailing spaces in column entries may be removed with the
-blank option.
Imported data files should consist of columns of data.
- Tab delimited files
-
This is the default. If individual column entries contain spaces, the
columns must be separated by tabs. Tab delimited files may contain
empty column entries.
- Space delimited files
-
This is specified with the -w option. Since separating spaces in
space delimited files are collapsed, space delimited files may not
contain empty column entries.
- Arbitrary delimiters
-
The -c and -C options allow you to specify arbitrary delimiters.
The latter permits delimiters to match Perl regular expressions.
Sometimes extra delimiters may be prepended or appended to each row.
For example:
| col1 | col2 | col3 |
The -zf and -zl options will zap the extra first or last columns which
are created by the extra delimiters.
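The differences between the delimiter styles, and the effect of zapping
the first and last columns, can be seen in a short Python sketch. The
function and its parameters are hypothetical and only mimic the options
named above; Python's re.split stands in for Perl's regular expression
handling, and the delimiter pattern used for the example row is an
illustrative choice.

```python
import re

def split_row(line, mode="tab", delim=None, zap_first=False, zap_last=False):
    """Split one input row into fields."""
    if mode == "whitespace":       # like -w: runs of blanks collapse
        fields = line.split()
    elif mode == "literal":        # like -c: fixed delimiter string
        fields = line.split(delim)
    elif mode == "regex":          # like -C: delimiter is a pattern
        fields = re.split(delim, line)
    else:                          # default: tab delimited
        fields = line.split("\t")
    if zap_first:                  # like -zf
        fields = fields[1:]
    if zap_last:                   # like -zl
        fields = fields[:-1]
    return fields

# Tabs preserve an empty middle field; collapsed whitespace cannot.
print(split_row("a\t\tc"))
print(split_row("a   c", mode="whitespace"))

# Leading and trailing delimiters create empty first and last columns.
row = "| col1 | col2 | col3 |"
print(split_row(row, mode="regex", delim=r"\s*\|\s*"))
print(split_row(row, mode="regex", delim=r"\s*\|\s*",
                zap_first=True, zap_last=True))
```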
Column names for the imported data file may come from several places.
-
By default, repair will determine the column types and assign
them dummy names.
-
If the first valid line of the file contains the names, use the -f
option. The column types will automatically be determined.
-
You may use another rdb file (-T) or a header template file (-t)
to specify the column names and definitions. Consistency checks are
not made, so make sure that the definitions in the template or rdb
file are correct. The template file format is given in the headchg
documentation.
repair also works with existing rdbtables (-exist option) and is
convenient for removing leading and trailing space characters from data
values (-blank option). Its use is almost a prerequisite to using
the ptbl command.
You may use another rdb file (-R) or a header template file (-r)
to override the column names in the rdb file. Column consistency
checks are done, essentially overriding the column definitions in the
rdb or template file. The template file format is given in the
headchg documentation.
The -l and -w options are available for use with existing files as
well, but are of limited use.
Diab Jerius ( djerius@cfa.harvard.edu )
Based on code by Walter Hobbs
$Revision: 1.30 $