NAME

repair - attempt to repair existing or candidate RDB datafiles


SYNOPSIS

repair [options] file ...


OPTIONS

Options may be abbreviated.

-blank
Remove leading and trailing blank characters from data fields.

-cCOL_DELIM
Specify an alternate column delimiter. This is ignored if the -exist, -Exist or -w options are specified. See also -C.

-CCOL_DELIM
Specify an alternate column delimiter. This is ignored if the -exist, -Exist, or -w options are specified. The delimiter is treated as a Perl regular expression. See also -c.

-dN
Delete the first N lines of each input file. Use this to remove extra lines before the actual header.

-D
Convert the D's in FORTRAN double precision output to E's.
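
To illustrate the kind of rewrite -D performs, here is a minimal Python sketch. It is not repair's own code: the function name fortran_d_to_e and the regular expression are assumptions made for this example, and repair may recognize a different set of number forms.

```python
import re

# A FORTRAN double precision literal such as 1.5D+03 or -2d-4, with optional
# surrounding blanks. Group 1 is everything before the D, group 2 everything after.
_FORTRAN_D = re.compile(r'^(\s*[+-]?(?:\d+\.?\d*|\.\d+))[dD]([+-]?\d+\s*)$')

def fortran_d_to_e(field):
    """Rewrite the D exponent marker as E; other fields are returned unchanged."""
    return _FORTRAN_D.sub(r'\g<1>E\g<2>', field)
```

With this sketch, fortran_d_to_e('1.5D+03') returns '1.5E+03', while a non-numeric field such as 'DATA' is returned as is.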

-exist
The file(s) are existing rdbtables. Instead of generating new definition lines, the current ones are used as starting values.

-Exist
Just like -exist except the data width check is not done.

-f
The file is not an rdb table, but the first valid line (after -d and/or -k processing) contains the column names.

-help
Print help information and exit.

-kWORD
The first line of each input file containing WORD is the first valid line of the file. The match is performed after any lines are skipped via the -d option. WORD is a Perl regular expression.

-l
Remove leading white space from each line in the input file before processing. Most useful with the -w option.

-mfixed format data
The data are in fixed format (i.e. each column occupies a fixed number of characters). This option specifies data to be used to split the data into columns. It is used in conjunction with the -M option. The presence of this option (not -M) indicates that the data are in a fixed format.

-Mfixed format type
This flag indicates how a fixed format input file will be split into columns. It specifies the meaning of the value of the -m option. The default value is mask.

The -blank option (which may be abbreviated to -b) may be useful in conjunction with this option.

mask
The value of the -m option is the name of a file which contains a mask indicating which characters belong in which column. The mask is a single record, the same length as a record in the data file. Contiguous characters for a column should have the same mask value; repair looks for changes in the value to determine the start and end of each column. Characters which should be ignored should have values x or X. Given the following data and mask,
  000000.8 1.2 +252551 18 -36.0 108.6
  112233334444X5667788X99X11111x22222

the resultant data columns would be

  '00' '00' '00.8' ' 1.2' '+' '25' '25' '51' '18'
  '-36.0' '108.6'

(the apostrophes are just for showing the contents of the columns).

Note that the last line in the mask file is used for the mask, so you may put data on prior records for alignment.
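
The mask rule described above (start a new column wherever the mask value changes, and drop characters masked with x or X) can be sketched in Python as follows. This is an illustration only, not repair's implementation; the function name split_by_mask is an assumption made for this example.

```python
def split_by_mask(record, mask):
    """Split a fixed-format record into columns using a character mask.

    A column ends wherever the mask character changes. Characters whose
    mask value is x or X are dropped and never start a column.
    """
    columns = []
    current = []      # characters collected for the column being built
    prev = None       # mask value of the previous character
    for ch, m in zip(record, mask):
        if m != prev and current:
            columns.append(''.join(current))
            current = []
        if m not in 'xX':
            current.append(ch)
        prev = m
    if current:
        columns.append(''.join(current))
    return columns
```

For example, split_by_mask('AB12 xyz', '1122X333') returns ['AB', '12', 'xyz']. Because only a change in value matters, a mask value may be reused for a later column as long as the two uses are not adjacent.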

range
The value of the -m option is the name of a file which contains the range of character positions in each input record which define the columns.

A character position range consists of one or two numbers, separated by a blank or the - character. Character positions start at one (not zero). A single number indicates a single character; two numbers indicate the start and end of the range (not the start and length). Ranges may be separated by commas or placed on separate lines. Ranges may not overlap. For example:

        1-2, 4-9, 11, 13, 15-22
        23 44
        45
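
As a rough sketch of how such a range specification can be read and applied, consider the Python below. It is not taken from repair; the function names parse_ranges and split_by_ranges and the error handling are assumptions made for this example.

```python
import re

def parse_ranges(text):
    """Parse a range specification into (start, end) pairs, 1-based inclusive."""
    ranges = []
    # Ranges are separated by commas or placed on separate lines.
    for item in re.split(r'[,\n]', text):
        numbers = [int(n) for n in re.findall(r'\d+', item)]
        if not numbers:
            continue                           # blank line or stray comma
        if len(numbers) > 2:
            raise ValueError('bad range: %r' % item)
        start, end = numbers[0], numbers[-1]   # a single number is its own end
        if start < 1 or end < start:
            raise ValueError('bad range: %r' % item)
        ranges.append((start, end))
    # Ranges may not overlap.
    ordered = sorted(ranges)
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 <= end1:
            raise ValueError('ranges overlap')
    return ranges

def split_by_ranges(record, ranges):
    """Cut a record into columns at the given character ranges."""
    # Convert 1-based inclusive positions to 0-based half-open slices.
    return [record[start - 1:end] for start, end in ranges]
```

Applied to the specification above, parse_ranges yields (1, 2), (4, 9), (11, 11), (13, 13), (15, 22), (23, 44) and (45, 45).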

-nWORD
Fields which equate identically to the string WORD are treated as null (empty) fields.

-NWORD
Fields which match the Perl regular expression WORD are treated as null (empty) fields.
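
The difference between the exact-string and regular-expression forms of null matching can be sketched in Python. This is an illustration under assumptions: the function names are invented for this example, Python's re module stands in for Perl regular expressions (the two dialects differ in places), and the regular expression is assumed to match anywhere in the field unless it is anchored.

```python
import re

def null_exact(fields, word):
    """Blank out fields that are identical to word."""
    return ['' if f == word else f for f in fields]

def null_regex(fields, pattern):
    """Blank out fields in which the regular expression pattern matches."""
    rx = re.compile(pattern)
    return ['' if rx.search(f) else f for f in fields]
```

With exact matching, 'N/A' is nulled but 'n/a ' (different case, trailing blank) is not; a pattern such as '^[Nn]/[Aa]\s*$' covers both.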

-oFILE
Write the output to FILE instead of the automatically generated filename. If more than one input file is processed, this option is ignored.

-quiet
Suppress normal output messages.

-rFILE
Use the template file FILE to replace the existing header. Template files are in the format used by headchg. Column definition checks are performed.

-RFILE
Use the header in the rdb file FILE to replace the existing header. Comments in FILE are ignored. Column definition checks are performed.

-sXXX
Use XXX instead of .rdb as the suffix for new files.

-tFILE
The input files have no header information. Use the template file FILE for header and definition data. Template files are in the format used by headchg. Column definition checks are not performed, but column widths are determined, resulting in two passes through the file. To avoid the width determination, use the -W option in conjunction with this one.

-TFILE
The input files have no header information. Use the rdb file FILE for header and definition data. Comments in FILE are ignored. Column definition checks are not performed, but column widths are determined, resulting in two passes through the file. To avoid the width determination, use the -W option in conjunction with this one.

-w
Split the input file on whitespace, rather than tabs.

-W
Do not determine the column widths. This is usually used in conjunction with -t or -T to ensure that only one pass is made through the file.

-zf
Zap the first column. This is useful when converting data into rdb format where there's an extra column separator at the beginning of each row. This has no effect if -exist or -Exist is specified.

-zl
Zap the last column. This is useful when converting data into rdb format where there's an extra column separator at the end of each row. This has no effect if -exist or -Exist is specified.


DESCRIPTION

repair is used to either import data into an rdb file (rewriting the data into the rdb format) or check the consistency of an existing rdb file. Consistency checks include determining the type of data in a column and the column's maximum width, both of which are written to the table header. Leading and trailing spaces in the data fields are ignored when determining if a column is numeric. Fields are added as necessary to rows (null), or to the header (DUM1, DUM2, ...) to make the table structure valid.
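
The consistency checks can be pictured with the following Python sketch. It is not repair's code: the function names, the definition of "numeric" (the regular expression below), the choice to measure width on the field as found, and the treatment of empty fields are all assumptions made for this example.

```python
import re

# Assumed definition of a numeric field: optional sign, digits with an
# optional decimal point, optional exponent.
_NUMBER = re.compile(r'^[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?$')

def describe_column(values):
    """Return (is_numeric, max_width) for the fields of one column."""
    # Leading and trailing spaces are ignored for the numeric test;
    # empty fields are assumed not to affect the column type.
    stripped = [v.strip() for v in values if v.strip()]
    is_numeric = bool(stripped) and all(_NUMBER.match(v) for v in stripped)
    max_width = max((len(v) for v in values), default=0)
    return is_numeric, max_width

def pad_table(header, rows):
    """Make every row and the header the same length.

    Short rows get null (empty) fields; a short header gets DUM1, DUM2, ...
    """
    width = max([len(header)] + [len(r) for r in rows])
    header = header + ['DUM%d' % (i + 1) for i in range(width - len(header))]
    rows = [r + [''] * (width - len(r)) for r in rows]
    return header, rows
```

For instance, pad_table(['a', 'b'], [['1', '2', '3'], ['4']]) returns the header ['a', 'b', 'DUM1'] and the rows ['1', '2', '3'] and ['4', '', ''].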

repair writes out a new file with the extension .rdb replacing whatever extension the original file had; the original file is untouched. When processing existing rdb files, the .rdb suffix is appended to the original file name. The new rdbtables will be in the current directory (even if the input files are not).
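
The naming rule just described can be summarized with a small Python sketch. The function name output_name and its existing parameter are assumptions made for this example, and the sketch ignores the options that override the name or suffix.

```python
import os

def output_name(input_path, existing=False):
    """Return the default output file name for input_path.

    The directory part is dropped because output goes to the current
    directory. For an existing rdb file the .rdb suffix is appended;
    otherwise it replaces the original extension.
    """
    base = os.path.basename(input_path)
    if existing:
        return base + '.rdb'
    root, _ext = os.path.splitext(base)
    return root + '.rdb'
```

Under these assumptions, output_name('/data/obs.txt') gives 'obs.rdb', and output_name('/data/table.rdb', existing=True) gives 'table.rdb.rdb'.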

You can override the automatic determination of the output file name with the -o option. If there is more than one file to be processed, the -o option is ignored.

Indicating the start of the header/data

If there is garbage at the front of the file to be repaired, the -d and -k options allow it to be skipped. The options work on both imported and existing rdb files.
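
The order in which the two options apply (lines are deleted first, then the keyword is searched for in what remains) can be sketched in Python. The function name skip_preamble is an assumption for this example, Python's re module stands in for Perl regular expressions, and returning nothing when no line matches is a guess rather than documented behavior.

```python
import re

def skip_preamble(lines, delete=0, keyword=None):
    """Return the lines that remain after -d and -k style skipping."""
    remaining = lines[delete:]           # drop the first `delete` lines
    if keyword is None:
        return remaining
    rx = re.compile(keyword)
    for i, line in enumerate(remaining):
        if rx.search(line):              # first line containing the keyword
            return remaining[i:]
    return []                            # assumption: no match, no valid lines
```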

Cleaning up data entries

If lines contain leading whitespace which should be removed, use -l. This removes tabs as well as spaces.

Leading and trailing spaces in column entries may be removed with the -blank option.

Importing data

Imported data files should consist of columns of data.

Tab delimited files
This is the default. If individual column entries contain spaces, the columns must be separated by tabs. Tab delimited files may contain empty column entries.

Space delimited files
This is specified with the -w option. Since separating spaces in space delimited files are collapsed, space delimited files may not contain empty column entries.
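
Why space delimited files cannot carry empty entries is easy to see in a short Python sketch (an illustration only; the function names are assumptions made for this example):

```python
def split_tabs(line):
    """Split on single tabs; adjacent tabs produce an empty field."""
    return line.rstrip('\n').split('\t')

def split_whitespace(line):
    """Split on runs of whitespace; a run of any length is one separator."""
    return line.split()
```

Here split_tabs('a\t\tc') returns ['a', '', 'c'], preserving the empty middle entry, while split_whitespace('a   c') returns ['a', 'c'] because the run of spaces collapses into one separator.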

Arbitrary delimiters
The -c and -C options allow you to specify arbitrary delimiters. The latter permits delimiters to match Perl regular expressions.

Sometimes extra delimiters may be prepended or appended to each row. For example:

   | col1 | col2 | col3 |

The -zf and -zl options zap the extra first or last column created by the extra delimiters.
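
The following Python sketch shows where the extra columns come from and what zapping them leaves behind. It is an illustration, not repair's code: the function name split_row and its parameters are assumptions, and Python's re module stands in for Perl regular expressions.

```python
import re

def split_row(line, delim, regex=False, zap_first=False, zap_last=False):
    """Split a row on delim and optionally drop the first and last columns.

    With regex=False the delimiter is taken literally; with regex=True it
    is used as a regular expression.
    """
    pattern = delim if regex else re.escape(delim)
    fields = re.split(pattern, line.rstrip('\n'))
    if zap_first:
        fields = fields[1:]
    if zap_last:
        fields = fields[:-1]
    return fields
```

Splitting '| col1 | col2 | col3 |' on a literal '|' gives five fields, the first and last of which are empty; zapping both leaves ' col1 ', ' col2 ' and ' col3 '. Using the regular expression '\s*\|\s*' as the delimiter also absorbs the surrounding blanks, leaving 'col1', 'col2' and 'col3'.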

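Column names for the imported data file may come from several places: the first valid line of the file itself (-f), a header template file (-t), or the header of another rdb file (-T).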

Existing rdb files

repair also works with existing rdbtables (-exist option) and is convenient for removing leading and trailing space characters from data values (-blank option). Its use is almost a prerequisite to using the ptbl command.

You may use another rdb file (-R) or a header template file (-r) to override the column names in the rdb file. Column consistency checks are done, essentially overriding the column definitions in the rdb or template file. The template file format is given in the headchg documentation.

The -l and -w options are available for use with existing files as well, but are of limited use.


AUTHOR

Diab Jerius ( djerius@cfa.harvard.edu )

Based on code by Walter Hobbs


REVISION

$Revision: 1.30 $