| author | Mhd Sulhan <m.shulhan@gmail.com> | 2014-07-27 20:22:12 +0700 |
|---|---|---|
| committer | Mhd Sulhan <m.shulhan@gmail.com> | 2014-07-27 20:22:12 +0700 |
| commit | a5817d2410f65c3a055e4c1ec212270aed50186d (patch) | |
| tree | 60b4f7f0bf5684a8a950ec7602e8536ccc8d456c /doc/dev/GOAL | |
| parent | 69d5c74cb37f32588d78e09fcffb947cd74d9c13 (diff) | |
| download | vos-a5817d2410f65c3a055e4c1ec212270aed50186d.tar.xz | |
Add documentation.
Diffstat (limited to 'doc/dev/GOAL')
| -rw-r--r-- | doc/dev/GOAL | 263 |
1 files changed, 263 insertions, 0 deletions
diff --git a/doc/dev/GOAL b/doc/dev/GOAL
new file mode 100644
index 0000000..7578e06
--- /dev/null
+++ b/doc/dev/GOAL
@@ -0,0 +1,263 @@
+Vos Goals
+----------
+    Taken from CoSort Technical Specifications.
+
+
+Legend:
+- : unimplemented
++ : implemented
+= : ongoing/half done
+? : is it worth it/why/what does that mean
+
+
+Ease of Use
+-----------
+
+- Processes record layouts and SQL-like field definitions from central data
+  dictionaries.
+
+- Converts and processes native COBOL copybook, Oracle SQL*Loader control
+  file, CSV, and W3C extended log format (ELF) file layouts.
+
+- SortCL data definition files are a supported MIMB metadata format.
+
+- Mix of online help, pre-runtime application validation, and runtime
+  error messages.
+
+- Leverages centralized application and file layout definitions (metadata
+  repositories).
+
+= Reports problems to standard error when invoked from a program, or
+  to an error log.
+
+- Runs silently or with verbose messaging without user intervention.
+
+- Allows user control over the amount of informational output produced.
+
+- Generates a query-ready XML audit log for data forensics and privacy
+  compliance.
+
+= Describes commands and options through man pages and online documentation.
+
+    It's half done because the program is always gaining new features.
+    It's not wise to mark this as 'done'.
+
+- Easy-to-use interfaces and seamless third-party sort replacements preclude
+  the need for training classes.
+
+
+Resource Control
+----------------
+
++ Sets and allows user modification of the maximum and minimum number of
+  concurrent sort threads for sorting on multi-CPU and multi-core systems.
+
+    using PROCESS_MAX variable.
+
++ Uses a specified directory, or a combination of directories, for temporary work
+  files.
+
+    using PROC_TMP_DIR variable.
+
++ Limits the amount of main and virtual memory used during sort operations.
+
+    using PROCESS_MAX_ROW variable.
+
+    Since input file size is unpredictable and a human still needs to
+    run the program, the amount of program memory still cannot be decided by
+    a human. What if it is set to 1 kilobyte?
+
++ Sets the size of the memory blocks used as physical I/O buffers.
+
+    using FILE_BUFFER_SIZE variable.
+
+
+Input and Output
+----------------
+
+= Processes any number of files, of any size, and any number of records,
+  fixed or variable length to 65,535 bytes passed from an input procedure,
+  from stdin, a named pipe, a table in memory, or from an application program.
+
+    - TODO: from stdin.
+    - TODO: from a named pipe.
+    - TODO: from a table in memory.
+    - TODO: from an application program.
+
+? Supports the use of environment variables.
+
+    for what?
+
+= Supports wildcards in the specification of input and output files, as well
+  as absolute path names and aliases.
+
+    - TODO: supports wildcards in the specification of input files.
+
++ Accepts and outputs fixed or variable-length records with delimited fields.
+
+? Generates one or more output files, and/or summary information, including
+  formatted and dashboard-ready reports.
+
+- Returns sorted, merged, or joined records one (or more) at a time to an output
+  procedure, to stdout (or named pipe), a table in memory, one or more new or
+  existing files, or to a program.
+
+- Outputs optional sequence numbers with each record, at any starting value, for
+  indexed loads and/or reports.
+
+
+Record Selection and Grouping
+-----------------------------
+
+= Includes or omits input or output records using field-to-field or field-constant
+  comparisons.
+
+    TODO: field-to-field comparisons
+
+- Compares on any number of data fields, using standard and alternate collating
+  sequences.
+
++ Sorts and/or reformats groups of selected records.
+
+    using SORT and CREATE statements.
+
++ Matches two or more sorted or unsorted files on inner and outer join criteria using
+  SQL-based condition syntax.
+
+    using JOIN with '+' or '-' statement.
+
+- Skips a specified number of records, bytes, or a file header or footer.
+
+- Processes a specified number of records or bytes, including a saved header.
+
+- Eliminates or saves records with duplicate keys.
+
+
+Sort Key Processing
+-------------------
+
++ Allows any number of key fields to be specified in ascending or
+  descending order.
+
+    using SORT x by x.f1 ASC; or
+    using SORT x by x.f1 DESC;
+
++ Supports any number of fields from 0 to 65,535 bytes in length.
+
+    almost unlimited; the limit is your memory.
+
++ Orders fixed position fields, or floating fields with one or more
+  delimiters.
+
+- Supports numeric keys, including all C, FORTRAN, and COBOL data types.
+
+- Supports single and multibyte character keys, including ASCII, EBCDIC,
+  ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps,
+  and natural (locale-dependent) values, as well as Unicode and double-byte
+  characters such as Big5, EUC-TW, UTF-32, and SJIS.
+
+- Allows left or right alignment and case shifting of character keys.
+
+- Accepts user compare procedures for multibyte, encrypted and other
+  special data.
+
+- Performs record sequence checking.
+
++ Maintains input record order (stability) on duplicate keys.
+
+- Controls treatment of null fields when specifying floating
+  (character separated) keys.
+
+- Collates and converts between many of the following data types (formats):
+  ---
+
+
+Record Reformatting
+-------------------
+
++ Inserts, removes, resizes, and reorders fields within records; defines new
+  fields.
+
+- Converts data in fields from one format to another using internal
+  conversion.
+
+- Maps common fields from differently formatted input files to a uniform sort
+  record.
+
+= Joins any fields from several files into an output record, usually based on a
+  condition.
+
+    using JOIN statement. Currently supports joining only two input files.
+
+- Changes record layouts from one file type to another, including: Line
+  Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma
+  Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM
+  (within UniKik MBM), Extended Log Format (W3C), LDIF, and XML.
+
+- Maps processed records to many differently formatted output files, including
+  HTML.
+
+- Writes multiple record formats to the same file for complex report
+  requirements.
+
+- Performs mathematical expressions and functions on field data (including
+  aggregate data) to generate new output fields.
+
+- Calculates the difference in days, hours, minutes and seconds between
+  timestamps.
+
+
+Field Reformatting/Validation
+-----------------------------
+
+- Aligns desired field contents to either the left or right of the target
+  field, where any leading or trailing fill characters from the source are
+  moved to the opposite side of the string.
+
+- Processes values from multidimensional, tab-delimited lookup files.
+
+- Creates and processes substrings of original field contents, where you can
+  specify a positive or negative offset and a number of bytes to be contained
+  in the substring.
+
+- Finds a user-specified text string in a given field, and replaces all
+  occurrences of it with a different user-specified text string in the target
+  field.
+
+- Supports Perl Compatible Regular Expressions (PCRE), including pattern
+  matching.
+
+- Uses C-style “iscompare” functions to validate contents at the field level
+  (for example, to determine if all field characters are printable), which can
+  also be used for record-filtering via selection statements.
+
+- Protects sensitive field data with field-level de-identification and AES-256
+  encryption routines, along with anonymization, pseudonymization, filtering
+  and other column-level data masking and obfuscation techniques.
+
+- Supports custom, user-written field-level transformation libraries, and
+  documents an example of a field-level data cleansing routine from
+  Melissa Data (AddressObject).
+
+
+Record Summarization
+--------------------
+
+- Consolidates records with equal keys into unique records, while totaling,
+  averaging, or counting values in specified fields, including derived
+  (cross-calculated) fields.
+
+- Produces maximum, minimum, average, sum, and count fields.
+
+- Displays running summary value(s) up to a break (accumulating aggregates).
+
+- Breaks on compound conditions.
+
+- Allows multiple levels of summary fields in the same report.
+
+- Remaps summary fields into a new format, allowing relational tables.
+
+- Ranks data through a running count with descending numeric values.
+
+- Writes detail and summary records to the same output file for structured
+  reports.
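
Two of the Sort Key Processing goals in the diff above (key fields in ascending or descending order, and input order kept on duplicate keys) can be illustrated outside of Vos. The following is a minimal Python sketch, not Vos code and not a description of how Vos implements sorting: `sort_records`, the `(field, descending)` key format, and the sample rows are hypothetical names chosen for illustration only.

```python
from operator import itemgetter


def sort_records(records, keys):
    """Sort dict records by a list of (field, descending) keys.

    Hypothetical helper for illustration; it is not part of Vos.
    """
    result = list(records)
    # Apply the keys from least to most significant. Python's sorted() is
    # stable (also with reverse=True), so records that tie on a key keep
    # the relative order they already had.
    for field, descending in reversed(keys):
        result = sorted(result, key=itemgetter(field), reverse=descending)
    return result


# Sample input; "line" records the original input position.
rows = [
    {"f1": "b", "f2": 2, "line": 1},
    {"f1": "a", "f2": 1, "line": 2},
    {"f1": "b", "f2": 1, "line": 3},
    {"f1": "a", "f2": 1, "line": 4},
]

ascending = sort_records(rows, [("f1", False)])
descending = sort_records(rows, [("f1", True)])

print([r["line"] for r in ascending])
print([r["line"] for r in descending])
```

In both results, records that share the same `f1` value appear in their original input order, which is the stability property the goal list marks as implemented.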
