| author | Shulhan <ms@kilabit.info> | 2026-01-02 17:21:26 +0700 |
|---|---|---|
| committer | Shulhan <ms@kilabit.info> | 2026-01-02 17:21:26 +0700 |
| commit | 797faa817881ea63271d5c6794b80ccd644cc76c | |
| tree | ce012b4e704d870c07e2fc4f50f6b099ffa82431 /doc/dev/GOAL.adoc | |
| parent | a5817d2410f65c3a055e4c1ec212270aed50186d | |
| download | vos-797faa817881ea63271d5c6794b80ccd644cc76c.tar.xz | |
doc: add index and reformat some document using asciidoc
This is for publication of doc under https://kilabit.info/project/vos .
Diffstat (limited to 'doc/dev/GOAL.adoc')
| -rw-r--r-- | doc/dev/GOAL.adoc | 258 |
1 files changed, 258 insertions, 0 deletions
diff --git a/doc/dev/GOAL.adoc b/doc/dev/GOAL.adoc
new file mode 100644
index 0000000..74370fb
--- /dev/null
+++ b/doc/dev/GOAL.adoc
@@ -0,0 +1,258 @@
+= Vos Goals
+
+Taken from CoSort Technical Specifications.
+
+Legend:
+
+* - : unimplemented
+* + : implemented
+* = : ongoing/half done
+* ? : is it worth it/why/what does that mean
+
+
+== Ease of Use
+
+(-) Processes record layouts and SQL-like field definitions from central
+data dictionaries.
+
+(-) Converts and processes native COBOL copybook, Oracle SQL*Loader control
+file, CSV, and W3C extended log format (ELF) file layouts.
+
+(-) SortCL data definition files are a supported MIMB metadata format.
+
+(-) Mix of online help, pre-runtime application validation, and runtime
+error messages.
+
+(-) Leverages centralized application and file layout definitions
+(metadata repositories).
+
+(=) Reports problems to standard error when invoked from a program, or
+to an error log.
+
+(-) Runs silently or with verbose messaging without user intervention.
+
+(-) Allows user control over the amount of informational output produced.
+
+(-) Generates a query-ready XML audit log for data forensics and privacy
+compliance.
+
+(=) Describes commands and options through man pages and online
+documentation.
+
+It is half done because the program keeps moving on to new features, so it
+is not wise to mark this as 'done'.
+
+(-) Easy-to-use interfaces and seamless third-party sort replacements
+preclude the need for training classes.
+
+
+== Resource Control
+
+(+) Sets and allows user modification of the maximum and minimum number of
+concurrent sort threads for sorting on multi-CPU and multicore systems.
+
+Using PROCESS_MAX variable.
+
+(+) Uses a specified directory, or a combination of directories, for
+temporary work files.
+
+Using PROC_TMP_DIR variable.
+
+(+) Limits the amount of main and virtual memory used during sort
+operations.
+
+Using PROCESS_MAX_ROW variable.
+
+Since input file size is unpredictable and a human still needs to
+run the program, the amount of program memory still cannot be decided by
+a human. What if it were set to 1 kilobyte?
+
+(+) Sets the size of the memory blocks used as physical I/O buffers.
+
+Using FILE_BUFFER_SIZE variable.
+
+
+== Input and Output
+
+(=) Processes any number of files, of any size, and any number of records,
+fixed or variable length to 65,535 bytes passed from an input procedure,
+from stdin, a named pipe, a table in memory, or from an application program.
+
+- TODO: from stdin.
+- TODO: from a named pipe.
+- TODO: from a table in memory.
+- TODO: from an application program.
+
+(?) Supports the use of environment variables.
+
+(=) Supports wildcards in the specification of input and output files, as
+well as absolute path names and aliases.
+
+- TODO: support wildcards in the specification of input files.
+
+(+) Accepts and outputs fixed or variable-length records with delimited
+fields.
+
+(?) Generates one or more output files, and/or summary information,
+including formatted and dashboard-ready reports.
+
+(-) Returns sorted, merged, or joined records one (or more) at a time to an
+output procedure, to stdout (or named pipe), a table in memory, one or more
+new or existing files, or to a program.
+
+(-) Outputs optional sequence numbers with each record, at any starting
+value, for indexed loads and/or reports.
+
+
+== Record Selection and Grouping
+
+(=) Includes or omits input or output records using field-to-field or field
+constant comparisons.
+
+TODO: field-to-field comparisons
+
+(-) Compares on any number of data fields, using standard and alternate
+collating sequences.
+
+(+) Sorts and/or reformats groups of selected records.
+
+Using SORT and CREATE statements.
+
+(+) Matches two or more sorted or unsorted files on inner and outer join
+criteria using SQL-based condition syntax.
+
+Using JOIN with '+' or '-' statement.
+
+(-) Skips a specified number of records, bytes, or a file header or footer.
+
+(-) Processes a specified number of records or bytes, including a saved
+header.
+
+(-) Eliminates or saves records with duplicate keys.
+
+
+== Sort Key Processing
+
+(+) Allows any number of key fields to be specified in ascending or
+descending order.
+
+    using SORT x by x.f1 ASC; or
+    using SORT x by x.f1 DESC;
+
+(+) Supports any number of fields from 0 to 65,535 bytes in length.
+
+Almost unlimited; the limit is your memory.
+
+(+) Orders fixed position fields, or floating fields with one or more
+delimiters.
+
+(-) Supports numeric keys, including all C, FORTRAN, and COBOL data types.
+
+(-) Supports single and multibyte character keys, including ASCII, EBCDIC,
+ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps,
+and natural (locale-dependent) values, as well as Unicode and double-byte
+characters such as Big5, EUC-TW, UTF-32, and SJIS.
+
+(-) Allows left or right alignment and case shifting of character keys.
+
+(-) Accepts user compare procedures for multibyte, encrypted and other
+special data.
+
+(-) Performs record sequence checking.
+
+(+) Maintains input record order (stability) on duplicate keys.
+
+(-) Controls treatment of null fields when specifying floating
+(character separated) keys.
+
+(-) Collates and converts between many of the following data types
+(formats).
+
+
+== Record Reformatting
+
+(+) Inserts, removes, resizes, and reorders fields within records; defines
+new fields.
+
+(-) Converts data in fields from one format to another either using internal
+conversion.
+
+(-) Maps common fields from differently formatted input files to a uniform
+sort record.
+
+(=) Joins any fields from several files into an output record, usually based
+on a condition.
+
+Using JOIN statement. Currently only joining two input files is supported.
+
+(-) Changes record layouts from one file type to another, including: Line
+Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma
+Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM
+(within UniKik MBM), Extended Log Format (W3C), LDIF, and XML.
+
+(-) Maps processed records to many differently formatted output files,
+including HTML.
+
+(-) Writes multiple record formats to the same file for complex report
+requirements.
+
+(-) Performs mathematical expressions and functions on field data (including
+aggregate data) to generate new output fields.
+
+(-) Calculates the difference in days, hours, minutes and seconds between
+timestamps.
+
+
+== Field Reformatting/Validation
+
+(-) Aligns desired field contents to either the left or right of the target
+field, where any leading or trailing fill characters from the source are
+moved to the opposite side of the string.
+
+(-) Processes values from multidimensional, tab-delimited lookup files.
+
+(-) Creates and processes substrings of original field contents, where you
+can specify a positive or negative offset and a number of bytes to be
+contained in the substring.
+
+(-) Finds a user-specified text string in a given field, and replaces all
+occurrences of it with a different user-specified text string in the target
+field.
+
+(-) Supports Perl Compatible Regular Expressions (PCRE), including pattern
+matching.
+
+(-) Uses C-style “iscompare” functions to validate contents at the field
+level (for example, to determine if all field characters are printable),
+which can also be used for record-filtering via selection statements.
+
+(-) Protects sensitive field data with field-level de-identification and
+AES-256 encryption routines, along with anonymization, pseudonymization,
+filtering and other column-level data masking and obfuscation techniques.
+
+(-) Supports custom, user-written field-level transformation libraries, and
+documents an example of a field-level data cleansing routine from
+Melissa Data (AddressObject).
+
+
+== Record Summarization
+
+(-) Consolidates records with equal keys into unique records, while
+totaling, averaging, or counting values in specified fields, including
+derived (cross-calculated) fields.
+
+(-) Produces maximum, minimum, average, sum, and count fields.
+
+(-) Displays running summary value(s) up to a break (accumulating
+aggregates).
+
+(-) Breaks on compound conditions.
+
+(-) Allows multiple levels of summary fields in the same report.
+
+(-) Remaps summary fields into a new format, allowing relational tables.
+
+(-) Ranks data through a running count with descending numeric values.
+
+(-) Writes detail and summary records to the same output file for structured
+reports.
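The Resource Control entries in the GOAL.adoc diff say memory is bounded by a row count (PROCESS_MAX_ROW) rather than a byte size, with temporary work files placed under PROC_TMP_DIR. One common technique for sorting input larger than such a limit is a run-based external merge sort: cut the input into runs of at most N rows, sort each run, then k-way merge the runs. The Go sketch below illustrates that technique only. It assumes integer records, uses a hypothetical `maxRows` parameter as a stand-in for PROCESS_MAX_ROW, and keeps runs in memory instead of spilling them to temporary files. `sortRuns` and `mergeRuns` are illustrative names, not taken from the Vos sources, and this is not necessarily how Vos implements it.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// sortRuns splits input into runs of at most maxRows records (a hypothetical
// stand-in for a row limit such as PROCESS_MAX_ROW) and sorts each run.
// A real external sort would write each run to a temporary file instead.
func sortRuns(input []int, maxRows int) [][]int {
	var runs [][]int
	for start := 0; start < len(input); start += maxRows {
		end := start + maxRows
		if end > len(input) {
			end = len(input)
		}
		run := append([]int(nil), input[start:end]...)
		sort.Ints(run)
		runs = append(runs, run)
	}
	return runs
}

// cursor points at the next unread element of one sorted run.
type cursor struct {
	run []int
	pos int
}

// cursorHeap is a min-heap ordered by each cursor's current element.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return h[i].run[h[i].pos] < h[j].run[h[j].pos] }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// mergeRuns performs a k-way merge of sorted runs, holding only one
// cursor per run in the heap.
func mergeRuns(runs [][]int) []int {
	h := &cursorHeap{}
	for _, r := range runs {
		if len(r) > 0 {
			*h = append(*h, &cursor{run: r})
		}
	}
	heap.Init(h)
	var out []int
	for h.Len() > 0 {
		c := (*h)[0] // cursor with the smallest current element
		out = append(out, c.run[c.pos])
		c.pos++
		if c.pos == len(c.run) {
			heap.Pop(h) // run exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	runs := sortRuns([]int{9, 4, 7, 1, 8, 2, 6, 3, 5}, 4)
	fmt.Println(runs)            // [[1 4 7 9] [2 3 6 8] [5]]
	fmt.Println(mergeRuns(runs)) // [1 2 3 4 5 6 7 8 9]
}
```

The row limit caps how many records are sorted at once; the merge phase needs only one cursor per run, which is why run-based sorting keeps memory bounded regardless of input size.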
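The Sort Key Processing entries list per-key ASC/DESC ordering (`SORT x by x.f1 ASC;`) and stability on duplicate keys as implemented. A minimal Go sketch of that combination, assuming records are slices of string fields: each key is compared in priority order, and `sort.SliceStable` keeps the original input order when all keys tie. `Record`, `SortKey`, and `sortRecords` are hypothetical names for illustration, not the Vos API. Keys compare byte-wise as strings, so numeric keys would need parsing first.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is a hypothetical parsed row: a slice of string fields.
type Record []string

// SortKey names a field index and its direction, loosely mirroring
// "SORT x by x.f1 ASC" / "DESC".
type SortKey struct {
	Field int
	Desc  bool
}

// sortRecords orders records by keys in priority order. Fields are compared
// as strings (byte-wise). sort.SliceStable keeps the input order of records
// whose keys are all equal.
func sortRecords(recs []Record, keys []SortKey) {
	sort.SliceStable(recs, func(i, j int) bool {
		for _, k := range keys {
			a, b := recs[i][k.Field], recs[j][k.Field]
			if a == b {
				continue // tie on this key: try the next key
			}
			if k.Desc {
				return a > b
			}
			return a < b
		}
		return false // all keys equal: "not less", so input order is kept
	})
}

func main() {
	recs := []Record{
		{"b", "1", "first b1"},
		{"a", "2", "a2"},
		{"b", "2", "b2"},
		{"b", "1", "second b1"},
	}
	// Field 0 ascending, then field 1 descending.
	sortRecords(recs, []SortKey{{Field: 0}, {Field: 1, Desc: true}})
	for _, r := range recs {
		fmt.Println(r[2])
	}
}
```

The two "b 1" records come out as "first b1" then "second b1", matching their input order, which is the stability property the goal describes.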
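The Record Selection and Grouping entries mention matching sorted or unsorted files on inner and outer join criteria, and the Record Reformatting notes say JOIN currently supports two input files. A hash join is one standard way to join two unsorted inputs: index one side by key, then probe with the other. The Go sketch below shows an inner join and a left outer join on a single key field. It does not reproduce the semantics of Vos's `JOIN` with `'+'` or `'-'`, which the document does not spell out; `Row`, `Joined`, and `hashJoin` are hypothetical names.

```go
package main

import "fmt"

// Row is a hypothetical record: a join key plus one payload field.
type Row struct {
	Key, Val string
}

// Joined pairs a left value with a matching right value. Matched is false
// for unmatched left rows, which appear only when outer is true.
type Joined struct {
	Key, Left, Right string
	Matched          bool
}

// hashJoin indexes the right input by key, then probes it with each left row,
// so neither input needs to be sorted. outer=false gives an inner join;
// outer=true also keeps left rows without a match (a left outer join).
func hashJoin(left, right []Row, outer bool) []Joined {
	index := make(map[string][]string)
	for _, r := range right {
		index[r.Key] = append(index[r.Key], r.Val)
	}
	var out []Joined
	for _, l := range left {
		matches, ok := index[l.Key]
		if !ok {
			if outer {
				out = append(out, Joined{Key: l.Key, Left: l.Val})
			}
			continue
		}
		for _, m := range matches {
			out = append(out, Joined{Key: l.Key, Left: l.Val, Right: m, Matched: true})
		}
	}
	return out
}

func main() {
	left := []Row{{"1", "alice"}, {"2", "bob"}, {"3", "carol"}}
	right := []Row{{"3", "sales"}, {"1", "ops"}, {"1", "dev"}}
	for _, j := range hashJoin(left, right, true) {
		fmt.Printf("%s %s %s matched=%v\n", j.Key, j.Left, j.Right, j.Matched)
	}
}
```

Output follows the left input's order, with multiple right matches emitted in the right input's order. For very large inputs a sort-merge join over pre-sorted files would avoid holding one side in memory.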
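The Record Summarization entries, all still unimplemented, describe consolidating records with equal keys while producing count, sum, minimum, maximum, and average fields. A minimal Go sketch of that grouping step, assuming each record has one string key and one numeric field; `Rec`, `Summary`, and `summarize` are hypothetical names, and running totals, breaks, and multi-level summaries are not covered.

```go
package main

import (
	"fmt"
	"sort"
)

// Rec is a hypothetical input record: a grouping key and a numeric field.
type Rec struct {
	Key string
	Val float64
}

// Summary holds per-key aggregates: count, sum, minimum and maximum.
type Summary struct {
	Count    int
	Sum      float64
	Min, Max float64
}

// Avg derives the average from Sum and Count (Count is always >= 1 here).
func (s Summary) Avg() float64 { return s.Sum / float64(s.Count) }

// summarize consolidates records with equal keys into one Summary per key
// and returns the keys sorted so the output order is deterministic.
func summarize(recs []Rec) ([]string, map[string]Summary) {
	groups := make(map[string]Summary)
	for _, r := range recs {
		s, seen := groups[r.Key]
		if !seen || r.Val < s.Min {
			s.Min = r.Val
		}
		if !seen || r.Val > s.Max {
			s.Max = r.Val
		}
		s.Count++
		s.Sum += r.Val
		groups[r.Key] = s
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, groups
}

func main() {
	keys, groups := summarize([]Rec{{"a", 2}, {"b", 5}, {"a", 4}, {"a", 6}})
	for _, k := range keys {
		s := groups[k]
		fmt.Printf("%s count=%d sum=%g min=%g max=%g avg=%g\n",
			k, s.Count, s.Sum, s.Min, s.Max, s.Avg())
	}
}
```

Storing only count, sum, min, and max per key keeps memory proportional to the number of distinct keys; the average is derived at output time rather than accumulated.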
