Add documentation.

author: Mhd Sulhan <m.shulhan@gmail.com> 2014-07-27 20:22:12 +0700
committer: Mhd Sulhan <m.shulhan@gmail.com> 2014-07-27 20:22:12 +0700
commit: a5817d2410f65c3a055e4c1ec212270aed50186d (patch)
tree: 60b4f7f0bf5684a8a950ec7602e8536ccc8d456c /doc/dev/GOAL
parent: 69d5c74cb37f32588d78e09fcffb947cd74d9c13 (diff)
download: vos-a5817d2410f65c3a055e4c1ec212270aed50186d.tar.xz
1 files changed, 263 insertions, 0 deletions
diff --git a/doc/dev/GOAL b/doc/dev/GOAL
new file mode 100644
index 0000000..7578e06
--- /dev/null
+++ b/doc/dev/GOAL
@@ -0,0 +1,263 @@
+Vos Goals
+----------
+				Taken from CoSort Technical Specifications.
+
+
+Legend:
+- : unimplemented
++ : implemented
+= : on going/half done
+? : is it worth it/why/what does that mean
+
+
+Ease of Use
+-----------
+
+- Processes record layouts and SQL-like field definitions from central data
+  dictionaries.
+
+- Converts and processes native COBOL copybook, Oracle SQL*Loader control
+  file, CSV, and W3C extended log format (ELF) file layouts.
+
+- SortCL data definition files are a supported MIMB metadata format.
+
+- Mix of online help, pre-runtime application validation, and runtime
+  error messages.
+
+- Leverages centralized application and file layout definitions (metadata
+  repositories).
+
+= Reports problems to standard error when invoked from a program, or
+  to an error log.
+
+- Runs silently or with verbose messaging without user intervention.
+
+- Allows user control over the amount of informational output produced.
+
+- Generates a query-ready XML audit log for data forensics and privacy
+  compliance.
+
+= Describes commands and options through man pages and online documentation.
+
+	It's half done because the program keeps gaining new features,
+	so it's not wise to mark this as 'done'.
+
+- Easy-to-use interfaces and seamless third-party sort replacements preclude
+  the need for training classes.
+
+
+Resource Control
+----------------
+
++ Sets and allows user modification of the maximum and minimum number of
+  concurrent sort threads for sorting on multi-CPU and multi-core systems.
+
+	using PROCESS_MAX variable.
+
++ Uses a specified directory, or a combination of directories, for temporary
+  work files.
+
+	using PROC_TMP_DIR variable.
+
++ Limits the amount of main and virtual memory used during sort operations.
+
+	using PROCESS_MAX_ROW variable.
+
+	Since input file size is unpredictable and a human still needs to
+	run the program, the amount of program memory still cannot be decided
+	by a human. What if it is set to 1 kilobyte?
+
++ Sets the size of the memory blocks used as physical I/O buffers.
+
+	using FILE_BUFFER_SIZE variable.
+
+
+Input and Output 
+----------------
+
+= Processes any number of files, of any size, and any number of records,
+  fixed or variable length to 65,535 bytes passed from an input procedure,
+  from stdin, a named pipe, a table in memory, or from an application program.
+
+	- TODO: from stdin
+	- TODO: from a named pipe.
+	- TODO: from a table in memory.
+	- TODO: from an application program.
+
+? Supports the use of environment variables.
+
+	for what ?
+
+= Supports wildcards in the specification of input and output files, as well
+  as absolute path names and aliases.
+
+	- TODO: supports wildcards in the specification of input files.
+
++ Accepts and outputs fixed or variable-length records with delimited fields.
+
+? Generates one or more output files, and/or summary information, including
+  formatted and dashboard-ready reports.
+
+- Returns sorted, merged, or joined records one (or more) at a time to an output
+  procedure, to stdout (or named pipe), a table in memory, one or more new or
+  existing files, or to a program.
+
+- Outputs optional sequence numbers with each record, at any starting value, for
+  indexed loads and/or reports.
+
+
+Record Selection and Grouping
+-----------------------------
+
+= Includes or omits input or output records using field-to-field or
+  field-to-constant comparisons.
+
+	TODO: field-to-field comparisons
+
+- Compares on any number of data fields, using standard and alternate collating
+  sequences.
+
++ Sorts and/or reformats groups of selected records.
+
+	using SORT and CREATE statements.
+
++ Matches two or more sorted or unsorted files on inner and outer join criteria using
+  SQL-based condition syntax.
+
+	using JOIN with '+' or '-' statement.
+
+- Skips a specified number of records, bytes, or a file header or footer.
+
+- Processes a specified number of records or bytes, including a saved header.
+
+- Eliminates or saves records with duplicate keys.
+
+
+Sort Key Processing
+-------------------
+
++ Allows any number of key fields to be specified in ascending or
+  descending order.
+
+	using SORT x by x.f1 ASC; or
+	using SORT x by x.f1 DESC;
+
++ Supports any number of fields from 0 to 65,535 bytes in length.
+
+	almost unlimited, the limit is your memory.
+
++ Orders fixed position fields, or floating fields with one or more
+  delimiters.
+
+- Supports numeric keys, including all C, FORTRAN, and COBOL data types.
+
+- Supports single- and multi-byte character keys, including ASCII, EBCDIC,
+  ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps,
+  and natural (locale-dependent) values, as well as Unicode and double-byte
+  characters such as Big5, EUC-TW, UTF-32, and SJIS.
+
+- Allows left or right alignment and case shifting of character keys.
+
+- Accepts user compare procedures for multi-byte, encrypted and other
+  special data.
+
+- Performs record sequence checking.
+
++ Maintains input record order (stability) on duplicate keys.
+
+- Controls treatment of null fields when specifying floating
+  (character separated) keys.
+
+- Collates and converts between many of the following data types (formats):
+	---
+
+
+Record Reformatting
+-------------------
+
++ Inserts, removes, resizes, and reorders fields within records; defines new
+  fields.
+
+- Converts data in fields from one format to another using internal
+  conversion.
+
+- Maps common fields from differently formatted input files to a uniform sort
+  record.
+
+= Joins any fields from several files into an output record, usually based on a
+  condition.
+
+	using JOIN statement. Currently supports joining only two input files.
+
+- Changes record layouts from one file type to another, including: Line
+  Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma
+  Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM
+  (within UniKik MBM), Extended Log Format (W3C), LDIF, and XML.
+
+- Maps processed records to many differently formatted output files, including
+  HTML.
+
+- Writes multiple record formats to the same file for complex report
+  requirements.
+
+- Performs mathematical expressions and functions on field data (including
+  aggregate data) to generate new output fields.
+
+- Calculates the difference in days, hours, minutes and seconds between
+  timestamps.
+
+
+Field Reformatting/Validation
+-----------------------------
+
+- Aligns desired field contents to either the left or right of the target
+  field, where any leading or trailing fill characters from the source are
+  moved to the opposite side of the string.
+
+- Processes values from multi-dimensional, tab-delimited lookup files.
+
+- Creates and processes substrings of original field contents, where you can
+  specify a positive or negative offset and a number of bytes to be contained
+  in the substring.
+
+- Finds a user-specified text string in a given field, and replaces all
+  occurrences of it with a different user-specified text string in the target
+  field.
+
+- Supports Perl Compatible Regular Expressions (PCRE), including pattern
+  matching.
+
+- Uses C-style “iscompare” functions to validate contents at the field level
+  (for example, to determine if all field characters are printable), which can
+  also be used for record-filtering via selection statements.
+
+- Protects sensitive field data with field-level de-identification and AES-256
+  encryption routines, along with anonymization, pseudonymization, filtering
+  and other column-level data masking and obfuscation techniques.
+
+- Supports custom, user-written field-level transformation libraries, and
+  documents an example of a field-level data cleansing routine from
+  Melissa Data (AddressObject).
+
+
+Record Summarization
+--------------------
+
+- Consolidates records with equal keys into unique records, while totaling,
+  averaging, or counting values in specified fields, including derived
+  (cross-calculated) fields.
+
+- Produces maximum, minimum, average, sum, and count fields.
+
+- Displays running summary value(s) up to a break (accumulating aggregates).
+
+- Breaks on compound conditions.
+
+- Allows multiple levels of summary fields in the same report.
+
+- Remaps summary fields into a new format, allowing relational tables.
+
+- Ranks data through a running count with descending numeric values.
+
+- Writes detail and summary records to the same output file for structured
+  reports.
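
The ordering goals listed under "Sort Key Processing" (keys in ascending or descending order, and input order kept on duplicate keys) can be illustrated with a small Python sketch. This is not Vos code: the function `sort_records`, the dict record layout, and the sample values for fields `f1` and `f2` are hypothetical, chosen only to mirror the `SORT x by x.f1 ASC;` form quoted in the notes. It relies on Python's `sorted()` being stable, including when `reverse=True` is given.

```python
from operator import itemgetter


def sort_records(records, keys):
    """Return records sorted by several keys.

    keys is a list of (field, descending) pairs, most significant first.
    Records that tie on every key keep their input order.
    """
    result = list(records)
    # Apply the keys from least to most significant. Because sorted() is
    # stable, each later pass keeps the order set by earlier passes for
    # records that compare equal on the current key.
    for field, descending in reversed(keys):
        result = sorted(result, key=itemgetter(field), reverse=descending)
    return result


# Hypothetical input: ids 2 and 4 have identical keys.
rows = [
    {"id": 1, "f1": "b", "f2": 2},
    {"id": 2, "f1": "a", "f2": 2},
    {"id": 3, "f1": "b", "f2": 1},
    {"id": 4, "f1": "a", "f2": 2},
]

# Roughly "SORT x by x.f1 ASC": ties keep input order (2 before 4, 1 before 3).
ascending = sort_records(rows, [("f1", False)])
print([r["id"] for r in ascending])  # [2, 4, 1, 3]
```

Passing `[("f1", True)]` instead gives the descending form, and adding a second pair such as `("f2", True)` sorts on a compound key while still leaving fully duplicate records in input order.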