doc: add index and reformat some documents using asciidoc

This is for publication of doc under https://kilabit.info/project/vos .
author: Shulhan <ms@kilabit.info> 2026-01-02 17:21:26 +0700
committer: Shulhan <ms@kilabit.info> 2026-01-02 17:21:26 +0700
commit: 797faa817881ea63271d5c6794b80ccd644cc76c (patch)
tree: ce012b4e704d870c07e2fc4f50f6b099ffa82431
parent: a5817d2410f65c3a055e4c1ec212270aed50186d (diff)
download: vos-797faa817881ea63271d5c6794b80ccd644cc76c.tar.xz
5 files changed, 334 insertions, 299 deletions
diff --git a/doc/.gitignore b/doc/.gitignore
new file mode 100644
index 0000000..2d19fc7
--- /dev/null
+++ b/doc/.gitignore
@@ -0,0 +1 @@
+*.html
diff --git a/doc/dev/GOAL b/doc/dev/GOAL
deleted file mode 100644
index 7578e06..0000000
--- a/doc/dev/GOAL
+++ /dev/null
@@ -1,263 +0,0 @@
-Vos Goals
-----------
-				Taken from CoSort Technical Specifications.
-
-
-Legend:
-- : unimplemented
-+ : implemented
-= : on going/half done
-? : is it worth/why/what is that mean
-
-
-Ease of Use
------------
-
-- Processes record layouts and SQLlike field definitions from central data
-  dictionaries.
-
-- Converts and processes native COBOL copybook, Oracle SQL*Loader control
-  file, CSV, and W3C extended log format (ELF) file layouts.
-
-- SortCL data definition files are a supported MIMB metadata format.
-
-- Mix of online help, preruntime application validation, and runtime
-  error messages.
-
-- Leverages centralized application and file layout definitions (metadata
-  repositories).
-
-= Reports problems to standard error when invoked from a program, or
-  to an error log.
-
-- Runs silently or with verbose messaging without user intervention.
-
-- Allows user control over the amount of informational output produced.
-
-- Generates a queryready XML audit log for data forensics and privacy
-  compliance.
-
-= Describes commands and options through man pages and online documentation.
-
-	it's half done because the program is always moving to a new features.
-	it's not wise to mark this as 'done'.
-
-- Easytouse interfaces and seamless thirdparty sort replacements preclude
-  the need for training classes
-
-
-Resource Control
-----------------
-
-+ Sets and allows user modification of the maximum and minimum number of
-  concurrent sort threads for sorting on multiCPU and multicore systems.
-
-	using PROCESS_MAX variable.
-
-+ Uses a specified directory, a combination of directories, for temporary work
-  files.
-
-	using PROC_TMP_DIR variable.
-
-+ Limits the amount of main and virtual memory used during sort operations.
-
-	using PROCESS_MAX_ROW variable.
-
-	Since input file size is unpredictable and a human is still need to
-	run the program, the amount of program memory still cannot decide by
-	human. What if it's set to 1 kilobytes ?.
-
-+ Sets the size of the memory blocks used as physical I/O buffers.
-
-	using FILE_BUFFER_SIZE variable.
-
-
-Input and Output 
-----------------
-
-= Processes any number of files, of any size, and any number of records,
-  fixed or variable length to 65,535 bytes passed from an input procedure,
-  from stdin, a named pipe, a table in memory, or from an application program.
-
-	- TODO: from stdin
-	- TODO: from a named pipe.
-	- TODO: from a table in memory.
-	- TODO: from an application program.
-
-? Supports the use of environment variables.
-
-	for what ?
-
-= Supports wildcards in the specification of input and output files, as well
-  as absolute path names and aliases.
-
-	- TODO: supports wildcards in the specification of input files.
-
-+ Accepts and outputs fixed or variablelength records with delimited field.
-
-? Generates one or more output files, and/or summary information, including
-  formatted and dashboardready reports.
-
-- Returns sorted, merged, or joined records one (or more) at a time to an output
-  procedure, to stdout (or named pipe), a table in memory, one or more new or
-  existing files, or to a program.
-
-- Outputs optional sequence numbers with each record, at any starting value, for
-  indexed loads and/or reports.
-
-
-Record Selection and Grouping
------------------------------
-
-= Includes or omits input or output records using fieldtofield or fieldconstant
-  comparisons.
-
-	TODO: field-to-field comparisons
-
-- Compares on any number of data fields, using standard and alternate collating
-  sequences.
-
-+ Sorts and/or reformats groups of selected records.
-
-	using SORT and CREATE statement.
-
-+ Matches two or more sorted or unsorted files on inner and outer join criteria using
-  SQLbased condition syntax.
-
-	using JOIN with '+' or '-' statement.
-
-- Skips a specified number of records, bytes, or a file header or footer.
-
-- Processes a specified number of records or bytes, including a saved header.
-
-- Eliminates or saves records with duplicate keys.
-
-
-Sort Key Processing
--------------------
-
-+ Allows any number of key fields to be specified in ascending or
-  descending order.
-
-	using SORT x by x.f1 ASC; or
-	using SORT x by x.f1 DESC;
-
-+ Supports any number of fields from 0 to 65,535 bytes in length.
-
-	almost unlimited, the limit is your memory.
-
-+ Orders fixed position fields, or floating fields with one or more
-  delimiters.
-
-- Supports numeric keys, including all C, FORTRAN, and COBOL data types.
-
-- Supports single and multibyte character keys, including ASCII, EBCDIC,
-  ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps,
-  and natural (localedependent) values, as well as Unicode and doublebyte
-  characters such as Big5, EUCTW, UTF32, and SJIS.
-
-- Allows left or right alignment and case shifting of character keys.
-
-- Accepts user compare procedures for multibyte, encrypted and other
-  special data.
-
-- Performs record sequence checking.
-
-+ Maintains input record order (stability) on duplicate keys.
-
-- Controls treatment of null fields when specifying floating
-  (character separated) keys.
-
-- Collates and converts between many of the following data types (formats):
-	---
-
-
-Record Reformatting
--------------------
-
-+ Inserts, removes, resizes, and reorders fields within records; defines new
-  fields.
-
-- Converts data in fields from one format to another either using internal
-  conversion.
-
-- Maps common fields from differently formatted input files to a uniform sort
-  record.
-
-= Joins any fields from several files into an output record, usually based on a
-  condition.
-
-	using JOIN statement. current support only in joining two input files.
-
-- Changes record layouts from one file type to another, including: Line
-  Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma
-  Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM
-  (within UniKik MBM), Extended Log Format (W3C), LDIF, and XML.
-
-- Maps processed records to many differently formatted output files, including
-  HTML.
-
-- Writes multiple record formats to the same file for complex report
-  requirements.
-
-- Performs mathematical expressions and functions on field data (including
-  aggregate data) to generate new output fields.
-
-- Calculates the difference in days, hours, minutes and seconds betweeen
-  timestamps.
-
-
-Field Reformatting/Validation
------------------------------
-
-- Aligns desired field contents to either the left or right of the target
-  field, where any leading or trailing fill characters from the source are
-  moved to the opposite side of the string.
-
-- Processes values from multidimensional, tabdelimited lookup files.
-
-- Creates and processes substrings of original field contents, where you can
-  specify a positive or negative offset and a number of bytes to be contained
-  in the substring.
-
-- Finds a userspecified text string in a given field, and replaces all
-  occurrences of it with a different userspecified text string in the target
-  field.
-
-- Supports Perl Compatible Regular Expressions (PCRE), including pattern
-  matching.
-
-- Uses Cstyle “iscompare” functions to validate contents at the field level
-  (for example, to determine if all field characters are printable), which can
-  also be used for recordfiltering via selection statements.
-
-- Protects sensitive field data with fieldlevel deidentification and AES256
-  encryption routines, along with anonymization, pseudonymization, filtering
-  and other column-level data masking and obfuscation techniques.
-
-- Supports custom, userwritten fieldlevel transformation libraries, and
-  documents an example of a fieldlevel data cleansing routine from
-  Melissa Data (AddressObject).
-
-
-Record Summarization
---------------------
-
-- Consolidates records with equal keys into unique records, while totaling,
-  averaging, or counting values in specified fields, including derived
-  (crosscalculated) fields.
-
-- Produces maximum, minimum, average, sum, and count fields.
-
-- Displays running summary value(s) up to a break (accumulating aggregates).
-
-- Nreaks on compound conditions.
-
-- Allows multiple levels of summary fields in the same report.
-
-- Remaps summary fields into a new format, allowing relational tables.
-
-- Ranks data through a running count with descending numeric values.
-
-- Writes detail and summary records to the same output file for structured
-  reports.
diff --git a/doc/dev/GOAL.adoc b/doc/dev/GOAL.adoc
new file mode 100644
index 0000000..74370fb
--- /dev/null
+++ b/doc/dev/GOAL.adoc
@@ -0,0 +1,258 @@
+= Vos Goals
+
+Taken from CoSort Technical Specifications.
+
+Legend:
+
+* - : unimplemented
+* + : implemented
+* = : ongoing/half done
+* ? : is it worth it/why/what does that mean
+
+
+== Ease of Use
+
+(-) Processes record layouts and SQL-like field definitions from central
+data dictionaries.
+
+(-) Converts and processes native COBOL copybook, Oracle SQL*Loader control
+file, CSV, and W3C extended log format (ELF) file layouts.
+
+(-) SortCL data definition files are a supported MIMB metadata format.
+
+(-) Mix of online help, pre-runtime application validation, and runtime
+error messages.
+
+(-) Leverages centralized application and file layout definitions
+(metadata repositories).
+
+(=) Reports problems to standard error when invoked from a program, or
+to an error log.
+
+(-) Runs silently or with verbose messaging without user intervention.
+
+(-) Allows user control over the amount of informational output produced.
+
+(-) Generates a query-ready XML audit log for data forensics and privacy
+compliance.
+
+(=) Describes commands and options through man pages and online
+documentation.
+
+It's half done because the program is always gaining new features,
+so it's not wise to mark this as 'done'.
+
+(-) Easy-to-use interfaces and seamless third-party sort replacements
+preclude the need for training classes.
+
+
+== Resource Control
+
+(+) Sets and allows user modification of the maximum and minimum number of
+concurrent sort threads for sorting on multi-CPU and multi-core systems.
+
+Using PROCESS_MAX variable.
+
+(+) Uses a specified directory, or a combination of directories, for temporary
+work files.
+
+Using PROC_TMP_DIR variable.
+
+(+) Limits the amount of main and virtual memory used during sort
+operations.
+
+Using PROCESS_MAX_ROW variable.
+
+Since the input file size is unpredictable and a human still needs to
+run the program, the amount of program memory still cannot be decided by
+a human. What if it's set to 1 kilobyte?
+
+(+) Sets the size of the memory blocks used as physical I/O buffers.
+
+Using FILE_BUFFER_SIZE variable.
+
+
+== Input and Output
+
+(=) Processes any number of files, of any size, and any number of records,
+fixed or variable length to 65,535 bytes passed from an input procedure,
+from stdin, a named pipe, a table in memory, or from an application program.
+
+- TODO: from stdin
+- TODO: from a named pipe.
+- TODO: from a table in memory.
+- TODO: from an application program.
+
+(?) Supports the use of environment variables.
+
+(=) Supports wildcards in the specification of input and output files, as
+well as absolute path names and aliases.
+
+- TODO: supports wildcards in the specification of input files.
+
+(+) Accepts and outputs fixed or variable-length records with delimited
+fields.
+
+(?) Generates one or more output files, and/or summary information,
+including formatted and dashboard-ready reports.
+
+(-) Returns sorted, merged, or joined records one (or more) at a time to an
+output procedure, to stdout (or named pipe), a table in memory, one or more
+new or existing files, or to a program.
+
+(-) Outputs optional sequence numbers with each record, at any starting
+value, for indexed loads and/or reports.
+
+
+== Record Selection and Grouping
+
+(=) Includes or omits input or output records using field-to-field or
+field-constant comparisons.
+
+TODO: field-to-field comparisons
+
+(-) Compares on any number of data fields, using standard and alternate
+collating sequences.
+
+(+) Sorts and/or reformats groups of selected records.
+
+Using SORT and CREATE statement.
+
+(+) Matches two or more sorted or unsorted files on inner and outer join
+criteria using SQL-based condition syntax.
+
+Using JOIN with '+' or '-' statement.
+
+(-) Skips a specified number of records, bytes, or a file header or footer.
+
+(-) Processes a specified number of records or bytes, including a saved
+header.
+
+(-) Eliminates or saves records with duplicate keys.
+
+
+== Sort Key Processing
+
+(+) Allows any number of key fields to be specified in ascending or
+descending order.
+
+	using SORT x by x.f1 ASC; or
+	using SORT x by x.f1 DESC;
+
+(+) Supports any number of fields from 0 to 65,535 bytes in length.
+
+Almost unlimited, the limit is your memory.
+
+(+) Orders fixed position fields, or floating fields with one or more
+delimiters.
+
+(-) Supports numeric keys, including all C, FORTRAN, and COBOL data types.
+
+(-) Supports single and multibyte character keys, including ASCII, EBCDIC,
+ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps,
+and natural (locale-dependent) values, as well as Unicode and double-byte
+characters such as Big5, EUC-TW, UTF-32, and SJIS.
+
+(-) Allows left or right alignment and case shifting of character keys.
+
+(-) Accepts user compare procedures for multibyte, encrypted and other
+special data.
+
+(-) Performs record sequence checking.
+
+(+) Maintains input record order (stability) on duplicate keys.
+
+(-) Controls treatment of null fields when specifying floating
+(character separated) keys.
+
+(-) Collates and converts between many of the following data types
+(formats).
+
+
+== Record Reformatting
+
+(+) Inserts, removes, resizes, and reorders fields within records; defines
+new fields.
+
+(-) Converts data in fields from one format to another either using internal
+conversion.
+
+(-) Maps common fields from differently formatted input files to a uniform
+sort record.
+
+(=) Joins any fields from several files into an output record, usually based
+on a condition.
+
+Using JOIN statement. Currently supports joining only two input files.
+
+(-) Changes record layouts from one file type to another, including: Line
+Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma
+Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM
+(within UniKik MBM), Extended Log Format (W3C), LDIF, and XML.
+
+(-) Maps processed records to many differently formatted output files,
+including HTML.
+
+(-) Writes multiple record formats to the same file for complex report
+requirements.
+
+(-) Performs mathematical expressions and functions on field data (including
+aggregate data) to generate new output fields.
+
+(-) Calculates the difference in days, hours, minutes and seconds between
+timestamps.
+
+
+== Field Reformatting/Validation
+
+(-) Aligns desired field contents to either the left or right of the target
+field, where any leading or trailing fill characters from the source are
+moved to the opposite side of the string.
+
+(-) Processes values from multidimensional, tab-delimited lookup files.
+
+(-) Creates and processes substrings of original field contents, where you
+can specify a positive or negative offset and a number of bytes to be
+contained in the substring.
+
+(-) Finds a user-specified text string in a given field, and replaces all
+occurrences of it with a different user-specified text string in the target
+field.
+
+(-) Supports Perl Compatible Regular Expressions (PCRE), including pattern
+matching.
+
+(-) Uses C-style “iscompare” functions to validate contents at the field
+level (for example, to determine if all field characters are printable),
+which can also be used for record-filtering via selection statements.
+
+(-) Protects sensitive field data with field-level de-identification and
+AES-256 encryption routines, along with anonymization, pseudonymization,
+filtering and other column-level data masking and obfuscation techniques.
+
+(-) Supports custom, user-written field-level transformation libraries, and
+documents an example of a field-level data cleansing routine from
+Melissa Data (AddressObject).
+
+
+== Record Summarization
+
+(-) Consolidates records with equal keys into unique records, while
+totaling, averaging, or counting values in specified fields, including
+derived (cross-calculated) fields.
+
+(-) Produces maximum, minimum, average, sum, and count fields.
+
+(-) Displays running summary value(s) up to a break (accumulating
+aggregates).
+
+(-) Breaks on compound conditions.
+
+(-) Allows multiple levels of summary fields in the same report.
+
+(-) Remaps summary fields into a new format, allowing relational tables.
+
+(-) Ranks data through a running count with descending numeric values.
+
+(-) Writes detail and summary records to the same output file for structured
+reports.
diff --git a/doc/dev/NOTES b/doc/dev/NOTES.adoc
index 92bf86c..72d1306 100644
--- a/doc/dev/NOTES
+++ b/doc/dev/NOTES.adoc
@@ -1,5 +1,5 @@
-				sometimes i forgot why i write code like this.
-								-- S.T.M.L
+			sometimes i forgot why i write code like this.
+							-- S.T.M.L
 
 - follow linux coding style
 
@@ -29,51 +29,57 @@
 
 
 
-001 - I/O Relation between Statement
------------------------------------------------------------------------------
+== 001 - I/O Relation between Statement
+
 LOAD is an input statement.
 
 SORT, CREATE, and JOIN are output statements, but their output can also be an input.
 i.e:
 
-	1 - load abc ( ... ) as x;
-	2 - sort x by a, b;
-	3 - create ghi ( x.field, ... ) as out_x;
+----
+1 - load abc ( ... ) as x;
+2 - sort x by a, b;
+3 - create ghi ( x.field, ... ) as out_x;
+----
 
 file output created by sort statement in line 2 will be an input by create
 statement in line 3.
 
 
-002 - Why we need '2nd-loser'
------------------------------------------------------------------------------
+== 002 - Why we need '2nd-loser'
 
 to minimize comparison and insert in merge tree.
 
 
 
-003 - Why we need 'level' on tree node
------------------------------------------------------------------------------
+== 003 - Why we need 'level' on tree node
 
 list of input file to merge is A, B, C contain sorted data :
 
-	A : 10, 11, 12, 13      (1st file)
-	B : 1, 12, 100, 101     (2nd file)
-	C : 2, 13, 200, 201     (3rd file)
+----
+A : 10, 11, 12, 13      (1st file)
+B : 1, 12, 100, 101     (2nd file)
+C : 2, 13, 200, 201     (3rd file)
+----
 
 if we use tree insert algorithm:
 
-	if (root < node)
-		insert to left
-	else
-		insert to right
+----
+if (root < node)
+	insert to left
+else
+	insert to right
+----
 
 after several step we will get:
 
+----
 B-12
     \
     C-13
     /
 A-12
+----
 
 which result in not-a-stable sort,
 
@@ -85,38 +91,45 @@ they should be,
 
 Even if we choose different algorithm in insert:
 
-	if (root <= node)
-		insert to left
-	else
-		insert to right
+----
+if (root <= node)
+	insert to left
+else
+	insert to right
+----
 
 there is also input data that will violate this, i.e:
 
-	A : 2, 13, 200, 201     (1st file)
-	B : 1, 12, 100, 101     (2nd file)
-	C : 10, 11, 12, 13      (3rd file)
+----
+A : 2, 13, 200, 201     (1st file)
+B : 1, 12, 100, 101     (2nd file)
+C : 10, 11, 12, 13      (3rd file)
+----
 
 
-004 - recursives call + thread + free on SunOS 5.10
------------------------------------------------------------------------------
+== 004 - recursive call + thread + free on SunOS 5.10
 
 i did not investigate much, but doing a recursive call + thread + free causes
 SIGSEGV on SunOS 5.10 system, but not on GNU/Linux system. This oddity was found
 when testing on Solaris, and by using dbx the SIGSEGV is "sometimes" caught in
 str_destroy,
 
-	if (str->buf)
-		free(str->buf); <= dbx catch here
+----
+if (str->buf)
+	free(str->buf); <= dbx catch here
+----
 
 and "sometimes" below that (but not in vos function/stack).
 
 i.e:
-	list_destroy(**ptr)
-	{
-		if (! (*ptr))
-			return;
-		list_destroy((*ptr)->next);
-		free((*ptr));
-	}
+----
+list_destroy(**ptr)
+{
+	if (! (*ptr))
+		return;
+	list_destroy((*ptr)->next);
+	free((*ptr));
+}
+----
 
 and no, it's not about double free.
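
NOTES entry 004 above leaves the cause of the crash open. As a minimal sketch only, and assuming (untested) that recursion depth inside a thread plays a part, the same cleanup can be written as a loop so that stack use stays constant regardless of list length. `struct list`, `list_push`, and the returned count are hypothetical names for illustration and are not taken from the vos sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the real vos list structure may differ. */
struct list {
	struct list *next;
};

/* Push one empty node at the head; returns 0 on success, -1 on failure. */
int list_push(struct list **head)
{
	struct list *node = malloc(sizeof(*node));

	if (!node)
		return -1;
	node->next = *head;
	*head = node;
	return 0;
}

/*
 * Free every node with a loop instead of recursion.
 * On return *ptr is NULL; the result is the number of nodes freed.
 */
size_t list_destroy(struct list **ptr)
{
	size_t n = 0;

	assert(ptr != NULL);
	while (*ptr) {
		struct list *next = (*ptr)->next;

		free(*ptr);
		*ptr = next;
		n++;
	}
	return n;
}
```

Note that the snippet quoted in the note passes `(*ptr)->next` where a pointer-to-pointer is expected; the loop above sidesteps that by saving `next` before each `free`.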
diff --git a/doc/index.adoc b/doc/index.adoc
new file mode 100644
index 0000000..1cc3be0
--- /dev/null
+++ b/doc/index.adoc
@@ -0,0 +1,26 @@
+= vos
+
+Vos is a program to process formatted data, e.g. CSV data.
+Vos is designed to process large input files, files whose size is
+larger than the size of memory, and can be tuned to adapt to your machine
+environment.
+
+link:user/vos_user_manual.html[Vos User Manual] - User manual for vos
+command line.
+
+
+== Development
+
+link:dev/GOAL.html[GOAL] - Lists the goals of this project.
+
+link:dev/NOTES.html[NOTES] - Miscellaneous notes when developing the
+project.
+
+link:dev/vos-sketch.odg[Vos sketch diagram].
+
+Performance logs,
+
+- link:dev/vos.test.create.log[vos.test.create.log].
+- link:dev/vos.test.create.mem.log[vos.test.create.mem.log].
+- link:dev/vos.test.join.log[vos.test.join.log].
+- link:dev/vos.test.join.log[vos.test.join.log].
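
NOTES entry 003 in the diff above argues that a plain compare-and-insert cannot keep equal keys in input-file order. The sketch below illustrates the tie-break idea under one assumption: that `level` means the position of a file in the input list (the note does not define it). It uses a linear scan over the runs rather than the merge tree vos uses, and `struct run`, `run_cmp`, and `merge_stable` are hypothetical names.

```c
#include <stddef.h>

/* One sorted input run; level is 0 for the 1st file, 1 for the 2nd, ... */
struct run {
	const int *keys;
	size_t len;
	size_t pos;
	int level;
};

/* Negative result means the head of a must be emitted before the head of b. */
int run_cmp(const struct run *a, const struct run *b)
{
	int ka = a->keys[a->pos];
	int kb = b->keys[b->pos];

	if (ka != kb)
		return (ka < kb) ? -1 : 1;
	/* Equal keys: the earlier input file wins, which keeps the merge stable. */
	return a->level - b->level;
}

/*
 * Merge n runs into out; out_level records which run each key came from.
 * Both arrays must have room for all keys. Returns the number written.
 */
size_t merge_stable(struct run *runs, size_t n, int *out, int *out_level)
{
	size_t written = 0;

	for (;;) {
		struct run *best = NULL;
		size_t i;

		for (i = 0; i < n; i++) {
			if (runs[i].pos >= runs[i].len)
				continue; /* this run is exhausted */
			if (!best || run_cmp(&runs[i], best) < 0)
				best = &runs[i];
		}
		if (!best)
			break; /* every run is exhausted */
		out[written] = best->keys[best->pos];
		out_level[written] = best->level;
		best->pos++;
		written++;
	}
	return written;
}
```

With the A, B, C data from the note, both 12s and both 13s come out with the lower-numbered file first, whichever order the files are listed in.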