Add documentation.

author: Mhd Sulhan <m.shulhan@gmail.com> 2014-07-27 20:22:12 +0700
committer: Mhd Sulhan <m.shulhan@gmail.com> 2014-07-27 20:22:12 +0700
commit: a5817d2410f65c3a055e4c1ec212270aed50186d
tree: 60b4f7f0bf5684a8a950ec7602e8536ccc8d456c
parent: 69d5c74cb37f32588d78e09fcffb947cd74d9c13
27 files changed, 1830 insertions, 0 deletions
diff --git a/doc/dev/GOAL b/doc/dev/GOAL
new file mode 100644
index 0000000..7578e06
--- /dev/null
+++ b/doc/dev/GOAL
@@ -0,0 +1,263 @@
+Vos Goals
+----------
+				Taken from CoSort Technical Specifications.
+
+
+Legend:
+- : unimplemented
++ : implemented
+= : on going/half done
+? : is it worth it / why / what does that mean
+
+
+Ease of Use
+-----------
+
+- Processes record layouts and SQL-like field definitions from central data
+  dictionaries.
+
+- Converts and processes native COBOL copybook, Oracle SQL*Loader control
+  file, CSV, and W3C extended log format (ELF) file layouts.
+
+- SortCL data definition files are a supported MIMB metadata format.
+
+- Mix of online help, pre-runtime application validation, and runtime
+  error messages.
+
+- Leverages centralized application and file layout definitions (metadata
+  repositories).
+
+= Reports problems to standard error when invoked from a program, or
+  to an error log.
+
+- Runs silently or with verbose messaging without user intervention.
+
+- Allows user control over the amount of informational output produced.
+
+- Generates a query-ready XML audit log for data forensics and privacy
+  compliance.
+
+= Describes commands and options through man pages and online documentation.
+
+	It is marked half done because the program keeps gaining new features,
+	so it would be unwise to mark this as 'done'.
+
+- Easy-to-use interfaces and seamless third-party sort replacements preclude
+  the need for training classes.
+
+
+Resource Control
+----------------
+
++ Sets and allows user modification of the maximum and minimum number of
+  concurrent sort threads for sorting on multi-CPU and multi-core systems.
+
+	using PROCESS_MAX variable.
+
++ Uses a specified directory, or a combination of directories, for temporary
+  work files.
+
+	using PROC_TMP_DIR variable.
+
++ Limits the amount of main and virtual memory used during sort operations.
+
+	using PROCESS_MAX_ROW variable.
+
+	Since the input file size is unpredictable and a human still has to
+	run the program, the exact amount of program memory cannot be decided
+	by a human in advance. What if it were set to 1 kilobyte?
+
++ Sets the size of the memory blocks used as physical I/O buffers.
+
+	using FILE_BUFFER_SIZE variable.
+
+
+Input and Output
+----------------
+
+= Processes any number of files, of any size, and any number of records,
+  fixed- or variable-length up to 65,535 bytes, passed from an input procedure,
+  from stdin, a named pipe, a table in memory, or from an application program.
+
+	- TODO: from stdin
+	- TODO: from a named pipe.
+	- TODO: from a table in memory.
+	- TODO: from an application program.
+
+? Supports the use of environment variables.
+
+	for what ?
+
+= Supports wildcards in the specification of input and output files, as well
+  as absolute path names and aliases.
+
+	- TODO: supports wildcards in the specification of input files.
+
++ Accepts and outputs fixed or variable-length records with delimited fields.
+
+? Generates one or more output files, and/or summary information, including
+  formatted and dashboard-ready reports.
+
+- Returns sorted, merged, or joined records one (or more) at a time to an output
+  procedure, to stdout (or named pipe), a table in memory, one or more new or
+  existing files, or to a program.
+
+- Outputs optional sequence numbers with each record, at any starting value, for
+  indexed loads and/or reports.
+
+
+Record Selection and Grouping
+-----------------------------
+
+= Includes or omits input or output records using field-to-field or field-constant
+  comparisons.
+
+	TODO: field-to-field comparisons
+
+- Compares on any number of data fields, using standard and alternate collating
+  sequences.
+
++ Sorts and/or reformats groups of selected records.
+
+	using SORT and CREATE statements.
+
++ Matches two or more sorted or unsorted files on inner and outer join criteria using
+  SQL-based condition syntax.
+
+	using the JOIN statement with '+' or '-'.
+
+- Skips a specified number of records, bytes, or a file header or footer.
+
+- Processes a specified number of records or bytes, including a saved header.
+
+- Eliminates or saves records with duplicate keys.
+
+
+Sort Key Processing
+-------------------
+
++ Allows any number of key fields to be specified in ascending or
+  descending order.
+
+	using SORT x by x.f1 ASC; or
+	using SORT x by x.f1 DESC;
+
++ Supports any number of fields from 0 to 65,535 bytes in length.
+
+	almost unlimited, the limit is your memory.
+
++ Orders fixed position fields, or floating fields with one or more
+  delimiters.
+
+- Supports numeric keys, including all C, FORTRAN, and COBOL data types.
+
+- Supports single and multi-byte character keys, including ASCII, EBCDIC,
+  ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps,
+  and natural (locale-dependent) values, as well as Unicode and double-byte
+  characters such as Big5, EUC-TW, UTF-32, and SJIS.
+
+- Allows left or right alignment and case shifting of character keys.
+
+- Accepts user compare procedures for multibyte, encrypted and other
+  special data.
+
+- Performs record sequence checking.
+
++ Maintains input record order (stability) on duplicate keys.
+
+- Controls treatment of null fields when specifying floating
+  (character separated) keys.
+
+- Collates and converts between many of the following data types (formats):
+	---
+
+
+Record Reformatting
+-------------------
+
++ Inserts, removes, resizes, and reorders fields within records; defines new
+  fields.
+
+- Converts data in fields from one format to another using internal
+  conversion.
+
+- Maps common fields from differently formatted input files to a uniform sort
+  record.
+
+= Joins any fields from several files into an output record, usually based on a
+  condition.
+
+	using the JOIN statement. Currently only joining two input files is supported.
+
+- Changes record layouts from one file type to another, including: Line
+  Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma
+  Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM
+  (within UniKik MBM), Extended Log Format (W3C), LDIF, and XML.
+
+- Maps processed records to many differently formatted output files, including
+  HTML.
+
+- Writes multiple record formats to the same file for complex report
+  requirements.
+
+- Performs mathematical expressions and functions on field data (including
+  aggregate data) to generate new output fields.
+
+- Calculates the difference in days, hours, minutes and seconds between
+  timestamps.
+
+
+Field Reformatting/Validation
+-----------------------------
+
+- Aligns desired field contents to either the left or right of the target
+  field, where any leading or trailing fill characters from the source are
+  moved to the opposite side of the string.
+
+- Processes values from multi-dimensional, tab-delimited lookup files.
+
+- Creates and processes substrings of original field contents, where you can
+  specify a positive or negative offset and a number of bytes to be contained
+  in the substring.
+
+- Finds a user-specified text string in a given field, and replaces all
+  occurrences of it with a different user-specified text string in the target
+  field.
+
+- Supports Perl Compatible Regular Expressions (PCRE), including pattern
+  matching.
+
+- Uses C-style “is-compare” functions to validate contents at the field level
+  (for example, to determine if all field characters are printable), which can
+  also be used for record filtering via selection statements.
+
+- Protects sensitive field data with field-level de-identification and AES-256
+  encryption routines, along with anonymization, pseudonymization, filtering
+  and other column-level data masking and obfuscation techniques.
+
+- Supports custom, user-written field-level transformation libraries, and
+  documents an example of a field-level data cleansing routine from
+  Melissa Data (AddressObject).
+
+
+Record Summarization
+--------------------
+
+- Consolidates records with equal keys into unique records, while totaling,
+  averaging, or counting values in specified fields, including derived
+  (cross-calculated) fields.
+
+- Produces maximum, minimum, average, sum, and count fields.
+
+- Displays running summary value(s) up to a break (accumulating aggregates).
+
+- Breaks on compound conditions.
+
+- Allows multiple levels of summary fields in the same report.
+
+- Remaps summary fields into a new format, allowing relational tables.
+
+- Ranks data through a running count with descending numeric values.
+
+- Writes detail and summary records to the same output file for structured
+  reports.
diff --git a/doc/dev/NOTES b/doc/dev/NOTES
new file mode 100644
index 0000000..92bf86c
--- /dev/null
+++ b/doc/dev/NOTES
@@ -0,0 +1,122 @@
+				sometimes i forget why i wrote code like this.
+								-- S.T.M.L
+
+- follow the Linux kernel coding style
+
+- priority of source code (4S) :
+	+ stable
+	+ simple
+	+ small
+	+ secure (not needed for this program)
+
+- keep as small as possible:
+	+ remove unneeded space
+	+ remove unneeded variable
+
+- write comments/documentation as clearly as possible
+
+- learn to use:
+	+ if (1 == var)
+		constant on the left, so a mistyped '=' fails to compile.
+
+- learn to avoid:
+	+ (i < strlen(str))
+		in loop conditions, because strlen() is re-evaluated on every
+		iteration. Instead, store the length in a temporary variable:
+			l = strlen(str);
+			while (i < l) { ... }
+
+- use function in libc as much as possible; if not, wrap it!
+
+
+
+001 - I/O Relation between Statements
+-----------------------------------------------------------------------------
+LOAD is an input statement.
+
+SORT, CREATE, and JOIN are output statements, but their output can also be
+an input. e.g.:
+
+	1 - load abc ( ... ) as x;
+	2 - sort x by a, b;
+	3 - create ghi ( x.field, ... ) as out_x;
+
+the output file created by the sort statement on line 2 becomes the input of
+the create statement on line 3.
+
+
+002 - Why we need '2nd-loser'
+-----------------------------------------------------------------------------
+
+to minimize the number of comparisons and insertions in the merge tree.
+
+
+
+003 - Why we need 'level' on tree node
+-----------------------------------------------------------------------------
+
+suppose the input files to merge are A, B, and C, each containing sorted data:
+
+	A : 10, 11, 12, 13      (1st file)
+	B : 1, 12, 100, 101     (2nd file)
+	C : 2, 13, 200, 201     (3rd file)
+
+if we use this tree insert algorithm:
+
+	if (root < node)
+		insert to left
+	else
+		insert to right
+
+after several steps we get:
+
+B-12
+    \
+    C-13
+    /
+A-12
+
+which results in an unstable sort:
+
+	B-1 C-2 A-10 A-11 B-12 A-12 ...
+
+when it should be:
+
+	B-1 C-2 A-10 A-11 A-12 B-12 ...
+
+Even if we choose a different insert algorithm:
+
+	if (root <= node)
+		insert to left
+	else
+		insert to right
+
+there is still input data that violates stability, e.g.:
+
+	A : 2, 13, 200, 201     (1st file)
+	B : 1, 12, 100, 101     (2nd file)
+	C : 10, 11, 12, 13      (3rd file)
+
+
+004 - recursive call + thread + free on SunOS 5.10
+-----------------------------------------------------------------------------
+
+I did not investigate much, but combining a recursive call, threads and free()
+causes SIGSEGV on a SunOS 5.10 system, but not on a GNU/Linux system. This
+oddity was found while testing on Solaris; using dbx, the SIGSEGV was
+"sometimes" caught in str_destroy:
+
+	if (str->buf)
+		free(str->buf); <= dbx catch here
+
+and "sometimes" deeper in the stack (but never inside a vos function).
+
+e.g.:
+	void list_destroy(struct list **ptr)	/* type name is illustrative */
+	{
+		if (! (*ptr))
+			return;
+		list_destroy(&(*ptr)->next);
+		free(*ptr);
+	}
+
+and no, it's not about double free.
diff --git a/doc/dev/TODO b/doc/dev/TODO
new file mode 100644
index 0000000..3dc6001
--- /dev/null
+++ b/doc/dev/TODO
@@ -0,0 +1,10 @@
+>>
+- add set variable
+	set process_compare_case_sensitive; (default)
+	set process_compare_case_notsensitive;
+
+	set process_tmp_dir	"/path/to/tmp/dir";
+	set process_tmp_dir	"/another/tmp/dir";
+<< DONE
+
+- Produces maximum, minimum, average, sum, and count fields.
diff --git a/doc/dev/slog b/doc/dev/slog
new file mode 100644
index 0000000..72e2185
--- /dev/null
+++ b/doc/dev/slog
@@ -0,0 +1,206 @@
+				i have an odd habit: checking code every time
+				i get bored, which sometimes introduces errors.
+				this file prevents me from over-checking it.
+							-- May Benot
+
+--- format ---
++ function_name
+@check		: XXXX XXXX
+@last-check	: year.month.day (last check)
+@auditor	: thisman@thatserver.com (last auditor)
+@desc		: fix algorithm
+--- tamrof ---
+
+
+vos_String
+-----------------------------------------------------------------------------
+
++ str_create
+@check		: X
+@last-check	: 2008.12.17
+@auditor	: ms@kilabit.info
+
++ str_append_c
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
++ str_append
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+@desc		: fix len increment
+
++ str_detach
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
++ str_rtrim
+@check		: XX
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+@desc		: removed
+
++ str_prune
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
++ str_destroy
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
++ str_raw_copy
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
++ str_raw_randomize
+@check		: XX
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+@desc		: 'x' should not be replaced
+
++ str_raw_hash
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
+
+vos_File
+-----------------------------------------------------------------------------
+
++ file_open
+@check		: X
+@last-check	: 2008.12.19
+@auditor	: ms@kilabit.info
+
++ file_read
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_write
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_fetch_until
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_skip_until
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_skip_space
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_destroy
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_raw_get_size
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
++ file_raw_is_exist
+@check		: X
+@last-check	: 2009.01.25
+@auditor	: ms@kilabit.info
+
+
+vos_LL
+-----------------------------------------------------------------------------
+
++ ll_add
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ ll_link
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ ll_print
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ ll_destroy
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
+
+vos_Field
+-----------------------------------------------------------------------------
+
++ field_soft_copy
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ field_add
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ field_print
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ _field_destroy
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
+
+vos_Record
+-----------------------------------------------------------------------------
+
++ record_new
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ _record_cmp
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ record_add_field
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ record_add_row
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ record_prune
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ record_destroy
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
++ record_print
+@check		: X
+@last-check	: 2009.01.26
+@auditor	: ms@kilabit.info
+
diff --git a/doc/dev/test b/doc/dev/test
new file mode 100644
index 0000000..ff6cde3
--- /dev/null
+++ b/doc/dev/test
@@ -0,0 +1 @@
++ Accepts and outputs fixed or variable-length records with delimited fields.
diff --git a/doc/dev/vos-sketch.odg b/doc/dev/vos-sketch.odg
new file mode 100644
index 0000000..45b86a6
--- /dev/null
+++ b/doc/dev/vos-sketch.odg
diff --git a/doc/dev/vos.test.create.log b/doc/dev/vos.test.create.log
new file mode 100644
index 0000000..027c53f
--- /dev/null
+++ b/doc/dev/vos.test.create.log
@@ -0,0 +1,113 @@
+2009.10.12
+
+Comparing vos create process time by setting process max row and buffer size
+==============================================================================
+
+						is it disk or algorithm ?
+
+- file input size : 257,985,910 bytes (~ 250 MB)
+- format of input field:
+
+	'\'':field01:'\''::';',
+
+- format of output field:
+
+	'':field01:''::'|',
+
+- number of fields in input & output : 11 fields
+- process max : 2 (this option does not actually affect the process)
+
+
+system copy time
+==============================================================================
+
+	real    0m2.906s
+	user    0m0.010s
+	sys     0m0.747s
+
+
+vos load+create
+==============================================================================
+
+test 000
+--------
+o process max row	: 100,000
+o file buffer size	: 8192
+
+	real    0m30.243s
+	user    0m55.680s
+	sys     0m1.567s
+
+
+test 001
+--------
+o process max row	: 100,000
+o file buffer size	: 1,024,000
+
+	real    0m30.296s
+	user    0m55.536s
+	sys     0m1.790s
+
+
+test 002
+--------
+
+o process max row	: 200,000
+o file buffer size	: 1,024,000
+
+	real    0m30.115s
+	user    0m55.956s
+	sys     0m1.500s
+
+
+test 003
+--------
+
+o process max row	: 100,000
+o file buffer size	: 51,200,000
+
+	real    0m29.924s
+	user    0m55.443s
+	sys     0m1.563s
+
+
+test 004
+--------
+
+o process max row	: 500,000
+o file buffer size	: 51,200,000
+
+	real    0m32.795s
+	user    0m57.013s
+	sys     0m1.697s
+
+
+(source change)
+before:
+- int record_read_filtered(struct Record **R, struct File *F,
+				struct Field *fld);
+after:
+- int record_read_filtered(struct Record **R, struct File *F,
+				struct Field *fld, struct String *str);
+
+
+test 005
+--------
+
+o process max row	: 100,000
+o file buffer size	: 8,192
+
+	real    0m29.783s
+	user    0m54.253s
+	sys     0m1.867s
+
+
+test 006
+--------
+
+o process max row	: 100,000
+o file buffer size	: 51,200,000
+
+	real    0m30.364s
+	user    0m56.000s
+	sys     0m1.570s
diff --git a/doc/dev/vos.test.create.mem.log b/doc/dev/vos.test.create.mem.log
new file mode 100644
index 0000000..f5db2f6
--- /dev/null
+++ b/doc/dev/vos.test.create.mem.log
@@ -0,0 +1,50 @@
+
+ How much memory vos load+create uses
+------------------------------------------------------------------------------
+
+o input file size	: 51,521,908
+o input rows		: 501,000
+o input fields		: 11
+o output fields		: 11
+o process max row	: 100,000
+o process max		: 2
+
+
+2009.01.14 - test 000
+------------------------------------------------------------------------------
+
+o file buffer size		: 51,200,000
+o bytes allocated		: 296,833,772
+o allocs			:  11,000,473
+o running time (w/o memcheck)	:
+
+	real    0m6.187s
+	user    0m10.789s
+	sys     0m0.363s
+
+
+2009.01.15 - test 001
+------------------------------------------------------------------------------
+
+o with new vos_process_create algorithm
+o file buffer size		: 51,200,000
+o bytes allocated		: 166,820,858 (~ 3 * input file size)
+o allocs			: 3,500,662
+o running time (w/o memcheck)	:
+
+	real    0m4.565s
+	user    0m8.026s
+	sys     0m0.327s
+
+
+2009.01.16 - test 002
+------------------------------------------------------------------------------
+o file buffer size		: 8192 (default)
+o bytes allocated		: 64,437,110 (~ 1.2 * input file size :)
+o allocs			: 3,500,652
+o running time (w/o memcheck)	:
+
+	real    0m4.361s
+	user    0m7.763s
+	sys     0m0.283s
+
diff --git a/doc/dev/vos.test.join.log b/doc/dev/vos.test.join.log
new file mode 100644
index 0000000..d3557f3
--- /dev/null
+++ b/doc/dev/vos.test.join.log
@@ -0,0 +1,39 @@
+ How fast vos_join is and how much memory it uses
+------------------------------------------------------------------------------
+
+o input file size 1 (already sorted)	: 40,499,908
+o input file size 2 (already sorted)	: 40,499,908
+
+o input rows		: 501,000
+o input fields		: 11
+o output fields		: 22
+
+o process max row	: 100,000
+o process max		: 2
+o file buffer size	: 8192 bytes
+
+
+2009.01.18 - test 000
+------------------------------------------------------------------------------
+
+o allocs			: 24,048,866
+o bytes allocated		: 417,724,740 (~ 5 * total input file size)
+o running time (w/o memcheck)	:
+
+	real    0m9.118s
+	user    0m8.483s
+	sys     0m0.237s
+
+
+2009.01.18 - test 001
+------------------------------------------------------------------------------
+
+o with new vos_join algorithm
+o allocs			: 542
+o bytes allocated		: 42,134 (~ 0.0005 * total input file size)
+o running time (w/o memcheck)	:
+
+	real    0m5.336s
+	user    0m4.833s
+	sys     0m0.333s
+
diff --git a/doc/dev/vos.test.sort-00.log b/doc/dev/vos.test.sort-00.log
new file mode 100644
index 0000000..d297e74
--- /dev/null
+++ b/doc/dev/vos.test.sort-00.log
@@ -0,0 +1,65 @@
+	How much memory vos load+sort uses
+------------------------------------------------------------------------------
+
+o input file size	: 51,521,908
+o input rows		: 501,000
+o input fields		: 11
+o output fields		: 11
+o sorted fields		: field03
+o process max row	: 100,000
+o process max		: 2
+
+
+2009.01.16 - test 000
+------------------------------------------------------------------------------
+
+o file buffer size		: 8192 (default)
+o allocs			: 24,048,740
+o bytes allocated		: 417,820,691 (~ 8 * input file size)
+o running time (w/o memcheck)	:
+
+	real    0m12.341s
+	user    0m15.849s
+	sys     0m0.627s
+
+
+2009.01.16 - test 001
+------------------------------------------------------------------------------
+
+o file buffer size		: 51,200,000
+o allocs			: 24,048,751
+o bytes allocated		: 1,185,697,974 (~ 23 * input file size)
+o running time (w/o memcheck)	:
+
+	real    0m12.341s
+	user    0m15.849s
+	sys     0m0.627s
+
+
+
+2009.01.16 - test 002
+------------------------------------------------------------------------------
+
+o with new sort_process algorithm
+o file buffer size		: 8192
+o allocs			: 18,624,738
+o bytes allocated		: 332,184,755 (~ 6 * input file size)
+o running time (w/o memcheck)	:
+
+	real    0m10.314s
+	user    0m13.059s
+	sys     0m0.583s
+
+
+2009.01.17 - test 003
+------------------------------------------------------------------------------
+
+o with new sort_process & vos_sort_merge algorithm
+o file buffer size		: 8192
+o allocs			: 6,600,924
+o bytes allocated		: 123,352,391 (~ 2 * input file size)
+o running time (w/o memcheck)	:
+
+	real    0m6.936s
+	user    0m9.379s
+	sys     0m0.560s
diff --git a/doc/user/example/album.data b/doc/user/example/album.data
new file mode 100644
index 0000000..af667d6
--- /dev/null
+++ b/doc/user/example/album.data
@@ -0,0 +1,5 @@
+'You Forgot it in People'   1
+'Burn'                      5
+'Get Lifted'                4
+'The Joshua Tree'           2
+'Broken Social Scene'       1
diff --git a/doc/user/example/artist.data b/doc/user/example/artist.data
new file mode 100644
index 0000000..0178395
--- /dev/null
+++ b/doc/user/example/artist.data
@@ -0,0 +1,5 @@
+1,"Broken Social Scene"
+2,"U2"
+3,"Led Zeppelin"
+4,"John Legend"
+5,"Deep Purple"
diff --git a/doc/user/example/create_artist_album.vos b/doc/user/example/create_artist_album.vos
new file mode 100644
index 0000000..7e530a8
--- /dev/null
+++ b/doc/user/example/create_artist_album.vos
@@ -0,0 +1,19 @@
+#
+# example of create statement
+#
+LOAD "artist.data" (
+            :idx :    ::',',
+        '"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+        '\'':title     :'\''::,
+            :artist_idx:    :28:28
+) as album;
+
+CREATE "create_artist_album.data" from artist, album (
+	    :artist.idx      :    ::'|',
+	'"':artist.name     :'"'::'|',
+	    :album.artist_idx:    ::'|',
+	'[' :album.title     :']' ::
+);
diff --git a/doc/user/example/filter_artist_album.vos b/doc/user/example/filter_artist_album.vos
new file mode 100644
index 0000000..5e989c5
--- /dev/null
+++ b/doc/user/example/filter_artist_album.vos
@@ -0,0 +1,21 @@
+#
+# example of using create with filter
+#
+LOAD "artist.data" (
+	   :idx :::',',
+	'"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+        '\'':title     :'\''::,
+            :artist_idx:    :28 :28
+) as album;
+
+CREATE "filter_artist_album.data" from artist, album (
+	   :artist.idx      :::'|',
+	'"':artist.name     :'"'::'|',
+	'[':album.title     :']'::
+) FILTER (
+	ACCEPT artist.idx = 1,
+	REJECT album.artist_idx != 1
+);
diff --git a/doc/user/example/join_artist_album.vos b/doc/user/example/join_artist_album.vos
new file mode 100644
index 0000000..d8d2e2c
--- /dev/null
+++ b/doc/user/example/join_artist_album.vos
@@ -0,0 +1,16 @@
+#
+# example of join statement
+#
+LOAD "artist.data" (
+            :idx :    ::',',
+        '"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+        '\'':title     :'\''::,
+            :artist_idx:    :28 :28
+) as album;
+
+JOIN artist, album INTO "join_artist_album.data" (
+	artist.idx = album.artist_idx
+);
diff --git a/doc/user/example/sort_album.vos b/doc/user/example/sort_album.vos
new file mode 100644
index 0000000..18c0286
--- /dev/null
+++ b/doc/user/example/sort_album.vos
@@ -0,0 +1,9 @@
+#
+# example of sort statement with two field
+#
+LOAD "album.data" (
+        '\'':title     :'\''::,
+            :artist_idx:    :28:28
+) as album;
+
+SORT album BY artist_idx, title INTO "album_sorted.data";
diff --git a/doc/user/example/sort_artist.vos b/doc/user/example/sort_artist.vos
new file mode 100644
index 0000000..39fdda8
--- /dev/null
+++ b/doc/user/example/sort_artist.vos
@@ -0,0 +1,9 @@
+#
+# example of sort statement by descending order
+#
+LOAD "artist.data" (
+           :idx :::',',
+        '"':name:'"'::
+) as artist;
+
+SORT artist BY name DESC;
diff --git a/doc/user/image/create_stmt.png b/doc/user/image/create_stmt.png
new file mode 100644
index 0000000..8c6ae84
--- /dev/null
+++ b/doc/user/image/create_stmt.png
diff --git a/doc/user/image/field_clause.png b/doc/user/image/field_clause.png
new file mode 100644
index 0000000..ca5c387
--- /dev/null
+++ b/doc/user/image/field_clause.png
diff --git a/doc/user/image/filter_clause.png b/doc/user/image/filter_clause.png
new file mode 100644
index 0000000..ff0d5a0
--- /dev/null
+++ b/doc/user/image/filter_clause.png
diff --git a/doc/user/image/join_rules.png b/doc/user/image/join_rules.png
new file mode 100644
index 0000000..4b564ba
--- /dev/null
+++ b/doc/user/image/join_rules.png
diff --git a/doc/user/image/join_stmt.png b/doc/user/image/join_stmt.png
new file mode 100644
index 0000000..5dad5a0
--- /dev/null
+++ b/doc/user/image/join_stmt.png
diff --git a/doc/user/image/load_stmt.png b/doc/user/image/load_stmt.png
new file mode 100644
index 0000000..401d391
--- /dev/null
+++ b/doc/user/image/load_stmt.png
diff --git a/doc/user/image/set_stmt.png b/doc/user/image/set_stmt.png
new file mode 100644
index 0000000..a604664
--- /dev/null
+++ b/doc/user/image/set_stmt.png
diff --git a/doc/user/image/sort_stmt.png b/doc/user/image/sort_stmt.png
new file mode 100644
index 0000000..1f45546
--- /dev/null
+++ b/doc/user/image/sort_stmt.png
diff --git a/doc/user/style/page.css b/doc/user/style/page.css
new file mode 100644
index 0000000..18b3d67
--- /dev/null
+++ b/doc/user/style/page.css
@@ -0,0 +1,68 @@
+body
+{
+	font-family		: sans-serif;
+}
+
+h2
+{
+        background-color        : black;
+        color                   : white;
+	padding			: 4px 4px 4px 16px;
+        border-bottom           : solid;
+        border-color            : silver;
+	margin-top		: 4em;
+}
+
+h3
+{
+	background-color	: silver;
+	padding			: 4px 4px 4px 16px;
+	border-bottom		: solid;
+	border-color		: black;
+	margin-top		: 2em;
+}
+
+h4
+{
+	padding			: 4px 4px 4px 16px;
+	border-bottom		: dashed;
+	border-color		: black;
+	border-width		: 1px;
+	margin-top		: 2em;
+}
+
+.quote
+{
+	font-family		: serif;
+	font-style		: italic;
+	text-align		: right;
+	margin-left		: 50%;
+}
+
+.script
+{
+        font-family     : monospace;
+}
+
+.box-script
+{
+        background      : #F0F0F0;
+        border          : solid;
+        border-width    : thin;
+        font-family     : monospace;
+	padding		: 8px;
+	margin-left	: 2em;
+}
+
+.list-desc
+{
+	margin-left	: 2em;
+}
+
+.image
+{
+	border		: solid;
+	border-width	: 2px;
+	border-color	: gray;
+	margin		: 1em;
+}
diff --git a/doc/user/vos_user_manual.html b/doc/user/vos_user_manual.html
new file mode 100644
index 0000000..ed5618c
--- /dev/null
+++ b/doc/user/vos_user_manual.html
@@ -0,0 +1,809 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
+ "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<title>Vos User Manual</title>
+<style type="text/css">
+@import url(style/page.css);
+</style>
+</head>
+
+<body>
+<h1> Vos User Manual </h1>
+
+<h2> Table of Contents </h2>
+<ul>
+<li><a href="#intro">Introduction</a></li>
+<li><a href="#ch00">Building and Compiling Vos</a>
+	<ul>
+	<li><a href="#softreq">Software Requirements</a></li>
+	<li><a href="#make">Compiling Vos From Source</a></li>
+	</ul>
+</li>
+<li><a href="#ch01">Running Vos</a>
+	<ul>
+	<li><a href="#vos_env">Vos Environments</a></li>
+	</ul>
+</li>
+<li><a href="#ch02">Vos Script</a>
+	<ul>
+	<li><a href="#vos_var">Vos Variables</a></li>
+	<li><a href="#vos_stmt">Vos Statements</a>
+		<ul>
+		<li><a href="#set_stmt">Set Statement</a></li>
+		<li><a href="#sort_stmt">Sort Statement</a></li>
+		<li><a href="#create_stmt">Create Statement</a></li>
+		<li><a href="#join_stmt">Join Statement</a></li>
+		<li><a href="#field_clause">Field Clause</a></li>
+		<li><a href="#filter_clause">Filter Clause</a></li>
+		</ul>
+	</li>
+	</ul>
+</li>
+<li><a href="#license">Vos License</a></li>
+</ul>
+
+<h2 id="intro"> Introduction </h2>
+Vos is a program for processing formatted data, e.g. CSV data.
+Vos is designed to process large input files, including files larger than
+the available memory, and can be tuned to suit your machine
+environment.
+<br /><br />
+Vos currently has four main features:
+<ul>
+<li>Sorting
+<p class="list-desc">
+Vos can sort an input file by one or more fields in a single pass.
+</p>
+</li>
+<li>Reformatting
+<p class="list-desc">
+By declaring only specific fields, or by using a different separator in the
+Vos script, Vos can create a new file with a new format and data.
+</p>
+</li>
+<li>Filtering
+<p class="list-desc">
+Vos can include or omit specific data in specific fields or records.
+</p>
+</li>
+<li>Join
+<p class="list-desc">
+Vos can join two files into one file, with all or only specific fields
+included in the output file.
+</p>
+</li>
+</ul>
+
+<h2 id="ch00">
+Building and Compiling Vos </h2>
+<p class="quote">
+Vos was developed on a GNU/Linux system, so the prerequisites below apply only
+to systems running GNU/Linux. Other Unix-like systems should usually be able to
+compile the source, but this has not been fully tested yet.
+</p>
+
+<h3 id="softreq" name="softreq">
+Software Requirements </h3>
+
+<p> The software and tools below were used to develop Vos, so we recommend
+using the same or a newer version when building Vos from
+source.</p>
+<ul>
+<li> GNU C Compiler
+	<ul>
+	<li> Version : 4.2.1_20070724 </li>
+	<li> Website :
+	<a href="http://gcc.gnu.org"> http://gcc.gnu.org </a></li>
+	</ul>
+</li>
+<li> glibc development
+	<ul>
+	<li> Version : 2.6.1 </li>
+	<li> Website :
+<a href="http://www.gnu.org/software/libc/libc.html">
+http://www.gnu.org/software/libc/libc.html </a> </li>
+	</ul>
+</li>
+<li> GNU Make
+	<ul>
+	<li> Version : 3.81 </li>
+	<li> Website :
+<a href="http://www.gnu.org/software/make/make.html">
+http://www.gnu.org/software/make/make.html </a> </li>
+	</ul>
+</li>
+</ul>
+
+
+<h3 id="make" name="make">
+Compiling Vos From Source </h3>
+<p class="quote">
+This step assumes that you have already downloaded the source and saved it on your machine.
+</p>
+
+<pre class="box-script">
+	$ tar jxvf vos-xxxx.xx.xx.tar.bz2
+	$ cd vos/src
+	$ make
+</pre>
+
+<p>
+where xxxx.xx.xx is the Vos version (depending on which version you
+downloaded). Running "make" creates a "build" directory inside the "vos"
+directory, and the vos executable is placed there (vos/build).
+</p>
+<p>
+For later use, copy the Vos executable to a directory in your PATH, for
+example:
+</p>
+<pre class="box-script">
+	$ pwd
+	/home/johndoe/tmp/vos/src
+	$ echo $PATH
+	/home/johndoe/bin:/usr/local/bin:/usr/bin:/bin
+	$ cp ../build/vos /home/johndoe/bin
+</pre>
+
+<h2 id="ch01">
+Running Vos </h2>
+<p>
+The Vos program takes only one parameter: a vos script.
+</p>
+<pre class="box-script">
+	vos &lt; vos-script &gt;
+</pre>
+<p>
+<span class="script">vos-script</span> is a file containing the <a href="#vos_stmt">
+vos statements </a> to be executed.
+</p>
+
+<h3 id="vos_env"> Vos Environments </h3>
+<p>
+Before running the Vos program, you can set several environment variables
+that change its behaviour while it runs. Some of these can also be set in the
+vos script using <a href="#vos_var"> Vos
+variables. </a>
+</p>
+
+<ul>
+<li><span class="script"> VOS_DEBUG &lt; number &gt; </span>
+<p class="list-desc">
+Default value : 0
+<br /><br />
+This variable is optional and used only for debugging; normal users should
+not set it. The <span class="script"> VOS_DEBUG </span>
+environment variable can have these values:
+</p>
+<ul>
+<li> 1 : for testing the script only, not process it </li>
+<li> 2 : for debugging parsing process </li>
+<li> 4 : for debugging sort process </li>
+<li> 8 : for debugging create process </li>
+<li> 16 : for debugging join process </li>
+</ul>
+<p class="list-desc">
+These values can be combined (added together) to get more debug output.
+</p>
+<p class="list-desc">
+Example of setting the <span class="script"> VOS_DEBUG </span> variable in the
+Bash shell:
+</p>
+<pre class="box-script">
+	$ export VOS_DEBUG=3
+</pre>
+<p class="list-desc">
+This value tells the Vos program to debug the parsing process (2) without
+processing the script (1).
+</p>
+</li>
+<br />
+<!-- VOS_DEBUG end -->
+
+<li>
+<span class="script"> VOS_FILE_BUFFER_SIZE &lt; number &gt; </span>
+<p class="list-desc">
+Default value : 8192
+<br /><br />
+This variable sets the size, in bytes, of the buffer used to read and write files.
+<br />
+This example sets the buffer size to about 1 MB:
+</p>
+<pre class="box-script">
+	$ export VOS_FILE_BUFFER_SIZE=1000000;
+</pre>
+</li>
+<br />
+<!-- VOS_FILE_BUFFER_SIZE end -->
+
+<li>
+<span class="script"> VOS_COMPARE_CASE &lt; number &gt; </span>
+<p class="list-desc">
+Default value : 0
+<br /><br />
+This variable affects the order of the sort output.
+<br />
+If <span class="script"> VOS_COMPARE_CASE </span> is set to 0 (case-sensitive
+compare), "B" comes before "a", but
+<br />
+if <span class="script"> VOS_COMPARE_CASE </span> is set to 1 (case-insensitive
+compare), "a" comes before "B".
+<br />
+Example of setting it in the Bash shell:
+</p>
+<pre class="box-script">
+	$ export VOS_COMPARE_CASE=0; # or
+	$ export VOS_COMPARE_CASE=1;
+</pre>
+</li>
+<br />
+<!-- VOS_COMPARE_CASE end -->
+
+<li>
+<span class="script"> VOS_PROCESS_MAX &lt; number &gt; </span>
+<p class="list-desc">
+Default value : 2
+<br /><br />
+This variable sets how many threads are used by the sort process. The
+recommended value is the number of processors on your
+machine.
+<br />
+Example of setting it in the Bash shell:
+</p>
+<pre class="box-script">
+	$ export VOS_PROCESS_MAX=8;
+</pre>
+</li>
+<br />
+<!-- VOS_PROCESS_MAX end -->
+
+<li>
+<span class="script"> VOS_PROCESS_MAX_ROW &lt; number &gt;</span>
+<br />
+<br />
+<p class="list-desc">
+Default value : 100,000
+<br /><br />
+This variable sets how many rows the program keeps in memory before writing
+them to a temporary file.
+<br />
+Example of how to use it:
+</p>
+<pre class="box-script">
+	$ export VOS_PROCESS_MAX_ROW=400000;
+</pre>
+</li>
+<br />
+<!-- VOS_PROCESS_MAX_ROW end -->
+
+<li>
+<span class="script">
+VOS_TMP_DIR
+&lt;double-quote&gt;&lt;string&gt;&lt;double-quote&gt;[:] ... </span>
+<br />
+<br />
+<p class="list-desc">
+Default value : /tmp/
+<br /><br />
+During the sort process, the program sometimes uses temporary files. By
+default, these files are placed in the "/tmp/" directory. You can set two or
+more directories as temporary directories, as long as they have free space and
+the user running the Vos program has write access to them.
+<br /><br />
+We recommend using a temporary directory on a different disk than the input
+file; this usually decreases processing time, because reading the input and
+writing temporary files then do not compete for the same disk.
+<br /><br />
+</p>
+<pre class="box-script">
+	$ export VOS_TMP_DIR="/var/tmp/";
+</pre>
+<p class="list-desc">
+which makes the program use "/var/tmp/" as its temporary directory.
+<br /><br />
+Another example:
+</p>
+<pre class="box-script">
+	$ export VOS_TMP_DIR="/media/tmp/":"/disk01/";
+</pre>
+<p class="list-desc">
+which makes the program use "/media/tmp/" and "/disk01/" as temporary
+directories.
+</p>
+</p>
+</li>
+<br />
+
+
+</ul> <!-- Vos Environment -->
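+<p>
+Because every <span class="script">VOS_DEBUG</span> value is a power of two,
+the values combine as a bit mask: 3 is 1 + 2. The C sketch below shows one way
+a program could read these variables, decode the debug mask, and apply
+<span class="script">VOS_FILE_BUFFER_SIZE</span> to a stream with
+<span class="script">setvbuf()</span>. It is an illustration, not code from
+Vos: the flag names and the helpers <span class="script">env_long()</span> and
+<span class="script">attach_file_buffer()</span> are hypothetical, and falling
+back to the default on a malformed value is an assumed behaviour.
+</p>

```c
#define _POSIX_C_SOURCE 200809L   /* expose POSIX declarations such as setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names for the documented VOS_DEBUG values;
 * they are not taken from the Vos source. */
enum vos_debug_flag {
	DBG_TEST_ONLY = 1,   /* check the script only, do not process it */
	DBG_PARSE     = 2,   /* debug the parsing process */
	DBG_SORT      = 4,   /* debug the sort process */
	DBG_CREATE    = 8,   /* debug the create process */
	DBG_JOIN      = 16   /* debug the join process */
};

/* Read a non-negative decimal environment variable. Falls back to def
 * when the variable is unset, empty, or not a plain number (assumed). */
static long env_long(const char *name, long def)
{
	const char *s = getenv(name);
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return def;
	v = strtol(s, &end, 10);
	if (*end != '\0' || v < 0)
		return def;
	return v;
}

/* Give fp a fully buffered stream buffer of VOS_FILE_BUFFER_SIZE bytes
 * (default 8192). Call it before any other operation on fp and free the
 * returned buffer only after fclose(fp). Returns NULL on failure, in
 * which case fp keeps its default buffer. */
static char *attach_file_buffer(FILE *fp)
{
	long size = env_long("VOS_FILE_BUFFER_SIZE", 8192);
	char *buf;

	assert(fp != NULL);
	if (size == 0)
		size = 8192;
	buf = malloc((size_t) size);
	if (buf != NULL && setvbuf(fp, buf, _IOFBF, (size_t) size) != 0) {
		free(buf);
		buf = NULL;
	}
	return buf;
}
```

+<p>
+With <span class="script">VOS_DEBUG=3</span>, the tests
+<span class="script">dbg &amp; DBG_TEST_ONLY</span> and
+<span class="script">dbg &amp; DBG_PARSE</span> are both true while the sort,
+create and join bits are clear.
+</p>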
+
+<h2 id="ch02"> Vos Script </h2>
+<p>
+To illustrate how Vos scripts work, we will use two input files as examples:
+"artist.data" and "album.data".
+</p>
+
+<span class="script"> artist.data </span>
+<pre class="box-script">
+1,"Broken Social Scene"
+2,"U2"
+3,"Led Zeppelin"
+4,"John Legend"
+5,"Deep Purple"
+</pre>
+
+<span class="script"> album.data </span>
+<pre class="box-script">
+'You Forgot it in People'   1
+'Burn'                      5
+'Get Lifted'                4
+'The Joshua Tree'           2
+'Broken Social Scene'       1
+</pre>
+
+
+<h3 id="vos_var"> Vos Variables </h3>
+<p class="quote">
+Vos variables are set with the "SET" statement.
+</p>
+<p>
+Vos variables adapt Vos to the environment it runs in.
+For example, suppose that you have a machine with 8 processors and 16 GB of
+memory, and you want to sort 20,000,000 rows of data, about 2 GB in size.
+Instead of using the default maximum row count (100,000) with two threads,
+you can set the maximum row count to 2,500,000 and the maximum number of
+threads to 8, which decreases processing time.
+</p>
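+<p>
+To see why these numbers fit together, assume (this document does not spell
+it out) that each in-memory buffer of PROCESS_MAX_ROW rows is sorted and
+written out as one temporary run, and that each thread sorts one buffer at a
+time. With the defaults, 20,000,000 rows become 200 runs sorted two at a time,
+which takes 100 rounds; with 2,500,000 rows per run and 8 threads they become
+8 runs sorted in a single round. At about 100 bytes per row (2 GB / 20,000,000
+rows), eight buffers of 2,500,000 rows hold roughly 2 GB, well within 16 GB of
+memory. The minimal C helper below does the ceiling division used in that
+arithmetic; its name is hypothetical.
+</p>

```c
#include <assert.h>

/* Ceiling division: how many steps of per_step items cover total items.
 * Used twice in the example: rows into runs of PROCESS_MAX_ROW rows,
 * and runs into rounds of PROCESS_MAX threads (assuming each thread
 * sorts one run at a time). */
static unsigned long ceil_div(unsigned long total, unsigned long per_step)
{
	assert(per_step > 0);
	return total / per_step + (total % per_step != 0 ? 1 : 0);
}
```

+<p>
+For the tuned setting, <span class="script">ceil_div(ceil_div(20000000, 2500000), 8)</span>
+gives 1 round, against 100 rounds for the defaults.
+</p>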
+<p>
+There are two ways to set a Vos variable: explicitly in the vos script, using
+the <span class="script">SET</span> statement, or through an environment
+variable, using the shell's <span class="script">set</span> or
+<span class="script">export</span>.
+</p>
+
+<ul>
+<li><span class="script"> FILE_BUFFER_SIZE &lt;number&gt; </span>
+<br />
+<br />
+<p class="list-desc">
+Default value : 8192
+<br /><br />
+This variable sets the size, in bytes, of the buffer used to read and write files.
+<br />
+This example sets the buffer size to about 1 MB:
+</p>
+<pre class="box-script">
+	set FILE_BUFFER_SIZE 1000000;
+</pre>
+</li>
+<br />
+
+<li><span class="script">PROCESS_COMPARE_CASE_SENSITIVE</span> (default) </li>
+<li><span class="script">PROCESS_COMPARE_CASE_NOTSENSITIVE</span>
+<br />
+<br />
+<p class="list-desc">
+This variable affects the order of the sort output.
+<br />
+If <span class="script"> PROCESS_COMPARE_CASE_SENSITIVE </span> is used, "B"
+comes before "a", but
+<br />
+if <span class="script"> PROCESS_COMPARE_CASE_NOTSENSITIVE </span> is used, "a"
+comes before "B".
+<br />
+Example of how to use it:
+</p>
+<pre class="box-script">
+	set PROCESS_COMPARE_CASE_SENSITIVE;
+	set PROCESS_COMPARE_CASE_NOTSENSITIVE;
+</pre>
+</li>
+<br />
+
+<li>
+<span class="script"> PROCESS_MAX &lt;number&gt; </span>
+<br />
+<br />
+<p class="list-desc">
+Default value : 2
+<br /><br />
+This variable sets how many threads are used by the sort process. The
+recommended value is the number of processors on your
+machine.
+<br />
+Example of how to use it:
+</p>
+<pre class="box-script">
+	set PROCESS_MAX 8;
+</pre>
+</li>
+<br />
+
+<li>
+<span class="script"> PROCESS_MAX_ROW &lt;number&gt;</span>
+<br />
+<br />
+<p class="list-desc">
+Default value : 100,000
+<br /><br />
+This variable sets how many rows the program keeps in memory before writing
+them to a temporary file.
+<br />
+Example of how to use it:
+</p>
+<pre class="box-script">
+	set PROCESS_MAX_ROW 400000;
+</pre>
+</li>
+<br />
+
+<li>
+<span class="script">
+PROCESS_TEMPORARY_DIRECTORY
+[:]&lt;double-quote&gt;&lt;string&gt;&lt;double-quote&gt;[:] ... </span>
+<br />
+<br />
+<p class="list-desc">
+Default value : /tmp/
+<br /><br />
+During the sort process, the program sometimes uses temporary files. By
+default, these files are placed in the "/tmp/" directory. You can set two or
+more directories as temporary directories, as long as they have free space and
+the user running the Vos program has write access to them.
+<br /><br />
+We recommend using a temporary directory on a different disk than the input
+file; this usually decreases processing time, because reading the input and
+writing temporary files then do not compete for the same disk.
+<br /><br />
+When <span class="script">':'</span> is the first character of the value,
+the rest of the value is added to the list of temporary directories, so the
+previously set or default temporary directories are not replaced.
+This lets you add several directories with two or more SET
+statements.
+For example:
+</p>
+<pre class="box-script">
+	SET PROCESS_TEMPORARY_DIRECTORY		:"/var/tmp/";
+	SET PROCESS_TEMPORARY_DIRECTORY		:"/media/tmp/";
+</pre>
+<p class="list-desc">
+which makes the program use "/tmp/" (the program default), "/var/tmp/",
+and "/media/tmp/" as temporary directories.
+<br />
+Another example:
+</p>
+<pre class="box-script">
+	SET PROCESS_TEMPORARY_DIRECTORY		:"/var/tmp/";
+	SET PROCESS_TEMPORARY_DIRECTORY		"/media/tmp/":"/disk01/";
+</pre>
+<p class="list-desc">
+which makes the program use "/media/tmp/" and "/disk01/" as temporary
+directories, but not "/var/tmp/", because it has been overridden by the
+last SET statement.
+</p>
+</li>
+<br />
+</ul><!-- end of vos variable list -->
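+<p>
+The append-or-replace rule for <span class="script">PROCESS_TEMPORARY_DIRECTORY</span>
+can be sketched in C as below. This models the behaviour described above and
+is not the Vos implementation: <span class="script">struct tmp_dirs</span>,
+<span class="script">TMP_DIR_MAX</span> and
+<span class="script">tmp_dirs_apply()</span> are hypothetical, and the value is
+assumed to arrive with its double quotes already removed, so
+"/media/tmp/":"/disk01/" is passed as /media/tmp/:/disk01/.
+</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TMP_DIR_MAX 16   /* hypothetical limit for this sketch */

struct tmp_dirs {
	char *dir[TMP_DIR_MAX];
	size_t n;
};

/* Free every entry and leave the list empty. */
static void tmp_dirs_clear(struct tmp_dirs *t)
{
	while (t->n > 0)
		free(t->dir[--t->n]);
}

/* Apply one PROCESS_TEMPORARY_DIRECTORY value such as ":/var/tmp/" or
 * "/media/tmp/:/disk01/". A leading ':' appends to the current list;
 * any other value replaces it. Returns 0 on success, -1 when the list
 * is full or memory runs out. */
static int tmp_dirs_apply(struct tmp_dirs *t, const char *value)
{
	const char *p = value;

	assert(t != NULL && value != NULL);
	if (*p == ':')
		p++;                    /* append mode: keep existing entries */
	else
		tmp_dirs_clear(t);      /* replace mode */

	while (*p != '\0') {
		size_t len = strcspn(p, ":");

		if (len > 0) {
			char *d;

			if (t->n == TMP_DIR_MAX)
				return -1;
			d = malloc(len + 1);
			if (d == NULL)
				return -1;
			memcpy(d, p, len);
			d[len] = '\0';
			t->dir[t->n++] = d;
		}
		p += len;
		if (*p == ':')
			p++;                /* skip the separator */
	}
	return 0;
}
```

+<p>
+Seeding the list with "/tmp/" and then applying ":/var/tmp/" and
+":/media/tmp/" yields three directories; applying "/media/tmp/:/disk01/"
+afterwards leaves only "/media/tmp/" and "/disk01/", as in the examples above.
+</p>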
+
+
+<h3 id="vos_stmt"> Vos Statements </h3>
+<p class="quote">
+Vos script is not case sensitive: "Load" is the same as "LOAD".
+</p>
+
+<h4 id="set_stmt"> Set Statement </h4>
+<img class="image" src="image/set_stmt.png" alt="vos set statement"/> <br />
+For examples of using the Set statement, and for the list of variables, see
+<a href="#vos_var">Vos Variables.</a>
+
+<h4 id="load_stmt"> Load Statement </h4>
+<img class="image" src="image/load_stmt.png" alt="vos load statement" /> <br />
+<ul>
+<li><i>/path/to/input/file</i> is the path to the input file that you want to
+process.
+</li>
+<li> The input file path must be enclosed in double quotes. </li>
+<li> For field declaration see <a href="#field_clause"> Field Clause </a></li>
+<li> <i> alias </i> is optional, but we recommend using it so that the input
+can be referred to easily later.  </li>
+<li> A Load statement by itself does not process anything; it only defines
+your input file and its fields. </li>
+</ul>
+<p>
+Example of using the Load statement:
+</p>
+<pre class="box-script">
+LOAD "artist.data" (
+	   :idx :   ::',',
+	'"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+	'\'':title     :'\''::,
+	    :artist_idx:    :28:28
+) as album;
+</pre>
+
+<h4 id="sort_stmt"> Sort Statement </h4>
+<img class="image" src="image/sort_stmt.png" alt="vos sort statement" />
+<ul>
+<li>Note that by default, sorting is <u>case sensitive</u> and in
+	<u>ascending</u> order. To make the comparison case-insensitive, use the
+	<a href="#set_stmt"> SET statement.</a></li>
+<li>If <span class="script">INTO</span> is not specified, the sort output is
+	written to a file "sort.XXXXXXXX", where "XXXXXXXX" is replaced by
+	random characters.</li>
+</ul>
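+<p>
+Case sensitivity changes the sort order because of how strings are compared.
+A case-sensitive comparison in byte order (<span class="script">strcmp()</span>
+in the C locale) places every uppercase ASCII letter before every lowercase
+one, so "B" (code 66) sorts before "a" (code 97), as described under Vos
+Variables; a case-insensitive comparison such as POSIX
+<span class="script">strcasecmp()</span> folds case first, so "a" sorts before
+"B". The sketch below shows the difference with
+<span class="script">qsort()</span>. Which comparison functions Vos itself uses
+is not stated here, and <span class="script">sort_strings()</span> is a
+hypothetical helper.
+</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Byte-order compare: uppercase ASCII sorts before lowercase. */
static int cmp_sensitive(const void *a, const void *b)
{
	return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Case-folding compare: "a" and "A" are treated as equal. */
static int cmp_insensitive(const void *a, const void *b)
{
	return strcasecmp(*(const char *const *) a, *(const char *const *) b);
}

/* Sort n strings in ascending order using the chosen comparison. */
static void sort_strings(const char **v, size_t n, int case_sensitive)
{
	assert(v != NULL || n == 0);
	qsort(v, n, sizeof *v, case_sensitive ? cmp_sensitive : cmp_insensitive);
}
```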
+<p>
+Example of using the Sort statement:
+<br />
+<br />
+This script sorts <span class="script"> artist.data </span> by
+<span class="script"> name </span> (the second field) in descending order:
+</p>
+<pre class="box-script">
+LOAD "artist.data" (
+	   :idx :   ::',',
+	'"':name:'"'::
+) as artist;
+
+SORT artist BY name DESC;
+</pre>
+<p> If you run the script, the output looks like this: </p>
+<pre class="box-script">
+2|U2
+3|Led Zeppelin
+4|John Legend
+5|Deep Purple
+1|Broken Social Scene
+</pre>
+
+<p>
+This script sorts
+<span class="script"> album.data </span> by
+<span class="script"> artist_idx </span> (the second field), then by
+<span class="script"> title </span> (the first field), and saves the output
+to the file <span class="script"> album_sorted.data </span>.
+</p>
+<pre class="box-script">
+LOAD "album.data" (
+	'\'':title     :'\''::,
+	    :artist_idx:    :28:28
+) as album;
+
+SORT album BY artist_idx, title INTO "album_sorted.data";
+</pre>
+<p> If you run the script, the output looks like this: </p>
+<pre class="box-script">
+Broken Social Scene|1
+You Forgot it in People|1
+The Joshua Tree|2
+Get Lifted|4
+Burn|5
+</pre>
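+<p>
+Sorting by several fields means comparing by the first field and falling back
+to the next field only on a tie. The C sketch below reproduces the order shown
+above for <span class="script">SORT album BY artist_idx, title</span>. It is an
+illustration, not the Vos source: <span class="script">struct album</span> and
+<span class="script">cmp_album()</span> are hypothetical, both keys are
+compared as text with case-sensitive <span class="script">strcmp()</span>, and
+the rows are assumed to have been extracted by the Load statement already.
+</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One row of album.data after field extraction. */
struct album {
	const char *title;
	const char *artist_idx;   /* compared as text in this sketch */
};

/* Compare by artist_idx first, then by title; both ascending and
 * case sensitive, mirroring "SORT album BY artist_idx, title". */
static int cmp_album(const void *pa, const void *pb)
{
	const struct album *a = pa, *b = pb;
	int c = strcmp(a->artist_idx, b->artist_idx);

	if (c == 0)
		c = strcmp(a->title, b->title);   /* tie: use the second key */
	return c;
}

static void sort_albums(struct album *rows, size_t n)
{
	assert(rows != NULL || n == 0);
	qsort(rows, n, sizeof *rows, cmp_album);
}
```

+<p>
+For <span class="script">SORT artist BY name DESC</span>, a comparator that
+returns the negated <span class="script">strcmp()</span> result on the name
+gives the descending order shown earlier, from "U2" down to "Broken Social Scene".
+</p>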
+
+<h4 id="create_stmt"> Create Statement </h4>
+<img class="image" src="image/create_stmt.png" alt="vos create statement" />
+<p>
+The Create statement creates new data with a new format or with a
+different field order. <br />
+It can also be used to combine several input files into one file.
+</p>
+
+<ul>
+<li><i>/path/to/output/file</i> is the path to the output file where data will
+be written. It must be enclosed in double quotes.</li>
+<li> For field declaration see <a href="#field_clause"> Field Clause </a>.</li>
+<li> For filter declaration see
+<a href="#filter_clause"> Filter Clause </a>.</li>
+</ul>
+
+<p>
+Example of using the Create statement:
+<br />
+<br />
+This script combines <span class="script"> artist.data </span> and
+<span class="script"> album.data </span> into one file, with fields
+separated by <span class="script">'|'</span>.
+</p>
+
+<pre class="box-script">
+LOAD "artist.data" (
+	   :idx :   ::',',
+	'"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+	'\'':title:'\''::,
+	    :artist_idx::28:28
+) as album;
+
+CREATE "artist_album.data" from artist, album (
+	   :artist.idx      :   ::'|',
+	'"':artist.name     :'"'::'|',
+	   :album.artist_idx:   ::'|',
+	'[':album.title     :']'::
+);
+</pre>
+
+<p> If you run the script, the output looks like this: </p>
+<pre class="box-script">
+1|"Broken Social Scene"|1|[You Forgot it in People]
+2|"U2"|5|[Burn]
+3|"Led Zeppelin"|4|[Get Lifted]
+4|"John Legend"|2|[The Joshua Tree]
+5|"Deep Purple"|1|[Broken Social Scene]
+</pre>
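+<p>
+Judging by the example output, CREATE pairs the input rows by position rather
+than by any key: the first artist row is combined with the first album row,
+the second with the second, and so on (artist 2, "U2", ends up next to "Burn").
+The C sketch below models that behaviour for the script above. The names
+<span class="script">struct artist</span>, <span class="script">struct album</span>
+and <span class="script">create_rows()</span> are hypothetical, and writing
+empty fields for a source that has run out of rows is an assumption suggested
+by the Filter Clause example.
+</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct artist { const char *idx, *name; };
struct album  { const char *title, *artist_idx; };

/* Pair rows by position: row i of artist with row i of album, written
 * as idx|"name"|artist_idx|[title] to match the script above. When one
 * source has fewer rows, its fields are written empty (an assumption).
 * Returns the number of lines written. */
static size_t create_rows(FILE *out,
                          const struct artist *ar, size_t nar,
                          const struct album *al, size_t nal)
{
	size_t i, n = nar > nal ? nar : nal;

	assert(out != NULL);
	for (i = 0; i < n; i++) {
		const char *idx   = i < nar ? ar[i].idx : "";
		const char *name  = i < nar ? ar[i].name : "";
		const char *aidx  = i < nal ? al[i].artist_idx : "";
		const char *title = i < nal ? al[i].title : "";

		fprintf(out, "%s|\"%s\"|%s|[%s]\n", idx, name, aidx, title);
	}
	return n;
}
```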
+
+<h4 id="join_stmt"> Join Statement </h4>
+<img class="image" src="image/join_stmt.png" alt="vos join statement" />
+<img class="image" src="image/join_rules.png" alt="vos join rules" />
+<p>
+The Join statement combines two input files into one file, like the Create
+statement, but uses specific fields as the matching rule.
+</p>
+
+<ul>
+<li> If <span class="script"> '+' </span> is specified, both matching and
+	non-matching rows are written to the output file. </li>
+<li> If <span class="script"> '-' </span> is specified, matching rows are
+	not written; only non-matching rows are written to the output file.</li>
+<li> If neither <span class="script"> '+' </span> nor
+	<span class="script"> '-' </span> is specified, only matching rows are
+	written to the output file. </li>
+<li> By default, input files are treated as <span class="script">UNSORTED</span>.
+	If you know that an input file is already sorted, declare it
+	<span class="script">SORTED</span> so that Vos does not sort it again
+	before processing the join. </li>
+<li> If <span class="script">INTO</span> is not specified, the output is
+	written to a file "join.XXXXXXXX", where "XXXXXXXX" is replaced
+	by random characters. </li>
+</ul>
+
+<p>
+Example of using the Join statement:
+</p>
+<pre class="box-script">
+LOAD "artist.data" (
+           :idx :   ::',',
+        '"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+        '\'':title     :'\''::,
+            :artist_idx:    :28 :28
+) as album;
+
+JOIN artist, album INTO "join_artist_album.data" (
+        artist.idx = album.artist_idx
+);
+</pre>
+
+<p> If you run the script, the output looks like this: </p>
+<pre class="box-script">
+1|Broken Social Scene|You Forgot it in People|1
+2|U2|The Joshua Tree|2
+4|John Legend|Get Lifted|4
+5|Deep Purple|Burn|5
+</pre>
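+<p>
+The C sketch below is a sort-merge inner join that advances both inputs on a
+match. It is not necessarily how Vos implements JOIN; it was chosen because,
+with a stable sort on the key, it reproduces the output above, including the
+absence of artist 3 (which has no album) and of the second album with
+<span class="script">artist_idx</span> 1. The type names, the helpers, and the
+one-to-one matching are assumptions inferred from that example; keys are
+compared as text, and only the default mode (neither '+' nor '-') is modelled.
+</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct artist { const char *idx, *name; };
struct album  { const char *title, *artist_idx; };

/* Stable insertion sorts on the join key, compared as text. Stability
 * keeps rows with equal keys in their input order. */
static void sort_artists(struct artist *a, size_t n)
{
	size_t i, j;

	for (i = 1; i < n; i++) {
		struct artist t = a[i];
		for (j = i; j > 0 && strcmp(a[j - 1].idx, t.idx) > 0; j--)
			a[j] = a[j - 1];
		a[j] = t;
	}
}

static void sort_albums(struct album *a, size_t n)
{
	size_t i, j;

	for (i = 1; i < n; i++) {
		struct album t = a[i];
		for (j = i; j > 0 && strcmp(a[j - 1].artist_idx, t.artist_idx) > 0; j--)
			a[j] = a[j - 1];
		a[j] = t;
	}
}

/* Inner join on artist.idx = album.artist_idx. Both inputs are sorted
 * in place first, as for UNSORTED input. On a match both cursors move
 * on, so each row is used at most once. Writes idx|name|title|artist_idx
 * and returns the number of matches. */
static size_t join_rows(FILE *out, struct artist *ar, size_t nar,
                        struct album *al, size_t nal)
{
	size_t i = 0, j = 0, matched = 0;

	assert(out != NULL);
	sort_artists(ar, nar);
	sort_albums(al, nal);
	while (i < nar && j < nal) {
		int c = strcmp(ar[i].idx, al[j].artist_idx);

		if (c < 0) {
			i++;            /* artist without a matching album */
		} else if (c > 0) {
			j++;            /* album without a matching artist */
		} else {
			fprintf(out, "%s|%s|%s|%s\n", ar[i].idx, ar[i].name,
			        al[j].title, al[j].artist_idx);
			i++;
			j++;
			matched++;
		}
	}
	return matched;
}
```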
+
+<h4 id="field_clause"> Field Clause </h4>
+<img class="image" src="image/field_clause.png" alt="vos field clause" />
+
+<ul>
+<li> <i> left-quote </i>, <i>right-quote</i> and <i>separator</i> can be any
+	single character. </li>
+<li> <i> left-quote </i>, <i>right-quote</i> and <i>separator</i> must be
+	enclosed in single quotes.</li>
+<li> If the character is itself a single quote, it must be prefixed
+	with a backslash, e.g. <span class="script"> '\'' </span>.</li>
+<li> <i>start-position</i> is a zero-based character position.</li>
+<li> <i>end-position</i> is the position of the last character to read and
+	must not be less than <i>start-position</i>. </li>
+<li> <i>field-type</i> indicates the type of data in the field. Its
+	value can only be <span class="script">STRING</span> or
+	<span class="script">NUMBER</span>.</li>
+</ul>
+
+
+<h5>Priority of quote vs position vs separator</h5>
+
+<p>
+First, when reading field data, <i>start-position</i> has a higher priority
+than <i>left-quote</i>. For example, suppose that the input data looks like this:
+</p>
+
+<pre class="box-script">
+'You Forgot it in People'
+</pre>
+
+<p> and you define the field like this: </p>
+<pre class="box-script">
+	'\'':field00:'\'':4:22:
+</pre>
+
+<p>
+Vos always starts reading from position 4, not from the <i>left-quote</i>,
+which results in <span class="script">" Forgot it in Peopl"</span>.
+</p>
+
+<p>
+Second, while reading field data, <i>end-position</i> has a higher priority
+than <i>right-quote</i>, and <i>right-quote</i> has a higher priority than
+<i>separator</i>.
+</p>
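+<p>
+The priority rules above can be sketched as a C function that extracts one
+field. It reads only the first field of a line, whereas a real parser would
+carry its position from field to field, and it is not the Vos parser:
+<span class="script">struct field_spec</span> and
+<span class="script">read_field()</span> are hypothetical. Treating
+<i>end-position</i> as inclusive is inferred from the example above, where
+positions 4 to 22 yield 19 characters.
+</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical field description: '\0' means "no quote/separator" and
 * -1 means "no position". Positions are 0-based; end is inclusive. */
struct field_spec {
	char lquote, rquote, sep;
	long start, end;
};

/* Copy the first field of line into out (capacity outsz), applying:
 *   start-position over left-quote,
 *   end-position over right-quote, right-quote over separator.
 * Returns the number of bytes copied. */
static size_t read_field(const char *line, const struct field_spec *f,
                         char *out, size_t outsz)
{
	size_t len = strlen(line);
	size_t b, e;                    /* byte range [b, e) to copy */

	assert(out != NULL && outsz > 0);
	if (f->start >= 0)
		b = (size_t) f->start;      /* position wins over left-quote */
	else if (f->lquote != '\0' && line[0] == f->lquote)
		b = 1;                      /* skip the opening quote */
	else
		b = 0;
	if (b > len)
		b = len;

	if (f->end >= 0) {
		e = (size_t) f->end + 1;    /* inclusive end position */
	} else {
		const char *stop = NULL;

		if (f->rquote != '\0')
			stop = strchr(line + b, f->rquote);
		if (stop == NULL && f->sep != '\0')
			stop = strchr(line + b, f->sep);
		e = stop != NULL ? (size_t) (stop - line) : len;
	}
	if (e > len)
		e = len;
	if (e < b)
		e = b;
	if (e - b >= outsz)
		e = b + outsz - 1;          /* truncate to fit out */

	memcpy(out, line + b, e - b);
	out[e - b] = '\0';
	return e - b;
}
```

+<p>
+With the field spec <span class="script">'\'':field00:'\'':4:22:</span> it
+returns " Forgot it in Peopl", and with positions 28 to 28 on the first line
+of <span class="script">album.data</span> it returns "1".
+</p>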
+
+<h4 id="filter_clause"> Filter Clause </h4>
+<img class="image" src="image/filter_clause.png" alt="vos filter clause" />
+<ul>
+<li><span class="script">'='</span> is a case-insensitive comparison: "a" is
+	equal to "A".</li>
+<li><span class="script">'=='</span> is a case-sensitive comparison: "a" is
+	<u>not</u> equal to "A".</li>
+<li>The only limitation is that <i>field-name</i> must be on the left and the
+	value on the right, not vice versa. </li>
+</ul>
+<p>
+Example of using a filter:
+<br />
+<br />
+This script writes only the artist rows whose
+<span class="script">idx</span> is 1 and the album rows whose
+<span class="script">artist_idx</span> is 1.
+</p>
+
+<pre class="box-script">
+LOAD "artist.data" (
+	   :idx :::',',
+	'"':name:'"'::
+) as artist;
+
+LOAD "album.data" (
+	'\'':title     :'\''::,
+	    :artist_idx:    :28:28
+) as album;
+
+CREATE "filter_artist_album.data" from artist, album (
+	   :artist.idx	:::'|',
+	'"':artist.name	:'"'::'|',
+	'[':album.title	:']'::
+) FILTER (
+        ACCEPT artist.idx = 1,
+        REJECT album.artist_idx != 1
+);
+</pre>
+
+<p> If you run the script, the output looks like this: </p>
+<pre class="box-script">
+1|"Broken Social Scene"|[You Forgot it in People]
+|""|[Broken Social Scene]
+</pre>
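+<p>
+The output above makes sense if each rule applies only to the rows of the
+input named in it: ACCEPT keeps artist row 1 only, REJECT keeps the two albums
+whose <span class="script">artist_idx</span> is 1, and the survivors are then
+paired by position, so the second line has empty artist fields. The C sketch
+below models that reading. It is inferred from the example, not taken from the
+Vos source; the operator codes, <span class="script">rule_holds()</span> and
+<span class="script">filter_rows()</span> are hypothetical, and treating
+<span class="script">'!='</span> as the negation of the case-insensitive
+<span class="script">'='</span> is an assumption.
+</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() (POSIX) */

/* Hypothetical operator codes for one filter rule. */
enum filter_op {
	OP_EQ,        /* '='  : case-insensitive equality */
	OP_EQ_CASE,   /* '==' : case-sensitive equality */
	OP_NE         /* '!=' : assumed to negate '=' */
};

static int rule_holds(const char *value, enum filter_op op, const char *want)
{
	switch (op) {
	case OP_EQ:
		return strcasecmp(value, want) == 0;
	case OP_EQ_CASE:
		return strcmp(value, want) == 0;
	case OP_NE:
		return strcasecmp(value, want) != 0;
	}
	return 0;
}

/* Apply one ACCEPT (accept != 0) or REJECT (accept == 0) rule to the
 * field values of a single input. Indices of surviving rows go into
 * kept, which must have room for n entries; returns how many survive. */
static size_t filter_rows(const char *const *values, size_t n, int accept,
                          enum filter_op op, const char *want, size_t *kept)
{
	size_t i, k = 0;

	assert(values != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (rule_holds(values[i], op, want) == (accept != 0))
			kept[k++] = i;
	return k;
}
```

+<p>
+Pairing the survivors by position, artist row 0 with album row 0 and an empty
+artist with album row 4, gives the two output lines shown above.
+</p>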
+
+<h2 id="license" name="license"> Vos License </h2> 
+<pre class="box-script">
+Copyright (C) 2009 M. Shulhan (ms@kilabit.info) All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+   list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice,
+   this list of conditions and the following disclaimer in the documentation
+   and/or other materials provided with the distribution.
+
+3. All advertising materials mentioning features or use of this software must
+   display the following acknowledgment:
+   "This product includes software written by M. Shulhan (ms@kilabit.info)"
+
+4. The names "M. Shulhan" or "Vos" must not be used to endorse or promote
+   products derived from this software without specific prior written
+   permission.
+
+5. Products derived from this software may not be called "Vos" nor may "Vos"
+   appear in their names without prior written permission of the author.
+
+THIS SOFTWARE IS PROVIDED BY SHULHAN "AS IS" AND ANY EXPRESS OR IMPLIED
+WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+</pre>
+</body>
+</html>