| author | Mhd Sulhan <m.shulhan@gmail.com> | 2014-07-27 20:22:12 +0700 |
|---|---|---|
| committer | Mhd Sulhan <m.shulhan@gmail.com> | 2014-07-27 20:22:12 +0700 |
| commit | a5817d2410f65c3a055e4c1ec212270aed50186d (patch) | |
| tree | 60b4f7f0bf5684a8a950ec7602e8536ccc8d456c | |
| parent | 69d5c74cb37f32588d78e09fcffb947cd74d9c13 (diff) | |
| download | vos-a5817d2410f65c3a055e4c1ec212270aed50186d.tar.xz | |
Add documentation.
27 files changed, 1830 insertions, 0 deletions
diff --git a/doc/dev/GOAL b/doc/dev/GOAL new file mode 100644 index 0000000..7578e06 --- /dev/null +++ b/doc/dev/GOAL @@ -0,0 +1,263 @@ +Vos Goals +---------- + Taken from CoSort Technical Specifications. + + +Legend: +- : unimplemented ++ : implemented += : on going/half done +? : is it worth/why/what is that mean + + +Ease of Use +----------- + +- Processes record layouts and SQLlike field definitions from central data + dictionaries. + +- Converts and processes native COBOL copybook, Oracle SQL*Loader control + file, CSV, and W3C extended log format (ELF) file layouts. + +- SortCL data definition files are a supported MIMB metadata format. + +- Mix of online help, preruntime application validation, and runtime + error messages. + +- Leverages centralized application and file layout definitions (metadata + repositories). + += Reports problems to standard error when invoked from a program, or + to an error log. + +- Runs silently or with verbose messaging without user intervention. + +- Allows user control over the amount of informational output produced. + +- Generates a queryready XML audit log for data forensics and privacy + compliance. + += Describes commands and options through man pages and online documentation. + + it's half done because the program is always moving to a new features. + it's not wise to mark this as 'done'. + +- Easytouse interfaces and seamless thirdparty sort replacements preclude + the need for training classes + + +Resource Control +---------------- + ++ Sets and allows user modification of the maximum and minimum number of + concurrent sort threads for sorting on multiCPU and multicore systems. + + using PROCESS_MAX variable. + ++ Uses a specified directory, a combination of directories, for temporary work + files. + + using PROC_TMP_DIR variable. + ++ Limits the amount of main and virtual memory used during sort operations. + + using PROCESS_MAX_ROW variable. 
+ + Since input file size is unpredictable and a human is still need to + run the program, the amount of program memory still cannot decide by + human. What if it's set to 1 kilobytes ?. + ++ Sets the size of the memory blocks used as physical I/O buffers. + + using FILE_BUFFER_SIZE variable. + + +Input and Output +---------------- + += Processes any number of files, of any size, and any number of records, + fixed or variable length to 65,535 bytes passed from an input procedure, + from stdin, a named pipe, a table in memory, or from an application program. + + - TODO: from stdin + - TODO: from a named pipe. + - TODO: from a table in memory. + - TODO: from an application program. + +? Supports the use of environment variables. + + for what ? + += Supports wildcards in the specification of input and output files, as well + as absolute path names and aliases. + + - TODO: supports wildcards in the specification of input files. + ++ Accepts and outputs fixed or variablelength records with delimited field. + +? Generates one or more output files, and/or summary information, including + formatted and dashboardready reports. + +- Returns sorted, merged, or joined records one (or more) at a time to an output + procedure, to stdout (or named pipe), a table in memory, one or more new or + existing files, or to a program. + +- Outputs optional sequence numbers with each record, at any starting value, for + indexed loads and/or reports. + + +Record Selection and Grouping +----------------------------- + += Includes or omits input or output records using fieldtofield or fieldconstant + comparisons. + + TODO: field-to-field comparisons + +- Compares on any number of data fields, using standard and alternate collating + sequences. + ++ Sorts and/or reformats groups of selected records. + + using SORT and CREATE statement. + ++ Matches two or more sorted or unsorted files on inner and outer join criteria using + SQLbased condition syntax. + + using JOIN with '+' or '-' statement. 
+ +- Skips a specified number of records, bytes, or a file header or footer. + +- Processes a specified number of records or bytes, including a saved header. + +- Eliminates or saves records with duplicate keys. + + +Sort Key Processing +------------------- + ++ Allows any number of key fields to be specified in ascending or + descending order. + + using SORT x by x.f1 ASC; or + using SORT x by x.f1 DESC; + ++ Supports any number of fields from 0 to 65,535 bytes in length. + + almost unlimited, the limit is your memory. + ++ Orders fixed position fields, or floating fields with one or more + delimiters. + +- Supports numeric keys, including all C, FORTRAN, and COBOL data types. + +- Supports single and multibyte character keys, including ASCII, EBCDIC, + ASCII in EBCDIC sequence, American, European, ISO and Japanese timestamps, + and natural (localedependent) values, as well as Unicode and doublebyte + characters such as Big5, EUCTW, UTF32, and SJIS. + +- Allows left or right alignment and case shifting of character keys. + +- Accepts user compare procedures for multibyte, encrypted and other + special data. + +- Performs record sequence checking. + ++ Maintains input record order (stability) on duplicate keys. + +- Controls treatment of null fields when specifying floating + (character separated) keys. + +- Collates and converts between many of the following data types (formats): + --- + + +Record Reformatting +------------------- + ++ Inserts, removes, resizes, and reorders fields within records; defines new + fields. + +- Converts data in fields from one format to another either using internal + conversion. + +- Maps common fields from differently formatted input files to a uniform sort + record. + += Joins any fields from several files into an output record, usually based on a + condition. + + using JOIN statement. current support only in joining two input files. 
+ +- Changes record layouts from one file type to another, including: Line + Sequential, Record Sequential, Variable Sequential, Blocked, Microsoft Comma + Separated Values (CSV), ACUCOBOL Vision, MF ISAM, MFVL, Unisys VBF, VSAM + (within UniKik MBM), Extended Log Format (W3C), LDIF, and XML. + +- Maps processed records to many differently formatted output files, including + HTML. + +- Writes multiple record formats to the same file for complex report + requirements. + +- Performs mathematical expressions and functions on field data (including + aggregate data) to generate new output fields. + +- Calculates the difference in days, hours, minutes and seconds betweeen + timestamps. + + +Field Reformatting/Validation +----------------------------- + +- Aligns desired field contents to either the left or right of the target + field, where any leading or trailing fill characters from the source are + moved to the opposite side of the string. + +- Processes values from multidimensional, tabdelimited lookup files. + +- Creates and processes substrings of original field contents, where you can + specify a positive or negative offset and a number of bytes to be contained + in the substring. + +- Finds a userspecified text string in a given field, and replaces all + occurrences of it with a different userspecified text string in the target + field. + +- Supports Perl Compatible Regular Expressions (PCRE), including pattern + matching. + +- Uses Cstyle “iscompare” functions to validate contents at the field level + (for example, to determine if all field characters are printable), which can + also be used for recordfiltering via selection statements. + +- Protects sensitive field data with fieldlevel deidentification and AES256 + encryption routines, along with anonymization, pseudonymization, filtering + and other column-level data masking and obfuscation techniques. 
+ +- Supports custom, userwritten fieldlevel transformation libraries, and + documents an example of a fieldlevel data cleansing routine from + Melissa Data (AddressObject). + + +Record Summarization +-------------------- + +- Consolidates records with equal keys into unique records, while totaling, + averaging, or counting values in specified fields, including derived + (crosscalculated) fields. + +- Produces maximum, minimum, average, sum, and count fields. + +- Displays running summary value(s) up to a break (accumulating aggregates). + +- Nreaks on compound conditions. + +- Allows multiple levels of summary fields in the same report. + +- Remaps summary fields into a new format, allowing relational tables. + +- Ranks data through a running count with descending numeric values. + +- Writes detail and summary records to the same output file for structured + reports. diff --git a/doc/dev/NOTES b/doc/dev/NOTES new file mode 100644 index 0000000..92bf86c --- /dev/null +++ b/doc/dev/NOTES @@ -0,0 +1,122 @@ + sometimes i forgot why i write code like this. + -- S.T.M.L + +- follow linux coding style + +- priority of source code (4S) : + + stable + + simple + + small + + secure (this option does not need for this program) + +- keep as small as possible: + + remove unneeded space + + remove unneeded variable + +- write comment/documentation as clear as possible + +- learn to use: + + if (1 == var) + +- learn to avoid: + + (i < strlen(str)) + on loop statement because strlen() need temporary variable. + try, + l = strlen(str); + while (i < l) { ... } + +- use function in libc as much as possible; if not, wrap it! + + + +001 - I/O Relation between Statement +----------------------------------------------------------------------------- +LOAD is an input statement. + +SORT, CREATE, JOIN is an output statement, but it can be an input. +i.e: + + 1 - load abc ( ... ) as x; + 2 - sort x by a, b; + 3 - create ghi ( x.field, ... 
) as out_x; + +file output created by sort statement in line 2 will be an input by create +statement in line 3. + + +002 - Why we need '2nd-loser' +----------------------------------------------------------------------------- + +to minimize comparison and insert in merge tree. + + + +003 - Why we need 'level' on tree node +----------------------------------------------------------------------------- + +list of input file to merge is A, B, C contain sorted data : + + A : 10, 11, 12, 13 (1st file) + B : 1, 12, 100, 101 (2nd file) + C : 2, 13, 200, 201 (3rd file) + +if we use tree insert algorithm: + + if (root < node) + insert to left + else + insert to right + +after several step we will get: + +B-12 + \ + C-13 + / +A-12 + +which result in not-a-stable sort, + + B-1 C-2 A-10 A-11 B-12 A-12 ... + +they should be, + + B-1 C-2 A-10 A-11 A-12 B-12 ... + +Even if we choose different algorithm in insert: + + if (root <= node) + insert to left + else + insert to right + +there is also input data that will violate this, i.e: + + A : 2, 13, 200, 201 (1st file) + B : 1, 12, 100, 101 (2nd file) + C : 10, 11, 12, 13 (3rd file) + + +004 - recursives call + thread + free on SunOS 5.10 +----------------------------------------------------------------------------- + +i did not investigate much, but doing a recursive call + thread + free cause +SIGSEGV on SunOS 5.10 system, but not in GNU/Linux system. This odd's found +whee testing on Solaris and by using dbx the SIGSEGV "sometimes" catched in +str_destroy, + + if (str->buf) + free(str->buf); <= dbx catch here + +and "sometimes" below that (but not in vos function/stack). + +i.e: + list_destroy(**ptr) + { + if (! (*ptr)) + return; + list_destroy((*ptr)->next); + free((*ptr)); + } + +and no, it's not about double free. 
diff --git a/doc/dev/TODO b/doc/dev/TODO new file mode 100644 index 0000000..3dc6001 --- /dev/null +++ b/doc/dev/TODO @@ -0,0 +1,10 @@ +>> +- add set variable + set process_compare_case_sensitive; (default) + set process_compare_case_notsensitive; + + set process_tmp_dir "/path/to/tmp/dir"; + set process_tmp_dir "/another/tmp/dir"; +<< DONE + +- Produces maximum, minimum, average, sum, and count fields. diff --git a/doc/dev/slog b/doc/dev/slog new file mode 100644 index 0000000..72e2185 --- /dev/null +++ b/doc/dev/slog @@ -0,0 +1,206 @@ + i have a odd habit: checking code every time + i get bored, which result an error some time. + this file prevent me to over checking it. + -- May Benot + +--- format --- ++ function_name +@check : XXXX XXXX +@last-check : year.month.day (last check) +@auditor : thisman@thatserver.com (last auditor) +@desc : fix algorithm +--- tamrof --- + + +vos_String +----------------------------------------------------------------------------- + ++ str_create +@check : X +@last-check : 2008.12.17 +@auditor : ms@kilabit.info + ++ str_append_c +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + ++ str_append +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info +@desc : fix len increment + ++ str_detach +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + ++ str_rtrim +@check : XX +@last-check : 2009.01.25 +@auditor : ms@kilabit.info +@desc : removed + ++ str_prune +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + ++ str_destroy +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + ++ str_raw_copy +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + ++ str_raw_randomize +@check : XX +@last-check : 2009.01.25 +@auditor : ms@kilabit.info +@desc : 'x' should not be replaced + ++ str_raw_hash +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + + +vos_File +----------------------------------------------------------------------------- + ++ 
file_open +@check : X +@last-check : 2008.12.19 +@auditor : ms@kilabit.info + ++ file_read +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_write +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_fetch_until +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_skip_until +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_skip_space +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_destroy +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_raw_get_size +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + ++ file_raw_is_exist +@check : X +@last-check : 2009.01.25 +@auditor : ms@kilabit.info + + +vos_LL +----------------------------------------------------------------------------- + ++ ll_add +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ ll_link +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ ll_print +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ ll_destroy +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + + +vos_Field +----------------------------------------------------------------------------- + ++ field_soft_copy +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ field_add +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ field_print +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ _field_destroy +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + + +vos_Record +----------------------------------------------------------------------------- + ++ record_new +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ _record_cmp +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ record_add_field +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ record_add_row +@check : X +@last-check : 
2009.01.26 +@auditor : ms@kilabit.info + ++ record_prune +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ record_destroy +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + ++ record_print +@check : X +@last-check : 2009.01.26 +@auditor : ms@kilabit.info + diff --git a/doc/dev/test b/doc/dev/test new file mode 100644 index 0000000..ff6cde3 --- /dev/null +++ b/doc/dev/test @@ -0,0 +1 @@ ++ Accepts and outputs fixed or variablelength records with delimited field. diff --git a/doc/dev/vos-sketch.odg b/doc/dev/vos-sketch.odg Binary files differ new file mode 100644 index 0000000..45b86a6 --- /dev/null +++ b/doc/dev/vos-sketch.odg diff --git a/doc/dev/vos.test.create.log b/doc/dev/vos.test.create.log new file mode 100644 index 0000000..027c53f --- /dev/null +++ b/doc/dev/vos.test.create.log @@ -0,0 +1,113 @@ +2009.10.12 + +Comparing vos create process time by setting process max row and buffer size +============================================================================== + + is it disk or algorithm ? 
+ +- file input size : 257,985,910 byte (~ 250 MB) +- format of input field: + + '\'':field01:'\''::';', + +- format of output field: + + '':field01:''::'|', + +- number of field in input & output : 11 field +- process max : 2 (this option does not effect process actually) + + +system copy time +============================================================================== + + real 0m2.906s + user 0m0.010s + sys 0m0.747s + + +vos load+create +============================================================================== + +test 000 +-------- +o process max row : 100,000 +o file buffer size : 8192 + + real 0m30.243s + user 0m55.680s + sys 0m1.567s + + +test 001 +-------- +o process max row : 100,000 +o file buffer size : 1,024,000 + + real 0m30.296s + user 0m55.536s + sys 0m1.790s + + +test 002 +-------- + +o process max row : 200,000 +o file buffer size : 1,024,000 + + real 0m30.115s + user 0m55.956s + sys 0m1.500s + + +test 003 +-------- + +o process max row : 100,000 +o file buffer size : 51,200,000 + + real 0m29.924s + user 0m55.443s + sys 0m1.563s + + +test 004 +-------- + +o process max row : 500,000 +o file buffer size : 51,200,000 + + real 0m32.795s + user 0m57.013s + sys 0m1.697s + + +(source change) +before: +- int record_read_filtered(struct Record **R, struct File *F, + struct Field *fld); +after: +- int record_read_filtered(struct Record **R, struct File *F, + struct Field *fld, struct String *str); + + +test 005 +-------- + +o process max row : 100,000 +o file buffer size : 8,192 + + real 0m29.783s + user 0m54.253s + sys 0m1.867s + + +test 006 +-------- + +o process max row : 100,000 +o file buffer size : 51,200,000 + + real 0m30.364s + user 0m56.000s + sys 0m1.570s diff --git a/doc/dev/vos.test.create.mem.log b/doc/dev/vos.test.create.mem.log new file mode 100644 index 0000000..f5db2f6 --- /dev/null +++ b/doc/dev/vos.test.create.mem.log @@ -0,0 +1,50 @@ + + How much vos load+create use memory 
+------------------------------------------------------------------------------ + +o input file size : 51,521,908 +o input rows : 501,000 +o input fields : 11 +o output fields : 11 +o process max row : 100,000 +o process max : 2 + + +2009.01.14 - test 000 +------------------------------------------------------------------------------ + +0 file buffer size : 51,200,000 +o bytes allocated : 296,833,772 +o allocs : 11,000,473 +o running time (w/o memcheck) : + + real 0m6.187s + user 0m10.789s + sys 0m0.363s + + +2009.01.15 - test 001 +------------------------------------------------------------------------------ + +o with new vos_process_create algorithm +0 file buffer size : 51,200,000 +o bytes allocated : 166,820,858 (~ 3 * input file size) +o allocs : 3,500,662 +o running time (w/o memcheck) : + + real 0m4.565s + user 0m8.026s + sys 0m0.327s + + +2009.01.16 - test 002 +------------------------------------------------------------------------------ +o file buffer size : 8192 (default) +o bytes allocated : 64,437,110 (~ 1.2 * input file size :) +o allocs : 3,500,652 +o running time (w/o memcheck) : + + real 0m4.361s + user 0m7.763s + sys 0m0.283s + diff --git a/doc/dev/vos.test.join.log b/doc/dev/vos.test.join.log new file mode 100644 index 0000000..d3557f3 --- /dev/null +++ b/doc/dev/vos.test.join.log @@ -0,0 +1,39 @@ + How fast vos_join is and how much memory does it's used +------------------------------------------------------------------------------ + +o input file size 1 (already sorted) : 40,499,908 +o input file size 2 (already sorted) : 40,499,908 + +o input rows : 501,000 +o input fields : 11 +o output fields : 22 + +o process max row : 100,000 +o process max : 2 +o file buffer size : 8192 bytes + + +2009.01.18 - test 000 +------------------------------------------------------------------------------ + +o allocs : 24,048,866 +o bytes allocated : 417,724,740 (~ 5 * inputs file size) +o running time (w/o memcheck) : + + real 0m9.118s + user 0m8.483s + sys 
0m0.237s + + +2009.01.18 - test 001 +------------------------------------------------------------------------------ + +o with new vos_join algorithm +o allocs : 542 +o bytes allocated : 42,134 (~ 0.2 * inputs file size) +o running time (w/o memcheck) : + + real 0m5.336s + user 0m4.833s + sys 0m0.333s + diff --git a/doc/dev/vos.test.sort-00.log b/doc/dev/vos.test.sort-00.log new file mode 100644 index 0000000..d297e74 --- /dev/null +++ b/doc/dev/vos.test.sort-00.log @@ -0,0 +1,65 @@ + How much vos load+sort use memory +------------------------------------------------------------------------------ + +o input file size : 51,521,908 +o input rows : 501,000 +o input fields : 11 +o output fields : 11 +o sorted fields : field03 +o process max row : 100,000 +o process max : 2 + + +2009.01.16 - test 000 +------------------------------------------------------------------------------ + +o file buffer size : 8192 (default) +o allocs : 24,048,740 +o bytes allocated : 417,820,691 (~ 8 * input file size) +o running time (w/o memcheck) : + + real 0m12.341s + user 0m15.849s + sys 0m0.627s + + +2009.01.16 - test 001 +------------------------------------------------------------------------------ + +o file buffer size : 51,200,000 +o allocs : 24,048,751 +o bytes allocated : 1,185,697,974 (~ 23 * input file size) +o running time (w/o memcheck) : + + real 0m12.341s + user 0m15.849s + sys 0m0.627s + + + +2009.01.16 - test 002 +------------------------------------------------------------------------------ + +o with new sort_process algorithm +o file buffer size : 8192 +o allocs : 18,624,738 +o bytes allocated : 332,184,755 (~ 6 * input file size) +o running time (w/o memcheck) : + + real 0m10.314s + user 0m13.059s + sys 0m0.583s + + +2009.01.17 - test 003 +------------------------------------------------------------------------------ + +o with new sort_process & vos_sort_merge algorithm +o file buffer size : 8192 +o allocs : 6,600,924 +o bytes allocated : 123,352,391 (~ 2 * input file 
size) +o running time (w/o memcheck) : + + real 0m6.936s + user 0m9.379s + sys 0m0.560s diff --git a/doc/user/example/album.data b/doc/user/example/album.data new file mode 100644 index 0000000..af667d6 --- /dev/null +++ b/doc/user/example/album.data @@ -0,0 +1,5 @@ +'You Forgot it in People' 1 +'Burn' 5 +'Get Lifted' 4 +'The Joshua Tree' 2 +'Broken Social Scene' 1 diff --git a/doc/user/example/artist.data b/doc/user/example/artist.data new file mode 100644 index 0000000..0178395 --- /dev/null +++ b/doc/user/example/artist.data @@ -0,0 +1,5 @@ +1,"Broken Social Scene" +2,"U2" +3,"Led Zeppelin" +4,"John Legend" +5,"Deep Purple" diff --git a/doc/user/example/create_artist_album.vos b/doc/user/example/create_artist_album.vos new file mode 100644 index 0000000..7e530a8 --- /dev/null +++ b/doc/user/example/create_artist_album.vos @@ -0,0 +1,19 @@ +# +# example of create statement +# +LOAD "artist.data" ( + :idx : ::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28:28 +) as album; + +CREATE "create_artist_album.data" from artist, album ( + :artist.idx : ::'|', + '"':artist.name :'"'::'|', + :album.artist_idx: ::'|', + '[' :album.title :']' :: +); diff --git a/doc/user/example/filter_artist_album.vos b/doc/user/example/filter_artist_album.vos new file mode 100644 index 0000000..5e989c5 --- /dev/null +++ b/doc/user/example/filter_artist_album.vos @@ -0,0 +1,21 @@ +# +# example of using create with filter +# +LOAD "artist.data" ( + :idx :::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28 :28 +) as album; + +CREATE "filter_artist_album.data" from artist, album ( + :artist.idx :::'|', + '"':artist.name :'"'::'|', + '[':album.title :']':: +) FILTER ( + ACCEPT artist.idx = 1, + REJECT album.artist_idx != 1 +); diff --git a/doc/user/example/join_artist_album.vos b/doc/user/example/join_artist_album.vos new file mode 100644 index 0000000..d8d2e2c --- /dev/null +++ 
b/doc/user/example/join_artist_album.vos @@ -0,0 +1,16 @@ +# +# example of join statement +# +LOAD "artist.data" ( + :idx : ::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28 :28 +) as album; + +JOIN artist, album INTO "join_artist_album.data" ( + artist.idx = album.artist_idx +); diff --git a/doc/user/example/sort_album.vos b/doc/user/example/sort_album.vos new file mode 100644 index 0000000..18c0286 --- /dev/null +++ b/doc/user/example/sort_album.vos @@ -0,0 +1,9 @@ +# +# example of sort statement with two field +# +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28:28 +) as album; + +SORT album BY artist_idx, title INTO "album_sorted.data"; diff --git a/doc/user/example/sort_artist.vos b/doc/user/example/sort_artist.vos new file mode 100644 index 0000000..39fdda8 --- /dev/null +++ b/doc/user/example/sort_artist.vos @@ -0,0 +1,9 @@ +# +# example of sort statement by descending order +# +LOAD "artist.data" ( + :idx :::',', + '"':name:'"':: +) as artist; + +SORT artist BY name DESC; diff --git a/doc/user/image/create_stmt.png b/doc/user/image/create_stmt.png Binary files differ new file mode 100644 index 0000000..8c6ae84 --- /dev/null +++ b/doc/user/image/create_stmt.png diff --git a/doc/user/image/field_clause.png b/doc/user/image/field_clause.png Binary files differ new file mode 100644 index 0000000..ca5c387 --- /dev/null +++ b/doc/user/image/field_clause.png diff --git a/doc/user/image/filter_clause.png b/doc/user/image/filter_clause.png Binary files differ new file mode 100644 index 0000000..ff0d5a0 --- /dev/null +++ b/doc/user/image/filter_clause.png diff --git a/doc/user/image/join_rules.png b/doc/user/image/join_rules.png Binary files differ new file mode 100644 index 0000000..4b564ba --- /dev/null +++ b/doc/user/image/join_rules.png diff --git a/doc/user/image/join_stmt.png b/doc/user/image/join_stmt.png Binary files differ new file mode 100644 index 0000000..5dad5a0 --- /dev/null +++ 
b/doc/user/image/join_stmt.png diff --git a/doc/user/image/load_stmt.png b/doc/user/image/load_stmt.png Binary files differ new file mode 100644 index 0000000..401d391 --- /dev/null +++ b/doc/user/image/load_stmt.png diff --git a/doc/user/image/set_stmt.png b/doc/user/image/set_stmt.png Binary files differ new file mode 100644 index 0000000..a604664 --- /dev/null +++ b/doc/user/image/set_stmt.png diff --git a/doc/user/image/sort_stmt.png b/doc/user/image/sort_stmt.png Binary files differ new file mode 100644 index 0000000..1f45546 --- /dev/null +++ b/doc/user/image/sort_stmt.png diff --git a/doc/user/style/page.css b/doc/user/style/page.css new file mode 100644 index 0000000..18b3d67 --- /dev/null +++ b/doc/user/style/page.css @@ -0,0 +1,68 @@ +body +{ + font-family : sans-serif; +} + +h2 +{ + background-color : black; + color : white; + padding : 4px 4px 4px 16px; + border-bottom : solid; + border-color : silver; + margin-top : 4em; +} + +h3 +{ + background-color : silver; + padding : 4px 4px 4px 16px; + border-bottom : solid; + border-color : black; + margin-top : 2em; +} + +h4 +{ + padding : 4px 4px 4px 16px; + border-bottom : dashed; + border-color : black; + border-width : 1px; + margin-top : 2em; +} + +.quote +{ + font-family : serif; + font-style : italic; + text-align : right; + margin-left : 50%; +} + +.script +{ + font-family : monospace; +} + +.box-script +{ + background : #F0F0F0; + border : solid; + border-width : thin; + font-family : monospace; + padding : 8px; + margin-left : 2em; +} + +.list-desc +{ + margin-left : 2em; +} + +.image +{ + border : solid; + border-width : 2px; + border-color : gray; + margin : 1em; +} diff --git a/doc/user/vos_user_manual.html b/doc/user/vos_user_manual.html new file mode 100644 index 0000000..ed5618c --- /dev/null +++ b/doc/user/vos_user_manual.html @@ -0,0 +1,809 @@ +<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" + "http://www.w3.org/TR/html4/loose.dtd"> +<html> +<head> +<title>Vos User Manual</title> 
+<style type="text/css"> +@import url(style/page.css); +</style> +</head> + +<body> +<h1> Vos User Manual </h1> + +<h2> Table of Contents </h2> +<ul> +<li><a href="#intro">Introduction</a></li> +<li><a href="#ch00">Building and Compiling Vos</a></li> + <ul> + <li><a href="#softreq">Software Requirements</a></li> + <li><a href="#make">Compiling Vos From Source</a></li> + </ul> +<li><a href="#ch01">Running Vos</a></li> + <ul> + <li><a href="#vos_env">Vos Environments</a></li> + </ul> +<li><a href="#ch02">Vos Script</a></li> + <ul> + <li><a href="#vos_var">Vos Variables</a></li> + <li><a href="#vos_stmt">Vos Statements</a></li> + <ul> + <li><a href="#set_stmt">Set Statement</a></li> + <li><a href="#sort_stmt">Sort Statement</a></li> + <li><a href="#create_stmt">Create Statement</a></li> + <li><a href="#join_stmt">Join Statement</a></li> + <li><a href="#field_clause">Field Clause</a></li> + <li><a href="#filter_clause">Filter Clause</a></li> + </ul> + </ul> +<li><a href="#license">Vos License</a></li> +</ul> + +<h2 id="intro"> Introduction </h2> +Vos is a program to process formatted data, i.e. CSV data. +Vos is designed to process a large input file, a file where their size is +larger than the size of memory, and can be tuned to adapt with your machine +environment. +<br /><br /> +Vos currently has four main features, +<ul> +<li>Sorting</li> +<p class="list-desc"> +Vos can sort one or more field in input file in one pass. +</p> +<li>Reformatting</li> +<p class="list-desc"> +By declaring only specific fields or using a different separator in vos +script, Vos can re-create a new file with a new format and data. +</p> +<li>Filtering</li> +<p class="list-desc"> +Vos can omit and include specific data in specific field/record. +</p> +<li>Join</li> +<p class="list-desc"> +Vos can join two file into one file with all or specific fields included in +output file. 
+</p> +</ul> + +<h2 id="ch00"> +Building and Compiling Vos </h2> +<p class="quote"> +Vos was developed on GNU/Linux system, so any prerequisite below only valid +on system that running GNU/Linux system. Usually, any Unix like system +could compile the source, it just does not fully tested yet. +</p> + +<h3 id="softreq" name="softreq"> +Software Requirements </h3> + +<p> This software/tools below is used in developing Vos, therefore we +recommended you to use the same or greater version when building Vos from the +source.</p> +<ul> +<li> GNU C Compiler + <ul> + <li> Version : 4.2.1_20070724 </li> + <li> Website : + <a href="http://gcc.gnu.org"> http://gcc.gnu.org </a></li> + </ul> +</li> +<li> glibc development + <ul> + <li> Version : 2.6.1 </li> + <li> Website : +<a href="http://www.gnu.org/software/libc/libc.html"> +http://www.gnu.org/software/libc/libc.html </a> </li> + </ul> +</li> +<li> GNU Make + <ul> + <li> Version : 3.81 </li> + <li> Website : +<a href="http://www.gnu.org/software/make/make.html"> +http://www.gnu.org/software/make/make.html </a> </li> + </ul> +</li> +</ul> + + +<h3 id="make" name="make"> +Compiling Vos From Source </h3> +<p class="quote"> +This step assume that you already get the source and saved into your machine. +</p> + +<pre class="box-script"> + $ tar jxvf vos-xxxx.xx.xx.tar.bz2 + $ cd vos/src + $ make +</pre> + +<p> +Where xxxx.xx.xx is Vos version (depend on which version that has been +downloaded). When running "make", make program will create directory "build" +in "vos" directory, vos executable is placed in there (vos/build). +</p> +<p> +For later use, you should copy Vos executable to your PATH directory. In +example: +</p> +<pre class="box-script"> + $ pwd + /home/johndoe/tmp/vos/src + $ echo $PATH + /home/johndoe/bin:/usr/local/bin:/usr/bin:/bin + $ cp ../build/vos /home/johndoe/bin +</pre> + +<h2 id="ch01"> +Running Vos </h2> +<p> +Vos program only have one parameter: vos script. 
+</p> +<pre class="box-script"> + vos < vos-script > +</pre> +<p> +<span class="script">vos-script</span> is a file containing <a href="#vos_stmt"> +vos statements </a> that will be executed and processed. +</p> + +<h3 id="vos_env"> Vos Environments </h3> +<p> +Before running the Vos program, there are several environment variables that +you can set to change the behaviour of the program while running. Some of the +environment variables can also be set in the vos script using <a href="#vos_var"> Vos +variables. </a> +</p> + +<ul> +<li><span class="script"> VOS_DEBUG < number > </span> +<p class="list-desc"> +Default value : 0 +<br /><br /> +This variable is optional and used only for debugging; normal users +should not use this parameter. The <span class="script"> VOS_DEBUG </span> +environment variable can have the following values, +</p> +<ul> +<li> 1 : for testing the script only, not processing it </li> +<li> 2 : for debugging the parsing process </li> +<li> 4 : for debugging the sort process </li> +<li> 8 : for debugging the create process </li> +<li> 16 : for debugging the join process </li> +</ul> +<p class="list-desc"> +These values can be combined, by adding them together, to get more debug output. +</p> +<p class="list-desc"> +Example of setting the <span class="script"> VOS_DEBUG </span> variable in the +Bash shell, +</p> +<pre class="box-script"> + $ export VOS_DEBUG=3 +</pre> +<p class="list-desc"> +this value tells the Vos program to debug the parsing process (2) but not to +process the script (1). +</p> +</li> +<br /> +<!-- VOS_DEBUG end --> + +<li> +<span class="script"> VOS_FILE_BUFFER_SIZE < number > </span> +<p class="list-desc"> +Default value : 8192 +<br /><br /> +This variable sets the size of the buffer used for reading and writing files, in bytes. 
+<br /> +This example sets the buffer size to ~ 1 MB, +</p> +<pre class="box-script"> + $ export VOS_FILE_BUFFER_SIZE=1000000; +</pre> +</li> +<br /> +<!-- VOS_FILE_BUFFER_SIZE end --> + +<li> +<span class="script"> VOS_COMPARE_CASE < number > </span> +<p class="list-desc"> +Default value : 0 +<br /><br /> +This variable affects the order of sort output. +<br /> +If <span class="script"> VOS_COMPARE_CASE </span> is set to 0, "B" +will come before "a", but +<br /> +if <span class="script"> VOS_COMPARE_CASE </span> is set to 1, "a" +will come before "B". +<br /> +Example of how to set it in the Bash shell, +</p> +<pre class="box-script"> + $ export VOS_COMPARE_CASE=0; # or + $ export VOS_COMPARE_CASE=1; +</pre> +</li> +<br /> +<!-- VOS_COMPARE_CASE end --> + +<li> +<span class="script"> VOS_PROCESS_MAX < number > </span> +<p class="list-desc"> +Default value : 2 +<br /><br /> +This variable sets how many threads will be used for the sort process. The +recommended value is equal to the number of processors on your +machine. +<br /> +Example of how to set it in the Bash shell, +</p> +<pre class="box-script"> + $ export VOS_PROCESS_MAX=8; +</pre> +</li> +<br /> +<!-- VOS_PROCESS_MAX end --> + +<li> +<span class="script"> VOS_PROCESS_MAX_ROW < number ></span> +<br /> +<br /> +<p class="list-desc"> +Default value : 100,000 +<br /><br /> +This variable sets how many rows the program must keep in memory before +they are written to a temporary file. +<br /> +Example of how to use it: +</p> +<pre class="box-script"> + $ export VOS_PROCESS_MAX_ROW=400000; +</pre> +</li> +<br /> +<!-- VOS_PROCESS_MAX_ROW --> + +<li> +<span class="script"> +VOS_TMP_DIR +<double-quote><string><double-quote>[:] ... </span> +<br /> +<br /> +<p class="list-desc"> +Default value : /tmp/ +<br /><br /> +During the sort process, the program sometimes uses temporary files. By +default, these temporary files are placed in the "/tmp/" directory. 
You can add two or more +directories as temporary directories, as long as there is free space and the +user who runs the Vos program has write access to them. +<br /><br /> +We recommend that you use a temporary directory located on a +different disk than the input file; for technical reasons this decreases +processing time. +<br /><br /> +</p> +<pre class="box-script"> + $ export VOS_TMP_DIR="/var/tmp/"; +</pre> +<p class="list-desc"> +which results in the program using "/var/tmp/" as the temporary directory. +<br /><br /> +Another example : +</p> +<pre class="box-script"> + $ export VOS_TMP_DIR="/media/tmp/":"/disk01/"; +</pre> +<p class="list-desc"> +which results in the program using "/media/tmp/" and "/disk01/" as temporary +directories. +</p> +</li> +<br /> + + +</ul> <!-- Vos Environment --> + +<h2 id="ch02"> Vos Script </h2> +<p> +To illustrate how a Vos script works, we will use two input files as examples +here, "artist.data" and "album.data". +</p> + +<span class="script"> artist.data </span> +<pre class="box-script"> +1,"Broken Social Scene" +2,"U2" +3,"Led Zeppelin" +4,"John Legend" +5,"Deep Purple" +</pre> + +<span class="script"> album.data </span> +<pre class="box-script"> +'You Forgot it in People' 1 +'Burn' 5 +'Get Lifted' 4 +'The Joshua Tree' 2 +'Broken Social Scene' 1 +</pre> + + +<h3 id="vos_var"> Vos Variables </h3> +<p class="quote"> +Vos variables are used with the "SET" statement. +</p> +<p> +Vos variables are used to adapt Vos to the environment where it will be running. +For example, let's say that you have a machine with 8 processors and 16 GB of +memory and you want to sort 20,000,000 rows of data whose size is +about 2 GB. Instead of using the default maximum row (which is 100,000) with two +threads, you can set the maximum row to 2,500,000 and the maximum thread to 8, +which will decrease processing time. 
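+</p>
+<p>
+As a minimal sketch of the tuning described above, assuming the hypothetical
+8-processor machine from that example, the two settings could be written in a
+vos script with the SET statement as follows. The values are illustrative
+only, not measured recommendations.
+</p>
+<pre class="box-script">
+	set PROCESS_MAX 8;
+	set PROCESS_MAX_ROW 2500000;
+</pre>
+<p>
+Whether these values actually shorten a run depends on your data and
+hardware, so treat them as a starting point for your own measurements.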
+</p> +<p> +There are two methods to set a Vos variable: first, by explicitly defining it in +the vos script using the <span class="script">SET</span> statement; second, by +defining it as an environment variable using the shell <span class="script">set</span> or +<span class="script">export</span>. +</p> + +<ul> +<li><span class="script"> FILE_BUFFER_SIZE <number> </span> +<br /> +<br /> +<p class="list-desc"> +Default value : 8192 +<br /><br /> +This variable sets the size of the buffer used for reading and writing files, in bytes. +<br /> +This example sets the buffer size to ~ 1 MB, +</p> +<pre class="box-script"> + set FILE_BUFFER_SIZE 1000000; +</pre> +</li> +<br /> + +<li><span class="script">PROCESS_COMPARE_CASE_SENSITIVE</span> (default) </li> +<li><span class="script">PROCESS_COMPARE_CASE_NOTSENSITIVE</span> +<br /> +<br /> +<p class="list-desc"> +This variable affects the order of sort output. +<br /> +If <span class="script"> PROCESS_COMPARE_CASE_SENSITIVE </span> is used, "B" +will come before "a", but +<br /> +if <span class="script"> PROCESS_COMPARE_CASE_NOTSENSITIVE </span> is used, "a" +will come before "B". +<br /> +Example of how to use it: +</p> +<pre class="box-script"> + set PROCESS_COMPARE_CASE_SENSITIVE; + set PROCESS_COMPARE_CASE_NOTSENSITIVE; +</pre> +</li> +<br /> + +<li> +<span class="script"> PROCESS_MAX <number> </span> +<br /> +<br /> +<p class="list-desc"> +Default value : 2 +<br /><br /> +This variable sets how many threads will be used for the sort process. The +recommended value is equal to the number of processors on your +machine. +<br /> +Example of how to use it: +</p> +<pre class="box-script"> + set PROCESS_MAX 8; +</pre> +</li> +<br /> + +<li> +<span class="script"> PROCESS_MAX_ROW <number></span> +<br /> +<br /> +<p class="list-desc"> +Default value : 100,000 +<br /><br /> +This variable sets how many rows the program must keep in memory before +they are written to a temporary file. 
+<br /> +Example of how to use it: +</p> +<pre class="box-script"> + set PROCESS_MAX_ROW 400000; +</pre> +</li> +<br /> + +<li> +<span class="script"> +PROCESS_TEMPORARY_DIRECTORY +[:]<double-quote><string><double-quote>[:] ... </span> +<br /> +<br /> +<p class="list-desc"> +Default value : /tmp/ +<br /><br /> +During the sort process, the program sometimes uses temporary files. By +default, these temporary files are placed in the "/tmp/" directory. You can add +two or more directories as temporary directories, as long as there is free +space and the user who runs the Vos program has write access to them. +<br /><br /> +We recommend that you use a temporary directory located on a +different disk than the input file; for technical reasons this decreases +processing time. +<br /><br /> +When <span class="script">':'</span> is set as the first character in a string +value, the rest of the value is added to the list of temporary directories, +which means the last or the default temporary directory will not be replaced. +This setting allows you to add several directories in two or more SET +statements. +For example : +</p> +<pre class="box-script"> + SET PROCESS_TEMPORARY_DIRECTORY :"/var/tmp/"; + SET PROCESS_TEMPORARY_DIRECTORY :"/media/tmp/"; +</pre> +<p class="list-desc"> +which results in the program using "/tmp/" (the program default), "/var/tmp/", +and "/media/tmp/" as temporary directories. +<br /> +Another example : +</p> +<pre class="box-script"> + SET PROCESS_TEMPORARY_DIRECTORY :"/var/tmp/"; + SET PROCESS_TEMPORARY_DIRECTORY "/media/tmp/":"/disk01/"; +</pre> +<p class="list-desc"> +which results in the program using "/media/tmp/" and "/disk01/" as temporary +directories but not "/var/tmp/", because it has been overridden by the +last SET statement. +</p> +</li> +<br /> +</ul><!-- end of vos variable list --> + + +<h3 id="vos_stmt"> Vos Statements </h3> +<p class="quote"> +Vos script is not case sensitive; "Load" is equal to "LOAD". 
+</p> + +<h4 id="set_stmt"> Set Statement </h4> +<img class="image" src="image/set_stmt.png" alt="vos set statement"/> <br /> +For examples of how to use the Set Statement and the list of variables, see +<a href="#vos_var">Vos Variables.</a> + +<h4 id="load_stmt"> Load Statement </h4> +<img class="image" src="image/load_stmt.png" alt="vos load statement" /> <br /> +<ul> +<li><i>/path/to/input/file</i> is the path to the input file that you want to process. +</li> +<li> The input file path must be enclosed in double-quote characters. </li> +<li> For field declaration see <a href="#field_clause"> Field Clause </a></li> +<li> <i> alias </i> is optional, but we recommend that you use it for ease of use +later. </li> +<li> Declaring a Load statement alone does not process anything; it only +defines your input file and its fields. </li> +</ul> +<p> +Example of using the Load Statement: +</p> +<pre class="box-script"> +LOAD "artist.data" ( + :idx : ::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28:28 +) as album; +</pre> + +<h4 id="sort_stmt"> Sort Statement </h4> +<img class="image" src="image/sort_stmt.png" alt="vos sort statement" /> +<ul> +<li>Please note that the default setting in sort is <u>case sensitive</u> and in + <u>ascending</u> order. To change from case sensitive to not-sensitive use the + <a href="#set_stmt"> SET statement.</a></li> +<li>If <span class="script">INTO</span> is not defined then sort output will + be written into a file "sort.XXXXXXXX", where "XXXXXXXX" will be + replaced by random characters. 
</li> 
+</ul> +<p> +Example of using the Sort Statement: +<br /> +<br /> +This script will sort <span class="script"> artist.data </span> by +<span class="script"> name </span> (second field) in descending order, +</p> +<pre class="box-script"> +LOAD "artist.data" ( + :idx : ::',', + '"':name:'"':: +) as artist; + +SORT artist BY name DESC; +</pre> +<p> If you run the script, the output will look like this, </p> +<pre class="box-script"> +2|U2 +3|Led Zeppelin +4|John Legend +5|Deep Purple +1|Broken Social Scene +</pre> + +<p> +This script will sort +<span class="script"> album.data </span> by +<span class="script"> artist_idx </span> (second field) then by +<span class="script"> title </span> (first field) and save the output to the +file <span class="script"> album_sorted.data </span>. +</p> +<pre class="box-script"> +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28:28 +) as album; + +SORT album BY artist_idx, title INTO "album_sorted.data"; +</pre> +<p> If you run the script, the output will look like this, </p> +<pre class="box-script"> +Broken Social Scene|1 +You Forgot it in People|1 +The Joshua Tree|2 +Get Lifted|4 +Burn|5 +</pre> + +<h4 id="create_stmt"> Create Statement </h4> +<img class="image" src="image/create_stmt.png" alt="vos create statement" /> +<p> +The Create statement is used to create new data with a new format or with a +different field output order. <br /> +The Create statement can also be used to combine several input files into one file. +</p> + +<ul> +<li><i>/path/to/output/file</i> is the path to the output file where data will be +written. 
This value must be enclosed in double quotes.</li> +<li> For field declaration see <a href="#field_clause"> Field Clause </a>.</li> +<li> For filter declaration see +<a href="#filter_clause"> Filter Clause </a>.</li> +</ul> + +<p> +Example of using the Create Statement, +<br /> +<br /> +This script will combine <span class="script"> artist.data </span> and +<span class="script"> album.data </span> into one file; fields will be +separated by <span class="script">'|'</span>. +</p> + +<pre class="box-script"> +LOAD "artist.data" ( + :idx : ::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title:'\''::, + :artist_idx::28:28 +) as album; + +CREATE "artist_album.data" from artist, album ( + :artist.idx : ::'|', + '"':artist.name :'"'::'|', + :album.artist_idx: ::'|', + '[':album.title :']':: +); +</pre> + +<p> If you run the script, the output will look like this, </p> +<pre class="box-script"> +1|"Broken Social Scene"|1|[You Forgot it in People] +2|"U2"|5|[Burn] +3|"Led Zeppelin"|4|[Get Lifted] +4|"John Legend"|2|[The Joshua Tree] +5|"Deep Purple"|1|[Broken Social Scene] +</pre> + +<h4 id="join_stmt"> Join Statement </h4> +<img class="image" src="image/join_stmt.png" alt="vos join statement" /> +<img class="image" src="image/join_rules.png" alt="vos join rules" /> +<p> +The Join statement is used to combine two input files into one file, like the +create statement, but using specific fields as a matching rule. +</p> + +<ul> +<li> If <span class="script"> '+' </span> is defined then both matching rows and + non-matching rows will be written to the output file. </li> +<li> If <span class="script"> '-' </span> is defined then matching rows will + not be written but non-matching rows will be written to the output file.</li> +<li> If neither <span class="script"> '+' </span> nor + <span class="script"> '-' </span> is defined, only matching rows will + be written to the output file. 
</li> +<li> By default, an input file is treated as <span class="script">UNSORTED</span>.
It is + important to define <span class="script">SORTED</span> if you know + that the input file is already sorted, so Vos will not sort it again before + processing the join. </li> +<li> If <span class="script">INTO</span> is not defined then the output + will be written to a file "join.XXXXXXXX" where "XXXXXXXX" will be replaced + by random characters. </li> +</ul> + +<p> +Example of using the Join statement, +</p> +<pre class="box-script"> +LOAD "artist.data" ( + :idx : ::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28 :28 +) as album; + +JOIN artist, album INTO "join_artist_album.data" ( + artist.idx = album.artist_idx +); +</pre> + +<p> If you run the script, the output will look like this, </p> +<pre class="box-script"> +1|Broken Social Scene|You Forgot it in People|1 +2|U2|The Joshua Tree|2 +4|John Legend|Get Lifted|4 +5|Deep Purple|Burn|5 +</pre> + +<h4 id="field_clause"> Field Clause </h4> +<img class="image" src="image/field_clause.png" alt="vos field clause" /> + +<ul> +<li> <i> left-quote </i>, <i>right-quote</i> and <i>separator</i> can be any + single character. </li> +<li> <i> left-quote </i>, <i>right-quote</i> and <i>separator</i> must be + enclosed in single quotes.</li> +<li> If the character in single quotes is itself a single quote, it must be + prefixed with a backslash, i.e.: <span class="script"> '\'' </span>.</li> +<li> <i>start-position</i> is a number beginning from zero.</li> +<li> <i>end-position</i> is a number beginning from 1 and must be greater than + <i>start-position</i>. </li> +<li> <i>field-type</i> is a string indicating the type of data in the field. It + has only two possible values : <span class="script">STRING</span> or + <span class="script">NUMBER</span>.</li> +</ul> + + +<h5>Priority of quote vs position vs separator</h5> + +<p> +First, when reading field data, <i>start-position</i> has a higher priority than +<i>left-quote</i>. 
For example, suppose that the input data is like this, +</p> + +<pre class="box-script"> +'You Forgot it in People' +</pre> + +<p> and you define the field like this, </p> +<pre class="box-script"> + '\'':field00:'\'':4:22: +</pre> + +<p> +Vos will always read from position 4, not from the <i>left-quote</i> +character, which results in <span class="script">" Forgot it in Peopl"</span>. +</p> + +<p> +Second, while reading field data, <i>end-position</i> has a higher priority +than <i>right-quote</i>, and <i>right-quote</i> has a higher priority than +<i>separator</i>. +</p> + +<h4 id="filter_clause"> Filter Clause </h4> +<img class="image" src="image/filter_clause.png" alt="vos filter clause" /> +<ul> +<li><span class="script">'='</span> means a case not-sensitive comparison; "a" is + equal to "A".</li> +<li><span class="script">'=='</span> means a case sensitive comparison; "a" is + <u>not</u> equal to "A".</li> +<li>The only limitation here is that <i>field-name</i> must be on the left and the + value on the right, not vice versa. </li> +</ul> +<p> +Example of using a filter: +<br /> +<br /> +This script will only write artist and album rows where the value of the +<span class="script">idx</span> field is 1. +</p> + +<pre class="box-script"> +LOAD "artist.data" ( + :idx :::',', + '"':name:'"':: +) as artist; + +LOAD "album.data" ( + '\'':title :'\''::, + :artist_idx: :28:28 +) as album; + +CREATE "filter_artist_album.data" from artist, album ( + :artist.idx :::'|', + '"':artist.name :'"'::'|', + '[':album.title :']':: +) FILTER ( + ACCEPT artist.idx = 1, + REJECT album.artist_idx != 1 +); +</pre> + +<p> If you run the script, the output will look like this, </p> +<pre class="box-script"> +1|"Broken Social Scene"|[You Forgot it in People] +|""|[Broken Social Scene] +</pre> + +<h2 id="license" name="license"> Vos License </h2> +<pre class="box-script"> +Copyright (C) 2009 M. Shulhan (ms@kilabit.info) All rights reserved. 
+ +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are met: + +1. Redistributions of source code must retain the above copyright notice, this + list of conditions and the following disclaimer. + +2. Redistributions in binary form must reproduce the above copyright notice, + this list of conditions and the following disclaimer in the documentation + and/or other materials provided with the distribution. + +3. All advertising materials mentioning features or use of this software must + display the following acknowledgment: + "This product includes software written by M. Shulhan (ms@kilabit.info)" + +4. The names "M. Shulhan" or "Vos" must not be used to endorse or promote + products derived from this software without specific prior written + permission. + +5. Products derived from this software may not be called "Vos" nor may "Vos" + appear in their names without prior written permission of the author. + +THIS SOFTWARE IS PROVIDED BY SHULHAN "AS IS" AND ANY EXPRESS OR IMPLIED +WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO +EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, +INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, +BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, +DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY +OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, +EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +</pre> +</body> +</html> |
