| author | Shulhan <ms@kilabit.info> | 2018-09-17 01:21:27 +0700 |
|---|---|---|
| committer | Shulhan <ms@kilabit.info> | 2018-09-18 01:50:21 +0700 |
| commit | 44b26edf7f390db383fe025454be0c4e30cfbd9b (patch) | |
| tree | 84d02953bc9095312182534936c1b60667957f07 /lib/tabula/README.md | |
| parent | 4a820ec157501c957d2e30f1670656cceec5c044 (diff) | |
Merge package "github.com/shuLhan/tabula"
Diffstat (limited to 'lib/tabula/README.md')
| -rw-r--r-- | lib/tabula/README.md | 165 |
1 files changed, 165 insertions, 0 deletions
diff --git a/lib/tabula/README.md b/lib/tabula/README.md
new file mode 100644
index 00000000..8fbd2a40
--- /dev/null
+++ b/lib/tabula/README.md
@@ -0,0 +1,165 @@
+[](https://godoc.org/github.com/shuLhan/share/lib/tabula)
+[](https://goreportcard.com/report/github.com/shuLhan/share/lib/tabula)
+
+
+Package tabula is a Go library for working with rows, columns, or matrices
+(tables); in other words, for working with data sets.
+
+# Overview
+
+Go's slice gives a flexible way to manage a sequence of data of one type, but
+what if you want to manage a sequence of values with different types of data?
+Or manage a bunch of values like a table?
+
+You can use this library to manage a sequence of values with different types
+and to manage data as a two-dimensional tuple.
+
+## Terminology
+
+Here are some of the terms that we used in developing this library, which may
+help the reader understand the internals and the API.
+
+Record is a single cell in a row or column, or the smallest building block of
+a dataset.
+
+Row is a horizontal representation of records in a dataset.
+
+Column is a vertical representation of records in a dataset.
+Each column has a unique name and holds data of a single type.
+
+Dataset is a collection of rows and columns.
+
+Given those definitions we can draw the representation of rows, columns, or
+matrix:
+
+           COL-0  COL-1  ... COL-x
+    ROW-0: record record ... record
+    ROW-1: record record ... record
+    ...
+    ROW-y: record record ... record
+
+## What makes this package different from other dataset packages?
+
+### Record Type
+
+There are only three valid types in a record: int64, float64, and string.
+
+Each record is a pointer to an interface value, which means,
+
+- Switching from rows to columns mode, or vice versa, is only a matter of
+  pointer switching, with no memory reallocation.
+- When using matrix mode, additional memory is used only to allocate the
+  slices; the records in the rows and columns are shared.
+
+### Dataset Mode
+
+Tabula has three modes for a dataset: rows, columns, or matrix.
+
+For example, given a table of data,
+
+    col1,col2,col3
+    a,b,c
+    1,2,3
+
+- When in "rows" mode, each line is saved in its own slice, resulting in Rows:
+
+  ```
+  Rows[0]: [a b c]
+  Rows[1]: [1 2 3]
+  ```
+
+  Columns is used only to save record metadata: column name, type, flag and
+  value space.
+
+- When in "columns" mode, each line is saved in columns, resulting in Columns:
+
+  ```
+  Columns[0]: {col1 0 0 [] [a 1]}
+  Columns[1]: {col2 0 0 [] [b 2]}
+  Columns[2]: {col3 0 0 [] [c 3]}
+  ```
+
+  Each column will contain metadata including column name, type, flag, and
+  value space (all possible values that the column _may_ contain).
+
+  Rows in "columns" mode is empty.
+
+- When in "matrix" mode, each record is saved both in a row and in a column
+  using a shared pointer to the record.
+
+  Matrix mode consumes more memory by allocating two slices, rows and columns,
+  but gives a flexible way to manage records.
+
+## Features
+
+- **Switching between rows and columns mode**.
+
+- [**Random pick rows with or without replacement**](https://godoc.org/github.com/shuLhan/share/lib/tabula#RandomPickRows).
+
+- [**Random pick columns with or without replacement**](https://godoc.org/github.com/shuLhan/share/lib/tabula#RandomPickColumns).
+
+- [**Select column from dataset by index**](https://godoc.org/github.com/shuLhan/share/lib/tabula#SelectColumnsByIdx).
+
+- [**Sort columns by index**](https://godoc.org/github.com/shuLhan/share/lib/tabula#SortColumnsByIndex),
+  or indirect sort.
+
+- [**Split rows value by numeric**](https://godoc.org/github.com/shuLhan/share/lib/tabula#SplitRowsByNumeric).
+  For example, given two numeric rows,
+
+  ```
+  A: {1,2,3,4}
+  B: {5,6,7,8}
+  ```
+
+  if we split on B by value 7, the data will be split into the left set
+
+  ```
+  A': {1,2}
+  B': {5,6}
+  ```
+
+  and the right set would be
+
+  ```
+  A'': {3,4}
+  B'': {7,8}
+  ```
+
+- [**Split rows by string**](https://godoc.org/github.com/shuLhan/share/lib/tabula#SplitRowsByCategorical).
+  For example, given two rows,
+
+  ```
+  X: [A,B,A,B,C,D,C,D]
+  Y: [1,2,3,4,5,6,7,8]
+  ```
+
+  if we split the rows with value set `[A,C]`, the data will be split into the
+  left set, which contains all rows that have A or C,
+
+  ```
+  X': [A,A,C,C]
+  Y': [1,3,5,7]
+  ```
+
+  and the right set, the excluded set, which contains all rows without A or C,
+
+  ```
+  X'': [B,B,D,D]
+  Y'': [2,4,6,8]
+  ```
+
+- [**Select row where**](https://godoc.org/github.com/shuLhan/share/lib/tabula#SelectRowsWhere).
+  Select rows whose value at column index x is equal to y (an analogy to
+  _select where_ in SQL).
+  For example, given the rows of a dataset,
+  ```
+  ROW-1: {1,A}
+  ROW-2: {2,B}
+  ROW-3: {3,A}
+  ROW-4: {4,C}
+  ```
+  we can select the rows where the second column contains 'A', which results in,
+  ```
+  ROW-1: {1,A}
+  ROW-3: {3,A}
+  ```
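
The "Record Type" section of the README added by this commit says that rows and columns hold pointers to the same records, so switching modes does not copy data. A minimal Go sketch of that idea is shown here, using only the standard library. The `record` type and the `toColumns` helper are hypothetical names chosen for illustration; they are not tabula's actual types or API.

```go
package main

import "fmt"

// record is a hypothetical stand-in for a single cell.
// It is NOT tabula's Record type.
type record struct {
	v interface{} // expected to hold an int64, float64, or string
}

// toColumns builds a column-oriented view over rows without copying any
// record: every column slice stores the same pointers that the rows hold.
// It assumes that all rows have the same length.
func toColumns(rows [][]*record) [][]*record {
	if len(rows) == 0 {
		return nil
	}
	cols := make([][]*record, len(rows[0]))
	for _, row := range rows {
		for x, rec := range row {
			cols[x] = append(cols[x], rec)
		}
	}
	return cols
}

func main() {
	// The same table as in the "Dataset Mode" example, without the header.
	rows := [][]*record{
		{{v: "a"}, {v: "b"}, {v: "c"}},
		{{v: int64(1)}, {v: int64(2)}, {v: int64(3)}},
	}
	cols := toColumns(rows)

	// A change made through the row view is visible through the column
	// view, because both views point at the same record.
	rows[1][0].v = int64(10)
	fmt.Println(cols[0][1].v)             // 10
	fmt.Println(rows[1][0] == cols[0][1]) // true
}
```

Only the outer slices are allocated for the second view, which matches the README's remark that matrix mode spends extra memory on slices but not on records.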

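The split and select features listed in the README can be sketched in plain Go as follows. This is an illustration under stated assumptions, not the library's implementation: the `row` type and the functions `splitByNumeric`, `splitByCategorical`, and `selectWhere` are hypothetical, and the README's `A`/`B` and `X`/`Y` sequences are read here as two attributes, so that each position forms one sample.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one sample. It is NOT tabula's Row type.
type row []interface{}

// splitByNumeric sends rows whose float64 value at column idx is less than
// splitVal to left, and every other row (including non-float64 values) to right.
func splitByNumeric(rows []row, idx int, splitVal float64) (left, right []row) {
	for _, r := range rows {
		if v, ok := r[idx].(float64); ok && v < splitVal {
			left = append(left, r)
		} else {
			right = append(right, r)
		}
	}
	return left, right
}

// splitByCategorical sends rows whose string value at column idx is a member
// of set to left, and every other row to right.
func splitByCategorical(rows []row, idx int, set []string) (left, right []row) {
	in := make(map[string]bool, len(set))
	for _, s := range set {
		in[s] = true
	}
	for _, r := range rows {
		if v, ok := r[idx].(string); ok && in[v] {
			left = append(left, r)
		} else {
			right = append(right, r)
		}
	}
	return left, right
}

// selectWhere returns the rows whose value at column idx equals val.
// It assumes comparable values such as int64, float64, or string.
func selectWhere(rows []row, idx int, val interface{}) (out []row) {
	for _, r := range rows {
		if r[idx] == val {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Columns are (A, B); split on B at 7.
	nums := []row{{1.0, 5.0}, {2.0, 6.0}, {3.0, 7.0}, {4.0, 8.0}}
	left, right := splitByNumeric(nums, 1, 7)
	fmt.Println(left, right) // [[1 5] [2 6]] [[3 7] [4 8]]

	// Columns are (X, Y); split on X with value set [A, C].
	cats := []row{
		{"A", int64(1)}, {"B", int64(2)}, {"A", int64(3)}, {"B", int64(4)},
		{"C", int64(5)}, {"D", int64(6)}, {"C", int64(7)}, {"D", int64(8)},
	}
	left, right = splitByCategorical(cats, 0, []string{"A", "C"})
	fmt.Println(left)  // [[A 1] [A 3] [C 5] [C 7]]
	fmt.Println(right) // [[B 2] [B 4] [D 6] [D 8]]

	ds := []row{{int64(1), "A"}, {int64(2), "B"}, {int64(3), "A"}, {int64(4), "C"}}
	fmt.Println(selectWhere(ds, 1, "A")) // [[1 A] [3 A]]
}
```

The results match the left and right sets given in the README examples. The sketch appends whole rows, so the returned slices share the underlying row data with the input rather than copying it.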