aboutsummaryrefslogtreecommitdiff
path: root/src/pkg/index/suffixarray/suffixarray.go
AgeCommit message (Collapse)Author
2014-09-08build: move package sources from src/pkg to srcRuss Cox
Preparation was in CL 134570043. This CL contains only the effect of 'hg mv src/pkg/* src'. For more about the move, see golang.org/s/go14nopkg.
2011-11-01src/pkg/[a-m]*: gofix -r error -force=errorRuss Cox
R=golang-dev, iant CC=golang-dev https://golang.org/cl/5322051
2011-09-30index/suffixarray: 4.5x faster index serialization (to memory)Robert Griesemer
Benchmark results (best of 3 runs): old: suffixarray.BenchmarkSaveRestore 1 1931909000 ns/op 28.21 MB/s new: suffixarray.BenchmarkSaveRestore 5 429721800 ns/op 117.14 MB/s R=golang-dev, r CC=golang-dev https://golang.org/cl/5161043
2011-09-27index/suffixarray: revert change from int -> int32Robert Griesemer
CL 5040041 (https://golang.org/cl/5040041) changed the use of []int to []int32 internally so that encoding/binary could be used. This is no longer needed (gobs can encode ints), and using []int is more in sync w/ the semantics of the data structure (the index elements are indices which are ints). Changing it back. R=r CC=golang-dev https://golang.org/cl/5141049
2011-09-26regexp: move to old/regexp, replace with exp/regexpRuss Cox
R=golang-dev, r CC=golang-dev https://golang.org/cl/5127042
2011-09-20suffixarray: improved serialization codeRobert Griesemer
Use gobs to serialize indexes instead of encoding/binary. Even with gobs, serialize data in slices instead of applying gob to the entire data structure at once, to reduce the amount of extra buffer memory needed inside gob. 7x faster Write/Read for new BenchmarkSaveRestore compared to old code; possibly because encoding/binary is more expensive for int32 slice elements (interface call to get little/big endian encoding), while gob's encoding is fixed (unconfirmed). new (using gobs): suffixarray.BenchmarkSaveRestore 1 2153604000 ns/op old (using encoding/binary): suffixarray.BenchmarkSaveRestore 1 15118322000 ns/op The actual serialized data is slightly larger then using the old code for very large indices because full 32bit indices require 5bytes using gobs instead of 4bytes (encoding/binary) in serialized form. R=r CC=golang-dev https://golang.org/cl/5087041
2011-09-15index/suffixarray: support for serializationRobert Griesemer
R=r CC=golang-dev https://golang.org/cl/5040041
2011-09-12godoc, suffixarray: switch to exp/regexpRobert Griesemer
R=rsc CC=golang-dev https://golang.org/cl/4983058
2011-07-14go/printer: changed max. number of newlines from 3 to 2Robert Griesemer
manual changes in src/pkg/go/printer, src/cmd/gofix/signal_test.go (cd src/cmd/gofix/testdata; gofmt -w *.in *.out) (cd src/pkg/go/printer; gotest -update) gofmt -w misc src runs all tests R=golang-dev, rsc CC=golang-dev https://golang.org/cl/4715041
2011-07-08sort: rename helpers: s/Sort// in sort.Sort[Float64s|Ints|Strings]Andrew Gerrand
Includes 'sorthelpers' gofix and updates to tree. R=golang-dev, gri CC=golang-dev https://golang.org/cl/4631098
2011-04-20src/pkg: make package doc comments consistently start with "Package foo".Nigel Tao
R=rsc CC=golang-dev https://golang.org/cl/4442064
2011-01-24suffixarray: use binary search for both ends of LookupEric Eisner
This prevents many unnecessary comparisons when n is large. R=gri, gri1, rsc CC=golang-dev https://golang.org/cl/4068043
2011-01-11suffixarray: faster creation algorithmEric Eisner
This implements the algorithm qsufsort using the sort package as a sorting primitive. Its worst-case performance is O(N*log(N)), and it uses only an additional slice of N ints of memory during creation. Benchmarks (seconds): old new 10k nulls 149 0.044 1M English corpus 32.0 3.6 R=gri, gri1 CC=golang-dev https://golang.org/cl/3752044
2010-12-17suffixarray: implememted FindAllIndex regexp searchRobert Griesemer
Implementation uses fast suffixarray lookup to find initial matches if the regular expression starts with a suitable prefix without meta characters. R=r, rsc CC=golang-dev https://golang.org/cl/3720042
2010-12-14suffixarray: rename Data() -> Bytes()Robert Griesemer
R=rsc CC=golang-dev https://golang.org/cl/3540042
2010-12-13suffixarray: provide accessor to dataRobert Griesemer
R=r CC=golang-dev https://golang.org/cl/3574044
2010-12-08throughout: simplify two-variable ranges with unused second variableRyan Hitchman
R=golang-dev, gri CC=golang-dev https://golang.org/cl/3529041
2010-11-19index/suffixarray: use sort.SearchRuss Cox
R=gri CC=golang-dev https://golang.org/cl/3200041
2010-09-22suffixarray: cleanup per suggestion from Roger PeppeRobert Griesemer
R=rsc CC=golang-dev https://golang.org/cl/2213045
2010-09-21suffixarray: a package for creating suffixarray-based indexesRobert Griesemer
This is a replacement for pending CL 2219042. It only contains the raw suffixarray functionality with two methods: - New create a new index from some data - Lookup lookup occurences of a bytes slice in the data Any other functionality (dealing with multiple data sets and the corresponding position lists) is generic and doesn't have to be part of this package. Known performance bug: This implementation works fine for data sets up to several megabytes as long as it doesn't contain very long contiguous sequences of equal bytes. For instance, index creation for all .go files under GOROOT (250KLOCs, approx. 9MB) takes ~50s on 2.66 GHz Intel Xeon as long as test/fixedbugs/257.go is excluded. With that file, index creation times takes several days. 257.go contains a string of 1M smiley faces. There are more sophisticated suffixarray creation algorithms which can handle very long common prefixes. The implementation can be updated w/o the need to change the interface. R=rsc, r, PeterGo CC=golang-dev https://golang.org/cl/2265041