aboutsummaryrefslogtreecommitdiff
path: root/reftable
AgeCommit message (Collapse)Author
2024-08-01t: move reftable/pq_test.c to the unit testing frameworkChandra Pratap
reftable/pq_test.c exercises a priority queue defined by reftable/pq.{c, h}. Migrate reftable/pq_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework, and renaming the tests to align with unit-tests' standards. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01reftable: change the type of array indices to 'size_t' in reftable/pq.cChandra Pratap
The variables 'i', 'j', 'k' and 'min' are used as indices for 'pq->heap', which is an array. Additionally, 'pq->len' is of type 'size_t' and is often used to assign values to these variables. Hence, change the type of these variables from 'int' to 'size_t'. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-08-01reftable: remove unnecessary curly braces in reftable/pq.cChandra Pratap
According to Documentation/CodingGuidelines, control-flow statements with a single line as their body must omit curly braces. Make reftable/pq.c conform to this guideline. Besides that, remove unnecessary newlines and variable assignment. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-31Merge branch 'cp/unit-test-reftable-merged'Junio C Hamano
Another reftable test has been ported to use the unit test framework. * cp/unit-test-reftable-merged: t-reftable-merged: add test for REFTABLE_FORMAT_ERROR t-reftable-merged: use reftable_ref_record_equal to compare ref records t-reftable-merged: add tests for reftable_merged_table_max_update_index t-reftable-merged: improve the const-correctness of helper functions t-reftable-merged: improve the test t_merged_single_record() t: harmonize t-reftable-merged.c with coding guidelines t: move reftable/merged_test.c to the unit testing framework
2024-07-15Merge branch 'cp/unit-test-reftable-record'Junio C Hamano
A test in reftable library has been rewritten using the unit test framework. * cp/unit-test-reftable-record: t-reftable-record: add tests for reftable_log_record_compare_key() t-reftable-record: add tests for reftable_ref_record_compare_name() t-reftable-record: add index tests for reftable_record_is_deletion() t-reftable-record: add obj tests for reftable_record_is_deletion() t-reftable-record: add log tests for reftable_record_is_deletion() t-reftable-record: add ref tests for reftable_record_is_deletion() t-reftable-record: add comparison tests for obj records t-reftable-record: add comparison tests for index records t-reftable-record: add comparison tests for ref records t-reftable-record: add reftable_record_cmp() tests for log records t: move reftable/record_test.c to the unit testing framework
2024-07-12t: move reftable/merged_test.c to the unit testing frameworkChandra Pratap
reftable/merged_test.c exercises the functions defined in reftable/merged.{c, h}. Migrate reftable/merged_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework and renaming the tests according to unit-tests' naming conventions. Also, move strbuf_add_void() and noop_flush() from reftable/test_framework.c to the ported test. This is because both these functions are used in the merged tests and reftable/test_framework.{c, h} is not #included in the ported test. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-07-02Merge branch 'ps/use-the-repository'Junio C Hamano
A CPP macro USE_THE_REPOSITORY_VARIABLE is introduced to help transition the codebase to rely less on the availability of the singleton the_repository instance. * ps/use-the-repository: hex: guard declarations with `USE_THE_REPOSITORY_VARIABLE` t/helper: remove dependency on `the_repository` in "proc-receive" t/helper: fix segfault in "oid-array" command without repository t/helper: use correct object hash in partial-clone helper compat/fsmonitor: fix socket path in networked SHA256 repos replace-object: use hash algorithm from passed-in repository protocol-caps: use hash algorithm from passed-in repository oidset: pass hash algorithm when parsing file http-fetch: don't crash when parsing packfile without a repo hash-ll: merge with "hash.h" refs: avoid include cycle with "repository.h" global: introduce `USE_THE_REPOSITORY_VARIABLE` macro hash: require hash algorithm in `empty_tree_oid_hex()` hash: require hash algorithm in `is_empty_{blob,tree}_oid()` hash: make `is_null_oid()` independent of `the_repository` hash: convert `oidcmp()` and `oideq()` to compare whole hash global: ensure that object IDs are always padded hash: require hash algorithm in `oidread()` and `oidclr()` hash: require hash algorithm in `hasheq()`, `hashcmp()` and `hashclr()` hash: drop (mostly) unused `is_empty_{blob,tree}_sha1()` functions
2024-07-02t: move reftable/record_test.c to the unit testing frameworkChandra Pratap
reftable/record_test.c exercises the functions defined in reftable/record.{c, h}. Migrate reftable/record_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework, and renaming the tests to fit unit-tests' naming scheme. While at it, change the type of index variable 'i' to 'size_t' from 'int'. This is because 'i' is used in comparison against 'ARRAY_SIZE(x)' which is of type 'size_t'. Also, use set_hash() which is defined locally in the test file instead of set_test_hash() which is defined by reftable/test_framework.{c, h}. This is fine to do as both these functions are similarly implemented, and reftable/test_framework.{c, h} is not #included in the ported test. Get rid of reftable_record_print() from the tests as well, because it clutters the test framework's output and we have no way of verifying the output. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Acked-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-17Merge branch 'ps/no-writable-strings'Junio C Hamano
Building with "-Werror -Wwrite-strings" is now supported. * ps/no-writable-strings: (27 commits) config.mak.dev: enable `-Wwrite-strings` warning builtin/merge: always store allocated strings in `pull_twohead` builtin/rebase: always store allocated string in `options.strategy` builtin/rebase: do not assign default backend to non-constant field imap-send: fix leaking memory in `imap_server_conf` imap-send: drop global `imap_server_conf` variable mailmap: always store allocated strings in mailmap blob revision: always store allocated strings in output encoding remote-curl: avoid assigning string constant to non-const variable send-pack: always allocate receive status parse-options: cast long name for OPTION_ALIAS http: do not assign string constant to non-const field compat/win32: fix const-correctness with string constants pretty: add casts for decoration option pointers object-file: make `buf` parameter of `index_mem()` a constant object-file: mark cached object buffers as const ident: add casts for fallback name and GECOS entry: refactor how we remove items for delayed checkouts line-log: always allocate the output prefix line-log: stop assigning string constant to file parent buffer ...
2024-06-17Merge branch 'ps/ref-storage-migration'Junio C Hamano
A new command has been added to migrate a repository that uses the files backend for its ref storage to use the reftable backend, with limitations. * ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version
2024-06-14hash-ll: merge with "hash.h"Patrick Steinhardt
The "hash-ll.h" header was introduced via d1cbe1e6d8 (hash-ll.h: split out of hash.h to remove dependency on repository.h, 2023-04-22) to make explicit the split between hash-related functions that rely on the global `the_repository`, and those that don't. This split is no longer necessary now that we we have removed the reliance on `the_repository`. Merge "hash-ll.h" back into "hash.h". This causes some code units to not include "repository.h" anymore, which requires us to add some forward declarations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-13Merge branch 'ps/ref-storage-migration' into ps/use-the-repositoryJunio C Hamano
* ps/ref-storage-migration: builtin/refs: new command to migrate ref storage formats refs: implement logic to migrate between ref storage formats refs: implement removal of ref storages worktree: don't store main worktree twice reftable: inline `merged_table_release()` refs/files: fix NULL pointer deref when releasing ref store refs/files: extract function to iterate through root refs refs/files: refactor `add_pseudoref_and_head_entries()` refs: allow to skip creation of reflog entries refs: pass storage format to `ref_store_init()` explicitly refs: convert ref storage format to an enum setup: unset ref storage when reinitializing repository version
2024-06-12Merge branch 'cp/reftable-unit-test'Junio C Hamano
Basic unit tests for reftable have been reimplemented under the unit test framework. * cp/reftable-unit-test: t: improve the test-case for parse_names() t: add test for put_be16() t: move tests from reftable/record_test.c to the new unit test t: move tests from reftable/stack_test.c to the new unit test t: move reftable/basics_test.c to the unit testing framework
2024-06-07reftable: cast away constness when assigning constants to recordsPatrick Steinhardt
The reftable records are used in multiple ways throughout the reftable library. In many of those cases they merely act as input to a function without getting modified by it at all. Most importantly, this happens when writing records and when querying for records. We rely on this in our tests and thus assign string constants to those fields, which is about to generate warnings as those fields are of type `char *`. While we could go through the process and instead allocate those strings in all of our tests, this feels quite unnecessary. Instead, add casts to `char *` for all of those strings. As this is part of our tests, this also nicely serves as a demonstration that nothing writes or frees those string constants, which would otherwise lead to segfaults. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-07global: improve const correctness when assigning string constantsPatrick Steinhardt
We're about to enable `-Wwrite-strings`, which changes the type of string constants to `const char[]`. Fix various sites where we assign such constants to non-const variables. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-06-06reftable: inline `merged_table_release()`Patrick Steinhardt
The function `merged_table_release()` releases a merged table, whereas `reftable_merged_table_free()` releases a merged table and then also free's its pointer. But all callsites of `merged_table_release()` are in fact followed by `reftable_merged_table_free()`, which is redundant. Inline `merged_table_release()` into `reftable_merged_table_free()` to get rid of this redundance. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30Merge branch 'ps/reftable-reusable-iterator'Junio C Hamano
Code clean-up to make the reftable iterator closer to be reusable. * ps/reftable-reusable-iterator: reftable/merged: adapt interface to allow reuse of iterators reftable/stack: provide convenience functions to create iterators reftable/reader: adapt interface to allow reuse of iterators reftable/generic: adapt interface to allow reuse of iterators reftable/generic: move seeking of records into the iterator reftable/merged: simplify indices for subiterators reftable/merged: split up initialization and seeking of records reftable/reader: set up the reader when initializing table iterator reftable/reader: inline `reader_seek_internal()` reftable/reader: separate concerns of table iter and reftable reader reftable/reader: unify indexed and linear seeking reftable/reader: avoid copying index iterator reftable/block: use `size_t` to track restart point index
2024-05-30t: move tests from reftable/record_test.c to the new unit testChandra Pratap
common_prefix_size(), get_be24() and put_be24() are functions defined in reftable/basics.{c, h}. Move the tests for these functions from reftable/record_test.c to the newly ported test. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30t: move tests from reftable/stack_test.c to the new unit testChandra Pratap
parse_names() and names_equal() are functions defined in reftable/basics.{c, h}. Move the tests for these functions from reftable/stack_test.c to the newly ported test. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-30t: move reftable/basics_test.c to the unit testing frameworkChandra Pratap
reftable/basics_test.c exercise the functions defined in reftable/basics.{c, h}. Migrate reftable/basics_test.c to the unit testing framework. Migration involves refactoring the tests to use the unit testing framework instead of reftable's test framework. Mentored-by: Patrick Steinhardt <ps@pks.im> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Chandra Pratap <chandrapratap3519@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/merged: adapt interface to allow reuse of iteratorsPatrick Steinhardt
Refactor the interfaces exposed by `struct reftable_merged_table` and `struct merged_iter` such that they support iterator reuse. This is done by separating initialization of the iterator and seeking on it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/stack: provide convenience functions to create iteratorsPatrick Steinhardt
There exist a bunch of call sites in the reftable backend that want to create iterators for a reftable stack. This is rather convoluted right now, where you always have to go via the merged table. And it is about to become even more convoluted when we split up iterator initialization and seeking in the next commit. Introduce convenience functions that allow the caller to create an iterator from a reftable stack directly without going through the merged table. Adapt callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/reader: adapt interface to allow reuse of iteratorsPatrick Steinhardt
Refactor the interfaces exposed by `struct reftable_reader` and `struct table_iterator` such that they support iterator reuse. This is done by separating initialization of the iterator and seeking on it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/generic: adapt interface to allow reuse of iteratorsPatrick Steinhardt
Refactor the interfaces exposed by `struct reftable_table` and `struct reftable_iterator` such that they support iterator reuse. This is done by separating initialization of the iterator and seeking on it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/generic: move seeking of records into the iteratorPatrick Steinhardt
Reftable iterators are created by seeking on the parent structure of a corresponding record. For example, to create an iterator for the merged table you would call `reftable_merged_table_seek_ref()`. Most notably, it is not posible to create an iterator and then seek it afterwards. While this may be a bit easier to reason about, it comes with two significant downsides. The first downside is that the logic to find records is split up between the parent data structure and the iterator itself. Conceptually, it is more straight forward if all that logic was contained in a single place, which should be the iterator. The second and more significant downside is that it is impossible to reuse iterators for multiple seeks. Whenever you want to look up a record, you need to re-create the whole infrastructure again, which is quite a waste of time. Furthermore, it is impossible to optimize seeks, such as when seeking the same record multiple times. To address this, we essentially split up the concerns properly such that the parent data structure is responsible for setting up the iterator via a new `init_iter()` callback, whereas the iterator handles seeks via a new `seek()` callback. This will eventually allow us to call `seek()` on the iterator multiple times, where every iterator can potentially optimize for certain cases. Note that at this point in time we are not yet ready to reuse the iterators. This will be left for a future patch series. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/merged: simplify indices for subiteratorsPatrick Steinhardt
When seeking on a merged table, we perform the seek for each of the subiterators. If the subiterator has the desired record we add it to the priority queue, otherwise we skip it and don't add it to the stack of subiterators hosted by the merged table. The consequence of this is that the index of the subiterator in the merged table does not necessarily correspond to the index of it in the merged iterator. Next to being potentially confusing, it also means that we won't easily be able to re-seek the merged iterator because we have no clear connection between both of the data structures. Refactor the code so that the index stays the same in both structures. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/merged: split up initialization and seeking of recordsPatrick Steinhardt
To initialize a `struct merged_iter`, we need to seek all subiterators to the wanted record and then add their results to the priority queue used to sort the records. This logic is split up across two functions, `merged_table_seek_record()` and `merged_iter_init()`. The scope of these functions is somewhat weird though, where `merged_iter_init()` is only responsible for adding the records of the subiterators to the priority queue. Clarify the scope of those functions such that `merged_iter_init()` is only responsible for initializing the iterator's structure. Performing the subiterator seeks are now part of `merged_table_seek_record()`. This step is required to move seeking of records into the generic `struct reftable_iterator` infrastructure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/reader: set up the reader when initializing table iteratorPatrick Steinhardt
All the seeking functions accept a `struct reftable_reader` as input such that they can use the reader to look up the respective blocks. Refactor the code to instead set up the reader as a member of `struct table_iter` during initialization such that we don't have to pass the reader on every single call. This step is required to move seeking of records into the generic `struct reftable_iterator` infrastructure. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/reader: inline `reader_seek_internal()`Patrick Steinhardt
We have both `reader_seek()` and `reader_seek_internal()`, where the former function only exists so that we can exit early in case the given table has no records of the sought-after type. Merge these two functions into one. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/reader: separate concerns of table iter and reftable readerPatrick Steinhardt
In "reftable/reader.c" we implement two different interfaces: - The reftable reader contains the logic to read reftables. - The table iterator is used to iterate through a single reftable read by the reader. The way those two types are used in the code is somewhat confusing though because seeking inside a table is implemented as if it was part of the reftable reader, even though it is ultimately more of a detail implemented by the table iterator. Make the boundary between those two types clearer by renaming functions that seek records in a table such that they clearly belong to the table iterator's logic. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/reader: unify indexed and linear seekingPatrick Steinhardt
In `reader_seek_internal()` we either end up doing an indexed seek when there is one or a linear seek otherwise. These two code paths are disjunct without a good reason, where the indexed seek will cause us to exit early. Refactor the two code paths such that it becomes possible to share a bit more code between them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/reader: avoid copying index iteratorPatrick Steinhardt
When doing an indexed seek we need to walk down the multi-level index until we finally hit a record of the desired indexed type. This loop performs a copy of the index iterator on every iteration, which is both hard to understand and completely unnecessary. Refactor the code so that we use a single iterator to walk down the indices, only. Note that while this should improve performance, the improvement is negligible in all but the most unreasonable repositories. This is because the effect is only really noticeable when we have to walk down many levels of indices, which is not something that a repository would typically have. So the motivation for this change is really only about readability. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/block: use `size_t` to track restart point indexPatrick Steinhardt
The function `block_reader_restart_offset()` gets the offset of the `i`th restart point. `i` is a signed integer though, which is certainly not the correct type to track indices like this. Furthermore, both callers end up passing a `size_t`. Refactor the code to use a `size_t` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable: make the compaction factor configurablePatrick Steinhardt
When auto-compacting, the reftable library packs references such that the sizes of the tables form a geometric sequence. The factor for this geometric sequence is hardcoded to 2 right now. We're about to expose this as a config option though, so let's expose the factor via write options. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable: use `uint16_t` to track restart intervalPatrick Steinhardt
The restart interval can at most be `UINT16_MAX` as specified in the technical documentation of the reftable format. Furthermore, it cannot ever be negative. Regardless of that we use an `int` to track the restart interval. Change the type to use an `uint16_t` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/dump: support dumping a table's block structurePatrick Steinhardt
We're about to introduce new configs that will allow users to have more control over how exactly reftables are written. To verify that these configs are effective we will need to take a peak into the actual blocks written by the reftable backend. Introduce a new mode to the dumping logic that prints out the block structure. This logic can be invoked via `test-tool dump-reftables -b`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/writer: improve error when passed an invalid block sizePatrick Steinhardt
The reftable format only supports block sizes up to 16MB. When the writer is being passed a value bigger than that it simply calls abort(3P), which isn't all that helpful due to the lack of a proper error message. Improve this by calling `BUG()` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable/writer: drop static variable used to initialize strbufPatrick Steinhardt
We have a static variable in the reftable writer code that is merely used to initialize the `last_key` of the writer. Convert the code to instead use `strbuf_init()` and drop the variable. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable: pass opts as constant pointerPatrick Steinhardt
We sometimes pass the refatble write options as value and sometimes as a pointer. This is quite confusing and makes the reader wonder whether the options get modified sometimes. In fact, `reftable_new_writer()` does cause the caller-provided options to get updated when some values aren't set up. This is quite unexpected, but didn't cause any harm until now. Adapt the code so that we do not modify the caller-provided values anymore. While at it, refactor the code to code to consistently pass the options as a constant pointer to clarify that the caller-provided opts will not ever get modified. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-13reftable: consistently refer to `reftable_write_options` as `opts`Patrick Steinhardt
Throughout the reftable library the `reftable_write_options` are sometimes referred to as `cfg` and sometimes as `opts`. Unify these to consistently use `opts` to avoid confusion. While at it, touch up the coding style a bit by removing unneeded braces around one-line statements and newlines between variable declarations. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-05-08Merge branch 'ps/reftable-write-optim'Junio C Hamano
Code to write out reftable has seen some optimization and simplification. * ps/reftable-write-optim: reftable/block: reuse compressed array reftable/block: reuse zstream when writing log blocks reftable/writer: reset `last_key` instead of releasing it reftable/writer: unify releasing memory reftable/writer: refactorings for `writer_flush_nonempty_block()` reftable/writer: refactorings for `writer_add_record()` refs/reftable: don't recompute committer ident reftable: remove name checks refs/reftable: skip duplicate name checks refs/reftable: perform explicit D/F check when writing symrefs refs/reftable: fix D/F conflict error message on ref copy
2024-04-23Merge branch 'ps/reftable-block-iteration-optim'Junio C Hamano
The code to iterate over reftable blocks has seen some optimization to reduce memory allocation and deallocation. * ps/reftable-block-iteration-optim: reftable/block: avoid copying block iterators on seek reftable/block: reuse `zstream` state on inflation reftable/block: open-code call to `uncompress2()` reftable/block: reuse uncompressed blocks reftable/reader: iterate to next block in place reftable/block: move ownership of block reader into `struct table_iter` reftable/block: introduce `block_reader_release()` reftable/block: better grouping of functions reftable/block: merge `block_iter_seek()` and `block_reader_seek()` reftable/block: rename `block_reader_start()`
2024-04-16Merge branch 'jt/reftable-geometric-compaction'Junio C Hamano
The strategy to compact multiple tables of reftables after many operations accumulate many entries has been improved to avoid accumulating too many tables uncollected. * jt/reftable-geometric-compaction: reftable/stack: use geometric table compaction reftable/stack: add env to disable autocompaction reftable/stack: expose option to disable auto-compaction
2024-04-15reftable/block: avoid copying block iterators on seekPatrick Steinhardt
When seeking a reftable record in a block we need to position the iterator _before_ the sought-after record so that the next call to `block_iter_next()` would yield that record. To achieve this, the loop that performs the linear seeks to restore the previous position once it has found the record. This is done by advancing two `block_iter`s: one to check whether the next record is our sought-after record, and one that we update after every iteration. This of course involves quite a lot of copying and also leads to needless memory allocations. Refactor the code to get rid of the `next` iterator and the copying this involves. Instead, we can restore the previous offset such that the call to `next` will return the correct record. Next to being simpler conceptually this also leads to a nice speedup. The following benchmark parser 10k refs out of 100k existing refs via `git-rev-list --no-walk`: Benchmark 1: rev-list: print many refs (HEAD~) Time (mean ± σ): 170.2 ms ± 1.7 ms [User: 86.1 ms, System: 83.6 ms] Range (min … max): 166.4 ms … 180.3 ms 500 runs Benchmark 2: rev-list: print many refs (HEAD~) Time (mean ± σ): 161.6 ms ± 1.6 ms [User: 78.1 ms, System: 83.0 ms] Range (min … max): 158.4 ms … 172.3 ms 500 runs Summary rev-list: print many refs (HEAD) ran 1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~) Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-15reftable/block: reuse `zstream` state on inflationPatrick Steinhardt
When calling `inflateInit()` and `inflate()`, the zlib library will allocate several data structures for the underlying `zstream` to keep track of various information. Thus, when inflating repeatedly, it is possible to optimize memory allocation patterns by reusing the `zstream` and then calling `inflateReset()` on it to prepare it for the next chunk of data to inflate. This is exactly what the reftable code is doing: when iterating through reflogs we need to potentially inflate many log blocks, but we discard the `zstream` every single time. Instead, as we reuse the `block_reader` for each of the blocks anyway, we can initialize the `zstream` once and then reuse it for subsequent inflations. Refactor the code to do so, which leads to a significant reduction in the number of allocations. The following measurements were done when iterating through 1 million reflog entries. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-15reftable/block: open-code call to `uncompress2()`Patrick Steinhardt
The reftable format stores log blocks in a compressed format. Thus, whenever we want to read such a block we first need to decompress it. This is done by calling the convenience function `uncompress2()` of the zlib library, which is a simple wrapper that manages the lifecycle of the `zstream` structure for us. While nice for one-off inflation of data, when iterating through reflogs we will likely end up inflating many such log blocks. This requires us to reallocate the state of the `zstream` every single time, which adds up over time. It would thus be great to reuse the `zstream` instead of discarding it after every inflation. Open-code the call to `uncompress2()` such that we can start reusing the `zstream` in the subsequent commit. Note that our open-coded variant is different from `uncompress2()` in two ways: - We do not loop around `inflate()` until we have processed all input. As our input is limited by the maximum block size, which is 16MB, we should not hit limits of `inflate()`. - We use `Z_FINISH` instead of `Z_NO_FLUSH`. Quoting the `inflate()` documentation: "inflate() should normally be called until it returns Z_STREAM_END or an error. However if all decompression is to be performed in a single step (a single call of inflate), the parameter flush should be set to Z_FINISH." Furthermore, "Z_FINISH also informs inflate to not maintain a sliding window if the stream completes, which reduces inflate's memory footprint." Other than that this commit is expected to be functionally equivalent and does not yet reuse the `zstream`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-15reftable/block: reuse uncompressed blocksPatrick Steinhardt
The reftable backend stores reflog entries in a compressed format and thus needs to uncompress blocks before one can read records from it. For each reflog block we thus have to allocate an array that we can decompress the block contents into. This block is being discarded whenever the table iterator moves to the next block. Consequently, we reallocate a new array on every block, which is quite wasteful. Refactor the code to reuse the uncompressed block data when moving the block reader to a new block. This significantly reduces the number of allocations when iterating through many compressed blocks. The following measurements are done with `git reflog list` when listing 100k reflogs. Before: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated After: HEAP SUMMARY: in use at exit: 13,473 bytes in 122 blocks total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-15reftable/reader: iterate to next block in placePatrick Steinhardt
The table iterator has to iterate towards the next block once it has yielded all records of the current block. This is done by creating a new table iterator, initializing it to the next block, releasing the old iterator and then copying over the data. Refactor the code to instead advance the table iterator in place. This is simpler and unlocks some optimizations in subsequent patches. Also, it allows us to avoid some allocations. The following measurements show a single matching ref out of 1 million refs. Before this change: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated After: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-15reftable/block: move ownership of block reader into `struct table_iter`Patrick Steinhardt
The table iterator allows the caller to iterate through all records in a reftable table. To do so it iterates through all blocks of the desired type one by one, where for each block it creates a new block iterator and yields all its entries. One of the things that is somewhat confusing in this context is who owns the block reader that is being used to read the blocks and pass them to the block iterator. Intuitively, as the table iterator is responsible for iterating through the blocks, one would assume that this iterator is also responsible for managing the lifecycle of the reader. And while it somewhat is, the block reader is ultimately stored inside of the block iterator. Refactor the code such that the block reader is instead fully managed by the table iterator. Instead of passing the reader to the block iterator, we now only end up passing the block data to it. Despite clearing up the lifecycle of the reader, it will also allow for better reuse of the reader in subsequent patches. The following benchmark prints a single matching ref out of 1 million refs. Before: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated After: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated Note that while there are more allocation and free calls now, the overall number of bytes allocated is significantly lower. The number of allocations will be reduced significantly by the next patch though. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-04-15reftable/block: introduce `block_reader_release()`Patrick Steinhardt
Introduce a new function `block_reader_release()` that releases resources acquired by the block reader. This function will be extended in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>