path: root/reftable
2024-03-04  reftable/merged: make subiters own their records  (Patrick Steinhardt)

For each subiterator, the merged table needs to track its current record. This record is owned by the priority queue though instead of by the merged iterator. This is not optimal performance-wise.

For one, we need to move around records whenever we add or remove a record from the priority queue. Thus, the bigger the entries the more bytes we need to copy around. And compared to pointers, a reftable record is rather on the bigger side. The other issue is that this makes it harder to reuse the records.

Refactor the code so that the merged iterator tracks ownership of the records per-subiter. Instead of having records in the priority queue, we can now use mere pointers to the per-subiter records. This also allows us to swap records between the caller and the per-subiter record instead of doing an actual copy via `reftable_record_copy_from()`, which removes the need to release the caller-provided record.

This results in a noticeable speedup when iterating through many refs. The following benchmark iterates through 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):   145.5 ms ± 4.5 ms    [User: 142.5 ms, System: 2.8 ms]
    Range (min … max): 141.3 ms … 177.0 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):   139.0 ms ± 4.7 ms    [User: 136.1 ms, System: 2.8 ms]
    Range (min … max): 134.2 ms … 182.2 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

This refactoring also allows a subsequent refactoring where we start reusing memory allocated by the reftable records because we do not need to release the caller-provided record anymore.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
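The ownership-and-swap idea can be shown with a toy sketch. This is not the reftable code itself; the `rec` and `subiter` types and all names are invented for illustration. The sub-iterator owns its record's buffers, and handing a record to the caller swaps contents instead of copying bytes:

```
#include <stdlib.h>
#include <string.h>

/* Illustrative record: a heap-allocated refname plus a value. */
struct rec {
	char *refname;
	long value;
};

struct subiter {
	struct rec rec; /* owned by the sub-iterator, reused across iterations */
};

/* Hand the next record to the caller by swapping contents instead of
 * copying: the caller gets our buffers, we inherit theirs for reuse. */
static void rec_swap(struct rec *a, struct rec *b)
{
	struct rec tmp = *a;
	*a = *b;
	*b = tmp;
}

int main(void)
{
	struct subiter si = { { strdup("refs/heads/main"), 42 } };
	struct rec caller = { NULL, 0 };

	rec_swap(&caller, &si.rec); /* no copy of the refname bytes */

	free(caller.refname);
	free(si.rec.refname);
	return 0;
}
```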
2024-03-04  reftable/merged: advance subiter on subsequent iteration  (Patrick Steinhardt)

When advancing the merged iterator, we pop the topmost entry from its priority queue and then advance the sub-iterator that the entry belongs to, adding the result as a new entry. This is quite sensible in the case where the merged iterator is used to actually iterate through records. But the merged iterator is also used when we look up a single record, only, so advancing the sub-iterator is wasted effort because we would never even look at the result.

Instead of immediately advancing the sub-iterator, we can also defer this to the next iteration of the merged iterator by storing the intent-to-advance. This results in a small speedup when reading many records. The following benchmark creates 10000 refs, which will also end up with many ref lookups:

  Benchmark 1: update-ref: create many refs (revision = HEAD~)
    Time (mean ± σ):   337.2 ms ± 7.3 ms    [User: 200.1 ms, System: 136.9 ms]
    Range (min … max): 329.3 ms … 373.2 ms    100 runs

  Benchmark 2: update-ref: create many refs (revision = HEAD)
    Time (mean ± σ):   332.5 ms ± 5.9 ms    [User: 197.2 ms, System: 135.1 ms]
    Range (min … max): 327.6 ms … 359.8 ms    100 runs

  Summary
    update-ref: create many refs (revision = HEAD) ran
      1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)

While this speedup alone isn't really worth it, this refactoring will also allow two additional optimizations in subsequent patches. First, it will allow us to special-case when there is only a single sub-iter left to circumvent the priority queue altogether. And second, it makes it easier to avoid copying records to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
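A minimal sketch of the intent-to-advance pattern; `merged_iter`, `advance_subiter()` and `merged_iter_next()` here are hypothetical stand-ins for the real queue and sub-iterator machinery:

```
#include <stdio.h>

#define NO_ADVANCE (-1)

struct merged_iter {
	int advance_index; /* sub-iterator to advance lazily, or NO_ADVANCE */
};

/* Hypothetical helper standing in for the real sub-iterator code. */
static int advance_subiter(struct merged_iter *mi, int idx)
{
	printf("advancing sub-iterator %d\n", idx);
	return 0;
}

static int merged_iter_next(struct merged_iter *mi, int *out)
{
	/* Advance the sub-iterator whose record we returned last time.
	 * A single-record lookup never comes back here, so that work
	 * is skipped entirely. */
	if (mi->advance_index != NO_ADVANCE) {
		int err = advance_subiter(mi, mi->advance_index);
		if (err)
			return err;
		mi->advance_index = NO_ADVANCE;
	}

	/* ... pop the queue top and remember where it came from ... */
	*out = 0;
	mi->advance_index = 0; /* index of the popped record's sub-iterator */
	return 0;
}

int main(void)
{
	struct merged_iter mi = { NO_ADVANCE };
	int rec;

	merged_iter_next(&mi, &rec); /* no advance on the first call */
	merged_iter_next(&mi, &rec); /* advances sub-iterator 0 lazily */
	return 0;
}
```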
2024-03-04  reftable/merged: make `merged_iter` structure private  (Patrick Steinhardt)

The `merged_iter` structure is not used anywhere outside of "merged.c", but is declared in its header. Move it into the code file so that it is clear that its implementation details are never exposed to anything.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04  reftable/pq: use `size_t` to track iterator index  (Patrick Steinhardt)

The reftable priority queue is used by the merged iterator to yield records from its sub-iterators in the expected order. Each entry has a record corresponding to such a sub-iterator as well as an index that indicates which sub-iterator the record belongs to. But while the sub-iterators are tracked with a `size_t`, we store the index as an `int` in the entry. Fix this and use `size_t` consistently.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26  Merge branch 'ps/reftable-iteration-perf'  (Junio C Hamano)

The code to iterate over refs with the reftable backend has seen some optimization.

* ps/reftable-iteration-perf:
  reftable/reader: add comments to `table_iter_next()`
  reftable/record: don't try to reallocate ref record name
  reftable/block: swap buffers instead of copying
  reftable/pq: allocation-less comparison of entry keys
  reftable/merged: skip comparison for records of the same subiter
  reftable/merged: allocation-less dropping of shadowed records
  reftable/record: introduce function to compare records by key
2024-02-13  Merge branch 'jc/comment-style-fixes' into maint-2.43  (Junio C Hamano)

Rewrite //-comments to /* comments */ in files whose comments prevalently use the latter.

* jc/comment-style-fixes:
  reftable/pq_test: comment style fix
  merge-ort.c: comment style fix
  builtin/worktree: comment style fixes
2024-02-12  Merge branch 'ps/reftable-styles'  (Junio C Hamano)

Code clean-up in various reftable code paths.

* ps/reftable-styles:
  reftable/record: improve semantics when initializing records
  reftable/merged: refactor initialization of iterators
  reftable/merged: refactor seeking of records
  reftable/stack: use `size_t` to track stack length
  reftable/stack: use `size_t` to track stack slices during compaction
  reftable/stack: index segments with `size_t`
  reftable/stack: fix parameter validation when compacting range
  reftable: introduce macros to allocate arrays
  reftable: introduce macros to grow arrays
2024-02-12  Merge branch 'ps/reftable-multi-level-indices-fix'  (Junio C Hamano)

Writing of multi-level indices for reftable has been corrected.

* ps/reftable-multi-level-indices-fix:
  reftable: document reading and writing indices
  reftable/writer: fix writing multi-level indices
  reftable/writer: simplify writing index records
  reftable/writer: use correct type to iterate through index entries
  reftable/reader: be more careful about errors in indexed seeks
2024-02-12  reftable/reader: add comments to `table_iter_next()`  (Patrick Steinhardt)

While working on the optimizations in the preceding patches I stumbled upon `table_iter_next()` multiple times. It is quite easy to miss the fact that we don't call `table_iter_next_in_block()` twice, but that the second call is in fact `table_iter_next_block()`. Add comments to explain what exactly is going on here to make things more obvious. While at it, touch up the code to conform to our code style better.

Note that one of the refactorings merges two conditional blocks into one. Before, we had the following code:

```
err = table_iter_next_block(&next, ti);
if (err != 0) {
	ti->is_finished = 1;
}
table_iter_block_done(ti);
if (err != 0) {
	return err;
}
```

As `table_iter_block_done()` does not care about `is_finished`, the conditional blocks can be merged into one block:

```
err = table_iter_next_block(&next, ti);
table_iter_block_done(ti);
if (err != 0) {
	ti->is_finished = 1;
	return err;
}
```

This is both easier to reason about and more performant because we have one branch less.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12  reftable/record: don't try to reallocate ref record name  (Patrick Steinhardt)

When decoding reftable ref records we first release the pointer to the record passed to us and then use realloc(3P) to allocate the refname array. This is a bit misleading though as we know at that point that the refname will always be `NULL`, so we would always end up allocating a new char array anyway.

Refactor the code to use `REFTABLE_ALLOC_ARRAY()` instead. As the following benchmark demonstrates this is a tiny bit more efficient. But the bigger selling point really is the gained clarity.

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):   150.1 ms ± 4.1 ms    [User: 146.6 ms, System: 3.3 ms]
    Range (min … max): 144.5 ms … 180.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):   148.9 ms ± 4.5 ms    [User: 145.2 ms, System: 3.4 ms]
    Range (min … max): 143.0 ms … 185.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Ideally, we should try and reuse the memory of the old record instead of first freeing and then immediately reallocating it. This requires some more surgery though and is thus left for a future iteration.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12  reftable/block: swap buffers instead of copying  (Patrick Steinhardt)

When iterating towards the next record in a reftable block we need to keep track of the key that the last record had. This is required because reftable records use prefix compression, where subsequent records may reuse parts of their preceding record's key.

This key is stored in `block_iter::last_key`, which we update after every call to `block_iter_next()`: we simply reset the buffer and then add the current key to it. This is a bit inefficient though because it requires us to copy over the key on every iteration, which adds up when iterating over many records. Instead, we can make use of the fact that the `block_iter::key` buffer is basically only a scratch buffer. So instead of copying over contents, we can just swap both buffers.

The following benchmark prints a single ref matching a specific pattern out of 1 million refs via git-show-ref(1):

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):   155.7 ms ± 5.0 ms    [User: 152.1 ms, System: 3.4 ms]
    Range (min … max): 150.8 ms … 185.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):   150.8 ms ± 4.2 ms    [User: 147.1 ms, System: 3.5 ms]
    Range (min … max): 145.1 ms … 180.7 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.03 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
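The swap trick can be shown with a toy buffer type; the real code swaps `struct strbuf`s, and `struct buf` here is only an illustrative stand-in. Swapping exchanges the heap pointers, so no key bytes are copied:

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Tiny stand-in for a growable buffer. */
struct buf {
	char *data;
	size_t len;
};

static void buf_swap(struct buf *a, struct buf *b)
{
	struct buf tmp = *a;
	*a = *b;
	*b = tmp;
}

int main(void)
{
	struct buf last_key = { strdup("refs/heads/a"), 12 };
	struct buf key = { strdup("refs/heads/b"), 12 };

	/* Instead of copying the current key into last_key after every
	 * record, swap the two buffers: key is only a scratch buffer. */
	buf_swap(&last_key, &key);
	printf("last_key = %s\n", last_key.data);

	free(last_key.data);
	free(key.data);
	return 0;
}
```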
2024-02-12  reftable/pq: allocation-less comparison of entry keys  (Patrick Steinhardt)

The priority queue is used by the merged iterator to iterate over reftable records from multiple tables in the correct order. The queue ends up having one record for each table that is being iterated over, with the record that is supposed to be shown next at the top. For example, the key of a ref record is equal to its name so that we end up sorting the priority queue lexicographically by ref name.

To figure out the order we need to compare the reftable record keys with each other. This comparison is done by formatting them into a `struct strbuf` and then doing `strbuf_cmp()` on the result. We then discard the buffers immediately after the comparison.

This ends up being very expensive. Because the priority queue usually contains as many records as we have tables, we call the comparison function `O(log($tablecount))` many times for every record we insert. Furthermore, when iterating over many refs, we will insert at least one record for every ref we are iterating over. So ultimately, this ends up being called `O($refcount * log($tablecount))` many times.

Refactor the code to use the new `reftable_record_cmp()` function that has been implemented in a preceding commit. This function does not need to allocate memory and is thus significantly more efficient.

The following benchmark prints a single ref matching a specific pattern out of 1 million refs via git-show-ref(1), where the reftable stack consists of three tables:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):   224.4 ms ± 6.5 ms    [User: 220.6 ms, System: 3.6 ms]
    Range (min … max): 216.5 ms … 261.1 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):   172.9 ms ± 4.4 ms    [User: 169.2 ms, System: 3.6 ms]
    Range (min … max): 166.5 ms … 204.6 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.30 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-12  reftable/merged: skip comparison for records of the same subiter  (Patrick Steinhardt)

When retrieving the next entry of a merged iterator we need to drop all records of other sub-iterators that would be shadowed by the record that we are about to return. We do this by comparing record keys, dropping all keys that are smaller or equal to the key of the record we are about to return.

There is an edge case here where we can skip that comparison: when the record in the priority queue comes from the same subiterator as the record we are about to return then we know that its key must be larger than the key of the record we are about to return. This property is guaranteed by the sub-iterators, and if it didn't hold then the whole merged iterator would return records in the wrong order, too.

While this may seem like a very specific edge case it's in fact quite likely to happen. For most repositories out there you can assume that we will end up with one large table and several smaller ones on top of it. Thus, it is very likely that the next entry will sort towards the top of the priority queue.

Special case this and break out of the loop in that case. The following benchmark uses git-show-ref(1) to print a single ref matching a pattern out of 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):   162.6 ms ± 4.5 ms    [User: 159.0 ms, System: 3.5 ms]
    Range (min … max): 156.6 ms … 188.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):   156.8 ms ± 4.7 ms    [User: 153.0 ms, System: 3.6 ms]
    Range (min … max): 151.4 ms … 188.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
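A sketch of the shortcut, using an invented `entry` type rather than the real queue entries: the string comparison is skipped entirely whenever the queue's top entry comes from the same sub-iterator as the record being returned.

```
#include <stdio.h>
#include <string.h>

struct entry {
	const char *key;
	int index; /* which sub-iterator the record came from */
};

/* Decide whether the queue's top entry shadows the record we are about
 * to return. Entries from the same sub-iterator arrive in ascending key
 * order already, so no comparison is needed for them. */
static int is_shadowed(const struct entry *top, const struct entry *cur)
{
	if (top->index == cur->index)
		return 0; /* same sub-iterator: cannot be shadowed */
	return strcmp(top->key, cur->key) <= 0;
}

int main(void)
{
	struct entry cur = { "refs/heads/main", 0 };
	struct entry same = { "refs/heads/zzz", 0 };
	struct entry dup = { "refs/heads/main", 1 };

	/* prints "0 1": no compare needed for `same`, `dup` is shadowed */
	printf("%d %d\n", is_shadowed(&same, &cur), is_shadowed(&dup, &cur));
	return 0;
}
```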
2024-02-12  reftable/merged: allocation-less dropping of shadowed records  (Patrick Steinhardt)

The purpose of the merged reftable iterator is to iterate through all entries of a set of tables in the correct order. This is implemented by using a sub-iterator for each table, where the next entry of each of these iterators gets put into a priority queue. For each iteration, we do roughly the following steps:

1. Retrieve the top record of the priority queue. This is the entry we want to return to the caller.

2. Retrieve the next record of the sub-iterator that this record came from. If any, add it to the priority queue at the correct position. The position is determined by comparing the record keys, which e.g. corresponds to the refname for ref records.

3. Keep removing the top record of the priority queue until we hit the first entry whose key is larger than the returned record's key. This is required to drop "shadowed" records.

The last step will lead to at least one comparison to the next entry, but may lead to many comparisons in case the reftable stack consists of many tables with shadowed records. It is thus part of the hot code path when iterating through records.

The code to compare the entries with each other is quite inefficient though. Instead of comparing record keys with each other directly, we first format them into `struct strbuf`s and only then compare them with each other. While we already optimized this code path to reuse buffers in 829231dc20 (reftable/merged: reuse buffer to compute record keys, 2023-12-11), the cost to format the keys into the buffers still adds up quite significantly.

Refactor the code to use `reftable_record_cmp()` instead, which has been introduced in the preceding commit. This function compares records with each other directly without requiring any memory allocations or copying and is thus way more efficient.

The following benchmark uses git-show-ref(1) to print a single ref matching a pattern out of 1 million refs. This is the most direct way to exercise ref iteration speed as we remove all overhead of having to show the refs, too.

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):   180.7 ms ± 4.7 ms    [User: 177.1 ms, System: 3.4 ms]
    Range (min … max): 174.9 ms … 211.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):   162.1 ms ± 4.4 ms    [User: 158.5 ms, System: 3.4 ms]
    Range (min … max): 155.4 ms … 189.3 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.11 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
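The three steps can be demonstrated with a toy merged iterator over two sorted string arrays. This sketch is purely illustrative: it replaces the priority queue with a linear scan and drops shadowed duplicates by key, so newer tables win over older ones.

```
#include <stdio.h>
#include <string.h>

/* Toy "tables": newest first, each sorted by key. */
static const char *table_new[] = { "refs/heads/b", "refs/heads/d", NULL };
static const char *table_old[] = { "refs/heads/a", "refs/heads/b", "refs/heads/c", NULL };

int main(void)
{
	const char **iters[] = { table_new, table_old };
	size_t pos[2] = { 0, 0 };

	for (;;) {
		int best = -1;

		/* Step 1: pick the smallest current key (the queue top);
		 * ties go to the newer table. */
		for (int i = 0; i < 2; i++) {
			if (!iters[i][pos[i]])
				continue;
			if (best < 0 || strcmp(iters[i][pos[i]], iters[best][pos[best]]) < 0)
				best = i;
		}
		if (best < 0)
			break;

		const char *key = iters[best][pos[best]];
		printf("%s (from table %d)\n", key, best);

		/* Step 2: advance the iterator the record came from. */
		pos[best]++;

		/* Step 3: drop shadowed records with the same key from
		 * the other (older) tables. */
		for (int i = 0; i < 2; i++)
			while (iters[i][pos[i]] && !strcmp(iters[i][pos[i]], key))
				pos[i]++;
	}
	return 0;
}
```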
2024-02-12  reftable/record: introduce function to compare records by key  (Patrick Steinhardt)

In some places we need to sort reftable records by their keys to determine their ordering. This is done by first formatting the keys into a `struct strbuf` and then using `strbuf_cmp()` to compare them. This logic is needlessly roundabout and can end up costing quite a bit of CPU cycles, both due to the allocation and formatting logic.

Introduce a new `reftable_record_cmp()` function that knows how to compare two records with each other without requiring allocations.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
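A hedged sketch of what such an allocation-free comparison can look like. The tagged-union record and its field names are invented, not the real `struct reftable_record`; the descending-index ordering for log records mirrors how reftable sorts reflog entries, but treat that detail as an assumption here.

```
#include <string.h>

enum rec_type { REC_REF, REC_LOG };

/* Illustrative tagged record; the real reftable record is richer. */
struct rec {
	enum rec_type type;
	union {
		struct { const char *refname; } ref;
		struct { const char *refname; unsigned long index; } log;
	} u;
};

/* Compare two records of the same type directly on their fields,
 * without formatting the keys into temporary buffers first. */
static int rec_cmp(const struct rec *a, const struct rec *b)
{
	if (a->type != b->type)
		return (int)a->type - (int)b->type;
	switch (a->type) {
	case REC_REF:
		return strcmp(a->u.ref.refname, b->u.ref.refname);
	case REC_LOG: {
		int cmp = strcmp(a->u.log.refname, b->u.log.refname);
		if (cmp)
			return cmp;
		/* Assumed here: log records sort by descending index. */
		return a->u.log.index < b->u.log.index ? 1 :
		       a->u.log.index > b->u.log.index ? -1 : 0;
	}
	}
	return 0;
}

int main(void)
{
	struct rec a = { REC_REF, { .ref = { "refs/heads/a" } } };
	struct rec b = { REC_REF, { .ref = { "refs/heads/b" } } };

	return rec_cmp(&a, &b) < 0 ? 0 : 1; /* no allocations involved */
}
```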
2024-02-08  Merge branch 'en/header-cleanup' into maint-2.43  (Junio C Hamano)

Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-02-06  Merge branch 'jc/comment-style-fixes'  (Junio C Hamano)

Rewrite //-comments to /* comments */ in files whose comments prevalently use the latter.

* jc/comment-style-fixes:
  reftable/pq_test: comment style fix
  merge-ort.c: comment style fix
  builtin/worktree: comment style fixes
2024-02-06  Merge branch 'ps/reftable-compacted-tables-permission-fix'  (Junio C Hamano)

Reftable bugfix.

* ps/reftable-compacted-tables-permission-fix:
  reftable/stack: adjust permissions of compacted tables
2024-02-06  Merge branch 'jc/reftable-core-fsync'  (Junio C Hamano)

The write codepath for the reftable data learned to honor core.fsync configuration.

* jc/reftable-core-fsync:
  reftable/stack: fsync "tables.list" during compaction
  reftable: honor core.fsync
2024-02-06  reftable/record: improve semantics when initializing records  (Patrick Steinhardt)

According to our usual coding style, the `reftable_new_record()` function would indicate that it is allocating a new record. This is not the case though as the function merely initializes records without allocating any memory.

Replace `reftable_new_record()` with a new `reftable_record_init()` function that takes a record pointer as input and initializes it accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06  reftable/merged: refactor initialization of iterators  (Patrick Steinhardt)

Refactor the initialization of the merged iterator to fit our code style better. This refactoring prepares the code for a refactoring of how records are being initialized.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06  reftable/merged: refactor seeking of records  (Patrick Steinhardt)

The code to seek reftable records in the merged table code is quite hard to read and does not conform to our coding style in multiple ways:

- We have multiple exit paths where we release resources even though that is not really necessary.

- We use a scoped error variable `e` which is hard to reason about. This variable is not required at all.

- We allocate memory in the variable declarations, which is easy to miss.

Refactor the function so that it becomes more maintainable in the future.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06  reftable/stack: use `size_t` to track stack length  (Patrick Steinhardt)

While the stack length is already stored as `size_t`, we frequently use `int`s to refer to those stacks throughout the reftable library. Convert those cases to use `size_t` instead to make things consistent.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06  reftable/stack: use `size_t` to track stack slices during compaction  (Patrick Steinhardt)

We use `int`s to track reftable slices when compacting the reftable stack, which is considered to be a code smell in the Git project. Convert the code to use `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06  reftable/stack: index segments with `size_t`  (Patrick Steinhardt)

We use `int`s to index into arrays of segments and track the length of them, which is considered to be a code smell in the Git project. Convert the code to use `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-06  reftable/stack: fix parameter validation when compacting range  (Patrick Steinhardt)

The `stack_compact_range()` function receives a "first" and "last" index that indicates which tables of the reftable stack should be compacted. Naturally, "first" must be smaller than "last" in order to identify a proper range of tables to compress, which we indeed also assert in the function. But the validation happens after we have already allocated arrays with a size of `last - first + 1`, leading to an underflow and thus an invalid allocation size.

Fix this by reordering the array allocations to happen after we have validated parameters. While at it, convert the array allocations to use the newly introduced macros.

Note that the relevant variables pointing into arrays should also be converted to use `size_t` instead of `int`. This is left for a later commit in this series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
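A minimal sketch of the fix's shape, with a hypothetical `compact_range()`: validate the range before deriving an allocation size from it, since `last - first + 1` underflows `size_t` into a huge value for an invalid range.

```
#include <stdlib.h>

/* Validate the range before sizing any allocation from it. */
static int compact_range(size_t first, size_t last)
{
	size_t *offsets;

	if (first > last)
		return -1; /* reject before computing last - first + 1 */

	offsets = calloc(last - first + 1, sizeof(*offsets));
	if (!offsets)
		return -1;
	/* ... compact tables [first, last] ... */
	free(offsets);
	return 0;
}

int main(void)
{
	return compact_range(3, 1) < 0 ? 0 : 1; /* invalid range is rejected */
}
```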
2024-02-06  reftable: introduce macros to allocate arrays  (Patrick Steinhardt)

Similar to the preceding commit, let's carry over macros to allocate arrays with `REFTABLE_ALLOC_ARRAY()` and `REFTABLE_CALLOC_ARRAY()`. This requires us to change the signature of `reftable_calloc()`, which only takes a single argument right now and thus puts the burden on the caller to calculate the final array's size. This is a net improvement though as it means that we can now provide proper overflow checks when multiplying the array size with the member size.

Convert callsites of `reftable_calloc()` to the new signature and start using the new macros where possible.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
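An illustrative two-argument allocator plus macro in this spirit; the names are invented, and this is not the actual `REFTABLE_CALLOC_ARRAY()` definition. Taking nmemb and size separately lets the allocator check the multiplication for overflow, and inferring the element size from the destination pointer keeps it from going out of sync with the type:

```
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Two-argument calloc wrapper that checks the multiplication itself. */
static void *xcalloc_array(size_t nmemb, size_t size)
{
	if (size && nmemb > SIZE_MAX / size)
		return NULL; /* nmemb * size would overflow */
	return calloc(nmemb, size);
}

/* The element size is derived from the destination pointer's type. */
#define CALLOC_ARRAY(x, n) ((x) = xcalloc_array((n), sizeof(*(x))))

int main(void)
{
	int *entries;

	CALLOC_ARRAY(entries, 16);
	printf("%p\n", (void *)entries);
	free(entries);
	return 0;
}
```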
2024-02-06  reftable: introduce macros to grow arrays  (Patrick Steinhardt)

Throughout the reftable library we have many cases where we need to grow arrays. In order to avoid too many reallocations, we roughly double the capacity of the array on each iteration. The resulting code pattern is duplicated across many sites.

We have similar patterns in our main codebase, which is why we have eventually introduced an `ALLOC_GROW()` macro to abstract it away and avoid some code duplication. We cannot easily reuse this macro here though because `ALLOC_GROW()` uses `REALLOC_ARRAY()`, which in turn will call realloc(3P) to grow the array. The reftable code is structured as a library though (even if the boundaries are fuzzy), and one property this brings with it is that it is possible to plug in your own allocators. So instead of using realloc(3P), we need to use `reftable_realloc()` that knows to use the user-provided implementation.

So let's introduce two new macros `REFTABLE_REALLOC_ARRAY()` and `REFTABLE_ALLOC_GROW()` that mirror what we do in our main codebase, with two modifications:

- They use `reftable_realloc()`, as explained above.

- They use a different growth factor of `2 * cap + 1` instead of `(cap + 16) * 3 / 2`.

The second change is because we know a bit more about the allocation patterns in the reftable library. In most cases, we end up only having a handful of items in the array and don't end up growing them. The initial capacity that our normal growth factor uses (which is 24) would thus end up over-allocating in a lot of code paths. This effect is measurable:

- Before change:

    HEAP SUMMARY:
        in use at exit: 671,983 bytes in 152 blocks
      total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,402 bytes allocated

- After change with a growth factor of `2 * alloc + 1`:

    HEAP SUMMARY:
        in use at exit: 671,983 bytes in 152 blocks
      total heap usage: 3,843,446 allocs, 3,843,294 frees, 223,761,410 bytes allocated

- After change with a growth factor of `(alloc + 16) * 3 / 2`:

    HEAP SUMMARY:
        in use at exit: 671,983 bytes in 152 blocks
      total heap usage: 3,833,673 allocs, 3,833,521 frees, 4,728,251,742 bytes allocated

While the total heap usage is roughly the same, we do end up allocating significantly more bytes with our usual growth factor (in fact, roughly 21 times as many).

Convert the reftable library to use these new macros.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
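A sketch of such a grow macro with the `2 * cap + 1` growth factor. This is illustrative only: error handling is elided, and the real macros go through `reftable_realloc()` rather than plain realloc(3P).

```
#include <stdlib.h>

/* Grow `arr` (capacity `alloc`) so it can hold at least `needed`
 * elements, doubling the capacity plus one on each growth step.
 * NOTE: realloc failure handling is elided for brevity. */
#define GROW_ARRAY(arr, alloc, needed) \
	do { \
		if ((needed) > (alloc)) { \
			size_t new_alloc = 2 * (alloc) + 1; \
			if (new_alloc < (needed)) \
				new_alloc = (needed); \
			(arr) = realloc((arr), new_alloc * sizeof(*(arr))); \
			(alloc) = new_alloc; \
		} \
	} while (0)

int main(void)
{
	int *items = NULL;
	size_t alloc = 0, len = 0;

	for (int i = 0; i < 100; i++) {
		GROW_ARRAY(items, alloc, len + 1);
		items[len++] = i; /* capacities: 1, 3, 7, 15, 31, 63, 127 */
	}
	free(items);
	return 0;
}
```

Starting from an empty array, the first step allocates room for a single element, which is exactly why this factor avoids the over-allocation that an initial capacity of 24 would cause for the many tiny arrays in the library.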
2024-02-01  reftable: document reading and writing indices  (Patrick Steinhardt)

The way the index gets written and read is not trivial at all and requires the reader to piece together a bunch of parts to figure out how it works. Add some documentation to hopefully make this easier to understand for the next reader.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-01  reftable/writer: fix writing multi-level indices  (Patrick Steinhardt)

When finishing a section we will potentially write an index that makes it more efficient to look up relevant blocks. The index records written will encode, for each block of the indexed section, what the offset of that block is as well as the last key of that block. Thus, the reader would iterate through the index records to find the first key larger or equal to the wanted key and then use the encoded offset to look up the desired block.

When there are a lot of blocks to index though we may end up writing multiple index blocks, too. To not require a linear search across all index blocks we instead end up writing a multi-level index. Instead of referring to the block we are after, an index record may point to another index block. The reader will then access the highest-level index and follow down the chain of index blocks until it hits the sought-after block.

It has been observed though that it is impossible to seek ref records of the last ref block when using a multi-level index. While the multi-level index exists and looks fine for the most part, the highest-level index was missing an index record pointing to the last block of the next index. Thus, every additional level made more refs become unseekable at the end of the ref section.

The root cause is that we are not flushing the last block of the current level once done writing the level. Consequently, it wasn't recorded in the blocks that need to be indexed by the next-higher level and thus we forgot about it. Fix this bug by flushing blocks after we have written all index records.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-01  reftable/writer: simplify writing index records  (Patrick Steinhardt)

When finishing the current section some index records might be written for the section to the table. The logic that adds these records to the writer duplicates what we already have in `writer_add_record()`, making this more complicated than it really has to be.

Simplify the code by using `writer_add_record()` instead. While at it, drop the unneeded braces around a loop to make the code conform to our code style better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-01  reftable/writer: use correct type to iterate through index entries  (Patrick Steinhardt)

The reftable writer is tracking the number of blocks it has to index via the `index_len` variable. But while this variable is of type `size_t`, some sites use an `int` to loop through the index entries. Convert the code to consistently use `size_t`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-01  reftable/reader: be more careful about errors in indexed seeks  (Patrick Steinhardt)

When doing an indexed seek we first need to do a linear seek in order to find the index block for our wanted key. We do not check the returned error of the linear seek though. This is likely not an issue because the next call to `table_iter_next()` would return an error, too. But it very much is a code smell when an error variable is being assigned to without actually checking it. Safeguard the code by checking for errors.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-30  reftable/stack: fsync "tables.list" during compaction  (Patrick Steinhardt)

In 1df18a1c9a (reftable: honor core.fsync, 2024-01-23), we have added code to fsync both newly written reftables as well as "tables.list" to disk. But there are two code paths where "tables.list" is being written:

- When appending a new table due to a normal ref update.

- When compacting a range of tables during compaction.

We have only addressed the former code path, but do not yet sync the new "tables.list" file in the latter. Fix this omission.

Note that we are not yet adding any tests. These tests will be added once the "reftable" backend has been upstreamed.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-29  Merge branch 'ps/reftable-optimize-io'  (Junio C Hamano)

Low-level I/O optimization for reftable.

* ps/reftable-optimize-io:
  reftable/stack: fix race in up-to-date check
  reftable/stack: unconditionally reload stack after commit
  reftable/blocksource: use mmap to read tables
  reftable/blocksource: refactor code to match our coding style
  reftable/stack: use stat info to avoid re-reading stack list
  reftable/stack: refactor reloading to use file descriptor
  reftable/stack: refactor stack reloading to have common exit path
2024-01-29  reftable/pq_test: comment style fix  (Junio C Hamano)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-26  reftable/stack: adjust permissions of compacted tables  (Patrick Steinhardt)

When creating a new compacted table from a range of preexisting ones we don't set the default permissions on the resulting table when specified by the user. This has the effect that the "core.sharedRepository" config will not be honored correctly. Fix this bug and add a test to catch this issue.

Note that we only test on non-Windows platforms because Windows does not use POSIX permissions natively.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-23  reftable: honor core.fsync  (John Cai)

While the reffiles backend honors configured fsync settings, the reftable backend does not. Address this by fsyncing reftable files using the write-or-die API's fsync_component() in two places: when we add additional entries into the table, and when we close the reftable writer.

This commit adds a flush function pointer as a new member of reftable_writer because we are not sure that the first argument to the *write function pointer always contains a file descriptor. In the case of strbuf_add_void, the first argument is a buffer. This way, we can pass in a corresponding flush function that knows how to flush depending on which writer is being used.

This patch does not contain tests as they will need to wait for another patch to start to exercise the reftable backend. At that point, the tests will be added to observe that fsyncs are happening when the reftable is in use.

Signed-off-by: John Cai <johncai86@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
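A toy version of the pattern: carrying a `flush` callback next to an opaque `write` callback, so flushing works even when the write target is not a file descriptor. All names here (`writer_ops`, `fd_write`, `fd_flush`) are invented for illustration and are not the reftable_writer API.

```
#include <sys/types.h>
#include <unistd.h>

/* Illustrative writer vtable: the first argument to `write` is opaque,
 * so a matching `flush` callback is carried alongside it rather than
 * assuming the argument is a file descriptor that can be fsync'd. */
struct writer_ops {
	void *arg;
	ssize_t (*write)(void *arg, const void *buf, size_t len);
	int (*flush)(void *arg);
};

static ssize_t fd_write(void *arg, const void *buf, size_t len)
{
	return write(*(int *)arg, buf, len);
}

static int fd_flush(void *arg)
{
	return fsync(*(int *)arg);
}

int main(void)
{
	int fd = 1; /* stdout, purely for demonstration */
	struct writer_ops ops = { &fd, fd_write, fd_flush };

	ops.write(ops.arg, "hello\n", 6);
	ops.flush(ops.arg); /* may fail on a tty; fine for a sketch */
	return 0;
}
```

A buffer-backed writer would plug in its own pair of callbacks, with a flush that is simply a no-op.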
2024-01-18  reftable/stack: fix race in up-to-date check  (Patrick Steinhardt)

In 6fdfaf15a0 (reftable/stack: use stat info to avoid re-reading stack list, 2024-01-11) we have introduced a new mechanism to avoid re-reading the table list in case stat(3P) figures out that the stack didn't change since the last time we read it.

While this change significantly improved performance when writing many refs, it can unfortunately lead to false negatives in very specific scenarios. Given two processes A and B, there is a feasible sequence of events that cause us to accidentally treat the table list as up-to-date even though it changed:

1. A reads the reftable stack and caches its stat info.

2. B updates the stack, appending a new table to "tables.list". This will both use a new inode and result in a different file size, thus invalidating A's cache in theory.

3. B decides to auto-compact the stack and merges two tables. The file size now matches what A has cached again. Furthermore, the filesystem may decide to recycle the inode number of the file we have replaced in (2) because it is not in use anymore.

4. A reloads the reftable stack. Neither the inode number nor the file size changed. If the timestamps did not change either then we think the cached copy of our stack is up-to-date.

In fact, the commit introduced three related issues:

- Non-POSIX compliant systems may not report proper `st_dev` and `st_ino` values in stat(3P), which made us rely solely on the file's potentially coarse-grained mtime and ctime.

- `stat_validity_check()` and friends may end up not comparing `st_dev` and `st_ino` depending on the "core.checkstat" config, again reducing the signal to the mtime and ctime.

- `st_ino` can be recycled, rendering the check moot even on POSIX-compliant systems.

Given that POSIX defines that "The st_ino and st_dev fields taken together uniquely identify the file within the system", these issues caused the most important signal for establishing file identity to be ignored or become useless in some cases.

Refactor the code to stop using `stat_validity_check()`. Instead, we manually stat(3P) the file descriptors to make relevant information available. On Windows and MSYS2 the result will have both `st_dev` and `st_ino` set to 0, which allows us to address the first issue by not using the stat-based cache in that case. It also allows us to make sure that we always compare `st_dev` and `st_ino`, addressing the second issue.

The third issue of inode recycling can be addressed by keeping the file descriptor of "tables.list" open during the lifetime of the reftable stack. As the file will still exist on disk even though it has been unlinked it is impossible for its inode to be recycled as long as the file descriptor is still open.

This should address the race in a POSIX-compliant way. The only real downside is that this mechanism cannot be used on non-POSIX-compliant systems like Windows. But we at least have the second-level caching mechanism in place that compares contents of "tables.list" with the currently loaded list of tables.

This new mechanism performs roughly the same as the previous one that relied on `stat_validity_check()`:

  Benchmark 1: update-ref: create many refs (HEAD~)
    Time (mean ± σ):   4.754 s ± 0.026 s    [User: 2.204 s, System: 2.549 s]
    Range (min … max): 4.694 s … 4.802 s    20 runs

  Benchmark 2: update-ref: create many refs (HEAD)
    Time (mean ± σ):   4.721 s ± 0.020 s    [User: 2.194 s, System: 2.527 s]
    Range (min … max): 4.691 s … 4.753 s    20 runs

  Summary
    update-ref: create many refs (HEAD~) ran
      1.01 ± 0.01 times faster than update-ref: create many refs (HEAD)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
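A sketch of the resulting check under the stated assumptions: the descriptor for "tables.list" is kept open so its inode cannot be recycled while we hold it, and identity is compared via `st_dev`/`st_ino`, falling back to "changed" whenever those fields are unusable. The function name is invented.

```
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Compare the long-lived open descriptor against a fresh stat of the
 * path: if "tables.list" was atomically replaced via rename(2), the
 * path now names a different inode than the one we have pinned open.
 */
static int stack_up_to_date(int fd, const char *path)
{
	struct stat a, b;

	if (fstat(fd, &a) < 0 || stat(path, &b) < 0)
		return 0; /* be conservative: treat errors as "changed" */
	if (!a.st_dev || !a.st_ino)
		return 0; /* e.g. Windows/MSYS2: identity info unusable */
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

int main(void)
{
	int fd = open("tables.list", O_RDONLY);

	if (fd < 0)
		return 1;
	printf("up to date: %d\n", stack_up_to_date(fd, "tables.list"));
	close(fd);
	return 0;
}
```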
2024-01-18  reftable/stack: unconditionally reload stack after commit  (Patrick Steinhardt)

After we have committed an addition to the reftable stack we call `reftable_stack_reload()` to reload the stack and thus reflect the changes that were just added. This function will only conditionally reload the stack in case `stack_uptodate()` tells us that the stack needs reloading. This check is wasteful though because we already know that the stack needs reloading.

Call `reftable_stack_reload_maybe_reuse()` instead, which will unconditionally reload the stack. This is merely a conceptual fix; the code in question was not found to cause any problems in practice.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-16  Merge branch 'ps/reftable-fixes-and-optims'  (Junio C Hamano)

More fixes and optimizations to the reftable backend.

* ps/reftable-fixes-and-optims:
  reftable/merged: transfer ownership of records when iterating
  reftable/merged: really reuse buffers to compute record keys
  reftable/record: store "val2" hashes as static arrays
  reftable/record: store "val1" hashes as static arrays
  reftable/record: constify some parts of the interface
  reftable/writer: fix index corruption when writing multiple indices
  reftable/stack: do not auto-compact twice in `reftable_stack_add()`
  reftable/stack: do not overwrite errors when compacting
2024-01-11  reftable/blocksource: use mmap to read tables  (Patrick Steinhardt)

The blocksource interface provides a way to read blocks from a reftable table. This interface is implemented using read(3P) calls on the underlying file descriptor. While this works alright, this pattern is very inefficient when repeatedly querying the reftable stack for one or more refs. This inefficiency can mostly be attributed to the fact that we often need to re-read the same blocks over and over again, and every single time we need to call read(3P) again.

A natural fit in this context is to use mmap(3P) instead of read(3P), which has a bunch of benefits:

- We do not need to come up with a caching strategy for some of the blocks as this will be handled by the kernel already.

- We can avoid the overhead of having to call into the read(3P) syscall repeatedly.

- We do not need to allocate returned blocks repeatedly, but can instead hand out pointers into the mmapped region directly.

Using mmap comes with a significant drawback on Windows though, because mmapped files cannot be deleted and neither is it possible to rename files onto an mmapped file. But for one, the reftable library gracefully handles the case where auto-compaction cannot delete a still-open stack already and ignores any such errors. Also, `reftable_stack_clean()` will prune stale tables which are not referenced by "tables.list" anymore so that those files can eventually be removed. And second, we never rewrite already-written stacks, so it does not matter that we cannot rename a file over an mmapped file, either.

Another unfortunate property of mmap is that it is not supported by all systems. But given that the size of reftables should typically be rather limited (megabytes at most in the vast majority of repositories), we can use the fallback implementation provided by `git_mmap()` which reads the whole file into memory instead. This is the same strategy that the "packed" backend uses.

This change doesn't significantly improve performance in the case where we're seeking through stacks once (like e.g. git-for-each-ref(1) would), but it does speed up use cases where there is lots of random access to refs, e.g. when writing. The following benchmark demonstrates these savings with git-update-ref(1) creating N refs in an otherwise empty repository:

  Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    Time (mean ± σ):   5.1 ms ± 0.2 ms    [User: 2.5 ms, System: 2.5 ms]
    Range (min … max): 4.8 ms … 7.1 ms    111 runs

  Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    Time (mean ± σ):   14.8 ms ± 0.5 ms    [User: 7.1 ms, System: 7.5 ms]
    Range (min … max): 14.1 ms … 18.7 ms    84 runs

  Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    Time (mean ± σ):   926.4 ms ± 5.6 ms    [User: 448.5 ms, System: 477.7 ms]
    Range (min … max): 920.0 ms … 936.1 ms    10 runs

  Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    Time (mean ± σ):   5.0 ms ± 0.2 ms    [User: 2.4 ms, System: 2.5 ms]
    Range (min … max): 4.7 ms … 5.4 ms    111 runs

  Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    Time (mean ± σ):   10.5 ms ± 0.2 ms    [User: 5.0 ms, System: 5.3 ms]
    Range (min … max): 10.0 ms … 10.9 ms    93 runs

  Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    Time (mean ± σ):   529.6 ms ± 9.1 ms    [User: 268.0 ms, System: 261.4 ms]
    Range (min … max): 522.4 ms … 547.1 ms    10 runs

  Summary
    update-ref: create many refs (refcount = 1, revision = HEAD) ran
      1.01 ± 0.06 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
      2.08 ± 0.07 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
      2.95 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    105.33 ± 3.76 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    184.24 ± 5.89 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)

Theoretically, we could also replicate the strategy of the "packed" backend where small tables are read into memory instead of using mmap. Benchmarks did not confirm that this has a performance benefit though.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
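A minimal sketch of the mmap-based reading pattern; the file name "example.ref" is made up, and the real blocksource also keeps the `git_mmap()` fallback path described above:

```
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	int fd = open("example.ref", O_RDONLY); /* hypothetical table file */
	struct stat st;
	unsigned char *data;

	if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
		return 1;

	/* Map the table once; "reading" a block is then just pointer
	 * arithmetic, with caching handled by the kernel's page cache. */
	data = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after closing the fd */
	if (data == MAP_FAILED)
		return 1;

	printf("first byte of the table: 0x%02x\n", data[0]);

	munmap(data, (size_t)st.st_size);
	return 0;
}
```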
2024-01-11  reftable/blocksource: refactor code to match our coding style  (Patrick Steinhardt)

Refactor `reftable_block_source_from_file()` to match our coding style better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-11  reftable/stack: use stat info to avoid re-reading stack list  (Patrick Steinhardt)

Whenever we call into the refs interfaces we potentially have to reload refs in case they have been concurrently modified, either in-process or externally. While this happens somewhat automatically for loose refs because we simply try to re-read the files, the "packed" backend will reload its snapshot of the packed-refs file in case its stat info has changed since last reading it.

In the reftable backend we have a similar mechanism that is provided by `reftable_stack_reload()`. This function will read the list of stacks from "tables.list" and, if they have changed from the currently stored list, reload the stacks. This is heavily inefficient though, as we have to check whether the stack is up-to-date on basically every read and thus keep on re-reading the file all the time even if it didn't change at all.

We can do better and use the same stat(3P)-based mechanism that the "packed" backend uses. Instead of reading the file, we will only open the file descriptor, fstat(3P) it, and then compare the info against the cached value from the last time we have updated the stack. This should always work alright because "tables.list" is updated atomically via a rename, so even if the ctime or mtime wasn't granular enough to identify a change, at least the inode number or file size should have changed.

This change significantly speeds up operations where many refs are read, like when using git-update-ref(1). The following benchmark creates N refs in an otherwise-empty repository via `git update-ref --stdin`:

  Benchmark 1: update-ref: create many refs (refcount = 1, revision = HEAD~)
    Time (mean ± σ):   5.1 ms ± 0.2 ms    [User: 2.4 ms, System: 2.6 ms]
    Range (min … max): 4.8 ms … 7.2 ms    109 runs

  Benchmark 2: update-ref: create many refs (refcount = 100, revision = HEAD~)
    Time (mean ± σ):   19.1 ms ± 0.9 ms    [User: 8.9 ms, System: 9.9 ms]
    Range (min … max): 18.4 ms … 26.7 ms    72 runs

  Benchmark 3: update-ref: create many refs (refcount = 10000, revision = HEAD~)
    Time (mean ± σ):   1.336 s ± 0.018 s    [User: 0.590 s, System: 0.724 s]
    Range (min … max): 1.314 s … 1.373 s    10 runs

  Benchmark 4: update-ref: create many refs (refcount = 1, revision = HEAD)
    Time (mean ± σ):   5.1 ms ± 0.2 ms    [User: 2.4 ms, System: 2.6 ms]
    Range (min … max): 4.8 ms … 7.2 ms    109 runs

  Benchmark 5: update-ref: create many refs (refcount = 100, revision = HEAD)
    Time (mean ± σ):   14.8 ms ± 0.2 ms    [User: 7.1 ms, System: 7.5 ms]
    Range (min … max): 14.2 ms … 15.2 ms    82 runs

  Benchmark 6: update-ref: create many refs (refcount = 10000, revision = HEAD)
    Time (mean ± σ):   927.6 ms ± 5.3 ms    [User: 437.8 ms, System: 489.5 ms]
    Range (min … max): 919.4 ms … 936.4 ms    10 runs

  Summary
    update-ref: create many refs (refcount = 1, revision = HEAD) ran
      1.00 ± 0.07 times faster than update-ref: create many refs (refcount = 1, revision = HEAD~)
      2.89 ± 0.14 times faster than update-ref: create many refs (refcount = 100, revision = HEAD)
      3.74 ± 0.25 times faster than update-ref: create many refs (refcount = 100, revision = HEAD~)
    181.26 ± 8.30 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD)
    261.01 ± 12.35 times faster than update-ref: create many refs (refcount = 10000, revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
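The caching idea in miniature, with an invented `stat_cache` type; the real code compares more fields and works off an open file descriptor, but the shape is the same: stat the file, compare against the cached info, and only re-read on mismatch.

```
#include <sys/stat.h>

struct stat_cache {
	struct stat st;
	int valid;
};

/* Return 1 (and refresh the cache) when "tables.list" must be re-read,
 * 0 when the cached stat info still matches. */
static int list_changed(struct stat_cache *cache, const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return 1; /* cannot tell: force a re-read */
	if (cache->valid &&
	    st.st_ino == cache->st.st_ino &&
	    st.st_dev == cache->st.st_dev &&
	    st.st_size == cache->st.st_size &&
	    st.st_mtime == cache->st.st_mtime)
		return 0;
	cache->st = st;
	cache->valid = 1;
	return 1;
}

int main(void)
{
	struct stat_cache cache = { .valid = 0 };
	int first = list_changed(&cache, "tables.list");  /* 1: not cached */
	int second = list_changed(&cache, "tables.list"); /* 0 if unchanged */

	return first == 1 && second == 0 ? 0 : 1;
}
```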
2024-01-11  reftable/stack: refactor reloading to use file descriptor  (Patrick Steinhardt)

We're about to introduce a stat(3P)-based caching mechanism to reload the list of stacks only when it has changed. In order to avoid race conditions this requires us to have a file descriptor available that we can use to call fstat(3P) on.

Prepare for this by converting the code to use `fd_read_lines()` so that we have the file descriptor readily available.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-11  reftable/stack: refactor stack reloading to have common exit path  (Patrick Steinhardt)

The `reftable_stack_reload_maybe_reuse()` function is responsible for reloading the reftable list from disk. The function is quite hard to follow though because it has a bunch of different exit paths, many of which have to free the same set of resources.

Refactor the function to have a common exit path. While at it, touch up the style of this function a bit to match our usual coding style better.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-08  Merge branch 'en/header-cleanup'  (Junio C Hamano)

Remove unused header "#include".

* en/header-cleanup:
  treewide: remove unnecessary includes in source files
  treewide: add direct includes currently only pulled in transitively
  trace2/tr2_tls.h: remove unnecessary include
  submodule-config.h: remove unnecessary include
  pkt-line.h: remove unnecessary include
  line-log.h: remove unnecessary include
  http.h: remove unnecessary include
  fsmonitor--daemon.h: remove unnecessary includes
  blame.h: remove unnecessary includes
  archive.h: remove unnecessary include
  treewide: remove unnecessary includes in source files
  treewide: remove unnecessary includes from header files
2024-01-03  reftable/merged: transfer ownership of records when iterating  (Patrick Steinhardt)

When iterating over records with the merged iterator we put the records into a priority queue before yielding them to the caller. This means that we need to allocate the contents of these records before we can pass them over to the caller.

The handover to the caller is quite inefficient though because we first deallocate the record passed in by the caller and then copy over the new record, which requires us to reallocate memory.

Refactor the code to instead transfer ownership of the new record to the caller. So instead of reallocating all contents, we now release the old record and then copy contents of the new record into place.

The following benchmark of `git show-ref --quiet` in a repository with around 350k refs shows a clear improvement. Before:

  HEAP SUMMARY:
      in use at exit: 21,163 bytes in 193 blocks
    total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 21,163 bytes in 193 blocks
    total heap usage: 357,007 allocs, 356,814 frees, 24,193,602 bytes allocated

This shows that we now have roughly a single allocation per record that we're yielding from the iterator. Ideally, we'd also get rid of this allocation so that the number of allocations doesn't scale with the number of refs anymore. This would require some larger surgery though because the memory is owned by the priority queue before transferring it over to the caller.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-01-03  reftable/merged: really reuse buffers to compute record keys  (Patrick Steinhardt)

In 829231dc20 (reftable/merged: reuse buffer to compute record keys, 2023-12-11), we have refactored the merged iterator to reuse a pair of long-living strbufs by relying on the fact that `reftable_record_key()` tries to reuse already allocated strbufs by calling `strbuf_reset()`, which should give us significantly fewer reallocations compared to the old code that used on-stack strbufs that are allocated for each and every iteration. Unfortunately, we called `strbuf_release()` on these long-living strbufs that we meant to reuse on each iteration, defeating the optimization.

Fix this performance issue by not releasing those buffers on iteration anymore, where we instead rely on `merged_iter_close()` to release the buffers for us.

Using `git show-ref --quiet` in a repository with ~350k refs this leads to a significant drop in allocations. Before:

  HEAP SUMMARY:
      in use at exit: 21,163 bytes in 193 blocks
    total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 21,163 bytes in 193 blocks
    total heap usage: 708,058 allocs, 707,865 frees, 36,783,255 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
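The difference between resetting and releasing can be seen with a toy buffer standing in for `struct strbuf`: resetting keeps the allocation alive across iterations, and releasing happens once when iteration is done.

```
#include <stdlib.h>
#include <string.h>

/* Minimal growable buffer to illustrate reset-vs-release. */
struct buf {
	char *data;
	size_t len, alloc;
};

static void buf_reset(struct buf *b)
{
	b->len = 0; /* keep the allocation for the next iteration */
}

static void buf_release(struct buf *b)
{
	free(b->data);
	memset(b, 0, sizeof(*b));
}

static void buf_addstr(struct buf *b, const char *s)
{
	size_t n = strlen(s);

	if (b->len + n + 1 > b->alloc) {
		b->alloc = 2 * (b->len + n + 1);
		b->data = realloc(b->data, b->alloc);
	}
	memcpy(b->data + b->len, s, n + 1);
	b->len += n;
}

int main(void)
{
	struct buf key = { 0 };

	for (int i = 0; i < 1000; i++) {
		buf_reset(&key); /* reuse: roughly one allocation in total */
		buf_addstr(&key, "refs/heads/branch");
		/* calling buf_release(&key) here instead would force a
		 * fresh allocation on every iteration, defeating reuse */
	}
	buf_release(&key); /* release once, when iteration is done */
	return 0;
}
```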
2024-01-03  reftable/record: store "val2" hashes as static arrays  (Patrick Steinhardt)

Similar to the preceding commit, convert ref records of type "val2" to store their object IDs in static arrays instead of allocating them for every single record.

We're using the same benchmark as in the preceding commit, with `git show-ref --quiet` in a repository with ~350k refs. This time around though the effects aren't this huge. Before:

  HEAP SUMMARY:
      in use at exit: 21,163 bytes in 193 blocks
    total heap usage: 1,419,040 allocs, 1,418,847 frees, 62,153,868 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 21,163 bytes in 193 blocks
    total heap usage: 1,410,148 allocs, 1,409,955 frees, 61,976,068 bytes allocated

This is because "val2"-type records are typically only stored for peeled tags, and the number of annotated tags in the benchmark repository is rather low. Still, it can be seen that this change leads to a reduction of allocations overall, even if only a small one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
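A sketch of the layout change, with invented struct and field names; the 32-byte bound is an assumption sized for SHA-256:

```
#include <string.h>

#define HASH_MAX 32 /* assumed: large enough for SHA-256 */

/* Before: every record allocates its object IDs on the heap. */
struct ref_val2_heap {
	unsigned char *value;        /* malloc'd, 20 or 32 bytes */
	unsigned char *target_value; /* malloc'd, 20 or 32 bytes */
};

/* After: the IDs live inline in the record itself, so decoding a
 * record costs no extra allocations and releasing it costs no frees. */
struct ref_val2_inline {
	unsigned char value[HASH_MAX];
	unsigned char target_value[HASH_MAX];
};

int main(void)
{
	struct ref_val2_inline rec;
	unsigned char oid[HASH_MAX] = { 0xab };

	memcpy(rec.value, oid, HASH_MAX); /* plain copy, no malloc/free */
	return rec.value[0] == 0xab ? 0 : 1;
}
```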