aboutsummaryrefslogtreecommitdiff
path: root/reftable
AgeCommit message (Collapse)Author
2025-02-18reftable/basics: stop using `SWAP()` macroPatrick Steinhardt
Stop using `SWAP()` macro in favor of an open-coded variant of it. Note that this also requires us to open-code the build assert that `SWAP()` itself uses to verify that the size of both variables matches. This is done to reduce our dependency on the Git codebase. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/stack: stop using `sleep_millisec()`Patrick Steinhardt
Refactor our use of `sleep_millisec()` by open-coding it with poll(3p), which is the current implementation of this function. Ideally, we'd use a more direct way to sleep, but there is no equivalent to sleep(3p) that would accept milliseconds as input. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/system: introduce `reftable_rand()`Patrick Steinhardt
Introduce a new system-level `reftable_rand()` function that generates a single unsigned integer for us. The implementation of this function is to be provided by the calling codebase, which allows us to more easily hook into pre-seeded random number generators. Adapt the two callsites where we generated random data. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/reader: stop using `ARRAY_SIZE()` macroPatrick Steinhardt
We have a single user of the `ARRAY_SIZE()` macro in the reftable reader. Drop its use to reduce our dependence on the Git codebase. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/basics: provide wrappers for big endian conversionPatrick Steinhardt
We're using a mixture of big endian conversion functions provided by both the reftable library, but also by the Git codebase. Refactor the code so that we exclusively use reftable-provided wrappers in order to untangle us from the Git codebase. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/basics: stop using `st_mult()` in array allocatorsPatrick Steinhardt
We're using `st_mult()` as part of our macro helpers that allocate arrays. This is bad due two two reasons: - `st_mult()` causes us to die in case the multiplication overflows. - `st_mult()` ties us to the Git codebase. Refactor the code to instead detect overflows manually and return an error in such cases. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable: stop using `BUG()` in trivial casesPatrick Steinhardt
Stop using `BUG()` in the remaining trivial cases that we still have in the reftable library. Instead of aborting the program, we'll now bubble up a `REFTABLE_API_ERROR` to indicate misuse of the calling conventions. Note that in both `reftable_reader_{inc,dec}ref()` we simply stop calling `BUG()` altogether. The only situation where the counter should be zero is when the structure has already been free'd anyway, so we would run into undefined behaviour regardless of whether we try to abort the program or not. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/record: don't `BUG()` in `reftable_record_cmp()`Patrick Steinhardt
The reftable library aborts with a bug in case `reftable_record_cmp()` is invoked with two records of differing types. This would cause the program to die without the caller being able to handle the error, which is not something we want in the context of library code. And it ties us to the Git codebase. Refactor the code such that `reftable_record_cmp()` returns an error code separate from the actual comparison result. This requires us to also adapt some callers up the callchain in a similar fashion. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/record: stop using `BUG()` in `reftable_record_init()`Patrick Steinhardt
We're aborting the program via `BUG()` in case `reftable_record_init()` was invoked with an unknown record type. This is bad because we may now die in library code, and because it makes us depend on the Git codebase. Refactor the code such that `reftable_record_init()` can return an error code to the caller. Adapt any callers accordingly. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/record: stop using `COPY_ARRAY()`Patrick Steinhardt
Drop our use of `COPY_ARRAY()`, replacing it with an open-coded variant thereof. This is done to reduce our dependency on the Git library. While at it, guard the whole array copy logic so that we only copy it in case there actually is anything to be copied. Otherwise, we may end up trying to allocate a zero-sized array, which will return a NULL pointer and thus cause us to return an `REFTABLE_OUT_OF_MEMORY_ERROR`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/blocksource: stop using `xmmap()`Patrick Steinhardt
We use `xmmap()` to map reftables into memory. This function has two problems: - It causes us to die in case the mmap fails. - It ties us to the Git codebase. Refactor the code to use mmap(3p) instead with manual error checking. Note that this function may not be the system-provided mmap(3p), but may point to our `git_mmap()` wrapper that emulates the syscall on systems that do not have mmap(3p) available. Fix `reftable_block_source_from_file()` to properly bubble up the error code in case the map(3p) call fails. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/stack: stop using `write_in_full()`Patrick Steinhardt
Similar to the preceding commit, drop our use of `write_in_full()` and implement a new wrapper `reftable_write_full()` that handles this logic for us. This is done to reduce our dependency on the Git library. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-18reftable/stack: stop using `read_in_full()`Patrick Steinhardt
There is a single callsite of `read_in_full()` in the reftable library. Open-code the function to reduce our dependency on the Git library. Note that we only partially port over the logic from `read_in_full()` and its underlying `xread()` helper. Most importantly, the latter also knows to handle `EWOULDBLOCK` via `handle_nonblock()`. This logic is irrelevant for us though because the reftable library never sets the `O_NONBLOCK` option in the first place. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-02-14Merge branch 'kn/reflog-migration-fix-followup'Junio C Hamano
Code clean-up. * kn/reflog-migration-fix-followup: reftable: prevent 'update_index' changes after adding records refs: use 'uint64_t' for 'ref_update.index' refs: mark `ref_transaction_update_reflog()` as static
2025-02-06Merge branch 'ps/zlib-ng'Junio C Hamano
The code paths to interact with zlib has been cleaned up in preparation for building with zlib-ng. * ps/zlib-ng: ci: make "linux-musl" job use zlib-ng ci: switch linux-musl to use Meson compat/zlib: allow use of zlib-ng as backend git-zlib: cast away potential constness of `next_in` pointer compat/zlib: provide stubs for `deflateSetHeader()` compat/zlib: provide `deflateBound()` shim centrally git-compat-util: move include of "compat/zlib.h" into "git-zlib.h" compat: introduce new "zlib.h" header git-compat-util: drop `z_const` define compat: drop `uncompress2()` compatibility shim
2025-01-28git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"Patrick Steinhardt
We include "compat/zlib.h" in "git-compat-util.h", which is unnecessarily broad given that we only have a small handful of files that use the zlib library. Move the header into "git-zlib.h" instead and adapt users of zlib to include that header. One exception is the reftable library, as we don't want to use the Git-specific wrapper of zlib there, so we include "compat/zlib.h" instead. Furthermore, we move the include into "reftable/system.h" so that users of the library other than Git can wire up zlib themselves. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28compat: introduce new "zlib.h" headerPatrick Steinhardt
Introduce a new "compat/zlib-compat.h" header that we include instead of including <zlib.h> directly. This will allow us to wire up zlib-ng as an alternative backend for zlib compression in a subsequent commit. Note that we cannot just call the file "compat/zlib.h", as that may otherwise cause us to include that file instead of <zlib.h>. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-28Merge branch 'ps/reftable-sign-compare'Junio C Hamano
The reftable/ library code has been made -Wsign-compare clean. * ps/reftable-sign-compare: reftable: address trivial -Wsign-compare warnings reftable/blocksource: adjust `read_block()` to return `ssize_t` reftable/blocksource: adjust type of the block length reftable/block: adjust type of the restart length reftable/block: adapt header and footer size to return a `size_t` reftable/basics: adjust `hash_size()` to return `uint32_t` reftable/basics: adjust `common_prefix_size()` to return `size_t` reftable/record: handle overflows when decoding varints reftable/record: drop unused `print` function pointer meson: stop disabling -Wsign-compare
2025-01-22reftable: prevent 'update_index' changes after adding recordsKarthik Nayak
The function `reftable_writer_set_limits()` allows updating the 'min_update_index' and 'max_update_index' of a reftable writer. These values are written to both the writer's header and footer. Since the header is written during the first block write, any subsequent changes to the update index would create a mismatch between the header and footer values. The footer would contain the newer values while the header retained the original ones. To protect against this bug, prevent callers from updating these values after any record is written. To do this, modify the function to return an error whenever the limits are modified after any record adds. Check for record adds within `reftable_writer_set_limits()` by checking the `last_key` and `next` variable. The former is updated after each record added, but is reset at certain points. The latter is set after writing the first block. Modify all callers of the function to anticipate a return type and handle it accordingly. Add a unit test to also ensure the function returns the error as expected. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable: address trivial -Wsign-compare warningsPatrick Steinhardt
Address the last couple of trivial -Wsign-compare warnings in the reftable library and remove the DISABLE_SIGN_COMPARE_WARNINGS macro that we have in "reftable/system.h". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/blocksource: adjust `read_block()` to return `ssize_t`Patrick Steinhardt
The `block_source_read_block()` function and its implementations return an integer as a result that reflects either the number of bytes read, or an error. As such its return type, a signed integer, isn't wrong, but it doesn't give the reader a good hint what it actually returns. Refactor the function to return an `ssize_t` instead, which is typical for functions similar to read(3p) and should thus give readers a better signal what they can expect as a result. Adjust callers to better handle the returned value to avoid warnings with -Wsign-compare. One of these callers is `reader_get_block()`, whose return value is only ever used by its callers to figure out whether or not the read was successful. So instead of bubbling up the `ssize_t` there, too, we adapt it to only indicate success or errors. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/blocksource: adjust type of the block lengthPatrick Steinhardt
The block length is used to track the number of bytes available in a specific block. As such, it is never set to a negative value, but is still represented by a signed integer. Adjust the type of the variable to be `size_t`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/block: adjust type of the restart lengthPatrick Steinhardt
The restart length is tracked as a positive integer even though it cannot ever be negative. Furthermore, it is effectively capped via the MAX_RESTARTS variable. Adjust the type of the variable to be `uint32_t`. While this type is excessive given that MAX_RESTARTS fits into an `uint16_t`, other places already use 32 bit integers for restarts, so this type is being more consistent. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/block: adapt header and footer size to return a `size_t`Patrick Steinhardt
The functions `header_size()` and `footer_size()` return a positive integer representing the size of the header and footer, respectively, dependent on the version of the reftable format. Similar to the preceding commit, these functions return a signed integer though, which is nonsensical given that there is no way for these functions to return negative. Adapt the functions to return a `size_t` instead to fix a couple of sign comparison warnings. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/basics: adjust `hash_size()` to return `uint32_t`Patrick Steinhardt
The `hash_size()` function returns the number of bytes used by the hash function. Weirdly enough though, it returns a signed integer for its size even though the size obviously cannot ever be negative. The only case where it could be negative is if the function returned an error when asked for an unknown hash, but we assert(3p) instead. Adjust the type of `hash_size()` to be `uint32_t` and adapt all places that use signed integers for the hash size to follow suit. This also allows us to get rid of a couple asserts that we had which verified that the size was indeed positive, which further stresses the point that this refactoring makes sense. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/basics: adjust `common_prefix_size()` to return `size_t`Patrick Steinhardt
The `common_prefix_size()` function computes the length of the common prefix between two buffers. As such its return value will always be an unsigned integer, as the length cannot be negative. Regardless of that, the function returns a signed integer, which is nonsensical and causes a couple of -Wsign-compare warnings all over the place. Adjust the function to return a `size_t` instead. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/record: handle overflows when decoding varintsPatrick Steinhardt
The logic to decode varints isn't able to detect integer overflows: as long as the buffer still has more data available, and as long as the current byte has its 0x80 bit set, we'll continue to add up these values to the result. This will eventually cause the `uint64_t` to overflow, at which point we'll return an invalid result. Refactor the function so that it is able to detect such overflows. The implementation is basically copied from Git's own `decode_varint()`, which already knows to handle overflows. The only adjustment is that we also take into account the string view's length in order to not overrun it. The reftable documentation explicitly notes that those two encoding schemas are supposed to be the same: Varint encoding ^^^^^^^^^^^^^^^ Varint encoding is identical to the ofs-delta encoding method used within pack files. Decoder works as follows: .... val = buf[ptr] & 0x7f while (buf[ptr] & 0x80) { ptr++ val = ((val + 1) << 7) | (buf[ptr] & 0x7f) } .... While at it, refactor `put_var_int()` in the same way by copying over the implementation of `encode_varint()`. While `put_var_int()` doesn't have an issue with overflows, it generates warnings with -Wsign-compare. The implementation of `encode_varint()` doesn't, is battle-tested and at the same time way simpler than what we currently have. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-21reftable/record: drop unused `print` function pointerPatrick Steinhardt
In 42c424d69d (t/helper: inline printing of reftable records, 2024-08-22) we stopped using the `print` function of the reftable record vtable and instead moved its implementation into the single user of it. We didn't remove the function itself from the vtable though. Drop it. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07reftable/stack: accept insecure random bytesPatrick Steinhardt
The reftable library uses randomness in two call paths: - When reading a stack in case some of the referenced tables disappears. The randomness is used to delay the next read by a couple of milliseconds. - When writing a new table, where the randomness gets appended to the table name (e.g. "0x000000000001-0x000000000002-0b1d8ddf.ref"). In neither of these cases do we need strong randomness. Unfortunately though, we have observed test failures caused by the former case. In t0610 we have a test that spawns a 100 processes at once, all of which try to write a new table to the stack. And given that all of the processes will require randomness, it can happen that these processes make the entropy pool run dry, which will then cause us to die: + test_seq 100 + printf %s commit\trefs/heads/branch-%s\n 68d032e9edd3481ac96382786ececc37ec28709e 1 + printf %s commit\trefs/heads/branch-%s\n 68d032e9edd3481ac96382786ececc37ec28709e 2 ... + git update-ref refs/heads/branch-98 HEAD + git update-ref refs/heads/branch-97 HEAD + git update-ref refs/heads/branch-99 HEAD + git update-ref refs/heads/branch-100 HEAD fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes fatal: unable to get random bytes The report was for NonStop, which uses OpenSSL as the backend for randomness. In the preceding commit we have adapted that backend to also return randomness in case the entropy pool is empty and the caller passes the `CSPRNG_BYTES_INSECURE` flag. Do so to fix the issue. Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-01-07wrapper: allow generating insecure random bytesPatrick Steinhardt
The `csprng_bytes()` function generates randomness and writes it into a caller-provided buffer. It abstracts over a couple of implementations, where the exact one that is used depends on the platform. These implementations have different guarantees: while some guarantee to never fail (arc4random(3)), others may fail. There are two significant failures to distinguish from one another: - Systemic failure, where e.g. opening "/dev/urandom" fails or when OpenSSL doesn't have a provider configured. - Entropy failure, where the entropy pool is exhausted, and thus the function cannot guarantee strong cryptographic randomness. While we cannot do anything about the former, the latter failure can be acceptable in some situations where we don't care whether or not the randomness can be predicted. Introduce a new `CSPRNG_BYTES_INSECURE` flag that allows callers to opt into weak cryptographic randomness. The exact behaviour of the flag depends on the underlying implementation: - `arc4random_buf()` never returns an error, so it doesn't change. - `getrandom()` pulls from "/dev/urandom" by default, which never blocks on modern systems even when the entropy pool is empty. - `getentropy()` seems to block when there is not enough randomness available, and there is no way of changing that behaviour. - `GtlGenRandom()` doesn't mention anything about its specific failure mode. - The fallback reads from "/dev/urandom", which also returns bytes in case the entropy pool is drained in modern Linux systems. That only leaves OpenSSL with `RAND_bytes()`, which returns an error in case the returned data wouldn't be cryptographically safe. This function is replaced with a call to `RAND_pseudo_bytes()`, which can indicate whether or not the returned data is cryptographically secure via its return value. If it is insecure, and if the `CSPRNG_BYTES_INSECURE` flag is set, then we ignore the insecurity and return the data regardless. It is somewhat questionable whether we really need the flag in the first place, or whether we wouldn't just ignore the potentially-insecure data. But the risk of doing that is that we might have or grow callsites that aren't aware of the potential insecureness of the data in places where it really matters. So using a flag to opt-in to that behaviour feels like the more secure choice. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-28reftable: handle realloc error in parse_names()René Scharfe
Check the final reallocation for adding the terminating NULL and handle it just like those in the loop. Simply use REFTABLE_ALLOC_GROW instead of keeping the REFTABLE_REALLOC_ARRAY call and adding code to preserve the original pointer value around it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-28reftable: fix allocation count on realloc errorRené Scharfe
When realloc(3) fails, it returns NULL and keeps the original allocation intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and the allocation count variable in that case, simultaneously leaking the original allocation and misrepresenting the number of storable items. parse_names() avoids the leak by keeping the original pointer if reallocation fails, but still increase the allocation count in such a case as if it succeeded. That's OK, because the error handling code just frees everything and doesn't look at names_cap anymore. reftable_buf_add() does the same, but here it is a problem as it leaves the reftable_buf in a broken state, with ->alloc being roughly twice as big as the actually allocated memory, allowing out-of-bounds writes in subsequent calls. Reimplement REFTABLE_ALLOC_GROW to avoid leaks, keep allocation counts in sync and still signal failures to callers while avoiding code duplication in callers. Make it an expression that evaluates to 0 if no reallocation is needed or it succeeded and 1 on failure while keeping the original pointer and allocation counter values. Adjust REFTABLE_ALLOC_GROW_OR_NULL to the new calling convention for REFTABLE_ALLOC_GROW, but keep its support for non-size_t alloc variables for now. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-28reftable: avoid leaks on realloc errorRené Scharfe
When realloc(3) fails, it returns NULL and keeps the original allocation intact. REFTABLE_ALLOC_GROW overwrites both the original pointer and the allocation count variable in that case, simultaneously leaking the original allocation and misrepresenting the number of storable items. parse_names() and reftable_buf_add() avoid leaking by restoring the original pointer value on failure, but all other callers seem to be OK with losing the old allocation. Add a new variant of the macro, REFTABLE_ALLOC_GROW_OR_NULL, which plugs the leak and zeros the allocation counter. Use it for those callers. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-23Merge branch 'ps/build-sign-compare'Junio C Hamano
Start working to make the codebase buildable with -Wsign-compare. * ps/build-sign-compare: t/helper: don't depend on implicit wraparound scalar: address -Wsign-compare warnings builtin/patch-id: fix type of `get_one_patchid()` builtin/blame: fix type of `length` variable when emitting object ID gpg-interface: address -Wsign-comparison warnings daemon: fix type of `max_connections` daemon: fix loops that have mismatching integer types global: trivial conversions to fix `-Wsign-compare` warnings pkt-line: fix -Wsign-compare warning on 32 bit platform csum-file: fix -Wsign-compare warning on 32-bit platform diff.h: fix index used to loop through unsigned integer config.mak.dev: drop `-Wno-sign-compare` global: mark code units that generate warnings with `-Wsign-compare` compat/win32: fix -Wsign-compare warning in "wWinMain()" compat/regex: explicitly ignore "-Wsign-compare" warnings git-compat-util: introduce macros to disable "-Wsign-compare" warnings
2024-12-23Merge branch 'kn/reftable-writer-log-write-verify'Junio C Hamano
Reftable backend adds check for upper limit of log's update_index. * kn/reftable-writer-log-write-verify: reftable/writer: ensure valid range for log's update_index
2024-12-23Merge branch 'ps/reftable-alloc-failures-zalloc-fix'Junio C Hamano
Recent reftable updates mistook a NULL return from a request for 0-byte allocation as OOM and died unnecessarily, which has been corrected. * ps/reftable-alloc-failures-zalloc-fix: reftable/basics: return NULL on zero-sized allocations reftable/stack: fix zero-sized allocation when there are no readers reftable/merged: fix zero-sized allocation when there are no readers reftable/stack: don't perform auto-compaction with less than two tables
2024-12-22reftable/basics: return NULL on zero-sized allocationsPatrick Steinhardt
In the preceding commits we have fixed a couple of issues when allocating zero-sized objects. These issues were masked by implementation-defined behaviour. Quoting malloc(3p): If size is 0, either: * A null pointer shall be returned and errno may be set to an implementation-defined value, or * A pointer to the allocated space shall be returned. The application shall ensure that the pointer is not used to access an object. So it is perfectly valid that implementations of this function may or may not return a NULL pointer in such a case. Adapt both `reftable_malloc()` and `reftable_realloc()` so that they return NULL pointers on zero-sized allocations. This should remove any implementation-defined behaviour in our allocators and thus allows us to detect such platform-specific issues more easily going forward. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22reftable/stack: fix zero-sized allocation when there are no readersPatrick Steinhardt
Similar as the preceding commit, we may try to do a zero-sized allocation when reloading a reftable stack that ain't got any tables. It is implementation-defined whether malloc(3p) returns a NULL pointer in that case or a zero-sized object. In case it does return a NULL pointer though it causes us to think we have run into an out-of-memory situation, and thus we return an error. Fix this by only allocating arrays when they have at least one entry. Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22reftable/merged: fix zero-sized allocation when there are no readersPatrick Steinhardt
It was reported [1] that Git started to fail with an out-of-memory error when initializing repositories with the reftable backend on NonStop platforms. A bisect led to 802c0646ac (reftable/merged: handle allocation failures in `merged_table_init_iter()`, 2024-10-02), which changed how we allocate memory when initializing a merged table. The root cause of this seems to be that NonStop returns a `NULL` pointer when doing a zero-sized allocation. This would've already happened before the above change, but we never noticed because we did not check the result. Now we do notice and thus return an out-of-memory error to the caller. Fix the issue by skipping the allocation altogether in case there are no readers. [1]: <00ad01db5017$aa9ce340$ffd6a9c0$@nexbridge.com> Reported-by: Randall S. Becker <rsbecker@nexbridge.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-22reftable/stack: don't perform auto-compaction with less than two tablesPatrick Steinhardt
In order to compact tables we need at least two tables. Bail out early from `reftable_stack_auto_compact()` in case we have less than two tables. In the original, `stack_table_sizes_for_compaction()` yields an array that has the same length as the number of tables. This array is then passed on to `suggest_compaction_segment()`, which returns an empty segment in case we have less than two tables. The segment is then passed to `segment_size()`, which will return `0` because both start and end of the segment are `0`. And because we only call `stack_compact_range()` in case we have a positive segment size we don't perform auto-compaction at all. Consequently, this change does not result in a user-visible change in behaviour when called with a single table. But when called with no tables this protects us against a potential out-of-memory error: `stack_table_sizes_for_compaction()` would try to allocate a zero-byte object when there aren't any tables, and that may lead to a `NULL` pointer on some platforms like NonStop which causes us to bail out with an out-of-memory error. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-10Merge branch 'ps/reftable-iterator-reuse'Junio C Hamano
Optimize reading random references out of the reftable backend by allowing reuse of iterator objects. * ps/reftable-iterator-reuse: refs/reftable: reuse iterators when reading refs reftable/merged: drain priority queue on reseek reftable/stack: add mechanism to notify callers on reload refs/reftable: refactor reflog expiry to use reftable backend refs/reftable: refactor reading symbolic refs to use reftable backend refs/reftable: read references via `struct reftable_backend` refs/reftable: figure out hash via `reftable_stack` reftable/stack: add accessor for the hash ID refs/reftable: handle reloading stacks in the reftable backend refs/reftable: encapsulate reftable stack
2024-12-10Merge branch 'ps/reftable-detach'Junio C Hamano
Isolates the reftable subsystem from the rest of Git's codebase by using fewer pieces of Git's infrastructure. * ps/reftable-detach: reftable/system: provide thin wrapper for lockfile subsystem reftable/stack: drop only use of `get_locked_file_path()` reftable/system: provide thin wrapper for tempfile subsystem reftable/stack: stop using `fsync_component()` directly reftable/system: stop depending on "hash.h" reftable: explicitly handle hash format IDs reftable/system: move "dir.h" to its only user
2024-12-07reftable/writer: ensure valid range for log's update_indexKarthik Nayak
Each reftable addition has an associated update_index. While writing refs, the update_index is verified to be within the range of the reftable writer, i.e. `writer.min_update_index <= ref.update_index` and `writer.max_update_index => ref.update_index`. The corresponding check for reflogs in `reftable_writer_add_log` is however missing. Add a similar check, but only check for the upper limit. This is because reflogs are treated a bit differently than refs. Each reflog entry in reftable has an associated update_index and we also allow expiring entries in the middle, which is done by simply writing a new reflog entry with the same update_index. This means, writing reflog entries with update_index lesser than the writer's update_index is an expected scenario. Add a new unit test to check for the limits and fix some of the existing tests, which were setting arbitrary values for the update_index by ensuring they stay within the now checked limits. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26reftable/merged: drain priority queue on reseekPatrick Steinhardt
In 5bf96e0c39 (reftable/generic: move seeking of records into the iterator, 2024-05-13) we have refactored the reftable codebase such that iterators can be initialized once and then re-seeked multiple times. This feature is used by 1869525066 (refs/reftable: wire up support for exclude patterns, 2024-09-16) in order to skip records based on exclude patterns provided by the caller. The logic to re-seek the merged iterator is insufficient though because we don't drain the priority queue on a re-seek. This means that the queue may contain stale entries and thus reading the next record in the queue will return the wrong entry. While this is an obvious bug, it is harmless in the context of above exclude patterns: - If the queue contained stale entries that match the pattern then the caller would already know to filter out such refs. This is because our codebase is prepared to handle backends that don't have a way to efficiently implement exclude patterns. - If the queue contained stale entries that don't match the pattern we'd eventually filter out any duplicates. This is because the reftable code discards items with the same ref name and sorts any remaining entries properly. So things happen to work in this context regardless of the bug, and there is no other use case yet where we re-seek iterators. We're about to introduce a caching mechanism though where iterators are reused by the reftable backend, and that will expose the bug. Fix the issue by draining the priority queue when seeking and add a testcase that surfaces the issue. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26reftable/stack: add mechanism to notify callers on reloadPatrick Steinhardt
Reftable stacks are reloaded in two cases: - When calling `reftable_stack_reload()`, if the stat-cache tells us that the stack has been modified. - When committing a reftable addition. While callers can figure out the second case, they do not have a mechanism to figure out whether `reftable_stack_reload()` led to an actual reload of the on-disk data. All they can do is thus to assume that data is always being reloaded in that case. Improve the situation by introducing a new `on_reload()` callback to the reftable options. If provided, the function will be invoked every time the stack has indeed been reloaded. This allows callers to invalidate data that depends on the current stack data. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26reftable/stack: add accessor for the hash IDPatrick Steinhardt
Add an accessor function that allows callers to access the hash ID of a reftable stack. This function will be used in a subsequent commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-26reftable: rename scratch bufferPatrick Steinhardt
Both `struct block_writer` and `struct reftable_writer` have a `buf` member that is being reused to optimize the number of allocations. Rename the variable to `scratch` to clarify its intend and provide a comment explaining why it exists. Suggested-by: Christian Couder <christian.couder@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21reftable/block: optimize allocations by using scratch bufferPatrick Steinhardt
The block writer needs to compute the key for every record that one adds to the writer. The buffer for this key is stored on the stack and thus reallocated on every call to `block_writer_add()`, which is inefficient. Refactor the code so that we store the buffer in the `block_writer` struct itself so that we can reuse it. This reduces the number of allocations when writing many refs, e.g. when migrating one million refs from the "files" backend to the "reftable backend. Before this change: HEAP SUMMARY: in use at exit: 80,048 bytes in 49 blocks total heap usage: 3,025,864 allocs, 3,025,815 frees, 372,746,291 bytes allocated After this change: HEAP SUMMARY: in use at exit: 80,048 bytes in 49 blocks total heap usage: 2,013,250 allocs, 2,013,201 frees, 347,543,583 bytes allocated Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-11-21reftable/block: rename `block_writer::buf` variablePatrick Steinhardt
Adapt the name of the `block_writer::buf` variable to instead be called `block`. This aligns it with the existing `block_len` variable, which tracks the length of this buffer, and is generally a bit more tied to the actual context where this variable gets used. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>