path: root/object-file.h
Age | Commit message | Author
2025-04-15 | object-file: split out functions relating to object store subsystem | Patrick Steinhardt

While we have the "object-store.h" header, most of the functionality for object stores is actually hosted in "object-file.c". This makes it hard to find relevant functions and causes us to mix up concerns.

Split out functions relating to the object store subsystem into a new "object-store.c" file.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-04-15 | object-file: move `git_open_cloexec()` to "compat/open.c" | Patrick Steinhardt

The `git_open_cloexec()` wrapper function provides the ability to open a file with `O_CLOEXEC` in a platform-agnostic way. This function is provided by "object-file.c" even though it is not specific to the object subsystem at all.

Move the file into "compat/open.c". This file already exists before this commit, but has only been compiled conditionally depending on whether or not open(3p) may return EINTR. With this change we now unconditionally compile the object, but wrap `git_open_with_retry()` in an ifdef.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
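
To illustrate what "opening a file with `O_CLOEXEC` in a platform-agnostic way" can involve, here is a minimal sketch. It is not the code from "compat/open.c": the name `open_cloexec_sketch()` is hypothetical, and the fallback strategy (retry without the flag on EINVAL, then set `FD_CLOEXEC` via fcntl(2)) is an assumption about one reasonable approach. It is meant for opens without `O_CREAT`, since no mode argument is passed.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Platforms without O_CLOEXEC get a no-op flag (assumption for this sketch). */
#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif

/*
 * Hypothetical helper, not Git's implementation. Opens `path` with
 * close-on-exec set, falling back to fcntl() when the open flag is
 * unavailable or rejected. Do not pass O_CREAT: no mode is supplied.
 */
int open_cloexec_sketch(const char *path, int flags)
{
	int fd = open(path, flags | O_CLOEXEC);

	/* Some kernels reject unknown flags with EINVAL; retry without it. */
	if (fd < 0 && O_CLOEXEC && errno == EINVAL)
		fd = open(path, flags);

	if (fd >= 0) {
		/* Make sure the descriptor really is close-on-exec. */
		int fdflags = fcntl(fd, F_GETFD);

		if (fdflags >= 0 && !(fdflags & FD_CLOEXEC))
			fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC);
	}
	return fd;
}
```

A caller would use it like open(2), for example `open_cloexec_sketch("/dev/null", O_RDONLY)`, and close the descriptor with close(2) when done.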
2025-04-15 | object-file: move `safe_create_leading_directories()` into "path.c" | Patrick Steinhardt

The `safe_create_leading_directories()` function and its relatives are located in "object-file.c", which is not a good fit as they provide generic functionality not related to objects at all. Move them into "path.c", which already hosts `safe_create_dir()` and its relative `safe_create_dir_in_gitdir()`.

"path.c" is free of `the_repository`, but the moved functions depend on `the_repository` to read the "core.sharedRepository" config. Adapt the function signature to accept a repository as argument to fix the issue and adjust callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
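
For readers unfamiliar with this kind of helper, the following sketch shows the general idea of creating every leading directory of a path. It is not the function from "path.c": the name `create_leading_dirs_sketch()` is hypothetical, and the explicit `mode` parameter is an assumed stand-in for the permission handling that the real code derives from the repository's "core.sharedRepository" setting.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not Git's implementation. Creates each directory
 * component leading up to the final component of `path`; the final
 * component itself is left alone. `path` is modified temporarily and
 * restored before returning. Returns 0 on success, -1 on failure.
 */
int create_leading_dirs_sketch(char *path, mode_t mode)
{
	char *p = path;

	/* Skip leading slashes so we never try to create the root. */
	while (*p == '/')
		p++;

	for (; *p; p++) {
		struct stat st;
		int ret;

		if (*p != '/')
			continue;

		*p = '\0';	/* terminate at this component */
		ret = mkdir(path, mode);
		if (ret < 0 && errno == EEXIST) {
			/* An existing directory is fine; anything else is not. */
			if (!stat(path, &st) && S_ISDIR(st.st_mode))
				ret = 0;
			else
				errno = ENOTDIR;
		}
		*p = '/';	/* restore the separator */

		if (ret < 0)
			return -1;
	}
	return 0;
}
```

Passing the mode (or, in the commit above, the repository) as an argument rather than reading global state is what keeps such a helper usable from code that must not depend on `the_repository`.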
2025-04-15 | object-file: move `mkdir_in_gitdir()` into "path.c" | Patrick Steinhardt

The `mkdir_in_gitdir()` function is similar to `safe_create_dir()`, but the former is hosted in "object-file.c" whereas the latter is hosted in "path.c". The latter code unit makes way more sense though as the logic has nothing to do with object files in particular.

Move the file into "path.c". While at it, we:

  - Rename the function to `safe_create_dir_in_gitdir()` so that the function names are similar to one another.

  - Remove the dependency on `the_repository` by making the callers pass the repository instead.

Adjust callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-27 | finalize_object_file(): implement collision check | Taylor Blau

We've had "FIXME!!! Collision check here ?" in finalize_object_file() since aac1794132 (Improve sha1 object file writing., 2005-05-03). That is, when we try to write a file with the same name, we assume the on-disk contents are the same and blindly throw away the new copy.

One of the reasons we never implemented this is because the files it moves are all named after the cryptographic hash of their contents (either loose objects, or packs which have their hash in the name these days). So we are unlikely to see such a collision by accident. And even though there are weaknesses in sha1, we assume they are mitigated by our use of sha1dc.

So while it's a theoretical concern now, it hasn't been a priority. However, if we start using weaker hashes for pack checksums and names, this will become a practical concern. So in preparation, let's actually implement a byte-for-byte collision check.

The new check will cause the write of new differing content to be a failure, rather than a silent noop, and we'll retain the temporary file on disk. If there's no collision present, we'll clean up the temporary file as usual after either rename()-ing or link()-ing it into place.

Note that this may cause some extra computation when the files are in fact identical, but this should happen rarely.

Loose objects are exempt from this check, and the collision check may be skipped by calling the _flags variant of this function with the FOF_SKIP_COLLISION_CHECK bit set. This is done for a couple of reasons:

  - We don't treat the hash of the loose object file's contents as a checksum, since the same loose object can be stored using different bytes on disk (e.g., when adjusting core.compression, using a different version of zlib, etc.).

    This is fundamentally different from cases where finalize_object_file() is operating over a file which uses the hash value as a checksum of the contents. In other words, a pair of identical loose objects can be stored using different bytes on disk, and that should not be treated as a collision.

  - We already use the path of the loose object as its hash value / object name, so checking for collisions at the content level doesn't add anything.

    Adding a content-level collision check would have to happen at a higher level than in finalize_object_file(), since (avoiding race conditions) writing an object loose which already exists in the repository will prevent us from even reaching finalize_object_file() via the object freshening code.

    There is a collision check in index-pack via its `check_collision()` function, but there isn't an analogous function in unpack-objects, which just feeds the result to write_object_file().

    So skipping the collision check here does not change for better or worse the hardness of loose object writes.

As a small note related to the latter bullet point above, we must teach the tmp-objdir routines to similarly skip the content-level collision checks when calling migrate_one() on a loose object file, which we do by setting the FOF_SKIP_COLLISION_CHECK bit when we are inside of a loose object shard.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
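
The behaviour described in this message (fail on differing content and keep the temporary file, clean up when the contents match, allow callers to opt out) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from "object-file.c": `files_differ_sketch()`, `finalize_file_sketch()` and `SKETCH_SKIP_COLLISION_CHECK` are hypothetical names, the flag merely mirrors the role of FOF_SKIP_COLLISION_CHECK, and only the link() path is shown.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the FOF_SKIP_COLLISION_CHECK bit. */
#define SKETCH_SKIP_COLLISION_CHECK 1u

/*
 * Byte-for-byte comparison of two regular files.
 * Returns 0 if identical, 1 if they differ, -1 on I/O error.
 */
static int files_differ_sketch(const char *a, const char *b)
{
	FILE *fa = fopen(a, "rb");
	FILE *fb = fopen(b, "rb");
	char bufa[4096], bufb[4096];
	int result = 0;

	if (!fa || !fb) {
		result = -1;
		goto out;
	}
	for (;;) {
		size_t na = fread(bufa, 1, sizeof(bufa), fa);
		size_t nb = fread(bufb, 1, sizeof(bufb), fb);

		if (ferror(fa) || ferror(fb)) {
			result = -1;
			break;
		}
		/* Different chunk lengths or bytes mean different contents. */
		if (na != nb || memcmp(bufa, bufb, na)) {
			result = 1;
			break;
		}
		if (!na)
			break;	/* both files hit EOF together */
	}
out:
	if (fa)
		fclose(fa);
	if (fb)
		fclose(fb);
	return result;
}

/*
 * Hypothetical sketch: move `tmp` into place as `dest`. If `dest`
 * already exists, compare contents unless the skip flag is set. On a
 * collision (or comparison error) return -1 and keep `tmp` on disk.
 */
int finalize_file_sketch(const char *tmp, const char *dest, unsigned flags)
{
	if (!link(tmp, dest))
		return unlink(tmp);	/* linked into place; drop the temp name */
	if (errno != EEXIST)
		return -1;
	if (!(flags & SKETCH_SKIP_COLLISION_CHECK) &&
	    files_differ_sketch(tmp, dest) != 0)
		return -1;		/* differing content: retain tmp */
	return unlink(tmp);		/* same content, or check skipped */
}
```

With the flag set, an existing destination is trusted without comparison, which corresponds to the loose-object case the message exempts from the check.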
2023-04-24 | hash-ll.h: split out of hash.h to remove dependency on repository.h | Elijah Newren

hash.h depends upon and includes repository.h, due to the definition and use of the_hash_algo (defined as the_repository->hash_algo). However, most headers trying to include hash.h are only interested in the layout of the structs like object_id. Move the parts of hash.h that do not depend upon repository.h into a new file hash-ll.h (the "low level" parts of hash.h), and adjust other files to use this new header where the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers. It also exposes a lot of hidden dependencies on both path.h (which was brought in by repository.h) and repository.h (which was previously implicitly brought in by object.h), so also adjust other files to be more explicit about what they depend upon.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11 | object-file.h: move declarations for object-file.c functions from cache.h | Elijah Newren

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>