aboutsummaryrefslogtreecommitdiff
path: root/hash.c
AgeCommit message (Collapse)Author
2026-03-20object-name: simplify computing common prefixesPatrick Steinhardt
The function `extend_abbrev_len()` computes the length of common hex characters between two object IDs. This is done by: - Making the caller provide the `hex` string for the needle object ID. - Comparing every hex position of the haystack object ID with `get_hex_char_from_oid()`. Turning the binary representation into hex first is roundabout though: we can simply compare the binary representation and give some special attention to the final nibble. Introduce a new function `oid_common_prefix_hexlen()` that does exactly this and refactor the code to use the new function. This allows us to drop the `struct min_abbrev_data::hex` field. Furthermore, this function will be used in by some other callsites in subsequent commits. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-07hash: expose hash context functions to Rustbrian m. carlson
We'd like to be able to hash our data in Rust using the same contexts as in C. However, we need our helper functions to not be inline so they can be linked into the binary appropriately. In addition, to avoid managing memory manually and since we don't know the size of the hash context structure, we want to have simple alloc and free functions we can use to make sure a context can be easily dynamically created. Expose the helper functions and create alloc, free, and init functions we can call. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-07hash: add a function to look up hash algo structsbrian m. carlson
In C, it's easy for us to look up a hash algorithm structure by its offset by simply indexing the hash_algos array. However, in Rust, we sometimes need a pointer to pass to a C function, but we have our own hash algorithm abstraction. To get one from the other, let's provide a simple function that looks up the C structure from the offset and expose it in Rust. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2026-02-07hash: use uint32_t for object_id algorithmbrian m. carlson
We currently use an int for this value, but we'll define this structure from Rust in a future commit and we want to ensure that our data types are exactly identical. To make that possible, use a uint32_t for the hash algorithm. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: stop depending on `the_repository` in `null_oid()`Patrick Steinhardt
The `null_oid()` function returns the object ID that only consists of zeroes. Naturally, this ID also depends on the hash algorithm used, as the number of zeroes is different between SHA1 and SHA256. Consequently, the function returns the hash-algorithm-specific null object ID. This is currently done by depending on `the_hash_algo`, which implicitly makes us depend on `the_repository`. Refactor the function to instead pass in the hash algorithm for which we want to retrieve the null object ID. Adapt callsites accordingly by passing in `the_repository`, thus bubbling up the dependency on that global variable by one layer. There are a couple of trivial exceptions for subsystems that already got rid of `the_repository`. These subsystems instead use the repository that is available via the calling context: - "builtin/grep.c" - "grep.c" - "refs/debug.c" There are also two non-trivial exceptions: - "diff-no-index.c": Here we know that we may not have a repository initialized at all, so we cannot rely on `the_repository`. Instead, we adapt `diff_no_index()` to get a `struct git_hash_algo` as parameter. The only caller is located in "builtin/diff.c", where we know to call `repo_set_hash_algo()` in case we're running outside of a Git repository. Consequently, it is fine to continue passing `the_repository->hash_algo` even in this case. - "builtin/ls-files.c": There is an in-flight patch series that drops `USE_THE_REPOSITORY_VARIABLE` in this file, which causes a semantic conflict because we use `null_oid()` in `show_submodule()`. The value is passed to `repo_submodule_init()`, which may use the object ID to resolve a tree-ish in the superproject from which we want to read the submodule config. As such, the object ID should refer to an object in the superproject, and consequently we need to use its hash algorithm. This means that we could in theory just not bother about this edge case at all and just use `the_repository` in "diff-no-index.c". But doing so would feel misdesigned. Remove the `USE_THE_REPOSITORY_VARIABLE` preprocessor define in "hash.c". Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10hash: fix "-Wsign-compare" warningsPatrick Steinhardt
There are a couple of trivial "-Wsign-compare" warnings in "hash.c". Fix them. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-03-10object-file: split out logic regarding hash algorithmsPatrick Steinhardt
While we have a "hash.h" header, the actual implementation of the subsystem is hosted by "object-file.c". This makes it harder than necessary to find the actual implementation of the hash subsystem and intermingles the different concerns with one another. Split out the implementation of hash algorithms into a new, separate "hash.c" file. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18remove old hash.[ch] implementationKarsten Blees
Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-18for_each_hash: allow passing a 'void *data' pointer to callbackLinus Torvalds
For the find_exact_renames() function, this allows us to pass the diff_options structure pointer to the low-level routines. We will use that to distinguish between the "rename" and "copy" cases. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-09Add 'const' where appropriate to index handling functionsLinus Torvalds
This is in an effort to make the source index of 'unpack_trees()' as being const, and thus making the compiler help us verify that we only access it for reading. The constification also extended to some of the hashing helpers that get called indirectly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-22hash: fix lookup_hash semanticsJeff King
We were returning the _address of_ the stored item (or NULL) instead of the item itself. While this sort of indirection is useful for insertion (since you can lookup and then modify), it is unnecessary for read-only lookup. Since the hash code splits these functions between the internal lookup_hash_entry function and the public lookup_hash function, it makes sense for the latter to provide what users of the library expect. The result of this was that the index caching returned bogus results on lookup. We unfortunately didn't catch this because we were returning a "struct cache_entry **" as a "void *", and accidentally assigning it to a "struct cache_entry *". As it happens, this actually _worked_ most of the time, because the entries were defined as: struct cache_entry { struct cache_entry *next; ... }; meaning that interpreting a "struct cache_entry **" as a "struct cache_entry *" would yield an entry where all fields were totally bogus _except_ for the next pointer, which pointed to the actual cache entry. When walking the list, we would look at the bogus "name" field, which was unlikely to match our lookup, and then proceed to the "real" entry. The reading of bogus data was silently ignored most of the time, but could cause a segfault for some data (which seems to be more common on OS X). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26Do linear-time/space rename logic for exact renamesLinus Torvalds
This implements a smarter rename detector for exact renames, which rather than doing a pairwise comparison (time O(m*n)) will just hash the files into a hash-table (size O(n+m)), and only do pairwise comparisons to renames that have the same hash (time O(n+m) except for unrealistic hash collissions, which we just cull aggressively). Admittedly the exact rename case is not nearly as interesting as the generic case, but it's an important case none-the-less. A similar general approach should work for the generic case too, but even then you do need to handle the exact renames/copies separately (to avoid the inevitable added cost factor that comes from the _size_ of the file), so this is worth doing. In the expectation that we will indeed do the same hashing trick for the general rename case, this code uses a generic hash-table implementation that can be used for other things too. In fact, we might be able to consolidate some of our existing hash tables with the new generic code in hash.[ch]. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>