| author | Ezekiel Newren <ezekielnewren@gmail.com> | 2025-11-18 22:34:18 +0000 |
|---|---|---|
| committer | Junio C Hamano <gitster@pobox.com> | 2025-11-18 14:53:10 -0800 |
| commit | 6a26019c81faa07ba811541b4cf35be9e8ee1ead (patch) | |
| tree | 1f1c230dd72da18f65389c727faaa6c0d4f9bbc5 /xdiff/xtypes.h | |
| parent | b0d4ae30f5a23fa9da87e9396b78e6442b351ddc (diff) | |
xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash

The ha field is serving two different purposes, which makes the code
harder to read. At first glance, it looks like many places assume
there could never be hash collisions between lines of the two input
files. In reality, line_hash is used together with xdl_recmatch() to
ensure correct comparisons of lines, even when collisions occur.

To make this clearer, the old ha field has been split:

  * line_hash: a straightforward hash of a line, independent of any
    external context. Its type is uint64_t, as it comes from a fixed
    width hash function.
  * minimal_perfect_hash: Not a new concept, but now a separate
    field. It comes from the classifier's general-purpose hash table,
    which assigns each line a unique and minimal hash across the two
    files. A size_t is used here because it's meant to be used to
    index an array. This also avoids ` as usize` casts on the Rust
    side when using it to index a slice.

Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
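
The distinction between the two fields can be illustrated with a small, self-contained sketch. It is not the xdiff implementation. The names `record_t`, `toy_line_hash`, `toy_recmatch`, `toy_classifier_t`, `toy_classify`, and `toy_record` are hypothetical. FNV-1a is an assumed stand-in for whatever hash xdiff actually uses, a byte-exact compare stands in for xdl_recmatch(), and a linear scan replaces the classifier's hash table to keep the example short. Only the struct layout is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the post-patch xrecord_t; the type name is hypothetical. */
typedef struct {
	uint8_t const *ptr;
	size_t size;
	uint64_t line_hash;          /* context-free hash of the line bytes */
	size_t minimal_perfect_hash; /* dense id, usable directly as an array index */
} record_t;

/* FNV-1a, assumed here only as a stand-in for a fixed-width line hash. */
static uint64_t toy_line_hash(uint8_t const *ptr, size_t size)
{
	uint64_t h = 14695981039346656037ULL;
	for (size_t i = 0; i < size; i++) {
		h ^= ptr[i];
		h *= 1099511628211ULL;
	}
	return h;
}

/* Byte-exact comparison; plays the role the text gives to xdl_recmatch(). */
static int toy_recmatch(record_t const *a, record_t const *b)
{
	return a->size == b->size &&
	       (a->size == 0 || !memcmp(a->ptr, b->ptr, a->size));
}

#define TOY_MAX_CLASSES 64

/* Toy classifier: remembers the first record seen for each distinct line. */
typedef struct {
	record_t const *first[TOY_MAX_CLASSES];
	size_t count;
} toy_classifier_t;

/*
 * Fill in both fields of rec. Equal hashes alone are never trusted: the
 * bytes are compared as well, so a collision cannot merge two lines.
 * rec must outlive cf, because cf keeps a pointer to it.
 */
static void toy_classify(toy_classifier_t *cf, record_t *rec)
{
	rec->line_hash = toy_line_hash(rec->ptr, rec->size);
	for (size_t id = 0; id < cf->count; id++) {
		record_t const *seen = cf->first[id];
		if (seen->line_hash == rec->line_hash && toy_recmatch(seen, rec)) {
			rec->minimal_perfect_hash = id;
			return;
		}
	}
	assert(cf->count < TOY_MAX_CLASSES);
	cf->first[cf->count] = rec;
	rec->minimal_perfect_hash = cf->count++;
}

/* Convenience constructor for a record over a NUL-terminated string. */
static record_t toy_record(char const *s)
{
	record_t rec = { (uint8_t const *)s, strlen(s), 0, 0 };
	return rec;
}
```

After every line from both files has been passed through `toy_classify`, identical lines share one `minimal_perfect_hash` in the range 0 to `count - 1`, so a per-class table can be indexed with it without any cast, which is the motivation the message gives for choosing size_t.
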
Diffstat (limited to 'xdiff/xtypes.h')
| -rw-r--r-- | xdiff/xtypes.h | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
index 354349b523..d4e9cd2e76 100644
--- a/xdiff/xtypes.h
+++ b/xdiff/xtypes.h
@@ -41,7 +41,8 @@ typedef struct s_chastore {
 typedef struct s_xrecord {
 	uint8_t const *ptr;
 	size_t size;
-	unsigned long ha;
+	uint64_t line_hash;
+	size_t minimal_perfect_hash;
 } xrecord_t;
 
 typedef struct s_xdfile {
