aboutsummaryrefslogtreecommitdiff
path: root/xdiff/xtypes.h
AgeCommit message (Collapse)Author
2026-01-26xdiff: remove "line_hash" field from xrecord_tPhillip Wood
Prior to commit 6a26019c81 (xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash, 2025-11-18) the "ha" field of xrecord_t initially held the "line_hash" value and once the line had been interned that field was updated to hold the "minimal_perfect_hash". The "line_hash" is only used to intern the line so there is no point in storing it after all the input lines have been interned. Removing the "line_hash" field from xrecord_t and storing it in xdlclass_t where it is actually used makes it clearer that it is a temporary value and it should not be used once we're calculated the "minimal_perfect_hash". This also reduces the size of xrecord_t by 25% on 64-bit platforms and 40% on 32-bit platforms. While the struct is small we create one instance per input line so any saving is welcome. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: rename rindex -> reference_indexEzekiel Newren
The classic diff adds only the lines that it's going to consider, during the diff, to an array. A mapping between the compacted array, and the lines of the file that they reference, is facilitated by this array. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: change rindex from long to size_t in xdfile_tEzekiel Newren
The field rindex describes an index offset for other arrays. Change it to size_t. Changing the type of rindex from long to size_t has no cascading refactor impact because it is only ever used to directly index other arrays. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: make xdfile_t.nreff a size_t instead of longEzekiel Newren
size_t is used because nreff describes the number of elements in memory for rindex. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: make xdfile_t.nrec a size_t instead of longEzekiel Newren
size_t is used because nrec describes the number of elements for both recs, and for 'changed' + 2. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hashEzekiel Newren
The ha field is serving two different purposes, which makes the code harder to read. At first glance, it looks like many places assume there could never be hash collisions between lines of the two input files. In reality, line_hash is used together with xdl_recmatch() to ensure correct comparisons of lines, even when collisions occur. To make this clearer, the old ha field has been split: * line_hash: a straightforward hash of a line, independent of any external context. Its type is uint64_t, as it comes from a fixed width hash function. * minimal_perfect_hash: Not a new concept, but now a separate field. It comes from the classifier's general-purpose hash table, which assigns each line a unique and minimal hash across the two files. A size_t is used here because it's meant to be used to index an array. This also avoids ` as usize` casts on the Rust side when using it to index a slice. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: use size_t for xrecord_t.sizeEzekiel Newren
size_t is the appropriate type because size is describing the number of elements, bytes in this case, in memory. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: make xrecord_t.ptr a uint8_t instead of charEzekiel Newren
Make xrecord_t.ptr uint8_t because it's referring to bytes in memory. In order to avoid a refactor avalanche, many uses of this field were cast to char* or similar. Places where casting was unnecessary: xemit.c:156 xmerge.c:124 xmerge.c:127 xmerge.c:164 xmerge.c:169 xmerge.c:172 xmerge.c:178 Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-11-18xdiff: use ptrdiff_t for dstart/dendEzekiel Newren
ptrdiff_t is appropriate for dstart and dend because they both describe positive or negative offsets relative to a pointer. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-10-03xdiff: change type of xdfile_t.changed from char to boolEzekiel Newren
The only values possible for 'changed' is 1 and 0, which exactly maps to a bool type. It might not look like this because action1 and action2 (which use to be dis1, and dis2) were also of type char and were assigned numerical values within a few lines of 'changed' (what used to be rchg). Using DISCARD/KEEP/INVESTIGATE for action1[i]/action2[j], and true/false for changed[k] makes it clear to future readers that these are logically separate concepts. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30xdiff: rename rchg -> changed in xdfile_tEzekiel Newren
The field rchg (now 'changed') declares if a line in a file is changed or not. A later commit will change it's type from 'char' to 'bool' to make its purpose even more clear. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30xdiff: delete chastore from xdfile_tEzekiel Newren
xdfile_t currently uses chastore_t which is an arena allocator. I think that xrecord_t used to be a linked list and recs didn't exist originally. When recs was added I think they forgot to remove xdfile_t.next, but was overlooked. This dual data structure setup makes the code somewhat confusing. Additionally the C type chastore_t isn't FFI friendly, and provides little to no performance benefit over using realloc to grow an array. Performance impact of deleting fields from xdfile_t: Deleting ha is about 5% slower. Deleting cha is about 5% faster. Delete ha, but keep cha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.269 s ± 0.017 s [User: 1.135 s, System: 0.128 s] Range (min … max): 1.249 s … 1.286 s 10 runs Benchmark 2: build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.339 s ± 0.017 s [User: 1.234 s, System: 0.099 s] Range (min … max): 1.320 s … 1.358 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.06 ± 0.02 times faster than build_delete_ha/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete cha, but keep ha time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.290 s ± 0.001 s [User: 1.154 s, System: 0.130 s] Range (min … max): 1.288 s … 1.292 s 10 runs Benchmark 2: build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.232 s ± 0.017 s [User: 1.105 s, System: 0.121 s] Range (min … max): 1.205 s … 1.249 s 10 runs Summary build_delete_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.05 ± 0.01 times faster than build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Delete ha AND chastore time hyperfine --warmup 3 -L exe build_v2.51.0/git,build_delete_ha_and_chastore/git '{exe} log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null' Benchmark 1: build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.291 s ± 0.002 s [User: 1.156 s, System: 0.129 s] Range (min … max): 1.287 s … 1.295 s 10 runs Benchmark 2: build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Time (mean ± σ): 1.306 s ± 0.001 s [User: 1.195 s, System: 0.105 s] Range (min … max): 1.305 s … 1.308 s 10 runs Summary build_v2.51.0/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null ran 1.01 ± 0.00 times faster than build_delete_ha_and_chastore/git log --oneline --shortstat --diff-algorithm=myers -3000 v2.39.1 >/dev/null Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-30xdiff: delete redundant array xdfile_t.haEzekiel Newren
When 0 <= i < xdfile_t.nreff the following is true: xdfile_t.ha[i] == xdfile_t.recs[xdfile_t.rindex[i]] This makes the code about 5% slower. The fields rindex and ha are specific to the classic diff (myers and minimal). I plan on creating a struct for classic diff, but there's a lot of cleanup that needs to be done before that can happen and leaving ha in would make those cleanups harder to follow. A subsequent commit will delete the chastore cha from xdfile_t. That later commit will investigate deleting ha and cha independently and together. Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2025-09-26xdiff: delete unnecessary fields from xrecord_t and xdfile_tEzekiel Newren
xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized, but never used for anything by the code. Remove them. Best-viewed-with: --color-words Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-09Replace Free Software Foundation address in license noticesTodd Zullinger
The mailing address for the FSF has changed over the years. Rather than updating the address across all files, refer readers to gnu.org, as the GNU GPL documentation now suggests for license notices. The mailing address is retained in the full license files (COPYING and LGPL-2.1). The old address is still present in t/diff-lib/COPYING. This is intentional, as the file is used in tests and the contents are not expected to change. Signed-off-by: Todd Zullinger <tmz@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-07War on whitespaceJunio C Hamano
This uses "git-apply --whitespace=strip" to fix whitespace errors that have crept in to our source files over time. There are a few files that need to have trailing whitespaces (most notably, test vectors). The results still passes the test, and build result in Documentation/ area is unchanged. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2006-03-25Use a *real* built-in diff generatorLinus Torvalds
This uses a simplified libxdiff setup to generate unified diffs _without_ doing fork/execve of GNU "diff". This has several huge advantages, for example: Before: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m24.818s user 0m13.332s sys 0m8.664s After: [torvalds@g5 linux]$ time git diff v2.6.16.. > /dev/null real 0m4.563s user 0m2.944s sys 0m1.580s and the fact that this should be a lot more portable (ie we can ignore all the issues with doing fork/execve under Windows). Perhaps even more importantly, this allows us to do diffs without actually ever writing out the git file contents to a temporary file (and without any of the shell quoting issues on filenames etc etc). NOTE! THIS PATCH DOES NOT DO THAT OPTIMIZATION YET! I was lazy, and the current "diff-core" code actually will always write the temp-files, because it used to be something that you simply had to do. So this current one actually writes a temp-file like before, and then reads it into memory again just to do the diff. Stupid. But if this basic infrastructure is accepted, we can start switching over diff-core to not write temp-files, which should speed things up even further, especially when doing big tree-to-tree diffs. Now, in the interest of full disclosure, I should also point out a few downsides: - the libxdiff algorithm is different, and I bet GNU diff has gotten a lot more testing. And the thing is, generating a diff is not an exact science - you can get two different diffs (and you will), and they can both be perfectly valid. So it's not possible to "validate" the libxdiff output by just comparing it against GNU diff. - GNU diff does some nice eye-candy, like trying to figure out what the last function was, and adding that information to the "@@ .." line. libxdiff doesn't do that. - The libxdiff thing has some known deficiencies. In particular, it gets the "\No newline at end of file" case wrong. So this is currently for the experimental branch only. I hope Davide will help fix it. That said, I think the huge performance advantage, and the fact that it integrates better is definitely worth it. But it should go into a development branch at least due to the missing newline issue. Technical note: this is based on libxdiff-0.17, but I did some surgery to get rid of the extraneous fat - stuff that git doesn't need, and seriously cutting down on mmfile_t, which had much more capabilities than the diff algorithm either needed or used. In this version, "mmfile_t" is just a trivial <pointer,length> tuple. That said, I tried to keep the differences to simple removals, so that you can do a diff between this and the libxdiff origin, and you'll basically see just things getting deleted. Even the mmfile_t simplifications are left in a state where the diffs should be readable. Apologies to Davide, whom I'd love to get feedback on this all from (I wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old complex format had a helper function for that, but I did my surgery with the goal in mind that eventually we _should_ just do mmfile_t mf; buf = read_sha1_file(sha1, type, &size); mf->ptr = buf; mf->size = size; .. use "mf" directly .. which was really a nightmare with the old "helpful" mmfile_t, and really is that easy with the new cut-down interfaces). [ Btw, as any hawk-eye can see from the diff, this was actually generated with itself, so it is "self-hosting". That's about all the testing it has gotten, along with the above kernel diff, which eye-balls correctly, but shows the newline issue when you double-check it with "git-apply" ] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>