git/patch-ids.c, branch gitk-resize-error

use CALLOC_ARRAY

2021-03-14T00:00:09Z

Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe Signed-off-by: Junio C Hamano

Merge branch 'jk/log-cherry-pick-duplicate-patches'

2021-01-25T22:19:19Z

When more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side. Now it does. * jk/log-cherry-pick-duplicate-patches: patch-ids: handle duplicate hashmap entries

patch-ids: handle duplicate hashmap entries

2021-01-12T19:13:32Z

This fixes a bug introduced in dfb7a1b4d0 (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to dfb7a1b4d0, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In 683f17ec44 (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin Signed-off-by: Jeff King Signed-off-by: Junio C Hamano

hash-lookup: rename from sha1-lookup

2021-01-04T21:01:55Z

Change all remnants of "sha1" in hash-lookup.c and .h and rename them to reflect that we're not just able to handle SHA-1 these days. Signed-off-by: Martin Ågren Reviewed-by: Derrick Stolee Signed-off-by: Junio C Hamano

hashmap: provide deallocation function names

2020-11-02T20:15:50Z

hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren Signed-off-by: Junio C Hamano

hashmap: remove type arg from hashmap_{get,put,remove}_entry

2019-10-07T01:20:12Z

Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong Reviewed-by: Derrick Stolee Signed-off-by: Junio C Hamano

hashmap: introduce hashmap_free_entries

2019-10-07T01:20:11Z

`hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong Reviewed-by: Derrick Stolee Signed-off-by: Junio C Hamano

hashmap_cmp_fn takes hashmap_entry params

2019-10-07T01:20:11Z

Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong Reviewed-by: Derrick Stolee Signed-off-by: Junio C Hamano

hashmap_get{,_from_hash} return "struct hashmap_entry *"

2019-10-07T01:20:11Z

Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong Reviewed-by: Derrick Stolee Signed-off-by: Junio C Hamano

hashmap_get takes "const struct hashmap_entry *"

2019-10-07T01:20:10Z

This is less error-prone than "const void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong Reviewed-by: Derrick Stolee Signed-off-by: Junio C Hamano