aboutsummaryrefslogtreecommitdiff
path: root/name-hash.c
AgeCommit message (Collapse)Author
2025-05-21name-hash: don't add sparse directories in threaded lazy initAlex Mironov
Ensure that logic added in 5f11669586 (name-hash: don't add directories to name_hash, 2021-04-12) also applies in multithreaded hashtable init path. As per the original single-threaded change above: sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Alex Mironov <alexandrfox@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-12-06global: mark code units that generate warnings with `-Wsign-compare`Patrick Steinhardt
Mark code units that generate warnings with `-Wsign-compare`. This allows for a structured approach to get rid of all such warnings over time in a way that can be easily measured. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-09-12environment: guard state depending on a repositoryPatrick Steinhardt
In "environment.h" we have quite a lot of functions and variables that either explicitly or implicitly depend on `the_repository`. The implicit set of stateful declarations includes for example variables which get populated when parsing a repository's Git configuration. This set of variables is broken by design, as their state often depends on the last repository config that has been parsed. So they may or may not represent the state of `the_repository`. Fixing that is quite a big undertaking, and later patches in this series will demonstrate a solution for a first small set of those variables. So for now, let's guard these with `USE_THE_REPOSITORY_VARIABLE` so that callers are aware of the implicit dependency. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-26name-hash: add index_dir_find()Jeff Hostetler
index_dir_exists() returns a boolean to indicate if there is a case-insensitive match in the directory name-hash, but does not provide the caller with the exact spelling of that match. Create index_dir_find() to do the case-insensitive search *and* optionally return the spelling of the matched directory prefix in a provided strbuf. To avoid code duplication, convert index_dir_exists() to be a trivial wrapper around the new index_dir_find(). Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21cache.h: remove this no-longer-used headerElijah Newren
Since this header showed up in some places besides just #include statements, update/clean-up/remove those other places as well. Note that compat/fsmonitor/fsm-path-utils-darwin.c previously got away with violating the rule that all files must start with an include of git-compat-util.h (or a short-list of alternate headers that happen to include it first). This change exposed the violation and caused it to stop building correctly; fix it by having it include git-compat-util.h first, as per policy. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21read-cache*.h: move declarations for read-cache.c functions from cache.hElijah Newren
For the functions defined in read-cache.c, move their declarations from cache.h to a new header, read-cache-ll.h. Also move some related inline functions from cache.h to read-cache.h. The purpose of the read-cache-ll.h/read-cache.h split is that about 70% of the sites don't need the inline functions and the extra headers they include. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21name-hash.h: move declarations for name-hash.c from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-11treewide: be explicit about dependence on trace.h & trace2.hElijah Newren
Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21environment.h: move declarations for environment.c functions from cache.hElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-21treewide: be explicit about dependence on gettext.hElijah Newren
Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-01git-compat-util.h: use "UNUSED", not "UNUSED(var)"Ævar Arnfjörð Bjarmason
As reported in [1] the "UNUSED(var)" macro introduced in 2174b8c75de (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of 9b240347543 (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19hashmap: mark unused callback parametersJeff King
Hashmap comparison functions must conform to a particular callback interface, but many don't use all of their parameters. Especially the void cmp_data pointer, but some do not use keydata either (because they can easily form a full struct to pass when doing lookups). Let's mark these to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-30Merge branch 'ds/sparse-index-protections'Junio C Hamano
Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...
2021-04-14name-hash: use expand_to_path()Derrick Stolee
A sparse-index loads the name-hash data for its entries, including the sparse-directory entries. If a caller asks for a path that is contained within a sparse-directory entry, we need to expand to a full index and recalculate the name hash table before returning the result. Insert calls to expand_to_path() to protect against this case. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-14name-hash: don't add directories to name_hashDerrick Stolee
Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-13use CALLOC_ARRAYRené Scharfe
Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23name-hash: use trace2 regions for initDerrick Stolee
The lazy_init_name_hash() populates a hashset with all filenames and another with all directories represented in the index. This is run only if we need to use the hashsets to check for existence or case-folding renames. Place trace2 regions where there is already a performance trace. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02hashmap: provide deallocation function namesElijah Newren
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-01Merge branch 'en/doc-typofix'Junio C Hamano
Docfix. * en/doc-typofix: Fix spelling errors in no-longer-updated-from-upstream modules multimail: fix a few simple spelling errors sha1dc: fix trivial comment spelling error Fix spelling errors in test commands Fix spelling errors in messages shown to users Fix spelling errors in names of tests Fix spelling errors in comments of testcases Fix spelling errors in code comments Fix spelling errors in documentation outside of Documentation/ Documentation: fix a bunch of typos, both old and new
2019-11-10Fix spelling errors in code commentsElijah Newren
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-07name-hash.c: remove duplicate word in commentElijah Newren
Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap: remove type arg from hashmap_{get,put,remove}_entryEric Wong
Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07OFFSETOF_VAR macro to simplify hashmap iteratorsEric Wong
While we cannot rely on a `__typeof__' operator being portable to use with `offsetof'; we can calculate the pointer offset using an existing pointer and the address of a member using pointer arithmetic for compilers without `__typeof__'. This allows us to simplify usage of hashmap iterator macros by not having to specify a type when a pointer of that type is already given. In the future, list iterator macros (e.g. list_for_each_entry) may also be implemented using OFFSETOF_VAR to save hackers the trouble of using container_of/list_entry macros and without relying on non-portable `__typeof__'. v3: use `__typeof__' to avoid clang warnings Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap: introduce hashmap_free_entriesEric Wong
`hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_cmp_fn takes hashmap_entry paramsEric Wong
Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_get{,_from_hash} return "struct hashmap_entry *"Eric Wong
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap: use *_entry APIs to wrap container_ofEric Wong
Using `container_of' can be verbose and choosing names for intermediate "struct hashmap_entry" pointers is a hard problem. So introduce "*_entry" APIs inspired by similar linked-list APIs in the Linux kernel. Unfortunately, `__typeof__' is not portable C, so we need an extra parameter to specify the type. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_get_next returns "struct hashmap_entry *"Eric Wong
This is a step towards removing the requirement for hashmap_entry being the first field of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_remove takes "const struct hashmap_entry *"Eric Wong
This is less error-prone than "const void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_get takes "const struct hashmap_entry *"Eric Wong
This is less error-prone than "const void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_add takes "struct hashmap_entry *"Eric Wong
This is less error-prone than "void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_get_next takes "const struct hashmap_entry *"Eric Wong
This is less error-prone than "const void *" as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07hashmap_entry_init takes "struct hashmap_entry *"Eric Wong
C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry *" to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13cleanup: fix possible overflow errors in binary search, part 2René Scharfe
Calculating the sum of two array indexes to find the midpoint between them can overflow, i.e. code like this is unsafe for big arrays: mid = (first + last) >> 1; Make sure the intermediate value stays within the boundaries instead, like this: mid = first + ((last - first) >> 1); The loop condition of the binary search makes sure that 'last' is always greater than 'first', so this is safe as long as 'first' is not negative. And that can be verified easily using the pre-context of each change, except for name-hash.c, so add an assertion to that effect there. The unsafe calculations were found with: git grep '(.*+.*) *>> *1' This is a continuation of 19716b21a4 (cleanup: fix possible overflow errors in binary search, 2017-10-08). Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switchNguyễn Thái Ngọc Duy
By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05Clean up pthread_create() error handlingNguyễn Thái Ngọc Duy
Normally pthread_create() rarely fails. But with new pthreads wrapper, pthread_create() will return ENOSYS on a system without thread support. Threaded code _is_ protected by HAVE_THREADS and pthread_create() should never run in the first place. But the situation could change in the future and bugs may sneak in. Make sure that all pthread_create() reports the error cause. While at there, mark these strings for translation if they aren't. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05name-hash.c: remove #ifdef NO_PTHREADSNguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-18trace.h: support nested performance tracingNguyễn Thái Ngọc Duy
Performance measurements are listed right now as a flat list, which is fine when we measure big blocks. But when we start adding more and more measurements, some of them could be just part of a bigger measurement and a flat list gives a wrong impression that they are executed at the same level instead of nested. Add trace_performance_enter() and trace_performance_leave() to allow indent these nested measurements. For now it does not help much because the only nested thing is (lazy) name hash initialization (e.g. called in diff-index from "git status"). This will help more because I'm going to add some more tracing that's actually nested. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-27Merge branch 'bp/name-hash-dirname-fix'Junio C Hamano
"git add" files in the same directory, but spelling the directory path in different cases on case insensitive filesystem, corrupted the name hash data structure and led to unexpected results. This has been corrected. * bp/name-hash-dirname-fix: name-hash: properly fold directory names in adjust_dirname_case()
2018-02-08name-hash: properly fold directory names in adjust_dirname_case()Ben Peart
Correct the pointer arithmetic in adjust_dirname_case() so that it calls find_dir_entry() with the correct string length. Previously passing in "dir1/foo" would pass a length of 6 instead of the correct 4. This resulted in find_dir_entry() never finding the entry and so the subsequent memcpy that would fold the name to the version with the correct case never executed. Add a test to validate the corrected behavior with name folding of directories. Signed-off-by: Ben Peart <benpeart@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02trace: measure where the time is spent in the index-heavy operationsNguyễn Thái Ngọc Duy
All the known heavy code blocks are measured (except object database access). This should help identify if an optimization is effective or not. An unoptimized git-status would give something like below: 0.001791141 s: read cache ... 0.004011363 s: preload index 0.000516161 s: refresh index 0.003139257 s: git command: ... 'status' '--porcelain=2' 0.006788129 s: diff-files 0.002090267 s: diff-index 0.001885735 s: initialize name hash 0.032013138 s: read directory 0.051781209 s: git command: './git' 'status' Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07hashmap: add API to disable item counting when threadedJeff Hostetler
This is to address concerns raised by ThreadSanitizer on the mailing list about threaded unprotected R/W access to map.size with my previous "disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4). See: https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/ Add API to hashmap to disable item counting and thus automatic rehashing. Also include API to later re-enable them. When item counting is disabled, the map.size field is invalid. So to prevent accidents, the field has been renamed and an accessor function hashmap_get_size() has been added. All direct references to this field have been been updated. And the name of the field changed to map.private_size to communicate this. Here is the relevant output from ThreadSanitizer showing the problem: WARNING: ThreadSanitizer: data race (pid=10554) Read of size 4 at 0x00000082d488 by thread T2 (mutexes: write M16): #0 hashmap_add hashmap.c:209 #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302 #2 handle_range_dir name-hash.c:347 #3 handle_range_1 name-hash.c:415 #4 lazy_dir_thread_proc name-hash.c:471 #5 <null> <null> Previous write of size 4 at 0x00000082d488 by thread T1 (mutexes: write M31): #0 hashmap_add hashmap.c:209 #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302 #2 handle_range_dir name-hash.c:347 #3 handle_range_1 name-hash.c:415 #4 handle_range_dir name-hash.c:380 #5 handle_range_1 name-hash.c:415 #6 lazy_dir_thread_proc name-hash.c:471 #7 <null> <null> Martin gives instructions for running TSan on test t3008 in this post: https://public-inbox.org/git/CAN0heSoJDL9pWELD6ciLTmWf-a=oyxe4EXXOmCKvsG5MSuzxsA@mail.gmail.com/ Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-05name-hash.c: drop hashmap_cmp_fn castStefan Beller
Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30hashmap.h: compare function has access to a data fieldStefan Beller
When using the hashmap a common need is to have access to caller provided data in the compare function. A couple of times we abuse the keydata field to pass in the data needed. This happens for example in patch-ids.c. This patch changes the function signature of the compare function to have one more void pointer available. The pointer given for each invocation of the compare function must be defined in the init function of the hashmap and is just passed through. Documentation of this new feature is deferred to a later patch. This is a rather mechanical conversion, just adding the new pass-through parameter. However while at it improve the naming of the fields of all compare functions used by hashmaps by ensuring unused parameters are prefixed with 'unused_' and naming the parameters what they are (instead of 'unused' make it 'unused_keydata'). Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31name-hash: fix buffer overrunKevin Willford
Add check for the end of the entries for the thread partition. Add test for lazy init name hash with specific directory structure The lazy init hash name was causing a buffer overflow when the last entry in the index was multiple folder deep with parent folders that did not have any files in them. This adds a test for the boundary condition of the thread partitions with the folder structure that was triggering the buffer overflow. The fix was to check if it is the last entry for the thread partition in the handle_range_dir and not try to use the next entry in the cache. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-24name-hash: perf improvement for lazy_init_name_hashJeff Hostetler
Improve performance of lazy_init_name_hash() when ignore_case is set. Teach name-hash to build the istate.name_hash and istate.dir_hash simultaneously using a forward-diving technique on the pathname of the index_entry, rather than adding name_hash entries and then searching backwards in the pathname for parent directories. This borrows algorithm ideas from clear_ce_flags_{1,dir}. Multiple threads are used with the new algorithm to speed hashmap construction. This new code path is only used when threads are present (a compiler settings) and when the index is large enough to warrant the pthread complexity. The code in clear_ce_flags_dir() uses a linear search to find the adjacent index entries with the same prefix; a binary search is used here handle_range_dir() to further speed things up. The size of LAZY_THREAD_COST was determined from rough analysis using: t/helper/test-lazy-init-name-hash --analyze Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-22name-hash: specify initial size for istate.dir_hash tableJeff Hostetler
Specify an initial size for the istate.dir_hash HashMap matching the size of the istate.name_hash. Previously hashmap_init() was given 0, causing a 64 bucket hashmap to be created. When working with very large repositories, this would cause numerous rehash() calls to realloc and rebalance the hashmap. This is especially true when the worktree is deep, with many directories containing a few files. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22convert trivial cases to FLEX_ARRAY macrosJeff King
Using FLEX_ARRAY macros reduces the amount of manual computation size we have to do. It also ensures we don't overflow size_t, and it makes sure we write the same number of bytes that we allocated. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-21name-hash: don't reuse cache_entry in dir_entryDavid Turner
Stop reusing cache_entry in dir_entry; doing so causes a use-after-free bug. During merges, we free entries that we no longer need in the destination index. But those entries might have also been stored in the dir_entry cache, and when a later call to add_to_index found them, they would be used after being freed. To prevent this, change dir_entry to store a copy of the name instead of a pointer to a cache_entry. This entails some refactoring of code that expects the cache_entry. Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote the initial patch, but this version does not use any of Keith's code. Helped-by: Keith McGuigan <kmcguigan@twitter.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07hashmap: add simplified hashmap_get_from_hash() APIKarsten Blees
Hashmap entries are typically looked up by just a key. The hashmap_get() API expects an initialized entry structure instead, to support compound keys. This flexibility is currently only needed by find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other (currently five) call sites of hashmap_get() have to set up a near emtpy entry structure, resulting in duplicate code like this: struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, hash(key)); return hashmap_get(map, &keyentry, key); Add a hashmap_get_from_hash() API that allows hashmap lookups by just specifying the key and its hash code, i.e.: return hashmap_get_from_hash(map, hash(key), key); Signed-off-by: Karsten Blees <blees@dcon.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>