From 10bd0be173b0ab9a9255fc15a768b2518d3eba57 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Wed, 12 Jun 2019 06:29:39 -0700 Subject: commit-graph: remove Future Work section The commit-graph feature began with a long list of planned benefits, most of which are now complete. The future work section has only a few items left. As for making more algorithms aware of generation numbers, some are only waiting for generation number v2 to ensure the performance matches the existing behavior using commit date. It is unlikely that we will ever send a commit-graph file as part of the protocol, since we would need to verify the data, and that is expensive. If we want to start trusting remote content, then that item can be investigated again. While there is more work to be done on the feature, having a section of the docs devoted to a TODO list is wasteful and hard to keep up-to-date. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/commit-graph.txt | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'Documentation/technical/commit-graph.txt') diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 7805b0968c..fb53341d5e 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -127,23 +127,6 @@ Design Details helpful for these clones, anyway. The commit-graph will not be read or written when shallow commits are present. -Future Work ------------ - -- After computing and storing generation numbers, we must make graph - walks aware of generation numbers to gain the performance benefits they - enable. This will mostly be accomplished by swapping a commit-date-ordered - priority queue with one ordered by generation number. The following - operations are important candidates: - - - 'log --topo-order' - - 'tag --merged' - -- A server could provide a commit-graph file as part of the network protocol - to avoid extra calculations by clients. This feature is only of benefit if - the user is willing to trust the file, because verifying the file is correct - is as hard as computing it from scratch. - Related Links ------------- [0] https://bugs.chromium.org/p/git/issues/detail?id=8 -- cgit v1.3 From 890345ac106f78dfcfcde002de39ebbb410e0b39 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 18 Jun 2019 11:14:23 -0700 Subject: commit-graph: document commit-graph chains Add a basic description of commit-graph chains. More details about the feature will be added as we add functionality. This introduction gives a high-level overview to the goals of the feature and the basic layout of commit-graph chains. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/commit-graph.txt | 59 ++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) (limited to 'Documentation/technical/commit-graph.txt') diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index fb53341d5e..1dca3bd8fe 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -127,6 +127,65 @@ Design Details helpful for these clones, anyway. The commit-graph will not be read or written when shallow commits are present. +Commit Graphs Chains +-------------------- + +Typically, repos grow with near-constant velocity (commits per day). Over time, +the number of commits added by a fetch operation is much smaller than the +number of commits in the full history. By creating a "chain" of commit-graphs, +we enable fast writes of new commit data without rewriting the entire commit +history -- at least, most of the time. + +## File Layout + +A commit-graph chain uses multiple files, and we use a fixed naming convention +to organize these files. Each commit-graph file has a name +`$OBJDIR/info/commit-graphs/graph-{hash}.graph` where `{hash}` is the hex- +valued hash stored in the footer of that file (which is a hash of the file's +contents before that hash). For a chain of commit-graph files, a plain-text +file at `$OBJDIR/info/commit-graphs/commit-graph-chain` contains the +hashes for the files in order from "lowest" to "highest". + +For example, if the `commit-graph-chain` file contains the lines + +``` + {hash0} + {hash1} + {hash2} +``` + +then the commit-graph chain looks like the following diagram: + + +-----------------------+ + | graph-{hash2}.graph | + +-----------------------+ + | + +-----------------------+ + | | + | graph-{hash1}.graph | + | | + +-----------------------+ + | + +-----------------------+ + | | + | | + | | + | graph-{hash0}.graph | + | | + | | + | | + +-----------------------+ + +Let X0 be the number of commits in `graph-{hash0}.graph`, X1 be the number of +commits in `graph-{hash1}.graph`, and X2 be the number of commits in +`graph-{hash2}.graph`. If a commit appears in position i in `graph-{hash2}.graph`, +then we interpret this as being the commit in position (X0 + X1 + i), and that +will be used as its "graph position". The commits in `graph-{hash2}.graph` use these +positions to refer to their parents, which may be in `graph-{hash1}.graph` or +`graph-{hash0}.graph`. We can navigate to an arbitrary commit in position j by checking +its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + +X2). + Related Links ------------- [0] https://bugs.chromium.org/p/git/issues/detail?id=8 -- cgit v1.3 From 1771be90c8e4797c2466296d1d570dbfa39d9743 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 18 Jun 2019 11:14:29 -0700 Subject: commit-graph: merge commit-graph chains When searching for a commit in a commit-graph chain of G graphs with N commits, the search takes O(G log N) time. If we always add a new tip graph with every write, the linear G term will start to dominate and slow the lookup process. To keep lookups fast, but also keep most incremental writes fast, create a strategy for merging levels of the commit-graph chain. The strategy is detailed in the commit-graph design document, but is summarized by these two conditions: 1. If the number of commits we are adding is more than half the number of commits in the graph below, then merge with that graph. 2. If we are writing more than 64,000 commits into a single graph, then merge with all lower graphs. The numeric values in the conditions above are currently constant, but can become config options in a future update. As we merge levels of the commit-graph chain, check that the commits still exist in the repository. A garbage-collection operation may have removed those commits from the object store and we do not want to persist them in the commit-graph chain. This is a non-issue if the 'git gc' process wrote a new, single-level commit-graph file. After we merge levels, the old graph-{hash}.graph files are no longer referenced by the commit-graph-chain file. We will expire these files in a future change. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/commit-graph.txt | 80 ++++++++++++++ commit-graph.c | 180 +++++++++++++++++++++++++------ t/t5324-split-commit-graph.sh | 13 +++ 3 files changed, 240 insertions(+), 33 deletions(-) (limited to 'Documentation/technical/commit-graph.txt') diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 1dca3bd8fe..d9c6253b0a 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -186,6 +186,86 @@ positions to refer to their parents, which may be in `graph-{hash1}.graph` or its containment in the intervals [0, X0), [X0, X0 + X1), [X0 + X1, X0 + X1 + X2). +Each commit-graph file (except the base, `graph-{hash0}.graph`) contains data +specifying the hashes of all files in the lower layers. In the above example, +`graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains +`{hash0}` and `{hash1}`. + +## Merging commit-graph files + +If we only added a new commit-graph file on every write, we would run into a +linear search problem through many commit-graph files. Instead, we use a merge +strategy to decide when the stack should collapse some number of levels. + +The diagram below shows such a collapse. As a set of new commits are added, it +is determined by the merge strategy that the files should collapse to +`graph-{hash1}`. Thus, the new commits, the commits in `graph-{hash2}` and +the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}` +file. + + +---------------------+ + | | + | (new commits) | + | | + +---------------------+ + | | + +-----------------------+ +---------------------+ + | graph-{hash2} |->| | + +-----------------------+ +---------------------+ + | | | + +-----------------------+ +---------------------+ + | | | | + | graph-{hash1} |->| | + | | | | + +-----------------------+ +---------------------+ + | tmp_graphXXX + +-----------------------+ + | | + | | + | | + | graph-{hash0} | + | | + | | + | | + +-----------------------+ + +During this process, the commits to write are combined, sorted and we write the +contents to a temporary file, all while holding a `commit-graph-chain.lock` +lock-file. When the file is flushed, we rename it to `graph-{hash3}` +according to the computed `{hash3}`. Finally, we write the new chain data to +`commit-graph-chain.lock`: + +``` + {hash3} + {hash0} +``` + +We then close the lock-file. + +## Merge Strategy + +When writing a set of commits that do not exist in the commit-graph stack of +height N, we default to creating a new file at level N + 1. We then decide to +merge with the Nth level if one of two conditions hold: + + 1. The expected file size for level N + 1 is at least half the file size for + level N. + + 2. Level N + 1 contains more than 64,0000 commits. + +This decision cascades down the levels: when we merge a level we create a new +set of commits that then compares to the next level. + +The first condition bounds the number of levels to be logarithmic in the total +number of commits. The second condition bounds the total number of commits in +a `graph-{hashN}` file and not in the `commit-graph` file, preventing +significant performance issues when the stack merges and another process only +partially reads the previous stack. + +The merge strategy values (2 for the size multiple, 64,000 for the maximum +number of commits) could be extracted into config settings for full +flexibility. + Related Links ------------- [0] https://bugs.chromium.org/p/git/issues/detail?id=8 diff --git a/commit-graph.c b/commit-graph.c index 1224309e5f..fb3100921c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1298,36 +1298,6 @@ static int write_graph_chunk_base(struct hashfile *f, return 0; } -static void init_commit_graph_chain(struct write_commit_graph_context *ctx) -{ - struct commit_graph *g = ctx->r->objects->commit_graph; - uint32_t i; - - ctx->new_base_graph = g; - ctx->base_graph_name = xstrdup(g->filename); - ctx->new_num_commits_in_base = g->num_commits + g->num_commits_in_base; - - ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; - - ALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after); - ALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after); - - for (i = 0; i < ctx->num_commit_graphs_before - 1; i++) - ctx->commit_graph_filenames_after[i] = xstrdup(ctx->commit_graph_filenames_before[i]); - - if (ctx->num_commit_graphs_before) - ctx->commit_graph_filenames_after[ctx->num_commit_graphs_before - 1] = - get_split_graph_filename(ctx->obj_dir, oid_to_hex(&g->oid)); - - i = ctx->num_commit_graphs_before - 1; - - while (g) { - ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid)); - i--; - g = g->base_graph; - } -} - static int write_commit_graph_file(struct write_commit_graph_context *ctx) { uint32_t i; @@ -1509,6 +1479,145 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) return 0; } +static int split_strategy_max_commits = 64000; +static float split_strategy_size_mult = 2.0f; + +static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) +{ + struct commit_graph *g = ctx->r->objects->commit_graph; + uint32_t num_commits = ctx->commits.nr; + uint32_t i; + + g = ctx->r->objects->commit_graph; + ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; + + while (g && (g->num_commits <= split_strategy_size_mult * num_commits || + num_commits > split_strategy_max_commits)) { + num_commits += g->num_commits; + g = g->base_graph; + + ctx->num_commit_graphs_after--; + } + + ctx->new_base_graph = g; + + ALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after); + ALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after); + + for (i = 0; i < ctx->num_commit_graphs_after && + i < ctx->num_commit_graphs_before; i++) + ctx->commit_graph_filenames_after[i] = xstrdup(ctx->commit_graph_filenames_before[i]); + + i = ctx->num_commit_graphs_before - 1; + g = ctx->r->objects->commit_graph; + + while (g) { + if (i < ctx->num_commit_graphs_after) + ctx->commit_graph_hash_after[i] = xstrdup(oid_to_hex(&g->oid)); + + i--; + g = g->base_graph; + } +} + +static void merge_commit_graph(struct write_commit_graph_context *ctx, + struct commit_graph *g) +{ + uint32_t i; + uint32_t offset = g->num_commits_in_base; + + ALLOC_GROW(ctx->commits.list, ctx->commits.nr + g->num_commits, ctx->commits.alloc); + + for (i = 0; i < g->num_commits; i++) { + struct object_id oid; + struct commit *result; + + display_progress(ctx->progress, i + 1); + + load_oid_from_graph(g, i + offset, &oid); + + /* only add commits if they still exist in the repo */ + result = lookup_commit_reference_gently(ctx->r, &oid, 1); + + if (result) { + ctx->commits.list[ctx->commits.nr] = result; + ctx->commits.nr++; + } + } +} + +static int commit_compare(const void *_a, const void *_b) +{ + const struct commit *a = *(const struct commit **)_a; + const struct commit *b = *(const struct commit **)_b; + return oidcmp(&a->object.oid, &b->object.oid); +} + +static void sort_and_scan_merged_commits(struct write_commit_graph_context *ctx) +{ + uint32_t i, num_parents; + struct commit_list *parent; + + if (ctx->report_progress) + ctx->progress = start_delayed_progress( + _("Scanning merged commits"), + ctx->commits.nr); + + QSORT(ctx->commits.list, ctx->commits.nr, commit_compare); + + ctx->num_extra_edges = 0; + for (i = 0; i < ctx->commits.nr; i++) { + display_progress(ctx->progress, i); + + if (i && oideq(&ctx->commits.list[i - 1]->object.oid, + &ctx->commits.list[i]->object.oid)) { + die(_("unexpected duplicate commit id %s"), + oid_to_hex(&ctx->commits.list[i]->object.oid)); + } else { + num_parents = 0; + for (parent = ctx->commits.list[i]->parents; parent; parent = parent->next) + num_parents++; + + if (num_parents > 2) + ctx->num_extra_edges += num_parents - 2; + } + } + + stop_progress(&ctx->progress); +} + +static void merge_commit_graphs(struct write_commit_graph_context *ctx) +{ + struct commit_graph *g = ctx->r->objects->commit_graph; + uint32_t current_graph_number = ctx->num_commit_graphs_before; + struct strbuf progress_title = STRBUF_INIT; + + while (g && current_graph_number >= ctx->num_commit_graphs_after) { + current_graph_number--; + + if (ctx->report_progress) { + strbuf_addstr(&progress_title, _("Merging commit-graph")); + ctx->progress = start_delayed_progress(progress_title.buf, 0); + } + + merge_commit_graph(ctx, g); + stop_progress(&ctx->progress); + strbuf_release(&progress_title); + + g = g->base_graph; + } + + if (g) { + ctx->new_base_graph = g; + ctx->new_num_commits_in_base = g->num_commits + g->num_commits_in_base; + } + + if (ctx->new_base_graph) + ctx->base_graph_name = xstrdup(ctx->new_base_graph->filename); + + sort_and_scan_merged_commits(ctx); +} + int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, @@ -1554,6 +1663,9 @@ int write_commit_graph(const char *obj_dir, ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; + if (ctx->split && ctx->oids.alloc > split_strategy_max_commits) + ctx->oids.alloc = split_strategy_max_commits; + if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->obj_dir); if (ctx->r->objects->commit_graph) @@ -1607,9 +1719,11 @@ int write_commit_graph(const char *obj_dir, if (!ctx->commits.nr) goto cleanup; - if (ctx->split) - init_commit_graph_chain(ctx); - else + if (ctx->split) { + split_graph_merge_strategy(ctx); + + merge_commit_graphs(ctx); + } else ctx->num_commit_graphs_after = 1; compute_generation_numbers(ctx); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index ccd24bd22b..5cb5663a30 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -119,4 +119,17 @@ test_expect_success 'add one commit, write a tip graph' ' graph_git_behavior 'three-layer commit-graph: commit 11 vs 6' commits/11 commits/6 +test_expect_success 'add one commit, write a merged graph' ' + test_commit 12 && + git branch commits/12 && + git commit-graph write --reachable --split && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 2 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 4 graph-files && + verify_chain_files_exist $graphdir +' + +graph_git_behavior 'merged commit-graph: commit 12 vs 6' commits/12 commits/6 + test_done -- cgit v1.3 From c523035cbd8515d095dc15e9661fd896733bedbc Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 18 Jun 2019 11:14:30 -0700 Subject: commit-graph: allow cross-alternate chains In an environment like a fork network, it is helpful to have a commit-graph chain that spans both the base repo and the fork repo. The fork is usually a small set of data on top of the large repo, but sometimes the fork is much larger. For example, git-for-windows/git has almost double the number of commits as git/git because it rebases its commits on every major version update. To allow cross-alternate commit-graph chains, we need a few pieces: 1. When looking for a graph-{hash}.graph file, check all alternates. 2. When merging commit-graph chains, do not merge across alternates. 3. When writing a new commit-graph chain based on a commit-graph file in another object directory, do not allow success if the base file has of the name "commit-graph" instead of "commit-graphs/graph-{hash}.graph". Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/commit-graph.txt | 40 +++++++++++++++++++++++ commit-graph.c | 56 +++++++++++++++++++++++++------- commit-graph.h | 1 + t/t5324-split-commit-graph.sh | 37 +++++++++++++++++++++ 4 files changed, 123 insertions(+), 11 deletions(-) (limited to 'Documentation/technical/commit-graph.txt') diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index d9c6253b0a..473032e476 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -266,6 +266,42 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Chains across multiple object directories + +In a repo with alternates, we look for the `commit-graph-chain` file starting +in the local object directory and then in each alternate. The first file that +exists defines our chain. As we look for the `graph-{hash}` files for +each `{hash}` in the chain file, we follow the same pattern for the host +directories. + +This allows commit-graphs to be split across multiple forks in a fork network. +The typical case is a large "base" repo with many smaller forks. + +As the base repo advances, it will likely update and merge its commit-graph +chain more frequently than the forks. If a fork updates their commit-graph after +the base repo, then it should "reparent" the commit-graph chain onto the new +chain in the base repo. When reading each `graph-{hash}` file, we track +the object directory containing it. During a write of a new commit-graph file, +we check for any changes in the source object directory and read the +`commit-graph-chain` file for that source and create a new file based on those +files. During this "reparent" operation, we necessarily need to collapse all +levels in the fork, as all of the files are invalid against the new base file. + +It is crucial to be careful when cleaning up "unreferenced" `graph-{hash}.graph` +files in this scenario. It falls to the user to define the proper settings for +their custom environment: + + 1. When merging levels in the base repo, the unreferenced files may still be + referenced by chains from fork repos. + + 2. The expiry time should be set to a length of time such that every fork has + time to recompute their commit-graph chain to "reparent" onto the new base + file(s). + + 3. If the commit-graph chain is updated in the base, the fork will not have + access to the new chain until its chain is updated to reference those files. + (This may change in the future [5].) + Related Links ------------- [0] https://bugs.chromium.org/p/git/issues/detail?id=8 @@ -292,3 +328,7 @@ Related Links [4] https://public-inbox.org/git/20180108154822.54829-1-git@jeffhostetler.com/T/#u A patch to remove the ahead-behind calculation from 'status'. + +[5] https://public-inbox.org/git/f27db281-abad-5043-6d71-cbb083b1c877@gmail.com/ + A discussion of a "two-dimensional graph position" that can allow reading + multiple commit-graph chains at the same time. diff --git a/commit-graph.c b/commit-graph.c index fb3100921c..fba705bc51 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -320,6 +320,9 @@ static struct commit_graph *load_commit_graph_v1(struct repository *r, const cha struct commit_graph *g = load_commit_graph_one(graph_name); free(graph_name); + if (g) + g->obj_dir = obj_dir; + return g; } @@ -379,9 +382,10 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, const count = st.st_size / (the_hash_algo->hexsz + 1); oids = xcalloc(count, sizeof(struct object_id)); - for (i = 0; i < count && valid; i++) { - char *graph_name; - struct commit_graph *g; + prepare_alt_odb(r); + + for (i = 0; i < count; i++) { + struct object_directory *odb; if (strbuf_getline_lf(&line, fp) == EOF) break; @@ -393,14 +397,29 @@ static struct commit_graph *load_commit_graph_chain(struct repository *r, const break; } - graph_name = get_split_graph_filename(obj_dir, line.buf); - g = load_commit_graph_one(graph_name); - free(graph_name); + valid = 0; + for (odb = r->objects->odb; odb; odb = odb->next) { + char *graph_name = get_split_graph_filename(odb->path, line.buf); + struct commit_graph *g = load_commit_graph_one(graph_name); - if (g && add_graph_to_chain(g, graph_chain, oids, i)) - graph_chain = g; - else - valid = 0; + free(graph_name); + + if (g) { + g->obj_dir = odb->path; + + if (add_graph_to_chain(g, graph_chain, oids, i)) { + graph_chain = g; + valid = 1; + } + + break; + } + } + + if (!valid) { + warning(_("unable to find all commit-graph files")); + break; + } } free(oids); @@ -1418,7 +1437,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) if (ctx->split && ctx->base_graph_name && ctx->num_commit_graphs_after > 1) { char *new_base_hash = xstrdup(oid_to_hex(&ctx->new_base_graph->oid)); - char *new_base_name = get_split_graph_filename(ctx->obj_dir, new_base_hash); + char *new_base_name = get_split_graph_filename(ctx->new_base_graph->obj_dir, new_base_hash); free(ctx->commit_graph_filenames_after[ctx->num_commit_graphs_after - 2]); free(ctx->commit_graph_hash_after[ctx->num_commit_graphs_after - 2]); @@ -1493,6 +1512,9 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) while (g && (g->num_commits <= split_strategy_size_mult * num_commits || num_commits > split_strategy_max_commits)) { + if (strcmp(g->obj_dir, ctx->obj_dir)) + break; + num_commits += g->num_commits; g = g->base_graph; @@ -1501,6 +1523,18 @@ static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) ctx->new_base_graph = g; + if (ctx->num_commit_graphs_after == 2) { + char *old_graph_name = get_commit_graph_filename(g->obj_dir); + + if (!strcmp(g->filename, old_graph_name) && + strcmp(g->obj_dir, ctx->obj_dir)) { + ctx->num_commit_graphs_after = 1; + ctx->new_base_graph = NULL; + } + + free(old_graph_name); + } + ALLOC_ARRAY(ctx->commit_graph_filenames_after, ctx->num_commit_graphs_after); ALLOC_ARRAY(ctx->commit_graph_hash_after, ctx->num_commit_graphs_after); diff --git a/commit-graph.h b/commit-graph.h index c321834533..802d35254f 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -48,6 +48,7 @@ struct commit_graph { uint32_t num_commits; struct object_id oid; char *filename; + const char *obj_dir; uint32_t num_commits_in_base; struct commit_graph *base_graph; diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 5cb5663a30..46f0832f68 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -90,6 +90,21 @@ test_expect_success 'add more commits, and write a new base graph' ' graph_read_expect 12 ' +test_expect_success 'fork and fail to base a chain on a commit-graph file' ' + test_when_finished rm -rf fork && + git clone . fork && + ( + cd fork && + rm .git/objects/info/commit-graph && + echo "$(pwd)/../.git/objects" >.git/objects/info/alternates && + test_commit new-commit && + git commit-graph write --reachable --split && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir + ) +' + test_expect_success 'add three more commits, write a tip graph' ' git reset --hard commits/3 && git merge merge/1 && @@ -132,4 +147,26 @@ test_expect_success 'add one commit, write a merged graph' ' graph_git_behavior 'merged commit-graph: commit 12 vs 6' commits/12 commits/6 +test_expect_success 'create fork and chain across alternate' ' + git clone . fork && + ( + cd fork && + git config core.commitGraph true && + rm -rf $graphdir && + echo "$(pwd)/../.git/objects" >.git/objects/info/alternates && + test_commit 13 && + git branch commits/13 && + git commit-graph write --reachable --split && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 3 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files && + git -c core.commitGraph=true rev-list HEAD >expect && + git -c core.commitGraph=false rev-list HEAD >actual && + test_cmp expect actual + ) +' + +graph_git_behavior 'alternate: commit 13 vs 6' commits/13 commits/6 + test_done -- cgit v1.3 From 8d84097f9655a954ea7e94ba08d93e5940ccb2ae Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 18 Jun 2019 11:14:31 -0700 Subject: commit-graph: expire commit-graph files As we merge commit-graph files in a commit-graph chain, we should clean up the files that are no longer used. This change introduces an 'expiry_window' value to the context, which is always zero (for now). We then check the modified time of each graph-{hash}.graph file in the $OBJDIR/info/commit-graphs folder and unlink the files that are older than the expiry_window. Since this is always zero, this immediately clears all unused graph files. We will update the value to match a config setting in a future change. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/technical/commit-graph.txt | 15 +++++++ commit-graph.c | 69 ++++++++++++++++++++++++++++++++ t/t5324-split-commit-graph.sh | 2 +- 3 files changed, 85 insertions(+), 1 deletion(-) (limited to 'Documentation/technical/commit-graph.txt') diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 473032e476..aed4350a59 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -266,6 +266,21 @@ The merge strategy values (2 for the size multiple, 64,000 for the maximum number of commits) could be extracted into config settings for full flexibility. +## Deleting graph-{hash} files + +After a new tip file is written, some `graph-{hash}` files may no longer +be part of a chain. It is important to remove these files from disk, eventually. +The main reason to delay removal is that another process could read the +`commit-graph-chain` file before it is rewritten, but then look for the +`graph-{hash}` files after they are deleted. + +To allow holding old split commit-graphs for a while after they are unreferenced, +we update the modified times of the files when they become unreferenced. Then, +we scan the `$OBJDIR/info/commit-graphs/` directory for `graph-{hash}` +files whose modified times are older than a given expiry window. This window +defaults to zero, but can be changed using command-line arguments or a config +setting. + ## Chains across multiple object directories In a repo with alternates, we look for the `commit-graph-chain` file starting diff --git a/commit-graph.c b/commit-graph.c index fba705bc51..0cc2ceb349 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -1652,6 +1652,70 @@ static void merge_commit_graphs(struct write_commit_graph_context *ctx) sort_and_scan_merged_commits(ctx); } +static void mark_commit_graphs(struct write_commit_graph_context *ctx) +{ + uint32_t i; + time_t now = time(NULL); + + for (i = ctx->num_commit_graphs_after - 1; i < ctx->num_commit_graphs_before; i++) { + struct stat st; + struct utimbuf updated_time; + + stat(ctx->commit_graph_filenames_before[i], &st); + + updated_time.actime = st.st_atime; + updated_time.modtime = now; + utime(ctx->commit_graph_filenames_before[i], &updated_time); + } +} + +static void expire_commit_graphs(struct write_commit_graph_context *ctx) +{ + struct strbuf path = STRBUF_INIT; + DIR *dir; + struct dirent *de; + size_t dirnamelen; + time_t expire_time = time(NULL); + + strbuf_addstr(&path, ctx->obj_dir); + strbuf_addstr(&path, "/info/commit-graphs"); + dir = opendir(path.buf); + + if (!dir) { + strbuf_release(&path); + return; + } + + strbuf_addch(&path, '/'); + dirnamelen = path.len; + while ((de = readdir(dir)) != NULL) { + struct stat st; + uint32_t i, found = 0; + + strbuf_setlen(&path, dirnamelen); + strbuf_addstr(&path, de->d_name); + + stat(path.buf, &st); + + if (st.st_mtime > expire_time) + continue; + if (path.len < 6 || strcmp(path.buf + path.len - 6, ".graph")) + continue; + + for (i = 0; i < ctx->num_commit_graphs_after; i++) { + if (!strcmp(ctx->commit_graph_filenames_after[i], + path.buf)) { + found = 1; + break; + } + } + + if (!found) + unlink(path.buf); + + } +} + int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, @@ -1764,6 +1828,11 @@ int write_commit_graph(const char *obj_dir, res = write_commit_graph_file(ctx); + if (ctx->split) { + mark_commit_graphs(ctx); + expire_commit_graphs(ctx); + } + cleanup: free(ctx->graph_name); free(ctx->commits.list); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 46f0832f68..76068ee407 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -141,7 +141,7 @@ test_expect_success 'add one commit, write a merged graph' ' test_path_is_file $graphdir/commit-graph-chain && test_line_count = 2 $graphdir/commit-graph-chain && ls $graphdir/graph-*.graph >graph-files && - test_line_count = 4 graph-files && + test_line_count = 2 graph-files && verify_chain_files_exist $graphdir ' -- cgit v1.3 From c2bc6e6ab0ade70c475a73d06326e677e70840d2 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Tue, 18 Jun 2019 11:14:32 -0700 Subject: commit-graph: create options for split files The split commit-graph feature is now fully implemented, but needs some more run-time configurability. Allow direct callers to 'git commit-graph write --split' to specify the values used in the merge strategy and the expire time. Update the documentation to specify these values. Signed-off-by: Derrick Stolee Signed-off-by: Junio C Hamano --- Documentation/git-commit-graph.txt | 21 +++++++++++++- Documentation/technical/commit-graph.txt | 7 +++-- builtin/commit-graph.c | 25 +++++++++++++---- builtin/commit.c | 2 +- builtin/gc.c | 3 +- commit-graph.c | 35 ++++++++++++++++-------- commit-graph.h | 12 ++++++-- t/t5324-split-commit-graph.sh | 47 ++++++++++++++++++++++++++++++++ 8 files changed, 128 insertions(+), 24 deletions(-) (limited to 'Documentation/technical/commit-graph.txt') diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 624470e198..365e145e82 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -26,7 +26,7 @@ OPTIONS Use given directory for the location of packfiles and commit-graph file. This parameter exists to specify the location of an alternate that only has the objects directory, not a full `.git` directory. The - commit-graph file is expected to be at `/info/commit-graph` and + commit-graph file is expected to be in the `/info` directory and the packfiles are expected to be in `/pack`. @@ -51,6 +51,25 @@ or `--stdin-packs`.) + With the `--append` option, include all commits that are present in the existing commit-graph file. ++ +With the `--split` option, write the commit-graph as a chain of multiple +commit-graph files stored in `/info/commit-graphs`. The new commits +not already in the commit-graph are added in a new "tip" file. This file +is merged with the existing file if the following merge conditions are +met: ++ +* If `--size-multiple=` is not specified, let `X` equal 2. If the new +tip file would have `N` commits and the previous tip has `M` commits and +`X` times `N` is greater than `M`, instead merge the two files into a +single file. ++ +* If `--max-commits=` is specified with `M` a positive integer, and the +new tip file would have more than `M` commits, then instead merge the new +tip with the previous tip. ++ +Finally, if `--expire-time=` is not specified, let `datetime` +be the current time. After writing the split commit-graph, delete all +unused commit-graph whose modified times are older than `datetime`. 'read':: diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index aed4350a59..729fbcb32f 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -248,10 +248,11 @@ When writing a set of commits that do not exist in the commit-graph stack of height N, we default to creating a new file at level N + 1. We then decide to merge with the Nth level if one of two conditions hold: - 1. The expected file size for level N + 1 is at least half the file size for - level N. + 1. `--size-multiple=` is specified or X = 2, and the number of commits in + level N is less than X times the number of commits in level N + 1. - 2. Level N + 1 contains more than 64,0000 commits. + 2. `--max-commits=` is specified with non-zero C and the number of commits + in level N + 1 is more than C commits. This decision cascades down the levels: when we merge a level we create a new set of commits that then compares to the next level. diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index ff8bc3c8b9..6c0b5b17e0 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), N_("git commit-graph verify [--object-dir ]"), - N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] "), NULL }; @@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append|--split] [--reachable|--stdin-packs|--stdin-commits] "), NULL }; @@ -135,6 +135,7 @@ static int graph_read(int argc, const char **argv) } extern int read_replace_refs; +static struct split_commit_graph_opts split_opts; static int graph_write(int argc, const char **argv) { @@ -158,9 +159,19 @@ static int graph_write(int argc, const char **argv) N_("include all commits already in the commit-graph file")), OPT_BOOL(0, "split", &opts.split, N_("allow writing an incremental commit-graph file")), + OPT_INTEGER(0, "max-commits", &split_opts.max_commits, + N_("maximum number of commits in a non-base split commit-graph")), + OPT_INTEGER(0, "size-multiple", &split_opts.size_multiple, + N_("maximum ratio between two levels of a split commit-graph")), + OPT_EXPIRY_DATE(0, "expire-time", &split_opts.expire_time, + N_("maximum number of commits in a non-base split commit-graph")), OPT_END(), }; + split_opts.size_multiple = 2; + split_opts.max_commits = 0; + split_opts.expire_time = 0; + argc = parse_options(argc, argv, NULL, builtin_commit_graph_write_options, builtin_commit_graph_write_usage, 0); @@ -176,8 +187,11 @@ static int graph_write(int argc, const char **argv) read_replace_refs = 0; - if (opts.reachable) - return write_commit_graph_reachable(opts.obj_dir, flags); + if (opts.reachable) { + if (write_commit_graph_reachable(opts.obj_dir, flags, &split_opts)) + return 1; + return 0; + } string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -197,7 +211,8 @@ static int graph_write(int argc, const char **argv) if (write_commit_graph(opts.obj_dir, pack_indexes, commit_hex, - flags)) + flags, + &split_opts)) result = 1; UNLEAK(lines); diff --git a/builtin/commit.c b/builtin/commit.c index b001ef565d..9216e9c043 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) "not exceeded, and then \"git reset HEAD\" to recover.")); if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) && - write_commit_graph_reachable(get_object_directory(), 0)) + write_commit_graph_reachable(get_object_directory(), 0, NULL)) return 1; repo_rerere(the_repository, 0); diff --git a/builtin/gc.c b/builtin/gc.c index 20c8f1bfe8..0aa8eac747 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -666,7 +666,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (gc_write_commit_graph && write_commit_graph_reachable(get_object_directory(), - !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0)) + !quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0, + NULL)) return 1; if (auto_gc && too_many_loose_objects()) diff --git a/commit-graph.c b/commit-graph.c index 0cc2ceb349..315088d205 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -768,6 +768,8 @@ struct write_commit_graph_context { unsigned append:1, report_progress:1, split:1; + + const struct split_commit_graph_opts *split_opts; }; static void write_graph_chunk_fanout(struct hashfile *f, @@ -1116,14 +1118,15 @@ static int add_ref_to_list(const char *refname, return 0; } -int write_commit_graph_reachable(const char *obj_dir, unsigned int flags) +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags, + const struct split_commit_graph_opts *split_opts) { struct string_list list = STRING_LIST_INIT_DUP; int result; for_each_ref(add_ref_to_list, &list); result = write_commit_graph(obj_dir, NULL, &list, - flags); + flags, split_opts); string_list_clear(&list, 0); return result; @@ -1498,20 +1501,25 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx) return 0; } -static int split_strategy_max_commits = 64000; -static float split_strategy_size_mult = 2.0f; - static void split_graph_merge_strategy(struct write_commit_graph_context *ctx) { struct commit_graph *g = ctx->r->objects->commit_graph; uint32_t num_commits = ctx->commits.nr; uint32_t i; + int max_commits = 0; + int size_mult = 2; + + if (ctx->split_opts) { + max_commits = ctx->split_opts->max_commits; + size_mult = ctx->split_opts->size_multiple; + } + g = ctx->r->objects->commit_graph; ctx->num_commit_graphs_after = ctx->num_commit_graphs_before + 1; - while (g && (g->num_commits <= split_strategy_size_mult * num_commits || - num_commits > split_strategy_max_commits)) { + while (g && (g->num_commits <= size_mult * num_commits || + (max_commits && num_commits > max_commits))) { if (strcmp(g->obj_dir, ctx->obj_dir)) break; @@ -1675,7 +1683,10 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) DIR *dir; struct dirent *de; size_t dirnamelen; - time_t expire_time = time(NULL); + timestamp_t expire_time = time(NULL); + + if (ctx->split_opts && ctx->split_opts->expire_time) + expire_time -= ctx->split_opts->expire_time; strbuf_addstr(&path, ctx->obj_dir); strbuf_addstr(&path, "/info/commit-graphs"); @@ -1719,7 +1730,8 @@ static void expire_commit_graphs(struct write_commit_graph_context *ctx) int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, - unsigned int flags) + unsigned int flags, + const struct split_commit_graph_opts *split_opts) { struct write_commit_graph_context *ctx; uint32_t i, count_distinct = 0; @@ -1734,6 +1746,7 @@ int write_commit_graph(const char *obj_dir, ctx->append = flags & COMMIT_GRAPH_APPEND ? 1 : 0; ctx->report_progress = flags & COMMIT_GRAPH_PROGRESS ? 1 : 0; ctx->split = flags & COMMIT_GRAPH_SPLIT ? 1 : 0; + ctx->split_opts = split_opts; if (ctx->split) { struct commit_graph *g; @@ -1761,8 +1774,8 @@ int write_commit_graph(const char *obj_dir, ctx->approx_nr_objects = approximate_object_count(); ctx->oids.alloc = ctx->approx_nr_objects / 32; - if (ctx->split && ctx->oids.alloc > split_strategy_max_commits) - ctx->oids.alloc = split_strategy_max_commits; + if (ctx->split && split_opts && ctx->oids.alloc > split_opts->max_commits) + ctx->oids.alloc = split_opts->max_commits; if (ctx->append) { prepare_commit_graph_one(ctx->r, ctx->obj_dir); diff --git a/commit-graph.h b/commit-graph.h index 802d35254f..a84c22a560 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -75,17 +75,25 @@ int generation_numbers_enabled(struct repository *r); #define COMMIT_GRAPH_PROGRESS (1 << 1) #define COMMIT_GRAPH_SPLIT (1 << 2) +struct split_commit_graph_opts { + int size_multiple; + int max_commits; + timestamp_t expire_time; +}; + /* * The write_commit_graph* methods return zero on success * and a negative value on failure. Note that if the repository * is not compatible with the commit-graph feature, then the * methods will return 0 without writing a commit-graph. */ -int write_commit_graph_reachable(const char *obj_dir, unsigned int flags); +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags, + const struct split_commit_graph_opts *split_opts); int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, - unsigned int flags); + unsigned int flags, + const struct split_commit_graph_opts *split_opts); int verify_commit_graph(struct repository *r, struct commit_graph *g); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 76068ee407..1b699a543c 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -169,4 +169,51 @@ test_expect_success 'create fork and chain across alternate' ' graph_git_behavior 'alternate: commit 13 vs 6' commits/13 commits/6 +test_expect_success 'test merge stragety constants' ' + git clone . merge-2 && + ( + cd merge-2 && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 14 && + git commit-graph write --reachable --split --size-multiple=2 && + test_line_count = 3 $graphdir/commit-graph-chain + + ) && + git clone . merge-10 && + ( + cd merge-10 && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 14 && + git commit-graph write --reachable --split --size-multiple=10 && + test_line_count = 1 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files + ) && + git clone . merge-10-expire && + ( + cd merge-10-expire && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 15 && + git commit-graph write --reachable --split --size-multiple=10 --expire-time=1980-01-01 && + test_line_count = 1 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 3 graph-files + ) && + git clone --no-hardlinks . max-commits && + ( + cd max-commits && + git config core.commitGraph true && + test_line_count = 2 $graphdir/commit-graph-chain && + test_commit 16 && + test_commit 17 && + git commit-graph write --reachable --split --max-commits=1 && + test_line_count = 1 $graphdir/commit-graph-chain && + ls $graphdir/graph-*.graph >graph-files && + test_line_count = 1 graph-files + ) +' + test_done -- cgit v1.3