From 2841e8f81cb2820024804b9341577be1d0ce1240 Mon Sep 17 00:00:00 2001 From: Lars Schneider Date: Fri, 30 Jun 2017 22:41:28 +0200 Subject: convert: add "status=delayed" to filter process protocol Some `clean` / `smudge` filters may require a significant amount of time to process a single blob (e.g. the Git LFS smudge filter might perform network requests). During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout. Teach the filter process protocol, introduced in edcc8581 ("convert: add filter..process option", 2016-10-16), to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected. Git has a multiple code paths that checkout a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations for now. The optimization is most effective in these code paths as all files of the tree are processed. Signed-off-by: Lars Schneider Signed-off-by: Junio C Hamano --- Documentation/gitattributes.txt | 69 ++++++++++++++++++++++++++++++++++++++--- 1 file changed, 65 insertions(+), 4 deletions(-) (limited to 'Documentation') diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 4736483865..4049a0b9a8 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -425,8 +425,8 @@ packet: git< capability=clean packet: git< capability=smudge packet: git< 0000 ------------------------ -Supported filter capabilities in version 2 are "clean" and -"smudge". +Supported filter capabilities in version 2 are "clean", "smudge", +and "delay". Afterwards Git sends a list of "key=value" pairs terminated with a flush packet. The list will contain at least the filter command @@ -512,12 +512,73 @@ the protocol then Git will stop the filter process and restart it with the next file that needs to be processed. Depending on the `filter..required` flag Git will interpret that as error. -After the filter has processed a blob it is expected to wait for -the next "key=value" list containing a command. Git will close +After the filter has processed a command it is expected to wait for +a "key=value" list containing the next command. Git will close the command pipe on exit. The filter is expected to detect EOF and exit gracefully on its own. Git will wait until the filter process has stopped. +Delay +^^^^^ + +If the filter supports the "delay" capability, then Git can send the +flag "can-delay" after the filter command and pathname. This flag +denotes that the filter can delay filtering the current blob (e.g. to +compensate network latencies) by responding with no content but with +the status "delayed" and a flush packet. +------------------------ +packet: git> command=smudge +packet: git> pathname=path/testfile.dat +packet: git> can-delay=1 +packet: git> 0000 +packet: git> CONTENT +packet: git> 0000 +packet: git< status=delayed +packet: git< 0000 +------------------------ + +If the filter supports the "delay" capability then it must support the +"list_available_blobs" command. If Git sends this command, then the +filter is expected to return a list of pathnames representing blobs +that have been delayed earlier and are now available. +The list must be terminated with a flush packet followed +by a "success" status that is also terminated with a flush packet. If +no blobs for the delayed paths are available, yet, then the filter is +expected to block the response until at least one blob becomes +available. The filter can tell Git that it has no more delayed blobs +by sending an empty list. As soon as the filter responds with an empty +list, Git stops asking. All blobs that Git has not received at this +point are considered missing and will result in an error. + +------------------------ +packet: git> command=list_available_blobs +packet: git> 0000 +packet: git< pathname=path/testfile.dat +packet: git< pathname=path/otherfile.dat +packet: git< 0000 +packet: git< status=success +packet: git< 0000 +------------------------ + +After Git received the pathnames, it will request the corresponding +blobs again. These requests contain a pathname and an empty content +section. The filter is expected to respond with the smudged content +in the usual way as explained above. +------------------------ +packet: git> command=smudge +packet: git> pathname=path/testfile.dat +packet: git> 0000 +packet: git> 0000 # empty content! +packet: git< status=success +packet: git< 0000 +packet: git< SMUDGED_CONTENT +packet: git< 0000 +packet: git< 0000 # empty list, keep "status=success" unchanged! +------------------------ + +Example +^^^^^^^ + A long running filter demo implementation can be found in `contrib/long-running-filter/example.pl` located in the Git core repository. If you develop your own long running filter -- cgit v1.3 From 7e2e1bbb24a5a0868fc83f1eddf804574f9e4b54 Mon Sep 17 00:00:00 2001 From: Jonathan Tan Date: Wed, 26 Jul 2017 11:17:28 -0700 Subject: Documentation: migrate sub-process docs to header Move the documentation for the sub-process API from a separate txt file to its header file. Signed-off-by: Jonathan Tan Signed-off-by: Junio C Hamano --- Documentation/technical/api-sub-process.txt | 59 ----------------------------- sub-process.h | 25 +++++++++++- 2 files changed, 23 insertions(+), 61 deletions(-) delete mode 100644 Documentation/technical/api-sub-process.txt (limited to 'Documentation') diff --git a/Documentation/technical/api-sub-process.txt b/Documentation/technical/api-sub-process.txt deleted file mode 100644 index 793508cf3e..0000000000 --- a/Documentation/technical/api-sub-process.txt +++ /dev/null @@ -1,59 +0,0 @@ -sub-process API -=============== - -The sub-process API makes it possible to run background sub-processes -for the entire lifetime of a Git invocation. If Git needs to communicate -with an external process multiple times, then this can reduces the process -invocation overhead. Git and the sub-process communicate through stdin and -stdout. - -The sub-processes are kept in a hashmap by command name and looked up -via the subprocess_find_entry function. If an existing instance can not -be found then a new process should be created and started. When the -parent git command terminates, all sub-processes are also terminated. - -This API is based on the run-command API. - -Data structures ---------------- - -* `struct subprocess_entry` - -The sub-process structure. Members should not be accessed directly. - -Types ------ - -'int(*subprocess_start_fn)(struct subprocess_entry *entry)':: - - User-supplied function to initialize the sub-process. This is - typically used to negotiate the interface version and capabilities. - - -Functions ---------- - -`cmd2process_cmp`:: - - Function to test two subprocess hashmap entries for equality. - -`subprocess_start`:: - - Start a subprocess and add it to the subprocess hashmap. - -`subprocess_stop`:: - - Kill a subprocess and remove it from the subprocess hashmap. - -`subprocess_find_entry`:: - - Find a subprocess in the subprocess hashmap. - -`subprocess_get_child_process`:: - - Get the underlying `struct child_process` from a subprocess. - -`subprocess_read_status`:: - - Helper function to read packets looking for the last "status=" - key/value pair. diff --git a/sub-process.h b/sub-process.h index 96a2cca360..e546216145 100644 --- a/sub-process.h +++ b/sub-process.h @@ -6,12 +6,23 @@ #include "run-command.h" /* - * Generic implementation of background process infrastructure. - * See: Documentation/technical/api-sub-process.txt + * The sub-process API makes it possible to run background sub-processes + * for the entire lifetime of a Git invocation. If Git needs to communicate + * with an external process multiple times, then this can reduces the process + * invocation overhead. Git and the sub-process communicate through stdin and + * stdout. + * + * The sub-processes are kept in a hashmap by command name and looked up + * via the subprocess_find_entry function. If an existing instance can not + * be found then a new process should be created and started. When the + * parent git command terminates, all sub-processes are also terminated. + * + * This API is based on the run-command API. */ /* data structures */ +/* Members should not be accessed directly. */ struct subprocess_entry { struct hashmap_entry ent; /* must be the first member! */ const char *cmd; @@ -20,21 +31,31 @@ struct subprocess_entry { /* subprocess functions */ +/* Function to test two subprocess hashmap entries for equality. */ extern int cmd2process_cmp(const void *unused_cmp_data, const struct subprocess_entry *e1, const struct subprocess_entry *e2, const void *unused_keydata); +/* + * User-supplied function to initialize the sub-process. This is + * typically used to negotiate the interface version and capabilities. + */ typedef int(*subprocess_start_fn)(struct subprocess_entry *entry); + +/* Start a subprocess and add it to the subprocess hashmap. */ int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, const char *cmd, subprocess_start_fn startfn); +/* Kill a subprocess and remove it from the subprocess hashmap. */ void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry); +/* Find a subprocess in the subprocess hashmap. */ struct subprocess_entry *subprocess_find_entry(struct hashmap *hashmap, const char *cmd); /* subprocess helper functions */ +/* Get the underlying `struct child_process` from a subprocess. */ static inline struct child_process *subprocess_get_child_process( struct subprocess_entry *entry) { -- cgit v1.3