<feed xmlns='http://www.w3.org/2005/Atom'>
<title>git/commit.c, branch v2.0.4</title>
<subtitle>Fork of git SCM with my patches.</subtitle>
<id>http://git.kilabit.info/git/atom?h=v2.0.4</id>
<link rel='self' href='http://git.kilabit.info/git/atom?h=v2.0.4'/>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/'/>
<updated>2014-07-28T17:14:33Z</updated>
<entry>
<title>add object_as_type helper for casting objects</title>
<updated>2014-07-28T17:14:33Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-07-13T06:42:03Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=c4ad00f8ccb59a0ae0735e8e32b203d4bd835616'/>
<id>urn:sha1:c4ad00f8ccb59a0ae0735e8e32b203d4bd835616</id>
<content type='text'>
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:

  1. Look for an existing object struct. If we don't have
     one, allocate and return a new one.

  2. Double check that any object we have is the expected
     type (and complain and return NULL otherwise).

  3. Convert an object with type OBJ_NONE (from a prior
     call to lookup_unknown_object) to the expected type.

We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.

Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.

Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:

  - for lookup_{commit,tree,etc} functions, we are just
    pulling steps 2 and 3 into a function that does the same
    thing.

  - for the call in peel_object, we currently only do step 3
    (but we want to consolidate it with the others, as
    mentioned above). However, step 2 is a noop here, as the
    surrounding conditional makes sure we have OBJ_NONE
    (which we want to keep to avoid an extraneous call to
    sha1_object_info).

  - for the call in lookup_commit_reference_gently, we are
    currently doing step 2 but not step 3. However, step 3
    is a noop here. The object we got will have just come
    from deref_tag, which must have figured out the type for
    each object in order to know when to stop peeling.
    Therefore the type will never be OBJ_NONE.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>move setting of object-&gt;type to alloc_* functions</title>
<updated>2014-07-28T17:14:33Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-07-13T06:41:55Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=fe24d396e1b8d2eedbc07f626af3dcd2b14e6012'/>
<id>urn:sha1:fe24d396e1b8d2eedbc07f626af3dcd2b14e6012</id>
<content type='text'>
The "struct object" type implements basic object
polymorphism.  Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum.  This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).

In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).

We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.

This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>Merge branch 'bg/xcalloc-nmemb-then-size' into maint</title>
<updated>2014-07-22T17:25:17Z</updated>
<author>
<name>Junio C Hamano</name>
<email>gitster@pobox.com</email>
</author>
<published>2014-07-22T17:25:17Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=cfececfe1ff1554783030a6f08b431ad50263800'/>
<id>urn:sha1:cfececfe1ff1554783030a6f08b431ad50263800</id>
<content type='text'>
* bg/xcalloc-nmemb-then-size:
  transport-helper.c: rearrange xcalloc arguments
  remote.c: rearrange xcalloc arguments
  reflog-walk.c: rearrange xcalloc arguments
  pack-revindex.c: rearrange xcalloc arguments
  notes.c: rearrange xcalloc arguments
  imap-send.c: rearrange xcalloc arguments
  http-push.c: rearrange xcalloc arguments
  diff.c: rearrange xcalloc arguments
  config.c: rearrange xcalloc arguments
  commit.c: rearrange xcalloc arguments
  builtin/remote.c: rearrange xcalloc arguments
  builtin/ls-remote.c: rearrange xcalloc arguments
</content>
</entry>
<entry>
<title>reuse cached commit buffer when parsing signatures</title>
<updated>2014-06-13T19:10:13Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-13T06:32:11Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=218aa3a6162b80696a82b8745daa38fa826985ae'/>
<id>urn:sha1:218aa3a6162b80696a82b8745daa38fa826985ae</id>
<content type='text'>
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).

However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:

  [baseline, no signatures]
  $ time git log &gt;/dev/null
  real    0m4.902s
  user    0m4.784s
  sys     0m0.120s

  [before]
  $ time git log --show-signature &gt;/dev/null
  real    0m14.735s
  user    0m9.964s
  sys     0m0.944s

  [after]
  $ time git log --show-signature &gt;/dev/null
  real    0m9.981s
  user    0m5.260s
  sys     0m0.936s

Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.

An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit: record buffer length in cache</title>
<updated>2014-06-13T19:09:38Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-10T21:44:13Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=8597ea3afea067b39ba7d4adae7ec6c1ee0e7c91'/>
<id>urn:sha1:8597ea3afea067b39ba7d4adae7ec6c1ee0e7c91</id>
<content type='text'>
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>commit: convert commit-&gt;buffer to a slab</title>
<updated>2014-06-13T19:08:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-10T21:43:02Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=c1b3c71f4b4571abb2b2a457122fd100dc9f7eb0'/>
<id>urn:sha1:c1b3c71f4b4571abb2b2a457122fd100dc9f7eb0</id>
<content type='text'>
This will make it easier to manage the buffer cache
independently of the "struct commit" objects. It also
shrinks "struct commit" by one pointer, which may be
helpful.

Unfortunately it does not reduce the max memory size of
something like "rev-list", because rev-list uses
get_cached_commit_buffer() to decide not to show each
commit's output (and due to the design of slab_at, accessing
the slab requires us to extend it, allocating exactly the
same number of buffer pointers we dropped from the commit
structs).

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>use get_commit_buffer to avoid duplicate code</title>
<updated>2014-06-13T19:08:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-10T21:41:02Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=ba41c1c93fd9109eae954f75a8cb8e32c3e29530'/>
<id>urn:sha1:ba41c1c93fd9109eae954f75a8cb8e32c3e29530</id>
<content type='text'>
For both of these sites, we already do the "fallback to
read_sha1_file" trick. But we can shorten the code by just
using get_commit_buffer.

Note that the error cases are slightly different when
read_sha1_file fails. get_commit_buffer will die() if the
object cannot be loaded, or is a non-commit.

For get_sha1_oneline, this will almost certainly never
happen, as we will have just called parse_object (and if it
does, it's probably worth complaining about).

For record_author_date, the new behavior is probably better;
we notify the user of the error instead of silently ignoring
it. And because it's used only for sorting by author-date,
somebody examining a corrupt repo can fallback to the
regular traversal order.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>provide helpers to access the commit buffer</title>
<updated>2014-06-13T19:08:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-10T21:40:39Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=152ff1ccebd822fd97f27d2a6c3fa2058f088fd8'/>
<id>urn:sha1:152ff1ccebd822fd97f27d2a6c3fa2058f088fd8</id>
<content type='text'>
Many sites look at commit-&gt;buffer to get more detailed
information than what is in the parsed commit struct.
However, we sometimes drop commit-&gt;buffer to save memory,
in which case the caller would need to read the object
afresh. Some callers do this (leading to duplicated code),
and others do not (which opens the possibility of a segfault
if somebody else frees the buffer).

Let's provide a pair of helpers, "get" and "unuse", that let
callers easily get the buffer. They will use the cached
buffer when possible, and otherwise load from disk using
read_sha1_file.

Note that we also need to add a "get_cached" variant which
returns NULL when we do not have a cached buffer. At first
glance this seems to defeat the purpose of "get", which is
to always provide a return value. However, some log code
paths actually use the NULL-ness of commit-&gt;buffer as a
boolean flag to decide whether to try printing the
commit. At least for now, we want to continue supporting
that use.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>provide a helper to set the commit buffer</title>
<updated>2014-06-13T19:08:17Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-10T21:40:14Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=66c2827ea4deb24ff541e30a5b6239ad5e9f6801'/>
<id>urn:sha1:66c2827ea4deb24ff541e30a5b6239ad5e9f6801</id>
<content type='text'>
Right now this is just a one-liner, but abstracting it will
make it easier to change later.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
<entry>
<title>provide a helper to free commit buffer</title>
<updated>2014-06-13T19:07:47Z</updated>
<author>
<name>Jeff King</name>
<email>peff@peff.net</email>
</author>
<published>2014-06-12T22:05:37Z</published>
<link rel='alternate' type='text/html' href='http://git.kilabit.info/git/commit/?id=0fb370da9ca972f9571530f95c0dacb31368c280'/>
<id>urn:sha1:0fb370da9ca972f9571530f95c0dacb31368c280</id>
<content type='text'>
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it.  But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual).  In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.

Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.

Signed-off-by: Jeff King &lt;peff@peff.net&gt;
Signed-off-by: Junio C Hamano &lt;gitster@pobox.com&gt;
</content>
</entry>
</feed>
