From 1e9bfbbfca9a4003368352d173815de134f9bcd9 Mon Sep 17 00:00:00 2001
From: Pavel Cahyna <pcahyna@redhat.com>
Date: Mon, 18 Jun 2018 13:58:25 +0200
Subject: [PATCH] Squashed commit of the following:
commit a1e311b306db9407e0bf83046dce50ef6a7f74bb
Author: Jeff King <peff@peff.net>
Date: Mon Jan 16 16:24:03 2017 -0500
t1450: clean up sub-objects in duplicate-entry test
This test creates a multi-level set of trees, but its
cleanup routine only removes the top-level tree. After the
test finishes, the inner tree and the blob it points to
remain, making the inner tree dangling.
A later test ("cleaned up") verifies that we've removed any
cruft and "git fsck" output is clean. This passes only
because of a bug in git-fsck which fails to notice dangling
trees.
In preparation for fixing the bug, let's teach this earlier
test to clean up after itself correctly. We have to remove
the inner tree (and therefore the blob, too, which becomes
dangling after removing that tree).
Since the setup code happens inside a subshell, we can't
just set a variable for each object. However, we can stuff
all of the sha1s into the $T output variable, which is not
used for anything except cleanup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit e71a6f0c8a80829017629d1ae595f6a887c4e844
Author: Pavel Cahyna <pcahyna@redhat.com>
Date: Fri Jun 15 11:49:09 2018 +0200
Adapt t7415-submodule-names.sh to git 1.8.3.1 : we don't have -C
commit ba4d4fca832bf7c2ec224ada5243e950d8e32406
Author: Jeff King <peff@peff.net>
Date: Fri May 4 20:03:35 2018 -0400
fsck: complain when .gitmodules is a symlink
commit b7b1fca175f1ed7933f361028c631b9ac86d868d upstream.
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.
[jn: backported to 2.1.y:
- using error_func instead of report to report fsck errors
- until v2.6.2~7^2 (fsck: exit with non-zero when problems
are found, 2015-09-23), git fsck did not reliably use the
exit status to indicate errors; callers would have to
check stderr instead. Relaxed the test to permit that.
[pc: revert the above and fix the actual error by moving the
initialization of retval earlier so it is not clobbered]]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit d55fa465ac0c2d825f80e8ad0cbbca877812c0b1
Author: Jeff King <peff@peff.net>
Date: Fri May 4 19:45:01 2018 -0400
index-pack: check .gitmodules files with --strict
commit 73c3f0f704a91b6792e0199a3f3ab6e3a1971675 upstream.
Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:
1. We currently don't bother calling fsck_object() at all
for blobs, since it has traditionally been a noop. We'd
actually catch these blobs in fsck_finish() at the end,
but it's more efficient to check them when we already
have the object loaded in memory.
2. The second pass done by fsck_finish() needs to access
the objects, but we're actually indexing the pack in
this process. In theory we could give the fsck code a
special callback for accessing the in-pack data, but
it's actually quite tricky:
a. We don't have an internal efficient index mapping
oids to packfile offsets. We only generate it on
the fly as part of writing out the .idx file.
b. We'd still have to reconstruct deltas, which means
we'd basically have to replicate all of the
reading logic in packfile.c.
Instead, let's avoid running fsck_finish() until after
we've written out the .idx file, and then just add it
to our internal packed_git list.
This does mean that the objects are "in the repository"
before we finish our fsck checks. But unpack-objects
already exhibits this same behavior, and it's an
acceptable tradeoff here for the same reason: the
quarantine mechanism means that pushes will be
fully protected.
In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.
This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 098925bcbeaa8aada335cdd48135efa83e3710d8
Author: Jeff King <peff@peff.net>
Date: Fri May 4 19:40:08 2018 -0400
unpack-objects: call fsck_finish() after fscking objects
commit 6e328d6caef218db320978e3e251009135d87d0e upstream.
As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.
This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.
This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 7f1b69637f1744e42e61fd18c7ac4dac75edd6fb
Author: Jeff King <peff@peff.net>
Date: Wed May 2 17:20:35 2018 -0400
fsck: call fsck_finish() after fscking objects
commit 1995b5e03e1cc97116be58cdc0502d4a23547856 upstream.
Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.
With this, we can now catch the malicious case in t7415 with
git-fsck.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit e197e085e86c5ebb2eb6f4556f6e82e3bdf8afa6
Author: Jeff King <peff@peff.net>
Date: Wed May 2 17:25:27 2018 -0400
fsck: check .gitmodules content
commit ed8b10f631c9a71df3351d46187bf7f3fa4f9b7e upstream.
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).
As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).
[jn: backported by using git_config_from_buf instead of
git_config_from_mem for parsing and error_func instead of
report for reporting fsck errors]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit ad298dfb0e89d05db3f17d9a0997f065c7ed96f5
Author: Jeff King <peff@peff.net>
Date: Wed May 2 17:20:08 2018 -0400
fsck: detect gitmodules files
commit 159e7b080bfa5d34559467cacaa79df89a01afc0 upstream.
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.
Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.
In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:
1. git-fsck may show us the objects in arbitrary order
(loose objects are fed in sha1 order, but we may also
have multiple packs, and we process each pack fully in
sequence).
2. The type ordering is just what git-pack-objects happens
to write now. The pack format does not require a
specific order, and it's possible that future versions
of Git (or a custom version trying to fool official
Git's fsck checks!) may order it differently.
3. We may not even be fscking all of the relevant objects
at once. Consider pushing with transfer.fsckObjects,
where one push adds a blob at path "foo", and then a
second push adds the same blob at path ".gitmodules".
The blob is not part of the second push at all, but we
need to mark and check it.
So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.
We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.
This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.
[jn: backported to 2.1.y:
- using error_func instead of report to report fsck errors
- using sha1s instead of struct object_id
- using "struct hashmap" directly since "struct oidset" isn't
available]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit f8c9e1358806fc649c48f0a80ea69c0c2f7c9ef2
Author: Karsten Blees <karsten.blees@gmail.com>
Date: Thu Jul 3 00:22:11 2014 +0200
hashmap: add simplified hashmap_get_from_hash() API
Hashmap entries are typically looked up by just a key. The hashmap_get()
API expects an initialized entry structure instead, to support compound
keys. This flexibility is currently only needed by find_dir_entry() in
name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
(currently five) call sites of hashmap_get() have to set up a near emtpy
entry structure, resulting in duplicate code like this:
struct hashmap_entry keyentry;
hashmap_entry_init(&keyentry, hash(key));
return hashmap_get(map, &keyentry, key);
Add a hashmap_get_from_hash() API that allows hashmap lookups by just
specifying the key and its hash code, i.e.:
return hashmap_get_from_hash(map, hash(key), key);
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 4c8e18c35a61140fa4cae20c66b8783a677c58cc
Author: Karsten Blees <karsten.blees@gmail.com>
Date: Thu Jul 3 00:20:20 2014 +0200
hashmap: factor out getting a hash code from a SHA1
Copying the first bytes of a SHA1 is duplicated in six places,
however, the implications (the actual value would depend on the
endianness of the platform) is documented only once.
Add a properly documented API for this.
[Dropping non-hashmap.[ch] portions of this patch, as a prepreq for
159e7b080bfa5d34559467cacaa79df89a01afc0 "fsck: detect gitmodules
files" which uses the hashmap implementation. --sbeattie]
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit b3f33dc3b1da719027c946b748ad959ea03cb605
Author: Karsten Blees <karsten.blees@gmail.com>
Date: Wed Dec 18 14:41:27 2013 +0100
hashmap.h: use 'unsigned int' for hash-codes everywhere
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 53b42170beb29fab2378db8fca1455a825329eef
Author: Karsten Blees <karsten.blees@gmail.com>
Date: Thu Nov 14 20:17:54 2013 +0100
add a hashtable implementation that supports O(1) removal
The existing hashtable implementation (in hash.[ch]) uses open addressing
(i.e. resolve hash collisions by distributing entries across the table).
Thus, removal is difficult to implement with less than O(n) complexity.
Resolving collisions of entries with identical hashes (e.g. via chaining)
is left to the client code.
Add a hashtable implementation that supports O(1) removal and is slightly
easier to use due to builtin entry chaining.
Supports all basic operations init, free, get, add, remove and iteration.
Also includes ready-to-use hash functions based on the public domain FNV-1
algorithm (http://www.isthe.com/chongo/tech/comp/fnv).
The per-entry data structure (hashmap_entry) is piggybacked in front of
the client's data structure to save memory. See test-hashmap.c for usage
examples.
The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.
Lookups are also slightly faster, because entries are strictly confined to
their bucket (i.e. no data of other buckets needs to be traversed).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 726f4913b8040c3b92a70f1c84e53c1f9bce3b8e
Author: Jeff King <peff@peff.net>
Date: Wed May 2 15:44:51 2018 -0400
fsck: actually fsck blob data
commit 7ac4f3a007e2567f9d2492806186aa063f9a08d6 upstream.
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:
1. The fsck_object() function just returns success for any
blob. Let's a noop fsck_blob(), which we can fill in
with actual logic later.
2. The fsck_loose() function in builtin/fsck.c
just threw away blob content after loading it. Let's
hold onto it until after we've called fsck_object().
The easiest way to do this is to just drop the
parse_loose_object() helper entirely. Incidentally,
this also fixes a memory leak: if we successfully
loaded the object data but did not parse it, we would
have left the function without freeing it.
3. When fsck_loose() loads the object data, it
does so with a custom read_loose_object() helper. This
function streams any blobs, regardless of size, under
the assumption that we're only checking the sha1.
Instead, let's actually load blobs smaller than
big_file_threshold, as the normal object-reading
code-paths would do. This lets us fsck small files, and
a NULL return is an indication that the blob was so big
that it needed to be streamed, and we can pass that
information along to fsck_blob().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit e2ba4c21e6effb57fb8b123fb7afec8d5de69959
Author: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Wed Sep 10 15:52:51 2014 +0200
fsck_object(): allow passing object data separately from the object itself
commits 90a398bbd72477d5d228818db5665fdfcf13431b and
4d0d89755e82c40df88cf94d84031978f8eac827 upstream.
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').
For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.
By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.
The assumption that such buffers are inherently NUL terminated is now
wrong, so make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit cfaa863b28379e5fb3f1acb871357a5ec85d4844
Author: Jeff King <peff@peff.net>
Date: Wed May 2 16:37:09 2018 -0400
index-pack: make fsck error message more specific
commit db5a58c1bda5b20169b9958af1e8b05ddd178b01 upstream.
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit a828c426b337f0e519bc80acad5b29616458333e
Author: Jeff King <peff@peff.net>
Date: Fri Jan 13 12:59:44 2017 -0500
fsck: parse loose object paths directly
commit c68b489e56431cf27f7719913ab09ddc62f95912 upstream.
When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.
In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.
The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.
[jn: backported by passing path through the call chain to
fsck_loose, since until v2.7.0-rc0~68^2~4 (fsck: use
for_each_loose_file_in_objdir, 2015-09-24) it is not
available there]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 902125f4a8ed87ed9d9df3dd6af963e828bf4f79
Author: Jeff King <peff@peff.net>
Date: Thu Sep 24 17:08:28 2015 -0400
fsck: drop inode-sorting code
commit 144e4cf7092ee8cff44e9c7600aaa7515ad6a78f upstream.
Fsck tries to access loose objects in order of inode number,
with the hope that this would make cold cache access faster
on a spinning disk. This dates back to 7e8c174 (fsck-cache:
sort entries by inode number, 2005-05-02), which predates
the invention of packfiles.
These days, there's not much point in trying to optimize
cold cache for a large number of loose objects. You are much
better off to simply pack the objects, which will reduce the
disk footprint _and_ provide better locality of data access.
So while you can certainly construct pathological cases
where this code might help, it is not worth the trouble
anymore.
[backport to 1.9.x> include t1450-fsck.sh changes from commit
2e770fe47ef9c0b20bc687e37f3eb50f1bf919d0 as the change propogates
returning failure for git fsck. --sbeattie]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit e2bf202e0f46789ebec25ee8584455ac4b9e8ee6
Author: Jeff King <peff@peff.net>
Date: Fri Jan 13 12:58:16 2017 -0500
sha1_file: add read_loose_object() function
commit f6371f9210418f1beabc85b097e2a3470aeeb54d upstream.
It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.
However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.
The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.
We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().
Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 15508d17089bef577fe32e4ae031f24bf6274f0a
Author: Jeff King <peff@peff.net>
Date: Fri May 4 20:03:35 2018 -0400
verify_path: disallow symlinks in .gitmodules
commit 10ecfa76491e4923988337b2e2243b05376b40de upstream.
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 38b09a0a75948ecda0ae5dd1161498f7d09a957f
Author: Jeff King <peff@peff.net>
Date: Mon May 14 11:00:56 2018 -0400
update-index: stat updated files earlier
commit eb12dd0c764d2b71bebd5ffffb7379a3835253ae upstream.
In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.
This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 5d0dff51f2ffd04b64532ac391fcc6d3193f4a9c
Author: Jeff King <peff@peff.net>
Date: Tue May 15 09:56:50 2018 -0400
verify_dotfile: mention case-insensitivity in comment
commit 641084b618ddbe099f0992161988c3e479ae848b upstream.
We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 887ef4437acc375084de58b342e0e46e29610725
Author: Jeff King <peff@peff.net>
Date: Sun May 13 13:00:23 2018 -0400
verify_path: drop clever fallthrough
commit e19e5e66d691bdeeeb5e0ed2ffcecdd7666b0d7b upstream.
We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit b80dad3b02a1a108af37788f31ee418faed7595c
Author: Jeff King <peff@peff.net>
Date: Sun May 13 12:57:14 2018 -0400
skip_prefix: add case-insensitive variant
commit 41a80924aec0e94309786837b6f954a3b3f19b71 upstream.
We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 5457d4cbd138a9642115f4449f42ab9317c5f1af
Author: Jeff King <peff@peff.net>
Date: Mon Apr 30 03:25:25 2018 -0400
submodule-config: verify submodule names as paths
commit 0383bbb9015898cbc79abd7b64316484d7713b44 upstream.
Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).
Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:
1. What should the allowed syntax be?
It's tempting to reuse verify_path(), since submodule
names typically come from in-repo paths. But there are
two reasons not to:
a. It's technically more strict than what we need, as
we really care only about breaking out of the
$GIT_DIR/modules/ hierarchy. E.g., having a
submodule named "foo/.git" isn't actually
dangerous, and it's possible that somebody has
manually given such a funny name.
b. Since we'll eventually use this checking logic in
fsck to prevent downstream repositories, it should
be consistent across platforms. Because
verify_path() relies on is_dir_sep(), it wouldn't
block "foo\..\bar" on a non-Windows machine.
2. Where should we enforce it? These days most of the
.gitmodules reads go through submodule-config.c, so
I've put it there in the reading step. That should
cover all of the C code.
We also construct the name for "git submodule add"
inside the git-submodule.sh script. This is probably
not a big deal for security since the name is coming
from the user anyway, but it would be polite to remind
them if the name they pick is invalid (and we need to
expose the name-checker to the shell anyway for our
test scripts).
This patch issues a warning when reading .gitmodules
and just ignores the related config entry completely.
This will generally end up producing a sensible error,
as it works the same as a .gitmodules file which is
missing a submodule entry (so "submodule update" will
barf, but "git clone --recurse-submodules" will print
an error but not abort the clone.
There is one minor oddity, which is that we print the
warning once per malformed config key (since that's how
the config subsystem gives us the entries). So in the
new test, for example, the user would see three
warnings. That's OK, since the intent is that this case
should never come up outside of malicious repositories
(and then it might even benefit the user to see the
message multiple times).
Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.
[jn: backported to 2.1.y:
- adding a skeletal git submodule--helper command to house
the new check-name subcommand. The full submodule--helper
was not introduced until v2.7.0-rc0~136^2~2 (submodule:
rewrite `module_list` shell function in C, 2015-09-02).
- calling 'git submodule--helper check-name' to validate
submodule names in git-submodule.sh::module_name(). That
shell function was rewritten in C in v2.7.0-rc0~136^2~1
(submodule: rewrite `module_name` shell function in C,
2015-09-02).
- propagating the error from module_name in cmd_foreach.
Without that change, the script passed to 'git submodule
foreach' would see an empty $name for submodules with
invalid name. The same bug still exists in v2.17.1.
- ported the checks in C from the submodule-config API
introduced in v2.6.0-rc0~24^2~3 (submodule: implement a
config API for lookup of .gitmodules values, 2015-08-17)
to the older submodule API.
- the original patch expects 'git clone' to succeed in the
test because v2.13.0-rc0~10^2~3 (clone: teach
--recurse-submodules to optionally take a pathspec,
2017-03-17) makes 'git clone' skip invalid submodules.
Updated the test to pass in older Git versions where the
submodule name check makes 'git clone' fail.]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit eca4704a95f052e903b18c4fddf378104741fcbc
Author: Junio C Hamano <gitster@pobox.com>
Date: Thu Jan 29 12:41:22 2015 -0800
apply: do not touch a file beyond a symbolic link
commit e0d201b61601e17e24ed00cc3d16e8e25ca68596 upstream.
Because Git tracks symbolic links as symbolic links, a path that
has a symbolic link in its leading part (e.g. path/to/dir/file,
where path/to/dir is a symbolic link to somewhere else, be it
inside or outside the working tree) can never appear in a patch
that validly applies, unless the same patch first removes the
symbolic link to allow a directory to be created there.
Detect and reject such a patch.
Things to note:
- Unfortunately, we cannot reuse the has_symlink_leading_path()
from dir.c, as that is only about the working tree, but "git
apply" can be told to apply the patch only to the index or to
both the index and to the working tree.
- We cannot directly use has_symlink_leading_path() even when we
are applying only to the working tree, as an early patch of a
valid input may remove a symbolic link path/to/dir and then a
later patch of the input may create a path path/to/dir/file, but
"git apply" first checks the input without touching either the
index or the working tree. The leading symbolic link check must
be done on the interim result we compute in-core (i.e. after the
first patch, there is no path/to/dir symbolic link and it is
perfectly valid to create path/to/dir/file).
Similarly, when an input creates a symbolic link path/to/dir and
then creates a file path/to/dir/file, we need to flag it as an
error without actually creating path/to/dir symbolic link in the
filesystem.
Instead, for any patch in the input that leaves a path (i.e. a non
deletion) in the result, we check all leading paths against the
resulting tree that the patch would create by inspecting all the
patches in the input and then the target of patch application
(either the index or the working tree).
This way, we catch a mischief or a mistake to add a symbolic link
path/to/dir and a file path/to/dir/file at the same time, while
allowing a valid patch that removes a symbolic link path/to/dir and
then adds a file path/to/dir/file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 1b375180858c0fa786fe579a713df1f12728f1dc
Author: Junio C Hamano <gitster@pobox.com>
Date: Fri Jan 30 15:34:13 2015 -0800
apply: do not read from beyond a symbolic link
commit fdc2c3a926c21e24986677abd02c8bc568a5de32 upstream.
We should reject a patch, whether it renames/copies dir/file to
elsewhere with or without modificiation, or updates dir/file in
place, if "dir/" part is actually a symbolic link to elsewhere,
by making sure that the code to read the preimage does not read
from a path that is beyond a symbolic link.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 6e9f81cf425af97da742914abf877f2dbc0a650f
Author: Junio C Hamano <gitster@pobox.com>
Date: Fri Jan 30 15:15:59 2015 -0800
apply: do not read from the filesystem under --index
commit 3c37a2e339e695c7cc41048fe0921cbc8b48b0f0 upstream.
We currently read the preimage to apply a patch from the index only
when the --cached option is given. Do so also when the command is
running under the --index option. With --index, the index entry and
the working tree file for a path that is involved in a patch must be
identical, so this should not affect the result, but by reading from
the index, we will get the protection to avoid reading an unintended
path beyond a symbolic link automatically.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit 714bced2aa3ccc8b762f79c08443b246d2440402
Author: Junio C Hamano <gitster@pobox.com>
Date: Thu Jan 29 15:35:24 2015 -0800
apply: reject input that touches outside the working area
commit c536c0755f6450b7bcce499cfda171f8c6d1e593 upstream.
By default, a patch that affects outside the working area (either a
Git controlled working tree, or the current working directory when
"git apply" is used as a replacement of GNU patch) is rejected as a
mistake (or a mischief). Git itself does not create such a patch,
unless the user bends over backwards and specifies a non-standard
prefix to "git diff" and friends.
When `git apply` is used as a "better GNU patch", the user can pass
the `--unsafe-paths` option to override this safety check. This
option has no effect when `--index` or `--cached` is in use.
The new test was stolen from Jeff King with slight enhancements.
Note that a few new tests for touching outside the working area by
following a symbolic link are still expected to fail at this step,
but will be fixed in later steps.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
commit be3b8a3011560ead1a87da813b6e25ece3cf94b6
Author: Jeff King <peff@peff.net>
Date: Mon Aug 26 17:57:18 2013 -0400
config: do not use C function names as struct members
According to C99, section 7.1.4:
Any function declared in a header may be additionally
implemented as a function-like macro defined in the
header.
Therefore calling our struct member function pointer "fgetc"
may run afoul of unwanted macro expansion when we call:
char c = cf->fgetc(cf);
This turned out to be a problem on uclibc, which defines
fgetc as a macro and causes compilation failure.
The standard suggests fixing this in a few ways:
1. Using extra parentheses to inhibit the function-like
macro expansion. E.g., "(cf->fgetc)(cf)". This is
undesirable as it's ugly, and each call site needs to
remember to use it (and on systems without the macro,
forgetting will compile just fine).
2. Using #undef (because a conforming implementation must
also be providing fgetc as a function). This is
undesirable because presumably the implementation was
using the macro for a performance benefit, and we are
dropping that optimization.
Instead, we can simply use non-colliding names.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit eb937a70aaded50b0f2d194adb2917881bda129b
Author: Heiko Voigt <hvoigt@hvoigt.net>
Date: Fri Jul 12 00:48:30 2013 +0200
do not die when error in config parsing of buf occurs
If a config parsing error in a file occurs we can die and let the user
fix the issue. This is different for the buf parsing function since it
can be used to parse blobs of .gitmodules files. If a parsing error
occurs here we should proceed since otherwise a database containing such
an error in a single revision could be rendered unusable.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 476b7567d81fffb6d5f84d26acfa3e44df67c4e1
Author: Heiko Voigt <hvoigt@hvoigt.net>
Date: Fri Jul 12 00:46:47 2013 +0200
teach config --blob option to parse config from database
This can be used to read configuration values directly from git's
database. For example it is useful for reading to be checked out
.gitmodules files directly from the database.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit db9d2efbf64725c4891db20a571623c846033d70
Author: Heiko Voigt <hvoigt@hvoigt.net>
Date: Fri Jul 12 00:44:39 2013 +0200
config: make parsing stack struct independent from actual data source
To simplify adding other sources we extract all functions needed for
parsing into a list of callbacks. We implement those callbacks for the
current file parsing. A new source can implement its own set of callbacks.
Instead of storing the concrete FILE pointer for parsing we store a void
pointer. A new source can use this to store its custom data.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 345dcdb95bc8e96ded9a0d0da7c9777de7bb9290
Author: Heiko Voigt <hvoigt@hvoigt.net>
Date: Sat May 11 15:19:29 2013 +0200
config: drop cf validity check in get_next_char()
The global variable cf is set with an initialized value in all codepaths before
calling this function.
The complete call graph looks like this:
git_config_from_file
-> do_config_from
-> git_parse_file
-> get_next_char
-> get_value
-> get_next_char
-> parse_value
-> get_next_char
-> get_base_var
-> get_next_char
-> get_extended_base_var
-> get_next_char
The variable is initialized in do_config_from.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 30bacff11067cd4d42e4d57ebe691c8f8eb1e570
Author: Heiko Voigt <hvoigt@hvoigt.net>
Date: Sat May 11 15:18:52 2013 +0200
config: factor out config file stack management
Because a config callback may start parsing a new file, the
global context regarding the current config file is stored
as a stack. Currently we only need to manage that stack from
git_config_from_file. Let's factor it out to allow new
sources of config data.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
.gitignore | 1 +
Documentation/git-apply.txt | 12 +-
Documentation/git-config.txt | 7 +
Documentation/technical/api-hashmap.txt | 249 ++++++++++++++++++++++++
Makefile | 4 +
builtin.h | 1 +
builtin/apply.c | 142 +++++++++++++-
builtin/config.c | 31 ++-
builtin/fsck.c | 165 +++++++---------
builtin/index-pack.c | 13 +-
builtin/submodule--helper.c | 35 ++++
builtin/unpack-objects.c | 21 +-
builtin/update-index.c | 31 +--
cache.h | 21 +-
config.c | 217 ++++++++++++++++-----
fsck.c | 193 +++++++++++++++++-
fsck.h | 11 +-
git-compat-util.h | 17 ++
git-submodule.sh | 21 +-
git.c | 1 +
hashmap.c | 228 ++++++++++++++++++++++
hashmap.h | 90 +++++++++
read-cache.c | 30 ++-
sha1_file.c | 132 ++++++++++++-
submodule.c | 29 +++
submodule.h | 7 +
t/lib-pack.sh | 110 +++++++++++
t/t0011-hashmap.sh | 240 +++++++++++++++++++++++
t/t1307-config-blob.sh | 70 +++++++
t/t1450-fsck.sh | 48 ++++-
t/t4122-apply-symlink-inside.sh | 106 ++++++++++
t/t4139-apply-escape.sh | 141 ++++++++++++++
t/t7415-submodule-names.sh | 154 +++++++++++++++
test-hashmap.c | 335 ++++++++++++++++++++++++++++++++
34 files changed, 2717 insertions(+), 196 deletions(-)
create mode 100644 Documentation/technical/api-hashmap.txt
create mode 100644 builtin/submodule--helper.c
create mode 100644 hashmap.c
create mode 100644 hashmap.h
create mode 100644 t/lib-pack.sh
create mode 100755 t/t0011-hashmap.sh
create mode 100755 t/t1307-config-blob.sh
create mode 100755 t/t4139-apply-escape.sh
create mode 100755 t/t7415-submodule-names.sh
create mode 100644 test-hashmap.c
diff --git a/.gitignore b/.gitignore
index 6669bf0..92b0483 100644
--- a/.gitignore
+++ b/.gitignore
@@ -154,6 +154,7 @@
/git-status
/git-stripspace
/git-submodule
+/git-submodule--helper
/git-svn
/git-symbolic-ref
/git-tag
diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index f605327..9489664 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -16,7 +16,7 @@ SYNOPSIS
[--ignore-space-change | --ignore-whitespace ]
[--whitespace=(nowarn|warn|fix|error|error-all)]
[--exclude=<path>] [--include=<path>] [--directory=<root>]
- [--verbose] [<patch>...]
+ [--verbose] [--unsafe-paths] [<patch>...]
DESCRIPTION
-----------
@@ -229,6 +229,16 @@ For example, a patch that talks about updating `a/git-gui.sh` to `b/git-gui.sh`
can be applied to the file in the working tree `modules/git-gui/git-gui.sh` by
running `git apply --directory=modules/git-gui`.
+--unsafe-paths::
+ By default, a patch that affects outside the working area
+ (either a Git controlled working tree, or the current working
+ directory when "git apply" is used as a replacement of GNU
+ patch) is rejected as a mistake (or a mischief).
++
+When `git apply` is used as a "better GNU patch", the user can pass
+the `--unsafe-paths` option to override this safety check. This option
+has no effect when `--index` or `--cached` is in use.
+
Configuration
-------------
diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt
index d88a6fc..3a4ed10 100644
--- a/Documentation/git-config.txt
+++ b/Documentation/git-config.txt
@@ -118,6 +118,13 @@ See also <<FILES>>.
--file config-file::
Use the given config file instead of the one specified by GIT_CONFIG.
+--blob blob::
+ Similar to '--file' but use the given blob instead of a file. E.g.
+ you can use 'master:.gitmodules' to read values from the file
+ '.gitmodules' in the master branch. See "SPECIFYING REVISIONS"
+ section in linkgit:gitrevisions[7] for a more complete list of
+ ways to spell blob names.
+
--remove-section::
Remove the given section from the configuration file.
diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
new file mode 100644
index 0000000..0249b50
--- /dev/null
+++ b/Documentation/technical/api-hashmap.txt
@@ -0,0 +1,249 @@
+hashmap API
+===========
+
+The hashmap API is a generic implementation of hash-based key-value mappings.
+
+Data Structures
+---------------
+
+`struct hashmap`::
+
+ The hash table structure.
++
+The `size` member keeps track of the total number of entries. The `cmpfn`
+member is a function used to compare two entries for equality. The `table` and
+`tablesize` members store the hash table and its size, respectively.
+
+`struct hashmap_entry`::
+
+ An opaque structure representing an entry in the hash table, which must
+ be used as first member of user data structures. Ideally it should be
+ followed by an int-sized member to prevent unused memory on 64-bit
+ systems due to alignment.
++
+The `hash` member is the entry's hash code and the `next` member points to the
+next entry in case of collisions (i.e. if multiple entries map to the same
+bucket).
+
+`struct hashmap_iter`::
+
+ An iterator structure, to be used with hashmap_iter_* functions.
+
+Types
+-----
+
+`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
+
+ User-supplied function to test two hashmap entries for equality. Shall
+ return 0 if the entries are equal.
++
+This function is always called with non-NULL `entry` / `entry_or_key`
+parameters that have the same hash code. When looking up an entry, the `key`
+and `keydata` parameters to hashmap_get and hashmap_remove are always passed
+as second and third argument, respectively. Otherwise, `keydata` is NULL.
+
+Functions
+---------
+
+`unsigned int strhash(const char *buf)`::
+`unsigned int strihash(const char *buf)`::
+`unsigned int memhash(const void *buf, size_t len)`::
+`unsigned int memihash(const void *buf, size_t len)`::
+
+ Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+ http://www.isthe.com/chongo/tech/comp/fnv).
++
+`strhash` and `strihash` take 0-terminated strings, while `memhash` and
+`memihash` operate on arbitrary-length memory.
++
+`strihash` and `memihash` are case insensitive versions.
+
+`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
+
+ Initializes a hashmap structure.
++
+`map` is the hashmap to initialize.
++
+The `equals_function` can be specified to compare two entries for equality.
+If NULL, entries are considered equal if their hash codes are equal.
++
+If the total number of entries is known in advance, the `initial_size`
+parameter may be used to preallocate a sufficiently large table and thus
+prevent expensive resizing. If 0, the table is dynamically resized.
+
+`void hashmap_free(struct hashmap *map, int free_entries)`::
+
+ Frees a hashmap structure and allocated memory.
++
+`map` is the hashmap to free.
++
+If `free_entries` is true, each hashmap_entry in the map is freed as well
+(using stdlib's free()).
+
+`void hashmap_entry_init(void *entry, int hash)`::
+
+ Initializes a hashmap_entry structure.
++
+`entry` points to the entry to initialize.
++
+`hash` is the hash code of the entry.
+
+`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
+
+ Returns the hashmap entry for the specified key, or NULL if not found.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are passed
+to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
+`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`::
+
+ Returns the hashmap entry for the specified hash code and key data,
+ or NULL if not found.
++
+`map` is the hashmap structure.
++
+`hash` is the hash code of the entry to look up.
++
+If an entry with matching hash code is found, `keydata` is passed to
+`hashmap_cmp_fn` to decide whether the entry matches the key. The
+`entry_or_key` parameter points to a bogus hashmap_entry structure that
+should not be used in the comparison.
+
+`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
+
+ Returns the next equal hashmap entry, or NULL if not found. This can be
+ used to iterate over duplicate entries (see `hashmap_add`).
++
+`map` is the hashmap structure.
++
+`entry` is the hashmap_entry to start the search from, obtained via a previous
+call to `hashmap_get` or `hashmap_get_next`.
+
+`void hashmap_add(struct hashmap *map, void *entry)`::
+
+ Adds a hashmap entry. This allows to add duplicate entries (i.e.
+ separate values with the same key according to hashmap_cmp_fn).
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add.
+
+`void *hashmap_put(struct hashmap *map, void *entry)`::
+
+ Adds or replaces a hashmap entry. If the hashmap contains duplicate
+ entries equal to the specified entry, only one of them will be replaced.
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add or replace.
++
+Returns the replaced entry, or NULL if not found (i.e. the entry was added).
+
+`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
+
+ Removes a hashmap entry matching the specified key. If the hashmap
+ contains duplicate entries equal to the specified key, only one of
+ them will be removed.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are
+passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
++
+Returns the removed entry, or NULL if not found.
+
+`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
+`void *hashmap_iter_next(struct hashmap_iter *iter)`::
+`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
+
+ Used to iterate over all entries of a hashmap.
++
+`hashmap_iter_init` initializes a `hashmap_iter` structure.
++
+`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
+more entries.
++
+`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
+and returns the first entry, if any).
+
+Usage example
+-------------
+
+Here's a simple usage example that maps long keys to double values.
+[source,c]
+------------
+struct hashmap map;
+
+struct long2double {
+ struct hashmap_entry ent; /* must be the first member! */
+ long key;
+ double value;
+};
+
+static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
+{
+ return !(e1->key == e2->key);
+}
+
+void long2double_init(void)
+{
+ hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
+}
+
+void long2double_free(void)
+{
+ hashmap_free(&map, 1);
+}
+
+static struct long2double *find_entry(long key)
+{
+ struct long2double k;
+ hashmap_entry_init(&k, memhash(&key, sizeof(long)));
+ k.key = key;
+ return hashmap_get(&map, &k, NULL);
+}
+
+double get_value(long key)
+{
+ struct long2double *e = find_entry(key);
+ return e ? e->value : 0;
+}
+
+void set_value(long key, double value)
+{
+ struct long2double *e = find_entry(key);
+ if (!e) {
+ e = malloc(sizeof(struct long2double));
+ hashmap_entry_init(e, memhash(&key, sizeof(long)));
+ e->key = key;
+ hashmap_add(&map, e);
+ }
+ e->value = value;
+}
+------------
+
+Using variable-sized keys
+-------------------------
+
+The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
+`hashmap_entry` structure as key to find the correct entry. If the key data is
+variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
+to create a full-fledged entry structure on the heap and copy all the key data
+into the structure.
+
+In this case, the `keydata` parameter can be used to pass
+variable-sized key data directly to the comparison function, and the `key`
+parameter can be a stripped-down, fixed size entry structure allocated on the
+stack.
+
+See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/Makefile b/Makefile
index 0f931a2..daefb2f 100644
--- a/Makefile
+++ b/Makefile
@@ -551,6 +551,7 @@ TEST_PROGRAMS_NEED_X += test-date
TEST_PROGRAMS_NEED_X += test-delta
TEST_PROGRAMS_NEED_X += test-dump-cache-tree
TEST_PROGRAMS_NEED_X += test-genrandom
+TEST_PROGRAMS_NEED_X += test-hashmap
TEST_PROGRAMS_NEED_X += test-index-version
TEST_PROGRAMS_NEED_X += test-line-buffer
TEST_PROGRAMS_NEED_X += test-match-trees
@@ -669,6 +670,7 @@ LIB_H += gpg-interface.h
LIB_H += graph.h
LIB_H += grep.h
LIB_H += hash.h
+LIB_H += hashmap.h
LIB_H += help.h
LIB_H += http.h
LIB_H += kwset.h
@@ -796,6 +798,7 @@ LIB_OBJS += gpg-interface.o
LIB_OBJS += graph.o
LIB_OBJS += grep.o
LIB_OBJS += hash.o
+LIB_OBJS += hashmap.o
LIB_OBJS += help.o
LIB_OBJS += hex.o
LIB_OBJS += ident.o
@@ -964,6 +967,7 @@ BUILTIN_OBJS += builtin/shortlog.o
BUILTIN_OBJS += builtin/show-branch.o
BUILTIN_OBJS += builtin/show-ref.o
BUILTIN_OBJS += builtin/stripspace.o
+BUILTIN_OBJS += builtin/submodule--helper.o
BUILTIN_OBJS += builtin/symbolic-ref.o
BUILTIN_OBJS += builtin/tag.o
BUILTIN_OBJS += builtin/tar-tree.o
diff --git a/builtin.h b/builtin.h
index faef559..c8330d9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -127,6 +127,7 @@ extern int cmd_show(int argc, const char **argv, const char *prefix);
extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
extern int cmd_status(int argc, const char **argv, const char *prefix);
extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
+extern int cmd_submodule__helper(int argc, const char **argv, const char *prefix);
extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
extern int cmd_tag(int argc, const char **argv, const char *prefix);
extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/apply.c b/builtin/apply.c
index 30eefc3..48e900d 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -50,6 +50,7 @@ static int apply_verbosely;
static int allow_overlap;
static int no_add;
static int threeway;
+static int unsafe_paths;
static const char *fake_ancestor;
static int line_termination = '\n';
static unsigned int p_context = UINT_MAX;
@@ -3135,7 +3136,7 @@ static int load_patch_target(struct strbuf *buf,
const char *name,
unsigned expected_mode)
{
- if (cached) {
+ if (cached || check_index) {
if (read_file_or_gitlink(ce, buf))
return error(_("read of %s failed"), name);
} else if (name) {
@@ -3144,6 +3145,8 @@ static int load_patch_target(struct strbuf *buf,
return read_file_or_gitlink(ce, buf);
else
return SUBMODULE_PATCH_WITHOUT_INDEX;
+ } else if (has_symlink_leading_path(name, strlen(name))) {
+ return error(_("reading from '%s' beyond a symbolic link"), name);
} else {
if (read_old_data(st, name, buf))
return error(_("read of %s failed"), name);
@@ -3482,6 +3485,121 @@ static int check_to_create(const char *new_name, int ok_if_exists)
return 0;
}
+/*
+ * We need to keep track of how symlinks in the preimage are
+ * manipulated by the patches. A patch to add a/b/c where a/b
+ * is a symlink should not be allowed to affect the directory
+ * the symlink points at, but if the same patch removes a/b,
+ * it is perfectly fine, as the patch removes a/b to make room
+ * to create a directory a/b so that a/b/c can be created.
+ */
+static struct string_list symlink_changes;
+#define SYMLINK_GOES_AWAY 01
+#define SYMLINK_IN_RESULT 02
+
+static uintptr_t register_symlink_changes(const char *path, uintptr_t what)
+{
+ struct string_list_item *ent;
+
+ ent = string_list_lookup(&symlink_changes, path);
+ if (!ent) {
+ ent = string_list_insert(&symlink_changes, path);
+ ent->util = (void *)0;
+ }
+ ent->util = (void *)(what | ((uintptr_t)ent->util));
+ return (uintptr_t)ent->util;
+}
+
+static uintptr_t check_symlink_changes(const char *path)
+{
+ struct string_list_item *ent;
+
+ ent = string_list_lookup(&symlink_changes, path);
+ if (!ent)
+ return 0;
+ return (uintptr_t)ent->util;
+}
+
+static void prepare_symlink_changes(struct patch *patch)
+{
+ for ( ; patch; patch = patch->next) {
+ if ((patch->old_name && S_ISLNK(patch->old_mode)) &&
+ (patch->is_rename || patch->is_delete))
+ /* the symlink at patch->old_name is removed */
+ register_symlink_changes(patch->old_name, SYMLINK_GOES_AWAY);
+
+ if (patch->new_name && S_ISLNK(patch->new_mode))
+ /* the symlink at patch->new_name is created or remains */
+ register_symlink_changes(patch->new_name, SYMLINK_IN_RESULT);
+ }
+}
+
+static int path_is_beyond_symlink_1(struct strbuf *name)
+{
+ do {
+ unsigned int change;
+
+ while (--name->len && name->buf[name->len] != '/')
+ ; /* scan backwards */
+ if (!name->len)
+ break;
+ name->buf[name->len] = '\0';
+ change = check_symlink_changes(name->buf);
+ if (change & SYMLINK_IN_RESULT)
+ return 1;
+ if (change & SYMLINK_GOES_AWAY)
+ /*
+ * This cannot be "return 0", because we may
+ * see a new one created at a higher level.
+ */
+ continue;
+
+ /* otherwise, check the preimage */
+ if (check_index) {
+ struct cache_entry *ce;
+
+ ce = cache_name_exists(name->buf, name->len, ignore_case);
+ if (ce && S_ISLNK(ce->ce_mode))
+ return 1;
+ } else {
+ struct stat st;
+ if (!lstat(name->buf, &st) && S_ISLNK(st.st_mode))
+ return 1;
+ }
+ } while (1);
+ return 0;
+}
+
+static int path_is_beyond_symlink(const char *name_)
+{
+ int ret;
+ struct strbuf name = STRBUF_INIT;
+
+ assert(*name_ != '\0');
+ strbuf_addstr(&name, name_);
+ ret = path_is_beyond_symlink_1(&name);
+ strbuf_release(&name);
+
+ return ret;
+}
+
+static void die_on_unsafe_path(struct patch *patch)
+{
+ const char *old_name = NULL;
+ const char *new_name = NULL;
+ if (patch->is_delete)
+ old_name = patch->old_name;
+ else if (!patch->is_new && !patch->is_copy)
+ old_name = patch->old_name;
+ if (!patch->is_delete)
+ new_name = patch->new_name;
+
+ if (old_name && !verify_path(old_name, patch->old_mode))
+ die(_("invalid path '%s'"), old_name);
+ if (new_name && !verify_path(new_name, patch->new_mode))
+ die(_("invalid path '%s'"), new_name);
+}
+
/*
* Check and apply the patch in-core; leave the result in patch->result
* for the caller to write it out to the final destination.
@@ -3569,6 +3687,22 @@ static int check_patch(struct patch *patch)
}
}
+ if (!unsafe_paths)
+ die_on_unsafe_path(patch);
+
+ /*
+ * An attempt to read from or delete a path that is beyond a
+ * symbolic link will be prevented by load_patch_target() that
+ * is called at the beginning of apply_data() so we do not
+ * have to worry about a patch marked with "is_delete" bit
+ * here. We however need to make sure that the patch result
+ * is not deposited to a path that is beyond a symbolic link
+ * here.
+ */
+ if (!patch->is_delete && path_is_beyond_symlink(patch->new_name))
+ return error(_("affected file '%s' is beyond a symbolic link"),
+ patch->new_name);
+
if (apply_data(patch, &st, ce) < 0)
return error(_("%s: patch does not apply"), name);
patch->rejected = 0;
@@ -3579,6 +3713,7 @@ static int check_patch_list(struct patch *patch)
{
int err = 0;
+ prepare_symlink_changes(patch);
prepare_fn_table(patch);
while (patch) {
if (apply_verbosely)
@@ -4378,6 +4513,8 @@ int cmd_apply(int argc, const char **argv, const char *prefix_)
N_("make sure the patch is applicable to the current index")),
OPT_BOOLEAN(0, "cached", &cached,
N_("apply a patch without touching the working tree")),
+ OPT_BOOL(0, "unsafe-paths", &unsafe_paths,
+ N_("accept a patch that touches outside the working area")),
OPT_BOOLEAN(0, "apply", &force_apply,
N_("also apply the patch (use with --stat/--summary/--check)")),
OPT_BOOL('3', "3way", &threeway,
@@ -4450,6 +4587,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix_)
die(_("--cached outside a repository"));
check_index = 1;
}
+ if (check_index)
+ unsafe_paths = 0;
+
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
int fd;
diff --git a/builtin/config.c b/builtin/config.c
index 19ffcaf..000d27c 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -21,6 +21,7 @@ static char term = '\n';
static int use_global_config, use_system_config, use_local_config;
static const char *given_config_file;
+static const char *given_config_blob;
static int actions, types;
static const char *get_color_slot, *get_colorbool_slot;
static int end_null;
@@ -53,6 +54,7 @@ static struct option builtin_config_options[] = {
OPT_BOOLEAN(0, "system", &use_system_config, N_("use system config file")),
OPT_BOOLEAN(0, "local", &use_local_config, N_("use repository config file")),
OPT_STRING('f', "file", &given_config_file, N_("file"), N_("use given config file")),
+ OPT_STRING(0, "blob", &given_config_blob, N_("blob-id"), N_("read config from given blob object")),
OPT_GROUP(N_("Action")),
OPT_BIT(0, "get", &actions, N_("get value: name [value-regex]"), ACTION_GET),
OPT_BIT(0, "get-all", &actions, N_("get all values: key [value-regex]"), ACTION_GET_ALL),
@@ -218,7 +220,8 @@ static int get_value(const char *key_, const char *regex_)
}
git_config_with_options(collect_config, &values,
- given_config_file, respect_includes);
+ given_config_file, given_config_blob,
+ respect_includes);
ret = !values.nr;
@@ -302,7 +305,8 @@ static void get_color(const char *def_color)
get_color_found = 0;
parsed_color[0] = '\0';
git_config_with_options(git_get_color_config, NULL,
- given_config_file, respect_includes);
+ given_config_file, given_config_blob,
+ respect_includes);
if (!get_color_found && def_color)
color_parse(def_color, "command line", parsed_color);
@@ -330,7 +334,8 @@ static int get_colorbool(int print)
get_colorbool_found = -1;
get_diff_color_found = -1;
git_config_with_options(git_get_colorbool_config, NULL,
- given_config_file, respect_includes);
+ given_config_file, given_config_blob,
+ respect_includes);
if (get_colorbool_found < 0) {
if (!strcmp(get_colorbool_slot, "color.diff"))
@@ -348,6 +353,12 @@ static int get_colorbool(int print)
return get_colorbool_found ? 0 : 1;
}
+static void check_blob_write(void)
+{
+ if (given_config_blob)
+ die("writing config blobs is not supported");
+}
+
int cmd_config(int argc, const char **argv, const char *prefix)
{
int nongit = !startup_info->have_repository;
@@ -359,7 +370,8 @@ int cmd_config(int argc, const char **argv, const char *prefix)
builtin_config_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
- if (use_global_config + use_system_config + use_local_config + !!given_config_file > 1) {
+ if (use_global_config + use_system_config + use_local_config +
+ !!given_config_file + !!given_config_blob > 1) {
error("only one config file at a time.");
usage_with_options(builtin_config_usage, builtin_config_options);
}
@@ -438,6 +450,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
check_argc(argc, 0, 0);
if (git_config_with_options(show_all_config, NULL,
given_config_file,
+ given_config_blob,
respect_includes) < 0) {
if (given_config_file)
die_errno("unable to read config file '%s'",
@@ -450,6 +463,8 @@ int cmd_config(int argc, const char **argv, const char *prefix)
check_argc(argc, 0, 0);
if (!given_config_file && nongit)
die("not in a git directory");
+ if (given_config_blob)
+ die("editing blobs is not supported");
git_config(git_default_config, NULL);
launch_editor(given_config_file ?
given_config_file : git_path("config"),
@@ -457,6 +472,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
}
else if (actions == ACTION_SET) {
int ret;
+ check_blob_write();
check_argc(argc, 2, 2);
value = normalize_value(argv[0], argv[1]);
ret = git_config_set_in_file(given_config_file, argv[0], value);
@@ -466,18 +482,21 @@ int cmd_config(int argc, const char **argv, const char *prefix)
return ret;
}
else if (actions == ACTION_SET_ALL) {
+ check_blob_write();
check_argc(argc, 2, 3);
value = normalize_value(argv[0], argv[1]);
return git_config_set_multivar_in_file(given_config_file,
argv[0], value, argv[2], 0);
}
else if (actions == ACTION_ADD) {
+ check_blob_write();
check_argc(argc, 2, 2);
value = normalize_value(argv[0], argv[1]);
return git_config_set_multivar_in_file(given_config_file,
argv[0], value, "^$", 0);
}
else if (actions == ACTION_REPLACE_ALL) {
+ check_blob_write();
check_argc(argc, 2, 3);
value = normalize_value(argv[0], argv[1]);
return git_config_set_multivar_in_file(given_config_file,
@@ -500,6 +519,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
return get_value(argv[0], argv[1]);
}
else if (actions == ACTION_UNSET) {
+ check_blob_write();
check_argc(argc, 1, 2);
if (argc == 2)
return git_config_set_multivar_in_file(given_config_file,
@@ -509,12 +529,14 @@ int cmd_config(int argc, const char **argv, const char *prefix)
argv[0], NULL);
}
else if (actions == ACTION_UNSET_ALL) {
+ check_blob_write();
check_argc(argc, 1, 2);
return git_config_set_multivar_in_file(given_config_file,
argv[0], NULL, argv[1], 1);
}
else if (actions == ACTION_RENAME_SECTION) {
int ret;
+ check_blob_write();
check_argc(argc, 2, 2);
ret = git_config_rename_section_in_file(given_config_file,
argv[0], argv[1]);
@@ -525,6 +547,7 @@ int cmd_config(int argc, const char **argv, const char *prefix)
}
else if (actions == ACTION_REMOVE_SECTION) {
int ret;
+ check_blob_write();
check_argc(argc, 1, 1);
ret = git_config_rename_section_in_file(given_config_file,
argv[0], NULL);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index bb9a2cd..b59f956 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -35,14 +35,6 @@ static int show_dangling = 1;
#define ERROR_REACHABLE 02
#define ERROR_PACK 04
-#ifdef NO_D_INO_IN_DIRENT
-#define SORT_DIRENT 0
-#define DIRENT_SORT_HINT(de) 0
-#else
-#define SORT_DIRENT 1
-#define DIRENT_SORT_HINT(de) ((de)->d_ino)
-#endif
-
static void objreport(struct object *obj, const char *severity,
const char *err, va_list params)
{
@@ -288,7 +280,7 @@ static void check_connectivity(void)
}
}
-static int fsck_obj(struct object *obj)
+static int fsck_obj(struct object *obj, void *buffer, unsigned long size)
{
if (obj->flags & SEEN)
return 0;
@@ -300,7 +292,7 @@ static int fsck_obj(struct object *obj)
if (fsck_walk(obj, mark_used, NULL))
objerror(obj, "broken links");
- if (fsck_object(obj, check_strict, fsck_error_func))
+ if (fsck_object(obj, buffer, size, check_strict, fsck_error_func))
return -1;
if (obj->type == OBJ_TREE) {
@@ -332,17 +324,6 @@ static int fsck_obj(struct object *obj)
return 0;
}
-static int fsck_sha1(const unsigned char *sha1)
-{
- struct object *obj = parse_object(sha1);
- if (!obj) {
- errors_found |= ERROR_OBJECT;
- return error("%s: object corrupt or missing",
- sha1_to_hex(sha1));
- }
- return fsck_obj(obj);
-}
-
static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
unsigned long size, void *buffer, int *eaten)
{
@@ -352,86 +333,69 @@ static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
errors_found |= ERROR_OBJECT;
return error("%s: object corrupt or missing", sha1_to_hex(sha1));
}
- return fsck_obj(obj);
+ return fsck_obj(obj, buffer, size);
}
-/*
- * This is the sorting chunk size: make it reasonably
- * big so that we can sort well..
- */
-#define MAX_SHA1_ENTRIES (1024)
-
-struct sha1_entry {
- unsigned long ino;
- unsigned char sha1[20];
-};
-
-static struct {
- unsigned long nr;
- struct sha1_entry *entry[MAX_SHA1_ENTRIES];
-} sha1_list;
-
-static int ino_compare(const void *_a, const void *_b)
+static inline int is_loose_object_file(struct dirent *de,
+ char *name, unsigned char *sha1)
{
- const struct sha1_entry *a = _a, *b = _b;
- unsigned long ino1 = a->ino, ino2 = b->ino;
- return ino1 < ino2 ? -1 : ino1 > ino2 ? 1 : 0;
+ if (strlen(de->d_name) != 38)
+ return 0;
+ memcpy(name + 2, de->d_name, 39);
+ return !get_sha1_hex(name, sha1);
}
-static void fsck_sha1_list(void)
+static void fsck_loose(const unsigned char *sha1, const char *path)
{
- int i, nr = sha1_list.nr;
-
- if (SORT_DIRENT)
- qsort(sha1_list.entry, nr,
- sizeof(struct sha1_entry *), ino_compare);
- for (i = 0; i < nr; i++) {
- struct sha1_entry *entry = sha1_list.entry[i];
- unsigned char *sha1 = entry->sha1;
-
- sha1_list.entry[i] = NULL;
- fsck_sha1(sha1);
- free(entry);
+ struct object *obj;
+ enum object_type type;
+ unsigned long size;
+ void *contents;
+ int eaten;
+
+ if (read_loose_object(path, sha1, &type, &size, &contents) < 0) {
+ errors_found |= ERROR_OBJECT;
+ error("%s: object corrupt or missing: %s",
+ sha1_to_hex(sha1), path);
+ return; /* keep checking other objects */
}
- sha1_list.nr = 0;
-}
-static void add_sha1_list(unsigned char *sha1, unsigned long ino)
-{
- struct sha1_entry *entry = xmalloc(sizeof(*entry));
- int nr;
-
- entry->ino = ino;
- hashcpy(entry->sha1, sha1);
- nr = sha1_list.nr;
- if (nr == MAX_SHA1_ENTRIES) {
- fsck_sha1_list();
- nr = 0;
+ if (!contents && type != OBJ_BLOB)
+ die("BUG: read_loose_object streamed a non-blob");
+
+ obj = parse_object_buffer(sha1, type, size, contents, &eaten);
+
+ if (!obj) {
+ errors_found |= ERROR_OBJECT;
+ error("%s: object could not be parsed: %s",
+ sha1_to_hex(sha1), path);
+ if (!eaten)
+ free(contents);
+ return; /* keep checking other objects */
}
- sha1_list.entry[nr] = entry;
- sha1_list.nr = ++nr;
-}
-static inline int is_loose_object_file(struct dirent *de,
- char *name, unsigned char *sha1)
-{
- if (strlen(de->d_name) != 38)
- return 0;
- memcpy(name + 2, de->d_name, 39);
- return !get_sha1_hex(name, sha1);
+ if (fsck_obj(obj, contents, size))
+ errors_found |= ERROR_OBJECT;
+
+ if (!eaten)
+ free(contents);
}
-static void fsck_dir(int i, char *path)
+static void fsck_dir(int i, struct strbuf *path)
{
- DIR *dir = opendir(path);
+ DIR *dir = opendir(path->buf);
struct dirent *de;
char name[100];
+ size_t dirlen;
if (!dir)
return;
if (verbose)
- fprintf(stderr, "Checking directory %s\n", path);
+ fprintf(stderr, "Checking directory %s\n", path->buf);
+
+ strbuf_addch(path, '/');
+ dirlen = path->len;
sprintf(name, "%02x", i);
while ((de = readdir(dir)) != NULL) {
@@ -439,15 +403,20 @@ static void fsck_dir(int i, char *path)
if (is_dot_or_dotdot(de->d_name))
continue;
+
+ strbuf_setlen(path, dirlen);
+ strbuf_addstr(path, de->d_name);
+
if (is_loose_object_file(de, name, sha1)) {
- add_sha1_list(sha1, DIRENT_SORT_HINT(de));
+ fsck_loose(sha1, path->buf);
continue;
}
if (!prefixcmp(de->d_name, "tmp_obj_"))
continue;
- fprintf(stderr, "bad sha1 file: %s/%s\n", path, de->d_name);
+ fprintf(stderr, "bad sha1 file: %s\n", path->buf);
}
closedir(dir);
+ strbuf_setlen(path, dirlen-1);
}
static int default_refs;
@@ -533,24 +502,28 @@ static void get_default_heads(void)
}
}
-static void fsck_object_dir(const char *path)
+static void fsck_object_dir(struct strbuf *path)
{
int i;
struct progress *progress = NULL;
+ size_t dirlen;
if (verbose)
fprintf(stderr, "Checking object directory\n");
+ strbuf_addch(path, '/');
+ dirlen = path->len;
+
if (show_progress)
progress = start_progress("Checking object directories", 256);
for (i = 0; i < 256; i++) {
- static char dir[4096];
- sprintf(dir, "%s/%02x", path, i);
- fsck_dir(i, dir);
+ strbuf_setlen(path, dirlen);
+ strbuf_addf(path, "%02x", i);
+ fsck_dir(i, path);
display_progress(progress, i+1);
}
stop_progress(&progress);
- fsck_sha1_list();
+ strbuf_setlen(path, dirlen - 1);
}
static int fsck_head_link(void)
@@ -629,6 +602,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
{
int i, heads;
struct alternate_object_database *alt;
+ struct strbuf dir = STRBUF_INIT;
errors_found = 0;
read_replace_refs = 0;
@@ -646,15 +620,14 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
}
fsck_head_link();
- fsck_object_dir(get_object_directory());
+ strbuf_addstr(&dir, get_object_directory());
+ fsck_object_dir(&dir);
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
- char namebuf[PATH_MAX];
- int namelen = alt->name - alt->base;
- memcpy(namebuf, alt->base, namelen);
- namebuf[namelen - 1] = 0;
- fsck_object_dir(namebuf);
+ strbuf_reset(&dir);
+ strbuf_add(&dir, alt->base, alt->name - alt->base - 1);
+ fsck_object_dir(&dir);
}
if (check_full) {
@@ -681,6 +654,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
count += p->num_objects;
}
stop_progress(&progress);
+
+ if (fsck_finish(fsck_error_func))
+ errors_found |= ERROR_OBJECT;
}
heads = 0;
@@ -734,5 +710,6 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
}
check_connectivity();
+ strbuf_release(&dir);
return errors_found;
}
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 79dfe47..3cc2bf6 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -742,6 +742,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
blob->object.flags |= FLAG_CHECKED;
else
die(_("invalid blob object %s"), sha1_to_hex(sha1));
+ if (fsck_object(&blob->object, (void *)data, size, 1, fsck_error_function))
+ die(_("fsck error in packed object"));
} else {
struct object *obj;
int eaten;
@@ -757,8 +759,9 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
obj = parse_object_buffer(sha1, type, size, buf, &eaten);
if (!obj)
die(_("invalid %s"), typename(type));
- if (fsck_object(obj, 1, fsck_error_function))
- die(_("Error in object"));
+ if (fsck_object(obj, buf, size, 1,
+ fsck_error_function))
+ die(_("fsck error in packed object"));
if (fsck_walk(obj, mark_link, NULL))
die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
@@ -1320,6 +1323,8 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
} else
chmod(final_index_name, 0444);
+ add_packed_git(final_index_name, strlen(final_index_name), 0);
+
if (!from_stdin) {
printf("%s\n", sha1_to_hex(sha1));
} else {
@@ -1643,6 +1648,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
pack_sha1);
else
close(input_fd);
+
+ if (fsck_finish(fsck_error_function))
+ die(_("fsck error in pack objects"));
+
free(objects);
free(index_name_buf);
free(keep_name_buf);
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
new file mode 100644
index 0000000..cc79d05
--- /dev/null
+++ b/builtin/submodule--helper.c
@@ -0,0 +1,35 @@
+#include "builtin.h"
+#include "submodule.h"
+#include "strbuf.h"
+
+/*
+ * Exit non-zero if any of the submodule names given on the command line is
+ * invalid. If no names are given, filter stdin to print only valid names
+ * (which is primarily intended for testing).
+ */
+static int check_name(int argc, const char **argv, const char *prefix)
+{
+ if (argc > 1) {
+ while (*++argv) {
+ if (check_submodule_name(*argv) < 0)
+ return 1;
+ }
+ } else {
+ struct strbuf buf = STRBUF_INIT;
+ while (strbuf_getline(&buf, stdin, '\n') != EOF) {
+ if (!check_submodule_name(buf.buf))
+ printf("%s\n", buf.buf);
+ }
+ strbuf_release(&buf);
+ }
+ return 0;
+}
+
+int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
+{
+ if (argc < 2)
+ usage("git submodule--helper <command>");
+ if (!strcmp(argv[1], "check-name"))
+ return check_name(argc - 1, argv + 1, prefix);
+ die(_("'%s' is not a valid submodule--helper subcommand"), argv[1]);
+}
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 2217d7b..f5a44ba 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -164,10 +164,10 @@ static unsigned nr_objects;
* Called only from check_object() after it verified this object
* is Ok.
*/
-static void write_cached_object(struct object *obj)
+static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
{
unsigned char sha1[20];
- struct obj_buffer *obj_buf = lookup_object_buffer(obj);
+
if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0)
die("failed to write object %s", sha1_to_hex(obj->sha1));
obj->flags |= FLAG_WRITTEN;
@@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj)
*/
static int check_object(struct object *obj, int type, void *data)
{
+ struct obj_buffer *obj_buf;
+
if (!obj)
return 1;
@@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data)
return 0;
}
- if (fsck_object(obj, 1, fsck_error_function))
- die("Error in object");
+ obj_buf = lookup_object_buffer(obj);
+ if (!obj_buf)
+ die("Whoops! Cannot find object '%s'", sha1_to_hex(obj->sha1));
+ if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1,
+ fsck_error_function))
+ die("fsck error in packed object");
if (fsck_walk(obj, check_object, NULL))
die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
- write_cached_object(obj);
+ write_cached_object(obj, obj_buf);
return 0;
}
@@ -548,8 +554,11 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix)
unpack_all();
git_SHA1_Update(&ctx, buffer, offset);
git_SHA1_Final(sha1, &ctx);
- if (strict)
+ if (strict) {
write_rest();
+ if (fsck_finish(fsck_error_function))
+ die(_("fsck error in pack objects"));
+ }
if (hashcmp(fill(20), sha1))
die("final sha1 did not match");
use(20);
diff --git a/builtin/update-index.c b/builtin/update-index.c
index 5c7762e..bd3715e 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -179,10 +179,9 @@ static int process_directory(const char *path, int len, struct stat *st)
return error("%s: is a directory - add files inside instead", path);
}
-static int process_path(const char *path)
+static int process_path(const char *path, struct stat *st, int stat_errno)
{
int pos, len;
- struct stat st;
struct cache_entry *ce;
len = strlen(path);
@@ -206,13 +205,13 @@ static int process_path(const char *path)
* First things first: get the stat information, to decide
* what to do about the pathname!
*/
- if (lstat(path, &st) < 0)
- return process_lstat_error(path, errno);
+ if (stat_errno)
+ return process_lstat_error(path, stat_errno);
- if (S_ISDIR(st.st_mode))
- return process_directory(path, len, &st);
+ if (S_ISDIR(st->st_mode))
+ return process_directory(path, len, st);
- return add_one_path(ce, path, len, &st);
+ return add_one_path(ce, path, len, st);
}
static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
@@ -221,7 +220,7 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1,
int size, len, option;
struct cache_entry *ce;
- if (!verify_path(path))
+ if (!verify_path(path, mode))
return error("Invalid path '%s'", path);
len = strlen(path);
@@ -276,7 +275,17 @@ static void chmod_path(int flip, const char *path)
static void update_one(const char *path, const char *prefix, int prefix_length)
{
const char *p = prefix_path(prefix, prefix_length, path);
- if (!verify_path(p)) {
+ int stat_errno = 0;
+ struct stat st;
+
+ if (mark_valid_only || mark_skip_worktree_only || force_remove)
+ st.st_mode = 0;
+ else if (lstat(p, &st) < 0) {
+ st.st_mode = 0;
+ stat_errno = errno;
+ } /* else stat is valid */
+
+ if (!verify_path(p, st.st_mode)) {
fprintf(stderr, "Ignoring path %s\n", path);
goto free_return;
}
@@ -297,7 +306,7 @@ static void update_one(const char *path, const char *prefix, int prefix_length)
report("remove '%s'", path);
goto free_return;
}
- if (process_path(p))
+ if (process_path(p, &st, stat_errno))
die("Unable to process path %s", path);
report("add '%s'", path);
free_return:
@@ -367,7 +376,7 @@ static void read_index_info(int line_termination)
path_name = uq.buf;
}
- if (!verify_path(path_name)) {
+ if (!verify_path(path_name, mode)) {
fprintf(stderr, "Ignoring path %s\n", path_name);
continue;
}
diff --git a/cache.h b/cache.h
index 2ab9ffd..e0dc079 100644
--- a/cache.h
+++ b/cache.h
@@ -448,7 +448,7 @@ extern int read_index_unmerged(struct index_state *);
extern int write_index(struct index_state *, int newfd);
extern int discard_index(struct index_state *);
extern int unmerged_index(const struct index_state *);
-extern int verify_path(const char *path);
+extern int verify_path(const char *path, unsigned mode);
extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase);
extern int index_name_pos(const struct index_state *, const char *name, int namelen);
#define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */
@@ -883,6 +883,19 @@ extern int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
extern int interpret_branch_name(const char *str, struct strbuf *);
extern int get_sha1_mb(const char *str, unsigned char *sha1);
+/*
+ * Open the loose object at path, check its sha1, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+ const unsigned char *expected_sha1,
+ enum object_type *type,
+ unsigned long *size,
+ void **contents);
+
extern int refname_match(const char *abbrev_name, const char *full_name, const char **rules);
extern const char *ref_rev_parse_rules[];
#define ref_fetch_rules ref_rev_parse_rules
@@ -1150,11 +1163,15 @@ extern int update_server_info(int);
typedef int (*config_fn_t)(const char *, const char *, void *);
extern int git_default_config(const char *, const char *, void *);
extern int git_config_from_file(config_fn_t fn, const char *, void *);
+extern int git_config_from_buf(config_fn_t fn, const char *name,
+ const char *buf, size_t len, void *data);
extern void git_config_push_parameter(const char *text);
extern int git_config_from_parameters(config_fn_t fn, void *data);
extern int git_config(config_fn_t fn, void *);
extern int git_config_with_options(config_fn_t fn, void *,
- const char *filename, int respect_includes);
+ const char *filename,
+ const char *blob_ref,
+ int respect_includes);
extern int git_config_early(config_fn_t fn, void *, const char *repo_config);
extern int git_parse_ulong(const char *, unsigned long *);
extern int git_config_int(const char *, const char *);
diff --git a/config.c b/config.c
index 830ee14..201930f 100644
--- a/config.c
+++ b/config.c
@@ -10,20 +10,69 @@
#include "strbuf.h"
#include "quote.h"
-typedef struct config_file {
- struct config_file *prev;
- FILE *f;
+struct config_source {
+ struct config_source *prev;
+ union {
+ FILE *file;
+ struct config_buf {
+ const char *buf;
+ size_t len;
+ size_t pos;
+ } buf;
+ } u;
const char *name;
+ int die_on_error;
int linenr;
int eof;
struct strbuf value;
struct strbuf var;
-} config_file;
-static config_file *cf;
+ int (*do_fgetc)(struct config_source *c);
+ int (*do_ungetc)(int c, struct config_source *conf);
+ long (*do_ftell)(struct config_source *c);
+};
+
+static struct config_source *cf;
static int zlib_compression_seen;
+static int config_file_fgetc(struct config_source *conf)
+{
+ return fgetc(conf->u.file);
+}
+
+static int config_file_ungetc(int c, struct config_source *conf)
+{
+ return ungetc(c, conf->u.file);
+}
+
+static long config_file_ftell(struct config_source *conf)
+{
+ return ftell(conf->u.file);
+}
+
+
+static int config_buf_fgetc(struct config_source *conf)
+{
+ if (conf->u.buf.pos < conf->u.buf.len)
+ return conf->u.buf.buf[conf->u.buf.pos++];
+
+ return EOF;
+}
+
+static int config_buf_ungetc(int c, struct config_source *conf)
+{
+ if (conf->u.buf.pos > 0)
+ return conf->u.buf.buf[--conf->u.buf.pos];
+
+ return EOF;
+}
+
+static long config_buf_ftell(struct config_source *conf)
+{
+ return conf->u.buf.pos;
+}
+
#define MAX_INCLUDE_DEPTH 10
static const char include_depth_advice[] =
"exceeded maximum include depth (%d) while including\n"
@@ -168,27 +217,22 @@ int git_config_from_parameters(config_fn_t fn, void *data)
static int get_next_char(void)
{
- int c;
- FILE *f;
-
- c = '\n';
- if (cf && ((f = cf->f) != NULL)) {
- c = fgetc(f);
- if (c == '\r') {
- /* DOS like systems */
- c = fgetc(f);
- if (c != '\n') {
- ungetc(c, f);
- c = '\r';
- }
- }
- if (c == '\n')
- cf->linenr++;
- if (c == EOF) {
- cf->eof = 1;
- c = '\n';
+ int c = cf->do_fgetc(cf);
+
+ if (c == '\r') {
+ /* DOS like systems */
+ c = cf->do_fgetc(cf);
+ if (c != '\n') {
+ cf->do_ungetc(c, cf);
+ c = '\r';
}
}
+ if (c == '\n')
+ cf->linenr++;
+ if (c == EOF) {
+ cf->eof = 1;
+ c = '\n';
+ }
return c;
}
@@ -339,7 +383,7 @@ static int get_base_var(struct strbuf *name)
}
}
-static int git_parse_file(config_fn_t fn, void *data)
+static int git_parse_source(config_fn_t fn, void *data)
{
int comment = 0;
int baselen = 0;
@@ -399,7 +443,10 @@ static int git_parse_file(config_fn_t fn, void *data)
if (get_value(fn, data, var) < 0)
break;
}
- die("bad config file line %d in %s", cf->linenr, cf->name);
+ if (cf->die_on_error)
+ die("bad config file line %d in %s", cf->linenr, cf->name);
+ else
+ return error("bad config file line %d in %s", cf->linenr, cf->name);
}
static int parse_unit_factor(const char *end, uintmax_t *val)
@@ -896,6 +943,33 @@ int git_default_config(const char *var, const char *value, void *dummy)
return 0;
}
+/*
+ * All source specific fields in the union, die_on_error, name and the callbacks
+ * fgetc, ungetc, ftell of top need to be initialized before calling
+ * this function.
+ */
+static int do_config_from(struct config_source *top, config_fn_t fn, void *data)
+{
+ int ret;
+
+ /* push config-file parsing state stack */
+ top->prev = cf;
+ top->linenr = 1;
+ top->eof = 0;
+ strbuf_init(&top->value, 1024);
+ strbuf_init(&top->var, 1024);
+ cf = top;
+
+ ret = git_parse_source(fn, data);
+
+ /* pop config-file parsing state stack */
+ strbuf_release(&top->value);
+ strbuf_release(&top->var);
+ cf = top->prev;
+
+ return ret;
+}
+
int git_config_from_file(config_fn_t fn, const char *filename, void *data)
{
int ret;
@@ -903,30 +977,74 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
ret = -1;
if (f) {
- config_file top;
+ struct config_source top;
- /* push config-file parsing state stack */
- top.prev = cf;
- top.f = f;
+ top.u.file = f;
top.name = filename;
- top.linenr = 1;
- top.eof = 0;
- strbuf_init(&top.value, 1024);
- strbuf_init(&top.var, 1024);
- cf = ⊤
-
- ret = git_parse_file(fn, data);
+ top.die_on_error = 1;
+ top.do_fgetc = config_file_fgetc;
+ top.do_ungetc = config_file_ungetc;
+ top.do_ftell = config_file_ftell;
- /* pop config-file parsing state stack */
- strbuf_release(&top.value);
- strbuf_release(&top.var);
- cf = top.prev;
+ ret = do_config_from(&top, fn, data);
fclose(f);
}
return ret;
}
+int git_config_from_buf(config_fn_t fn, const char *name, const char *buf,
+ size_t len, void *data)
+{
+ struct config_source top;
+
+ top.u.buf.buf = buf;
+ top.u.buf.len = len;
+ top.u.buf.pos = 0;
+ top.name = name;
+ top.die_on_error = 0;
+ top.do_fgetc = config_buf_fgetc;
+ top.do_ungetc = config_buf_ungetc;
+ top.do_ftell = config_buf_ftell;
+
+ return do_config_from(&top, fn, data);
+}
+
+static int git_config_from_blob_sha1(config_fn_t fn,
+ const char *name,
+ const unsigned char *sha1,
+ void *data)
+{
+ enum object_type type;
+ char *buf;
+ unsigned long size;
+ int ret;
+
+ buf = read_sha1_file(sha1, &type, &size);
+ if (!buf)
+ return error("unable to load config blob object '%s'", name);
+ if (type != OBJ_BLOB) {
+ free(buf);
+ return error("reference '%s' does not point to a blob", name);
+ }
+
+ ret = git_config_from_buf(fn, name, buf, size, data);
+ free(buf);
+
+ return ret;
+}
+
+static int git_config_from_blob_ref(config_fn_t fn,
+ const char *name,
+ void *data)
+{
+ unsigned char sha1[20];
+
+ if (get_sha1(name, sha1) < 0)
+ return error("unable to resolve config blob '%s'", name);
+ return git_config_from_blob_sha1(fn, name, sha1, data);
+}
+
const char *git_etc_gitconfig(void)
{
static const char *system_wide;
@@ -992,7 +1110,9 @@ int git_config_early(config_fn_t fn, void *data, const char *repo_config)
}
int git_config_with_options(config_fn_t fn, void *data,
- const char *filename, int respect_includes)
+ const char *filename,
+ const char *blob_ref,
+ int respect_includes)
{
char *repo_config = NULL;
int ret;
@@ -1011,6 +1131,8 @@ int git_config_with_options(config_fn_t fn, void *data,
*/
if (filename)
return git_config_from_file(fn, filename, data);
+ else if (blob_ref)
+ return git_config_from_blob_ref(fn, blob_ref, data);
repo_config = git_pathdup("config");
ret = git_config_early(fn, data, repo_config);
@@ -1021,7 +1143,7 @@ int git_config_with_options(config_fn_t fn, void *data,
int git_config(config_fn_t fn, void *data)
{
- return git_config_with_options(fn, data, NULL, 1);
+ return git_config_with_options(fn, data, NULL, NULL, 1);
}
/*
@@ -1053,7 +1175,6 @@ static int store_aux(const char *key, const char *value, void *cb)
{
const char *ep;
size_t section_len;
- FILE *f = cf->f;
switch (store.state) {
case KEY_SEEN:
@@ -1065,7 +1186,7 @@ static int store_aux(const char *key, const char *value, void *cb)
return 1;
}
- store.offset[store.seen] = ftell(f);
+ store.offset[store.seen] = cf->do_ftell(cf);
store.seen++;
}
break;
@@ -1092,19 +1213,19 @@ static int store_aux(const char *key, const char *value, void *cb)
* Do not increment matches: this is no match, but we
* just made sure we are in the desired section.
*/
- store.offset[store.seen] = ftell(f);
+ store.offset[store.seen] = cf->do_ftell(cf);
/* fallthru */
case SECTION_END_SEEN:
case START:
if (matches(key, value)) {
- store.offset[store.seen] = ftell(f);
+ store.offset[store.seen] = cf->do_ftell(cf);
store.state = KEY_SEEN;
store.seen++;
} else {
if (strrchr(key, '.') - key == store.baselen &&
!strncmp(key, store.key, store.baselen)) {
store.state = SECTION_SEEN;
- store.offset[store.seen] = ftell(f);
+ store.offset[store.seen] = cf->do_ftell(cf);
}
}
}
diff --git a/fsck.c b/fsck.c
index 99c0497..3c090bd 100644
--- a/fsck.c
+++ b/fsck.c
@@ -6,6 +6,42 @@
#include "commit.h"
#include "tag.h"
#include "fsck.h"
+#include "hashmap.h"
+#include "submodule.h"
+
+struct oidhash_entry {
+ struct hashmap_entry ent;
+ unsigned char sha1[20];
+};
+
+static int oidhash_hashcmp(const void *va, const void *vb,
+ const void *vkey)
+{
+ const struct oidhash_entry *a = va, *b = vb;
+ const unsigned char *key = vkey;
+ return hashcmp(a->sha1, key ? key : b->sha1);
+}
+
+static struct hashmap gitmodules_found;
+static struct hashmap gitmodules_done;
+
+static void oidhash_insert(struct hashmap *h, const unsigned char *sha1)
+{
+ struct oidhash_entry *e;
+
+ if (!h->tablesize)
+ hashmap_init(h, oidhash_hashcmp, 0);
+ e = xmalloc(sizeof(*e));
+ hashmap_entry_init(&e->ent, sha1hash(sha1));
+ hashcpy(e->sha1, sha1);
+ hashmap_add(h, e);
+}
+
+static int oidhash_contains(struct hashmap *h, const unsigned char *sha1)
+{
+ return h->tablesize &&
+ !!hashmap_get_from_hash(h, sha1hash(sha1), sha1);
+}
static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data)
{
@@ -138,7 +174,7 @@ static int verify_ordered(unsigned mode1, const char *name1, unsigned mode2, con
static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
{
- int retval;
+ int retval = 0;
int has_null_sha1 = 0;
int has_full_path = 0;
int has_empty_name = 0;
@@ -178,6 +214,16 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
if (!strcmp(name, ".git"))
has_dotgit = 1;
has_zero_pad |= *(char *)desc.buffer == '0';
+
+ if (!strcmp(name, ".gitmodules")) {
+ if (!S_ISLNK(mode))
+ oidhash_insert(&gitmodules_found, sha1);
+ else
+ retval += error_func(&item->object,
+ FSCK_ERROR,
+ ".gitmodules is a symbolic link");
+ }
+
update_tree_entry(&desc);
switch (mode) {
@@ -219,7 +265,6 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
o_name = name;
}
- retval = 0;
if (has_null_sha1)
retval += error_func(&item->object, FSCK_WARN, "contains entries pointing to null sha1");
if (has_full_path)
@@ -243,6 +288,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
return retval;
}
+static int require_end_of_header(const void *data, unsigned long size,
+ struct object *obj, fsck_error error_func)
+{
+ const char *buffer = (const char *)data;
+ unsigned long i;
+
+ for (i = 0; i < size; i++) {
+ switch (buffer[i]) {
+ case '\0':
+ return error_func(obj, FSCK_ERROR,
+ "unterminated header: NUL at offset %d", i);
+ case '\n':
+ if (i + 1 < size && buffer[i + 1] == '\n')
+ return 0;
+ }
+ }
+
+ return error_func(obj, FSCK_ERROR, "unterminated header");
+}
+
static int fsck_ident(char **ident, struct object *obj, fsck_error error_func)
{
if (**ident == '<')
@@ -279,9 +344,10 @@ static int fsck_ident(char **ident, struct object *obj, fsck_error error_func)
return 0;
}
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit(struct commit *commit, char *data,
+ unsigned long size, fsck_error error_func)
{
- char *buffer = commit->buffer;
+ char *buffer = data ? data : commit->buffer;
unsigned char tree_sha1[20], sha1[20];
struct commit_graft *graft;
int parents = 0;
@@ -290,6 +356,9 @@ static int fsck_commit(struct commit *commit, fsck_error error_func)
if (commit->date == ULONG_MAX)
return error_func(&commit->object, FSCK_ERROR, "invalid author/committer line");
+ if (require_end_of_header(buffer, size, &commit->object, error_func))
+ return -1;
+
if (memcmp(buffer, "tree ", 5))
return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
if (get_sha1_hex(buffer+5, tree_sha1) || buffer[45] != '\n')
@@ -340,7 +409,8 @@ static int fsck_commit(struct commit *commit, fsck_error error_func)
return 0;
}
-static int fsck_tag(struct tag *tag, fsck_error error_func)
+static int fsck_tag(struct tag *tag, const char *data,
+ unsigned long size, fsck_error error_func)
{
struct object *tagged = tag->tagged;
@@ -349,19 +419,80 @@ static int fsck_tag(struct tag *tag, fsck_error error_func)
return 0;
}
-int fsck_object(struct object *obj, int strict, fsck_error error_func)
+struct fsck_gitmodules_data {
+ struct object *obj;
+ fsck_error error_func;
+ int ret;
+};
+
+static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata)
+{
+ struct fsck_gitmodules_data *data = vdata;
+ const char *subsection, *key;
+ int subsection_len;
+ char *name;
+
+ if (parse_config_key(var, "submodule", &subsection, &subsection_len, &key) < 0 ||
+ !subsection)
+ return 0;
+
+ name = xmemdupz(subsection, subsection_len);
+ if (check_submodule_name(name) < 0)
+ data->ret += data->error_func(data->obj, FSCK_ERROR,
+ "disallowed submodule name: %s",
+ name);
+ free(name);
+
+ return 0;
+}
+
+static int fsck_blob(struct blob *blob, const char *buf,
+ unsigned long size, fsck_error error_func)
+{
+ struct fsck_gitmodules_data data;
+
+ if (!oidhash_contains(&gitmodules_found, blob->object.sha1))
+ return 0;
+ oidhash_insert(&gitmodules_done, blob->object.sha1);
+
+ if (!buf) {
+ /*
+ * A missing buffer here is a sign that the caller found the
+ * blob too gigantic to load into memory. Let's just consider
+ * that an error.
+ */
+ return error_func(&blob->object, FSCK_ERROR,
+ ".gitmodules too large to parse");
+ }
+
+ data.obj = &blob->object;
+ data.error_func = error_func;
+ data.ret = 0;
+ if (git_config_from_buf(fsck_gitmodules_fn, ".gitmodules",
+ buf, size, &data))
+ data.ret += error_func(&blob->object, FSCK_ERROR,
+ "could not parse gitmodules blob");
+
+ return data.ret;
+}
+
+int fsck_object(struct object *obj, void *data, unsigned long size,
+ int strict, fsck_error error_func)
{
if (!obj)
return error_func(obj, FSCK_ERROR, "no valid object to fsck");
if (obj->type == OBJ_BLOB)
- return 0;
+ return fsck_blob((struct blob *)obj, (const char *) data,
+ size, error_func);
if (obj->type == OBJ_TREE)
return fsck_tree((struct tree *) obj, strict, error_func);
if (obj->type == OBJ_COMMIT)
- return fsck_commit((struct commit *) obj, error_func);
+ return fsck_commit((struct commit *) obj, data,
+ size, error_func);
if (obj->type == OBJ_TAG)
- return fsck_tag((struct tag *) obj, error_func);
+ return fsck_tag((struct tag *) obj, (const char *) data,
+ size, error_func);
return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)",
obj->type);
@@ -382,3 +513,47 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...)
strbuf_release(&sb);
return 1;
}
+
+int fsck_finish(fsck_error error_func)
+{
+ int retval = 0;
+ struct hashmap_iter iter;
+ const struct oidhash_entry *e;
+
+ hashmap_iter_init(&gitmodules_found, &iter);
+ while ((e = hashmap_iter_next(&iter))) {
+ const unsigned char *sha1 = e->sha1;
+ struct blob *blob;
+ enum object_type type;
+ unsigned long size;
+ char *buf;
+
+ if (oidhash_contains(&gitmodules_done, sha1))
+ continue;
+
+ blob = lookup_blob(sha1);
+ if (!blob) {
+ retval += error_func(&blob->object, FSCK_ERROR,
+ "non-blob found at .gitmodules");
+ continue;
+ }
+
+ buf = read_sha1_file(sha1, &type, &size);
+ if (!buf) {
+ retval += error_func(&blob->object, FSCK_ERROR,
+ "unable to read .gitmodules blob");
+ continue;
+ }
+
+ if (type == OBJ_BLOB)
+ retval += fsck_blob(blob, buf, size, error_func);
+ else
+ retval += error_func(&blob->object, FSCK_ERROR,
+ "non-blob found at .gitmodules");
+ free(buf);
+ }
+
+ hashmap_free(&gitmodules_found, 1);
+ hashmap_free(&gitmodules_done, 1);
+ return retval;
+}
diff --git a/fsck.h b/fsck.h
index 1e4f527..06a0977 100644
--- a/fsck.h
+++ b/fsck.h
@@ -28,6 +28,15 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...);
* 0 everything OK
*/
int fsck_walk(struct object *obj, fsck_walk_func walk, void *data);
-int fsck_object(struct object *obj, int strict, fsck_error error_func);
+/* If NULL is passed for data, we assume the object is local and read it. */
+int fsck_object(struct object *obj, void *data, unsigned long size,
+ int strict, fsck_error error_func);
+
+/*
+ * Some fsck checks are context-dependent, and may end up queued; run this
+ * after completing all fsck_object() calls in order to resolve any remaining
+ * checks.
+ */
+int fsck_finish(fsck_error error_func);
#endif
diff --git a/git-compat-util.h b/git-compat-util.h
index 89abf8f..bd62ab0 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -637,6 +637,23 @@ static inline int sane_iscase(int x, int is_lower)
return (x & 0x20) == 0;
}
+/*
+ * Like skip_prefix, but compare case-insensitively. Note that the comparison
+ * is done via tolower(), so it is strictly ASCII (no multi-byte characters or
+ * locale-specific conversions).
+ */
+static inline int skip_iprefix(const char *str, const char *prefix,
+ const char **out)
+{
+ do {
+ if (!*prefix) {
+ *out = str;
+ return 1;
+ }
+ } while (tolower(*str++) == tolower(*prefix++));
+ return 0;
+}
+
static inline int strtoul_ui(char const *s, int base, unsigned int *result)
{
unsigned long ul;
diff --git a/git-submodule.sh b/git-submodule.sh
index bec3362..ca16579 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -188,6 +188,19 @@ get_submodule_config () {
printf '%s' "${value:-$default}"
}
+#
+# Check whether a submodule name is acceptable, dying if not.
+#
+# $1 = submodule name
+#
+check_module_name()
+{
+ sm_name=$1
+ if ! git submodule--helper check-name "$sm_name"
+ then
+ die "$(eval_gettext "'$sm_name' is not a valid submodule name")"
+ fi
+}
#
# Map submodule path to submodule name
@@ -203,6 +216,7 @@ module_name()
sed -n -e 's|^submodule\.\(.*\)\.path '"$re"'$|\1|p' )
test -z "$name" &&
die "$(eval_gettext "No submodule mapping found in .gitmodules for path '\$sm_path'")"
+ check_module_name "$name"
echo "$name"
}
@@ -389,6 +403,11 @@ Use -f if you really want to add it." >&2
sm_name="$sm_path"
fi
+ if ! git submodule--helper check-name "$sm_name"
+ then
+ die "$(eval_gettext "'$sm_name' is not a valid submodule name")"
+ fi
+
# perhaps the path exists and is already a git repo, else clone it
if test -e "$sm_path"
then
@@ -481,7 +500,7 @@ cmd_foreach()
if test -e "$sm_path"/.git
then
say "$(eval_gettext "Entering '\$prefix\$sm_path'")"
- name=$(module_name "$sm_path")
+ name=$(module_name "$sm_path") || exit
(
prefix="$prefix$sm_path/"
clear_local_git_env
diff --git a/git.c b/git.c
index 1ada169..db12099 100644
--- a/git.c
+++ b/git.c
@@ -404,6 +404,7 @@ static void handle_internal_command(int argc, const char **argv)
{ "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE },
{ "status", cmd_status, RUN_SETUP | NEED_WORK_TREE },
{ "stripspace", cmd_stripspace },
+ { "submodule--helper", cmd_submodule__helper },
{ "symbolic-ref", cmd_symbolic_ref, RUN_SETUP },
{ "tag", cmd_tag, RUN_SETUP },
{ "tar-tree", cmd_tar_tree },
diff --git a/hashmap.c b/hashmap.c
new file mode 100644
index 0000000..d1b8056
--- /dev/null
+++ b/hashmap.c
@@ -0,0 +1,228 @@
+/*
+ * Generic implementation of hash-based key value mappings.
+ */
+#include "cache.h"
+#include "hashmap.h"
+
+#define FNV32_BASE ((unsigned int) 0x811c9dc5)
+#define FNV32_PRIME ((unsigned int) 0x01000193)
+
+unsigned int strhash(const char *str)
+{
+ unsigned int c, hash = FNV32_BASE;
+ while ((c = (unsigned char) *str++))
+ hash = (hash * FNV32_PRIME) ^ c;
+ return hash;
+}
+
+unsigned int strihash(const char *str)
+{
+ unsigned int c, hash = FNV32_BASE;
+ while ((c = (unsigned char) *str++)) {
+ if (c >= 'a' && c <= 'z')
+ c -= 'a' - 'A';
+ hash = (hash * FNV32_PRIME) ^ c;
+ }
+ return hash;
+}
+
+unsigned int memhash(const void *buf, size_t len)
+{
+ unsigned int hash = FNV32_BASE;
+ unsigned char *ucbuf = (unsigned char *) buf;
+ while (len--) {
+ unsigned int c = *ucbuf++;
+ hash = (hash * FNV32_PRIME) ^ c;
+ }
+ return hash;
+}
+
+unsigned int memihash(const void *buf, size_t len)
+{
+ unsigned int hash = FNV32_BASE;
+ unsigned char *ucbuf = (unsigned char *) buf;
+ while (len--) {
+ unsigned int c = *ucbuf++;
+ if (c >= 'a' && c <= 'z')
+ c -= 'a' - 'A';
+ hash = (hash * FNV32_PRIME) ^ c;
+ }
+ return hash;
+}
+
+#define HASHMAP_INITIAL_SIZE 64
+/* grow / shrink by 2^2 */
+#define HASHMAP_RESIZE_BITS 2
+/* load factor in percent */
+#define HASHMAP_LOAD_FACTOR 80
+
+static void alloc_table(struct hashmap *map, unsigned int size)
+{
+ map->tablesize = size;
+ map->table = xcalloc(size, sizeof(struct hashmap_entry *));
+
+ /* calculate resize thresholds for new size */
+ map->grow_at = (unsigned int) ((uint64_t) size * HASHMAP_LOAD_FACTOR / 100);
+ if (size <= HASHMAP_INITIAL_SIZE)
+ map->shrink_at = 0;
+ else
+ /*
+ * The shrink-threshold must be slightly smaller than
+ * (grow-threshold / resize-factor) to prevent erratic resizing,
+ * thus we divide by (resize-factor + 1).
+ */
+ map->shrink_at = map->grow_at / ((1 << HASHMAP_RESIZE_BITS) + 1);
+}
+
+static inline int entry_equals(const struct hashmap *map,
+ const struct hashmap_entry *e1, const struct hashmap_entry *e2,
+ const void *keydata)
+{
+ return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata));
+}
+
+static inline unsigned int bucket(const struct hashmap *map,
+ const struct hashmap_entry *key)
+{
+ return key->hash & (map->tablesize - 1);
+}
+
+static void rehash(struct hashmap *map, unsigned int newsize)
+{
+ unsigned int i, oldsize = map->tablesize;
+ struct hashmap_entry **oldtable = map->table;
+
+ alloc_table(map, newsize);
+ for (i = 0; i < oldsize; i++) {
+ struct hashmap_entry *e = oldtable[i];
+ while (e) {
+ struct hashmap_entry *next = e->next;
+ unsigned int b = bucket(map, e);
+ e->next = map->table[b];
+ map->table[b] = e;
+ e = next;
+ }
+ }
+ free(oldtable);
+}
+
+static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
+ const struct hashmap_entry *key, const void *keydata)
+{
+ struct hashmap_entry **e = &map->table[bucket(map, key)];
+ while (*e && !entry_equals(map, *e, key, keydata))
+ e = &(*e)->next;
+ return e;
+}
+
+static int always_equal(const void *unused1, const void *unused2, const void *unused3)
+{
+ return 0;
+}
+
+void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
+ size_t initial_size)
+{
+ unsigned int size = HASHMAP_INITIAL_SIZE;
+ map->size = 0;
+ map->cmpfn = equals_function ? equals_function : always_equal;
+
+ /* calculate initial table size and allocate the table */
+ initial_size = (unsigned int) ((uint64_t) initial_size * 100
+ / HASHMAP_LOAD_FACTOR);
+ while (initial_size > size)
+ size <<= HASHMAP_RESIZE_BITS;
+ alloc_table(map, size);
+}
+
+void hashmap_free(struct hashmap *map, int free_entries)
+{
+ if (!map || !map->table)
+ return;
+ if (free_entries) {
+ struct hashmap_iter iter;
+ struct hashmap_entry *e;
+ hashmap_iter_init(map, &iter);
+ while ((e = hashmap_iter_next(&iter)))
+ free(e);
+ }
+ free(map->table);
+ memset(map, 0, sizeof(*map));
+}
+
+void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)
+{
+ return *find_entry_ptr(map, key, keydata);
+}
+
+void *hashmap_get_next(const struct hashmap *map, const void *entry)
+{
+ struct hashmap_entry *e = ((struct hashmap_entry *) entry)->next;
+ for (; e; e = e->next)
+ if (entry_equals(map, entry, e, NULL))
+ return e;
+ return NULL;
+}
+
+void hashmap_add(struct hashmap *map, void *entry)
+{
+ unsigned int b = bucket(map, entry);
+
+ /* add entry */
+ ((struct hashmap_entry *) entry)->next = map->table[b];
+ map->table[b] = entry;
+
+ /* fix size and rehash if appropriate */
+ map->size++;
+ if (map->size > map->grow_at)
+ rehash(map, map->tablesize << HASHMAP_RESIZE_BITS);
+}
+
+void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)
+{
+ struct hashmap_entry *old;
+ struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+ if (!*e)
+ return NULL;
+
+ /* remove existing entry */
+ old = *e;
+ *e = old->next;
+ old->next = NULL;
+
+ /* fix size and rehash if appropriate */
+ map->size--;
+ if (map->size < map->shrink_at)
+ rehash(map, map->tablesize >> HASHMAP_RESIZE_BITS);
+ return old;
+}
+
+void *hashmap_put(struct hashmap *map, void *entry)
+{
+ struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
+ hashmap_add(map, entry);
+ return old;
+}
+
+void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)
+{
+ iter->map = map;
+ iter->tablepos = 0;
+ iter->next = NULL;
+}
+
+void *hashmap_iter_next(struct hashmap_iter *iter)
+{
+ struct hashmap_entry *current = iter->next;
+ for (;;) {
+ if (current) {
+ iter->next = current->next;
+ return current;
+ }
+
+ if (iter->tablepos >= iter->map->tablesize)
+ return NULL;
+
+ current = iter->map->table[iter->tablepos++];
+ }
+}
diff --git a/hashmap.h b/hashmap.h
new file mode 100644
index 0000000..a8b9e3d
--- /dev/null
+++ b/hashmap.h
@@ -0,0 +1,90 @@
+#ifndef HASHMAP_H
+#define HASHMAP_H
+
+/*
+ * Generic implementation of hash-based key-value mappings.
+ * See Documentation/technical/api-hashmap.txt.
+ */
+
+/* FNV-1 functions */
+
+extern unsigned int strhash(const char *buf);
+extern unsigned int strihash(const char *buf);
+extern unsigned int memhash(const void *buf, size_t len);
+extern unsigned int memihash(const void *buf, size_t len);
+
+static inline unsigned int sha1hash(const unsigned char *sha1)
+{
+ /*
+ * Equivalent to 'return *(unsigned int *)sha1;', but safe on
+ * platforms that don't support unaligned reads.
+ */
+ unsigned int hash;
+ memcpy(&hash, sha1, sizeof(hash));
+ return hash;
+}
+
+/* data structures */
+
+struct hashmap_entry {
+ struct hashmap_entry *next;
+ unsigned int hash;
+};
+
+typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
+ const void *keydata);
+
+struct hashmap {
+ struct hashmap_entry **table;
+ hashmap_cmp_fn cmpfn;
+ unsigned int size, tablesize, grow_at, shrink_at;
+};
+
+struct hashmap_iter {
+ struct hashmap *map;
+ struct hashmap_entry *next;
+ unsigned int tablepos;
+};
+
+/* hashmap functions */
+
+extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
+ size_t initial_size);
+extern void hashmap_free(struct hashmap *map, int free_entries);
+
+/* hashmap_entry functions */
+
+static inline void hashmap_entry_init(void *entry, unsigned int hash)
+{
+ struct hashmap_entry *e = entry;
+ e->hash = hash;
+ e->next = NULL;
+}
+extern void *hashmap_get(const struct hashmap *map, const void *key,
+ const void *keydata);
+extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
+extern void hashmap_add(struct hashmap *map, void *entry);
+extern void *hashmap_put(struct hashmap *map, void *entry);
+extern void *hashmap_remove(struct hashmap *map, const void *key,
+ const void *keydata);
+
+static inline void *hashmap_get_from_hash(const struct hashmap *map,
+ unsigned int hash, const void *keydata)
+{
+ struct hashmap_entry key;
+ hashmap_entry_init(&key, hash);
+ return hashmap_get(map, &key, keydata);
+}
+
+/* hashmap_iter functions */
+
+extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
+extern void *hashmap_iter_next(struct hashmap_iter *iter);
+static inline void *hashmap_iter_first(struct hashmap *map,
+ struct hashmap_iter *iter)
+{
+ hashmap_iter_init(map, iter);
+ return hashmap_iter_next(iter);
+}
+
+#endif
diff --git a/read-cache.c b/read-cache.c
index 04ed561..a800c11 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -684,7 +684,7 @@ struct cache_entry *make_cache_entry(unsigned int mode,
int size, len;
struct cache_entry *ce;
- if (!verify_path(path)) {
+ if (!verify_path(path, mode)) {
error("Invalid path '%s'", path);
return NULL;
}
@@ -724,7 +724,7 @@ int ce_path_match(const struct cache_entry *ce, const struct pathspec *pathspec)
* Also, we don't want double slashes or slashes at the
* end that can make pathnames ambiguous.
*/
-static int verify_dotfile(const char *rest)
+static int verify_dotfile(const char *rest, unsigned mode)
{
/*
* The first character was '.', but that
@@ -738,16 +738,28 @@ static int verify_dotfile(const char *rest)
switch (*rest) {
/*
- * ".git" followed by NUL or slash is bad. This
- * shares the path end test with the ".." case.
+ * ".git" followed by NUL or slash is bad. Note that we match
+ * case-insensitively here, even if ignore_case is not set.
+ * This outlaws ".GIT" everywhere out of an abundance of caution,
+ * since there's really no good reason to allow it.
+ *
+ * Once we've seen ".git", we can also find ".gitmodules", etc (also
+ * case-insensitively).
*/
case 'g':
if (rest[1] != 'i')
break;
if (rest[2] != 't')
break;
- rest += 2;
- /* fallthrough */
+ if (rest[3] == '\0' || is_dir_sep(rest[3]))
+ return 0;
+ if (S_ISLNK(mode)) {
+ rest += 3;
+ if (skip_iprefix(rest, "modules", &rest) &&
+ (*rest == '\0' || is_dir_sep(*rest)))
+ return 0;
+ }
+ break;
case '.':
if (rest[1] == '\0' || is_dir_sep(rest[1]))
return 0;
@@ -755,7 +767,7 @@ static int verify_dotfile(const char *rest)
return 1;
}
-int verify_path(const char *path)
+int verify_path(const char *path, unsigned mode)
{
char c;
@@ -769,7 +781,7 @@ int verify_path(const char *path)
if (is_dir_sep(c)) {
inside:
c = *path++;
- if ((c == '.' && !verify_dotfile(path)) ||
+ if ((c == '.' && !verify_dotfile(path, mode)) ||
is_dir_sep(c) || c == '\0')
return 0;
}
@@ -947,7 +959,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
if (!ok_to_add)
return -1;
- if (!verify_path(ce->name))
+ if (!verify_path(ce->name, ce->ce_mode))
return error("Invalid path '%s'", ce->name);
if (!skip_df_check &&
diff --git a/sha1_file.c b/sha1_file.c
index b114cc9..c92efa6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1325,12 +1325,21 @@ static int open_sha1_file(const unsigned char *sha1)
return -1;
}
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+/*
+ * Map the loose object at "path" if it is not NULL, or the path found by
+ * searching for a loose object named "sha1".
+ */
+static void *map_sha1_file_1(const char *path,
+ const unsigned char *sha1,
+ unsigned long *size)
{
void *map;
int fd;
- fd = open_sha1_file(sha1);
+ if (path)
+ fd = git_open_noatime(path);
+ else
+ fd = open_sha1_file(sha1);
map = NULL;
if (fd >= 0) {
struct stat st;
@@ -1394,6 +1403,11 @@ static int experimental_loose_object(unsigned char *map)
return 1;
}
+void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+{
+ return map_sha1_file_1(NULL, sha1, size);
+}
+
unsigned long unpack_object_header_buffer(const unsigned char *buf,
unsigned long len, enum object_type *type, unsigned long *sizep)
{
@@ -3043,3 +3057,117 @@ void assert_sha1_type(const unsigned char *sha1, enum object_type expect)
die("%s is not a valid '%s' object", sha1_to_hex(sha1),
typename(expect));
}
+
+static int check_stream_sha1(git_zstream *stream,
+ const char *hdr,
+ unsigned long size,
+ const char *path,
+ const unsigned char *expected_sha1)
+{
+ git_SHA_CTX c;
+ unsigned char real_sha1[20];
+ unsigned char buf[4096];
+ unsigned long total_read;
+ int status = Z_OK;
+
+ git_SHA1_Init(&c);
+ git_SHA1_Update(&c, hdr, stream->total_out);
+
+ /*
+ * We already read some bytes into hdr, but the ones up to the NUL
+ * do not count against the object's content size.
+ */
+ total_read = stream->total_out - strlen(hdr) - 1;
+
+ /*
+ * This size comparison must be "<=" to read the final zlib packets;
+ * see the comment in unpack_sha1_rest for details.
+ */
+ while (total_read <= size &&
+ (status == Z_OK || status == Z_BUF_ERROR)) {
+ stream->next_out = buf;
+ stream->avail_out = sizeof(buf);
+ if (size - total_read < stream->avail_out)
+ stream->avail_out = size - total_read;
+ status = git_inflate(stream, Z_FINISH);
+ git_SHA1_Update(&c, buf, stream->next_out - buf);
+ total_read += stream->next_out - buf;
+ }
+ git_inflate_end(stream);
+
+ if (status != Z_STREAM_END) {
+ error("corrupt loose object '%s'", sha1_to_hex(expected_sha1));
+ return -1;
+ }
+
+ git_SHA1_Final(real_sha1, &c);
+ if (hashcmp(expected_sha1, real_sha1)) {
+ error("sha1 mismatch for %s (expected %s)", path,
+ sha1_to_hex(expected_sha1));
+ return -1;
+ }
+
+ return 0;
+}
+
+int read_loose_object(const char *path,
+ const unsigned char *expected_sha1,
+ enum object_type *type,
+ unsigned long *size,
+ void **contents)
+{
+ int ret = -1;
+ int fd = -1;
+ void *map = NULL;
+ unsigned long mapsize;
+ git_zstream stream;
+ char hdr[32];
+
+ *contents = NULL;
+
+ map = map_sha1_file_1(path, NULL, &mapsize);
+ if (!map) {
+ error("unable to mmap %s: %s", path, strerror(errno));
+ goto out;
+ }
+
+ if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+ error("unable to unpack header of %s", path);
+ goto out;
+ }
+
+ *type = parse_sha1_header(hdr, size);
+ if (*type < 0) {
+ error("unable to parse header of %s", path);
+ git_inflate_end(&stream);
+ goto out;
+ }
+
+ if (*type == OBJ_BLOB && *size > big_file_threshold) {
+ if (check_stream_sha1(&stream, hdr, *size, path, expected_sha1) < 0)
+ goto out;
+ } else {
+ *contents = unpack_sha1_rest(&stream, hdr, *size, expected_sha1);
+ if (!*contents) {
+ error("unable to unpack contents of %s", path);
+ git_inflate_end(&stream);
+ goto out;
+ }
+ if (check_sha1_signature(expected_sha1, *contents,
+ *size, typename(*type))) {
+ error("sha1 mismatch for %s (expected %s)", path,
+ sha1_to_hex(expected_sha1));
+ free(*contents);
+ goto out;
+ }
+ }
+
+ ret = 0; /* everything checks out */
+
+out:
+ if (map)
+ munmap(map, mapsize);
+ if (fd >= 0)
+ close(fd);
+ return ret;
+}
diff --git a/submodule.c b/submodule.c
index 1821a5b..6337cab 100644
--- a/submodule.c
+++ b/submodule.c
@@ -124,6 +124,31 @@ void gitmodules_config(void)
}
}
+int check_submodule_name(const char *name)
+{
+ /* Disallow empty names */
+ if (!*name)
+ return -1;
+
+ /*
+ * Look for '..' as a path component. Check both '/' and '\\' as
+ * separators rather than is_dir_sep(), because we want the name rules
+ * to be consistent across platforms.
+ */
+ goto in_component; /* always start inside component */
+ while (*name) {
+ char c = *name++;
+ if (c == '/' || c == '\\') {
+in_component:
+ if (name[0] == '.' && name[1] == '.' &&
+ (!name[2] || name[2] == '/' || name[2] == '\\'))
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
int parse_submodule_config_option(const char *var, const char *value)
{
struct string_list_item *config;
@@ -132,6 +157,10 @@ int parse_submodule_config_option(const char *var, const char *value)
if (parse_config_key(var, "submodule", &name, &namelen, &key) < 0 || !name)
return 0;
+ if (check_submodule_name(name) < 0) {
+ warning(_("ignoring suspicious submodule name: %s"), name);
+ return 0;
+ }
if (!strcmp(key, "path")) {
config = unsorted_string_list_lookup(&config_name_for_path, value);
diff --git a/submodule.h b/submodule.h
index c7ffc7c..59dbdfb 100644
--- a/submodule.h
+++ b/submodule.h
@@ -37,4 +37,11 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam
struct string_list *needs_pushing);
int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name);
+/*
+ * Returns 0 if the name is syntactically acceptable as a submodule "name"
+ * (e.g., that may be found in the subsection of a .gitmodules file) and -1
+ * otherwise.
+ */
+int check_submodule_name(const char *name);
+
#endif
diff --git a/t/lib-pack.sh b/t/lib-pack.sh
new file mode 100644
index 0000000..4674899
--- /dev/null
+++ b/t/lib-pack.sh
@@ -0,0 +1,110 @@
+# Support routines for hand-crafting weird or malicious packs.
+#
+# You can make a complete pack like:
+#
+# pack_header 2 >foo.pack &&
+# pack_obj e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 >>foo.pack &&
+# pack_obj e68fe8129b546b101aee9510c5328e7f21ca1d18 >>foo.pack &&
+# pack_trailer foo.pack
+
+# Print the big-endian 4-byte octal representation of $1
+uint32_octal () {
+ n=$1
+ printf '\\%o' $(($n / 16777216)); n=$((n % 16777216))
+ printf '\\%o' $(($n / 65536)); n=$((n % 65536))
+ printf '\\%o' $(($n / 256)); n=$((n % 256))
+ printf '\\%o' $(($n ));
+}
+
+# Print the big-endian 4-byte binary representation of $1
+uint32_binary () {
+ printf "$(uint32_octal "$1")"
+}
+
+# Print a pack header, version 2, for a pack with $1 objects
+pack_header () {
+ printf 'PACK' &&
+ printf '\0\0\0\2' &&
+ uint32_binary "$1"
+}
+
+# Print the pack data for object $1, as a delta against object $2 (or as a full
+# object if $2 is missing or empty). The output is suitable for including
+# directly in the packfile, and represents the entirety of the object entry.
+# Doing this on the fly (especially picking your deltas) is quite tricky, so we
+# have hardcoded some well-known objects. See the case statements below for the
+# complete list.
+pack_obj () {
+ case "$1" in
+ # empty blob
+ e69de29bb2d1d6434b8b29ae775ad8c2e48c5391)
+ case "$2" in
+ '')
+ printf '\060\170\234\003\0\0\0\0\1'
+ return
+ ;;
+ esac
+ ;;
+
+ # blob containing "\7\76"
+ e68fe8129b546b101aee9510c5328e7f21ca1d18)
+ case "$2" in
+ '')
+ printf '\062\170\234\143\267\3\0\0\116\0\106'
+ return
+ ;;
+ 01d7713666f4de822776c7622c10f1b07de280dc)
+ printf '\165\1\327\161\66\146\364\336\202\47\166' &&
+ printf '\307\142\54\20\361\260\175\342\200\334\170' &&
+ printf '\234\143\142\142\142\267\003\0\0\151\0\114'
+ return
+ ;;
+ esac
+ ;;
+
+ # blob containing "\7\0"
+ 01d7713666f4de822776c7622c10f1b07de280dc)
+ case "$2" in
+ '')
+ printf '\062\170\234\143\147\0\0\0\20\0\10'
+ return
+ ;;
+ e68fe8129b546b101aee9510c5328e7f21ca1d18)
+ printf '\165\346\217\350\22\233\124\153\20\32\356' &&
+ printf '\225\20\305\62\216\177\41\312\35\30\170\234' &&
+ printf '\143\142\142\142\147\0\0\0\53\0\16'
+ return
+ ;;
+ esac
+ ;;
+ esac
+
+ # If it's not a delta, we can convince pack-objects to generate a pack
+ # with just our entry, and then strip off the header (12 bytes) and
+ # trailer (20 bytes).
+ if test -z "$2"
+ then
+ echo "$1" | git pack-objects --stdout >pack_obj.tmp &&
+ size=$(wc -c <pack_obj.tmp) &&
+ dd if=pack_obj.tmp bs=1 count=$((size - 20 - 12)) skip=12 &&
+ rm -f pack_obj.tmp
+ return
+ fi
+
+ echo >&2 "BUG: don't know how to print $1${2:+ (from $2)}"
+ return 1
+}
+
+# Compute and append pack trailer to "$1"
+pack_trailer () {
+ test-sha1 -b <"$1" >trailer.tmp &&
+ cat trailer.tmp >>"$1" &&
+ rm -f trailer.tmp
+}
+
+# Remove any existing packs to make sure that
+# whatever we index next will be the pack that we
+# actually use.
+clear_packs () {
+ rm -f .git/objects/pack/*
+}
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
new file mode 100755
index 0000000..391e2b6
--- /dev/null
+++ b/t/t0011-hashmap.sh
@@ -0,0 +1,240 @@
+#!/bin/sh
+
+test_description='test hashmap and string hash functions'
+. ./test-lib.sh
+
+test_hashmap() {
+ echo "$1" | test-hashmap $3 > actual &&
+ echo "$2" > expect &&
+ test_cmp expect actual
+}
+
+test_expect_success 'hash functions' '
+
+test_hashmap "hash key1" "2215982743 2215982743 116372151 116372151" &&
+test_hashmap "hash key2" "2215982740 2215982740 116372148 116372148" &&
+test_hashmap "hash fooBarFrotz" "1383912807 1383912807 3189766727 3189766727" &&
+test_hashmap "hash foobarfrotz" "2862305959 2862305959 3189766727 3189766727"
+
+'
+
+test_expect_success 'put' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+put foobarfrotz value4
+size" "NULL
+NULL
+NULL
+NULL
+64 4"
+
+'
+
+test_expect_success 'put (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+size" "NULL
+NULL
+NULL
+64 3" ignorecase
+
+'
+
+test_expect_success 'replace' '
+
+test_hashmap "put key1 value1
+put key1 value2
+put fooBarFrotz value3
+put fooBarFrotz value4
+size" "NULL
+value1
+NULL
+value3
+64 2"
+
+'
+
+test_expect_success 'replace (case insensitive)' '
+
+test_hashmap "put key1 value1
+put Key1 value2
+put fooBarFrotz value3
+put foobarfrotz value4
+size" "NULL
+value1
+NULL
+value3
+64 2" ignorecase
+
+'
+
+test_expect_success 'get' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+put foobarfrotz value4
+get key1
+get key2
+get fooBarFrotz
+get notInMap" "NULL
+NULL
+NULL
+NULL
+value1
+value2
+value3
+NULL"
+
+'
+
+test_expect_success 'get (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+get Key1
+get keY2
+get foobarfrotz
+get notInMap" "NULL
+NULL
+NULL
+value1
+value2
+value3
+NULL" ignorecase
+
+'
+
+test_expect_success 'add' '
+
+test_hashmap "add key1 value1
+add key1 value2
+add fooBarFrotz value3
+add fooBarFrotz value4
+get key1
+get fooBarFrotz
+get notInMap" "value2
+value1
+value4
+value3
+NULL"
+
+'
+
+test_expect_success 'add (case insensitive)' '
+
+test_hashmap "add key1 value1
+add Key1 value2
+add fooBarFrotz value3
+add foobarfrotz value4
+get key1
+get Foobarfrotz
+get notInMap" "value2
+value1
+value4
+value3
+NULL" ignorecase
+
+'
+
+test_expect_success 'remove' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+remove key1
+remove key2
+remove notInMap
+size" "NULL
+NULL
+NULL
+value1
+value2
+NULL
+64 1"
+
+'
+
+test_expect_success 'remove (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+remove Key1
+remove keY2
+remove notInMap
+size" "NULL
+NULL
+NULL
+value1
+value2
+NULL
+64 1" ignorecase
+
+'
+
+test_expect_success 'iterate' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+iterate" "NULL
+NULL
+NULL
+key2 value2
+key1 value1
+fooBarFrotz value3"
+
+'
+
+test_expect_success 'iterate (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+iterate" "NULL
+NULL
+NULL
+fooBarFrotz value3
+key2 value2
+key1 value1" ignorecase
+
+'
+
+test_expect_success 'grow / shrink' '
+
+ rm -f in &&
+ rm -f expect &&
+ for n in $(test_seq 51)
+ do
+ echo put key$n value$n >> in &&
+ echo NULL >> expect
+ done &&
+ echo size >> in &&
+ echo 64 51 >> expect &&
+ echo put key52 value52 >> in &&
+ echo NULL >> expect
+ echo size >> in &&
+ echo 256 52 >> expect &&
+ for n in $(test_seq 12)
+ do
+ echo remove key$n >> in &&
+ echo value$n >> expect
+ done &&
+ echo size >> in &&
+ echo 256 40 >> expect &&
+ echo remove key40 >> in &&
+ echo value40 >> expect &&
+ echo size >> in &&
+ echo 64 39 >> expect &&
+ cat in | test-hashmap > out &&
+ test_cmp expect out
+
+'
+
+test_done
diff --git a/t/t1307-config-blob.sh b/t/t1307-config-blob.sh
new file mode 100755
index 0000000..fdc257e
--- /dev/null
+++ b/t/t1307-config-blob.sh
@@ -0,0 +1,70 @@
+#!/bin/sh
+
+test_description='support for reading config from a blob'
+. ./test-lib.sh
+
+test_expect_success 'create config blob' '
+ cat >config <<-\EOF &&
+ [some]
+ value = 1
+ EOF
+ git add config &&
+ git commit -m foo
+'
+
+test_expect_success 'list config blob contents' '
+ echo some.value=1 >expect &&
+ git config --blob=HEAD:config --list >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'fetch value from blob' '
+ echo true >expect &&
+ git config --blob=HEAD:config --bool some.value >actual &&
+ test_cmp expect actual
+'
+
+test_expect_success 'reading non-existing value from blob is an error' '
+ test_must_fail git config --blob=HEAD:config non.existing
+'
+
+test_expect_success 'reading from blob and file is an error' '
+ test_must_fail git config --blob=HEAD:config --system --list
+'
+
+test_expect_success 'reading from missing ref is an error' '
+ test_must_fail git config --blob=HEAD:doesnotexist --list
+'
+
+test_expect_success 'reading from non-blob is an error' '
+ test_must_fail git config --blob=HEAD --list
+'
+
+test_expect_success 'setting a value in a blob is an error' '
+ test_must_fail git config --blob=HEAD:config some.value foo
+'
+
+test_expect_success 'deleting a value in a blob is an error' '
+ test_must_fail git config --blob=HEAD:config --unset some.value
+'
+
+test_expect_success 'editing a blob is an error' '
+ test_must_fail git config --blob=HEAD:config --edit
+'
+
+test_expect_success 'parse errors in blobs are properly attributed' '
+ cat >config <<-\EOF &&
+ [some]
+ value = "
+ EOF
+ git add config &&
+ git commit -m broken &&
+
+ test_must_fail git config --blob=HEAD:config some.value 2>err &&
+
+ # just grep for our token as the exact error message is likely to
+ # change or be internationalized
+ grep "HEAD:config" err
+'
+
+test_done
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d730734..9dfa4b0 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,7 +69,7 @@ test_expect_success 'object with bad sha1' '
git update-ref refs/heads/bogus $cmt &&
test_when_finished "git update-ref -d refs/heads/bogus" &&
- test_might_fail git fsck 2>out &&
+ test_must_fail git fsck 2>out &&
cat out &&
grep "$sha.*corrupt" out
'
@@ -101,7 +101,7 @@ test_expect_success 'email with embedded > is not okay' '
test_when_finished "remove_object $new" &&
git update-ref refs/heads/bogus "$new" &&
test_when_finished "git update-ref -d refs/heads/bogus" &&
- git fsck 2>out &&
+ test_must_fail git fsck 2>out &&
cat out &&
grep "error in commit $new" out
'
@@ -113,7 +113,7 @@ test_expect_success 'missing < email delimiter is reported nicely' '
test_when_finished "remove_object $new" &&
git update-ref refs/heads/bogus "$new" &&
test_when_finished "git update-ref -d refs/heads/bogus" &&
- git fsck 2>out &&
+ test_must_fail git fsck 2>out &&
cat out &&
grep "error in commit $new.* - bad name" out
'
@@ -125,7 +125,7 @@ test_expect_success 'missing email is reported nicely' '
test_when_finished "remove_object $new" &&
git update-ref refs/heads/bogus "$new" &&
test_when_finished "git update-ref -d refs/heads/bogus" &&
- git fsck 2>out &&
+ test_must_fail git fsck 2>out &&
cat out &&
grep "error in commit $new.* - missing email" out
'
@@ -137,11 +137,33 @@ test_expect_success '> in name is reported' '
test_when_finished "remove_object $new" &&
git update-ref refs/heads/bogus "$new" &&
test_when_finished "git update-ref -d refs/heads/bogus" &&
- git fsck 2>out &&
+ test_must_fail git fsck 2>out &&
cat out &&
grep "error in commit $new" out
'
+test_expect_success 'malformatted tree object' '
+ test_when_finished "git update-ref -d refs/tags/wrong" &&
+ test_when_finished "for i in \$T; do remove_object \$i; done" &&
+ T=$(
+ GIT_INDEX_FILE=test-index &&
+ export GIT_INDEX_FILE &&
+ rm -f test-index &&
+ >x &&
+ git add x &&
+ git rev-parse :x &&
+ T=$(git write-tree) &&
+ echo $T &&
+ (
+ git cat-file tree $T &&
+ git cat-file tree $T
+ ) |
+ git hash-object -w -t tree --stdin
+ ) &&
+ test_must_fail git fsck 2>out &&
+ grep "error in tree .*contains duplicate file entries" out
+'
+
test_expect_success 'tag pointing to nonexistent' '
cat >invalid-tag <<-\EOF &&
object ffffffffffffffffffffffffffffffffffffffff
@@ -268,4 +290,20 @@ test_expect_success 'fsck notices ".git" in trees' '
)
'
+test_expect_success 'fsck finds problems in duplicate loose objects' '
+ rm -rf broken-duplicate &&
+ git init broken-duplicate &&
+ (
+ cd broken-duplicate &&
+ test_commit duplicate &&
+ # no "-d" here, so we end up with duplicates
+ git repack &&
+ # now corrupt the loose copy
+ file=$(sha1_file "$(git rev-parse HEAD)") &&
+ rm "$file" &&
+ echo broken >"$file" &&
+ test_must_fail git fsck
+ )
+'
+
test_done
diff --git a/t/t4122-apply-symlink-inside.sh b/t/t4122-apply-symlink-inside.sh
index 3940737..b5832e5 100755
--- a/t/t4122-apply-symlink-inside.sh
+++ b/t/t4122-apply-symlink-inside.sh
@@ -52,4 +52,110 @@ test_expect_success SYMLINKS 'check result' '
'
+test_expect_success SYMLINKS 'do not read from beyond symbolic link' '
+ git reset --hard &&
+ mkdir -p arch/x86_64/dir &&
+ >arch/x86_64/dir/file &&
+ git add arch/x86_64/dir/file &&
+ echo line >arch/x86_64/dir/file &&
+ git diff >patch &&
+ git reset --hard &&
+
+ mkdir arch/i386/dir &&
+ >arch/i386/dir/file &&
+ ln -s ../i386/dir arch/x86_64/dir &&
+
+ test_must_fail git apply patch &&
+ test_must_fail git apply --cached patch &&
+ test_must_fail git apply --index patch
+
+'
+
+test_expect_success SYMLINKS 'do not follow symbolic link (setup)' '
+
+ rm -rf arch/i386/dir arch/x86_64/dir &&
+ git reset --hard &&
+ ln -s ../i386/dir arch/x86_64/dir &&
+ git add arch/x86_64/dir &&
+ git diff HEAD >add_symlink.patch &&
+ git reset --hard &&
+
+ mkdir arch/x86_64/dir &&
+ >arch/x86_64/dir/file &&
+ git add arch/x86_64/dir/file &&
+ git diff HEAD >add_file.patch &&
+ git diff -R HEAD >del_file.patch &&
+ git reset --hard &&
+ rm -fr arch/x86_64/dir &&
+
+ cat add_symlink.patch add_file.patch >patch &&
+ cat add_symlink.patch del_file.patch >tricky_del &&
+
+ mkdir arch/i386/dir
+'
+
+test_expect_success SYMLINKS 'do not follow symbolic link (same input)' '
+
+ # same input creates a confusing symbolic link
+ test_must_fail git apply patch 2>error-wt &&
+ test_i18ngrep "beyond a symbolic link" error-wt &&
+ test_path_is_missing arch/x86_64/dir &&
+ test_path_is_missing arch/i386/dir/file &&
+
+ test_must_fail git apply --index patch 2>error-ix &&
+ test_i18ngrep "beyond a symbolic link" error-ix &&
+ test_path_is_missing arch/x86_64/dir &&
+ test_path_is_missing arch/i386/dir/file &&
+ test_must_fail git ls-files --error-unmatch arch/x86_64/dir &&
+ test_must_fail git ls-files --error-unmatch arch/i386/dir &&
+
+ test_must_fail git apply --cached patch 2>error-ct &&
+ test_i18ngrep "beyond a symbolic link" error-ct &&
+ test_must_fail git ls-files --error-unmatch arch/x86_64/dir &&
+ test_must_fail git ls-files --error-unmatch arch/i386/dir &&
+
+ >arch/i386/dir/file &&
+ git add arch/i386/dir/file &&
+
+ test_must_fail git apply tricky_del &&
+ test_path_is_file arch/i386/dir/file &&
+
+ test_must_fail git apply --index tricky_del &&
+ test_path_is_file arch/i386/dir/file &&
+ test_must_fail git ls-files --error-unmatch arch/x86_64/dir &&
+ git ls-files --error-unmatch arch/i386/dir &&
+
+ test_must_fail git apply --cached tricky_del &&
+ test_must_fail git ls-files --error-unmatch arch/x86_64/dir &&
+ git ls-files --error-unmatch arch/i386/dir
+'
+
+test_expect_success SYMLINKS 'do not follow symbolic link (existing)' '
+
+ # existing symbolic link
+ git reset --hard &&
+ ln -s ../i386/dir arch/x86_64/dir &&
+ git add arch/x86_64/dir &&
+
+ test_must_fail git apply add_file.patch 2>error-wt-add &&
+ test_i18ngrep "beyond a symbolic link" error-wt-add &&
+ test_path_is_missing arch/i386/dir/file &&
+
+ mkdir arch/i386/dir &&
+ >arch/i386/dir/file &&
+ test_must_fail git apply del_file.patch 2>error-wt-del &&
+ test_i18ngrep "beyond a symbolic link" error-wt-del &&
+ test_path_is_file arch/i386/dir/file &&
+ rm arch/i386/dir/file &&
+
+ test_must_fail git apply --index add_file.patch 2>error-ix-add &&
+ test_i18ngrep "beyond a symbolic link" error-ix-add &&
+ test_path_is_missing arch/i386/dir/file &&
+ test_must_fail git ls-files --error-unmatch arch/i386/dir &&
+
+ test_must_fail git apply --cached add_file.patch 2>error-ct-file &&
+ test_i18ngrep "beyond a symbolic link" error-ct-file &&
+ test_must_fail git ls-files --error-unmatch arch/i386/dir
+'
+
test_done
diff --git a/t/t4139-apply-escape.sh b/t/t4139-apply-escape.sh
new file mode 100755
index 0000000..45b5660
--- /dev/null
+++ b/t/t4139-apply-escape.sh
@@ -0,0 +1,141 @@
+#!/bin/sh
+
+test_description='paths written by git-apply cannot escape the working tree'
+. ./test-lib.sh
+
+# tests will try to write to ../foo, and we do not
+# want them to escape the trash directory when they
+# fail
+test_expect_success 'bump git repo one level down' '
+ mkdir inside &&
+ mv .git inside/ &&
+ cd inside
+'
+
+# $1 = name of file
+# $2 = current path to file (if different)
+mkpatch_add () {
+ rm -f "${2:-$1}" &&
+ cat <<-EOF
+ diff --git a/$1 b/$1
+ new file mode 100644
+ index 0000000..53c74cd
+ --- /dev/null
+ +++ b/$1
+ @@ -0,0 +1 @@
+ +evil
+ EOF
+}
+
+mkpatch_del () {
+ echo evil >"${2:-$1}" &&
+ cat <<-EOF
+ diff --git a/$1 b/$1
+ deleted file mode 100644
+ index 53c74cd..0000000
+ --- a/$1
+ +++ /dev/null
+ @@ -1 +0,0 @@
+ -evil
+ EOF
+}
+
+# $1 = name of file
+# $2 = content of symlink
+mkpatch_symlink () {
+ rm -f "$1" &&
+ cat <<-EOF
+ diff --git a/$1 b/$1
+ new file mode 120000
+ index 0000000..$(printf "%s" "$2" | git hash-object --stdin)
+ --- /dev/null
+ +++ b/$1
+ @@ -0,0 +1 @@
+ +$2
+ \ No newline at end of file
+ EOF
+}
+
+test_expect_success 'cannot create file containing ..' '
+ mkpatch_add ../foo >patch &&
+ test_must_fail git apply patch &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success 'can create file containing .. with --unsafe-paths' '
+ mkpatch_add ../foo >patch &&
+ git apply --unsafe-paths patch &&
+ test_path_is_file ../foo
+'
+
+test_expect_success 'cannot create file containing .. (index)' '
+ mkpatch_add ../foo >patch &&
+ test_must_fail git apply --index patch &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success 'cannot create file containing .. with --unsafe-paths (index)' '
+ mkpatch_add ../foo >patch &&
+ test_must_fail git apply --index --unsafe-paths patch &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success 'cannot delete file containing ..' '
+ mkpatch_del ../foo >patch &&
+ test_must_fail git apply patch &&
+ test_path_is_file ../foo
+'
+
+test_expect_success 'can delete file containing .. with --unsafe-paths' '
+ mkpatch_del ../foo >patch &&
+ git apply --unsafe-paths patch &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success 'cannot delete file containing .. (index)' '
+ mkpatch_del ../foo >patch &&
+ test_must_fail git apply --index patch &&
+ test_path_is_file ../foo
+'
+
+test_expect_success SYMLINKS 'symlink escape via ..' '
+ {
+ mkpatch_symlink tmp .. &&
+ mkpatch_add tmp/foo ../foo
+ } >patch &&
+ test_must_fail git apply patch &&
+ test_path_is_missing tmp &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success SYMLINKS 'symlink escape via .. (index)' '
+ {
+ mkpatch_symlink tmp .. &&
+ mkpatch_add tmp/foo ../foo
+ } >patch &&
+ test_must_fail git apply --index patch &&
+ test_path_is_missing tmp &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success SYMLINKS 'symlink escape via absolute path' '
+ {
+ mkpatch_symlink tmp "$(pwd)" &&
+ mkpatch_add tmp/foo ../foo
+ } >patch &&
+ test_must_fail git apply patch &&
+ test_path_is_missing tmp &&
+ test_path_is_missing ../foo
+'
+
+test_expect_success SYMLINKS 'symlink escape via absolute path (index)' '
+ {
+ mkpatch_symlink tmp "$(pwd)" &&
+ mkpatch_add tmp/foo ../foo
+ } >patch &&
+ test_must_fail git apply --index patch &&
+ test_path_is_missing tmp &&
+ test_path_is_missing ../foo
+'
+
+test_done
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
new file mode 100755
index 0000000..6456d5a
--- /dev/null
+++ b/t/t7415-submodule-names.sh
@@ -0,0 +1,154 @@
+#!/bin/sh
+
+test_description='check handling of .. in submodule names
+
+Exercise the name-checking function on a variety of names, and then give a
+real-world setup that confirms we catch this in practice.
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-pack.sh
+
+test_expect_success 'check names' '
+ cat >expect <<-\EOF &&
+ valid
+ valid/with/paths
+ EOF
+
+ git submodule--helper check-name >actual <<-\EOF &&
+ valid
+ valid/with/paths
+
+ ../foo
+ /../foo
+ ..\foo
+ \..\foo
+ foo/..
+ foo/../
+ foo\..
+ foo\..\
+ foo/../bar
+ EOF
+
+ test_cmp expect actual
+'
+
+test_expect_success 'create innocent subrepo' '
+ git init innocent &&
+ ( cd innocent && git commit --allow-empty -m foo )
+'
+
+test_expect_success 'submodule add refuses invalid names' '
+ test_must_fail \
+ git submodule add --name ../../modules/evil "$PWD/innocent" evil
+'
+
+test_expect_success 'add evil submodule' '
+ git submodule add "$PWD/innocent" evil &&
+
+ mkdir modules &&
+ cp -r .git/modules/evil modules &&
+ write_script modules/evil/hooks/post-checkout <<-\EOF &&
+ echo >&2 "RUNNING POST CHECKOUT"
+ EOF
+
+ git config -f .gitmodules submodule.evil.update checkout &&
+ git config -f .gitmodules --rename-section \
+ submodule.evil submodule.../../modules/evil &&
+ git add modules &&
+ git commit -am evil
+'
+
+# This step seems like it shouldn't be necessary, since the payload is
+# contained entirely in the evil submodule. But due to the vagaries of the
+# submodule code, checking out the evil module will fail unless ".git/modules"
+# exists. Adding another submodule (with a name that sorts before "evil") is an
+# easy way to make sure this is the case in the victim clone.
+test_expect_success 'add other submodule' '
+ git submodule add "$PWD/innocent" another-module &&
+ git add another-module &&
+ git commit -am another
+'
+
+test_expect_success 'clone evil superproject' '
+ test_might_fail git clone --recurse-submodules . victim >output 2>&1 &&
+ cat output &&
+ ! grep "RUNNING POST CHECKOUT" output
+'
+
+test_expect_success 'fsck detects evil superproject' '
+ test_must_fail git fsck
+'
+
+test_expect_success 'transfer.fsckObjects detects evil superproject (unpack)' '
+ rm -rf dst.git &&
+ git init --bare dst.git &&
+ ( cd dst.git && git config transfer.fsckObjects true ) &&
+ test_must_fail git push dst.git HEAD
+'
+
+test_expect_success 'transfer.fsckObjects detects evil superproject (index)' '
+ rm -rf dst.git &&
+ git init --bare dst.git &&
+ ( cd dst.git && git config transfer.fsckObjects true &&
+ git config transfer.unpackLimit 1 ) &&
+ test_must_fail git push dst.git HEAD
+'
+
+# Normally our packs contain commits followed by trees followed by blobs. This
+# reverses the order, which requires backtracking to find the context of a
+# blob. We'll start with a fresh gitmodules-only tree to make it simpler.
+test_expect_success 'create oddly ordered pack' '
+ git checkout --orphan odd &&
+ git rm -rf --cached . &&
+ git add .gitmodules &&
+ git commit -m odd &&
+ {
+ pack_header 3 &&
+ pack_obj $(git rev-parse HEAD:.gitmodules) &&
+ pack_obj $(git rev-parse HEAD^{tree}) &&
+ pack_obj $(git rev-parse HEAD)
+ } >odd.pack &&
+ pack_trailer odd.pack
+'
+
+test_expect_success 'transfer.fsckObjects handles odd pack (unpack)' '
+ rm -rf dst.git &&
+ git init --bare dst.git &&
+ ( cd dst.git && test_must_fail git unpack-objects --strict ) <odd.pack
+'
+
+test_expect_success 'transfer.fsckObjects handles odd pack (index)' '
+ rm -rf dst.git &&
+ git init --bare dst.git &&
+ ( cd dst.git && test_must_fail git index-pack --strict --stdin ) <odd.pack
+'
+
+test_expect_success 'fsck detects symlinked .gitmodules file' '
+ git init symlink &&
+ (
+ cd symlink &&
+
+ # Make the tree directly to avoid index restrictions.
+ #
+ # Because symlinks store the target as a blob, choose
+ # a pathname that could be parsed as a .gitmodules file
+ # to trick naive non-symlink-aware checking.
+ tricky="[foo]bar=true" &&
+ content=$(git hash-object -w ../.gitmodules) &&
+ target=$(printf "$tricky" | git hash-object -w --stdin) &&
+ tree=$(
+ {
+ printf "100644 blob $content\t$tricky\n" &&
+ printf "120000 blob $target\t.gitmodules\n"
+ } | git mktree
+ ) &&
+ commit=$(git commit-tree $tree) &&
+
+ # Check not only that fsck fails, but that it is due to the
+ # symlink detector.
+ test_must_fail git fsck 2>output &&
+ test_i18ngrep "is a symbolic link" output
+ )
+'
+
+test_done
diff --git a/test-hashmap.c b/test-hashmap.c
new file mode 100644
index 0000000..1a7ac2c
--- /dev/null
+++ b/test-hashmap.c
@@ -0,0 +1,335 @@
+#include "cache.h"
+#include "hashmap.h"
+#include <stdio.h>
+
+struct test_entry
+{
+ struct hashmap_entry ent;
+ /* key and value as two \0-terminated strings */
+ char key[FLEX_ARRAY];
+};
+
+static const char *get_value(const struct test_entry *e)
+{
+ return e->key + strlen(e->key) + 1;
+}
+
+static int test_entry_cmp(const struct test_entry *e1,
+ const struct test_entry *e2, const char* key)
+{
+ return strcmp(e1->key, key ? key : e2->key);
+}
+
+static int test_entry_cmp_icase(const struct test_entry *e1,
+ const struct test_entry *e2, const char* key)
+{
+ return strcasecmp(e1->key, key ? key : e2->key);
+}
+
+static struct test_entry *alloc_test_entry(int hash, char *key, int klen,
+ char *value, int vlen)
+{
+ struct test_entry *entry = malloc(sizeof(struct test_entry) + klen
+ + vlen + 2);
+ hashmap_entry_init(entry, hash);
+ memcpy(entry->key, key, klen + 1);
+ memcpy(entry->key + klen + 1, value, vlen + 1);
+ return entry;
+}
+
+#define HASH_METHOD_FNV 0
+#define HASH_METHOD_I 1
+#define HASH_METHOD_IDIV10 2
+#define HASH_METHOD_0 3
+#define HASH_METHOD_X2 4
+#define TEST_SPARSE 8
+#define TEST_ADD 16
+#define TEST_SIZE 100000
+
+static unsigned int hash(unsigned int method, unsigned int i, const char *key)
+{
+ unsigned int hash;
+ switch (method & 3)
+ {
+ case HASH_METHOD_FNV:
+ hash = strhash(key);
+ break;
+ case HASH_METHOD_I:
+ hash = i;
+ break;
+ case HASH_METHOD_IDIV10:
+ hash = i / 10;
+ break;
+ case HASH_METHOD_0:
+ hash = 0;
+ break;
+ }
+
+ if (method & HASH_METHOD_X2)
+ hash = 2 * hash;
+ return hash;
+}
+
+/*
+ * Test performance of hashmap.[ch]
+ * Usage: time echo "perfhashmap method rounds" | test-hashmap
+ */
+static void perf_hashmap(unsigned int method, unsigned int rounds)
+{
+ struct hashmap map;
+ char buf[16];
+ struct test_entry **entries;
+ unsigned int *hashes;
+ unsigned int i, j;
+
+ entries = malloc(TEST_SIZE * sizeof(struct test_entry *));
+ hashes = malloc(TEST_SIZE * sizeof(int));
+ for (i = 0; i < TEST_SIZE; i++) {
+ snprintf(buf, sizeof(buf), "%i", i);
+ entries[i] = alloc_test_entry(0, buf, strlen(buf), "", 0);
+ hashes[i] = hash(method, i, entries[i]->key);
+ }
+
+ if (method & TEST_ADD) {
+ /* test adding to the map */
+ for (j = 0; j < rounds; j++) {
+ hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+
+ /* add entries */
+ for (i = 0; i < TEST_SIZE; i++) {
+ hashmap_entry_init(entries[i], hashes[i]);
+ hashmap_add(&map, entries[i]);
+ }
+
+ hashmap_free(&map, 0);
+ }
+ } else {
+ /* test map lookups */
+ hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+
+ /* fill the map (sparsely if specified) */
+ j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
+ for (i = 0; i < j; i++) {
+ hashmap_entry_init(entries[i], hashes[i]);
+ hashmap_add(&map, entries[i]);
+ }
+
+ for (j = 0; j < rounds; j++) {
+ for (i = 0; i < TEST_SIZE; i++) {
+ hashmap_get_from_hash(&map, hashes[i],
+ entries[i]->key);
+ }
+ }
+
+ hashmap_free(&map, 0);
+ }
+}
+
+struct hash_entry
+{
+ struct hash_entry *next;
+ char key[FLEX_ARRAY];
+};
+
+/*
+ * Test performance of hash.[ch]
+ * Usage: time echo "perfhash method rounds" | test-hashmap
+ */
+static void perf_hash(unsigned int method, unsigned int rounds)
+{
+ struct hash_table map;
+ char buf[16];
+ struct hash_entry **entries, **res, *entry;
+ unsigned int *hashes;
+ unsigned int i, j;
+
+ entries = malloc(TEST_SIZE * sizeof(struct hash_entry *));
+ hashes = malloc(TEST_SIZE * sizeof(int));
+ for (i = 0; i < TEST_SIZE; i++) {
+ snprintf(buf, sizeof(buf), "%i", i);
+ entries[i] = malloc(sizeof(struct hash_entry) + strlen(buf) + 1);
+ strcpy(entries[i]->key, buf);
+ hashes[i] = hash(method, i, entries[i]->key);
+ }
+
+ if (method & TEST_ADD) {
+ /* test adding to the map */
+ for (j = 0; j < rounds; j++) {
+ init_hash(&map);
+
+ /* add entries */
+ for (i = 0; i < TEST_SIZE; i++) {
+ res = (struct hash_entry **) insert_hash(
+ hashes[i], entries[i], &map);
+ if (res) {
+ entries[i]->next = *res;
+ *res = entries[i];
+ } else {
+ entries[i]->next = NULL;
+ }
+ }
+
+ free_hash(&map);
+ }
+ } else {
+ /* test map lookups */
+ init_hash(&map);
+
+ /* fill the map (sparsely if specified) */
+ j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
+ for (i = 0; i < j; i++) {
+ res = (struct hash_entry **) insert_hash(hashes[i],
+ entries[i], &map);
+ if (res) {
+ entries[i]->next = *res;
+ *res = entries[i];
+ } else {
+ entries[i]->next = NULL;
+ }
+ }
+
+ for (j = 0; j < rounds; j++) {
+ for (i = 0; i < TEST_SIZE; i++) {
+ entry = lookup_hash(hashes[i], &map);
+ while (entry) {
+ if (!strcmp(entries[i]->key, entry->key))
+ break;
+ entry = entry->next;
+ }
+ }
+ }
+
+ free_hash(&map);
+
+ }
+}
+
+#define DELIM " \t\r\n"
+
+/*
+ * Read stdin line by line and print result of commands to stdout:
+ *
+ * hash key -> strhash(key) memhash(key) strihash(key) memihash(key)
+ * put key value -> NULL / old value
+ * get key -> NULL / value
+ * remove key -> NULL / old value
+ * iterate -> key1 value1\nkey2 value2\n...
+ * size -> tablesize numentries
+ *
+ * perfhashmap method rounds -> test hashmap.[ch] performance
+ * perfhash method rounds -> test hash.[ch] performance
+ */
+int main(int argc, char *argv[])
+{
+ char line[1024];
+ struct hashmap map;
+ int icase;
+
+ /* init hash map */
+ icase = argc > 1 && !strcmp("ignorecase", argv[1]);
+ hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase
+ : test_entry_cmp), 0);
+
+ /* process commands from stdin */
+ while (fgets(line, sizeof(line), stdin)) {
+ char *cmd, *p1 = NULL, *p2 = NULL;
+ int l1 = 0, l2 = 0, hash = 0;
+ struct test_entry *entry;
+
+ /* break line into command and up to two parameters */
+ cmd = strtok(line, DELIM);
+ /* ignore empty lines */
+ if (!cmd || *cmd == '#')
+ continue;
+
+ p1 = strtok(NULL, DELIM);
+ if (p1) {
+ l1 = strlen(p1);
+ hash = icase ? strihash(p1) : strhash(p1);
+ p2 = strtok(NULL, DELIM);
+ if (p2)
+ l2 = strlen(p2);
+ }
+
+ if (!strcmp("hash", cmd) && l1) {
+
+ /* print results of different hash functions */
+ printf("%u %u %u %u\n", strhash(p1), memhash(p1, l1),
+ strihash(p1), memihash(p1, l1));
+
+ } else if (!strcmp("add", cmd) && l1 && l2) {
+
+ /* create entry with key = p1, value = p2 */
+ entry = alloc_test_entry(hash, p1, l1, p2, l2);
+
+ /* add to hashmap */
+ hashmap_add(&map, entry);
+
+ } else if (!strcmp("put", cmd) && l1 && l2) {
+
+ /* create entry with key = p1, value = p2 */
+ entry = alloc_test_entry(hash, p1, l1, p2, l2);
+
+ /* add / replace entry */
+ entry = hashmap_put(&map, entry);
+
+ /* print and free replaced entry, if any */
+ puts(entry ? get_value(entry) : "NULL");
+ free(entry);
+
+ } else if (!strcmp("get", cmd) && l1) {
+
+ /* lookup entry in hashmap */
+ entry = hashmap_get_from_hash(&map, hash, p1);
+
+ /* print result */
+ if (!entry)
+ puts("NULL");
+ while (entry) {
+ puts(get_value(entry));
+ entry = hashmap_get_next(&map, entry);
+ }
+
+ } else if (!strcmp("remove", cmd) && l1) {
+
+ /* setup static key */
+ struct hashmap_entry key;
+ hashmap_entry_init(&key, hash);
+
+ /* remove entry from hashmap */
+ entry = hashmap_remove(&map, &key, p1);
+
+ /* print result and free entry*/
+ puts(entry ? get_value(entry) : "NULL");
+ free(entry);
+
+ } else if (!strcmp("iterate", cmd)) {
+
+ struct hashmap_iter iter;
+ hashmap_iter_init(&map, &iter);
+ while ((entry = hashmap_iter_next(&iter)))
+ printf("%s %s\n", entry->key, get_value(entry));
+
+ } else if (!strcmp("size", cmd)) {
+
+ /* print table sizes */
+ printf("%u %u\n", map.tablesize, map.size);
+
+ } else if (!strcmp("perfhashmap", cmd) && l1 && l2) {
+
+ perf_hashmap(atoi(p1), atoi(p2));
+
+ } else if (!strcmp("perfhash", cmd) && l1 && l2) {
+
+ perf_hash(atoi(p1), atoi(p2));
+
+ } else {
+
+ printf("Unknown command %s\n", cmd);
+
+ }
+ }
+
+ hashmap_free(&map, 1);
+ return 0;
+}
--
2.14.4