diff --git a/SOURCES/git-cve-2018-11235.patch b/SOURCES/git-cve-2018-11235.patch new file mode 100644 index 0000000..e14d703 --- /dev/null +++ b/SOURCES/git-cve-2018-11235.patch @@ -0,0 +1,4983 @@ +From d5b68e9bb5d3bee62f98579022fe5e92fa5f60f0 Mon Sep 17 00:00:00 2001 +From: Pavel Cahyna +Date: Mon, 18 Jun 2018 13:58:25 +0200 +Subject: [PATCH] Squashed commit of the following: + +commit a1e311b306db9407e0bf83046dce50ef6a7f74bb +Author: Jeff King +Date: Mon Jan 16 16:24:03 2017 -0500 + + t1450: clean up sub-objects in duplicate-entry test + + This test creates a multi-level set of trees, but its + cleanup routine only removes the top-level tree. After the + test finishes, the inner tree and the blob it points to + remain, making the inner tree dangling. + + A later test ("cleaned up") verifies that we've removed any + cruft and "git fsck" output is clean. This passes only + because of a bug in git-fsck which fails to notice dangling + trees. + + In preparation for fixing the bug, let's teach this earlier + test to clean up after itself correctly. We have to remove + the inner tree (and therefore the blob, too, which becomes + dangling after removing that tree). + + Since the setup code happens inside a subshell, we can't + just set a variable for each object. However, we can stuff + all of the sha1s into the $T output variable, which is not + used for anything except cleanup. + + Signed-off-by: Jeff King + Signed-off-by: Junio C Hamano + +commit e71a6f0c8a80829017629d1ae595f6a887c4e844 +Author: Pavel Cahyna +Date: Fri Jun 15 11:49:09 2018 +0200 + + Adapt t7415-submodule-names.sh to git 1.8.3.1 : we don't have -C + +commit ba4d4fca832bf7c2ec224ada5243e950d8e32406 +Author: Jeff King +Date: Fri May 4 20:03:35 2018 -0400 + + fsck: complain when .gitmodules is a symlink + + commit b7b1fca175f1ed7933f361028c631b9ac86d868d upstream. + + We've recently forbidden .gitmodules to be a symlink in + verify_path(). 
And it's an easy way to circumvent our fsck + checks for .gitmodules content. So let's complain when we + see it. + + [jn: backported to 2.1.y: + - using error_func instead of report to report fsck errors + - until v2.6.2~7^2 (fsck: exit with non-zero when problems + are found, 2015-09-23), git fsck did not reliably use the + exit status to indicate errors; callers would have to + check stderr instead. Relaxed the test to permit that.] + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit d55fa465ac0c2d825f80e8ad0cbbca877812c0b1 +Author: Jeff King +Date: Fri May 4 19:45:01 2018 -0400 + + index-pack: check .gitmodules files with --strict + + commit 73c3f0f704a91b6792e0199a3f3ab6e3a1971675 upstream. + + Now that the internal fsck code has all of the plumbing we + need, we can start checking incoming .gitmodules files. + Naively, it seems like we would just need to add a call to + fsck_finish() after we've processed all of the objects. And + that would be enough to cover the initial test included + here. But there are two extra bits: + + 1. We currently don't bother calling fsck_object() at all + for blobs, since it has traditionally been a noop. We'd + actually catch these blobs in fsck_finish() at the end, + but it's more efficient to check them when we already + have the object loaded in memory. + + 2. The second pass done by fsck_finish() needs to access + the objects, but we're actually indexing the pack in + this process. In theory we could give the fsck code a + special callback for accessing the in-pack data, but + it's actually quite tricky: + + a. We don't have an internal efficient index mapping + oids to packfile offsets. We only generate it on + the fly as part of writing out the .idx file. + + b. We'd still have to reconstruct deltas, which means + we'd basically have to replicate all of the + reading logic in packfile.c. 
+ + Instead, let's avoid running fsck_finish() until after + we've written out the .idx file, and then just add it + to our internal packed_git list. + + This does mean that the objects are "in the repository" + before we finish our fsck checks. But unpack-objects + already exhibits this same behavior, and it's an + acceptable tradeoff here for the same reason: the + quarantine mechanism means that pushes will be + fully protected. + + In addition to a basic push test in t7415, we add a sneaky + pack that reverses the usual object order in the pack, + requiring that index-pack access the tree and blob during + the "finish" step. + + This already works for unpack-objects (since it will have + written out loose objects), but we'll check it with this + sneaky pack for good measure. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit 098925bcbeaa8aada335cdd48135efa83e3710d8 +Author: Jeff King +Date: Fri May 4 19:40:08 2018 -0400 + + unpack-objects: call fsck_finish() after fscking objects + + commit 6e328d6caef218db320978e3e251009135d87d0e upstream. + + As with the previous commit, we must call fsck's "finish" + function in order to catch any queued objects for + .gitmodules checks. + + This second pass will be able to access any incoming + objects, because we will have exploded them to loose objects + by now. + + This isn't quite ideal, because it means that bad objects + may have been written to the object database (and a + subsequent operation could then reference them, even if the + other side doesn't send the objects again). However, this is + sufficient when used with receive.fsckObjects, since those + loose objects will all be placed in a temporary quarantine + area that will get wiped if we find any problems. 
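The defer-and-finish flow described above can be sketched in a few lines of C. All names here are made up for illustration; this is not git's actual API, just the shape of the two-phase pattern: pass one only queues object ids, and the finish step runs the real checks once every object is readable.

```c
#include <string.h>

/*
 * Hypothetical sketch of the two-phase check: pass one queues object
 * ids whose content cannot be examined yet; fsck_finish() runs the
 * deferred checks once all objects have been written out, and reports
 * how many failed.
 */
#define MAX_QUEUED 16

static char queued[MAX_QUEUED][64]; /* hex object ids */
static int nr_queued;

/* Stand-in for the real .gitmodules content check. */
static int check_object(const char *hex_id)
{
	return strncmp(hex_id, "bad", 3) == 0 ? -1 : 0;
}

/* Pass one: remember the object instead of checking it now. */
static int queue_gitmodules_check(const char *hex_id)
{
	if (nr_queued < MAX_QUEUED)
		strcpy(queued[nr_queued++], hex_id);
	return nr_queued;
}

/* Pass two: every queued object is readable now, so check them all. */
static int fsck_finish(void)
{
	int i, bad = 0;

	for (i = 0; i < nr_queued; i++)
		if (check_object(queued[i]))
			bad++;
	nr_queued = 0;
	return bad;
}
```

With receive.fsckObjects, a non-zero result from the finish step causes the whole quarantined pack to be discarded, which is why writing the objects out before checking them is safe.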
+ + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit 7f1b69637f1744e42e61fd18c7ac4dac75edd6fb +Author: Jeff King +Date: Wed May 2 17:20:35 2018 -0400 + + fsck: call fsck_finish() after fscking objects + + commit 1995b5e03e1cc97116be58cdc0502d4a23547856 upstream. + + Now that the internal fsck code is capable of checking + .gitmodules files, we just need to teach its callers to use + the "finish" function to check any queued objects. + + With this, we can now catch the malicious case in t7415 with + git-fsck. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit e197e085e86c5ebb2eb6f4556f6e82e3bdf8afa6 +Author: Jeff King +Date: Wed May 2 17:25:27 2018 -0400 + + fsck: check .gitmodules content + + commit ed8b10f631c9a71df3351d46187bf7f3fa4f9b7e upstream. + + This patch detects and blocks submodule names which do not + match the policy set forth in submodule-config. These should + already be caught by the submodule code itself, but putting + the check here means that newer versions of Git can protect + older ones from malicious entries (e.g., a server with + receive.fsckObjects will block the objects, protecting + clients which fetch from it). + + As a side effect, this means fsck will also complain about + .gitmodules files that cannot be parsed (or were larger than + core.bigFileThreshold). + + [jn: backported by using git_config_from_buf instead of + git_config_from_mem for parsing and error_func instead of + report for reporting fsck errors] + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit ad298dfb0e89d05db3f17d9a0997f065c7ed96f5 +Author: Jeff King +Date: Wed May 2 17:20:08 2018 -0400 + + fsck: detect gitmodules files + + commit 159e7b080bfa5d34559467cacaa79df89a01afc0 upstream. + + In preparation for performing fsck checks on .gitmodules + files, this commit plumbs in the actual detection of the + files. 
Note that unlike most other fsck checks, this cannot + be a property of a single object: we must know that the + object is found at a ".gitmodules" path at the root tree of + a commit. + + Since the fsck code only sees one object at a time, we have + to mark the related objects to fit the puzzle together. When + we see a commit we mark its tree as a root tree, and when + we see a root tree with a .gitmodules file, we mark the + corresponding blob to be checked. + + In an ideal world, we'd check the objects in topological + order: commits followed by trees followed by blobs. In that + case we can avoid ever loading an object twice, since all + markings would be complete by the time we get to the marked + objects. And indeed, if we are checking a single packfile, + this is the order in which Git will generally write the + objects. But we can't count on that: + + 1. git-fsck may show us the objects in arbitrary order + (loose objects are fed in sha1 order, but we may also + have multiple packs, and we process each pack fully in + sequence). + + 2. The type ordering is just what git-pack-objects happens + to write now. The pack format does not require a + specific order, and it's possible that future versions + of Git (or a custom version trying to fool official + Git's fsck checks!) may order it differently. + + 3. We may not even be fscking all of the relevant objects + at once. Consider pushing with transfer.fsckObjects, + where one push adds a blob at path "foo", and then a + second push adds the same blob at path ".gitmodules". + The blob is not part of the second push at all, but we + need to mark and check it. + + So in the general case, we need to make up to three passes + over the objects: once to make sure we've seen all commits, + then once to cover any trees we might have missed, and then + a final pass to cover any .gitmodules blobs we found in the + second pass. 
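The mark-and-queue idea behind these passes can be illustrated with flag bits. This is a deliberately simplified sketch with a single tree and blob standing in for the real per-object bookkeeping (git tracks this with a hashmap keyed by object id):

```c
/*
 * Simplified sketch of the marking scheme: seeing a commit marks its
 * tree as a root tree; seeing a marked tree that carries a
 * ".gitmodules" entry marks the blob for the final content pass.
 */
#define MARK_ROOT_TREE   (1u << 0)
#define MARK_GITMODULES  (1u << 1)

static unsigned tree_flags, blob_flags;

/* Seeing a commit: its tree becomes a candidate root tree. */
static int seen_commit(void)
{
	tree_flags |= MARK_ROOT_TREE;
	return 1;
}

/*
 * Seeing a tree with a ".gitmodules" entry: queue the blob only if
 * the tree was previously marked as a root tree.  Returns 1 if the
 * blob was queued for the final pass.
 */
static int seen_tree_with_gitmodules(void)
{
	if (!(tree_flags & MARK_ROOT_TREE))
		return 0;
	blob_flags |= MARK_GITMODULES;
	return 1;
}
```

If the tree arrives before the commit, the blob is not queued on the first encounter, which is exactly why the tree has to be revisited in a later pass.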
+ + We can simplify things a bit by loosening the requirement + that we find .gitmodules only at root trees. Technically + a file like "subdir/.gitmodules" is not parsed by Git, but + it's not unreasonable for us to declare that Git is aware of + all ".gitmodules" files and make them eligible for checking. + That lets us drop the root-tree requirement, which + eliminates one pass entirely. And it makes our worst case + much better: instead of potentially queueing every root tree + to be re-examined, the worst case is that we queue each + unique .gitmodules blob for a second look. + + This patch just adds the boilerplate to find .gitmodules + files. The actual content checks will come in a subsequent + commit. + + [jn: backported to 2.1.y: + - using error_func instead of report to report fsck errors + - using sha1s instead of struct object_id + - using "struct hashmap" directly since "struct oidset" isn't + available] + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit f8c9e1358806fc649c48f0a80ea69c0c2f7c9ef2 +Author: Karsten Blees +Date: Thu Jul 3 00:22:11 2014 +0200 + + hashmap: add simplified hashmap_get_from_hash() API + + Hashmap entries are typically looked up by just a key. The hashmap_get() + API expects an initialized entry structure instead, to support compound + keys. This flexibility is currently only needed by find_dir_entry() in + name-hash.c (and compat/win32/fscache.c in the msysgit fork). 
All other + (currently five) call sites of hashmap_get() have to set up a near empty + entry structure, resulting in duplicate code like this: + + struct hashmap_entry keyentry; + hashmap_entry_init(&keyentry, hash(key)); + return hashmap_get(map, &keyentry, key); + + Add a hashmap_get_from_hash() API that allows hashmap lookups by just + specifying the key and its hash code, i.e.: + + return hashmap_get_from_hash(map, hash(key), key); + + Signed-off-by: Karsten Blees + Signed-off-by: Junio C Hamano + +commit 4c8e18c35a61140fa4cae20c66b8783a677c58cc +Author: Karsten Blees +Date: Thu Jul 3 00:20:20 2014 +0200 + + hashmap: factor out getting a hash code from a SHA1 + + Copying the first bytes of a SHA1 is duplicated in six places, + however, the implications (the actual value would depend on the + endianness of the platform) are documented only once. + + Add a properly documented API for this. + + [Dropping non-hashmap.[ch] portions of this patch, as a prereq for + 159e7b080bfa5d34559467cacaa79df89a01afc0 "fsck: detect gitmodules + files" which uses the hashmap implementation. --sbeattie] + + Signed-off-by: Karsten Blees + Signed-off-by: Junio C Hamano + +commit b3f33dc3b1da719027c946b748ad959ea03cb605 +Author: Karsten Blees +Date: Wed Dec 18 14:41:27 2013 +0100 + + hashmap.h: use 'unsigned int' for hash-codes everywhere + + Signed-off-by: Karsten Blees + Signed-off-by: Junio C Hamano + +commit 53b42170beb29fab2378db8fca1455a825329eef +Author: Karsten Blees +Date: Thu Nov 14 20:17:54 2013 +0100 + + add a hashtable implementation that supports O(1) removal + + The existing hashtable implementation (in hash.[ch]) uses open addressing + (i.e. resolve hash collisions by distributing entries across the table). + Thus, removal is difficult to implement with less than O(n) complexity. + Resolving collisions of entries with identical hashes (e.g. via chaining) + is left to the client code.
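The difference chaining makes for removal can be seen in a toy sketch (illustrative names only, not the hashmap.[ch] implementation): with a singly-linked chain per bucket, unlinking an entry touches only its own chain, with no tombstones or table-wide scans.

```c
#include <stddef.h>

/*
 * Toy sketch of collision handling by chaining: each bucket holds a
 * singly-linked list of entries, so removal is a simple in-place
 * unlink within one short chain.
 */
struct entry {
	unsigned int hash;
	int key;
	struct entry *next;
};

#define NBUCKETS 8
static struct entry *buckets[NBUCKETS];

static void chain_add(struct entry *e)
{
	struct entry **b = &buckets[e->hash % NBUCKETS];
	e->next = *b;
	*b = e;
}

static int chain_remove(unsigned int hash, int key)
{
	struct entry **p = &buckets[hash % NBUCKETS];

	for (; *p; p = &(*p)->next) {
		if ((*p)->hash == hash && (*p)->key == key) {
			*p = (*p)->next; /* unlink in place */
			return 1;
		}
	}
	return 0;
}

/* Demo: add two colliding entries, remove one, check the other survives. */
static int chain_demo(void)
{
	static struct entry a = { 5, 1, NULL }, b = { 5, 2, NULL };

	chain_add(&a);
	chain_add(&b);
	if (!chain_remove(5, 1))
		return -1;
	if (chain_remove(5, 1))
		return -2; /* already gone */
	return chain_remove(5, 2) ? 0 : -3;
}
```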
+ + Add a hashtable implementation that supports O(1) removal and is slightly + easier to use due to builtin entry chaining. + + Supports all basic operations init, free, get, add, remove and iteration. + + Also includes ready-to-use hash functions based on the public domain FNV-1 + algorithm (http://www.isthe.com/chongo/tech/comp/fnv). + + The per-entry data structure (hashmap_entry) is piggybacked in front of + the client's data structure to save memory. See test-hashmap.c for usage + examples. + + The hashtable is resized by a factor of four when 80% full. With these + settings, average memory consumption is about 2/3 of hash.[ch], and + insertion is about twice as fast due to less frequent resizing. + + Lookups are also slightly faster, because entries are strictly confined to + their bucket (i.e. no data of other buckets needs to be traversed). + + Signed-off-by: Karsten Blees + Signed-off-by: Junio C Hamano + +commit 726f4913b8040c3b92a70f1c84e53c1f9bce3b8e +Author: Jeff King +Date: Wed May 2 15:44:51 2018 -0400 + + fsck: actually fsck blob data + + commit 7ac4f3a007e2567f9d2492806186aa063f9a08d6 upstream. + + Because fscking a blob has always been a noop, we didn't + bother passing around the blob data. In preparation for + content-level checks, let's fix up a few things: + + 1. The fsck_object() function just returns success for any + blob. Let's add a noop fsck_blob(), which we can fill in + with actual logic later. + + 2. The fsck_loose() function in builtin/fsck.c + just threw away blob content after loading it. Let's + hold onto it until after we've called fsck_object(). + + The easiest way to do this is to just drop the + parse_loose_object() helper entirely. Incidentally, + this also fixes a memory leak: if we successfully + loaded the object data but did not parse it, we would + have left the function without freeing it. + + 3. When fsck_loose() loads the object data, it + does so with a custom read_loose_object() helper.
This + function streams any blobs, regardless of size, under + the assumption that we're only checking the sha1. + + Instead, let's actually load blobs smaller than + big_file_threshold, as the normal object-reading + code-paths would do. This lets us fsck small files, and + a NULL return is an indication that the blob was so big + that it needed to be streamed, and we can pass that + information along to fsck_blob(). + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit e2ba4c21e6effb57fb8b123fb7afec8d5de69959 +Author: Johannes Schindelin +Date: Wed Sep 10 15:52:51 2014 +0200 + + fsck_object(): allow passing object data separately from the object itself + + commits 90a398bbd72477d5d228818db5665fdfcf13431b and + 4d0d89755e82c40df88cf94d84031978f8eac827 upstream. + + When fsck'ing an incoming pack, we need to fsck objects that cannot be + read via read_sha1_file() because they are not local yet (and might even + be rejected if transfer.fsckobjects is set to 'true'). + + For commits, there is a hack in place: we basically cache commit + objects' buffers anyway, but the same is not true, say, for tag objects. + + By refactoring fsck_object() to take the object buffer and size as + optional arguments -- optional, because we still fall back to the + previous method to look at the cached commit objects if the caller + passes NULL -- we prepare the machinery for the upcoming handling of tag + objects. + + The assumption that such buffers are inherently NUL terminated is now + wrong, so make sure that there is at least an empty line in the buffer. + That way, our checks would fail if the empty line was encountered + prematurely, and consequently we can get away with the current string + comparisons even when non-NUL-terminated buffers are passed to + fsck_object().
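The non-NUL-terminated-buffer concern is easiest to see in a bounded scanner. The helper below is illustrative only (not git's code): every lookup is limited by the explicit size, and the blank line separating header from body gives the parser a guaranteed stopping point.

```c
#include <string.h>
#include <stddef.h>

/*
 * Scan an object buffer that is NOT guaranteed to be NUL-terminated:
 * every step is bounded by the explicit size instead of relying on
 * strchr().  Returns the offset of the blank line separating header
 * from body, or -1 if none is found before the buffer ends.
 */
static long find_blank_line(const char *buf, size_t size)
{
	const char *p = buf, *end = buf + size;

	while (p < end) {
		const char *nl = memchr(p, '\n', end - p);
		if (!nl)
			return -1; /* ran out of data mid-line */
		if (nl == p)
			return p - buf; /* empty line found */
		p = nl + 1;
	}
	return -1;
}
```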
+ + Signed-off-by: Johannes Schindelin + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit cfaa863b28379e5fb3f1acb871357a5ec85d4844 +Author: Jeff King +Date: Wed May 2 16:37:09 2018 -0400 + + index-pack: make fsck error message more specific + + commit db5a58c1bda5b20169b9958af1e8b05ddd178b01 upstream. + + If fsck reports an error, we say only "Error in object". + This isn't quite as bad as it might seem, since the fsck + code would have dumped some errors to stderr already. But it + might help to give a little more context. The earlier output + would not have even mentioned "fsck", and that may be a clue + that the "fsck.*" or "*.fsckObjects" config may be relevant. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit a828c426b337f0e519bc80acad5b29616458333e +Author: Jeff King +Date: Fri Jan 13 12:59:44 2017 -0500 + + fsck: parse loose object paths directly + + commit c68b489e56431cf27f7719913ab09ddc62f95912 upstream. + + When we iterate over the list of loose objects to check, we + get the actual path of each object. But we then throw it + away and pass just the sha1 to fsck_sha1(), which will do a + fresh lookup. Usually it would find the same object, but it + may not if an object exists both as a loose and a packed + object. We may end up checking the packed object twice, and + never look at the loose one. + + In practice this isn't too terrible, because if fsck doesn't + complain, it means you have at least one good copy. But + since the point of fsck is to look for corruption, we should + be thorough. + + The new read_loose_object() interface can help us get the + data from disk, and then we replace parse_object() with + parse_object_buffer(). As a bonus, our error messages now + mention the path to a corrupted object, which should make it + easier to track down errors when they do happen. 
+ + [jn: backported by passing path through the call chain to + fsck_loose, since until v2.7.0-rc0~68^2~4 (fsck: use + for_each_loose_file_in_objdir, 2015-09-24) it is not + available there] + + Signed-off-by: Jeff King + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit 902125f4a8ed87ed9d9df3dd6af963e828bf4f79 +Author: Jeff King +Date: Thu Sep 24 17:08:28 2015 -0400 + + fsck: drop inode-sorting code + + commit 144e4cf7092ee8cff44e9c7600aaa7515ad6a78f upstream. + + Fsck tries to access loose objects in order of inode number, + with the hope that this would make cold cache access faster + on a spinning disk. This dates back to 7e8c174 (fsck-cache: + sort entries by inode number, 2005-05-02), which predates + the invention of packfiles. + + These days, there's not much point in trying to optimize + cold cache for a large number of loose objects. You are much + better off to simply pack the objects, which will reduce the + disk footprint _and_ provide better locality of data access. + + So while you can certainly construct pathological cases + where this code might help, it is not worth the trouble + anymore. + + [backport to 1.9.x: include t1450-fsck.sh changes from commit + 2e770fe47ef9c0b20bc687e37f3eb50f1bf919d0 as the change propagates + returning failure for git fsck. --sbeattie] + + Signed-off-by: Jeff King + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit e2bf202e0f46789ebec25ee8584455ac4b9e8ee6 +Author: Jeff King +Date: Fri Jan 13 12:58:16 2017 -0500 + + sha1_file: add read_loose_object() function + + commit f6371f9210418f1beabc85b097e2a3470aeeb54d upstream. + + It's surprisingly hard to ask the sha1_file code to open a + _specific_ incarnation of a loose object. Most of the + functions take a sha1, and loop over the various object + types (packed versus loose) and locations (local versus + alternates) at a low level. + + However, some tools like fsck need to look at a specific + file.
This patch gives them a function they can use to open + the loose object at a given path. + + The implementation unfortunately ends up repeating bits of + related functions, but there's not a good way around it + without some major refactoring of the whole sha1_file stack. + We need to mmap the specific file, then partially read the + zlib stream to know whether we're streaming or not, and then + finally either stream it or copy the data to a buffer. + + We can do that by assembling some of the more arcane + internal sha1_file functions, but we end up having to + essentially reimplement unpack_sha1_file(), along with the + streaming bits of check_sha1_signature(). + + Still, most of the ugliness is contained in the new + function, and the interface is clean enough that it may be + reusable (though it seems unlikely anything but git-fsck + would care about opening a specific file). + + Signed-off-by: Jeff King + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit 15508d17089bef577fe32e4ae031f24bf6274f0a +Author: Jeff King +Date: Fri May 4 20:03:35 2018 -0400 + + verify_path: disallow symlinks in .gitmodules + + commit 10ecfa76491e4923988337b2e2243b05376b40de upstream. + + There are a few reasons it's not a good idea to make + .gitmodules a symlink, including: + + 1. It won't be portable to systems without symlinks. + + 2. It may behave inconsistently, since Git may look at + this file in the index or a tree without bothering to + resolve any symbolic links. We don't do this _yet_, but + the config infrastructure is there and it's planned for + the future. + + With some clever code, we could make (2) work. And some + people may not care about (1) if they only work on one + platform. But there are a few security reasons to simply + disallow it: + + a. A symlinked .gitmodules file may circumvent any fsck + checks of the content. + + b. Git may read and write from the on-disk file without + sanity checking the symlink target. 
So for example, if + you link ".gitmodules" to "../oops" and run "git + submodule add", we'll write to the file "oops" outside + the repository. + + Again, both of those are problems that _could_ be solved + with sufficient code, but given the complications in (1) and + (2), we're better off just outlawing it explicitly. + + Note the slightly tricky call to verify_path() in + update-index's update_one(). There we may not have a mode if + we're not updating from the filesystem (e.g., we might just + be removing the file). Passing "0" as the mode there works + fine; since it's not a symlink, we'll just skip the extra + checks. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit 38b09a0a75948ecda0ae5dd1161498f7d09a957f +Author: Jeff King +Date: Mon May 14 11:00:56 2018 -0400 + + update-index: stat updated files earlier + + commit eb12dd0c764d2b71bebd5ffffb7379a3835253ae upstream. + + In the update_one(), we check verify_path() on the proposed + path before doing anything else. In preparation for having + verify_path() look at the file mode, let's stat the file + earlier, so we can check the mode accurately. + + This is made a bit trickier by the fact that this function + only does an lstat in a few code paths (the ones that flow + down through process_path()). So we can speculatively do the + lstat() here and pass the results down, and just use a dummy + mode for cases where we won't actually be updating the index + from the filesystem. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit 5d0dff51f2ffd04b64532ac391fcc6d3193f4a9c +Author: Jeff King +Date: Tue May 15 09:56:50 2018 -0400 + + verify_dotfile: mention case-insensitivity in comment + + commit 641084b618ddbe099f0992161988c3e479ae848b upstream. + + We're more restrictive than we need to be in matching ".GIT" + on case-sensitive filesystems; let's make a note that this + is intentional. 
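The intentionally case-insensitive match can be sketched as follows. This is an illustrative stand-alone version, not read-cache.c's verify_dotfile() verbatim:

```c
#include <ctype.h>

/*
 * Does a path component spell ".git", ignoring case?  The
 * case-insensitivity is deliberate: it also protects repositories on
 * case-insensitive filesystems, where ".GIT" names the same directory.
 */
static int is_dot_git(const char *component)
{
	static const char git[] = "git";
	int i;

	if (component[0] != '.')
		return 0;
	for (i = 0; i < 3; i++)
		if (tolower((unsigned char)component[1 + i]) != git[i])
			return 0;
	/* must be exactly ".git", not e.g. ".gitmodules" */
	return component[4] == '\0';
}
```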
+ + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit 887ef4437acc375084de58b342e0e46e29610725 +Author: Jeff King +Date: Sun May 13 13:00:23 2018 -0400 + + verify_path: drop clever fallthrough + + commit e19e5e66d691bdeeeb5e0ed2ffcecdd7666b0d7b upstream. + + We check ".git" and ".." in the same switch statement, and + fall through the cases to share the end-of-component check. + While this saves us a line or two, it makes modifying the + function much harder. Let's just write it out. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit b80dad3b02a1a108af37788f31ee418faed7595c +Author: Jeff King +Date: Sun May 13 12:57:14 2018 -0400 + + skip_prefix: add case-insensitive variant + + commit 41a80924aec0e94309786837b6f954a3b3f19b71 upstream. + + We have the convenient skip_prefix() helper, but if you want + to do case-insensitive matching, you're stuck doing it by + hand. We could add an extra parameter to the function to + let callers ask for this, but the function is small and + somewhat performance-critical. Let's just re-implement it + for the case-insensitive version. + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit 5457d4cbd138a9642115f4449f42ab9317c5f1af +Author: Jeff King +Date: Mon Apr 30 03:25:25 2018 -0400 + + submodule-config: verify submodule names as paths + + commit 0383bbb9015898cbc79abd7b64316484d7713b44 upstream. + + Submodule "names" come from the untrusted .gitmodules file, + but we blindly append them to $GIT_DIR/modules to create our + on-disk repo paths. This means you can do bad things by + putting "../" into the name (among other things). + + Let's sanity-check these names to avoid building a path that + can be exploited. There are two main decisions: + + 1. What should the allowed syntax be? + + It's tempting to reuse verify_path(), since submodule + names typically come from in-repo paths. But there are + two reasons not to: + + a. 
It's technically more strict than what we need, as + we really care only about breaking out of the + $GIT_DIR/modules/ hierarchy. E.g., having a + submodule named "foo/.git" isn't actually + dangerous, and it's possible that somebody has + manually given such a funny name. + + b. Since we'll eventually use this checking logic in + fsck to prevent downstream repositories, it should + be consistent across platforms. Because + verify_path() relies on is_dir_sep(), it wouldn't + block "foo\..\bar" on a non-Windows machine. + + 2. Where should we enforce it? These days most of the + .gitmodules reads go through submodule-config.c, so + I've put it there in the reading step. That should + cover all of the C code. + + We also construct the name for "git submodule add" + inside the git-submodule.sh script. This is probably + not a big deal for security since the name is coming + from the user anyway, but it would be polite to remind + them if the name they pick is invalid (and we need to + expose the name-checker to the shell anyway for our + test scripts). + + This patch issues a warning when reading .gitmodules + and just ignores the related config entry completely. + This will generally end up producing a sensible error, + as it works the same as a .gitmodules file which is + missing a submodule entry (so "submodule update" will + barf, but "git clone --recurse-submodules" will print + an error but not abort the clone). + + There is one minor oddity, which is that we print the + warning once per malformed config key (since that's how + the config subsystem gives us the entries). So in the + new test, for example, the user would see three + warnings. That's OK, since the intent is that this case + should never come up outside of malicious repositories + (and then it might even benefit the user to see the + message multiple times). + + Credit for finding this vulnerability and the proof of + concept from which the test script was adapted goes to + Etienne Stalmans.
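The name rule chosen above can be approximated in a few lines. This is a simplified stand-in for git's check (which uses a slightly different scan), but it captures the two decisions: reject "..", and split on both separators so the rule is identical on every platform.

```c
#include <string.h>

/*
 * Simplified sketch of the submodule-name rule: a name is rejected if
 * it is empty or if any path component -- splitting on both '/' and
 * '\\' so the rule is the same on every platform -- is "..".
 * Returns 1 if the name is acceptable, 0 otherwise.
 */
static int submodule_name_ok(const char *name)
{
	const char *p = name;

	if (!*name)
		return 0;
	while (*p) {
		size_t len = strcspn(p, "/\\"); /* length of this component */
		if (len == 2 && p[0] == '.' && p[1] == '.')
			return 0;
		p += len;
		if (*p)
			p++; /* skip the separator */
	}
	return 1;
}
```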
+ + [jn: backported to 2.1.y: + - adding a skeletal git submodule--helper command to house + the new check-name subcommand. The full submodule--helper + was not introduced until v2.7.0-rc0~136^2~2 (submodule: + rewrite `module_list` shell function in C, 2015-09-02). + - calling 'git submodule--helper check-name' to validate + submodule names in git-submodule.sh::module_name(). That + shell function was rewritten in C in v2.7.0-rc0~136^2~1 + (submodule: rewrite `module_name` shell function in C, + 2015-09-02). + - propagating the error from module_name in cmd_foreach. + Without that change, the script passed to 'git submodule + foreach' would see an empty $name for submodules with + invalid name. The same bug still exists in v2.17.1. + - ported the checks in C from the submodule-config API + introduced in v2.6.0-rc0~24^2~3 (submodule: implement a + config API for lookup of .gitmodules values, 2015-08-17) + to the older submodule API. + - the original patch expects 'git clone' to succeed in the + test because v2.13.0-rc0~10^2~3 (clone: teach + --recurse-submodules to optionally take a pathspec, + 2017-03-17) makes 'git clone' skip invalid submodules. + Updated the test to pass in older Git versions where the + submodule name check makes 'git clone' fail.] + + Signed-off-by: Jeff King + Signed-off-by: Jonathan Nieder + +commit eca4704a95f052e903b18c4fddf378104741fcbc +Author: Junio C Hamano +Date: Thu Jan 29 12:41:22 2015 -0800 + + apply: do not touch a file beyond a symbolic link + + commit e0d201b61601e17e24ed00cc3d16e8e25ca68596 upstream. + + Because Git tracks symbolic links as symbolic links, a path that + has a symbolic link in its leading part (e.g. path/to/dir/file, + where path/to/dir is a symbolic link to somewhere else, be it + inside or outside the working tree) can never appear in a patch + that validly applies, unless the same patch first removes the + symbolic link to allow a directory to be created there. + + Detect and reject such a patch. 
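The leading-path check can be sketched like this (illustrative only: the helper names are made up, and a fixed table stands in for the in-core result that the real code computes from all patches plus the index or working tree):

```c
#include <string.h>
#include <stddef.h>

/* Stand-in for the computed patch result: paths that end up symlinks. */
static const char *symlinks_in_result[] = { "path/to/dir", NULL };

static int is_symlink_in_result(const char *prefix, size_t len)
{
	int i;

	for (i = 0; symlinks_in_result[i]; i++)
		if (strlen(symlinks_in_result[i]) == len &&
		    !strncmp(symlinks_in_result[i], prefix, len))
			return 1;
	return 0;
}

/*
 * Walk every proper leading directory of "path" and ask whether the
 * result of the patch would leave a symbolic link there.
 */
static int symlink_in_leading_path(const char *path)
{
	const char *slash = path;

	while ((slash = strchr(slash, '/')) != NULL) {
		if (is_symlink_in_result(path, slash - path))
			return 1;
		slash++;
	}
	return 0;
}
```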
+ + Things to note: + + - Unfortunately, we cannot reuse the has_symlink_leading_path() + from dir.c, as that is only about the working tree, but "git + apply" can be told to apply the patch only to the index or to + both the index and to the working tree. + + - We cannot directly use has_symlink_leading_path() even when we + are applying only to the working tree, as an early patch of a + valid input may remove a symbolic link path/to/dir and then a + later patch of the input may create a path path/to/dir/file, but + "git apply" first checks the input without touching either the + index or the working tree. The leading symbolic link check must + be done on the interim result we compute in-core (i.e. after the + first patch, there is no path/to/dir symbolic link and it is + perfectly valid to create path/to/dir/file). + + Similarly, when an input creates a symbolic link path/to/dir and + then creates a file path/to/dir/file, we need to flag it as an + error without actually creating path/to/dir symbolic link in the + filesystem. + + Instead, for any patch in the input that leaves a path (i.e. a non + deletion) in the result, we check all leading paths against the + resulting tree that the patch would create by inspecting all the + patches in the input and then the target of patch application + (either the index or the working tree). + + This way, we catch a mischief or a mistake to add a symbolic link + path/to/dir and a file path/to/dir/file at the same time, while + allowing a valid patch that removes a symbolic link path/to/dir and + then adds a file path/to/dir/file. + + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit 1b375180858c0fa786fe579a713df1f12728f1dc +Author: Junio C Hamano +Date: Fri Jan 30 15:34:13 2015 -0800 + + apply: do not read from beyond a symbolic link + + commit fdc2c3a926c21e24986677abd02c8bc568a5de32 upstream. 
+ + We should reject a patch, whether it renames/copies dir/file to + elsewhere with or without modification, or updates dir/file in + place, if the "dir/" part is actually a symbolic link to elsewhere, + by making sure that the code to read the preimage does not read + from a path that is beyond a symbolic link. + + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit 6e9f81cf425af97da742914abf877f2dbc0a650f +Author: Junio C Hamano +Date: Fri Jan 30 15:15:59 2015 -0800 + + apply: do not read from the filesystem under --index + + commit 3c37a2e339e695c7cc41048fe0921cbc8b48b0f0 upstream. + + We currently read the preimage to apply a patch from the index only + when the --cached option is given. Do so also when the command is + running under the --index option. With --index, the index entry and + the working tree file for a path that is involved in a patch must be + identical, so this should not affect the result, but by reading from + the index, we will get the protection to avoid reading an unintended + path beyond a symbolic link automatically. + + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit 714bced2aa3ccc8b762f79c08443b246d2440402 +Author: Junio C Hamano +Date: Thu Jan 29 15:35:24 2015 -0800 + + apply: reject input that touches outside the working area + + commit c536c0755f6450b7bcce499cfda171f8c6d1e593 upstream. + + By default, a patch that affects outside the working area (either a + Git controlled working tree, or the current working directory when + "git apply" is used as a replacement of GNU patch) is rejected as a + mistake (or a mischief). Git itself does not create such a patch, + unless the user bends over backwards and specifies a non-standard + prefix to "git diff" and friends. + + When `git apply` is used as a "better GNU patch", the user can pass + the `--unsafe-paths` option to override this safety check. This + option has no effect when `--index` or `--cached` is in use.
+ + The new test was stolen from Jeff King with slight enhancements. + Note that a few new tests for touching outside the working area by + following a symbolic link are still expected to fail at this step, + but will be fixed in later steps. + + Signed-off-by: Junio C Hamano + Signed-off-by: Jonathan Nieder + +commit be3b8a3011560ead1a87da813b6e25ece3cf94b6 +Author: Jeff King +Date: Mon Aug 26 17:57:18 2013 -0400 + + config: do not use C function names as struct members + + According to C99, section 7.1.4: + + Any function declared in a header may be additionally + implemented as a function-like macro defined in the + header. + + Therefore calling our struct member function pointer "fgetc" + may run afoul of unwanted macro expansion when we call: + + char c = cf->fgetc(cf); + + This turned out to be a problem on uclibc, which defines + fgetc as a macro and causes compilation failure. + + The standard suggests fixing this in a few ways: + + 1. Using extra parentheses to inhibit the function-like + macro expansion. E.g., "(cf->fgetc)(cf)". This is + undesirable as it's ugly, and each call site needs to + remember to use it (and on systems without the macro, + forgetting will compile just fine). + + 2. Using #undef (because a conforming implementation must + also be providing fgetc as a function). This is + undesirable because presumably the implementation was + using the macro for a performance benefit, and we are + dropping that optimization. + + Instead, we can simply use non-colliding names. + + Signed-off-by: Jeff King + Signed-off-by: Junio C Hamano + +commit eb937a70aaded50b0f2d194adb2917881bda129b +Author: Heiko Voigt +Date: Fri Jul 12 00:48:30 2013 +0200 + + do not die when error in config parsing of buf occurs + + If a config parsing error in a file occurs we can die and let the user + fix the issue. This is different for the buf parsing function since it + can be used to parse blobs of .gitmodules files. 
If a parsing error + occurs here we should proceed since otherwise a database containing such + an error in a single revision could be rendered unusable. + + Signed-off-by: Heiko Voigt + Acked-by: Jeff King + Signed-off-by: Junio C Hamano + +commit 476b7567d81fffb6d5f84d26acfa3e44df67c4e1 +Author: Heiko Voigt +Date: Fri Jul 12 00:46:47 2013 +0200 + + teach config --blob option to parse config from database + + This can be used to read configuration values directly from git's + database. For example it is useful for reading to be checked out + .gitmodules files directly from the database. + + Signed-off-by: Heiko Voigt + Acked-by: Jeff King + Signed-off-by: Junio C Hamano + +commit db9d2efbf64725c4891db20a571623c846033d70 +Author: Heiko Voigt +Date: Fri Jul 12 00:44:39 2013 +0200 + + config: make parsing stack struct independent from actual data source + + To simplify adding other sources we extract all functions needed for + parsing into a list of callbacks. We implement those callbacks for the + current file parsing. A new source can implement its own set of callbacks. + + Instead of storing the concrete FILE pointer for parsing we store a void + pointer. A new source can use this to store its custom data. + + Signed-off-by: Heiko Voigt + Acked-by: Jeff King + Signed-off-by: Junio C Hamano + +commit 345dcdb95bc8e96ded9a0d0da7c9777de7bb9290 +Author: Heiko Voigt +Date: Sat May 11 15:19:29 2013 +0200 + + config: drop cf validity check in get_next_char() + + The global variable cf is set with an initialized value in all codepaths before + calling this function. + + The complete call graph looks like this: + + git_config_from_file + -> do_config_from + -> git_parse_file + -> get_next_char + -> get_value + -> get_next_char + -> parse_value + -> get_next_char + -> get_base_var + -> get_next_char + -> get_extended_base_var + -> get_next_char + + The variable is initialized in do_config_from. 
+ + Signed-off-by: Heiko Voigt + Acked-by: Jeff King + Signed-off-by: Junio C Hamano + +commit 30bacff11067cd4d42e4d57ebe691c8f8eb1e570 +Author: Heiko Voigt +Date: Sat May 11 15:18:52 2013 +0200 + + config: factor out config file stack management + + Because a config callback may start parsing a new file, the + global context regarding the current config file is stored + as a stack. Currently we only need to manage that stack from + git_config_from_file. Let's factor it out to allow new + sources of config data. + + Signed-off-by: Heiko Voigt + Acked-by: Jeff King + Signed-off-by: Junio C Hamano +--- + .gitignore | 1 + + Documentation/git-apply.txt | 12 +- + Documentation/git-config.txt | 7 + + Documentation/technical/api-hashmap.txt | 249 ++++++++++++++++++++++++ + Makefile | 4 + + builtin.h | 1 + + builtin/apply.c | 142 +++++++++++++- + builtin/config.c | 31 ++- + builtin/fsck.c | 165 +++++++--------- + builtin/index-pack.c | 13 +- + builtin/submodule--helper.c | 35 ++++ + builtin/unpack-objects.c | 21 +- + builtin/update-index.c | 31 +-- + cache.h | 21 +- + config.c | 217 ++++++++++++++++----- + fsck.c | 190 +++++++++++++++++- + fsck.h | 11 +- + git-compat-util.h | 17 ++ + git-submodule.sh | 21 +- + git.c | 1 + + hashmap.c | 228 ++++++++++++++++++++++ + hashmap.h | 90 +++++++++ + read-cache.c | 30 ++- + sha1_file.c | 132 ++++++++++++- + submodule.c | 29 +++ + submodule.h | 7 + + t/lib-pack.sh | 110 +++++++++++ + t/t0011-hashmap.sh | 240 +++++++++++++++++++++++ + t/t1307-config-blob.sh | 70 +++++++ + t/t1450-fsck.sh | 48 ++++- + t/t4122-apply-symlink-inside.sh | 106 ++++++++++ + t/t4139-apply-escape.sh | 141 ++++++++++++++ + t/t7415-submodule-names.sh | 154 +++++++++++++++ + test-hashmap.c | 335 ++++++++++++++++++++++++++++++++ + 34 files changed, 2716 insertions(+), 194 deletions(-) + create mode 100644 Documentation/technical/api-hashmap.txt + create mode 100644 builtin/submodule--helper.c + create mode 100644 hashmap.c + create mode 100644 hashmap.h + create 
mode 100644 t/lib-pack.sh + create mode 100755 t/t0011-hashmap.sh + create mode 100755 t/t1307-config-blob.sh + create mode 100755 t/t4139-apply-escape.sh + create mode 100755 t/t7415-submodule-names.sh + create mode 100644 test-hashmap.c + +diff --git a/.gitignore b/.gitignore +index 6669bf0..92b0483 100644 +--- a/.gitignore ++++ b/.gitignore +@@ -154,6 +154,7 @@ + /git-status + /git-stripspace + /git-submodule ++/git-submodule--helper + /git-svn + /git-symbolic-ref + /git-tag +diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt +index f605327..9489664 100644 +--- a/Documentation/git-apply.txt ++++ b/Documentation/git-apply.txt +@@ -16,7 +16,7 @@ SYNOPSIS + [--ignore-space-change | --ignore-whitespace ] + [--whitespace=(nowarn|warn|fix|error|error-all)] + [--exclude=] [--include=] [--directory=] +- [--verbose] [...] ++ [--verbose] [--unsafe-paths] [...] + + DESCRIPTION + ----------- +@@ -229,6 +229,16 @@ For example, a patch that talks about updating `a/git-gui.sh` to `b/git-gui.sh` + can be applied to the file in the working tree `modules/git-gui/git-gui.sh` by + running `git apply --directory=modules/git-gui`. + ++--unsafe-paths:: ++ By default, a patch that affects outside the working area ++ (either a Git controlled working tree, or the current working ++ directory when "git apply" is used as a replacement of GNU ++ patch) is rejected as a mistake (or a mischief). +++ ++When `git apply` is used as a "better GNU patch", the user can pass ++the `--unsafe-paths` option to override this safety check. This option ++has no effect when `--index` or `--cached` is in use. ++ + Configuration + ------------- + +diff --git a/Documentation/git-config.txt b/Documentation/git-config.txt +index d88a6fc..3a4ed10 100644 +--- a/Documentation/git-config.txt ++++ b/Documentation/git-config.txt +@@ -118,6 +118,13 @@ See also <>. + --file config-file:: + Use the given config file instead of the one specified by GIT_CONFIG. 
+ ++--blob blob:: ++ Similar to '--file' but use the given blob instead of a file. E.g. ++ you can use 'master:.gitmodules' to read values from the file ++ '.gitmodules' in the master branch. See "SPECIFYING REVISIONS" ++ section in linkgit:gitrevisions[7] for a more complete list of ++ ways to spell blob names. ++ + --remove-section:: + Remove the given section from the configuration file. + +diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt +new file mode 100644 +index 0000000..0249b50 +--- /dev/null ++++ b/Documentation/technical/api-hashmap.txt +@@ -0,0 +1,249 @@ ++hashmap API ++=========== ++ ++The hashmap API is a generic implementation of hash-based key-value mappings. ++ ++Data Structures ++--------------- ++ ++`struct hashmap`:: ++ ++ The hash table structure. +++ ++The `size` member keeps track of the total number of entries. The `cmpfn` ++member is a function used to compare two entries for equality. The `table` and ++`tablesize` members store the hash table and its size, respectively. ++ ++`struct hashmap_entry`:: ++ ++ An opaque structure representing an entry in the hash table, which must ++ be used as first member of user data structures. Ideally it should be ++ followed by an int-sized member to prevent unused memory on 64-bit ++ systems due to alignment. +++ ++The `hash` member is the entry's hash code and the `next` member points to the ++next entry in case of collisions (i.e. if multiple entries map to the same ++bucket). ++ ++`struct hashmap_iter`:: ++ ++ An iterator structure, to be used with hashmap_iter_* functions. ++ ++Types ++----- ++ ++`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`:: ++ ++ User-supplied function to test two hashmap entries for equality. Shall ++ return 0 if the entries are equal. +++ ++This function is always called with non-NULL `entry` / `entry_or_key` ++parameters that have the same hash code. 
When looking up an entry, the `key` ++and `keydata` parameters to hashmap_get and hashmap_remove are always passed ++as second and third argument, respectively. Otherwise, `keydata` is NULL. ++ ++Functions ++--------- ++ ++`unsigned int strhash(const char *buf)`:: ++`unsigned int strihash(const char *buf)`:: ++`unsigned int memhash(const void *buf, size_t len)`:: ++`unsigned int memihash(const void *buf, size_t len)`:: ++ ++ Ready-to-use hash functions for strings, using the FNV-1 algorithm (see ++ http://www.isthe.com/chongo/tech/comp/fnv). +++ ++`strhash` and `strihash` take 0-terminated strings, while `memhash` and ++`memihash` operate on arbitrary-length memory. +++ ++`strihash` and `memihash` are case insensitive versions. ++ ++`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`:: ++ ++ Initializes a hashmap structure. +++ ++`map` is the hashmap to initialize. +++ ++The `equals_function` can be specified to compare two entries for equality. ++If NULL, entries are considered equal if their hash codes are equal. +++ ++If the total number of entries is known in advance, the `initial_size` ++parameter may be used to preallocate a sufficiently large table and thus ++prevent expensive resizing. If 0, the table is dynamically resized. ++ ++`void hashmap_free(struct hashmap *map, int free_entries)`:: ++ ++ Frees a hashmap structure and allocated memory. +++ ++`map` is the hashmap to free. +++ ++If `free_entries` is true, each hashmap_entry in the map is freed as well ++(using stdlib's free()). ++ ++`void hashmap_entry_init(void *entry, int hash)`:: ++ ++ Initializes a hashmap_entry structure. +++ ++`entry` points to the entry to initialize. +++ ++`hash` is the hash code of the entry. ++ ++`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`:: ++ ++ Returns the hashmap entry for the specified key, or NULL if not found. +++ ++`map` is the hashmap structure. 
+++ ++`key` is a hashmap_entry structure (or user data structure that starts with ++hashmap_entry) that has at least been initialized with the proper hash code ++(via `hashmap_entry_init`). +++ ++If an entry with matching hash code is found, `key` and `keydata` are passed ++to `hashmap_cmp_fn` to decide whether the entry matches the key. ++ ++`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`:: ++ ++ Returns the hashmap entry for the specified hash code and key data, ++ or NULL if not found. +++ ++`map` is the hashmap structure. +++ ++`hash` is the hash code of the entry to look up. +++ ++If an entry with matching hash code is found, `keydata` is passed to ++`hashmap_cmp_fn` to decide whether the entry matches the key. The ++`entry_or_key` parameter points to a bogus hashmap_entry structure that ++should not be used in the comparison. ++ ++`void *hashmap_get_next(const struct hashmap *map, const void *entry)`:: ++ ++ Returns the next equal hashmap entry, or NULL if not found. This can be ++ used to iterate over duplicate entries (see `hashmap_add`). +++ ++`map` is the hashmap structure. +++ ++`entry` is the hashmap_entry to start the search from, obtained via a previous ++call to `hashmap_get` or `hashmap_get_next`. ++ ++`void hashmap_add(struct hashmap *map, void *entry)`:: ++ ++ Adds a hashmap entry. This allows to add duplicate entries (i.e. ++ separate values with the same key according to hashmap_cmp_fn). +++ ++`map` is the hashmap structure. +++ ++`entry` is the entry to add. ++ ++`void *hashmap_put(struct hashmap *map, void *entry)`:: ++ ++ Adds or replaces a hashmap entry. If the hashmap contains duplicate ++ entries equal to the specified entry, only one of them will be replaced. +++ ++`map` is the hashmap structure. +++ ++`entry` is the entry to add or replace. +++ ++Returns the replaced entry, or NULL if not found (i.e. the entry was added). 
++ ++`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`:: ++ ++ Removes a hashmap entry matching the specified key. If the hashmap ++ contains duplicate entries equal to the specified key, only one of ++ them will be removed. +++ ++`map` is the hashmap structure. +++ ++`key` is a hashmap_entry structure (or user data structure that starts with ++hashmap_entry) that has at least been initialized with the proper hash code ++(via `hashmap_entry_init`). +++ ++If an entry with matching hash code is found, `key` and `keydata` are ++passed to `hashmap_cmp_fn` to decide whether the entry matches the key. +++ ++Returns the removed entry, or NULL if not found. ++ ++`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`:: ++`void *hashmap_iter_next(struct hashmap_iter *iter)`:: ++`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`:: ++ ++ Used to iterate over all entries of a hashmap. +++ ++`hashmap_iter_init` initializes a `hashmap_iter` structure. +++ ++`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no ++more entries. +++ ++`hashmap_iter_first` is a combination of both (i.e. initializes the iterator ++and returns the first entry, if any). ++ ++Usage example ++------------- ++ ++Here's a simple usage example that maps long keys to double values. ++[source,c] ++------------ ++struct hashmap map; ++ ++struct long2double { ++ struct hashmap_entry ent; /* must be the first member! 
*/ ++ long key; ++ double value; ++}; ++ ++static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused) ++{ ++ return !(e1->key == e2->key); ++} ++ ++void long2double_init(void) ++{ ++ hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0); ++} ++ ++void long2double_free(void) ++{ ++ hashmap_free(&map, 1); ++} ++ ++static struct long2double *find_entry(long key) ++{ ++ struct long2double k; ++ hashmap_entry_init(&k, memhash(&key, sizeof(long))); ++ k.key = key; ++ return hashmap_get(&map, &k, NULL); ++} ++ ++double get_value(long key) ++{ ++ struct long2double *e = find_entry(key); ++ return e ? e->value : 0; ++} ++ ++void set_value(long key, double value) ++{ ++ struct long2double *e = find_entry(key); ++ if (!e) { ++ e = malloc(sizeof(struct long2double)); ++ hashmap_entry_init(e, memhash(&key, sizeof(long))); ++ e->key = key; ++ hashmap_add(&map, e); ++ } ++ e->value = value; ++} ++------------ ++ ++Using variable-sized keys ++------------------------- ++ ++The `hashmap_get` and `hashmap_remove` functions expect an ordinary ++`hashmap_entry` structure as key to find the correct entry. If the key data is ++variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable ++to create a full-fledged entry structure on the heap and copy all the key data ++into the structure. ++ ++In this case, the `keydata` parameter can be used to pass ++variable-sized key data directly to the comparison function, and the `key` ++parameter can be a stripped-down, fixed size entry structure allocated on the ++stack. ++ ++See test-hashmap.c for an example using arbitrary-length strings as keys.
+diff --git a/Makefile b/Makefile +index 0f931a2..daefb2f 100644 +--- a/Makefile ++++ b/Makefile +@@ -551,6 +551,7 @@ TEST_PROGRAMS_NEED_X += test-date + TEST_PROGRAMS_NEED_X += test-delta + TEST_PROGRAMS_NEED_X += test-dump-cache-tree + TEST_PROGRAMS_NEED_X += test-genrandom ++TEST_PROGRAMS_NEED_X += test-hashmap + TEST_PROGRAMS_NEED_X += test-index-version + TEST_PROGRAMS_NEED_X += test-line-buffer + TEST_PROGRAMS_NEED_X += test-match-trees +@@ -669,6 +670,7 @@ LIB_H += gpg-interface.h + LIB_H += graph.h + LIB_H += grep.h + LIB_H += hash.h ++LIB_H += hashmap.h + LIB_H += help.h + LIB_H += http.h + LIB_H += kwset.h +@@ -796,6 +798,7 @@ LIB_OBJS += gpg-interface.o + LIB_OBJS += graph.o + LIB_OBJS += grep.o + LIB_OBJS += hash.o ++LIB_OBJS += hashmap.o + LIB_OBJS += help.o + LIB_OBJS += hex.o + LIB_OBJS += ident.o +@@ -964,6 +967,7 @@ BUILTIN_OBJS += builtin/shortlog.o + BUILTIN_OBJS += builtin/show-branch.o + BUILTIN_OBJS += builtin/show-ref.o + BUILTIN_OBJS += builtin/stripspace.o ++BUILTIN_OBJS += builtin/submodule--helper.o + BUILTIN_OBJS += builtin/symbolic-ref.o + BUILTIN_OBJS += builtin/tag.o + BUILTIN_OBJS += builtin/tar-tree.o +diff --git a/builtin.h b/builtin.h +index faef559..c8330d9 100644 +--- a/builtin.h ++++ b/builtin.h +@@ -127,6 +127,7 @@ extern int cmd_show(int argc, const char **argv, const char *prefix); + extern int cmd_show_branch(int argc, const char **argv, const char *prefix); + extern int cmd_status(int argc, const char **argv, const char *prefix); + extern int cmd_stripspace(int argc, const char **argv, const char *prefix); ++extern int cmd_submodule__helper(int argc, const char **argv, const char *prefix); + extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix); + extern int cmd_tag(int argc, const char **argv, const char *prefix); + extern int cmd_tar_tree(int argc, const char **argv, const char *prefix); +diff --git a/builtin/apply.c b/builtin/apply.c +index 30eefc3..48e900d 100644 +--- a/builtin/apply.c ++++ 
b/builtin/apply.c +@@ -50,6 +50,7 @@ static int apply_verbosely; + static int allow_overlap; + static int no_add; + static int threeway; ++static int unsafe_paths; + static const char *fake_ancestor; + static int line_termination = '\n'; + static unsigned int p_context = UINT_MAX; +@@ -3135,7 +3136,7 @@ static int load_patch_target(struct strbuf *buf, + const char *name, + unsigned expected_mode) + { +- if (cached) { ++ if (cached || check_index) { + if (read_file_or_gitlink(ce, buf)) + return error(_("read of %s failed"), name); + } else if (name) { +@@ -3144,6 +3145,8 @@ static int load_patch_target(struct strbuf *buf, + return read_file_or_gitlink(ce, buf); + else + return SUBMODULE_PATCH_WITHOUT_INDEX; ++ } else if (has_symlink_leading_path(name, strlen(name))) { ++ return error(_("reading from '%s' beyond a symbolic link"), name); + } else { + if (read_old_data(st, name, buf)) + return error(_("read of %s failed"), name); +@@ -3482,6 +3485,121 @@ static int check_to_create(const char *new_name, int ok_if_exists) + return 0; + } + ++/* ++ * We need to keep track of how symlinks in the preimage are ++ * manipulated by the patches. A patch to add a/b/c where a/b ++ * is a symlink should not be allowed to affect the directory ++ * the symlink points at, but if the same patch removes a/b, ++ * it is perfectly fine, as the patch removes a/b to make room ++ * to create a directory a/b so that a/b/c can be created. 
++ */ ++static struct string_list symlink_changes; ++#define SYMLINK_GOES_AWAY 01 ++#define SYMLINK_IN_RESULT 02 ++ ++static uintptr_t register_symlink_changes(const char *path, uintptr_t what) ++{ ++ struct string_list_item *ent; ++ ++ ent = string_list_lookup(&symlink_changes, path); ++ if (!ent) { ++ ent = string_list_insert(&symlink_changes, path); ++ ent->util = (void *)0; ++ } ++ ent->util = (void *)(what | ((uintptr_t)ent->util)); ++ return (uintptr_t)ent->util; ++} ++ ++static uintptr_t check_symlink_changes(const char *path) ++{ ++ struct string_list_item *ent; ++ ++ ent = string_list_lookup(&symlink_changes, path); ++ if (!ent) ++ return 0; ++ return (uintptr_t)ent->util; ++} ++ ++static void prepare_symlink_changes(struct patch *patch) ++{ ++ for ( ; patch; patch = patch->next) { ++ if ((patch->old_name && S_ISLNK(patch->old_mode)) && ++ (patch->is_rename || patch->is_delete)) ++ /* the symlink at patch->old_name is removed */ ++ register_symlink_changes(patch->old_name, SYMLINK_GOES_AWAY); ++ ++ if (patch->new_name && S_ISLNK(patch->new_mode)) ++ /* the symlink at patch->new_name is created or remains */ ++ register_symlink_changes(patch->new_name, SYMLINK_IN_RESULT); ++ } ++} ++ ++static int path_is_beyond_symlink_1(struct strbuf *name) ++{ ++ do { ++ unsigned int change; ++ ++ while (--name->len && name->buf[name->len] != '/') ++ ; /* scan backwards */ ++ if (!name->len) ++ break; ++ name->buf[name->len] = '\0'; ++ change = check_symlink_changes(name->buf); ++ if (change & SYMLINK_IN_RESULT) ++ return 1; ++ if (change & SYMLINK_GOES_AWAY) ++ /* ++ * This cannot be "return 0", because we may ++ * see a new one created at a higher level. 
++ */ ++ continue; ++ ++ /* otherwise, check the preimage */ ++ if (check_index) { ++ struct cache_entry *ce; ++ ++ ce = cache_name_exists(name->buf, name->len, ignore_case); ++ if (ce && S_ISLNK(ce->ce_mode)) ++ return 1; ++ } else { ++ struct stat st; ++ if (!lstat(name->buf, &st) && S_ISLNK(st.st_mode)) ++ return 1; ++ } ++ } while (1); ++ return 0; ++} ++ ++static int path_is_beyond_symlink(const char *name_) ++{ ++ int ret; ++ struct strbuf name = STRBUF_INIT; ++ ++ assert(*name_ != '\0'); ++ strbuf_addstr(&name, name_); ++ ret = path_is_beyond_symlink_1(&name); ++ strbuf_release(&name); ++ ++ return ret; ++} ++ ++static void die_on_unsafe_path(struct patch *patch) ++{ ++ const char *old_name = NULL; ++ const char *new_name = NULL; ++ if (patch->is_delete) ++ old_name = patch->old_name; ++ else if (!patch->is_new && !patch->is_copy) ++ old_name = patch->old_name; ++ if (!patch->is_delete) ++ new_name = patch->new_name; ++ ++ if (old_name && !verify_path(old_name, patch->old_mode)) ++ die(_("invalid path '%s'"), old_name); ++ if (new_name && !verify_path(new_name, patch->new_mode)) ++ die(_("invalid path '%s'"), new_name); ++} ++ + /* + * Check and apply the patch in-core; leave the result in patch->result + * for the caller to write it out to the final destination. +@@ -3569,6 +3687,22 @@ static int check_patch(struct patch *patch) + } + } + ++ if (!unsafe_paths) ++ die_on_unsafe_path(patch); ++ ++ /* ++ * An attempt to read from or delete a path that is beyond a ++ * symbolic link will be prevented by load_patch_target() that ++ * is called at the beginning of apply_data() so we do not ++ * have to worry about a patch marked with "is_delete" bit ++ * here. We however need to make sure that the patch result ++ * is not deposited to a path that is beyond a symbolic link ++ * here. 
++ */ ++ if (!patch->is_delete && path_is_beyond_symlink(patch->new_name)) ++ return error(_("affected file '%s' is beyond a symbolic link"), ++ patch->new_name); ++ + if (apply_data(patch, &st, ce) < 0) + return error(_("%s: patch does not apply"), name); + patch->rejected = 0; +@@ -3579,6 +3713,7 @@ static int check_patch_list(struct patch *patch) + { + int err = 0; + ++ prepare_symlink_changes(patch); + prepare_fn_table(patch); + while (patch) { + if (apply_verbosely) +@@ -4378,6 +4513,8 @@ int cmd_apply(int argc, const char **argv, const char *prefix_) + N_("make sure the patch is applicable to the current index")), + OPT_BOOLEAN(0, "cached", &cached, + N_("apply a patch without touching the working tree")), ++ OPT_BOOL(0, "unsafe-paths", &unsafe_paths, ++ N_("accept a patch that touches outside the working area")), + OPT_BOOLEAN(0, "apply", &force_apply, + N_("also apply the patch (use with --stat/--summary/--check)")), + OPT_BOOL('3', "3way", &threeway, +@@ -4450,6 +4587,9 @@ int cmd_apply(int argc, const char **argv, const char *prefix_) + die(_("--cached outside a repository")); + check_index = 1; + } ++ if (check_index) ++ unsafe_paths = 0; ++ + for (i = 0; i < argc; i++) { + const char *arg = argv[i]; + int fd; +diff --git a/builtin/config.c b/builtin/config.c +index 19ffcaf..000d27c 100644 +--- a/builtin/config.c ++++ b/builtin/config.c +@@ -21,6 +21,7 @@ static char term = '\n'; + + static int use_global_config, use_system_config, use_local_config; + static const char *given_config_file; ++static const char *given_config_blob; + static int actions, types; + static const char *get_color_slot, *get_colorbool_slot; + static int end_null; +@@ -53,6 +54,7 @@ static struct option builtin_config_options[] = { + OPT_BOOLEAN(0, "system", &use_system_config, N_("use system config file")), + OPT_BOOLEAN(0, "local", &use_local_config, N_("use repository config file")), + OPT_STRING('f', "file", &given_config_file, N_("file"), N_("use given config file")), ++ 
OPT_STRING(0, "blob", &given_config_blob, N_("blob-id"), N_("read config from given blob object")), + OPT_GROUP(N_("Action")), + OPT_BIT(0, "get", &actions, N_("get value: name [value-regex]"), ACTION_GET), + OPT_BIT(0, "get-all", &actions, N_("get all values: key [value-regex]"), ACTION_GET_ALL), +@@ -218,7 +220,8 @@ static int get_value(const char *key_, const char *regex_) + } + + git_config_with_options(collect_config, &values, +- given_config_file, respect_includes); ++ given_config_file, given_config_blob, ++ respect_includes); + + ret = !values.nr; + +@@ -302,7 +305,8 @@ static void get_color(const char *def_color) + get_color_found = 0; + parsed_color[0] = '\0'; + git_config_with_options(git_get_color_config, NULL, +- given_config_file, respect_includes); ++ given_config_file, given_config_blob, ++ respect_includes); + + if (!get_color_found && def_color) + color_parse(def_color, "command line", parsed_color); +@@ -330,7 +334,8 @@ static int get_colorbool(int print) + get_colorbool_found = -1; + get_diff_color_found = -1; + git_config_with_options(git_get_colorbool_config, NULL, +- given_config_file, respect_includes); ++ given_config_file, given_config_blob, ++ respect_includes); + + if (get_colorbool_found < 0) { + if (!strcmp(get_colorbool_slot, "color.diff")) +@@ -348,6 +353,12 @@ static int get_colorbool(int print) + return get_colorbool_found ? 
0 : 1; + } + ++static void check_blob_write(void) ++{ ++ if (given_config_blob) ++ die("writing config blobs is not supported"); ++} ++ + int cmd_config(int argc, const char **argv, const char *prefix) + { + int nongit = !startup_info->have_repository; +@@ -359,7 +370,8 @@ int cmd_config(int argc, const char **argv, const char *prefix) + builtin_config_usage, + PARSE_OPT_STOP_AT_NON_OPTION); + +- if (use_global_config + use_system_config + use_local_config + !!given_config_file > 1) { ++ if (use_global_config + use_system_config + use_local_config + ++ !!given_config_file + !!given_config_blob > 1) { + error("only one config file at a time."); + usage_with_options(builtin_config_usage, builtin_config_options); + } +@@ -438,6 +450,7 @@ int cmd_config(int argc, const char **argv, const char *prefix) + check_argc(argc, 0, 0); + if (git_config_with_options(show_all_config, NULL, + given_config_file, ++ given_config_blob, + respect_includes) < 0) { + if (given_config_file) + die_errno("unable to read config file '%s'", +@@ -450,6 +463,8 @@ int cmd_config(int argc, const char **argv, const char *prefix) + check_argc(argc, 0, 0); + if (!given_config_file && nongit) + die("not in a git directory"); ++ if (given_config_blob) ++ die("editing blobs is not supported"); + git_config(git_default_config, NULL); + launch_editor(given_config_file ? 
+ given_config_file : git_path("config"), +@@ -457,6 +472,7 @@ int cmd_config(int argc, const char **argv, const char *prefix) + } + else if (actions == ACTION_SET) { + int ret; ++ check_blob_write(); + check_argc(argc, 2, 2); + value = normalize_value(argv[0], argv[1]); + ret = git_config_set_in_file(given_config_file, argv[0], value); +@@ -466,18 +482,21 @@ int cmd_config(int argc, const char **argv, const char *prefix) + return ret; + } + else if (actions == ACTION_SET_ALL) { ++ check_blob_write(); + check_argc(argc, 2, 3); + value = normalize_value(argv[0], argv[1]); + return git_config_set_multivar_in_file(given_config_file, + argv[0], value, argv[2], 0); + } + else if (actions == ACTION_ADD) { ++ check_blob_write(); + check_argc(argc, 2, 2); + value = normalize_value(argv[0], argv[1]); + return git_config_set_multivar_in_file(given_config_file, + argv[0], value, "^$", 0); + } + else if (actions == ACTION_REPLACE_ALL) { ++ check_blob_write(); + check_argc(argc, 2, 3); + value = normalize_value(argv[0], argv[1]); + return git_config_set_multivar_in_file(given_config_file, +@@ -500,6 +519,7 @@ int cmd_config(int argc, const char **argv, const char *prefix) + return get_value(argv[0], argv[1]); + } + else if (actions == ACTION_UNSET) { ++ check_blob_write(); + check_argc(argc, 1, 2); + if (argc == 2) + return git_config_set_multivar_in_file(given_config_file, +@@ -509,12 +529,14 @@ int cmd_config(int argc, const char **argv, const char *prefix) + argv[0], NULL); + } + else if (actions == ACTION_UNSET_ALL) { ++ check_blob_write(); + check_argc(argc, 1, 2); + return git_config_set_multivar_in_file(given_config_file, + argv[0], NULL, argv[1], 1); + } + else if (actions == ACTION_RENAME_SECTION) { + int ret; ++ check_blob_write(); + check_argc(argc, 2, 2); + ret = git_config_rename_section_in_file(given_config_file, + argv[0], argv[1]); +@@ -525,6 +547,7 @@ int cmd_config(int argc, const char **argv, const char *prefix) + } + else if (actions == 
ACTION_REMOVE_SECTION) { + int ret; ++ check_blob_write(); + check_argc(argc, 1, 1); + ret = git_config_rename_section_in_file(given_config_file, + argv[0], NULL); +diff --git a/builtin/fsck.c b/builtin/fsck.c +index bb9a2cd..b59f956 100644 +--- a/builtin/fsck.c ++++ b/builtin/fsck.c +@@ -35,14 +35,6 @@ static int show_dangling = 1; + #define ERROR_REACHABLE 02 + #define ERROR_PACK 04 + +-#ifdef NO_D_INO_IN_DIRENT +-#define SORT_DIRENT 0 +-#define DIRENT_SORT_HINT(de) 0 +-#else +-#define SORT_DIRENT 1 +-#define DIRENT_SORT_HINT(de) ((de)->d_ino) +-#endif +- + static void objreport(struct object *obj, const char *severity, + const char *err, va_list params) + { +@@ -288,7 +280,7 @@ static void check_connectivity(void) + } + } + +-static int fsck_obj(struct object *obj) ++static int fsck_obj(struct object *obj, void *buffer, unsigned long size) + { + if (obj->flags & SEEN) + return 0; +@@ -300,7 +292,7 @@ static int fsck_obj(struct object *obj) + + if (fsck_walk(obj, mark_used, NULL)) + objerror(obj, "broken links"); +- if (fsck_object(obj, check_strict, fsck_error_func)) ++ if (fsck_object(obj, buffer, size, check_strict, fsck_error_func)) + return -1; + + if (obj->type == OBJ_TREE) { +@@ -332,17 +324,6 @@ static int fsck_obj(struct object *obj) + return 0; + } + +-static int fsck_sha1(const unsigned char *sha1) +-{ +- struct object *obj = parse_object(sha1); +- if (!obj) { +- errors_found |= ERROR_OBJECT; +- return error("%s: object corrupt or missing", +- sha1_to_hex(sha1)); +- } +- return fsck_obj(obj); +-} +- + static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type, + unsigned long size, void *buffer, int *eaten) + { +@@ -352,86 +333,69 @@ static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type, + errors_found |= ERROR_OBJECT; + return error("%s: object corrupt or missing", sha1_to_hex(sha1)); + } +- return fsck_obj(obj); ++ return fsck_obj(obj, buffer, size); + } + +-/* +- * This is the sorting chunk size: make it 
reasonably +- * big so that we can sort well.. +- */ +-#define MAX_SHA1_ENTRIES (1024) +- +-struct sha1_entry { +- unsigned long ino; +- unsigned char sha1[20]; +-}; +- +-static struct { +- unsigned long nr; +- struct sha1_entry *entry[MAX_SHA1_ENTRIES]; +-} sha1_list; +- +-static int ino_compare(const void *_a, const void *_b) ++static inline int is_loose_object_file(struct dirent *de, ++ char *name, unsigned char *sha1) + { +- const struct sha1_entry *a = _a, *b = _b; +- unsigned long ino1 = a->ino, ino2 = b->ino; +- return ino1 < ino2 ? -1 : ino1 > ino2 ? 1 : 0; ++ if (strlen(de->d_name) != 38) ++ return 0; ++ memcpy(name + 2, de->d_name, 39); ++ return !get_sha1_hex(name, sha1); + } + +-static void fsck_sha1_list(void) ++static void fsck_loose(const unsigned char *sha1, const char *path) + { +- int i, nr = sha1_list.nr; +- +- if (SORT_DIRENT) +- qsort(sha1_list.entry, nr, +- sizeof(struct sha1_entry *), ino_compare); +- for (i = 0; i < nr; i++) { +- struct sha1_entry *entry = sha1_list.entry[i]; +- unsigned char *sha1 = entry->sha1; +- +- sha1_list.entry[i] = NULL; +- fsck_sha1(sha1); +- free(entry); ++ struct object *obj; ++ enum object_type type; ++ unsigned long size; ++ void *contents; ++ int eaten; ++ ++ if (read_loose_object(path, sha1, &type, &size, &contents) < 0) { ++ errors_found |= ERROR_OBJECT; ++ error("%s: object corrupt or missing: %s", ++ sha1_to_hex(sha1), path); ++ return; /* keep checking other objects */ + } +- sha1_list.nr = 0; +-} + +-static void add_sha1_list(unsigned char *sha1, unsigned long ino) +-{ +- struct sha1_entry *entry = xmalloc(sizeof(*entry)); +- int nr; +- +- entry->ino = ino; +- hashcpy(entry->sha1, sha1); +- nr = sha1_list.nr; +- if (nr == MAX_SHA1_ENTRIES) { +- fsck_sha1_list(); +- nr = 0; ++ if (!contents && type != OBJ_BLOB) ++ die("BUG: read_loose_object streamed a non-blob"); ++ ++ obj = parse_object_buffer(sha1, type, size, contents, &eaten); ++ ++ if (!obj) { ++ errors_found |= ERROR_OBJECT; ++ error("%s: object 
could not be parsed: %s", ++ sha1_to_hex(sha1), path); ++ if (!eaten) ++ free(contents); ++ return; /* keep checking other objects */ + } +- sha1_list.entry[nr] = entry; +- sha1_list.nr = ++nr; +-} + +-static inline int is_loose_object_file(struct dirent *de, +- char *name, unsigned char *sha1) +-{ +- if (strlen(de->d_name) != 38) +- return 0; +- memcpy(name + 2, de->d_name, 39); +- return !get_sha1_hex(name, sha1); ++ if (fsck_obj(obj, contents, size)) ++ errors_found |= ERROR_OBJECT; ++ ++ if (!eaten) ++ free(contents); + } + +-static void fsck_dir(int i, char *path) ++static void fsck_dir(int i, struct strbuf *path) + { +- DIR *dir = opendir(path); ++ DIR *dir = opendir(path->buf); + struct dirent *de; + char name[100]; ++ size_t dirlen; + + if (!dir) + return; + + if (verbose) +- fprintf(stderr, "Checking directory %s\n", path); ++ fprintf(stderr, "Checking directory %s\n", path->buf); ++ ++ strbuf_addch(path, '/'); ++ dirlen = path->len; + + sprintf(name, "%02x", i); + while ((de = readdir(dir)) != NULL) { +@@ -439,15 +403,20 @@ static void fsck_dir(int i, char *path) + + if (is_dot_or_dotdot(de->d_name)) + continue; ++ ++ strbuf_setlen(path, dirlen); ++ strbuf_addstr(path, de->d_name); ++ + if (is_loose_object_file(de, name, sha1)) { +- add_sha1_list(sha1, DIRENT_SORT_HINT(de)); ++ fsck_loose(sha1, path->buf); + continue; + } + if (!prefixcmp(de->d_name, "tmp_obj_")) + continue; +- fprintf(stderr, "bad sha1 file: %s/%s\n", path, de->d_name); ++ fprintf(stderr, "bad sha1 file: %s\n", path->buf); + } + closedir(dir); ++ strbuf_setlen(path, dirlen-1); + } + + static int default_refs; +@@ -533,24 +502,28 @@ static void get_default_heads(void) + } + } + +-static void fsck_object_dir(const char *path) ++static void fsck_object_dir(struct strbuf *path) + { + int i; + struct progress *progress = NULL; ++ size_t dirlen; + + if (verbose) + fprintf(stderr, "Checking object directory\n"); + ++ strbuf_addch(path, '/'); ++ dirlen = path->len; ++ + if (show_progress) + 
progress = start_progress("Checking object directories", 256); + for (i = 0; i < 256; i++) { +- static char dir[4096]; +- sprintf(dir, "%s/%02x", path, i); +- fsck_dir(i, dir); ++ strbuf_setlen(path, dirlen); ++ strbuf_addf(path, "%02x", i); ++ fsck_dir(i, path); + display_progress(progress, i+1); + } + stop_progress(&progress); +- fsck_sha1_list(); ++ strbuf_setlen(path, dirlen - 1); + } + + static int fsck_head_link(void) +@@ -629,6 +602,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) + { + int i, heads; + struct alternate_object_database *alt; ++ struct strbuf dir = STRBUF_INIT; + + errors_found = 0; + read_replace_refs = 0; +@@ -646,15 +620,14 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) + } + + fsck_head_link(); +- fsck_object_dir(get_object_directory()); ++ strbuf_addstr(&dir, get_object_directory()); ++ fsck_object_dir(&dir); + + prepare_alt_odb(); + for (alt = alt_odb_list; alt; alt = alt->next) { +- char namebuf[PATH_MAX]; +- int namelen = alt->name - alt->base; +- memcpy(namebuf, alt->base, namelen); +- namebuf[namelen - 1] = 0; +- fsck_object_dir(namebuf); ++ strbuf_reset(&dir); ++ strbuf_add(&dir, alt->base, alt->name - alt->base - 1); ++ fsck_object_dir(&dir); + } + + if (check_full) { +@@ -681,6 +654,9 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) + count += p->num_objects; + } + stop_progress(&progress); ++ ++ if (fsck_finish(fsck_error_func)) ++ errors_found |= ERROR_OBJECT; + } + + heads = 0; +@@ -734,5 +710,6 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) + } + + check_connectivity(); ++ strbuf_release(&dir); + return errors_found; + } +diff --git a/builtin/index-pack.c b/builtin/index-pack.c +index 79dfe47..3cc2bf6 100644 +--- a/builtin/index-pack.c ++++ b/builtin/index-pack.c +@@ -742,6 +742,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry, + blob->object.flags |= FLAG_CHECKED; + else + die(_("invalid blob object %s"), sha1_to_hex(sha1)); ++ 
if (fsck_object(&blob->object, (void *)data, size, 1, fsck_error_function)) ++ die(_("fsck error in packed object")); + } else { + struct object *obj; + int eaten; +@@ -757,8 +759,9 @@ static void sha1_object(const void *data, struct object_entry *obj_entry, + obj = parse_object_buffer(sha1, type, size, buf, &eaten); + if (!obj) + die(_("invalid %s"), typename(type)); +- if (fsck_object(obj, 1, fsck_error_function)) +- die(_("Error in object")); ++ if (fsck_object(obj, buf, size, 1, ++ fsck_error_function)) ++ die(_("fsck error in packed object")); + if (fsck_walk(obj, mark_link, NULL)) + die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1)); + +@@ -1320,6 +1323,8 @@ static void final(const char *final_pack_name, const char *curr_pack_name, + } else + chmod(final_index_name, 0444); + ++ add_packed_git(final_index_name, strlen(final_index_name), 0); ++ + if (!from_stdin) { + printf("%s\n", sha1_to_hex(sha1)); + } else { +@@ -1643,6 +1648,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) + pack_sha1); + else + close(input_fd); ++ ++ if (fsck_finish(fsck_error_function)) ++ die(_("fsck error in pack objects")); ++ + free(objects); + free(index_name_buf); + free(keep_name_buf); +diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c +new file mode 100644 +index 0000000..cc79d05 +--- /dev/null ++++ b/builtin/submodule--helper.c +@@ -0,0 +1,35 @@ ++#include "builtin.h" ++#include "submodule.h" ++#include "strbuf.h" ++ ++/* ++ * Exit non-zero if any of the submodule names given on the command line is ++ * invalid. If no names are given, filter stdin to print only valid names ++ * (which is primarily intended for testing). 
++ */ ++static int check_name(int argc, const char **argv, const char *prefix) ++{ ++ if (argc > 1) { ++ while (*++argv) { ++ if (check_submodule_name(*argv) < 0) ++ return 1; ++ } ++ } else { ++ struct strbuf buf = STRBUF_INIT; ++ while (strbuf_getline(&buf, stdin, '\n') != EOF) { ++ if (!check_submodule_name(buf.buf)) ++ printf("%s\n", buf.buf); ++ } ++ strbuf_release(&buf); ++ } ++ return 0; ++} ++ ++int cmd_submodule__helper(int argc, const char **argv, const char *prefix) ++{ ++ if (argc < 2) ++ usage("git submodule--helper <command>"); ++ if (!strcmp(argv[1], "check-name")) ++ return check_name(argc - 1, argv + 1, prefix); ++ die(_("'%s' is not a valid submodule--helper subcommand"), argv[1]); ++} +diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c +index 2217d7b..f5a44ba 100644 +--- a/builtin/unpack-objects.c ++++ b/builtin/unpack-objects.c +@@ -164,10 +164,10 @@ static unsigned nr_objects; + * Called only from check_object() after it verified this object + * is Ok. + */ +-static void write_cached_object(struct object *obj) ++static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf) + { + unsigned char sha1[20]; +- struct obj_buffer *obj_buf = lookup_object_buffer(obj); ++ + if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0) + die("failed to write object %s", sha1_to_hex(obj->sha1)); + obj->flags |= FLAG_WRITTEN; +@@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj) + */ + static int check_object(struct object *obj, int type, void *data) + { ++ struct obj_buffer *obj_buf; ++ + if (!obj) + return 1; + +@@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data) + return 0; + } + +- if (fsck_object(obj, 1, fsck_error_function)) +- die("Error in object"); ++ obj_buf = lookup_object_buffer(obj); ++ if (!obj_buf) ++ die("Whoops! 
Cannot find object '%s'", sha1_to_hex(obj->sha1)); ++ if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1, ++ fsck_error_function)) ++ die("fsck error in packed object"); + if (fsck_walk(obj, check_object, NULL)) + die("Error on reachable objects of %s", sha1_to_hex(obj->sha1)); +- write_cached_object(obj); ++ write_cached_object(obj, obj_buf); + return 0; + } + +@@ -548,8 +554,11 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix) + unpack_all(); + git_SHA1_Update(&ctx, buffer, offset); + git_SHA1_Final(sha1, &ctx); +- if (strict) ++ if (strict) { + write_rest(); ++ if (fsck_finish(fsck_error_function)) ++ die(_("fsck error in pack objects")); ++ } + if (hashcmp(fill(20), sha1)) + die("final sha1 did not match"); + use(20); +diff --git a/builtin/update-index.c b/builtin/update-index.c +index 5c7762e..bd3715e 100644 +--- a/builtin/update-index.c ++++ b/builtin/update-index.c +@@ -179,10 +179,9 @@ static int process_directory(const char *path, int len, struct stat *st) + return error("%s: is a directory - add files inside instead", path); + } + +-static int process_path(const char *path) ++static int process_path(const char *path, struct stat *st, int stat_errno) + { + int pos, len; +- struct stat st; + struct cache_entry *ce; + + len = strlen(path); +@@ -206,13 +205,13 @@ static int process_path(const char *path) + * First things first: get the stat information, to decide + * what to do about the pathname! 
+ */ +- if (lstat(path, &st) < 0) +- return process_lstat_error(path, errno); ++ if (stat_errno) ++ return process_lstat_error(path, stat_errno); + +- if (S_ISDIR(st.st_mode)) +- return process_directory(path, len, &st); ++ if (S_ISDIR(st->st_mode)) ++ return process_directory(path, len, st); + +- return add_one_path(ce, path, len, &st); ++ return add_one_path(ce, path, len, st); + } + + static int add_cacheinfo(unsigned int mode, const unsigned char *sha1, +@@ -221,7 +220,7 @@ static int add_cacheinfo(unsigned int mode, const unsigned char *sha1, + int size, len, option; + struct cache_entry *ce; + +- if (!verify_path(path)) ++ if (!verify_path(path, mode)) + return error("Invalid path '%s'", path); + + len = strlen(path); +@@ -276,7 +275,17 @@ static void chmod_path(int flip, const char *path) + static void update_one(const char *path, const char *prefix, int prefix_length) + { + const char *p = prefix_path(prefix, prefix_length, path); +- if (!verify_path(p)) { ++ int stat_errno = 0; ++ struct stat st; ++ ++ if (mark_valid_only || mark_skip_worktree_only || force_remove) ++ st.st_mode = 0; ++ else if (lstat(p, &st) < 0) { ++ st.st_mode = 0; ++ stat_errno = errno; ++ } /* else stat is valid */ ++ ++ if (!verify_path(p, st.st_mode)) { + fprintf(stderr, "Ignoring path %s\n", path); + goto free_return; + } +@@ -297,7 +306,7 @@ static void update_one(const char *path, const char *prefix, int prefix_length) + report("remove '%s'", path); + goto free_return; + } +- if (process_path(p)) ++ if (process_path(p, &st, stat_errno)) + die("Unable to process path %s", path); + report("add '%s'", path); + free_return: +@@ -367,7 +376,7 @@ static void read_index_info(int line_termination) + path_name = uq.buf; + } + +- if (!verify_path(path_name)) { ++ if (!verify_path(path_name, mode)) { + fprintf(stderr, "Ignoring path %s\n", path_name); + continue; + } +diff --git a/cache.h b/cache.h +index 2ab9ffd..e0dc079 100644 +--- a/cache.h ++++ b/cache.h +@@ -448,7 +448,7 @@ extern int 
read_index_unmerged(struct index_state *); + extern int write_index(struct index_state *, int newfd); + extern int discard_index(struct index_state *); + extern int unmerged_index(const struct index_state *); +-extern int verify_path(const char *path); ++extern int verify_path(const char *path, unsigned mode); + extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase); + extern int index_name_pos(const struct index_state *, const char *name, int namelen); + #define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */ +@@ -883,6 +883,19 @@ extern int dwim_log(const char *str, int len, unsigned char *sha1, char **ref); + extern int interpret_branch_name(const char *str, struct strbuf *); + extern int get_sha1_mb(const char *str, unsigned char *sha1); + ++/* ++ * Open the loose object at path, check its sha1, and return the contents, ++ * type, and size. If the object is a blob, then "contents" may return NULL, ++ * to allow streaming of large blobs. ++ * ++ * Returns 0 on success, negative on error (details may be written to stderr). 
++ */ ++int read_loose_object(const char *path, ++ const unsigned char *expected_sha1, ++ enum object_type *type, ++ unsigned long *size, ++ void **contents); ++ + extern int refname_match(const char *abbrev_name, const char *full_name, const char **rules); + extern const char *ref_rev_parse_rules[]; + #define ref_fetch_rules ref_rev_parse_rules +@@ -1150,11 +1163,15 @@ extern int update_server_info(int); + typedef int (*config_fn_t)(const char *, const char *, void *); + extern int git_default_config(const char *, const char *, void *); + extern int git_config_from_file(config_fn_t fn, const char *, void *); ++extern int git_config_from_buf(config_fn_t fn, const char *name, ++ const char *buf, size_t len, void *data); + extern void git_config_push_parameter(const char *text); + extern int git_config_from_parameters(config_fn_t fn, void *data); + extern int git_config(config_fn_t fn, void *); + extern int git_config_with_options(config_fn_t fn, void *, +- const char *filename, int respect_includes); ++ const char *filename, ++ const char *blob_ref, ++ int respect_includes); + extern int git_config_early(config_fn_t fn, void *, const char *repo_config); + extern int git_parse_ulong(const char *, unsigned long *); + extern int git_config_int(const char *, const char *); +diff --git a/config.c b/config.c +index 830ee14..201930f 100644 +--- a/config.c ++++ b/config.c +@@ -10,20 +10,69 @@ + #include "strbuf.h" + #include "quote.h" + +-typedef struct config_file { +- struct config_file *prev; +- FILE *f; ++struct config_source { ++ struct config_source *prev; ++ union { ++ FILE *file; ++ struct config_buf { ++ const char *buf; ++ size_t len; ++ size_t pos; ++ } buf; ++ } u; + const char *name; ++ int die_on_error; + int linenr; + int eof; + struct strbuf value; + struct strbuf var; +-} config_file; + +-static config_file *cf; ++ int (*do_fgetc)(struct config_source *c); ++ int (*do_ungetc)(int c, struct config_source *conf); ++ long (*do_ftell)(struct config_source *c); 
++}; ++ ++static struct config_source *cf; + + static int zlib_compression_seen; + ++static int config_file_fgetc(struct config_source *conf) ++{ ++ return fgetc(conf->u.file); ++} ++ ++static int config_file_ungetc(int c, struct config_source *conf) ++{ ++ return ungetc(c, conf->u.file); ++} ++ ++static long config_file_ftell(struct config_source *conf) ++{ ++ return ftell(conf->u.file); ++} ++ ++ ++static int config_buf_fgetc(struct config_source *conf) ++{ ++ if (conf->u.buf.pos < conf->u.buf.len) ++ return conf->u.buf.buf[conf->u.buf.pos++]; ++ ++ return EOF; ++} ++ ++static int config_buf_ungetc(int c, struct config_source *conf) ++{ ++ if (conf->u.buf.pos > 0) ++ return conf->u.buf.buf[--conf->u.buf.pos]; ++ ++ return EOF; ++} ++ ++static long config_buf_ftell(struct config_source *conf) ++{ ++ return conf->u.buf.pos; ++} ++ + #define MAX_INCLUDE_DEPTH 10 + static const char include_depth_advice[] = + "exceeded maximum include depth (%d) while including\n" +@@ -168,27 +217,22 @@ int git_config_from_parameters(config_fn_t fn, void *data) + + static int get_next_char(void) + { +- int c; +- FILE *f; +- +- c = '\n'; +- if (cf && ((f = cf->f) != NULL)) { +- c = fgetc(f); +- if (c == '\r') { +- /* DOS like systems */ +- c = fgetc(f); +- if (c != '\n') { +- ungetc(c, f); +- c = '\r'; +- } +- } +- if (c == '\n') +- cf->linenr++; +- if (c == EOF) { +- cf->eof = 1; +- c = '\n'; ++ int c = cf->do_fgetc(cf); ++ ++ if (c == '\r') { ++ /* DOS like systems */ ++ c = cf->do_fgetc(cf); ++ if (c != '\n') { ++ cf->do_ungetc(c, cf); ++ c = '\r'; + } + } ++ if (c == '\n') ++ cf->linenr++; ++ if (c == EOF) { ++ cf->eof = 1; ++ c = '\n'; ++ } + return c; + } + +@@ -339,7 +383,7 @@ static int get_base_var(struct strbuf *name) + } + } + +-static int git_parse_file(config_fn_t fn, void *data) ++static int git_parse_source(config_fn_t fn, void *data) + { + int comment = 0; + int baselen = 0; +@@ -399,7 +443,10 @@ static int git_parse_file(config_fn_t fn, void *data) + if (get_value(fn, 
data, var) < 0) + break; + } +- die("bad config file line %d in %s", cf->linenr, cf->name); ++ if (cf->die_on_error) ++ die("bad config file line %d in %s", cf->linenr, cf->name); ++ else ++ return error("bad config file line %d in %s", cf->linenr, cf->name); + } + + static int parse_unit_factor(const char *end, uintmax_t *val) +@@ -896,6 +943,33 @@ int git_default_config(const char *var, const char *value, void *dummy) + return 0; + } + ++/* ++ * All source specific fields in the union, die_on_error, name and the callbacks ++ * fgetc, ungetc, ftell of top need to be initialized before calling ++ * this function. ++ */ ++static int do_config_from(struct config_source *top, config_fn_t fn, void *data) ++{ ++ int ret; ++ ++ /* push config-file parsing state stack */ ++ top->prev = cf; ++ top->linenr = 1; ++ top->eof = 0; ++ strbuf_init(&top->value, 1024); ++ strbuf_init(&top->var, 1024); ++ cf = top; ++ ++ ret = git_parse_source(fn, data); ++ ++ /* pop config-file parsing state stack */ ++ strbuf_release(&top->value); ++ strbuf_release(&top->var); ++ cf = top->prev; ++ ++ return ret; ++} ++ + int git_config_from_file(config_fn_t fn, const char *filename, void *data) + { + int ret; +@@ -903,30 +977,74 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data) + + ret = -1; + if (f) { +- config_file top; ++ struct config_source top; + +- /* push config-file parsing state stack */ +- top.prev = cf; +- top.f = f; ++ top.u.file = f; + top.name = filename; +- top.linenr = 1; +- top.eof = 0; +- strbuf_init(&top.value, 1024); +- strbuf_init(&top.var, 1024); +- cf = &top; +- +- ret = git_parse_file(fn, data); ++ top.die_on_error = 1; ++ top.do_fgetc = config_file_fgetc; ++ top.do_ungetc = config_file_ungetc; ++ top.do_ftell = config_file_ftell; + +- /* pop config-file parsing state stack */ +- strbuf_release(&top.value); +- strbuf_release(&top.var); +- cf = top.prev; ++ ret = do_config_from(&top, fn, data); + + fclose(f); + } + return ret; + } + ++int
git_config_from_buf(config_fn_t fn, const char *name, const char *buf, ++ size_t len, void *data) ++{ ++ struct config_source top; ++ ++ top.u.buf.buf = buf; ++ top.u.buf.len = len; ++ top.u.buf.pos = 0; ++ top.name = name; ++ top.die_on_error = 0; ++ top.do_fgetc = config_buf_fgetc; ++ top.do_ungetc = config_buf_ungetc; ++ top.do_ftell = config_buf_ftell; ++ ++ return do_config_from(&top, fn, data); ++} ++ ++static int git_config_from_blob_sha1(config_fn_t fn, ++ const char *name, ++ const unsigned char *sha1, ++ void *data) ++{ ++ enum object_type type; ++ char *buf; ++ unsigned long size; ++ int ret; ++ ++ buf = read_sha1_file(sha1, &type, &size); ++ if (!buf) ++ return error("unable to load config blob object '%s'", name); ++ if (type != OBJ_BLOB) { ++ free(buf); ++ return error("reference '%s' does not point to a blob", name); ++ } ++ ++ ret = git_config_from_buf(fn, name, buf, size, data); ++ free(buf); ++ ++ return ret; ++} ++ ++static int git_config_from_blob_ref(config_fn_t fn, ++ const char *name, ++ void *data) ++{ ++ unsigned char sha1[20]; ++ ++ if (get_sha1(name, sha1) < 0) ++ return error("unable to resolve config blob '%s'", name); ++ return git_config_from_blob_sha1(fn, name, sha1, data); ++} ++ + const char *git_etc_gitconfig(void) + { + static const char *system_wide; +@@ -992,7 +1110,9 @@ int git_config_early(config_fn_t fn, void *data, const char *repo_config) + } + + int git_config_with_options(config_fn_t fn, void *data, +- const char *filename, int respect_includes) ++ const char *filename, ++ const char *blob_ref, ++ int respect_includes) + { + char *repo_config = NULL; + int ret; +@@ -1011,6 +1131,8 @@ int git_config_with_options(config_fn_t fn, void *data, + */ + if (filename) + return git_config_from_file(fn, filename, data); ++ else if (blob_ref) ++ return git_config_from_blob_ref(fn, blob_ref, data); + + repo_config = git_pathdup("config"); + ret = git_config_early(fn, data, repo_config); +@@ -1021,7 +1143,7 @@ int 
git_config_with_options(config_fn_t fn, void *data, + + int git_config(config_fn_t fn, void *data) + { +- return git_config_with_options(fn, data, NULL, 1); ++ return git_config_with_options(fn, data, NULL, NULL, 1); + } + + /* +@@ -1053,7 +1175,6 @@ static int store_aux(const char *key, const char *value, void *cb) + { + const char *ep; + size_t section_len; +- FILE *f = cf->f; + + switch (store.state) { + case KEY_SEEN: +@@ -1065,7 +1186,7 @@ static int store_aux(const char *key, const char *value, void *cb) + return 1; + } + +- store.offset[store.seen] = ftell(f); ++ store.offset[store.seen] = cf->do_ftell(cf); + store.seen++; + } + break; +@@ -1092,19 +1213,19 @@ static int store_aux(const char *key, const char *value, void *cb) + * Do not increment matches: this is no match, but we + * just made sure we are in the desired section. + */ +- store.offset[store.seen] = ftell(f); ++ store.offset[store.seen] = cf->do_ftell(cf); + /* fallthru */ + case SECTION_END_SEEN: + case START: + if (matches(key, value)) { +- store.offset[store.seen] = ftell(f); ++ store.offset[store.seen] = cf->do_ftell(cf); + store.state = KEY_SEEN; + store.seen++; + } else { + if (strrchr(key, '.') - key == store.baselen && + !strncmp(key, store.key, store.baselen)) { + store.state = SECTION_SEEN; +- store.offset[store.seen] = ftell(f); ++ store.offset[store.seen] = cf->do_ftell(cf); + } + } + } +diff --git a/fsck.c b/fsck.c +index 99c0497..8117241 100644 +--- a/fsck.c ++++ b/fsck.c +@@ -6,6 +6,42 @@ + #include "commit.h" + #include "tag.h" + #include "fsck.h" ++#include "hashmap.h" ++#include "submodule.h" ++ ++struct oidhash_entry { ++ struct hashmap_entry ent; ++ unsigned char sha1[20]; ++}; ++ ++static int oidhash_hashcmp(const void *va, const void *vb, ++ const void *vkey) ++{ ++ const struct oidhash_entry *a = va, *b = vb; ++ const unsigned char *key = vkey; ++ return hashcmp(a->sha1, key ? 
key : b->sha1); ++} ++ ++static struct hashmap gitmodules_found; ++static struct hashmap gitmodules_done; ++ ++static void oidhash_insert(struct hashmap *h, const unsigned char *sha1) ++{ ++ struct oidhash_entry *e; ++ ++ if (!h->tablesize) ++ hashmap_init(h, oidhash_hashcmp, 0); ++ e = xmalloc(sizeof(*e)); ++ hashmap_entry_init(&e->ent, sha1hash(sha1)); ++ hashcpy(e->sha1, sha1); ++ hashmap_add(h, e); ++} ++ ++static int oidhash_contains(struct hashmap *h, const unsigned char *sha1) ++{ ++ return h->tablesize && ++ !!hashmap_get_from_hash(h, sha1hash(sha1), sha1); ++} + + static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data) + { +@@ -178,6 +214,16 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func) + if (!strcmp(name, ".git")) + has_dotgit = 1; + has_zero_pad |= *(char *)desc.buffer == '0'; ++ ++ if (!strcmp(name, ".gitmodules")) { ++ if (!S_ISLNK(mode)) ++ oidhash_insert(&gitmodules_found, sha1); ++ else ++ retval += error_func(&item->object, ++ FSCK_ERROR, ++ ".gitmodules is a symbolic link"); ++ } ++ + update_tree_entry(&desc); + + switch (mode) { +@@ -243,6 +289,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func) + return retval; + } + ++static int require_end_of_header(const void *data, unsigned long size, ++ struct object *obj, fsck_error error_func) ++{ ++ const char *buffer = (const char *)data; ++ unsigned long i; ++ ++ for (i = 0; i < size; i++) { ++ switch (buffer[i]) { ++ case '\0': ++ return error_func(obj, FSCK_ERROR, ++ "unterminated header: NUL at offset %d", i); ++ case '\n': ++ if (i + 1 < size && buffer[i + 1] == '\n') ++ return 0; ++ } ++ } ++ ++ return error_func(obj, FSCK_ERROR, "unterminated header"); ++} ++ + static int fsck_ident(char **ident, struct object *obj, fsck_error error_func) + { + if (**ident == '<') +@@ -279,9 +345,10 @@ static int fsck_ident(char **ident, struct object *obj, fsck_error error_func) + return 0; + } + +-static int fsck_commit(struct 
commit *commit, fsck_error error_func) ++static int fsck_commit(struct commit *commit, char *data, ++ unsigned long size, fsck_error error_func) + { +- char *buffer = commit->buffer; ++ char *buffer = data ? data : commit->buffer; + unsigned char tree_sha1[20], sha1[20]; + struct commit_graft *graft; + int parents = 0; +@@ -290,6 +357,9 @@ static int fsck_commit(struct commit *commit, fsck_error error_func) + if (commit->date == ULONG_MAX) + return error_func(&commit->object, FSCK_ERROR, "invalid author/committer line"); + ++ if (require_end_of_header(buffer, size, &commit->object, error_func)) ++ return -1; ++ + if (memcmp(buffer, "tree ", 5)) + return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line"); + if (get_sha1_hex(buffer+5, tree_sha1) || buffer[45] != '\n') +@@ -340,7 +410,8 @@ static int fsck_commit(struct commit *commit, fsck_error error_func) + return 0; + } + +-static int fsck_tag(struct tag *tag, fsck_error error_func) ++static int fsck_tag(struct tag *tag, const char *data, ++ unsigned long size, fsck_error error_func) + { + struct object *tagged = tag->tagged; + +@@ -349,19 +420,80 @@ static int fsck_tag(struct tag *tag, fsck_error error_func) + return 0; + } + +-int fsck_object(struct object *obj, int strict, fsck_error error_func) ++struct fsck_gitmodules_data { ++ struct object *obj; ++ fsck_error error_func; ++ int ret; ++}; ++ ++static int fsck_gitmodules_fn(const char *var, const char *value, void *vdata) ++{ ++ struct fsck_gitmodules_data *data = vdata; ++ const char *subsection, *key; ++ int subsection_len; ++ char *name; ++ ++ if (parse_config_key(var, "submodule", &subsection, &subsection_len, &key) < 0 || ++ !subsection) ++ return 0; ++ ++ name = xmemdupz(subsection, subsection_len); ++ if (check_submodule_name(name) < 0) ++ data->ret += data->error_func(data->obj, FSCK_ERROR, ++ "disallowed submodule name: %s", ++ name); ++ free(name); ++ ++ return 0; ++} ++ ++static int fsck_blob(struct blob *blob, const 
char *buf, ++ unsigned long size, fsck_error error_func) ++{ ++ struct fsck_gitmodules_data data; ++ ++ if (!oidhash_contains(&gitmodules_found, blob->object.sha1)) ++ return 0; ++ oidhash_insert(&gitmodules_done, blob->object.sha1); ++ ++ if (!buf) { ++ /* ++ * A missing buffer here is a sign that the caller found the ++ * blob too gigantic to load into memory. Let's just consider ++ * that an error. ++ */ ++ return error_func(&blob->object, FSCK_ERROR, ++ ".gitmodules too large to parse"); ++ } ++ ++ data.obj = &blob->object; ++ data.error_func = error_func; ++ data.ret = 0; ++ if (git_config_from_buf(fsck_gitmodules_fn, ".gitmodules", ++ buf, size, &data)) ++ data.ret += error_func(&blob->object, FSCK_ERROR, ++ "could not parse gitmodules blob"); ++ ++ return data.ret; ++} ++ ++int fsck_object(struct object *obj, void *data, unsigned long size, ++ int strict, fsck_error error_func) + { + if (!obj) + return error_func(obj, FSCK_ERROR, "no valid object to fsck"); + + if (obj->type == OBJ_BLOB) +- return 0; ++ return fsck_blob((struct blob *)obj, (const char *) data, ++ size, error_func); + if (obj->type == OBJ_TREE) + return fsck_tree((struct tree *) obj, strict, error_func); + if (obj->type == OBJ_COMMIT) +- return fsck_commit((struct commit *) obj, error_func); ++ return fsck_commit((struct commit *) obj, data, ++ size, error_func); + if (obj->type == OBJ_TAG) +- return fsck_tag((struct tag *) obj, error_func); ++ return fsck_tag((struct tag *) obj, (const char *) data, ++ size, error_func); + + return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)", + obj->type); +@@ -382,3 +514,47 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...) 
+ strbuf_release(&sb); + return 1; + } ++ ++int fsck_finish(fsck_error error_func) ++{ ++ int retval = 0; ++ struct hashmap_iter iter; ++ const struct oidhash_entry *e; ++ ++ hashmap_iter_init(&gitmodules_found, &iter); ++ while ((e = hashmap_iter_next(&iter))) { ++ const unsigned char *sha1 = e->sha1; ++ struct blob *blob; ++ enum object_type type; ++ unsigned long size; ++ char *buf; ++ ++ if (oidhash_contains(&gitmodules_done, sha1)) ++ continue; ++ ++ blob = lookup_blob(sha1); ++ if (!blob) { ++ retval += error_func(&blob->object, FSCK_ERROR, ++ "non-blob found at .gitmodules"); ++ continue; ++ } ++ ++ buf = read_sha1_file(sha1, &type, &size); ++ if (!buf) { ++ retval += error_func(&blob->object, FSCK_ERROR, ++ "unable to read .gitmodules blob"); ++ continue; ++ } ++ ++ if (type == OBJ_BLOB) ++ retval += fsck_blob(blob, buf, size, error_func); ++ else ++ retval += error_func(&blob->object, FSCK_ERROR, ++ "non-blob found at .gitmodules"); ++ free(buf); ++ } ++ ++ hashmap_free(&gitmodules_found, 1); ++ hashmap_free(&gitmodules_done, 1); ++ return retval; ++} +diff --git a/fsck.h b/fsck.h +index 1e4f527..06a0977 100644 +--- a/fsck.h ++++ b/fsck.h +@@ -28,6 +28,15 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...); + * 0 everything OK + */ + int fsck_walk(struct object *obj, fsck_walk_func walk, void *data); +-int fsck_object(struct object *obj, int strict, fsck_error error_func); ++/* If NULL is passed for data, we assume the object is local and read it. */ ++int fsck_object(struct object *obj, void *data, unsigned long size, ++ int strict, fsck_error error_func); ++ ++/* ++ * Some fsck checks are context-dependent, and may end up queued; run this ++ * after completing all fsck_object() calls in order to resolve any remaining ++ * checks. 
++ */ ++int fsck_finish(fsck_error error_func); + + #endif +diff --git a/git-compat-util.h b/git-compat-util.h +index 89abf8f..bd62ab0 100644 +--- a/git-compat-util.h ++++ b/git-compat-util.h +@@ -637,6 +637,23 @@ static inline int sane_iscase(int x, int is_lower) + return (x & 0x20) == 0; + } + ++/* ++ * Like skip_prefix, but compare case-insensitively. Note that the comparison ++ * is done via tolower(), so it is strictly ASCII (no multi-byte characters or ++ * locale-specific conversions). ++ */ ++static inline int skip_iprefix(const char *str, const char *prefix, ++ const char **out) ++{ ++ do { ++ if (!*prefix) { ++ *out = str; ++ return 1; ++ } ++ } while (tolower(*str++) == tolower(*prefix++)); ++ return 0; ++} ++ + static inline int strtoul_ui(char const *s, int base, unsigned int *result) + { + unsigned long ul; +diff --git a/git-submodule.sh b/git-submodule.sh +index bec3362..ca16579 100755 +--- a/git-submodule.sh ++++ b/git-submodule.sh +@@ -188,6 +188,19 @@ get_submodule_config () { + printf '%s' "${value:-$default}" + } + ++# ++# Check whether a submodule name is acceptable, dying if not. ++# ++# $1 = submodule name ++# ++check_module_name() ++{ ++ sm_name=$1 ++ if ! git submodule--helper check-name "$sm_name" ++ then ++ die "$(eval_gettext "'$sm_name' is not a valid submodule name")" ++ fi ++} + + # + # Map submodule path to submodule name +@@ -203,6 +216,7 @@ module_name() + sed -n -e 's|^submodule\.\(.*\)\.path '"$re"'$|\1|p' ) + test -z "$name" && + die "$(eval_gettext "No submodule mapping found in .gitmodules for path '\$sm_path'")" ++ check_module_name "$name" + echo "$name" + } + +@@ -389,6 +403,11 @@ Use -f if you really want to add it." >&2 + sm_name="$sm_path" + fi + ++ if ! 
git submodule--helper check-name "$sm_name" ++ then ++ die "$(eval_gettext "'$sm_name' is not a valid submodule name")" ++ fi ++ + # perhaps the path exists and is already a git repo, else clone it + if test -e "$sm_path" + then +@@ -481,7 +500,7 @@ cmd_foreach() + if test -e "$sm_path"/.git + then + say "$(eval_gettext "Entering '\$prefix\$sm_path'")" +- name=$(module_name "$sm_path") ++ name=$(module_name "$sm_path") || exit + ( + prefix="$prefix$sm_path/" + clear_local_git_env +diff --git a/git.c b/git.c +index 1ada169..db12099 100644 +--- a/git.c ++++ b/git.c +@@ -404,6 +404,7 @@ static void handle_internal_command(int argc, const char **argv) + { "stage", cmd_add, RUN_SETUP | NEED_WORK_TREE }, + { "status", cmd_status, RUN_SETUP | NEED_WORK_TREE }, + { "stripspace", cmd_stripspace }, ++ { "submodule--helper", cmd_submodule__helper }, + { "symbolic-ref", cmd_symbolic_ref, RUN_SETUP }, + { "tag", cmd_tag, RUN_SETUP }, + { "tar-tree", cmd_tar_tree }, +diff --git a/hashmap.c b/hashmap.c +new file mode 100644 +index 0000000..d1b8056 +--- /dev/null ++++ b/hashmap.c +@@ -0,0 +1,228 @@ ++/* ++ * Generic implementation of hash-based key value mappings. 
++ */ ++#include "cache.h" ++#include "hashmap.h" ++ ++#define FNV32_BASE ((unsigned int) 0x811c9dc5) ++#define FNV32_PRIME ((unsigned int) 0x01000193) ++ ++unsigned int strhash(const char *str) ++{ ++ unsigned int c, hash = FNV32_BASE; ++ while ((c = (unsigned char) *str++)) ++ hash = (hash * FNV32_PRIME) ^ c; ++ return hash; ++} ++ ++unsigned int strihash(const char *str) ++{ ++ unsigned int c, hash = FNV32_BASE; ++ while ((c = (unsigned char) *str++)) { ++ if (c >= 'a' && c <= 'z') ++ c -= 'a' - 'A'; ++ hash = (hash * FNV32_PRIME) ^ c; ++ } ++ return hash; ++} ++ ++unsigned int memhash(const void *buf, size_t len) ++{ ++ unsigned int hash = FNV32_BASE; ++ unsigned char *ucbuf = (unsigned char *) buf; ++ while (len--) { ++ unsigned int c = *ucbuf++; ++ hash = (hash * FNV32_PRIME) ^ c; ++ } ++ return hash; ++} ++ ++unsigned int memihash(const void *buf, size_t len) ++{ ++ unsigned int hash = FNV32_BASE; ++ unsigned char *ucbuf = (unsigned char *) buf; ++ while (len--) { ++ unsigned int c = *ucbuf++; ++ if (c >= 'a' && c <= 'z') ++ c -= 'a' - 'A'; ++ hash = (hash * FNV32_PRIME) ^ c; ++ } ++ return hash; ++} ++ ++#define HASHMAP_INITIAL_SIZE 64 ++/* grow / shrink by 2^2 */ ++#define HASHMAP_RESIZE_BITS 2 ++/* load factor in percent */ ++#define HASHMAP_LOAD_FACTOR 80 ++ ++static void alloc_table(struct hashmap *map, unsigned int size) ++{ ++ map->tablesize = size; ++ map->table = xcalloc(size, sizeof(struct hashmap_entry *)); ++ ++ /* calculate resize thresholds for new size */ ++ map->grow_at = (unsigned int) ((uint64_t) size * HASHMAP_LOAD_FACTOR / 100); ++ if (size <= HASHMAP_INITIAL_SIZE) ++ map->shrink_at = 0; ++ else ++ /* ++ * The shrink-threshold must be slightly smaller than ++ * (grow-threshold / resize-factor) to prevent erratic resizing, ++ * thus we divide by (resize-factor + 1). 
++ */ ++ map->shrink_at = map->grow_at / ((1 << HASHMAP_RESIZE_BITS) + 1); ++} ++ ++static inline int entry_equals(const struct hashmap *map, ++ const struct hashmap_entry *e1, const struct hashmap_entry *e2, ++ const void *keydata) ++{ ++ return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata)); ++} ++ ++static inline unsigned int bucket(const struct hashmap *map, ++ const struct hashmap_entry *key) ++{ ++ return key->hash & (map->tablesize - 1); ++} ++ ++static void rehash(struct hashmap *map, unsigned int newsize) ++{ ++ unsigned int i, oldsize = map->tablesize; ++ struct hashmap_entry **oldtable = map->table; ++ ++ alloc_table(map, newsize); ++ for (i = 0; i < oldsize; i++) { ++ struct hashmap_entry *e = oldtable[i]; ++ while (e) { ++ struct hashmap_entry *next = e->next; ++ unsigned int b = bucket(map, e); ++ e->next = map->table[b]; ++ map->table[b] = e; ++ e = next; ++ } ++ } ++ free(oldtable); ++} ++ ++static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map, ++ const struct hashmap_entry *key, const void *keydata) ++{ ++ struct hashmap_entry **e = &map->table[bucket(map, key)]; ++ while (*e && !entry_equals(map, *e, key, keydata)) ++ e = &(*e)->next; ++ return e; ++} ++ ++static int always_equal(const void *unused1, const void *unused2, const void *unused3) ++{ ++ return 0; ++} ++ ++void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, ++ size_t initial_size) ++{ ++ unsigned int size = HASHMAP_INITIAL_SIZE; ++ map->size = 0; ++ map->cmpfn = equals_function ? 
equals_function : always_equal; ++ ++ /* calculate initial table size and allocate the table */ ++ initial_size = (unsigned int) ((uint64_t) initial_size * 100 ++ / HASHMAP_LOAD_FACTOR); ++ while (initial_size > size) ++ size <<= HASHMAP_RESIZE_BITS; ++ alloc_table(map, size); ++} ++ ++void hashmap_free(struct hashmap *map, int free_entries) ++{ ++ if (!map || !map->table) ++ return; ++ if (free_entries) { ++ struct hashmap_iter iter; ++ struct hashmap_entry *e; ++ hashmap_iter_init(map, &iter); ++ while ((e = hashmap_iter_next(&iter))) ++ free(e); ++ } ++ free(map->table); ++ memset(map, 0, sizeof(*map)); ++} ++ ++void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata) ++{ ++ return *find_entry_ptr(map, key, keydata); ++} ++ ++void *hashmap_get_next(const struct hashmap *map, const void *entry) ++{ ++ struct hashmap_entry *e = ((struct hashmap_entry *) entry)->next; ++ for (; e; e = e->next) ++ if (entry_equals(map, entry, e, NULL)) ++ return e; ++ return NULL; ++} ++ ++void hashmap_add(struct hashmap *map, void *entry) ++{ ++ unsigned int b = bucket(map, entry); ++ ++ /* add entry */ ++ ((struct hashmap_entry *) entry)->next = map->table[b]; ++ map->table[b] = entry; ++ ++ /* fix size and rehash if appropriate */ ++ map->size++; ++ if (map->size > map->grow_at) ++ rehash(map, map->tablesize << HASHMAP_RESIZE_BITS); ++} ++ ++void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata) ++{ ++ struct hashmap_entry *old; ++ struct hashmap_entry **e = find_entry_ptr(map, key, keydata); ++ if (!*e) ++ return NULL; ++ ++ /* remove existing entry */ ++ old = *e; ++ *e = old->next; ++ old->next = NULL; ++ ++ /* fix size and rehash if appropriate */ ++ map->size--; ++ if (map->size < map->shrink_at) ++ rehash(map, map->tablesize >> HASHMAP_RESIZE_BITS); ++ return old; ++} ++ ++void *hashmap_put(struct hashmap *map, void *entry) ++{ ++ struct hashmap_entry *old = hashmap_remove(map, entry, NULL); ++ hashmap_add(map, entry); ++ 
return old; ++} ++ ++void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter) ++{ ++ iter->map = map; ++ iter->tablepos = 0; ++ iter->next = NULL; ++} ++ ++void *hashmap_iter_next(struct hashmap_iter *iter) ++{ ++ struct hashmap_entry *current = iter->next; ++ for (;;) { ++ if (current) { ++ iter->next = current->next; ++ return current; ++ } ++ ++ if (iter->tablepos >= iter->map->tablesize) ++ return NULL; ++ ++ current = iter->map->table[iter->tablepos++]; ++ } ++} +diff --git a/hashmap.h b/hashmap.h +new file mode 100644 +index 0000000..a8b9e3d +--- /dev/null ++++ b/hashmap.h +@@ -0,0 +1,90 @@ ++#ifndef HASHMAP_H ++#define HASHMAP_H ++ ++/* ++ * Generic implementation of hash-based key-value mappings. ++ * See Documentation/technical/api-hashmap.txt. ++ */ ++ ++/* FNV-1 functions */ ++ ++extern unsigned int strhash(const char *buf); ++extern unsigned int strihash(const char *buf); ++extern unsigned int memhash(const void *buf, size_t len); ++extern unsigned int memihash(const void *buf, size_t len); ++ ++static inline unsigned int sha1hash(const unsigned char *sha1) ++{ ++ /* ++ * Equivalent to 'return *(unsigned int *)sha1;', but safe on ++ * platforms that don't support unaligned reads. 
++ */ ++ unsigned int hash; ++ memcpy(&hash, sha1, sizeof(hash)); ++ return hash; ++} ++ ++/* data structures */ ++ ++struct hashmap_entry { ++ struct hashmap_entry *next; ++ unsigned int hash; ++}; ++ ++typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, ++ const void *keydata); ++ ++struct hashmap { ++ struct hashmap_entry **table; ++ hashmap_cmp_fn cmpfn; ++ unsigned int size, tablesize, grow_at, shrink_at; ++}; ++ ++struct hashmap_iter { ++ struct hashmap *map; ++ struct hashmap_entry *next; ++ unsigned int tablepos; ++}; ++ ++/* hashmap functions */ ++ ++extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, ++ size_t initial_size); ++extern void hashmap_free(struct hashmap *map, int free_entries); ++ ++/* hashmap_entry functions */ ++ ++static inline void hashmap_entry_init(void *entry, unsigned int hash) ++{ ++ struct hashmap_entry *e = entry; ++ e->hash = hash; ++ e->next = NULL; ++} ++extern void *hashmap_get(const struct hashmap *map, const void *key, ++ const void *keydata); ++extern void *hashmap_get_next(const struct hashmap *map, const void *entry); ++extern void hashmap_add(struct hashmap *map, void *entry); ++extern void *hashmap_put(struct hashmap *map, void *entry); ++extern void *hashmap_remove(struct hashmap *map, const void *key, ++ const void *keydata); ++ ++static inline void *hashmap_get_from_hash(const struct hashmap *map, ++ unsigned int hash, const void *keydata) ++{ ++ struct hashmap_entry key; ++ hashmap_entry_init(&key, hash); ++ return hashmap_get(map, &key, keydata); ++} ++ ++/* hashmap_iter functions */ ++ ++extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter); ++extern void *hashmap_iter_next(struct hashmap_iter *iter); ++static inline void *hashmap_iter_first(struct hashmap *map, ++ struct hashmap_iter *iter) ++{ ++ hashmap_iter_init(map, iter); ++ return hashmap_iter_next(iter); ++} ++ ++#endif +diff --git a/read-cache.c b/read-cache.c +index 04ed561..a800c11 
100644 +--- a/read-cache.c ++++ b/read-cache.c +@@ -684,7 +684,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, + int size, len; + struct cache_entry *ce; + +- if (!verify_path(path)) { ++ if (!verify_path(path, mode)) { + error("Invalid path '%s'", path); + return NULL; + } +@@ -724,7 +724,7 @@ int ce_path_match(const struct cache_entry *ce, const struct pathspec *pathspec) + * Also, we don't want double slashes or slashes at the + * end that can make pathnames ambiguous. + */ +-static int verify_dotfile(const char *rest) ++static int verify_dotfile(const char *rest, unsigned mode) + { + /* + * The first character was '.', but that +@@ -738,16 +738,28 @@ static int verify_dotfile(const char *rest) + + switch (*rest) { + /* +- * ".git" followed by NUL or slash is bad. This +- * shares the path end test with the ".." case. ++ * ".git" followed by NUL or slash is bad. Note that we match ++ * case-insensitively here, even if ignore_case is not set. ++ * This outlaws ".GIT" everywhere out of an abundance of caution, ++ * since there's really no good reason to allow it. ++ * ++ * Once we've seen ".git", we can also find ".gitmodules", etc (also ++ * case-insensitively). + */ + case 'g': + if (rest[1] != 'i') + break; + if (rest[2] != 't') + break; +- rest += 2; +- /* fallthrough */ ++ if (rest[3] == '\0' || is_dir_sep(rest[3])) ++ return 0; ++ if (S_ISLNK(mode)) { ++ rest += 3; ++ if (skip_iprefix(rest, "modules", &rest) && ++ (*rest == '\0' || is_dir_sep(*rest))) ++ return 0; ++ } ++ break; + case '.': + if (rest[1] == '\0' || is_dir_sep(rest[1])) + return 0; +@@ -755,7 +767,7 @@ static int verify_dotfile(const char *rest) + return 1; + } + +-int verify_path(const char *path) ++int verify_path(const char *path, unsigned mode) + { + char c; + +@@ -769,7 +781,7 @@ int verify_path(const char *path) + if (is_dir_sep(c)) { + inside: + c = *path++; +- if ((c == '.' && !verify_dotfile(path)) || ++ if ((c == '.' 
&& !verify_dotfile(path, mode)) || + is_dir_sep(c) || c == '\0') + return 0; + } +@@ -947,7 +959,7 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e + + if (!ok_to_add) + return -1; +- if (!verify_path(ce->name)) ++ if (!verify_path(ce->name, ce->ce_mode)) + return error("Invalid path '%s'", ce->name); + + if (!skip_df_check && +diff --git a/sha1_file.c b/sha1_file.c +index b114cc9..c92efa6 100644 +--- a/sha1_file.c ++++ b/sha1_file.c +@@ -1325,12 +1325,21 @@ static int open_sha1_file(const unsigned char *sha1) + return -1; + } + +-void *map_sha1_file(const unsigned char *sha1, unsigned long *size) ++/* ++ * Map the loose object at "path" if it is not NULL, or the path found by ++ * searching for a loose object named "sha1". ++ */ ++static void *map_sha1_file_1(const char *path, ++ const unsigned char *sha1, ++ unsigned long *size) + { + void *map; + int fd; + +- fd = open_sha1_file(sha1); ++ if (path) ++ fd = git_open_noatime(path); ++ else ++ fd = open_sha1_file(sha1); + map = NULL; + if (fd >= 0) { + struct stat st; +@@ -1394,6 +1403,11 @@ static int experimental_loose_object(unsigned char *map) + return 1; + } + ++void *map_sha1_file(const unsigned char *sha1, unsigned long *size) ++{ ++ return map_sha1_file_1(NULL, sha1, size); ++} ++ + unsigned long unpack_object_header_buffer(const unsigned char *buf, + unsigned long len, enum object_type *type, unsigned long *sizep) + { +@@ -3043,3 +3057,117 @@ void assert_sha1_type(const unsigned char *sha1, enum object_type expect) + die("%s is not a valid '%s' object", sha1_to_hex(sha1), + typename(expect)); + } ++ ++static int check_stream_sha1(git_zstream *stream, ++ const char *hdr, ++ unsigned long size, ++ const char *path, ++ const unsigned char *expected_sha1) ++{ ++ git_SHA_CTX c; ++ unsigned char real_sha1[20]; ++ unsigned char buf[4096]; ++ unsigned long total_read; ++ int status = Z_OK; ++ ++ git_SHA1_Init(&c); ++ git_SHA1_Update(&c, hdr, stream->total_out); ++ ++ /* ++ * We 
already read some bytes into hdr, but the ones up to the NUL ++ * do not count against the object's content size. ++ */ ++ total_read = stream->total_out - strlen(hdr) - 1; ++ ++ /* ++ * This size comparison must be "<=" to read the final zlib packets; ++ * see the comment in unpack_sha1_rest for details. ++ */ ++ while (total_read <= size && ++ (status == Z_OK || status == Z_BUF_ERROR)) { ++ stream->next_out = buf; ++ stream->avail_out = sizeof(buf); ++ if (size - total_read < stream->avail_out) ++ stream->avail_out = size - total_read; ++ status = git_inflate(stream, Z_FINISH); ++ git_SHA1_Update(&c, buf, stream->next_out - buf); ++ total_read += stream->next_out - buf; ++ } ++ git_inflate_end(stream); ++ ++ if (status != Z_STREAM_END) { ++ error("corrupt loose object '%s'", sha1_to_hex(expected_sha1)); ++ return -1; ++ } ++ ++ git_SHA1_Final(real_sha1, &c); ++ if (hashcmp(expected_sha1, real_sha1)) { ++ error("sha1 mismatch for %s (expected %s)", path, ++ sha1_to_hex(expected_sha1)); ++ return -1; ++ } ++ ++ return 0; ++} ++ ++int read_loose_object(const char *path, ++ const unsigned char *expected_sha1, ++ enum object_type *type, ++ unsigned long *size, ++ void **contents) ++{ ++ int ret = -1; ++ int fd = -1; ++ void *map = NULL; ++ unsigned long mapsize; ++ git_zstream stream; ++ char hdr[32]; ++ ++ *contents = NULL; ++ ++ map = map_sha1_file_1(path, NULL, &mapsize); ++ if (!map) { ++ error("unable to mmap %s: %s", path, strerror(errno)); ++ goto out; ++ } ++ ++ if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) { ++ error("unable to unpack header of %s", path); ++ goto out; ++ } ++ ++ *type = parse_sha1_header(hdr, size); ++ if (*type < 0) { ++ error("unable to parse header of %s", path); ++ git_inflate_end(&stream); ++ goto out; ++ } ++ ++ if (*type == OBJ_BLOB && *size > big_file_threshold) { ++ if (check_stream_sha1(&stream, hdr, *size, path, expected_sha1) < 0) ++ goto out; ++ } else { ++ *contents = unpack_sha1_rest(&stream, hdr, *size, 
expected_sha1); ++ if (!*contents) { ++ error("unable to unpack contents of %s", path); ++ git_inflate_end(&stream); ++ goto out; ++ } ++ if (check_sha1_signature(expected_sha1, *contents, ++ *size, typename(*type))) { ++ error("sha1 mismatch for %s (expected %s)", path, ++ sha1_to_hex(expected_sha1)); ++ free(*contents); ++ goto out; ++ } ++ } ++ ++ ret = 0; /* everything checks out */ ++ ++out: ++ if (map) ++ munmap(map, mapsize); ++ if (fd >= 0) ++ close(fd); ++ return ret; ++} +diff --git a/submodule.c b/submodule.c +index 1821a5b..6337cab 100644 +--- a/submodule.c ++++ b/submodule.c +@@ -124,6 +124,31 @@ void gitmodules_config(void) + } + } + ++int check_submodule_name(const char *name) ++{ ++ /* Disallow empty names */ ++ if (!*name) ++ return -1; ++ ++ /* ++ * Look for '..' as a path component. Check both '/' and '\\' as ++ * separators rather than is_dir_sep(), because we want the name rules ++ * to be consistent across platforms. ++ */ ++ goto in_component; /* always start inside component */ ++ while (*name) { ++ char c = *name++; ++ if (c == '/' || c == '\\') { ++in_component: ++ if (name[0] == '.' && name[1] == '.' 
&& ++ (!name[2] || name[2] == '/' || name[2] == '\\')) ++ return -1; ++ } ++ } ++ ++ return 0; ++} ++ + int parse_submodule_config_option(const char *var, const char *value) + { + struct string_list_item *config; +@@ -132,6 +157,10 @@ int parse_submodule_config_option(const char *var, const char *value) + + if (parse_config_key(var, "submodule", &name, &namelen, &key) < 0 || !name) + return 0; ++ if (check_submodule_name(name) < 0) { ++ warning(_("ignoring suspicious submodule name: %s"), name); ++ return 0; ++ } + + if (!strcmp(key, "path")) { + config = unsorted_string_list_lookup(&config_name_for_path, value); +diff --git a/submodule.h b/submodule.h +index c7ffc7c..59dbdfb 100644 +--- a/submodule.h ++++ b/submodule.h +@@ -37,4 +37,11 @@ int find_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_nam + struct string_list *needs_pushing); + int push_unpushed_submodules(unsigned char new_sha1[20], const char *remotes_name); + ++/* ++ * Returns 0 if the name is syntactically acceptable as a submodule "name" ++ * (e.g., that may be found in the subsection of a .gitmodules file) and -1 ++ * otherwise. ++ */ ++int check_submodule_name(const char *name); ++ + #endif +diff --git a/t/lib-pack.sh b/t/lib-pack.sh +new file mode 100644 +index 0000000..4674899 +--- /dev/null ++++ b/t/lib-pack.sh +@@ -0,0 +1,110 @@ ++# Support routines for hand-crafting weird or malicious packs. 
++# ++# You can make a complete pack like: ++# ++# pack_header 2 >foo.pack && ++# pack_obj e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 >>foo.pack && ++# pack_obj e68fe8129b546b101aee9510c5328e7f21ca1d18 >>foo.pack && ++# pack_trailer foo.pack ++ ++# Print the big-endian 4-byte octal representation of $1 ++uint32_octal () { ++ n=$1 ++ printf '\\%o' $(($n / 16777216)); n=$((n % 16777216)) ++ printf '\\%o' $(($n / 65536)); n=$((n % 65536)) ++ printf '\\%o' $(($n / 256)); n=$((n % 256)) ++ printf '\\%o' $(($n )); ++} ++ ++# Print the big-endian 4-byte binary representation of $1 ++uint32_binary () { ++ printf "$(uint32_octal "$1")" ++} ++ ++# Print a pack header, version 2, for a pack with $1 objects ++pack_header () { ++ printf 'PACK' && ++ printf '\0\0\0\2' && ++ uint32_binary "$1" ++} ++ ++# Print the pack data for object $1, as a delta against object $2 (or as a full ++# object if $2 is missing or empty). The output is suitable for including ++# directly in the packfile, and represents the entirety of the object entry. ++# Doing this on the fly (especially picking your deltas) is quite tricky, so we ++# have hardcoded some well-known objects. See the case statements below for the ++# complete list. 
++pack_obj () {
++	case "$1" in
++	# empty blob
++	e69de29bb2d1d6434b8b29ae775ad8c2e48c5391)
++		case "$2" in
++		'')
++			printf '\060\170\234\003\0\0\0\0\1'
++			return
++			;;
++		esac
++		;;
++
++	# blob containing "\7\76"
++	e68fe8129b546b101aee9510c5328e7f21ca1d18)
++		case "$2" in
++		'')
++			printf '\062\170\234\143\267\3\0\0\116\0\106'
++			return
++			;;
++		01d7713666f4de822776c7622c10f1b07de280dc)
++			printf '\165\1\327\161\66\146\364\336\202\47\166' &&
++			printf '\307\142\54\20\361\260\175\342\200\334\170' &&
++			printf '\234\143\142\142\142\267\003\0\0\151\0\114'
++			return
++			;;
++		esac
++		;;
++
++	# blob containing "\7\0"
++	01d7713666f4de822776c7622c10f1b07de280dc)
++		case "$2" in
++		'')
++			printf '\062\170\234\143\147\0\0\0\20\0\10'
++			return
++			;;
++		e68fe8129b546b101aee9510c5328e7f21ca1d18)
++			printf '\165\346\217\350\22\233\124\153\20\32\356' &&
++			printf '\225\20\305\62\216\177\41\312\35\30\170\234' &&
++			printf '\143\142\142\142\147\0\0\0\53\0\16'
++			return
++			;;
++		esac
++		;;
++	esac
++
++	# If it's not a delta, we can convince pack-objects to generate a pack
++	# with just our entry, and then strip off the header (12 bytes) and
++	# trailer (20 bytes).
++	if test -z "$2"
++	then
++		echo "$1" | git pack-objects --stdout >pack_obj.tmp &&
++		size=$(wc -c <pack_obj.tmp) &&
++		dd if=pack_obj.tmp bs=1 count=$((size - 20 - 12)) skip=12 &&
++		rm -f pack_obj.tmp
++		return
++	fi
++
++	echo >&2 "BUG: don't know how to print $1${2:+ (from $2)}"
++	return 1
++}
++
++# Compute and append pack trailer to "$1"
++pack_trailer () {
++	test-sha1 -b <"$1" >trailer.tmp &&
++	cat trailer.tmp >>"$1" &&
++	rm -f trailer.tmp
++}
++
++# Remove any existing packs to make sure that
++# whatever we index next will be the pack that we
++# actually use.
++clear_packs () {
++	rm -f .git/objects/pack/*
++}
+diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
+new file mode 100755
+index 0000000..391e2b6
+--- /dev/null
++++ b/t/t0011-hashmap.sh
+@@ -0,0 +1,240 @@
++#!/bin/sh
++
++test_description='test hashmap and string hash functions'
++. 
./test-lib.sh ++ ++test_hashmap() { ++ echo "$1" | test-hashmap $3 > actual && ++ echo "$2" > expect && ++ test_cmp expect actual ++} ++ ++test_expect_success 'hash functions' ' ++ ++test_hashmap "hash key1" "2215982743 2215982743 116372151 116372151" && ++test_hashmap "hash key2" "2215982740 2215982740 116372148 116372148" && ++test_hashmap "hash fooBarFrotz" "1383912807 1383912807 3189766727 3189766727" && ++test_hashmap "hash foobarfrotz" "2862305959 2862305959 3189766727 3189766727" ++ ++' ++ ++test_expect_success 'put' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++put foobarfrotz value4 ++size" "NULL ++NULL ++NULL ++NULL ++64 4" ++ ++' ++ ++test_expect_success 'put (case insensitive)' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++size" "NULL ++NULL ++NULL ++64 3" ignorecase ++ ++' ++ ++test_expect_success 'replace' ' ++ ++test_hashmap "put key1 value1 ++put key1 value2 ++put fooBarFrotz value3 ++put fooBarFrotz value4 ++size" "NULL ++value1 ++NULL ++value3 ++64 2" ++ ++' ++ ++test_expect_success 'replace (case insensitive)' ' ++ ++test_hashmap "put key1 value1 ++put Key1 value2 ++put fooBarFrotz value3 ++put foobarfrotz value4 ++size" "NULL ++value1 ++NULL ++value3 ++64 2" ignorecase ++ ++' ++ ++test_expect_success 'get' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++put foobarfrotz value4 ++get key1 ++get key2 ++get fooBarFrotz ++get notInMap" "NULL ++NULL ++NULL ++NULL ++value1 ++value2 ++value3 ++NULL" ++ ++' ++ ++test_expect_success 'get (case insensitive)' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++get Key1 ++get keY2 ++get foobarfrotz ++get notInMap" "NULL ++NULL ++NULL ++value1 ++value2 ++value3 ++NULL" ignorecase ++ ++' ++ ++test_expect_success 'add' ' ++ ++test_hashmap "add key1 value1 ++add key1 value2 ++add fooBarFrotz value3 ++add fooBarFrotz value4 ++get key1 ++get fooBarFrotz ++get notInMap" "value2 
++value1 ++value4 ++value3 ++NULL" ++ ++' ++ ++test_expect_success 'add (case insensitive)' ' ++ ++test_hashmap "add key1 value1 ++add Key1 value2 ++add fooBarFrotz value3 ++add foobarfrotz value4 ++get key1 ++get Foobarfrotz ++get notInMap" "value2 ++value1 ++value4 ++value3 ++NULL" ignorecase ++ ++' ++ ++test_expect_success 'remove' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++remove key1 ++remove key2 ++remove notInMap ++size" "NULL ++NULL ++NULL ++value1 ++value2 ++NULL ++64 1" ++ ++' ++ ++test_expect_success 'remove (case insensitive)' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++remove Key1 ++remove keY2 ++remove notInMap ++size" "NULL ++NULL ++NULL ++value1 ++value2 ++NULL ++64 1" ignorecase ++ ++' ++ ++test_expect_success 'iterate' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++iterate" "NULL ++NULL ++NULL ++key2 value2 ++key1 value1 ++fooBarFrotz value3" ++ ++' ++ ++test_expect_success 'iterate (case insensitive)' ' ++ ++test_hashmap "put key1 value1 ++put key2 value2 ++put fooBarFrotz value3 ++iterate" "NULL ++NULL ++NULL ++fooBarFrotz value3 ++key2 value2 ++key1 value1" ignorecase ++ ++' ++ ++test_expect_success 'grow / shrink' ' ++ ++ rm -f in && ++ rm -f expect && ++ for n in $(test_seq 51) ++ do ++ echo put key$n value$n >> in && ++ echo NULL >> expect ++ done && ++ echo size >> in && ++ echo 64 51 >> expect && ++ echo put key52 value52 >> in && ++ echo NULL >> expect ++ echo size >> in && ++ echo 256 52 >> expect && ++ for n in $(test_seq 12) ++ do ++ echo remove key$n >> in && ++ echo value$n >> expect ++ done && ++ echo size >> in && ++ echo 256 40 >> expect && ++ echo remove key40 >> in && ++ echo value40 >> expect && ++ echo size >> in && ++ echo 64 39 >> expect && ++ cat in | test-hashmap > out && ++ test_cmp expect out ++ ++' ++ ++test_done +diff --git a/t/t1307-config-blob.sh b/t/t1307-config-blob.sh +new file mode 100755 +index 
0000000..fdc257e +--- /dev/null ++++ b/t/t1307-config-blob.sh +@@ -0,0 +1,70 @@ ++#!/bin/sh ++ ++test_description='support for reading config from a blob' ++. ./test-lib.sh ++ ++test_expect_success 'create config blob' ' ++ cat >config <<-\EOF && ++ [some] ++ value = 1 ++ EOF ++ git add config && ++ git commit -m foo ++' ++ ++test_expect_success 'list config blob contents' ' ++ echo some.value=1 >expect && ++ git config --blob=HEAD:config --list >actual && ++ test_cmp expect actual ++' ++ ++test_expect_success 'fetch value from blob' ' ++ echo true >expect && ++ git config --blob=HEAD:config --bool some.value >actual && ++ test_cmp expect actual ++' ++ ++test_expect_success 'reading non-existing value from blob is an error' ' ++ test_must_fail git config --blob=HEAD:config non.existing ++' ++ ++test_expect_success 'reading from blob and file is an error' ' ++ test_must_fail git config --blob=HEAD:config --system --list ++' ++ ++test_expect_success 'reading from missing ref is an error' ' ++ test_must_fail git config --blob=HEAD:doesnotexist --list ++' ++ ++test_expect_success 'reading from non-blob is an error' ' ++ test_must_fail git config --blob=HEAD --list ++' ++ ++test_expect_success 'setting a value in a blob is an error' ' ++ test_must_fail git config --blob=HEAD:config some.value foo ++' ++ ++test_expect_success 'deleting a value in a blob is an error' ' ++ test_must_fail git config --blob=HEAD:config --unset some.value ++' ++ ++test_expect_success 'editing a blob is an error' ' ++ test_must_fail git config --blob=HEAD:config --edit ++' ++ ++test_expect_success 'parse errors in blobs are properly attributed' ' ++ cat >config <<-\EOF && ++ [some] ++ value = " ++ EOF ++ git add config && ++ git commit -m broken && ++ ++ test_must_fail git config --blob=HEAD:config some.value 2>err && ++ ++ # just grep for our token as the exact error message is likely to ++ # change or be internationalized ++ grep "HEAD:config" err ++' ++ ++test_done +diff --git 
a/t/t1450-fsck.sh b/t/t1450-fsck.sh +index d730734..9dfa4b0 100755 +--- a/t/t1450-fsck.sh ++++ b/t/t1450-fsck.sh +@@ -69,7 +69,7 @@ test_expect_success 'object with bad sha1' ' + git update-ref refs/heads/bogus $cmt && + test_when_finished "git update-ref -d refs/heads/bogus" && + +- test_might_fail git fsck 2>out && ++ test_must_fail git fsck 2>out && + cat out && + grep "$sha.*corrupt" out + ' +@@ -101,7 +101,7 @@ test_expect_success 'email with embedded > is not okay' ' + test_when_finished "remove_object $new" && + git update-ref refs/heads/bogus "$new" && + test_when_finished "git update-ref -d refs/heads/bogus" && +- git fsck 2>out && ++ test_must_fail git fsck 2>out && + cat out && + grep "error in commit $new" out + ' +@@ -113,7 +113,7 @@ test_expect_success 'missing < email delimiter is reported nicely' ' + test_when_finished "remove_object $new" && + git update-ref refs/heads/bogus "$new" && + test_when_finished "git update-ref -d refs/heads/bogus" && +- git fsck 2>out && ++ test_must_fail git fsck 2>out && + cat out && + grep "error in commit $new.* - bad name" out + ' +@@ -125,7 +125,7 @@ test_expect_success 'missing email is reported nicely' ' + test_when_finished "remove_object $new" && + git update-ref refs/heads/bogus "$new" && + test_when_finished "git update-ref -d refs/heads/bogus" && +- git fsck 2>out && ++ test_must_fail git fsck 2>out && + cat out && + grep "error in commit $new.* - missing email" out + ' +@@ -137,11 +137,33 @@ test_expect_success '> in name is reported' ' + test_when_finished "remove_object $new" && + git update-ref refs/heads/bogus "$new" && + test_when_finished "git update-ref -d refs/heads/bogus" && +- git fsck 2>out && ++ test_must_fail git fsck 2>out && + cat out && + grep "error in commit $new" out + ' + ++test_expect_success 'malformatted tree object' ' ++ test_when_finished "git update-ref -d refs/tags/wrong" && ++ test_when_finished "for i in \$T; do remove_object \$i; done" && ++ T=$( ++ GIT_INDEX_FILE=test-index && 
++ export GIT_INDEX_FILE && ++ rm -f test-index && ++ >x && ++ git add x && ++ git rev-parse :x && ++ T=$(git write-tree) && ++ echo $T && ++ ( ++ git cat-file tree $T && ++ git cat-file tree $T ++ ) | ++ git hash-object -w -t tree --stdin ++ ) && ++ test_must_fail git fsck 2>out && ++ grep "error in tree .*contains duplicate file entries" out ++' ++ + test_expect_success 'tag pointing to nonexistent' ' + cat >invalid-tag <<-\EOF && + object ffffffffffffffffffffffffffffffffffffffff +@@ -268,4 +290,20 @@ test_expect_success 'fsck notices ".git" in trees' ' + ) + ' + ++test_expect_success 'fsck finds problems in duplicate loose objects' ' ++ rm -rf broken-duplicate && ++ git init broken-duplicate && ++ ( ++ cd broken-duplicate && ++ test_commit duplicate && ++ # no "-d" here, so we end up with duplicates ++ git repack && ++ # now corrupt the loose copy ++ file=$(sha1_file "$(git rev-parse HEAD)") && ++ rm "$file" && ++ echo broken >"$file" && ++ test_must_fail git fsck ++ ) ++' ++ + test_done +diff --git a/t/t4122-apply-symlink-inside.sh b/t/t4122-apply-symlink-inside.sh +index 3940737..b5832e5 100755 +--- a/t/t4122-apply-symlink-inside.sh ++++ b/t/t4122-apply-symlink-inside.sh +@@ -52,4 +52,110 @@ test_expect_success SYMLINKS 'check result' ' + + ' + ++test_expect_success SYMLINKS 'do not read from beyond symbolic link' ' ++ git reset --hard && ++ mkdir -p arch/x86_64/dir && ++ >arch/x86_64/dir/file && ++ git add arch/x86_64/dir/file && ++ echo line >arch/x86_64/dir/file && ++ git diff >patch && ++ git reset --hard && ++ ++ mkdir arch/i386/dir && ++ >arch/i386/dir/file && ++ ln -s ../i386/dir arch/x86_64/dir && ++ ++ test_must_fail git apply patch && ++ test_must_fail git apply --cached patch && ++ test_must_fail git apply --index patch ++ ++' ++ ++test_expect_success SYMLINKS 'do not follow symbolic link (setup)' ' ++ ++ rm -rf arch/i386/dir arch/x86_64/dir && ++ git reset --hard && ++ ln -s ../i386/dir arch/x86_64/dir && ++ git add arch/x86_64/dir && ++ git diff 
HEAD >add_symlink.patch && ++ git reset --hard && ++ ++ mkdir arch/x86_64/dir && ++ >arch/x86_64/dir/file && ++ git add arch/x86_64/dir/file && ++ git diff HEAD >add_file.patch && ++ git diff -R HEAD >del_file.patch && ++ git reset --hard && ++ rm -fr arch/x86_64/dir && ++ ++ cat add_symlink.patch add_file.patch >patch && ++ cat add_symlink.patch del_file.patch >tricky_del && ++ ++ mkdir arch/i386/dir ++' ++ ++test_expect_success SYMLINKS 'do not follow symbolic link (same input)' ' ++ ++ # same input creates a confusing symbolic link ++ test_must_fail git apply patch 2>error-wt && ++ test_i18ngrep "beyond a symbolic link" error-wt && ++ test_path_is_missing arch/x86_64/dir && ++ test_path_is_missing arch/i386/dir/file && ++ ++ test_must_fail git apply --index patch 2>error-ix && ++ test_i18ngrep "beyond a symbolic link" error-ix && ++ test_path_is_missing arch/x86_64/dir && ++ test_path_is_missing arch/i386/dir/file && ++ test_must_fail git ls-files --error-unmatch arch/x86_64/dir && ++ test_must_fail git ls-files --error-unmatch arch/i386/dir && ++ ++ test_must_fail git apply --cached patch 2>error-ct && ++ test_i18ngrep "beyond a symbolic link" error-ct && ++ test_must_fail git ls-files --error-unmatch arch/x86_64/dir && ++ test_must_fail git ls-files --error-unmatch arch/i386/dir && ++ ++ >arch/i386/dir/file && ++ git add arch/i386/dir/file && ++ ++ test_must_fail git apply tricky_del && ++ test_path_is_file arch/i386/dir/file && ++ ++ test_must_fail git apply --index tricky_del && ++ test_path_is_file arch/i386/dir/file && ++ test_must_fail git ls-files --error-unmatch arch/x86_64/dir && ++ git ls-files --error-unmatch arch/i386/dir && ++ ++ test_must_fail git apply --cached tricky_del && ++ test_must_fail git ls-files --error-unmatch arch/x86_64/dir && ++ git ls-files --error-unmatch arch/i386/dir ++' ++ ++test_expect_success SYMLINKS 'do not follow symbolic link (existing)' ' ++ ++ # existing symbolic link ++ git reset --hard && ++ ln -s ../i386/dir 
arch/x86_64/dir && ++ git add arch/x86_64/dir && ++ ++ test_must_fail git apply add_file.patch 2>error-wt-add && ++ test_i18ngrep "beyond a symbolic link" error-wt-add && ++ test_path_is_missing arch/i386/dir/file && ++ ++ mkdir arch/i386/dir && ++ >arch/i386/dir/file && ++ test_must_fail git apply del_file.patch 2>error-wt-del && ++ test_i18ngrep "beyond a symbolic link" error-wt-del && ++ test_path_is_file arch/i386/dir/file && ++ rm arch/i386/dir/file && ++ ++ test_must_fail git apply --index add_file.patch 2>error-ix-add && ++ test_i18ngrep "beyond a symbolic link" error-ix-add && ++ test_path_is_missing arch/i386/dir/file && ++ test_must_fail git ls-files --error-unmatch arch/i386/dir && ++ ++ test_must_fail git apply --cached add_file.patch 2>error-ct-file && ++ test_i18ngrep "beyond a symbolic link" error-ct-file && ++ test_must_fail git ls-files --error-unmatch arch/i386/dir ++' ++ + test_done +diff --git a/t/t4139-apply-escape.sh b/t/t4139-apply-escape.sh +new file mode 100755 +index 0000000..45b5660 +--- /dev/null ++++ b/t/t4139-apply-escape.sh +@@ -0,0 +1,141 @@ ++#!/bin/sh ++ ++test_description='paths written by git-apply cannot escape the working tree' ++. 
./test-lib.sh ++ ++# tests will try to write to ../foo, and we do not ++# want them to escape the trash directory when they ++# fail ++test_expect_success 'bump git repo one level down' ' ++ mkdir inside && ++ mv .git inside/ && ++ cd inside ++' ++ ++# $1 = name of file ++# $2 = current path to file (if different) ++mkpatch_add () { ++ rm -f "${2:-$1}" && ++ cat <<-EOF ++ diff --git a/$1 b/$1 ++ new file mode 100644 ++ index 0000000..53c74cd ++ --- /dev/null ++ +++ b/$1 ++ @@ -0,0 +1 @@ ++ +evil ++ EOF ++} ++ ++mkpatch_del () { ++ echo evil >"${2:-$1}" && ++ cat <<-EOF ++ diff --git a/$1 b/$1 ++ deleted file mode 100644 ++ index 53c74cd..0000000 ++ --- a/$1 ++ +++ /dev/null ++ @@ -1 +0,0 @@ ++ -evil ++ EOF ++} ++ ++# $1 = name of file ++# $2 = content of symlink ++mkpatch_symlink () { ++ rm -f "$1" && ++ cat <<-EOF ++ diff --git a/$1 b/$1 ++ new file mode 120000 ++ index 0000000..$(printf "%s" "$2" | git hash-object --stdin) ++ --- /dev/null ++ +++ b/$1 ++ @@ -0,0 +1 @@ ++ +$2 ++ \ No newline at end of file ++ EOF ++} ++ ++test_expect_success 'cannot create file containing ..' ' ++ mkpatch_add ../foo >patch && ++ test_must_fail git apply patch && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success 'can create file containing .. with --unsafe-paths' ' ++ mkpatch_add ../foo >patch && ++ git apply --unsafe-paths patch && ++ test_path_is_file ../foo ++' ++ ++test_expect_success 'cannot create file containing .. (index)' ' ++ mkpatch_add ../foo >patch && ++ test_must_fail git apply --index patch && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success 'cannot create file containing .. with --unsafe-paths (index)' ' ++ mkpatch_add ../foo >patch && ++ test_must_fail git apply --index --unsafe-paths patch && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success 'cannot delete file containing ..' ' ++ mkpatch_del ../foo >patch && ++ test_must_fail git apply patch && ++ test_path_is_file ../foo ++' ++ ++test_expect_success 'can delete file containing .. 
with --unsafe-paths' ' ++ mkpatch_del ../foo >patch && ++ git apply --unsafe-paths patch && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success 'cannot delete file containing .. (index)' ' ++ mkpatch_del ../foo >patch && ++ test_must_fail git apply --index patch && ++ test_path_is_file ../foo ++' ++ ++test_expect_success SYMLINKS 'symlink escape via ..' ' ++ { ++ mkpatch_symlink tmp .. && ++ mkpatch_add tmp/foo ../foo ++ } >patch && ++ test_must_fail git apply patch && ++ test_path_is_missing tmp && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success SYMLINKS 'symlink escape via .. (index)' ' ++ { ++ mkpatch_symlink tmp .. && ++ mkpatch_add tmp/foo ../foo ++ } >patch && ++ test_must_fail git apply --index patch && ++ test_path_is_missing tmp && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success SYMLINKS 'symlink escape via absolute path' ' ++ { ++ mkpatch_symlink tmp "$(pwd)" && ++ mkpatch_add tmp/foo ../foo ++ } >patch && ++ test_must_fail git apply patch && ++ test_path_is_missing tmp && ++ test_path_is_missing ../foo ++' ++ ++test_expect_success SYMLINKS 'symlink escape via absolute path (index)' ' ++ { ++ mkpatch_symlink tmp "$(pwd)" && ++ mkpatch_add tmp/foo ../foo ++ } >patch && ++ test_must_fail git apply --index patch && ++ test_path_is_missing tmp && ++ test_path_is_missing ../foo ++' ++ ++test_done +diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh +new file mode 100755 +index 0000000..a1919b3 +--- /dev/null ++++ b/t/t7415-submodule-names.sh +@@ -0,0 +1,154 @@ ++#!/bin/sh ++ ++test_description='check handling of .. in submodule names ++ ++Exercise the name-checking function on a variety of names, and then give a ++real-world setup that confirms we catch this in practice. ++' ++. ./test-lib.sh ++. 
"$TEST_DIRECTORY"/lib-pack.sh ++ ++test_expect_success 'check names' ' ++ cat >expect <<-\EOF && ++ valid ++ valid/with/paths ++ EOF ++ ++ git submodule--helper check-name >actual <<-\EOF && ++ valid ++ valid/with/paths ++ ++ ../foo ++ /../foo ++ ..\foo ++ \..\foo ++ foo/.. ++ foo/../ ++ foo\.. ++ foo\..\ ++ foo/../bar ++ EOF ++ ++ test_cmp expect actual ++' ++ ++test_expect_success 'create innocent subrepo' ' ++ git init innocent && ++ ( cd innocent && git commit --allow-empty -m foo ) ++' ++ ++test_expect_success 'submodule add refuses invalid names' ' ++ test_must_fail \ ++ git submodule add --name ../../modules/evil "$PWD/innocent" evil ++' ++ ++test_expect_success 'add evil submodule' ' ++ git submodule add "$PWD/innocent" evil && ++ ++ mkdir modules && ++ cp -r .git/modules/evil modules && ++ write_script modules/evil/hooks/post-checkout <<-\EOF && ++ echo >&2 "RUNNING POST CHECKOUT" ++ EOF ++ ++ git config -f .gitmodules submodule.evil.update checkout && ++ git config -f .gitmodules --rename-section \ ++ submodule.evil submodule.../../modules/evil && ++ git add modules && ++ git commit -am evil ++' ++ ++# This step seems like it shouldn't be necessary, since the payload is ++# contained entirely in the evil submodule. But due to the vagaries of the ++# submodule code, checking out the evil module will fail unless ".git/modules" ++# exists. Adding another submodule (with a name that sorts before "evil") is an ++# easy way to make sure this is the case in the victim clone. ++test_expect_success 'add other submodule' ' ++ git submodule add "$PWD/innocent" another-module && ++ git add another-module && ++ git commit -am another ++' ++ ++test_expect_success 'clone evil superproject' ' ++ test_might_fail git clone --recurse-submodules . victim >output 2>&1 && ++ cat output && ++ ! 
grep "RUNNING POST CHECKOUT" output
++'
++
++test_expect_success 'fsck detects evil superproject' '
++	test_must_fail git fsck
++'
++
++test_expect_success 'transfer.fsckObjects detects evil superproject (unpack)' '
++	rm -rf dst.git &&
++	git init --bare dst.git &&
++	( cd dst.git && git config transfer.fsckObjects true ) &&
++	test_must_fail git push dst.git HEAD
++'
++
++test_expect_success 'transfer.fsckObjects detects evil superproject (index)' '
++	rm -rf dst.git &&
++	git init --bare dst.git &&
++	( cd dst.git && git config transfer.fsckObjects true &&
++	git config transfer.unpackLimit 1 ) &&
++	test_must_fail git push dst.git HEAD
++'
++
++# Normally our packs contain commits followed by trees followed by blobs. This
++# reverses the order, which requires backtracking to find the context of a
++# blob. We'll start with a fresh gitmodules-only tree to make it simpler.
++test_expect_success 'create oddly ordered pack' '
++	git checkout --orphan odd &&
++	git rm -rf --cached . &&
++	git add .gitmodules &&
++	git commit -m odd &&
++	{
++		pack_header 3 &&
++		pack_obj $(git rev-parse HEAD:.gitmodules) &&
++		pack_obj $(git rev-parse HEAD^{tree}) &&
++		pack_obj $(git rev-parse HEAD)
++	} >odd.pack &&
++	pack_trailer odd.pack
++'
++
++test_expect_success 'transfer.fsckObjects handles odd pack (unpack)' '
++	rm -rf dst.git &&
++	git init --bare dst.git &&
++	( cd dst.git && test_must_fail git unpack-objects --strict ) <odd.pack
++'
++
++test_expect_success 'transfer.fsckObjects handles odd pack (index)' '
++	rm -rf dst.git &&
++	git init --bare dst.git &&
++	( cd dst.git && test_must_fail git index-pack --strict --stdin ) <odd.pack
++'
++
++test_expect_success 'fsck detects symlinked .gitmodules file' '
++	git init symlink &&
++	(
++		cd symlink &&
++
++		# Make the tree directly to avoid index restrictions.
++		#
++		# Because symlinks store the target as a blob, choose
++		# a pathname that could be parsed as a .gitmodules file
++		# to trick naive non-symlink-aware checking.
++		tricky="[foo]bar=true" &&
++		content=$(git hash-object -w ../.gitmodules) &&
++		target=$(printf "$tricky" | git hash-object -w --stdin) &&
++		tree=$(
++			{
++				printf "100644 blob $content\t$tricky\n" &&
++				printf "120000 blob $target\t.gitmodules\n"
++			} | git mktree
++		) &&
++		commit=$(git commit-tree $tree) &&
++		git update-ref refs/heads/symlink $commit &&
++
++		# Check not only that we fail, but that it is due to the
++		# symlink detector.
++		test_might_fail git fsck 2>output &&
++		test_i18ngrep "is a symbolic link" output
++	)
++'
++
++test_done
+diff --git a/test-hashmap.c b/test-hashmap.c
+new file mode 100644
+index 0000000..1a7ac2c
+--- /dev/null
++++ b/test-hashmap.c
+@@ -0,0 +1,335 @@
++#include "cache.h"
++#include "hashmap.h"
++#include <stdio.h>
++
++struct test_entry
++{
++	struct hashmap_entry ent;
++	/* key and value as two \0-terminated strings */
++	char key[FLEX_ARRAY];
++};
++
++static const char *get_value(const struct test_entry *e)
++{
++	return e->key + strlen(e->key) + 1;
++}
++
++static int 
test_entry_cmp(const struct test_entry *e1, ++ const struct test_entry *e2, const char* key) ++{ ++ return strcmp(e1->key, key ? key : e2->key); ++} ++ ++static int test_entry_cmp_icase(const struct test_entry *e1, ++ const struct test_entry *e2, const char* key) ++{ ++ return strcasecmp(e1->key, key ? key : e2->key); ++} ++ ++static struct test_entry *alloc_test_entry(int hash, char *key, int klen, ++ char *value, int vlen) ++{ ++ struct test_entry *entry = malloc(sizeof(struct test_entry) + klen ++ + vlen + 2); ++ hashmap_entry_init(entry, hash); ++ memcpy(entry->key, key, klen + 1); ++ memcpy(entry->key + klen + 1, value, vlen + 1); ++ return entry; ++} ++ ++#define HASH_METHOD_FNV 0 ++#define HASH_METHOD_I 1 ++#define HASH_METHOD_IDIV10 2 ++#define HASH_METHOD_0 3 ++#define HASH_METHOD_X2 4 ++#define TEST_SPARSE 8 ++#define TEST_ADD 16 ++#define TEST_SIZE 100000 ++ ++static unsigned int hash(unsigned int method, unsigned int i, const char *key) ++{ ++ unsigned int hash; ++ switch (method & 3) ++ { ++ case HASH_METHOD_FNV: ++ hash = strhash(key); ++ break; ++ case HASH_METHOD_I: ++ hash = i; ++ break; ++ case HASH_METHOD_IDIV10: ++ hash = i / 10; ++ break; ++ case HASH_METHOD_0: ++ hash = 0; ++ break; ++ } ++ ++ if (method & HASH_METHOD_X2) ++ hash = 2 * hash; ++ return hash; ++} ++ ++/* ++ * Test performance of hashmap.[ch] ++ * Usage: time echo "perfhashmap method rounds" | test-hashmap ++ */ ++static void perf_hashmap(unsigned int method, unsigned int rounds) ++{ ++ struct hashmap map; ++ char buf[16]; ++ struct test_entry **entries; ++ unsigned int *hashes; ++ unsigned int i, j; ++ ++ entries = malloc(TEST_SIZE * sizeof(struct test_entry *)); ++ hashes = malloc(TEST_SIZE * sizeof(int)); ++ for (i = 0; i < TEST_SIZE; i++) { ++ snprintf(buf, sizeof(buf), "%i", i); ++ entries[i] = alloc_test_entry(0, buf, strlen(buf), "", 0); ++ hashes[i] = hash(method, i, entries[i]->key); ++ } ++ ++ if (method & TEST_ADD) { ++ /* test adding to the map */ ++ for (j = 0; j < 
rounds; j++) { ++ hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0); ++ ++ /* add entries */ ++ for (i = 0; i < TEST_SIZE; i++) { ++ hashmap_entry_init(entries[i], hashes[i]); ++ hashmap_add(&map, entries[i]); ++ } ++ ++ hashmap_free(&map, 0); ++ } ++ } else { ++ /* test map lookups */ ++ hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0); ++ ++ /* fill the map (sparsely if specified) */ ++ j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE; ++ for (i = 0; i < j; i++) { ++ hashmap_entry_init(entries[i], hashes[i]); ++ hashmap_add(&map, entries[i]); ++ } ++ ++ for (j = 0; j < rounds; j++) { ++ for (i = 0; i < TEST_SIZE; i++) { ++ hashmap_get_from_hash(&map, hashes[i], ++ entries[i]->key); ++ } ++ } ++ ++ hashmap_free(&map, 0); ++ } ++} ++ ++struct hash_entry ++{ ++ struct hash_entry *next; ++ char key[FLEX_ARRAY]; ++}; ++ ++/* ++ * Test performance of hash.[ch] ++ * Usage: time echo "perfhash method rounds" | test-hashmap ++ */ ++static void perf_hash(unsigned int method, unsigned int rounds) ++{ ++ struct hash_table map; ++ char buf[16]; ++ struct hash_entry **entries, **res, *entry; ++ unsigned int *hashes; ++ unsigned int i, j; ++ ++ entries = malloc(TEST_SIZE * sizeof(struct hash_entry *)); ++ hashes = malloc(TEST_SIZE * sizeof(int)); ++ for (i = 0; i < TEST_SIZE; i++) { ++ snprintf(buf, sizeof(buf), "%i", i); ++ entries[i] = malloc(sizeof(struct hash_entry) + strlen(buf) + 1); ++ strcpy(entries[i]->key, buf); ++ hashes[i] = hash(method, i, entries[i]->key); ++ } ++ ++ if (method & TEST_ADD) { ++ /* test adding to the map */ ++ for (j = 0; j < rounds; j++) { ++ init_hash(&map); ++ ++ /* add entries */ ++ for (i = 0; i < TEST_SIZE; i++) { ++ res = (struct hash_entry **) insert_hash( ++ hashes[i], entries[i], &map); ++ if (res) { ++ entries[i]->next = *res; ++ *res = entries[i]; ++ } else { ++ entries[i]->next = NULL; ++ } ++ } ++ ++ free_hash(&map); ++ } ++ } else { ++ /* test map lookups */ ++ init_hash(&map); ++ ++ /* fill the map (sparsely if 
specified) */ ++ j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE; ++ for (i = 0; i < j; i++) { ++ res = (struct hash_entry **) insert_hash(hashes[i], ++ entries[i], &map); ++ if (res) { ++ entries[i]->next = *res; ++ *res = entries[i]; ++ } else { ++ entries[i]->next = NULL; ++ } ++ } ++ ++ for (j = 0; j < rounds; j++) { ++ for (i = 0; i < TEST_SIZE; i++) { ++ entry = lookup_hash(hashes[i], &map); ++ while (entry) { ++ if (!strcmp(entries[i]->key, entry->key)) ++ break; ++ entry = entry->next; ++ } ++ } ++ } ++ ++ free_hash(&map); ++ ++ } ++} ++ ++#define DELIM " \t\r\n" ++ ++/* ++ * Read stdin line by line and print result of commands to stdout: ++ * ++ * hash key -> strhash(key) memhash(key) strihash(key) memihash(key) ++ * put key value -> NULL / old value ++ * get key -> NULL / value ++ * remove key -> NULL / old value ++ * iterate -> key1 value1\nkey2 value2\n... ++ * size -> tablesize numentries ++ * ++ * perfhashmap method rounds -> test hashmap.[ch] performance ++ * perfhash method rounds -> test hash.[ch] performance ++ */ ++int main(int argc, char *argv[]) ++{ ++ char line[1024]; ++ struct hashmap map; ++ int icase; ++ ++ /* init hash map */ ++ icase = argc > 1 && !strcmp("ignorecase", argv[1]); ++ hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase ++ : test_entry_cmp), 0); ++ ++ /* process commands from stdin */ ++ while (fgets(line, sizeof(line), stdin)) { ++ char *cmd, *p1 = NULL, *p2 = NULL; ++ int l1 = 0, l2 = 0, hash = 0; ++ struct test_entry *entry; ++ ++ /* break line into command and up to two parameters */ ++ cmd = strtok(line, DELIM); ++ /* ignore empty lines */ ++ if (!cmd || *cmd == '#') ++ continue; ++ ++ p1 = strtok(NULL, DELIM); ++ if (p1) { ++ l1 = strlen(p1); ++ hash = icase ? 
strihash(p1) : strhash(p1); ++ p2 = strtok(NULL, DELIM); ++ if (p2) ++ l2 = strlen(p2); ++ } ++ ++ if (!strcmp("hash", cmd) && l1) { ++ ++ /* print results of different hash functions */ ++ printf("%u %u %u %u\n", strhash(p1), memhash(p1, l1), ++ strihash(p1), memihash(p1, l1)); ++ ++ } else if (!strcmp("add", cmd) && l1 && l2) { ++ ++ /* create entry with key = p1, value = p2 */ ++ entry = alloc_test_entry(hash, p1, l1, p2, l2); ++ ++ /* add to hashmap */ ++ hashmap_add(&map, entry); ++ ++ } else if (!strcmp("put", cmd) && l1 && l2) { ++ ++ /* create entry with key = p1, value = p2 */ ++ entry = alloc_test_entry(hash, p1, l1, p2, l2); ++ ++ /* add / replace entry */ ++ entry = hashmap_put(&map, entry); ++ ++ /* print and free replaced entry, if any */ ++ puts(entry ? get_value(entry) : "NULL"); ++ free(entry); ++ ++ } else if (!strcmp("get", cmd) && l1) { ++ ++ /* lookup entry in hashmap */ ++ entry = hashmap_get_from_hash(&map, hash, p1); ++ ++ /* print result */ ++ if (!entry) ++ puts("NULL"); ++ while (entry) { ++ puts(get_value(entry)); ++ entry = hashmap_get_next(&map, entry); ++ } ++ ++ } else if (!strcmp("remove", cmd) && l1) { ++ ++ /* setup static key */ ++ struct hashmap_entry key; ++ hashmap_entry_init(&key, hash); ++ ++ /* remove entry from hashmap */ ++ entry = hashmap_remove(&map, &key, p1); ++ ++ /* print result and free entry*/ ++ puts(entry ? 
get_value(entry) : "NULL"); ++ free(entry); ++ ++ } else if (!strcmp("iterate", cmd)) { ++ ++ struct hashmap_iter iter; ++ hashmap_iter_init(&map, &iter); ++ while ((entry = hashmap_iter_next(&iter))) ++ printf("%s %s\n", entry->key, get_value(entry)); ++ ++ } else if (!strcmp("size", cmd)) { ++ ++ /* print table sizes */ ++ printf("%u %u\n", map.tablesize, map.size); ++ ++ } else if (!strcmp("perfhashmap", cmd) && l1 && l2) { ++ ++ perf_hashmap(atoi(p1), atoi(p2)); ++ ++ } else if (!strcmp("perfhash", cmd) && l1 && l2) { ++ ++ perf_hash(atoi(p1), atoi(p2)); ++ ++ } else { ++ ++ printf("Unknown command %s\n", cmd); ++ ++ } ++ } ++ ++ hashmap_free(&map, 1); ++ return 0; ++} +-- +2.14.4 + diff --git a/SPECS/git.spec b/SPECS/git.spec index 8f26a1f..a85f91e 100644 --- a/SPECS/git.spec +++ b/SPECS/git.spec @@ -51,7 +51,7 @@ Name: git Version: 1.8.3.1 -Release: 13%{?dist} +Release: 14%{?dist} Summary: Fast Version Control System License: GPLv2 Group: Development/Tools @@ -93,6 +93,7 @@ Patch12: 0001-Fix-CVE-2016-2315-CVE-2016-2324.patch Patch14: 0007-git-prompt.patch Patch15: 0008-Fix-CVE-2017-8386.patch Patch16: git-cve-2017-1000117.patch +Patch19: git-cve-2018-11235.patch BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root-%(%{__id_u} -n) @@ -348,6 +349,9 @@ Requires: emacs-git = %{version}-%{release} %patch16 -p1 %patch17 -p1 %patch18 -p1 +%patch19 -p1 + +chmod a+x t/t0011-hashmap.sh t/t1307-config-blob.sh t/t4139-apply-escape.sh t/t7415-submodule-names.sh %if %{use_prebuilt_docs} mkdir -p prebuilt_docs/{html,man} @@ -547,6 +551,17 @@ rm -f {Documentation/technical,contrib/emacs,contrib/credential/gnome-keyring}/. 
chmod a-x Documentation/technical/api-index.sh
 find contrib -type f | xargs chmod -x
 
+%check
+# Tests to skip on all releases and architectures
+# t9001-send-email - Can't locate Data/Dumper.pm in @INC - probably a missing dep
+GIT_SKIP_TESTS="t9001"
+
+export GIT_SKIP_TESTS
+
+# Set LANG so various UTF-8 tests are run
+export LANG=en_US.UTF-8
+
+make test
 
 %clean
 rm -rf %{buildroot}
@@ -673,6 +688,11 @@ rm -rf %{buildroot}
 # No files for you!
 
 %changelog
+* Mon Jun 18 2018 Pavel Cahyna - 1.8.3.1-14
+- Backport fix for CVE-2018-11235
+- Thanks to Jonathan Nieder for backporting to 2.1.x
+  and to Steve Beattie for backporting to 1.9.1
+
 * Wed Sep 13 2017 Petr Stodulka - 1.8.3.1-13
 - fall back to Basic auth if Negotiate fails
   Resolves: #1490998
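The heart of the CVE-2018-11235 fix is rejecting submodule names in which any path component is "..", on either slash or backslash separators — exactly the cases the t7415 'check names' test feeds to `git submodule--helper check-name`. As an illustrative sketch only (a plain-shell re-implementation of that rule, not git's actual C code; `check_name` is a hypothetical helper name):

```shell
#!/bin/sh
# Reject a submodule name if it is empty or if any path component,
# split on "/" or "\", is ".." -- mirroring the semantics exercised
# by t7415's 'check names' test.
check_name () {
	name=$1
	test -n "$name" || return 1
	# treat backslashes like slashes, as the real check does
	normalized=$(printf '%s' "$name" | tr '\\' /)
	old_IFS=$IFS
	IFS=/
	for component in $normalized
	do
		if test "$component" = ".."
		then
			IFS=$old_IFS
			return 1
		fi
	done
	IFS=$old_IFS
	return 0
}

# demo: same inputs as the t7415 name list
for n in valid valid/with/paths ../foo foo/../bar '..\foo'
do
	if check_name "$n"
	then
		echo "ok: $n"
	else
		echo "rejected: $n"
	fi
done
```

A name that passes this filter can safely be embedded under `.git/modules/`; one that fails could climb out of it, which is how the evil superproject in t7415 plants a hook in the victim clone.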