<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/fs/exec.c, branch v3.12.62</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v3.12.62</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v3.12.62'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2015-05-15T07:10:41+00:00</updated>
<entry>
<title>fs: take i_mutex during prepare_binprm for set[ug]id executables</title>
<updated>2015-05-15T07:10:41+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jann@thejh.net</email>
</author>
<published>2015-04-19T00:48:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5176b77f1aacdc560eaeac4685ade444bb814689'/>
<id>urn:sha1:5176b77f1aacdc560eaeac4685ade444bb814689</id>
<content type='text'>
commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn &lt;jann@thejh.net&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Charles Williams &lt;ciwillia@brocade.com&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>mm: per-thread vma caching</title>
<updated>2014-09-26T09:51:57+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr@hp.com</email>
</author>
<published>2014-08-28T18:34:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9c0073071075474cb066cca50b7b8574c1314e3d'/>
<id>urn:sha1:9c0073071075474cb066cca50b7b8574c1314e3d</id>
<content type='text'>
commit 615d6e8756c87149f2d4c1b93d471bca002bd849 upstream.

This patch is a continuation of efforts trying to optimize find_vma(),
avoiding potentially expensive rbtree walks to locate a vma upon faults.
The original approach (https://lkml.org/lkml/2013/11/1/410), where the
largest vma was also cached, ended up being too specific and random,
thus further comparison with other approaches were needed.  There are
two things to consider when dealing with this, the cache hit rate and
the latency of find_vma().  Improving the hit-rate does not necessarily
translate in finding the vma any faster, as the overhead of any fancy
caching schemes can be too high to consider.

We currently cache the last used vma for the whole address space, which
provides a nice optimization, reducing the total cycles in find_vma() by
up to 250%, for workloads with good locality.  On the other hand, this
simple scheme is pretty much useless for workloads with poor locality.
Analyzing ebizzy runs shows that, no matter how many threads are
running, the mmap_cache hit rate is less than 2%, and in many situations
below 1%.

The proposed approach is to replace this scheme with a small per-thread
cache, maximizing hit rates at a very low maintenance cost.
Invalidations are performed by simply bumping up a 32-bit sequence
number.  The only expensive operation is in the rare case of a seq
number overflow, where all caches that share the same address space are
flushed.  Upon a miss, the proposed replacement policy is based on the
page number that contains the virtual address in question.  Concretely,
the following results are seen on an 80 core, 8 socket x86-64 box:

1) System bootup: Most programs are single threaded, so the per-thread
   scheme does improve ~50% hit rate by just adding a few more slots to
   the cache.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 50.61%   | 19.90            |
| patched        | 73.45%   | 13.58            |
+----------------+----------+------------------+

2) Kernel build: This one is already pretty good with the current
   approach as we're dealing with good locality.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 75.28%   | 11.03            |
| patched        | 88.09%   | 9.31             |
+----------------+----------+------------------+

3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 70.66%   | 17.14            |
| patched        | 91.15%   | 12.57            |
+----------------+----------+------------------+

4) Ebizzy: There's a fair amount of variation from run to run, but this
   approach always shows nearly perfect hit rates, while baseline is just
   about non-existent.  The amounts of cycles can fluctuate between
   anywhere from ~60 to ~116 for the baseline scheme, but this approach
   reduces it considerably.  For instance, with 80 threads:

+----------------+----------+------------------+
| caching scheme | hit-rate | cycles (billion) |
+----------------+----------+------------------+
| baseline       | 1.06%    | 91.54            |
| patched        | 99.97%   | 14.18            |
+----------------+----------+------------------+

[akpm@linux-foundation.org: fix nommu build, per Davidlohr]
[akpm@linux-foundation.org: document vmacache_valid() logic]
[akpm@linux-foundation.org: attempt to untangle header files]
[akpm@linux-foundation.org: add vmacache_find() BUG_ON]
[hughd@google.com: add vmacache_valid_mm() (from Oleg)]
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: adjust and enhance comments]
Signed-off-by: Davidlohr Bueso &lt;davidlohr@hp.com&gt;
Reviewed-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Tested-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>metag: Reduce maximum stack size to 256MB</title>
<updated>2014-06-09T13:53:41+00:00</updated>
<author>
<name>James Hogan</name>
<email>james.hogan@imgtec.com</email>
</author>
<published>2014-05-13T22:58:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3aa3ff52dc8f87de726372855a405bbea1b343c1'/>
<id>urn:sha1:3aa3ff52dc8f87de726372855a405bbea1b343c1</id>
<content type='text'>
commit d71f290b4e98a39f49f2595a13be3b4d5ce8e1f1 upstream.

Specify the maximum stack size for arches where the stack grows upward
(parisc and metag) in asm/processor.h rather than hard coding in
fs/exec.c so that metag can specify a smaller value of 256MB rather than
1GB.

This fixes a BUG on metag if the RLIMIT_STACK hard limit is increased
beyond a safe value by root. E.g. when starting a process after running
"ulimit -H -s unlimited" it will then attempt to use a stack size of the
maximum 1GB which is far too big for metag's limited user virtual
address space (stack_top is usually 0x3ffff000):

BUG: failure at fs/exec.c:589/shift_arg_pages()!

Signed-off-by: James Hogan &lt;james.hogan@imgtec.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: "James E.J. Bottomley" &lt;jejb@parisc-linux.org&gt;
Cc: linux-parisc@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: John David Anglin &lt;dave.anglin@bell.net&gt;
Signed-off-by: Jiri Slaby &lt;jslaby@suse.cz&gt;
</content>
</entry>
<entry>
<title>exec/ptrace: fix get_dumpable() incorrect tests</title>
<updated>2013-11-29T19:27:58+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2013-11-12T23:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9d4dd888b4b5799ecadfb0d8c9adda7a76779806'/>
<id>urn:sha1:9d4dd888b4b5799ecadfb0d8c9adda7a76779806</id>
<content type='text'>
commit d049f74f2dbe71354d43d393ac3a188947811348 upstream.

The get_dumpable() return value is not boolean.  Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
protected state.  Almost all places did this correctly, excepting the two
places fixed in this patch.

Wrong logic:
    if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
        or
    if (dumpable == 0) { /* be protective */ }
        or
    if (!dumpable) { /* be protective */ }

Correct logic:
    if (dumpable != SUID_DUMP_USER) { /* be protective */ }
        or
    if (dumpable != 1) { /* be protective */ }

Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user.  (This may have been partially mitigated if Yama was enabled.)

The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.

CVE-2013-2929

Reported-by: Vasily Kulikov &lt;segoon@openwall.com&gt;
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
</entry>
<entry>
<title>exec: cleanup the error handling in search_binary_handler()</title>
<updated>2013-09-11T22:59:09+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:24:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6b3c538f5b2cfc53cb6803ec5001bbcf8f18a98e'/>
<id>urn:sha1:6b3c538f5b2cfc53cb6803ec5001bbcf8f18a98e</id>
<content type='text'>
The error hanling and ret-from-loop look confusing and inconsistent.

- "retval &gt;= 0" simply returns

- "!bprm-&gt;file" returns too but with read_unlock() because
   binfmt_lock was already re-acquired

- "retval != -ENOEXEC || bprm-&gt;mm == NULL" does "break" and
  relies on the same check after the main loop

Consolidate these checks into a single if/return statement.

need_retry still checks "retval == -ENOEXEC", but this and -ENOENT before
the main loop are not needed.  This is only for pathological and
impossible list_empty(&amp;formats) case.

It is not clear why do we check "bprm-&gt;mm == NULL", probably this
should be removed.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Cc: Zach Levis &lt;zml@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exec: don't retry if request_module() fails</title>
<updated>2013-09-11T22:59:07+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:24:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4e0621a07ea58a0dc15859be3b743bdeb194a51b'/>
<id>urn:sha1:4e0621a07ea58a0dc15859be3b743bdeb194a51b</id>
<content type='text'>
A separate one-liner for better documentation.

It doesn't make sense to retry if request_module() fails to exec
/sbin/modprobe, add the additional "request_module() &lt; 0" check.

However, this logic still doesn't look exactly right:

1. It would be better to check "request_module() != 0", the user
   space modprobe process should report the correct exit code.
   But I didn't dare to add the user-visible change.

2. The whole ENOEXEC logic looks suboptimal. Suppose that we try
   to exec a "#!path-to-unsupported-binary" script. In this case
   request_module() + "retry" will be done twice: first by the
   "depth == 1" code, and then again by the "depth == 0" caller
   which doesn't make sense.

3. And note that in the case above bprm-&gt;buf was already changed
   by load_script()-&gt;prepare_binprm(), so this looks even more
   ugly.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Cc: Zach Levis &lt;zml@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exec: cleanup the CONFIG_MODULES logic</title>
<updated>2013-09-11T22:59:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:24:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=cb7b6b1cbc20a970c7124efae1c2478155604b54'/>
<id>urn:sha1:cb7b6b1cbc20a970c7124efae1c2478155604b54</id>
<content type='text'>
search_binary_handler() uses "for (try=0; try&lt;2; try++)" to avoid "goto"
but the code looks too complicated and horrible imho.  We still need to
check "try == 0" before request_module() and add the additional "break"
for !CONFIG_MODULES case.

Kill this loop and use a simple "bool need_retry" + "goto retry".  The
code looks much simpler and we do not even need ifdef's, gcc can optimize
out the "if (need_retry)" block if !IS_ENABLED().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Cc: Zach Levis &lt;zml@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exec: kill -&gt;load_binary != NULL check in search_binary_handler()</title>
<updated>2013-09-11T22:59:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:24:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=92eaa565add62d56b90987f58ea9feafc5a7c183'/>
<id>urn:sha1:92eaa565add62d56b90987f58ea9feafc5a7c183</id>
<content type='text'>
search_binary_handler() checks -&gt;load_binary != NULL for no reason, this
method should be always defined.  Turn this check into WARN_ON() and move
it into __register_binfmt().

Also, kill the function pointer.  The current code looks confusing, as if
-&gt;load_binary can go away after read_unlock(&amp;binfmt_lock).  But we rely on
module_get(fmt-&gt;module), this fmt can't be changed or unregistered,
otherwise this code is buggy anyway.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Cc: Zach Levis &lt;zml@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exec: move allow_write_access/fput to exec_binprm()</title>
<updated>2013-09-11T22:59:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:24:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=52f14282bb0c3d3e5ba2a9eaacb12ff37a033e7e'/>
<id>urn:sha1:52f14282bb0c3d3e5ba2a9eaacb12ff37a033e7e</id>
<content type='text'>
When search_binary_handler() succeeds it does allow_write_access() and
fput(), then it clears bprm-&gt;file to ensure the caller will not do the
same.

We can simply move this code to exec_binprm() which is called only once.
In fact we could move this to free_bprm() and remove the same code in
do_execve_common's error path.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Cc: Zach Levis &lt;zml@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>exec: proc_exec_connector() should be called only once</title>
<updated>2013-09-11T22:59:05+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2013-09-11T21:24:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9beb266f2d7e5362c5bb9f999255aa1af5318aef'/>
<id>urn:sha1:9beb266f2d7e5362c5bb9f999255aa1af5318aef</id>
<content type='text'>
A separate one-liner with the minor fix.

PROC_EVENT_EXEC reports the "exec" event, but this message is sent at
least twice if search_binary_handler() is called by -&gt;load_binary()
recursively, say, load_script().

Move it to exec_binprm(), this is "depth == 0" code too.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Al Viro &lt;viro@ZenIV.linux.org.uk&gt;
Cc: Evgeniy Polyakov &lt;zbr@ioremap.net&gt;
Cc: Zach Levis &lt;zml@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
