<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/linux/init_task.h, branch v2.6.25.2</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v2.6.25.2</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v2.6.25.2'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2008-02-05T17:44:20+00:00</updated>
<entry>
<title>capabilities: introduce per-process capability bounding set</title>
<updated>2008-02-05T17:44:20+00:00</updated>
<author>
<name>Serge E. Hallyn</name>
<email>serue@us.ibm.com</email>
</author>
<published>2008-02-05T06:29:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3b7391de67da515c91f48aa371de77cb6cc5c07e'/>
<id>urn:sha1:3b7391de67da515c91f48aa371de77cb6cc5c07e</id>
<content type='text'>
The capability bounding set is a set beyond which capabilities cannot grow.
 Currently cap_bset is per-system.  It can be manipulated through sysctl,
but only init can add capabilities.  Root can remove capabilities.  By
default it includes all caps except CAP_SETPCAP.

This patch makes the bounding set per-process when file capabilities are
enabled.  It is inherited at fork from parent.  Noone can add elements,
CAP_SETPCAP is required to remove them.

One example use of this is to start a safer container.  For instance, until
device namespaces or per-container device whitelists are introduced, it is
best to take CAP_MKNOD away from a container.

The bounding set will not affect pP and pE immediately.  It will only
affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
and exec() does not constrain pI'.  So to really start a shell with no way
of regain CAP_MKNOD, you would do

	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
	cap_t cap = cap_get_proc();
	cap_value_t caparray[1];
	caparray[0] = CAP_MKNOD;
	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
	cap_set_proc(cap);
	cap_free(cap);

The following test program will get and set the bounding
set (but not pI).  For instance

	./bset get
		(lists capabilities in bset)
	./bset drop cap_net_raw
		(starts shell with new bset)
		(use capset, setuid binary, or binary with
		file capabilities to try to increase caps)

************************************************************
cap_bound.c
************************************************************
 #include &lt;sys/prctl.h&gt;
 #include &lt;linux/capability.h&gt;
 #include &lt;sys/types.h&gt;
 #include &lt;unistd.h&gt;
 #include &lt;stdio.h&gt;
 #include &lt;stdlib.h&gt;
 #include &lt;string.h&gt;

 #ifndef PR_CAPBSET_READ
 #define PR_CAPBSET_READ 23
 #endif

 #ifndef PR_CAPBSET_DROP
 #define PR_CAPBSET_DROP 24
 #endif

int usage(char *me)
{
	printf("Usage: %s get\n", me);
	printf("       %s drop &lt;capability&gt;\n", me);
	return 1;
}

 #define numcaps 32
char *captable[numcaps] = {
	"cap_chown",
	"cap_dac_override",
	"cap_dac_read_search",
	"cap_fowner",
	"cap_fsetid",
	"cap_kill",
	"cap_setgid",
	"cap_setuid",
	"cap_setpcap",
	"cap_linux_immutable",
	"cap_net_bind_service",
	"cap_net_broadcast",
	"cap_net_admin",
	"cap_net_raw",
	"cap_ipc_lock",
	"cap_ipc_owner",
	"cap_sys_module",
	"cap_sys_rawio",
	"cap_sys_chroot",
	"cap_sys_ptrace",
	"cap_sys_pacct",
	"cap_sys_admin",
	"cap_sys_boot",
	"cap_sys_nice",
	"cap_sys_resource",
	"cap_sys_time",
	"cap_sys_tty_config",
	"cap_mknod",
	"cap_lease",
	"cap_audit_write",
	"cap_audit_control",
	"cap_setfcap"
};

int getbcap(void)
{
	int comma=0;
	unsigned long i;
	int ret;

	printf("i know of %d capabilities\n", numcaps);
	printf("capability bounding set:");
	for (i=0; i&lt;numcaps; i++) {
		ret = prctl(PR_CAPBSET_READ, i);
		if (ret &lt; 0)
			perror("prctl");
		else if (ret==1)
			printf("%s%s", (comma++) ? ", " : " ", captable[i]);
	}
	printf("\n");
	return 0;
}

int capdrop(char *str)
{
	unsigned long i;

	int found=0;
	for (i=0; i&lt;numcaps; i++) {
		if (strcmp(captable[i], str) == 0) {
			found=1;
			break;
		}
	}
	if (!found)
		return 1;
	if (prctl(PR_CAPBSET_DROP, i)) {
		perror("prctl");
		return 1;
	}
	return 0;
}

int main(int argc, char *argv[])
{
	if (argc&lt;2)
		return usage(argv[0]);
	if (strcmp(argv[1], "get")==0)
		return getbcap();
	if (strcmp(argv[1], "drop")!=0 || argc&lt;3)
		return usage(argv[0]);
	if (capdrop(argv[2])) {
		printf("unknown capability\n");
		return 1;
	}
	return execl("/bin/bash", "/bin/bash", NULL);
}
************************************************************

[serue@us.ibm.com: fix typo]
Signed-off-by: Serge E. Hallyn &lt;serue@us.ibm.com&gt;
Signed-off-by: Andrew G. Morgan &lt;morgan@kernel.org&gt;
Cc: Stephen Smalley &lt;sds@tycho.nsa.gov&gt;
Cc: James Morris &lt;jmorris@namei.org&gt;
Cc: Chris Wright &lt;chrisw@sous-sol.org&gt;
Cc: Casey Schaufler &lt;casey@schaufler-ca.com&gt;a
Signed-off-by: "Serge E. Hallyn" &lt;serue@us.ibm.com&gt;
Tested-by: Jiri Slaby &lt;jirislaby@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>[AUDIT] add session id to audit messages</title>
<updated>2008-02-01T19:06:51+00:00</updated>
<author>
<name>Eric Paris</name>
<email>eparis@redhat.com</email>
</author>
<published>2008-01-08T15:06:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4746ec5b01ed07205a91e4f7ed9de9d70f371407'/>
<id>urn:sha1:4746ec5b01ed07205a91e4f7ed9de9d70f371407</id>
<content type='text'>
In order to correlate audit records to an individual login add a session
id.  This is incremented every time a user logs in and is included in
almost all messages which currently output the auid.  The field is
labeled ses=  or oses=

Signed-off-by: Eric Paris &lt;eparis@redhat.com&gt;
</content>
</entry>
<entry>
<title>[PATCH] get rid of loginuid races</title>
<updated>2008-02-01T19:05:28+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2008-01-10T09:53:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=bfef93a5d1fb5654fe2025276c55e202d10b5255'/>
<id>urn:sha1:bfef93a5d1fb5654fe2025276c55e202d10b5255</id>
<content type='text'>
Keeping loginuid in audit_context is racy and results in messier
code.  Taken to task_struct, out of the way of -&gt;audit_context
changes.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>ioprio: move io priority from task_struct to io_context</title>
<updated>2008-01-28T09:50:29+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>jens.axboe@oracle.com</email>
</author>
<published>2008-01-24T07:52:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fd0928df98b9578be8a786ac0cb78a47a5e17a20'/>
<id>urn:sha1:fd0928df98b9578be8a786ac0cb78a47a5e17a20</id>
<content type='text'>
This is where it belongs and then it doesn't take up space for a
process that doesn't do IO.

Signed-off-by: Jens Axboe &lt;jens.axboe@oracle.com&gt;
</content>
</entry>
<entry>
<title>sched: rt group scheduling</title>
<updated>2008-01-25T20:08:30+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2008-01-25T20:08:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=6f505b16425a51270058e4a93441fe64de3dd435'/>
<id>urn:sha1:6f505b16425a51270058e4a93441fe64de3dd435</id>
<content type='text'>
Extend group scheduling to also cover the realtime classes. It uses the time
limiting introduced by the previous patch to allow multiple realtime groups.

The hard time limit is required to keep behaviour deterministic.

The algorithms used make the realtime scheduler O(tg), linear scaling wrt the
number of task groups. This is the worst case behaviour I can't seem to get out
of, the avg. case of the algorithms can be improved, I focused on correctness
and worst case.

[ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: sched_rt_entity</title>
<updated>2008-01-25T20:08:27+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2008-01-25T20:08:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fa717060f1ab7eb6570f2fb49136f838fc9195a9'/>
<id>urn:sha1:fa717060f1ab7eb6570f2fb49136f838fc9195a9</id>
<content type='text'>
Move the task_struct members specific to rt scheduling together.
A future optimization could be to put sched_entity and sched_rt_entity
into a union.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
CC: Srivatsa Vaddagiri &lt;vatsa@linux.vnet.ibm.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>sched: add RT-balance cpu-weight</title>
<updated>2008-01-25T20:08:07+00:00</updated>
<author>
<name>Gregory Haskins</name>
<email>ghaskins@novell.com</email>
</author>
<published>2008-01-25T20:08:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=73fe6aae84400e2b475e2a1dc4e8592cd3ed6e69'/>
<id>urn:sha1:73fe6aae84400e2b475e2a1dc4e8592cd3ed6e69</id>
<content type='text'>
Some RT tasks (particularly kthreads) are bound to one specific CPU.
It is fairly common for two or more bound tasks to get queued up at the
same time.  Consider, for instance, softirq_timer and softirq_sched.  A
timer goes off in an ISR which schedules softirq_thread to run at RT50.
Then the timer handler determines that it's time to smp-rebalance the
system so it schedules softirq_sched to run.  So we are in a situation
where we have two RT50 tasks queued, and the system will go into
rt-overload condition to request other CPUs for help.

This causes two problems in the current code:

1) If a high-priority bound task and a low-priority unbounded task queue
   up behind the running task, we will fail to ever relocate the unbounded
   task because we terminate the search on the first unmovable task.

2) We spend precious futile cycles in the fast-path trying to pull
   overloaded tasks over.  It is therefore optimial to strive to avoid the
   overhead all together if we can cheaply detect the condition before
   overload even occurs.

This patch tries to achieve this optimization by utilizing the hamming
weight of the task-&gt;cpus_allowed mask.  A weight of 1 indicates that
the task cannot be migrated.  We will then utilize this information to
skip non-migratable tasks and to eliminate uncessary rebalance attempts.

We introduce a per-rq variable to count the number of migratable tasks
that are currently running.  We only go into overload if we have more
than one rt task, AND at least one of them is migratable.

In addition, we introduce a per-task variable to cache the cpus_allowed
weight, since the hamming calculation is probably relatively expensive.
We only update the cached value when the mask is updated which should be
relatively infrequent, especially compared to scheduling frequency
in the fast path.

Signed-off-by: Gregory Haskins &lt;ghaskins@novell.com&gt;
Signed-off-by: Steven Rostedt &lt;srostedt@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
</entry>
<entry>
<title>Isolate the explicit usage of signal-&gt;pgrp</title>
<updated>2007-10-19T18:53:43+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9a2e70572e94e21e7ec4186702d045415422bda0'/>
<id>urn:sha1:9a2e70572e94e21e7ec4186702d045415422bda0</id>
<content type='text'>
The pgrp field is not used widely around the kernel so it is now marked as
deprecated with appropriate comment.

The initialization of INIT_SIGNALS is trimmed because
a) they are set to 0 automatically;
b) gcc cannot properly initialize two anonymous (the second one
   is the one with the session) unions. In this particular case
   to make it compile we'd have to add some field initialized
   right before the .pgrp.

This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one
(from Cedric), but for the pgrp field.

Some progress report:

We have to deprecate the pid, tgid, session and pgrp fields on struct
task_struct and struct signal_struct.  The session and pgrp are already
deprecated.  The tgid value is close to being such - the worst known usage
in in fs/locks.c and audit code.  The pid field deprecation is mainly
blocked by numerous printk-s around the kernel that print the tsk-&gt;pid to
log.

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Cedric Le Goater &lt;clg@fr.ibm.com&gt;
Cc: Serge Hallyn &lt;serue@us.ibm.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Herbert Poetzl &lt;herbert@13thfloor.at&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid namespaces: remove the struct pid unneeded fields</title>
<updated>2007-10-19T18:53:40+00:00</updated>
<author>
<name>Pavel Emelyanov</name>
<email>xemul@openvz.org</email>
</author>
<published>2007-10-19T06:40:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=19b9b9b54e5f115907efd56be2c3799775a46561'/>
<id>urn:sha1:19b9b9b54e5f115907efd56be2c3799775a46561</id>
<content type='text'>
Since we've switched from using pid-&gt;nr to pid-&gt;upids-&gt;nr some
fields on struct pid are no longer needed

Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>pid namespaces: introduce struct upid</title>
<updated>2007-10-19T18:53:38+00:00</updated>
<author>
<name>Sukadev Bhattiprolu</name>
<email>sukadev@us.ibm.com</email>
</author>
<published>2007-10-19T06:40:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=4c3f2ead5a3dff9069a45560ba4d007c8ae2e2ee'/>
<id>urn:sha1:4c3f2ead5a3dff9069a45560ba4d007c8ae2e2ee</id>
<content type='text'>
Since task will be visible from different pid namespaces each of them have to
be addressed by multiple pids.  struct upid is to store the information about
which id refers to which namespace.

The constuciton looks like this.  Each struct pid carried the reference
counter and the list of tasks attached to this pid.  At its end it has a
variable length array of struct upid-s.  Each struct upid has a numerical id
(pid itself), pointer to the namespace, this ID is valid in and is hashed into
a pid_hash for searching the pids.

The nr and pid_chain fields are kept in struct pid for a while to make kernel
still work (no patch initialize the upids yet), but it will be removed at the
end of this series when we switch to upids completely.

Signed-off-by: Sukadev Bhattiprolu &lt;sukadev@us.ibm.com&gt;
Signed-off-by: Pavel Emelyanov &lt;xemul@openvz.org&gt;
Cc: Oleg Nesterov &lt;oleg@tv-sign.ru&gt;
Cc: Paul Menage &lt;menage@google.com&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
