From 02d43b1d13a0a55a75bb5c5f98d2b13dbe71ecf9 Mon Sep 17 00:00:00 2001 From: Mathieu Desnoyers Date: Mon, 1 Dec 2008 05:46:38 -0500 Subject: documentation: local_ops fix on_each_cpu Impact: update code example in documentation * Dr. David Alan Gilbert (dave@treblig.org) wrote: [...] > I noticed while looking at something else that the example in > local_ops.txt still has the 4 operand on_each_cpu in the latest git; > I don't know the rest of the code around there very well so I thought > it best to mention it rather than post a patch. Reported-by: Dr. David Alan Gilbert Signed-off-by: Mathieu Desnoyers Signed-off-by: Ingo Molnar --- Documentation/local_ops.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/local_ops.txt b/Documentation/local_ops.txt index f4f8b1c6c8ba..23045b8b50f0 100644 --- a/Documentation/local_ops.txt +++ b/Documentation/local_ops.txt @@ -149,7 +149,7 @@ static void do_test_timer(unsigned long data) int cpu; /* Increment the counters */ - on_each_cpu(test_each, NULL, 0, 1); + on_each_cpu(test_each, NULL, 1); /* Read all the counters */ printk("Counters read from CPU %d\n", smp_processor_id()); for_each_online_cpu(cpu) { -- cgit v1.2.3 From a2eee69b814854095ed835a6eb64b8efc220cd6a Mon Sep 17 00:00:00 2001 From: Mark Fasheh Date: Tue, 18 Nov 2008 15:08:42 -0800 Subject: ocfs2: Small documentation update Remove some features from the "not-supported" list that are actually supported now. Signed-off-by: Mark Fasheh --- Documentation/filesystems/ocfs2.txt | 3 --- 1 file changed, 3 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index 4340cc825796..67310fbbb7df 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -28,10 +28,7 @@ Manish Singh Caveats ======= Features which OCFS2 does not support yet: - - extended attributes - quotas - - cluster aware flock - - cluster aware lockf - Directory change notification (F_NOTIFY) - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease) - POSIX ACLs -- cgit v1.2.3 From dc8c214a9c37eb288b1c4782632649e55d251c68 Mon Sep 17 00:00:00 2001 From: roel kluin Date: Mon, 1 Dec 2008 13:13:51 -0800 Subject: spi documentation: use __initdata on struct Use __initdata for data, not __init. Signed-off-by: Roel Kluin Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/spi/spi-summary | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary index 8bae2f018d34..0f5122eb282b 100644 --- a/Documentation/spi/spi-summary +++ b/Documentation/spi/spi-summary @@ -215,7 +215,7 @@ So for example arch/.../mach-*/board-*.c files might have code like: /* if your mach-* infrastructure doesn't support kernels that can * run on multiple boards, pdata wouldn't benefit from "__init". */ - static struct mysoc_spi_data __init pdata = { ... }; + static struct mysoc_spi_data __initdata pdata = { ... }; static __init board_init(void) { -- cgit v1.2.3 From 7ef9964e6d1b911b78709f144000aacadd0ebc21 Mon Sep 17 00:00:00 2001 From: Davide Libenzi Date: Mon, 1 Dec 2008 13:13:55 -0800 Subject: epoll: introduce resource usage limits It has been thought that the per-user file descriptors limit would also limit the resources that a normal user can request via the epoll interface. Vegard Nossum reported a very simple program (a modified version attached) that can make a normal user to request a pretty large amount of kernel memory, well within the its maximum number of fds. To solve such problem, default limits are now imposed, and /proc based configuration has been introduced. A new directory has been created, named /proc/sys/fs/epoll/ and inside there, there are two configuration points: max_user_instances = Maximum number of devices - per user max_user_watches = Maximum number of "watched" fds - per user The current default for "max_user_watches" limits the memory used by epoll to store "watches", to 1/32 of the amount of the low RAM. As example, a 256MB 32bit machine, will have "max_user_watches" set to roughly 90000. That should be enough to not break existing heavy epoll users. The default value for "max_user_instances" is set to 128, that should be enough too. This also changes the userspace, because a new error code can now come out from EPOLL_CTL_ADD (-ENOSPC). The EMFILE from epoll_create() was already listed, so that should be ok. [akpm@linux-foundation.org: use get_current_user()] Signed-off-by: Davide Libenzi Cc: Michael Kerrisk Cc: Cc: Cyrill Gorcunov Reported-by: Vegard Nossum Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 27 ++++++++++++ fs/eventpoll.c | 85 ++++++++++++++++++++++++++++++++++---- include/linux/sched.h | 4 ++ kernel/sysctl.c | 10 +++++ 4 files changed, 118 insertions(+), 8 deletions(-) (limited to 'Documentation') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index bcceb99b81dd..bb1b0dd3bfcb 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -44,6 +44,7 @@ Table of Contents 2.14 /proc//io - Display the IO accounting fields 2.15 /proc//coredump_filter - Core dump filtering settings 2.16 /proc//mountinfo - Information about mounts + 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface ------------------------------------------------------------------------------ Preface @@ -2483,4 +2484,30 @@ For more information on mount propagation see: Documentation/filesystems/sharedsubtree.txt +2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface +-------------------------------------------------------- + +This directory contains configuration options for the epoll(7) interface. + +max_user_instances +------------------ + +This is the maximum number of epoll file descriptors that a single user can +have open at a given time. The default value is 128, and should be enough +for normal users. + +max_user_watches +---------------- + +Every epoll file descriptor can store a number of files to be monitored +for event readiness. Each one of these monitored files constitutes a "watch". +This configuration option sets the maximum number of "watches" that are +allowed for each user. +Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes +on a 64bit one. +The current default value for max_user_watches is the 1/32 of the available +low memory, divided for the "watch" cost in bytes. + + ------------------------------------------------------------------------------ + diff --git a/fs/eventpoll.c b/fs/eventpoll.c index aec5c13f6341..96355d505347 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -102,6 +102,8 @@ #define EP_UNACTIVE_PTR ((void *) -1L) +#define EP_ITEM_COST (sizeof(struct epitem) + sizeof(struct eppoll_entry)) + struct epoll_filefd { struct file *file; int fd; @@ -200,6 +202,9 @@ struct eventpoll { * holding ->lock. */ struct epitem *ovflist; + + /* The user that created the eventpoll descriptor */ + struct user_struct *user; }; /* Wait structure used by the poll hooks */ @@ -226,10 +231,18 @@ struct ep_pqueue { struct epitem *epi; }; +/* + * Configuration options available inside /proc/sys/fs/epoll/ + */ +/* Maximum number of epoll devices, per user */ +static int max_user_instances __read_mostly; +/* Maximum number of epoll watched descriptors, per user */ +static int max_user_watches __read_mostly; + /* * This mutex is used to serialize ep_free() and eventpoll_release_file(). */ -static struct mutex epmutex; +static DEFINE_MUTEX(epmutex); /* Safe wake up implementation */ static struct poll_safewake psw; @@ -240,6 +253,33 @@ static struct kmem_cache *epi_cache __read_mostly; /* Slab cache used to allocate "struct eppoll_entry" */ static struct kmem_cache *pwq_cache __read_mostly; +#ifdef CONFIG_SYSCTL + +#include + +static int zero; + +ctl_table epoll_table[] = { + { + .procname = "max_user_instances", + .data = &max_user_instances, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_minmax, + .extra1 = &zero, + }, + { + .procname = "max_user_watches", + .data = &max_user_watches, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = &proc_dointvec_minmax, + .extra1 = &zero, + }, + { .ctl_name = 0 } +}; +#endif /* CONFIG_SYSCTL */ + /* Setup the structure that is used as key for the RB tree */ static inline void ep_set_ffd(struct epoll_filefd *ffd, @@ -402,6 +442,8 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi) /* At this point it is safe to free the eventpoll item */ kmem_cache_free(epi_cache, epi); + atomic_dec(&ep->user->epoll_watches); + DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_remove(%p, %p)\n", current, ep, file)); @@ -449,6 +491,8 @@ static void ep_free(struct eventpoll *ep) mutex_unlock(&epmutex); mutex_destroy(&ep->mtx); + atomic_dec(&ep->user->epoll_devs); + free_uid(ep->user); kfree(ep); } @@ -532,10 +576,19 @@ void eventpoll_release_file(struct file *file) static int ep_alloc(struct eventpoll **pep) { - struct eventpoll *ep = kzalloc(sizeof(*ep), GFP_KERNEL); + int error; + struct user_struct *user; + struct eventpoll *ep; - if (!ep) - return -ENOMEM; + user = get_current_user(); + error = -EMFILE; + if (unlikely(atomic_read(&user->epoll_devs) >= + max_user_instances)) + goto free_uid; + error = -ENOMEM; + ep = kzalloc(sizeof(*ep), GFP_KERNEL); + if (unlikely(!ep)) + goto free_uid; spin_lock_init(&ep->lock); mutex_init(&ep->mtx); @@ -544,12 +597,17 @@ static int ep_alloc(struct eventpoll **pep) INIT_LIST_HEAD(&ep->rdllist); ep->rbr = RB_ROOT; ep->ovflist = EP_UNACTIVE_PTR; + ep->user = user; *pep = ep; DNPRINTK(3, (KERN_INFO "[%p] eventpoll: ep_alloc() ep=%p\n", current, ep)); return 0; + +free_uid: + free_uid(user); + return error; } /* @@ -703,9 +761,11 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event, struct epitem *epi; struct ep_pqueue epq; - error = -ENOMEM; + if (unlikely(atomic_read(&ep->user->epoll_watches) >= + max_user_watches)) + return -ENOSPC; if (!(epi = kmem_cache_alloc(epi_cache, GFP_KERNEL))) - goto error_return; + return -ENOMEM; /* Item initialization follow here ... */ INIT_LIST_HEAD(&epi->rdllink); @@ -735,6 +795,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event, * install process. Namely an allocation for a wait queue failed due * high memory pressure. */ + error = -ENOMEM; if (epi->nwait < 0) goto error_unregister; @@ -765,6 +826,8 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event, spin_unlock_irqrestore(&ep->lock, flags); + atomic_inc(&ep->user->epoll_watches); + /* We have to call this outside the lock */ if (pwake) ep_poll_safewake(&psw, &ep->poll_wait); @@ -789,7 +852,7 @@ error_unregister: spin_unlock_irqrestore(&ep->lock, flags); kmem_cache_free(epi_cache, epi); -error_return: + return error; } @@ -1078,6 +1141,7 @@ asmlinkage long sys_epoll_create1(int flags) flags & O_CLOEXEC); if (fd < 0) ep_free(ep); + atomic_inc(&ep->user->epoll_devs); error_return: DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n", @@ -1299,7 +1363,12 @@ asmlinkage long sys_epoll_pwait(int epfd, struct epoll_event __user *events, static int __init eventpoll_init(void) { - mutex_init(&epmutex); + struct sysinfo si; + + si_meminfo(&si); + max_user_instances = 128; + max_user_watches = (((si.totalram - si.totalhigh) / 32) << PAGE_SHIFT) / + EP_ITEM_COST; /* Initialize the structure used to perform safe poll wait head wake ups */ ep_poll_safewake_init(&psw); diff --git a/include/linux/sched.h b/include/linux/sched.h index 644ffbda17ca..55e30d114477 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -630,6 +630,10 @@ struct user_struct { atomic_t inotify_watches; /* How many inotify watches does this user have? */ atomic_t inotify_devs; /* How many inotify devs does this user have opened? */ #endif +#ifdef CONFIG_EPOLL + atomic_t epoll_devs; /* The number of epoll descriptors currently open */ + atomic_t epoll_watches; /* The number of file descriptors currently watched */ +#endif #ifdef CONFIG_POSIX_MQUEUE /* protected by mq_lock */ unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */ diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 9d048fa2d902..3d56fe7570da 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -176,6 +176,9 @@ extern struct ctl_table random_table[]; #ifdef CONFIG_INOTIFY_USER extern struct ctl_table inotify_table[]; #endif +#ifdef CONFIG_EPOLL +extern struct ctl_table epoll_table[]; +#endif #ifdef HAVE_ARCH_PICK_MMAP_LAYOUT int sysctl_legacy_va_layout; @@ -1325,6 +1328,13 @@ static struct ctl_table fs_table[] = { .child = inotify_table, }, #endif +#ifdef CONFIG_EPOLL + { + .procname = "epoll", + .mode = 0555, + .child = epoll_table, + }, +#endif #endif { .ctl_name = KERN_SETUID_DUMPABLE, -- cgit v1.2.3 From 1d678f365dae28420fa7329a2a35390b3582678d Mon Sep 17 00:00:00 2001 From: FUJITA Tomonori Date: Mon, 1 Dec 2008 13:14:01 -0800 Subject: DMA-API.txt: fix description of pci_map_sg/dma_map_sg scatterlists handling - pci_map_sg/dma_map_sg are used with a scatter gather list that doesn't come from the block layer (e.g. some network drivers do). - how IOMMUs merge adjacent elements of the scatter/gather list is independent of how the block layer determines sees elements. Signed-off-by: FUJITA Tomonori Cc: James Bottomley Cc: Tejun Heo Cc: Jens Axboe Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/DMA-API.txt | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index b8e86460046e..b462bb149543 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt @@ -316,12 +316,10 @@ reduce current DMA mapping usage or delay and try again later). pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, int nents, int direction) -Maps a scatter gather list from the block layer. - Returns: the number of physical segments mapped (this may be shorter -than passed in if the block layer determines that some -elements of the scatter/gather list are physically adjacent and thus -may be mapped with a single entry). +than passed in if some elements of the scatter/gather list are +physically or virtually adjacent and an IOMMU maps them with a single +entry). Please note that the sg cannot be mapped again if it has been mapped once. The mapping process is allowed to destroy information in the sg. -- cgit v1.2.3