<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/ipc/util.h, branch v6.6.131</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v6.6.131'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2023-01-28T00:08:00+00:00</updated>
<entry>
<title>ipc,namespace: batch free ipc_namespace structures</title>
<updated>2023-01-28T00:08:00+00:00</updated>
<author>
<name>Rik van Riel</name>
<email>riel@surriel.com</email>
</author>
<published>2023-01-27T18:46:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=da27f796a832122ee533c7685438dad1c4e338dd'/>
<id>urn:sha1:da27f796a832122ee533c7685438dad1c4e338dd</id>
<content type='text'>
Instead of waiting for an RCU grace period between each ipc_namespace
structure that is being freed, wait an RCU grace period for every batch
of ipc_namespace structures.

Thanks to Al Viro for the suggestion of the helper function.

This speeds up the run time of the test case that allocates ipc_namespaces
in a loop from 6 minutes, to a little over 1 second:

real	0m1.192s
user	0m0.038s
sys	0m1.152s

Signed-off-by: Rik van Riel &lt;riel@surriel.com&gt;
Reported-by: Chris Mason &lt;clm@meta.com&gt;
Tested-by: Giuseppe Scrivano &lt;gscrivan@redhat.com&gt;
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>ipc/msg: mitigate the lock contention with percpu counter</title>
<updated>2022-10-03T21:21:44+00:00</updated>
<author>
<name>Jiebin Sun</name>
<email>jiebin.sun@intel.com</email>
</author>
<published>2022-09-13T19:25:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=72d1e611082eda18689106a0c192f2827072713c'/>
<id>urn:sha1:72d1e611082eda18689106a0c192f2827072713c</id>
<content type='text'>
The msg_bytes and msg_hdrs atomic counters are frequently updated when IPC
msg queue is in heavy use, causing heavy cache bounce and overhead. 
Change them to percpu_counter greatly improve the performance.  Since
there is one percpu struct per namespace, additional memory cost is
minimal.  Reading of the count done in msgctl call, which is infrequent. 
So the need to sum up the counts in each CPU is infrequent.

Apply the patch and test the pts/stress-ng-1.4.0
-- system v message passing (160 threads).

Score gain: 3.99x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0
-- system v message passing (160 threads)

[akpm@linux-foundation.org: coding-style cleanups]
[jiebin.sun@intel.com: avoid negative value by overflow in msginfo]
  Link: https://lkml.kernel.org/r/20220920150809.4014944-1-jiebin.sun@intel.com
[akpm@linux-foundation.org: fix min() warnings]
Link: https://lkml.kernel.org/r/20220913192538.3023708-3-jiebin.sun@intel.com
Signed-off-by: Jiebin Sun &lt;jiebin.sun@intel.com&gt;
Reviewed-by: Tim Chen &lt;tim.c.chen@linux.intel.com&gt;
Cc: Alexander Mikhalitsyn &lt;alexander.mikhalitsyn@virtuozzo.com&gt;
Cc: Alexey Gladkov &lt;legion@kernel.org&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Dennis Zhou &lt;dennis@kernel.org&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Vasily Averin &lt;vasily.averin@linux.dev&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc/util.c: use binary search for max_idx</title>
<updated>2021-07-01T18:06:07+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2021-07-01T01:57:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=b869d5be0acf0e125e69adcffdca04000dc5b17c'/>
<id>urn:sha1:b869d5be0acf0e125e69adcffdca04000dc5b17c</id>
<content type='text'>
If semctl(), msgctl() and shmctl() are called with IPC_INFO, SEM_INFO,
MSG_INFO or SHM_INFO, then the return value is the index of the highest
used index in the kernel's internal array recording information about all
SysV objects of the requested type for the current namespace.  (This
information can be used with repeated ..._STAT or ..._STAT_ANY operations
to obtain information about all SysV objects on the system.)

There is a cache for this value.  But when the cache needs up be updated,
then the highest used index is determined by looping over all possible
values.  With the introduction of IPCMNI_EXTEND_SHIFT, this could be a
loop over 16 million entries.  And due to /proc/sys/kernel/*next_id, the
index values do not need to be consecutive.

With &lt;write 16000000 to msg_next_id&gt;, msgget(), msgctl(,IPC_RMID) in a
loop, I have observed a performance increase of around factor 13000.

As there is no get_last() function for idr structures: Implement a
"get_last()" using a binary search.

As far as I see, ipc is the only user that needs get_last(), thus
implement it in ipc/util.c and not in a central location.

[akpm@linux-foundation.org: tweak comment, fix typo]

Link: https://lkml.kernel.org/r/20210425075208.11777-2-manfred@colorfullife.com
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Acked-by: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: &lt;1vier1@web.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: fix sparc64 ipc() wrapper</title>
<updated>2019-09-07T19:42:25+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2019-09-05T14:48:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=fb377eb80c80339b580831a3c0fcce34a4c9d1ad'/>
<id>urn:sha1:fb377eb80c80339b580831a3c0fcce34a4c9d1ad</id>
<content type='text'>
Matt bisected a sparc64 specific issue with semctl, shmctl and msgctl
to a commit from my y2038 series in linux-5.1, as I missed the custom
sys_ipc() wrapper that sparc64 uses in place of the generic version that
I patched.

The problem is that the sys_{sem,shm,msg}ctl() functions in the kernel
now do not allow being called with the IPC_64 flag any more, resulting
in a -EINVAL error when they don't recognize the command.

Instead, the correct way to do this now is to call the internal
ksys_old_{sem,shm,msg}ctl() functions to select the API version.

As we generally move towards these functions anyway, change all of
sparc_ipc() to consistently use those in place of the sys_*() versions,
and move the required ksys_*() declarations into linux/syscalls.h

The IS_ENABLED(CONFIG_SYSVIPC) check is required to avoid link
errors when ipc is disabled.

Reported-by: Matt Turner &lt;mattst88@gmail.com&gt;
Fixes: 275f22148e87 ("ipc: rename old-style shmctl/semctl/msgctl syscalls")
Cc: stable@vger.kernel.org
Tested-by: Matt Turner &lt;mattst88@gmail.com&gt;
Tested-by: Anatoly Pugachev &lt;matorola@gmail.com&gt;
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>ipc: do cyclic id allocation for the ipc object.</title>
<updated>2019-05-15T02:52:52+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2019-05-14T22:46:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=99db46ea292780cd978d56932d9445b1e8bdafe8'/>
<id>urn:sha1:99db46ea292780cd978d56932d9445b1e8bdafe8</id>
<content type='text'>
For ipcmni_extend mode, the sequence number space is only 7 bits.  So
the chance of id reuse is relatively high compared with the non-extended
mode.

To alleviate this id reuse problem, this patch enables cyclic allocation
for the index to the radix tree (idx).  The disadvantage is that this
can cause a slight slow-down of the fast path, as the radix tree could
be higher than necessary.

To limit the radix tree height, I have chosen the following limits:
 1) The cycling is done over in_use*1.5.
 2) At least, the cycling is done over
   "normal" ipcnmi mode: RADIX_TREE_MAP_SIZE elements
   "ipcmni_extended": 4096 elements

Result:
- for normal mode:
	No change for &lt;= 42 active ipc elements. With more than 42
	active ipc elements, a 2nd level would be added to the radix
	tree.
	Without cyclic allocation, a 2nd level would be added only with
	more than 63 active elements.

- for extended mode:
	Cycling creates always at least a 2-level radix tree.
	With more than 2730 active objects, a 3rd level would be
	added, instead of &gt; 4095 active objects until the 3rd level
	is added without cyclic allocation.

For a 2-level radix tree compared to a 1-level radix tree, I have
observed &lt; 1% performance impact.

Notes:
1) Normal "x=semget();y=semget();" is unaffected: Then the idx
  is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic()
  is used.

2) The -1% happens in a microbenchmark after this situation:
	x=semget();
	for(i=0;i&lt;4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
	y=semget();
	Now perform semget calls on x and y that do not sleep.

3) The worst-case reuse cycle time is unfortunately unaffected:
   If you have 2^24-1 ipc objects allocated, and get/remove the last
   possible element in a loop, then the id is reused after 128
   get/remove pairs.

Performance check:
A microbenchmark that performes no-op semop() randomly on two IDs,
with only these two IDs allocated.
The IDs were set using /proc/sys/kernel/sem_next_id.
The test was run 5 times, averages are shown.

1 &amp; 2: Base (6.22 seconds for 10.000.000 semops)
1 &amp; 40: -0.2%
1 &amp; 3348: - 0.8%
1 &amp; 27348: - 1.6%
1 &amp; 15777204: - 3.2%

Or: ~12.6 cpu cycles per additional radix tree level.
The cpu is an Intel I3-5010U. ~1300 cpu cycles/syscall is slower
than what I remember (spectre impact?).

V2 of the patch:
- use "min" and "max"
- use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
	(2&lt;&lt;12).

[akpm@linux-foundation.org: fix max() warning]
Link: http://lkml.kernel.org/r/20190329204930.21620-3-longman@redhat.com
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: "Luis R. Rodriguez" &lt;mcgrof@kernel.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Takashi Iwai &lt;tiwai@suse.de&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: conserve sequence numbers in ipcmni_extend mode</title>
<updated>2019-05-15T02:52:52+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2019-05-14T22:46:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=3278a2c20cb302d27e6f6ee45a3f57361176e426'/>
<id>urn:sha1:3278a2c20cb302d27e6f6ee45a3f57361176e426</id>
<content type='text'>
Rewrite, based on the patch from Waiman Long:

The mixing in of a sequence number into the IPC IDs is probably to avoid
ID reuse in userspace as much as possible.  With ipcmni_extend mode, the
number of usable sequence numbers is greatly reduced leading to higher
chance of ID reuse.

To address this issue, we need to conserve the sequence number space as
much as possible.  Right now, the sequence number is incremented for
every new ID created.  In reality, we only need to increment the
sequence number when new allocated ID is not greater than the last one
allocated.  It is in such case that the new ID may collide with an
existing one.  This is being done irrespective of the ipcmni mode.

In order to avoid any races, the index is first allocated and then the
pointer is replaced.

Changes compared to the initial patch:
 - Handle failures from idr_alloc().
 - Avoid that concurrent operations can see the wrong sequence number.
   (This is achieved by using idr_replace()).
 - IPCMNI_SEQ_SHIFT is not a constant, thus renamed to
   ipcmni_seq_shift().
 - IPCMNI_SEQ_MAX is not a constant, thus renamed to ipcmni_seq_max().

Link: http://lkml.kernel.org/r/20190329204930.21620-2-longman@redhat.com
Signed-off-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Suggested-by: Matthew Wilcox &lt;willy@infradead.org&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Luis R. Rodriguez" &lt;mcgrof@kernel.org&gt;
Cc: Takashi Iwai &lt;tiwai@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: allow boot time extension of IPCMNI from 32k to 16M</title>
<updated>2019-05-15T02:52:52+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2019-05-14T22:46:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=5ac893b8cb10fe2a47a77780d37f9bf5b142854b'/>
<id>urn:sha1:5ac893b8cb10fe2a47a77780d37f9bf5b142854b</id>
<content type='text'>
The maximum number of unique System V IPC identifiers was limited to
32k.  That limit should be big enough for most use cases.

However, there are some users out there requesting for more, especially
those that are migrating from Solaris which uses 24 bits for unique
identifiers.  To satisfy the need of those users, a new boot time kernel
option "ipcmni_extend" is added to extend the IPCMNI value to 16M.  This
is a 512X increase which should be big enough for users out there that
need a large number of unique IPC identifier.

The use of this new option will change the pattern of the IPC
identifiers returned by functions like shmget(2).  An application that
depends on such pattern may not work properly.  So it should only be
used if the users really need more than 32k of unique IPC numbers.

This new option does have the side effect of reducing the maximum number
of unique sequence numbers from 64k down to 128.  So it is a trade-off.

The computation of a new IPC id is not done in the performance critical
path.  So a little bit of additional overhead shouldn't have any real
performance impact.

Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Manfred Spraul &lt;manfred@colorfullife.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Davidlohr Bueso &lt;dbueso@suse.de&gt;
Cc: "Eric W . Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "Luis R. Rodriguez" &lt;mcgrof@kernel.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Takashi Iwai &lt;tiwai@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>ipc: rename old-style shmctl/semctl/msgctl syscalls</title>
<updated>2019-01-25T16:22:50+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-12-31T21:22:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=275f22148e8720e84b180d9e0cdf8abfd69bac5b'/>
<id>urn:sha1:275f22148e8720e84b180d9e0cdf8abfd69bac5b</id>
<content type='text'>
The behavior of these system calls is slightly different between
architectures, as determined by the CONFIG_ARCH_WANT_IPC_PARSE_VERSION
symbol. Most architectures that implement the split IPC syscalls don't set
that symbol and only get the modern version, but alpha, arm, microblaze,
mips-n32, mips-n64 and xtensa expect the caller to pass the IPC_64 flag.

For the architectures that so far only implement sys_ipc(), i.e. m68k,
mips-o32, powerpc, s390, sh, sparc, and x86-32, we want the new behavior
when adding the split syscalls, so we need to distinguish between the
two groups of architectures.

The method I picked for this distinction is to have a separate system call
entry point: sys_old_*ctl() now uses ipc_parse_version, while sys_*ctl()
does not. The system call tables of the five architectures are changed
accordingly.

As an additional benefit, we no longer need the configuration specific
definition for ipc_parse_version(), it always does the same thing now,
but simply won't get called on architectures with the modern interface.

A small downside is that on architectures that do set
ARCH_WANT_IPC_PARSE_VERSION, we now have an extra set of entry points
that are never called. They only add a few bytes of bloat, so it seems
better to keep them compared to adding yet another Kconfig symbol.
I considered adding new syscall numbers for the IPC_64 variants for
consistency, but decided against that for now.

Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
<entry>
<title>ipc: IPCMNI limit check for semmni</title>
<updated>2018-10-31T15:54:14+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2018-10-30T22:07:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=8c81ddd2acd2c10979f5a64f6784ce7c6717495e'/>
<id>urn:sha1:8c81ddd2acd2c10979f5a64f6784ce7c6717495e</id>
<content type='text'>
For SysV semaphores, the semmni value is the last part of the 4-element
sem number array.  To make semmni behave in a similar way to msgmni and
shmmni, we can't directly use the _minmax handler.  Instead, a special sem
specific handler is added to check the last argument to make sure that it
is limited to the [0, IPCMNI] range.  An error will be returned if this is
not the case.

Link: http://lkml.kernel.org/r/1536352137-12003-3-git-send-email-longman@redhat.com
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Reviewed-by: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Luis R. Rodriguez &lt;mcgrof@kernel.org&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Takashi Iwai &lt;tiwai@suse.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>y2038: globally rename compat_time to old_time32</title>
<updated>2018-08-27T12:48:48+00:00</updated>
<author>
<name>Arnd Bergmann</name>
<email>arnd@arndb.de</email>
</author>
<published>2018-07-13T10:52:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=9afc5eee65ca7d717a99d6fe8f4adfe32a40940a'/>
<id>urn:sha1:9afc5eee65ca7d717a99d6fe8f4adfe32a40940a</id>
<content type='text'>
Christoph Hellwig suggested a slightly different path for handling
backwards compatibility with the 32-bit time_t based system calls:

Rather than simply reusing the compat_sys_* entry points on 32-bit
architectures unchanged, we get rid of those entry points and the
compat_time types by renaming them to something that makes more sense
on 32-bit architectures (which don't have a compat mode otherwise),
and then share the entry points under the new name with the 64-bit
architectures that use them for implementing the compatibility.

The following types and interfaces are renamed here, and moved
from linux/compat_time.h to linux/time32.h:

old				new
---				---
compat_time_t			old_time32_t
struct compat_timeval		struct old_timeval32
struct compat_timespec		struct old_timespec32
struct compat_itimerspec	struct old_itimerspec32
ns_to_compat_timeval()		ns_to_old_timeval32()
get_compat_itimerspec64()	get_old_itimerspec32()
put_compat_itimerspec64()	put_old_itimerspec32()
compat_get_timespec64()		get_old_timespec32()
compat_put_timespec64()		put_old_timespec32()

As we already have aliases in place, this patch addresses only the
instances that are relevant to the system call interface in particular,
not those that occur in device drivers and other modules. Those
will get handled separately, while providing the 64-bit version
of the respective interfaces.

I'm not renaming the timex, rusage and itimerval structures, as we are
still debating what the new interface will look like, and whether we
will need a replacement at all.

This also doesn't change the names of the syscall entry points, which can
be done more easily when we actually switch over the 32-bit architectures
to use them, at that point we need to change COMPAT_SYSCALL_DEFINEx to
SYSCALL_DEFINEx with a new name, e.g. with a _time32 suffix.

Suggested-by: Christoph Hellwig &lt;hch@infradead.org&gt;
Link: https://lore.kernel.org/lkml/20180705222110.GA5698@infradead.org/
Signed-off-by: Arnd Bergmann &lt;arnd@arndb.de&gt;
</content>
</entry>
</feed>
