<feed xmlns='http://www.w3.org/2005/Atom'>
<title>kernel/linux.git/include/uapi/linux/prctl.h, branch v5.16.1</title>
<subtitle>Linux kernel stable tree (mirror)</subtitle>
<id>https://git.radix-linux.su/kernel/linux.git/atom?h=v5.16.1</id>
<link rel='self' href='https://git.radix-linux.su/kernel/linux.git/atom?h=v5.16.1'/>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/'/>
<updated>2021-11-11T00:10:47+00:00</updated>
<entry>
<title>Merge tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux</title>
<updated>2021-11-11T00:10:47+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-11-11T00:10:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=a41b74451b35f7a6529689760eb8c05241feecbc'/>
<id>urn:sha1:a41b74451b35f7a6529689760eb8c05241feecbc</id>
<content type='text'>
Pull prctl updates from Christian Brauner:
 "This contains the missing prctl uapi pieces for PR_SCHED_CORE.

  In order to activate core scheduling the caller is expected to specify
  the scope of the new core scheduling domain.

  For example, passing 2 in the 4th argument of

     prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, &lt;pid&gt;,  2, 0);

  would indicate that the new core scheduling domain encompasses all
  tasks in the process group of &lt;pid&gt;. Specifying 0 would only create a
  core scheduling domain for the thread identified by &lt;pid&gt; and 2 would
  encompass the whole thread-group of &lt;pid&gt;.

  Note, the values 0, 1, and 2 correspond to PIDTYPE_PID, PIDTYPE_TGID,
  and PIDTYPE_PGID. A first version tried to expose those values
  directly to which I objected because:

   - PIDTYPE_* is an enum that is kernel internal which we should not
     expose to userspace directly.

   - PIDTYPE_* indicates what a given struct pid is used for it doesn't
     express a scope.

  But what the 4th argument of PR_SCHED_CORE prctl() expresses is the
  scope of the operation, i.e. the scope of the core scheduling domain
  at creation time. So Eugene's patch now simply introduces three new
  defines PR_SCHED_CORE_SCOPE_THREAD, PR_SCHED_CORE_SCOPE_THREAD_GROUP,
  and PR_SCHED_CORE_SCOPE_PROCESS_GROUP. They simply express what
  happens.

  This has been on the mailing list for quite a while with all relevant
  scheduler folks Cced. I announced multiple times that I'd pick this up
  if I don't see or her anyone else doing it. None of this touches
  proper scheduler code but only concerns uapi so I think this is fine.

  With core scheduling being quite common now for vm managers (e.g.
  moving individual vcpu threads into their own core scheduling domain)
  and container managers (e.g. moving the init process into its own core
  scheduling domain and letting all created children inherit it) having
  to rely on raw numbers passed as the 4th argument in prctl() is a bit
  annoying and everyone is starting to come up with their own defines"

* tag 'kernel.sys.v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument
</content>
</entry>
<entry>
<title>arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long</title>
<updated>2021-11-08T10:04:33+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2021-11-05T23:08:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=aedad3e1c6ddec234b63cfb57ac231da0f680e50'/>
<id>urn:sha1:aedad3e1c6ddec234b63cfb57ac231da0f680e50</id>
<content type='text'>
This constant was previously an unsigned long, but was changed
into an int in commit 433c38f40f6a ("arm64: mte: change ASYNC and
SYNC TCF settings into bitfields"). This ended up causing spurious
unsigned-signed comparison warnings in expressions such as:

(x &amp; PR_MTE_TCF_MASK) != PR_MTE_TCF_NONE

Therefore, change it back into an unsigned long to silence these
warnings.

Link: https://linux-review.googlesource.com/id/I07a72310db30227a5b7d789d0b817d78b657c639
Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Link: https://lore.kernel.org/r/20211105230829.2254790-1-pcc@google.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
</entry>
<entry>
<title>uapi/linux/prctl: provide macro definitions for the PR_SCHED_CORE type argument</title>
<updated>2021-09-29T11:00:05+00:00</updated>
<author>
<name>Eugene Syromiatnikov</name>
<email>esyr@redhat.com</email>
</author>
<published>2021-08-25T17:06:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=61bc346ce64a3864ac55f5d18bdc1572cda4fb18'/>
<id>urn:sha1:61bc346ce64a3864ac55f5d18bdc1572cda4fb18</id>
<content type='text'>
Commit 7ac592aa35a684ff ("sched: prctl() core-scheduling interface")
made use of enum pid_type in prctl's arg4; this type and the associated
enumeration definitions are not exposed to userspace.  Christian
has suggested to provide additional macro definitions that convey
the meaning of the type argument more in alignment with its actual
usage, and this patch does exactly that.

Link: https://lore.kernel.org/r/20210825170613.GA3884@asgard.redhat.com
Suggested-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: Eugene Syromiatnikov &lt;esyr@redhat.com&gt;
Complements: 7ac592aa35a684ff ("sched: prctl() core-scheduling interface")
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux</title>
<updated>2021-09-01T22:04:29+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2021-09-01T22:04:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=57c78a234e809e3a0516491e37ae5ccc6eeb21e8'/>
<id>urn:sha1:57c78a234e809e3a0516491e37ae5ccc6eeb21e8</id>
<content type='text'>
Pull arm64 updates from Catalin Marinas:

 - Support for 32-bit tasks on asymmetric AArch32 systems (on top of the
   scheduler changes merged via the tip tree).

 - More entry.S clean-ups and conversion to C.

 - MTE updates: allow a preferred tag checking mode to be set per CPU
   (the overhead of synchronous mode is smaller for some CPUs than
   others); optimisations for kernel entry/exit path; optionally disable
   MTE on the kernel command line.

 - Kselftest improvements for SVE and signal handling, PtrAuth.

 - Fix unlikely race where a TLBI could use stale ASID on an ASID
   roll-over (found by inspection).

 - Miscellaneous fixes: disable trapping of PMSNEVFR_EL1 to higher
   exception levels; drop unnecessary sigdelsetmask() call in the
   signal32 handling; remove BUG_ON when failing to allocate SVE state
   (just signal the process); SYM_CODE annotations.

 - Other trivial clean-ups: use macros instead of magic numbers, remove
   redundant returns, typos.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (56 commits)
  arm64: Do not trap PMSNEVFR_EL1
  arm64: mm: fix comment typo of pud_offset_phys()
  arm64: signal32: Drop pointless call to sigdelsetmask()
  arm64/sve: Better handle failure to allocate SVE register storage
  arm64: Document the requirement for SCR_EL3.HCE
  arm64: head: avoid over-mapping in map_memory
  arm64/sve: Add a comment documenting the binutils needed for SVE asm
  arm64/sve: Add some comments for sve_save/load_state()
  kselftest/arm64: signal: Add a TODO list for signal handling tests
  kselftest/arm64: signal: Add test case for SVE register state in signals
  kselftest/arm64: signal: Verify that signals can't change the SVE vector length
  kselftest/arm64: signal: Check SVE signal frame shows expected vector length
  kselftest/arm64: signal: Support signal frames with SVE register data
  kselftest/arm64: signal: Add SVE to the set of features we can check for
  arm64: replace in_irq() with in_hardirq()
  kselftest/arm64: pac: Fix skipping of tests on systems without PAC
  Documentation: arm64: describe asymmetric 32-bit support
  arm64: Remove logic to kill 32-bit tasks on 64-bit-only cores
  arm64: Hook up cmdline parameter to allow mismatched 32-bit EL0
  arm64: Advertise CPUs capable of running 32-bit applications in sysfs
  ...
</content>
</entry>
<entry>
<title>arm64: mte: change ASYNC and SYNC TCF settings into bitfields</title>
<updated>2021-07-28T17:33:43+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2021-07-27T20:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=433c38f40f6a81cf3988b9372f2983912737f322'/>
<id>urn:sha1:433c38f40f6a81cf3988b9372f2983912737f322</id>
<content type='text'>
Allow the user program to specify both ASYNC and SYNC TCF modes by
repurposing the existing constants as bitfields. This will allow the
kernel to select one of the modes on behalf of the user program. With
this patch the kernel will always select async mode, but a subsequent
patch will make this configurable.

Link: https://linux-review.googlesource.com/id/Icc5923c85a8ea284588cc399ae74fd19ec291230
Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Reviewed-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Link: https://lore.kernel.org/r/20210727205300.2554659-3-pcc@google.com
Acked-by: Will Deacon &lt;will@kernel.org&gt;
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>x86, prctl: Hook L1D flushing in via prctl</title>
<updated>2021-07-28T09:42:25+00:00</updated>
<author>
<name>Balbir Singh</name>
<email>sblbir@amazon.com</email>
</author>
<published>2021-01-08T12:10:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=e893bb1bb4d2eb635eba61e5d9c5135d96855773'/>
<id>urn:sha1:e893bb1bb4d2eb635eba61e5d9c5135d96855773</id>
<content type='text'>
Use the existing PR_GET/SET_SPECULATION_CTRL API to expose the L1D flush
capability. For L1D flushing PR_SPEC_FORCE_DISABLE and
PR_SPEC_DISABLE_NOEXEC are not supported.

Enabling L1D flush does not check if the task is running on an SMT enabled
core, rather a check is done at runtime (at the time of flush), if the task
runs on a SMT sibling then the task is sent a SIGBUS which is executed
before the task returns to user space or to a guest.

This is better than the other alternatives of:

  a. Ensuring strict affinity of the task (hard to enforce without further
     changes in the scheduler)

  b. Silently skipping flush for tasks that move to SMT enabled cores.

Hook up the core prctl and implement the x86 specific parts which in turn
makes it functional.

Suggested-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Balbir Singh &lt;sblbir@amazon.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210108121056.21940-5-sblbir@amazon.com
</content>
</entry>
<entry>
<title>sched: prctl() core-scheduling interface</title>
<updated>2021-05-12T09:43:31+00:00</updated>
<author>
<name>Chris Hyser</name>
<email>chris.hyser@oracle.com</email>
</author>
<published>2021-03-24T21:40:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=7ac592aa35a684ff1858fb9ec282886b9e3575ac'/>
<id>urn:sha1:7ac592aa35a684ff1858fb9ec282886b9e3575ac</id>
<content type='text'>
This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and that
requires a degree of flexibility in the API.

From a security perspective (and there are others), the thread,
process and process group distinction is an existent hierarchal
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pidtype used to specify the scope of the targeted
tasks. For example, PIDTYPE_TGID will share the cookie with the
process and all of it's threads as typically desired in a security
scenario.

API:

  prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &amp;cookie)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
  prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machines lacks SMT.

[peterz: complete rewrite]
Signed-off-by: Chris Hyser &lt;chris.hyser@oracle.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Don Hiatt &lt;dhiatt@digitalocean.com&gt;
Tested-by: Hongyu Ning &lt;hongyu.ning@linux.intel.com&gt;
Tested-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
</content>
</entry>
<entry>
<title>arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)</title>
<updated>2021-04-13T16:31:44+00:00</updated>
<author>
<name>Peter Collingbourne</name>
<email>pcc@google.com</email>
</author>
<published>2021-03-19T03:10:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=201698626fbca1cf1a3b686ba14cf2a056500716'/>
<id>urn:sha1:201698626fbca1cf1a3b686ba14cf2a056500716</id>
<content type='text'>
This change introduces a prctl that allows the user program to control
which PAC keys are enabled in a particular task. The main reason
why this is useful is to enable a userspace ABI that uses PAC to
sign and authenticate function pointers and other pointers exposed
outside of the function, while still allowing binaries conforming
to the ABI to interoperate with legacy binaries that do not sign or
authenticate pointers.

The idea is that a dynamic loader or early startup code would issue
this prctl very early after establishing that a process may load legacy
binaries, but before executing any PAC instructions.

This change adds a small amount of overhead to kernel entry and exit
due to additional required instruction sequences.

On a DragonBoard 845c (Cortex-A75) with the powersave governor, the
overhead of similar instruction sequences was measured as 4.9ns when
simulating the common case where IA is left enabled, or 43.7ns when
simulating the uncommon case where IA is disabled. These numbers can
be seen as the worst case scenario, since in more realistic scenarios
a better performing governor would be used and a newer chip would be
used that would support PAC unlike Cortex-A75 and would be expected
to be faster than Cortex-A75.

On an Apple M1 under a hypervisor, the overhead of the entry/exit
instruction sequences introduced by this patch was measured as 0.3ns
in the case where IA is left enabled, and 33.0ns in the case where
IA is disabled.

Signed-off-by: Peter Collingbourne &lt;pcc@google.com&gt;
Reviewed-by: Dave Martin &lt;Dave.Martin@arm.com&gt;
Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5
Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
</content>
</entry>
<entry>
<title>entry: Use different define for selector variable in SUD</title>
<updated>2021-02-05T23:21:42+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@collabora.com</email>
</author>
<published>2021-02-05T18:43:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=36a6c843fd0d8e02506681577e96dabd203dd8e8'/>
<id>urn:sha1:36a6c843fd0d8e02506681577e96dabd203dd8e8</id>
<content type='text'>
Michael Kerrisk suggested that, from an API perspective, it is a bad
idea to share the PR_SYS_DISPATCH_ defines between the prctl operation
and the selector variable.

Therefore, define two new constants to be used by SUD's selector variable
and update the corresponding documentation and test cases.

While this changes the API syscall user dispatch has never been part of a
Linux release, it will show up for the first time in 5.11.

Suggested-by: Michael Kerrisk (man-pages) &lt;mtk.manpages@gmail.com&gt;
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Link: https://lore.kernel.org/r/20210205184321.2062251-1-krisman@collabora.com


</content>
</entry>
<entry>
<title>kernel: Implement selective syscall userspace redirection</title>
<updated>2020-12-02T14:07:56+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@collabora.com</email>
</author>
<published>2020-11-27T19:32:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.radix-linux.su/kernel/linux.git/commit/?id=1446e1df9eb183fdf81c3f0715402f1d7595d4cb'/>
<id>urn:sha1:1446e1df9eb183fdf81c3f0715402f1d7595d4cb</id>
<content type='text'>
Introduce a mechanism to quickly disable/enable syscall handling for a
specific process and redirect to userspace via SIGSYS.  This is useful
for processes with parts that require syscall redirection and parts that
don't, but who need to perform this boundary crossing really fast,
without paying the cost of a system call to reconfigure syscall handling
on each boundary transition.  This is particularly important for Windows
games running over Wine.

The proposed interface looks like this:

  prctl(PR_SET_SYSCALL_USER_DISPATCH, &lt;op&gt;, &lt;off&gt;, &lt;length&gt;, [selector])

The range [&lt;offset&gt;,&lt;offset&gt;+&lt;length&gt;) is a part of the process memory
map that is allowed to by-pass the redirection code and dispatch
syscalls directly, such that in fast paths a process doesn't need to
disable the trap nor the kernel has to check the selector.  This is
essential to return from SIGSYS to a blocked area without triggering
another SIGSYS from rt_sigreturn.

selector is an optional pointer to a char-sized userspace memory region
that has a key switch for the mechanism. This key switch is set to
either PR_SYS_DISPATCH_ON, PR_SYS_DISPATCH_OFF to enable and disable the
redirection without calling the kernel.

The feature is meant to be set per-thread and it is disabled on
fork/clone/execv.

Internally, this doesn't add overhead to the syscall hot path, and it
requires very little per-architecture support.  I avoided using seccomp,
even though it duplicates some functionality, due to previous feedback
that maybe it shouldn't mix with seccomp since it is not a security
mechanism.  And obviously, this should never be considered a security
mechanism, since any part of the program can by-pass it by using the
syscall dispatcher.

For the sysinfo benchmark, which measures the overhead added to
executing a native syscall that doesn't require interception, the
overhead using only the direct dispatcher region to issue syscalls is
pretty much irrelevant.  The overhead of using the selector goes around
40ns for a native (unredirected) syscall in my system, and it is (as
expected) dominated by the supervisor-mode user-address access.  In
fact, with SMAP off, the overhead is consistently less than 5ns on my
test box.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@collabora.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Andy Lutomirski &lt;luto@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Link: https://lore.kernel.org/r/20201127193238.821364-4-krisman@collabora.com

</content>
</entry>
</feed>
