Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- VHE optimizations
- EL2 address space randomization
- speculative execution mitigations ("variant 3a", aka execution past
invalid privilege register access)
- bugfixes and cleanups
PPC:
- improvements for the radix page fault handler for HV KVM on POWER9
s390:
- more kvm stat counters
- virtio gpu plumbing
- documentation
- facilities improvements
x86:
- support for VMware magic I/O port and pseudo-PMCs
- AMD pause loop exiting
- support for AMD core performance extensions
- support for synchronous register access
- expose nVMX capabilities to userspace
- support for Hyper-V signaling via eventfd
- use Enlightened VMCS when running on Hyper-V
- allow userspace to disable MWAIT/HLT/PAUSE vmexits
- usual roundup of optimizations and nested virtualization bugfixes
Generic:
- API selftest infrastructure (though the only tests are for x86 as
of now)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
kvm: x86: fix a prototype warning
kvm: selftests: add sync_regs_test
kvm: selftests: add API testing infrastructure
kvm: x86: fix a compile warning
KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
KVM: X86: Introduce handle_ud()
KVM: vmx: unify adjacent #ifdefs
x86: kvm: hide the unused 'cpu' variable
KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
kvm: Add emulation for movups/movupd
KVM: VMX: raise internal error for exception during invalid protected mode state
KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
KVM: x86: Fix misleading comments on handling pending exceptions
KVM: x86: Rename interrupt.pending to interrupt.injected
KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
x86/kvm: use Enlightened VMCS when running on Hyper-V
x86/hyper-v: detect nested features
x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
...
|
|
Pull VFIO updates from Alex Williamson:
- Adopt iommu_unmap_fast() interface to type1 backend
(Suravee Suthikulpanit)
- mdev sample driver fixup (Shunyong Yang)
- More efficient PFN mapping handling in type1 backend
(Jason Cai)
- VFIO device ioeventfd interface (Alex Williamson)
- Tag new vfio-platform sub-maintainer (Alex Williamson)
* tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
MAINTAINERS: vfio/platform: Update sub-maintainer
vfio/pci: Add ioeventfd support
vfio/pci: Use endian neutral helpers
vfio/pci: Pull BAR mapping setup from read-write path
vfio/type1: Improve memory pinning process for raw PFN mapping
vfio-mdev/samples: change RDI interrupt condition
vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
|
|
Pull fw_cfg, vhost updates from Michael Tsirkin:
"This cleans up the qemu fw cfg device driver.
On top of this, vmcore is dumped there on crash to help debugging
with kASLR enabled.
Also included are some fixes in vhost"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: add vsock compat ioctl
vhost: fix vhost ioctl signature to build with clang
fw_cfg: write vmcoreinfo details
crash: export paddr_vmcoreinfo_note()
fw_cfg: add DMA register
fw_cfg: add a public uapi header
fw_cfg: handle fw_cfg_read_blob() error
fw_cfg: remove inline from fw_cfg_read_blob()
fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
fw_cfg: fix sparse warning reading FW_CFG_ID
fw_cfg: fix sparse warnings with fw_cfg_file
fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
ptr_ring: fix build
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- move pci_uevent_ers() out of pci.h (Michael Ellerman)
- skip ASPM common clock warning if BIOS already configured it (Sinan
Kaya)
- fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
- remove last user of pci_get_bus_and_slot() and the function itself
(Sinan Kaya)
- add decoding for 16 GT/s link speed (Jay Fang)
- add interfaces to get max link speed and width (Tal Gilboa)
- add pcie_bandwidth_capable() to compute max supported link bandwidth
(Tal Gilboa)
- add pcie_bandwidth_available() to compute bandwidth available to
device (Tal Gilboa)
- add pcie_print_link_status() to log link speed and whether it's
limited (Tal Gilboa)
- use PCI core interfaces to report when device performance may be
limited by its slot instead of doing it in each driver (Tal Gilboa)
- fix possible cpqphp NULL pointer dereference (Shawn Lin)
- rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
hotplug (Mika Westerberg)
- add support for PCI I/O port space that's neither directly accessible
via CPU in/out instructions nor directly mapped into CPU physical
memory space. This is fairly intrusive and includes minor changes to
interfaces used for I/O space on most platforms (Zhichang Yuan, John
Garry)
- add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
John Garry)
- use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
- remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
(Shawn Lin)
- report quirk timings with dev_info (Bjorn Helgaas)
- report quirks that take longer than 10ms (Bjorn Helgaas)
- add and use Altera Vendor ID (Johannes Thumshirn)
- tidy Makefiles and comments (Bjorn Helgaas)
- don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
ia64, and mn10300 with x86 (Bjorn Helgaas)
- move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
Lawler)
- merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
- move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
Helgaas)
- completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
- remove portdrv link order dependency (Bjorn Helgaas)
- remove support for unused VC portdrv service (Bjorn Helgaas)
- simplify portdrv feature permission checking (Bjorn Helgaas)
- remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
Helgaas)
- remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
- use cached AER capability offset (Frederick Lawler)
- don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
- rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
- use generic pci_mmap_resource_range() instead of powerpc and xtensa
arch-specific versions (David Woodhouse)
- support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
- remove System and Video ROM reservations on sparc (Bjorn Helgaas)
- probe for device reset support during enumeration instead of runtime
(Bjorn Helgaas)
- add ACS quirk for Ampere (née APM) root ports (Feng Kan)
- add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
Vincent-Cross)
- protect device restore with device lock (Sinan Kaya)
- handle failure of FLR gracefully (Sinan Kaya)
- handle CRS (config retry status) after device resets (Sinan Kaya)
- skip various config reads for SR-IOV VFs as an optimization
(KarimAllah Ahmed)
- consolidate VPD code in vpd.c (Bjorn Helgaas)
- add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
- add DT support for R-Car r8a7743 (Biju Das)
- fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
bridge driver that causes a general protection fault (Dexuan Cui)
- fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
(Dexuan Cui)
- fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
(Dexuan Cui)
- make several structures static (Fengguang Wu)
- increase number of MSI IRQs supported by Synopsys DesignWare bridges
from 32 to 256 (Gustavo Pimentel)
- implemented multiplexed IRQ domain API and remove obsolete MSI IRQ
API from DesignWare drivers (Gustavo Pimentel)
- add Tegra power management support (Manikanta Maddireddy)
- add Tegra loadable module support (Manikanta Maddireddy)
- handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
- support optional regulator for HiSilicon STB (Shawn Guo)
- use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
- support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
* tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
HISI LPC: Add ACPI support
ACPI / scan: Do not enumerate Indirect IO host children
ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
of: Add missing I/O range exception for indirect-IO devices
PCI: Apply the new generic I/O management on PCI IO hosts
PCI: Add fwnode handler as input param of pci_register_io_range()
PCI: Remove __weak tag from pci_register_io_range()
MAINTAINERS: Add missing /drivers/pci/cadence directory entry
fm10k: Report PCIe link properties with pcie_print_link_status()
net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
net/mlx5: Report PCIe link properties with pcie_print_link_status()
net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
PCI: Add pcie_print_link_status() to log link speed and whether it's limited
PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
misc: pci_endpoint_test: Handle 64-bit BARs properly
PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Doug and I are at a conference next week so if another PR is sent I
expect it to only be bug fixes. Parav noted yesterday that there are
some fringe case behavior changes in his work that he would like to
fix, and I see that Intel has a number of rc looking patches for HFI1
they posted yesterday.
Parav is again the biggest contributor by patch count with his ongoing
work to enable container support in the RDMA stack, followed by Leon
doing syzkaller inspired cleanups, though most of the actual fixing
went to RC.
There is one uncomfortable series here fixing the user ABI to actually
work as intended in 32 bit mode. There are lots of notes in the commit
messages, but the basic summary is we don't think there is an actual
32 bit kernel user of drivers/infiniband for several good reasons.
However we are seeing people want to use a 32 bit user space with 64
bit kernel, which didn't completely work today. So in fixing it we
required a 32 bit rxe user to upgrade their userspace. rxe users are
still already quite rare and we think a 32 bit one is non-existing.
- Fix RDMA uapi headers to actually compile in userspace and be more
complete
- Three shared with netdev pull requests from Mellanox:
* 7 patches, mostly to net with 1 IB related one at the back).
This series addresses an IRQ performance issue (patch 1),
cleanups related to the fix for the IRQ performance problem
(patches 2-6), and then extends the fragmented completion queue
support that already exists in the net side of the driver to the
ib side of the driver (patch 7).
* Mostly IB, with 5 patches to net that are needed to support the
remaining 10 patches to the IB subsystem. This series extends
the current 'representor' framework when the mlx5 driver is in
switchdev mode from being a netdev only construct to being a
netdev/IB dev construct. The IB dev is limited to raw Eth queue
pairs only, but by having an IB dev of this type attached to the
representor for a switchdev port, it enables DPDK to work on the
switchdev device.
* All net related, but needed as infrastructure for the rdma
driver
- Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
- SRP performance updates
- IB uverbs write path cleanup patch series from Leon
- Add RDMA_CM support to ib_srpt. This is disabled by default. Users
need to set the port for ib_srpt to listen on in configfs in order
for it to be enabled
(/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port)
- TSO and Scatter FCS support in mlx4
- Refactor of modify_qp routine to resolve problems seen while
working on new code that is forthcoming
- More refactoring and updates of RDMA CM for containers support from
Parav
- mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
memory' user API features
- Infrastructure updates for the new IOCTL interface, based on
increased usage
- ABI compatibility bug fixes to fully support 32 bit userspace on 64
bit kernel as was originally intended. See the commit messages for
extensive details
- Syzkaller bugs and code cleanups motivated by them"
* tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
IB/mlx5: Device memory mr registration support
net/mlx5: Mkey creation command adjustments
IB/mlx5: Device memory support in mlx5_ib
net/mlx5: Query device memory capabilities
IB/uverbs: Add device memory registration ioctl support
IB/uverbs: Add alloc/free dm uverbs ioctl support
IB/uverbs: Add device memory capabilities reporting
IB/uverbs: Expose device memory capabilities to user
RDMA/qedr: Fix wmb usage in qedr
IB/rxe: Removed GID add/del dummy routines
RDMA/qedr: Zero stack memory before copying to user space
IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
IB/mlx5: Add information for querying IPsec capabilities
IB/mlx5: Add IPsec support for egress and ingress
{net,IB}/mlx5: Add ipsec helper
IB/mlx5: Add modify_flow_action_esp verb
IB/mlx5: Add implementation for create and destroy action_xfrm
IB/uverbs: Introduce ESP steering match filter
IB/uverbs: Add modify ESP flow_action
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"A bigger than usual pull request for SELinux, 13 patches (lucky!)
along with a scary looking diffstat.
Although if you look a bit closer, excluding the usual minor
tweaks/fixes, there are really only two significant changes in this
pull request: the addition of proper SELinux access controls for SCTP
and the encapsulation of a lot of internal SELinux state.
The SCTP changes are the result of a multi-month effort (maybe even a
year or longer?) between the SELinux folks and the SCTP folks to add
proper SELinux controls. A special thanks go to Richard for seeing
this through and keeping the effort moving forward.
The state encapsulation work is a bit of janitorial work that came out
of some early work on SELinux namespacing. The question of namespacing
is still an open one, but I believe there is some real value in the
encapsulation work so we've split that out and are now sending that up
to you"
* tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: wrap AVC state
selinux: wrap selinuxfs state
selinux: fix handling of uninitialized selinux state in get_bools/classes
selinux: Update SELinux SCTP documentation
selinux: Fix ltp test connect-syscall failure
selinux: rename the {is,set}_enforcing() functions
selinux: wrap global selinux state
selinux: fix typo in selinux_netlbl_sctp_sk_clone declaration
selinux: Add SCTP support
sctp: Add LSM hooks
sctp: Add ip option support
security: Add support for SCTP security hooks
netlabel: If PF_INET6, check sk_buff ip header version
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- DM core passthrough ioctl fix to retain reference to DM table, and
that table's block devices, while issuing the ioctl to one of those
block devices.
- DM core passthrough ioctl fix to _not_ override the fmode_t used to
issue the ioctl. Overriding by using the fmode_t that the block
device was originally open with during DM table load is a liability.
- Add DM core support for secure erase forwarding and update the DM
linear and DM striped targets to support them.
- A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
same, write zeroes) for targets that make use of the non-splitting IO
variant (as is done for multipath or thinp when layered directly on
NVMe).
- Allow DM targets to return a payload in response to a DM message that
they are sent. This is useful for DM targets that would like to
provide statistics data in response to DM messages.
- Update DM bufio to support non-power-of-2 block sizes. Numerous other
related changes prepare the DM bufio code for this support.
- Fix DM crypt to use a bounded amount of memory across the entire
system. This is to avoid OOM that can otherwise occur in response to
certain pathological IO workloads (e.g. discarding a large DM crypt
device).
- Add a 'check_at_most_once' feature to the DM verity target to allow
verity to be used on mobile devices that have very limited resources.
- Fix the DM integrity target to fail early if a keyed algorithm (e.g.
HMAC) is to be used but the key isn't set.
- Add non-power-of-2 support to the DM unstripe target.
- Eliminate the use of a Variable Length Array in the DM stripe target.
- Update the DM log-writes target to record metadata (REQ_META flag).
- DM raid fixes for its nosync status and some variable range issues.
* tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
dm: remove fmode_t argument from .prepare_ioctl hook
dm: hold DM table for duration of ioctl rather than use blkdev_get
dm raid: fix parse_raid_params() variable range issue
dm verity: make verity_for_io_block static
dm verity: add 'check_at_most_once' option to only validate hashes once
dm bufio: don't embed a bio in the dm_buffer structure
dm bufio: support non-power-of-two block sizes
dm bufio: use slab cache for dm_buffer structure allocations
dm bufio: reorder fields in dm_buffer structure
dm bufio: relax alignment constraint on slab cache
dm bufio: remove code that merges slab caches
dm bufio: get rid of slab cache name allocations
dm bufio: move dm-bufio.h to include/linux/
dm bufio: delete outdated comment
dm: add support for secure erase forwarding
dm: backfill abnormal IO support to non-splitting IO submission
dm raid: fix nosync status
dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
dm stripe: get rid of a Variable Length Array (VLA)
dm log writes: record metadata flag for better flags record
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
"The work on cleaning up and getting the bugs out of siginfo generation
was largely stalled this round. The progress that was made was the
definition of FPE_FLTUNK. Which is usable to fix many of the cases
where siginfo generation is erroneously generating SI_USER by setting
si_code to 0, that has recently been tagged as FPE_FIXME.
You already have the change by way of the arm64 tree as that
definition was pulled into the arm64 tree to allow fixing the problem
there.
What remains is the second round of fixing for what I thought was a
trivial change to the struct siginfo when put the union in _sigfault
where it belongs. Do to historical reasons 32bit m68k only ensures
that pointers are 2 byte aligned. So I have added a m68k test case
made of BUILD_BUG_ONs to verify I have this fix correct and possibly
catch problems, and I have computed the number of bytes of padding
needed for the _addr_bnd and _addr_pkey cases and just use an array of
characters that size.
For pure paranoia I have written the code so if there is an
architecture out there that does not perform any alignment of
structures it should still work.
With the removal of all of the stale arechitectures this cycle future
work on cleaning up struct siginfo should be much easier. Almost all
of the conflicting si_code definitions have been removed with the
removal of (blackfin, tile, and frv). Plus some of the most difficult
to test cases have simply been removed from the tree.
Which means that with a little luck copy_siginfo_to_user can become a
light weight wrapper around copy_to_user in the next cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
m68k: Verify the offsets in struct siginfo never change.
signal: Correct the offset of si_pkey and si_lower in struct siginfo on m68k
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc filesystem updates from Jan Kara:
"udf, ext2, quota, fsnotify fixes & cleanups:
- udf fixes for handling of media without uid/gid
- udf fixes for some corner cases in parsing of volume recognition
sequence
- improvements of fsnotify handling of ENOMEM
- new ioctl to allow setting of watch descriptor id for inotify (for
checkpoint - restart)
- small ext2, reiserfs, quota cleanups"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Kill an unused extern entry form quota.h
reiserfs: Remove VLA from fs/reiserfs/reiserfs.h
udf: fix potential refcnt problem of nls module
ext2: change return code to -ENOMEM when failing memory allocation
udf: Do not mark possibly inconsistent filesystems as closed
fsnotify: Let userspace know about lost events due to ENOMEM
fanotify: Avoid lost events due to ENOMEM for unlimited queues
udf: Remove never implemented mount options
udf: Update mount option documentation
udf: Provide saner default for invalid uid / gid
udf: Clean up handling of invalid uid/gid
udf: Apply uid/gid mount options also to new inodes & chown
udf: Ignore [ug]id=ignore mount options
udf: Fix handling of Partition Descriptors
udf: Unify common handling of descriptors
udf: Convert descriptor index definitions to enum
udf: Allow volume descriptor sequence to be terminated by unrecorded block
udf: Simplify handling of Volume Descriptor Pointers
udf: Fix off-by-one in volume descriptor sequence length
inotify: Extend ioctl to allow to request id of new watch descriptor
|
|
Pull block layer updates from Jens Axboe:
"It's a pretty quiet round this time, which is nice. This contains:
- series from Bart, cleaning up the way we set/test/clear atomic
queue flags.
- series from Bart, fixing races between gendisk and queue
registration and removal.
- set of bcache fixes and improvements from various folks, by way of
Michael Lyle.
- set of lightnvm updates from Matias, most of it being the 1.2 to
2.0 transition.
- removal of unused DIO flags from Nikolay.
- blk-mq/sbitmap memory ordering fixes from Omar.
- divide-by-zero fix for BFQ from Paolo.
- minor documentation patches from Randy.
- timeout fix from Tejun.
- Alpha "can't write a char atomically" fix from Mikulas.
- set of NVMe fixes by way of Keith.
- bsg and bsg-lib improvements from Christoph.
- a few sed-opal fixes from Jonas.
- cdrom check-disk-change deadlock fix from Maurizio.
- various little fixes, comment fixes, etc from various folks"
* tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
blk-mq: Directly schedule q->timeout_work when aborting a request
blktrace: fix comment in blktrace_api.h
lightnvm: remove function name in strings
lightnvm: pblk: remove some unnecessary NULL checks
lightnvm: pblk: don't recover unwritten lines
lightnvm: pblk: implement 2.0 support
lightnvm: pblk: implement get log report chunk
lightnvm: pblk: rename ppaf* to addrf*
lightnvm: pblk: check for supported version
lightnvm: implement get log report chunk helpers
lightnvm: make address conversions depend on generic device
lightnvm: add support for 2.0 address format
lightnvm: normalize geometry nomenclature
lightnvm: complete geo structure with maxoc*
lightnvm: add shorten OCSSD version in geo
lightnvm: add minor version to generic geometry
lightnvm: simplify geometry structure
lightnvm: pblk: refactor init/exit sequences
lightnvm: Avoid validation of default op value
lightnvm: centralize permission check for lightnvm ioctl
...
|
|
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.
The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.
The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This became a large update. The changes are scattered widely, and the
majority of them are attributed to ASoC componentization. The gitk
output made me dizzy, but it's slightly better than London tube.
OK, below are some highlights:
- Continued hardening works in ALSA PCM core; most of the existing
syzkaller reports should have been covered.
- USB-audio got the initial USB Audio Class 3 support, as well as
UAC2 jack detection support and more DSD-device support.
- ASoC componentization: finally each individual driver was converted
to components framework, which is more future-proof for further
works. Most of conversations were systematic.
- Lots of fixes for Intel Baytrail / Cherrytrail devices with Realtek
codecs, typically tablets and small PCs.
- Fixes / cleanups for Samsung Odroid systems
- Cleanups in Freescale SSI driver
- New ASoC drivers:
* AKM AK4458 and AK5558 codecs
* A few AMD based machine drivers
* Intel Kabylake machine drivers
* Maxim MAX9759 codec
* Motorola CPCAP codec
* Socionext Uniphier SoCs
* TI PCM1789 and TDA7419 codecs
- Retirement of Blackfin drivers along with architecture removal"
* tag 'sound-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (497 commits)
ALSA: pcm: Fix UAF at PCM release via PCM timer access
ALSA: usb-audio: silence a static checker warning
ASoC: tscs42xx: Remove owner assignment from i2c_driver
ASoC: mediatek: remove "simple-mfd" in the example
ASoC: cpcap: replace codec to component
ASoC: Intel: bytcr_rt5651: don't use codec anymore
ASoC: amd: don't use codec anymore
ALSA: usb-audio: fix memory leak on cval
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ASoC: topology: Fix kcontrol name string handling
ALSA: aloop: Mark paused device as inactive
ALSA: usb-audio: update clock valid control
ALSA: usb-audio: UAC2 jack detection
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: usb-audio: Integrate native DSD support for ITF-USB based DACs.
ALSA: usb-audio: FIX native DSD support for TEAC UD-501 DAC
ALSA: usb-audio: Add native DSD support for Luxman DA-06
ALSA: usb-audio: fix uac control query argument
ASoC: nau8824: recover system clock when device changes
...
|
|
Adding new ioctl method for the MR object - REG_DM_MR.
This command can be used by users to register an allocated
device memory buffer as an MR and receive lkey and rkey
to be used within work requests.
It is added as a new method under the MR object and using a new
ib_device callback - reg_dm_mr.
The command creates a standard ib_mr object which represents the
registered memory.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This change adds uverbs support for allocation/freeing
of device memory commands.
A new uverbs object is defined of type idr to represent
and track the new resource type allocation per context.
The API requires provider driver to implement 2 new ib_device
callbacks - one for allocation and one for deallocation which
return and accept (respectively) the ib_dm object which represents
the allocated memory on the device.
The support is added via the ioctl command infrastructure
only.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Adding a new capability field under ib_uverbs_ex_query_device_resp -
max_dm_size - which reflects the maximum amount of device memory
that is available for allocation on a device in bytes.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc updates from Greg KH:
"Here is the big set of char/misc driver patches for 4.17-rc1.
There are a lot of little things in here, nothing huge, but all
important to the different hardware types involved:
- thunderbolt driver updates
- parport updates (people still care...)
- nvmem driver updates
- mei updates (as always)
- hwtracing driver updates
- hyperv driver updates
- extcon driver updates
- ... and a handful of even smaller driver subsystem and individual
driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits)
hwtracing: Add HW tracing support menu
intel_th: Add ACPI glue layer
intel_th: Allow forcing host mode through drvdata
intel_th: Pick up irq number from resources
intel_th: Don't touch switch routing in host mode
intel_th: Use correct method of finding hub
intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
stm class: Make dummy's master/channel ranges configurable
stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate
MAINTAINERS: Bestow upon myself the care for drivers/hwtracing
hv: add SPDX license id to Kconfig
hv: add SPDX license to trace
Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
Drivers: hv: vmbus: respect what we get from hv_get_synint_state()
/dev/mem: Avoid overwriting "err" in read_mem()
eeprom: at24: use SPDX identifier instead of GPL boiler-plate
eeprom: at24: simplify the i2c functionality checking
eeprom: at24: fix a line break
eeprom: at24: tweak newlines
eeprom: at24: refactor at24_probe()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big set of Staging/IIO driver patches for 4.17-rc1.
It is a lot, over 500 changes, but not huge by previous kernel release
standards. We deleted more lines than we added again (27k added vs.
91k remvoed), thanks to finally being able to delete the IRDA drivers
and networking code.
We also deleted the ccree crypto driver, but that's coming back in
through the crypto tree to you, in a much cleaned-up form.
Added this round is at lot of "mt7621" device support, which is for an
embedded device that Neil Brown cares about, and of course a handful
of new IIO drivers as well.
And finally, the fsl-mc core code moved out of the staging tree to the
"real" part of the kernel, which is nice to see happen as well.
Full details are in the shortlog, which has all of the tiny cleanup
patches described.
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (579 commits)
staging: rtl8723bs: Remove yield call, replace with cond_resched()
staging: rtl8723bs: Replace yield() call with cond_resched()
staging: rtl8723bs: Remove unecessary newlines from 'odm.h'.
staging: rtl8723bs: Rework 'struct _ODM_Phy_Status_Info_' coding style.
staging: rtl8723bs: Rework 'struct _ODM_Per_Pkt_Info_' coding style.
staging: rtl8723bs: Replace NULL pointer comparison with '!'.
staging: rtl8723bs: Factor out rtl8723bs_recv_tasklet() sections.
staging: rtl8723bs: Fix function signature that goes over 80 characters.
staging: rtl8723bs: Fix lines too long in update_recvframe_attrib().
staging: rtl8723bs: Remove unnecessary blank lines in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Change camel case to snake case in 'rtl8723bs_recv.c'.
staging: rtl8723bs: Add missing braces in else statement.
staging: rtl8723bs: Add spaces around ternary operators.
staging: rtl8723bs: Fix lines with trailing open parentheses.
staging: rtl8723bs: Remove unnecessary length #define's.
staging: rtl8723bs: Fix IEEE80211 authentication algorithm constants.
staging: rtl8723bs: Fix alignment in rtw_wx_set_auth().
staging: rtl8723bs: Remove braces from single statement conditionals.
staging: rtl8723bs: Remove unecessary braces from switch statement.
staging: rtl8723bs: Fix newlines in rtw_wx_set_auth().
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver updates from Greg KH:
"Here is the big set of tty and serial driver patches for 4.17-rc1
Not all that big really, most are just small fixes and additions to
existing drivers. There's a bunch of work on the imx serial driver
recently for some reason, and a new embedded serial driver added as
well.
Full details are in the shortlog.
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'tty-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (66 commits)
serial: expose buf_overrun count through proc interface
serial: mvebu-uart: fix tx lost characters
tty: serial: msm_geni_serial: Fix return value check in qcom_geni_serial_probe()
tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP
8250-men-mcb: add support for 16z025 and 16z057
powerpc: Mark the variable earlycon_acpi_spcr_enable maybe_unused
serial: stm32: fix initialization of RS485 mode
ARM: dts: STi: Remove "console=ttyASN" from bootargs for STi boards
vt: change SGR 21 to follow the standards
serdev: Fix typo in serdev_device_alloc
ARM: dts: STi: Fix aliases property name for STi boards
tty: st-asc: Update tty alias
serial: stm32: add support for RS485 hardware control mode
dt-bindings: serial: stm32: add RS485 optional properties
selftests: add devpts selftests
devpts: comment devpts_mntget()
devpts: resolve devpts bind-mounts
devpts: hoist out check for DEVPTS_SUPER_MAGIC
serial: 8250: Add Nuvoton NPCM UART
serial: mxs-auart: disable clks of Alphascale ASM9260
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time(r) updates from Thomas Gleixner:
"A small set of updates for timers and timekeeping:
- The most interesting change is the consolidation of clock MONOTONIC
and clock BOOTTIME.
Clock MONOTONIC behaves now exactly like clock BOOTTIME and does
not longer ignore the time spent in suspend. A new clock
MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in
kernels before this change. This allows applications to
programmatically check for the clock MONOTONIC behaviour.
As discussed in the review thread, this has the potential of
breaking user space and we might have to revert this. Knock on wood
that we can avoid that exercise.
- Updates to the NTP mechanism to improve accuracy
- A new kernel internal data structure to aid the ongoing Y2038 work.
- Cleanups and simplifications of the clocksource code.
- Make the alarmtimer code play nicely with debugobjects"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Init nanosleep alarm timer on stack
y2038: Introduce struct __kernel_old_timeval
tracing: Unify the "boot" and "mono" tracing clocks
hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Remove boot time specific code
Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
timekeeping/ntp: Determine the multiplier directly from NTP tick length
timekeeping/ntp: Don't align NTP frequency adjustments to ticks
clocksource: Use ATTRIBUTE_GROUPS
clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
clocksource: Don't walk the clocksource list for empty override
|
|
When a Raw Ethernet QP is created, we actually create a few objects.
One of these objects is a TIR. Currently, a TIR could hash (and spread
the traffic) by IP or port only. Adding a hashing by IPSec SPI to TIR
creation with the required UAPI bit.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Users should be able to query for IPSec support. Adding a few
capabilities bits as part of the driver specific part in
alloc_ucontext:
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_REQ_METADATA
Payload's header is returned with metadata representing the
IPSec decryption state.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_RX
Support ESP_AES_GCM in ingress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_TX
Support ESP_AES_GCM in egress path.
MLX5_USER_ALLOC_UCONTEXT_FLOW_ACTION_FLAGS_ESP_AES_GCM_SPI_RSS_ONLY
Hardware doesn't support matching SPI in flow steering rules
but just hashing and spreading the traffic accordingly.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Adding implementation in mlx5 driver to create and destroy action_xfrm
object. This merely call the accel layer.
A user may pass MLX5_IB_XFRM_FLAGS_REQUIRE_METADATA flag which states
that [s]he expects a metadata header to be added to the payload. This
header represents information regarding the transformation's state.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Adding a new ESP steering match filter that could match against
spi and seq used in IPSec protocol.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
flow_actions of ESP type could be modified during runtime. This could be
common for example when ESN should be changed. Adding a new
UVERBS_FLOW_ACTION_ESP_MODIFY method for changing ESP parameters of an
existing ESP flow_action.
The new method uses the UVERBS_FLOW_ACTION_ESP_CREATE attributes, but
adds a new IB_FLOW_ACTION_ESP_FLAGS_MOD_ESP_ATTRS which means ESP_ATTRS
should be changed.
In addition, we add a new FLOW_ACTION_ESP_REPLAY_NONE replay type that
could be used when one wants to disable a replay protection over a
specific flow_action.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Binding a flow_action to flow steering rule requires using a new
specification. Therefore, adding such an IB_FLOW_SPEC_ACTION_HANDLE flow
specification.
Flow steering rules could use flow_action(s) and as of that we need to
avoid deleting flow_action(s) as long as they're being used.
Moreover, when the attached rules are deleted, action_handle reference
count should be decremented. Introducing a new mechanism of flow
resources to keep track on the attached action_handle(s). Later on, this
mechanism should be extended to other attached flow steering resources
like flow counters.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
A verbs application may receive and transmits packets using a data
path pipeline. Sometimes, the first stage in the receive pipeline or
the last stage in the transmit pipeline involves transforming a
packet, either in order to make it easier for later stages to process
it or to prepare it for transmission over the wire. Such transformation
could be stripping/encapsulating the packet (i.e. vxlan),
decrypting/encrypting it (i.e. ipsec), altering headers, doing some
complex FPGA changes, etc.
Some hardware could do such transformations without software data path
intervention at all. The flow steering API supports steering a
packet (either to a QP or dropping it) and some simple packet
immutable actions (i.e. tagging a packet). Complex actions, that may
change the packet, could bloat the flow steering API extensively.
Sometimes the same action should be applied to several flows.
In this case, it's easier to bind several flows to the same action and
modify it than change all matching flows.
Introducing a new flow_action object that abstracts any packet
transformation (out of a standard and well defined set of actions).
This flow_action object could be tied to a flow steering rule via a
new specification.
Currently, we support esp flow_action, which encrypts or decrypts a
packet according to the given parameters. However, we present a
flexible schema that could be used to other transformation actions tied
to flow rules.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Methods sometimes need to get one attribute out of a group of
pre-defined attributes. This is an enum-like behavior. Since
this is a common requirement, we add a new ENUM attribute to the
generic uverbs ioctl() layer. This attribute is embedded in methods,
like any other attributes we currently have. ENUM attributes point to
an array of standard UVERBS_ATTR_PTR_IN. The user-space encodes the
enum's attribute id in the id field and the internal PTR_IN attr id in
the enum_data.elem_id field. This ENUM attribute could be shared by
several attributes and it can get UVERBS_ATTR_SPEC_F_MANDATORY flag,
stating this attribute must be supported by the kernel, like any other
attribute.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Commit 519049afead ("dm: use blkdev_get rather than bdgrab when issuing
pass-through ioctl") inadvertantly introduced a regression relative to
users of device cgroups that issue ioctls (e.g. libvirt). Using
blkdev_get() in DM's passthrough ioctl support implicitly introduced a
cgroup permissions check that would fail unless care were taken to add
all devices in the IO stack to the device cgroup. E.g. rather than just
adding the top-level DM multipath device to the cgroup all the
underlying devices would need to be allowed.
Fix this, to no longer require allowing all underlying devices, by
simply holding the live DM table (which includes the table's original
blkdev_get() reference on the blockdevice that the ioctl will be issued
to) for the duration of the ioctl.
Also, bump the DM ioctl version so a user can know that their device
cgroup allow workaround is no longer needed.
Reported-by: Michal Privoznik <mprivozn@redhat.com>
Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 519049afead ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Cc: stable@vger.kernel.org # 4.16
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- new CEC pin injection code for testing purposes
- DVB frontend cxd2099 promoted from staging
- new platform driver for Sony cxd2880 DVB devices
- new sensor drivers: mt9t112, ov2685, ov5695, ov772x, tda1997x,
tw9910.c
- removal of unused cx18 and ivtv alsa mixers
- the reneseas-ceu driver doesn't depend on soc_camera anymore and
moved from staging
- removed the mantis_vp3028 driver, unused since 2009
- s5p-mfc: add support for version 10 of the MSP
- added a decoder for imon protocol
- atomisp: lots of cleanups
- imx074 and mt9t031: don't depend on soc_camera anymore, being
promoted from staging
- added helper functions to better support DVB I2C binding
- lots of driver improvements and cleanups
* tag 'media/v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (438 commits)
media: v4l2-ioctl: rename a temp var that stores _IOC_SIZE(cmd)
media: fimc-capture: get rid of two warnings
media: dvb-usb-v2: fix a missing dependency of I2C_MUX
media: uvc: to the right check at uvc_ioctl_enum_framesizes()
media: cec-core: fix a bug at cec_error_inj_write()
media: tda9840: cleanup a warning
media: tm6000: avoid casting just to print pointer address
media: em28xx-input: improve error handling code
media: zr364xx: avoid casting just to print pointer address
media: vivid-radio-rx: add a cast to avoid a warning
media: saa7134-alsa: don't use casts to print a buffer address
media: solo6x10: get rid of an address space warning
media: zoran: don't cast pointers to print them
media: ir-kbd-i2c: change the if logic to avoid a warning
media: ir-kbd-i2c: improve error handling code
media: saa7134-input: improve error handling
media: s2255drv: fix a casting warning
media: ivtvfb: Cleanup some warnings
media: videobuf-dma-sg: Fix a weird cast
soc_camera: fix a weird cast on printk
...
|
|
Pull sparc updates from David Miller:
1) Add support for ADI (Application Data Integrity) found in more
recent sparc64 cpus. Essentially this is keyed based access to
virtual memory, and if the key encoded in the virual address is
wrong you get a trap.
The mm changes were reviewed by Andrew Morton and others.
Work by Khalid Aziz.
2) Validate DAX completion index range properly, from Rob Gardner.
3) Add proper Kconfig deps for DAX driver. From Guenter Roeck.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
sparc64: Make atomic_xchg() an inline function rather than a macro.
sparc64: Properly range check DAX completion index
sparc: Make auxiliary vectors for ADI available on 32-bit as well
sparc64: Oracle DAX driver depends on SPARC64
sparc64: Update signal delivery to use new helper functions
sparc64: Add support for ADI (Application Data Integrity)
mm: Allow arch code to override copy_highpage()
mm: Clear arch specific VM flags on protection change
mm: Add address parameter to arch_validate_prot()
sparc64: Add auxiliary vectors to report platform ADI properties
sparc64: Add handler for "Memory Corruption Detected" trap
sparc64: Add HV fault type handlers for ADI related faults
sparc64: Add support for ADI register fields, ASIs and traps
mm, swap: Add infrastructure for saving page metadata on swap
signals, sparc: Add signal codes for ADI violations
|
|
Pull networking updates from David Miller:
1) Support offloading wireless authentication to userspace via
NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.
2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
Setup and cleanup of namespaces now all run asynchronously and thus
performance is significantly increased.
3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
Streiff.
4) Support zerocopy on RDS sockets, from Sowmini Varadhan.
5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
Borkmann.
6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
Chevallier.
7) Support grafting of child qdiscs in mlxsw driver, from Nogah
Frankel.
8) Add packet forwarding tests to selftests, from Ido Schimmel.
9) Deal with sub-optimal GSO packets better in BBR congestion control,
from Eric Dumazet.
10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.
11) Add path MTU tests to selftests, from Stefano Brivio.
12) Various bits of IPSEC offloading support for mlx5, from Aviad
Yehezkel, Yossi Kuperman, and Saeed Mahameed.
13) Support RSS spreading on ntuple filters in SFC driver, from Edward
Cree.
14) Lots of sockmap work from John Fastabend. Applications can use eBPF
to filter sendmsg and sendpage operations.
15) In-kernel receive TLS support, from Dave Watson.
16) Add XDP support to ixgbevf, this is significant because it should
allow optimized XDP usage in various cloud environments. From Tony
Nguyen.
17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
Venkataramanan et al.
18) IP fragmentation match offload support in nfp driver, from Pieter
Jansen van Vuuren.
19) Support XDP redirect in i40e driver, from Björn Töpel.
20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
tracepoints in their raw form, from Alexei Starovoitov.
21) Lots of striding RQ improvements to mlx5 driver with many
performance improvements, from Tariq Toukan.
22) Use rhashtable for inet frag reassembly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
net: mvneta: improve suspend/resume
net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
net: bgmac: Correctly annotate register space
route: check sysctl_fib_multipath_use_neigh earlier than hash
fix typo in command value in drivers/net/phy/mdio-bitbang.
sky2: Increase D3 delay to sky2 stops working after suspend
net/mlx5e: Set EQE based as default TX interrupt moderation mode
ibmvnic: Disable irqs before exiting reset from closed state
net: sched: do not emit messages while holding spinlock
vlan: also check phy_driver ts_info for vlan's real device
Bluetooth: Mark expected switch fall-throughs
Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
Bluetooth: btrsi: remove unused including <linux/version.h>
Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
sh_eth: kill useless check in __sh_eth_get_regs()
sh_eth: add sh_eth_cpu_data::no_xdfar flag
ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
...
|
|
This structure is pushed down the ex and the non-ex path, so it needs to be
aligned to 8 bytes to go through ex without implicit padding.
Old user space will provide 4 bytes of resp on !ex and 8 bytes on ex, so
take the approach of just copying the minimum length.
New user space will consistently provide 8 bytes in both cases.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Pull IPMI updates from Corey Minyard:
"Mostly small changes, as usual.
This does add an IPMI BMC server-side driver, to allow a Linux system
to act as an IPMI controller. That's the biggest change, but it is
just a new driver that is fairly narrow in use.
The other largish change is removing ACPI SPMI probe support, which
should have never really been there in the beginning"
* tag 'for-linus-4.17' of git://github.com/cminyard/linux-ipmi:
ipmi/parisc: Add IPMI chassis poweroff for certain HP PA-RISC and IA-64 servers
ipmi_ssif: Fix kernel panic at msg_done_handler
ipmi:pci: Blacklist a Realtek "IPMI" device
ipmi: Remove ACPI SPMI probing from the system interface driver
ipmi: Remove ACPI SPMI probing from the SSIF (I2C) driver
ipmi: missing error code in try_smi_init()
ipmi: use ARRAY_SIZE for poweroff_functions array sizing calculation
ipmi: Consolidate cleanup code
ipmi: Remove some unnecessary initializations
ipmi: Fix some error cleanup issues
ipmi: Add or fix SPDX-License-Identifier in all files
ipmi: Re-use existing macros for built-in properties
ipmi:pci: Make the PCI defines consistent with normal Linux ones
ipmi: kcs_bmc: coding-style fixes and use new poll type
char/ipmi: add documentation for sysfs interface
ipmi: kcs_bmc: mark expected switch fall-through in kcs_bmc_handle_data
ipmi: add an Aspeed KCS IPMI BMC driver
ipmi: add a KCS IPMI BMC driver
|
|
Could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.
Signed-off-By: Mike Snitzer <snitzer@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
|
|
The change moving addr_lsb into the _sigfault union failed to take
into account that _sigfault._addr_bnd._lower being a pointer forced
the entire union to have pointer alignment. The fix for
_sigfault._addr_bnd._lower having pointer alignment failed to take
into account that m68k has a pointer alignment less than the size
of a pointer. So simply making the padding members pointers changed
the location of later members in the structure.
Fix this by directly computing the needed size of the padding members,
and making the padding members char arrays of the needed size. AKA
if __alignof__(void *) is 1 sizeof(short) otherwise __alignof__(void *).
Which should be exactly the same rules the compiler whould have
used when computing the padding.
I have tested this change by adding BUILD_BUG_ONs to m68k to verify
the offset of every member of struct siginfo, and with those testing
that the offsets of the fields in struct siginfo is the same before
I changed the generic _sigfault member and after the correction
to the _sigfault member.
I have also verified that the x86 with it's own BUILD_BUG_ONs to verify
the offsets of the siginfo members also compiles cleanly.
Cc: stable@vger.kernel.org
Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
Fixes: 859d880cf544 ("signal: Correct the offset of si_pkey in struct siginfo")
Fixes: b68a68d3dcc1 ("signal: Move addr_lsb into the _sigfault union for clarity")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"The main kernel side changes were:
- Modernize the kprobe and uprobe creation/destruction tooling ABIs:
The existing text based APIs (kprobe_events and uprobe_events in
tracefs), are naive, limited ABIs in that they require user-space
to clean up after themselves, which is both difficult and fragile
if the tool is buggy or exits unexpectedly. In other words they are
not really suited for modern, robust tooling.
So introduce a modern, file descriptor based ABI that does not have
these limitations: introduce the 'perf_kprobe' and 'perf_uprobe'
PMUs and extend the perf_event_open() syscall to create events with
a kprobe/uprobe attached to them. These [k,u]probe are associated
with this file descriptor, so they are not available in tracefs.
(Song Liu)
- Intel Cannon Lake CPU support (Harry Pan)
- Intel PT cleanups (Alexander Shishkin)
- Improve the performance of pinned/flexible event groups by using RB
trees (Alexey Budankov)
- Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification
of hardware breakpoints, which new ABI variant massively speeds up
existing tooling that uses hardware breakpoints to instrument (and
debug) memory usage.
(Milind Chabbi, Jiri Olsa)
- Various Intel PEBS handling fixes and improvements, and other Intel
PMU improvements (Kan Liang)
- Various perf core improvements and optimizations (Peter Zijlstra)
- ... misc cleanups, fixes and updates.
There's over 200 tooling commits, here's an (imperfect) list of
highlights:
- 'perf annotate' improvements:
* Recognize and handle jumps to other functions as calls, which
improves the navigation along jumps and back. (Arnaldo Carvalho
de Melo)
* Add the 'P' hotkey in TUI annotation to dump annotation output
into a file, to ease e-mail reporting of annotation details.
(Arnaldo Carvalho de Melo)
* Add an IPC/cycles column to the TUI (Jin Yao)
* Improve s390 assembly annotation (Thomas Richter)
* Refactor the output formatting logic to better separate it into
interactive and non-interactive features and add the --stdio2
output variant to demonstrate this. (Arnaldo Carvalho de Melo)
- 'perf script' improvements:
* Add Python 3 support (Jaroslav Škarvada)
* Add --show-round-event (Jiri Olsa)
- 'perf c2c' improvements:
* Add NUMA analysis support (Jiri Olsa)
- 'perf trace' improvements:
* Improve PowerPC support (Ravi Bangoria)
- 'perf inject' improvements:
* Integrate ARM CoreSight traces (Robert Walker)
- 'perf stat' improvements:
* Add the --interval-count option (yuzhoujian)
* Add the --timeout option (yuzhoujian)
- 'perf sched' improvements (Changbin Du)
- Vendor events improvements :
* Add IBM s390 vendor events (Thomas Richter)
* Add and improve arm64 vendor events (John Garry, Ganapatrao
Kulkarni)
* Update POWER9 vendor events (Sukadev Bhattiprolu)
- Intel PT tooling improvements (Adrian Hunter)
- PMU handling improvements (Agustin Vega-Frias)
- Record machine topology in perf.data (Jiri Olsa)
- Various overwrite related cleanups (Kan Liang)
- Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet)
- ... and lots of other changes, cleanups and fixes, see the shortlog
and Git history for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits)
perf/x86/intel: Enable C-state residency events for Cannon Lake
perf/x86/intel: Add Cannon Lake support for RAPL profiling
perf/x86/pt, coresight: Clean up address filter structure
perf vendor events s390: Add JSON files for IBM z14
perf vendor events s390: Add JSON files for IBM z13
perf vendor events s390: Add JSON files for IBM zEC12 zBC12
perf vendor events s390: Add JSON files for IBM z196
perf vendor events s390: Add JSON files for IBM z10EC z10BC
perf mmap: Be consistent when checking for an unmaped ring buffer
perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done()
perf build: Fix check-headers.sh opts assignment
perf/x86: Update rdpmc_always_available static key to the modern API
perf annotate: Use absolute addresses to calculate jump target offsets
perf annotate: Defer searching for comma in raw line till it is needed
perf annotate: Support jumping from one function to another
perf annotate: Add "_local" to jump/offset validation routines
perf python: Reference Py_None before returning it
perf annotate: Mark jumps to outher functions with the call arrow
perf annotate: Pass function descriptor to its instruction parsing routines
perf annotate: No need to calculate notes->start twice
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v4.17
This is a *very* big release for ASoC. Not much change in the core but
there s the transition of all the individual drivers over to components
which is intended to support further core work. The goal is to make it
easier to do further core work by removing the need to special case all
the different driver classes in the core, many of the devices end up
being used in multiple roles in modern systems.
We also have quite a lot of new drivers added this month of all kinds,
quite a few for simple devices but also some more advanced ones with
more substantial code.
- The biggest thing is the huge series from Morimoto-san which
converted everything over to components. This is a huge change by
code volume but was fairly mechanical
- Many fixes for some of the Realtek based Baytrail systems covering
both the CODECs and the CPUs, contributed by Hans de Goode.
- Lots of cleanups for Samsung based Odroid systems from Sylwester
Nawrocki.
- The Freescale SSI driver also got a lot of cleanups from Nicolin
Chen.
- The Blackfin drivers have been removed as part of the removal of the
architecture.
- New drivers for AKM AK4458 and AK5558, several AMD based machines,
several Intel based machines, Maxim MAX9759, Motorola CPCAP,
Socionext Uniphier SoCs, and TI PCM1789 and TDA7419
|
|
Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c,
we had some overlapping changes:
1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE -->
MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE
2) In 'net-next' params->log_rq_size is renamed to be
params->log_rq_mtu_frames.
3) In 'net-next' params->hard_mtu is added.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-03-31
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add raw BPF tracepoint API in order to have a BPF program type that
can access kernel internal arguments of the tracepoints in their
raw form similar to kprobes based BPF programs. This infrastructure
also adds a new BPF_RAW_TRACEPOINT_OPEN command to BPF syscall which
returns an anon-inode backed fd for the tracepoint object that allows
for automatic detach of the BPF program resp. unregistering of the
tracepoint probe on fd release, from Alexei.
2) Add new BPF cgroup hooks at bind() and connect() entry in order to
allow BPF programs to reject, inspect or modify user space passed
struct sockaddr, and as well a hook at post bind time once the port
has been allocated. They are used in FB's container management engine
for implementing policy, replacing fragile LD_PRELOAD wrapper
intercepting bind() and connect() calls that only works in limited
scenarios like glibc based apps but not for other runtimes in
containerized applications, from Andrey.
3) BPF_F_INGRESS flag support has been added to sockmap programs for
their redirect helper call bringing it in line with cls_bpf based
programs. Support is added for both variants of sockmap programs,
meaning for tx ULP hooks as well as recv skb hooks, from John.
4) Various improvements on BPF side for the nfp driver, besides others
this work adds BPF map update and delete helper call support from
the datapath, JITing of 32 and 64 bit XADD instructions as well as
offload support of bpf_get_prandom_u32() call. Initial implementation
of nfp packet cache has been tackled that optimizes memory access
(see merge commit for further details), from Jakub and Jiong.
5) Removal of struct bpf_verifier_env argument from the print_bpf_insn()
API has been done in order to prepare to use print_bpf_insn() soon
out of perf tool directly. This makes the print_bpf_insn() API more
generic and pushes the env into private data. bpftool is adjusted
as well with the print_bpf_insn() argument removal, from Jiri.
6) Couple of cleanups and prep work for the upcoming BTF (BPF Type
Format). The latter will reuse the current BPF verifier log as
well, thus bpf_verifier_log() is further generalized, from Martin.
7) For bpf_getsockopt() and bpf_setsockopt() helpers, IPv4 IP_TOS read
and write support has been added in similar fashion to existing
IPv6 IPV6_TCLASS socket option we already have, from Nikita.
8) Fixes in recent sockmap scatterlist API usage, which did not use
sg_init_table() for initialization thus triggering a BUG_ON() in
scatterlist API when CONFIG_DEBUG_SG was enabled. This adds and
uses a small helper sg_init_marker() to properly handle the affected
cases, from Prashant.
9) Let the BPF core follow IDR code convention and therefore use the
idr_preload() and idr_preload_end() helpers, which would also help
idr_alloc_cyclic() under GFP_ATOMIC to better succeed under memory
pressure, from Shaohua.
10) Last but not least, a spelling fix in an error message for the
BPF cookie UID helper under BPF sample code, from Colin.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
gcc points out that the combined length of the fixed-length inputs to
l->name is larger than the destination buffer size:
net/tipc/link.c: In function 'tipc_link_create':
net/tipc/link.c:465:26: error: '%s' directive writing up to 32 bytes
into a region of size between 26 and 58 [-Werror=format-overflow=]
sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
net/tipc/link.c:465:2: note: 'sprintf' output 11 or more bytes
(assuming 75) into a destination of size 60
sprintf(l->name, "%s:%s-%s:unknown", self_str, if_name, peer_str);
A detailed analysis reveals that the theoretical maximum length of
a link name is:
max self_str + 1 + max if_name + 1 + max peer_str + 1 + max if_name =
16 + 1 + 15 + 1 + 16 + 1 + 15 = 65
Since we also need space for a trailing zero we now set MAX_LINK_NAME
to 68.
Just to be on the safe side we also replace the sprintf() call with
snprintf().
Fixes: 25b0b9c4e835 ("tipc: handle collisions of 32-bit node address
hash values")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The three address type structs in the user API have names that in
reality reflect the specific, non-Linux environment where they were
originally created.
We now give them more intuitive names, in accordance with how TIPC is
described in the current documentation.
struct tipc_portid -> struct tipc_socket_addr
struct tipc_name -> struct tipc_service_addr
struct tipc_name_seq -> struct tipc_service_range
To avoid confusion, we also update some commmets and macro names to
match the new terminology.
For compatibility, we add macros that map all old names to the new ones.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
"Post-hooks" are hooks that are called right before returning from
sys_bind. At this time IP and port are already allocated and no further
changes to `struct sock` can happen before returning from sys_bind but
BPF program has a chance to inspect the socket and change sys_bind
result.
Specifically it can e.g. inspect what port was allocated and if it
doesn't satisfy some policy, BPF program can force sys_bind to fail and
return EPERM to user.
Another example of usage is recording the IP:port pair to some map to
use it in later calls to sys_connect. E.g. if some TCP server inside
cgroup was bound to some IP:port_n, it can be recorded to a map. And
later when some TCP client inside same cgroup is trying to connect to
127.0.0.1:port_n, BPF hook for sys_connect can override the destination
and connect application to IP:port_n instead of 127.0.0.1:port_n. That
helps forcing all applications inside a cgroup to use desired IP and not
break those applications if they e.g. use localhost to communicate
between each other.
== Implementation details ==
Post-hooks are implemented as two new attach types
`BPF_CGROUP_INET4_POST_BIND` and `BPF_CGROUP_INET6_POST_BIND` for
existing prog type `BPF_PROG_TYPE_CGROUP_SOCK`.
Separate attach types for IPv4 and IPv6 are introduced to avoid access
to IPv6 field in `struct sock` from `inet_bind()` and to IPv4 field from
`inet6_bind()` since those fields might not make sense in such cases.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
== The problem ==
See description of the problem in the initial patch of this patch set.
== The solution ==
The patch provides much more reliable in-kernel solution for the 2nd
part of the problem: making outgoing connecttion from desired IP.
It adds new attach types `BPF_CGROUP_INET4_CONNECT` and
`BPF_CGROUP_INET6_CONNECT` for program type
`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that can be used to override both
source and destination of a connection at connect(2) time.
Local end of connection can be bound to desired IP using newly
introduced BPF-helper `bpf_bind()`. It allows to bind to only IP though,
and doesn't support binding to port, i.e. leverages
`IP_BIND_ADDRESS_NO_PORT` socket option. There are two reasons for this:
* looking for a free port is expensive and can affect performance
significantly;
* there is no use-case for port.
As for remote end (`struct sockaddr *` passed by user), both parts of it
can be overridden, remote IP and remote port. It's useful if an
application inside cgroup wants to connect to another application inside
same cgroup or to itself, but knows nothing about IP assigned to the
cgroup.
Support is added for IPv4 and IPv6, for TCP and UDP.
IPv4 and IPv6 have separate attach types for same reason as sys_bind
hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields
when user passes sockaddr_in since it'd be out-of-bound.
== Implementation notes ==
The patch introduces new field in `struct proto`: `pre_connect` that is
a pointer to a function with same signature as `connect` but is called
before it. The reason is in some cases BPF hooks should be called way
before control is passed to `sk->sk_prot->connect`. Specifically
`inet_dgram_connect` autobinds socket before calling
`sk->sk_prot->connect` and there is no way to call `bpf_bind()` from
hooks from e.g. `ip4_datagram_connect` or `ip6_datagram_connect` since
it'd cause double-bind. On the other hand `proto.pre_connect` provides a
flexible way to add BPF hooks for connect only for necessary `proto` and
call them at desired time before `connect`. Since `bpf_bind()` is
allowed to bind only to IP and autobind in `inet_dgram_connect` binds
only port there is no chance of double-bind.
bpf_bind() sets `force_bind_address_no_port` to bind to only IP despite
of value of `bind_address_no_port` socket field.
bpf_bind() sets `with_lock` to `false` when calling to __inet_bind()
and __inet6_bind() since all call-sites, where bpf_bind() is called,
already hold socket lock.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
== The problem ==
There is a use-case when all processes inside a cgroup should use one
single IP address on a host that has multiple IP configured. Those
processes should use the IP for both ingress and egress, for TCP and UDP
traffic. So TCP/UDP servers should be bound to that IP to accept
incoming connections on it, and TCP/UDP clients should make outgoing
connections from that IP. It should not require changing application
code since it's often not possible.
Currently it's solved by intercepting glibc wrappers around syscalls
such as `bind(2)` and `connect(2)`. It's done by a shared library that
is preloaded for every process in a cgroup so that whenever TCP/UDP
server calls `bind(2)`, the library replaces IP in sockaddr before
passing arguments to syscall. When application calls `connect(2)` the
library transparently binds the local end of connection to that IP
(`bind(2)` with `IP_BIND_ADDRESS_NO_PORT` to avoid performance penalty).
Shared library approach is fragile though, e.g.:
* some applications clear env vars (incl. `LD_PRELOAD`);
* `/etc/ld.so.preload` doesn't help since some applications are linked
with option `-z nodefaultlib`;
* other applications don't use glibc and there is nothing to intercept.
== The solution ==
The patch provides much more reliable in-kernel solution for the 1st
part of the problem: binding TCP/UDP servers on desired IP. It does not
depend on application environment and implementation details (whether
glibc is used or not).
It adds new eBPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` and
attach types `BPF_CGROUP_INET4_BIND` and `BPF_CGROUP_INET6_BIND`
(similar to already existing `BPF_CGROUP_INET_SOCK_CREATE`).
The new program type is intended to be used with sockets (`struct sock`)
in a cgroup and provided by user `struct sockaddr`. Pointers to both of
them are parts of the context passed to programs of newly added types.
The new attach types provides hooks in `bind(2)` system call for both
IPv4 and IPv6 so that one can write a program to override IP addresses
and ports user program tries to bind to and apply such a program for
whole cgroup.
== Implementation notes ==
[1]
Separate attach types for `AF_INET` and `AF_INET6` are added
intentionally to prevent reading/writing to offsets that don't make
sense for corresponding socket family. E.g. if user passes `sockaddr_in`
it doesn't make sense to read from / write to `user_ip6[]` context
fields.
[2]
The write access to `struct bpf_sock_addr_kern` is implemented using
special field as an additional "register".
There are just two registers in `sock_addr_convert_ctx_access`: `src`
with value to write and `dst` with pointer to context that can't be
changed not to break later instructions. But the fields, allowed to
write to, are not available directly and to access them address of
corresponding pointer has to be loaded first. To get additional register
the 1st not used by `src` and `dst` one is taken, its content is saved
to `bpf_sock_addr_kern.tmp_reg`, then the register is used to load
address of pointer field, and finally the register's content is restored
from the temporary field after writing `src` value.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
== The problem ==
There are use-cases when a program of some type can be attached to
multiple attach points and those attach points must have different
permissions to access context or to call helpers.
E.g. context structure may have fields for both IPv4 and IPv6 but it
doesn't make sense to read from / write to IPv6 field when attach point
is somewhere in IPv4 stack.
Same applies to BPF-helpers: it may make sense to call some helper from
some attach point, but not from other for same prog type.
== The solution ==
Introduce `expected_attach_type` field in in `struct bpf_attr` for
`BPF_PROG_LOAD` command. If scenario described in "The problem" section
is the case for some prog type, the field will be checked twice:
1) At load time prog type is checked to see if attach type for it must
be known to validate program permissions correctly. Prog will be
rejected with EINVAL if it's the case and `expected_attach_type` is
not specified or has invalid value.
2) At attach time `attach_type` is compared with `expected_attach_type`,
if prog type requires to have one, and, if they differ, attach will
be rejected with EINVAL.
The `expected_attach_type` is now available as part of `struct bpf_prog`
in both `bpf_verifier_ops->is_valid_access()` and
`bpf_verifier_ops->get_func_proto()` () and can be used to check context
accesses and calls to helpers correspondingly.
Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and
Daniel Borkmann <daniel@iogearbox.net> here:
https://marc.info/?l=linux-netdev&m=152107378717201&w=2
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
The `__u64 time` field of the blk_io_trace struct refers to
the time in nanoseconds, not in microseconds. It is set in
__blk_add_trace, which does the following:
t->time = ktime_to_ns(ktime_get());
ktime_to_ns returns ktime_t in nanoseconds, not microseconds.
Signed-off-by: Souvik Banerjee <souvik1997@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:
1) Make sure userspace provides a valid standard target verdict, from
Florian Westphal.
2) Sanitize error target size, also from Florian.
3) Validate that last rule in basechain matches underflow/policy since
userspace assumes this when decoding the ruleset blob that comes
from the kernel, from Florian.
4) Consolidate hook entry checks through xt_check_table_hooks(),
patch from Florian.
5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
very large compat offset arrays, so we have a reasonable upper limit
and fuzzers don't exercise the oom-killer. Patches from Florian.
6) Several WARN_ON checks on xtables mutex helper, from Florian.
7) xt_rateest now has a hashtable per net, from Cong Wang.
8) Consolidate counter allocation in xt_counters_alloc(), from Florian.
9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
from Xin Long.
10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
Felix Fietkau.
11) Consolidate code through flow_offload_fill_dir(), also from Felix.
12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
to remove a dependency with flowtable and ipv6.ko, from Felix.
13) Cache mtu size in flow_offload_tuple object, this is safe for
forwarding as f87c10a8aa1e describes, from Felix.
14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
modular infrastructure, from Felix.
15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
Ahmed Abdelsalam.
16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.
17) Support for counting only to nf_conncount infrastructure, patch
from Yi-Hung Wei.
18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
to nft_ct.
19) Use boolean as return value from ipt_ah and from IPVS too, patch
from Gustavo A. R. Silva.
20) Remove useless parameters in nfnl_acct_overquota() and
nf_conntrack_broadcast_help(), from Taehee Yoo.
21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.
22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.
23) Fix typo in xt_limit, from Geert Uytterhoeven.
24) Do no use VLAs in Netfilter code, again from Gustavo.
25) Use ADD_COUNTER from ebtables, from Taehee Yoo.
26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.
27) Use pr_*() and add pr_fmt(), from Arushi Singhal.
28) Add synproxy support to ctnetlink.
29) ICMP type and IGMP matching support for ebtables, patches from
Matthias Schiffer.
30) Support for the revision infrastructure to ebtables, from
Bernie Harris.
31) String match support for ebtables, also from Bernie.
32) Documentation for the new flowtable infrastructure.
33) Use generic comparison functions in ebt_stp, from Joe Perches.
34) Demodularize filter chains in nftables.
35) Register conntrack hooks in case nftables NAT chain is added.
36) Merge assignments with return in a couple of spots in the
Netfilter codebase, also from Arushi.
37) Document that xtables percpu counters are stored in the same
memory area, from Ben Hutchings.
38) Revert mark_source_chains() sanity checks that break existing
rulesets, from Florian Westphal.
39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently ebtables assumes that the revision number of all match
modules is 0, which is an issue when trying to use existing
xtables matches with ebtables. The solution is to modify ebtables
to allow extensions to specify a revision number, similar to
iptables. This gets passed down to the kernel, which is then able
to find the match module correctly.
To main binary backwards compatibility, the size of the ebt_entry
structures is not changed, only the size of the name field is
decreased by 1 byte to make room for the revision field.
Signed-off-by: Bernie Harris <bernie.harris@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|