summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-16net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvidVladimir Oltean1-1/+1
This sequence of operations: ip link set dev br0 type bridge vlan_filtering 1 bridge vlan del dev swp2 vid 1 ip link set dev br0 type bridge vlan_filtering 1 ip link set dev br0 type bridge vlan_filtering 0 apparently fails with the message: [ 31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering [ 31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0) [ 31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2 [ 31.335599] ------------[ cut here ]------------ [ 31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4 [ 31.349981] br0: Commit of attribute (id=6) failed. [ 31.354890] Modules linked in: [ 31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062 [ 31.366167] Hardware name: Freescale LS1021A [ 31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14) [ 31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c) [ 31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c) [ 31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8) [ 31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4) [ 31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118) [ 31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518) [ 31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c) [ 31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60) [ 31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c) [ 31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110) [ 31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8) [ 31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4) [ 31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250) [ 31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c) [ 31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28) [ 31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0) [ 31.505122] 7fa0: 00000002 b6f2e060 00000003 beabd6a4 00000000 00000000 [ 31.513265] 7fc0: 00000002 b6f2e060 5d6e3213 00000128 00000000 00000001 00000006 000619c4 [ 31.521405] 7fe0: 00086078 beabd658 0005edbc b6e7ce68 The reason is the implementation of br_get_pvid: static inline u16 br_get_pvid(const struct net_bridge_vlan_group *vg) { if (!vg) return 0; smp_rmb(); return vg->pvid; } Since VID 0 is an invalid pvid from the bridge's point of view, let's add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that doesn't really exist. Fixes: 5f33183b7fdf ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16Merge branch 'seg6-fixes-to-Segment-Routing-in-IPv6'David S. Miller1-0/+11
Andrea Mayer says: ==================== seg6: fixes to Segment Routing in IPv6 This patchset is divided in 2 patches and it introduces some fixes to Segment Routing in IPv6, which are: - in function get_srh() fix the srh pointer after calling pskb_may_pull(); - fix the skb->transport_header after calling decap_and_validate() function; Any comments on the patchset are welcome. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16seg6: fix skb transport_header after decap_and_validate()Andrea Mayer1-0/+6
in the receive path (more precisely in ip6_rcv_core()) the skb->transport_header is set to skb->network_header + sizeof(*hdr). As a consequence, after routing operations, destination input expects to find skb->transport_header correctly set to the next protocol (or extension header) that follows the network protocol. However, decap behaviors (DX*, DT*) remove the outer IPv6 and SRH extension and do not set again the skb->transport_header pointer correctly. For this reason, the patch sets the skb->transport_header to the skb->network_header + sizeof(hdr) in each DX* and DT* behavior. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16seg6: fix srh pointer in get_srh()Andrea Mayer1-0/+5
pskb_may_pull may change pointers in header. For this reason, it is mandatory to reload any pointer that points into skb header. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16net: stmmac: Use the correct style for SPDX License IdentifierNishad Kamdar3-3/+3
This patch corrects the SPDX License Identifier style in header files related to STMicroelectronics based Multi-Gigabit Ethernet driver. For C header files Documentation/process/license-rules.rst mandates C-like comments (opposed to C source files where C++ style should be used). Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16octeontx2-af: Use the correct style for SPDX License IdentifierNishad Kamdar9-18/+18
This patch corrects the SPDX License Identifier style in header files related to Marvell OcteonTX2 network devices. It uses an expilict block comment for the SPDX License Identifier. Changes made by using a script provided by Joe Perches here: https://lkml.org/lkml/2019/2/7/46. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds12-86/+133
Merge misc fixes from Andrew Morton: "11 fixes" MM fixes and one xz decompressor fix. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/debug.c: PageAnon() is true for PageKsm() pages mm/debug.c: __dump_page() prints an extra line mm/page_io.c: do not free shared swap slots mm/memory_hotplug: fix try_offline_node() mm,thp: recheck each page before collapsing file THP mm: slub: really fix slab walking for init_on_free mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations mm: fix trying to reclaim unevictable lru page when calling madvise_pageout mm: mempolicy: fix the wrong return value and potential pages leak of mbind
2019-11-16Merge branch 'for-linus' of ↵Linus Torvalds3-0/+11
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull more input fixes from Dmitry Torokhov: "A couple of fixes in driver teardown paths and another ID for Synaptics RMI mode" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: synaptics - enable RMI mode for X1 Extreme 2nd Generation Input: synaptics-rmi4 - destroy F54 poller workqueue when removing Input: ff-memless - kill timer in destroy()
2019-11-16mm/debug.c: PageAnon() is true for PageKsm() pagesRalph Campbell1-3/+3
PageAnon() and PageKsm() use the low two bits of the page->mapping pointer to indicate the page type. PageAnon() only checks the LSB while PageKsm() checks the least significant 2 bits are equal to 3. Therefore, PageAnon() is true for KSM pages. __dump_page() incorrectly will never print "ksm" because it checks PageAnon() first. Fix this by checking PageKsm() first. Link: http://lkml.kernel.org/r/20191113000651.20677-1-rcampbell@nvidia.com Fixes: 1c6fb1d89e73 ("mm: print more information about mapping in __dump_page") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm/debug.c: __dump_page() prints an extra lineRalph Campbell1-12/+15
When dumping struct page information, __dump_page() prints the page type with a trailing blank followed by the page flags on a separate line: anon flags: 0x100000000090034(uptodate|lru|active|head|swapbacked) It looks like the intent was to use pr_cont() for printing "flags:" but pr_cont() usage is discouraged so fix this by extending the format to include the flags into a single line: anon flags: 0x100000000090034(uptodate|lru|active|head|swapbacked) If the page is file backed, the name might be long so use two lines: shmem_aops name:"dev/zero" flags: 0x10000000008000c(uptodate|dirty|swapbacked) Eliminate pr_conf() usage as well for appending compound_mapcount. Link: http://lkml.kernel.org/r/20191112012608.16926-1-rcampbell@nvidia.com Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm/page_io.c: do not free shared swap slotsVinayak Menon1-3/+3
The following race is observed due to which a processes faulting on a swap entry, finds the page neither in swapcache nor swap. This causes zram to give a zero filled page that gets mapped to the process, resulting in a user space crash later. Consider parent and child processes Pa and Pb sharing the same swap slot with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set. Virtual address 'VA' of Pa and Pb points to the shared swap entry. Pa Pb fault on VA fault on VA do_swap_page do_swap_page lookup_swap_cache fails lookup_swap_cache fails Pb scheduled out swapin_readahead (deletes zram entry) swap_free (makes swap_count 1) Pb scheduled in swap_readpage (swap_count == 1) Takes SWP_SYNCHRONOUS_IO path zram enrty absent zram gives a zero filled page Fix this by making sure that swap slot is freed only when swap count drops down to one. Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org Fixes: aa8d22a11da9 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference") Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Suggested-by: Minchan Kim <minchan@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm/memory_hotplug: fix try_offline_node()David Hildenbrand3-16/+64
try_offline_node() is pretty much broken right now: - The node span is updated when onlining memory, not when adding it. We ignore memory that was mever onlined. Bad. - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily trigger a kernel panic. Bad for memory that is offline but also bad for subsection hotadd with ZONE_DEVICE, whereby the memmap of the first PFN of a section might contain garbage. - Sections belonging to mixed nodes are not properly considered. As memory blocks might belong to multiple nodes, we would have to walk all pageblocks (or at least subsections) within present sections. However, we don't have a way to identify whether a memmap that is not online was initialized (relevant for ZONE_DEVICE). This makes things more complicated. Luckily, we can piggy pack on the node span and the nid stored in memory blocks. Currently, the node span is grown when calling move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when removing memory, before calling try_offline_node(). Sysfs links are created via link_mem_sections(), e.g., during boot or when adding memory. If the node still spans memory or if any memory block belongs to the nid, we don't set the node offline. As memory blocks that span multiple nodes cannot get offlined, the nid stored in memory blocks is reliable enough (for such online memory blocks, the node still spans the memory). Introduce for_each_memory_block() to efficiently walk all memory blocks. Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span when removing ZONE_DEVICE memory to fix similar issues (access of garbage memmaps) - until we have a reliable way to identify whether these memmaps were properly initialized. This implies later, that once a node had ZONE_DEVICE memory, we won't be able to set a node offline - which should be acceptable. Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") memory that is added is not assoziated with a zone/node (memmap not initialized). The introducing commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node") already missed that we could have multiple nodes for a section and that the zone/node span is updated when onlining pages, not when adding them. I tested this by hotplugging two DIMMs to a memory-less and cpu-less NUMA node. The node is properly onlined when adding the DIMMs. When removing the DIMMs, the node is properly offlined. Masayoshi Mizuma reported: : Without this patch, memory hotplug fails as panic: : : BUG: kernel NULL pointer dereference, address: 0000000000000000 : ... : Call Trace: : remove_memory_block_devices+0x81/0xc0 : try_remove_memory+0xb4/0x130 : __remove_memory+0xa/0x20 : acpi_memory_device_remove+0x84/0x100 : acpi_bus_trim+0x57/0x90 : acpi_bus_trim+0x2e/0x90 : acpi_device_hotplug+0x2b2/0x4d0 : acpi_hotplug_work_fn+0x1a/0x30 : process_one_work+0x171/0x380 : worker_thread+0x49/0x3f0 : kthread+0xf8/0x130 : ret_from_fork+0x35/0x40 [david@redhat.com: v3] Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com Fixes: 60a5a19e7419 ("memory-hotplug: remove sysfs file of node") Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e86b319 Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Keith Busch <keith.busch@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Nayna Jain <nayna@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm,thp: recheck each page before collapsing file THPSong Liu1-12/+16
In collapse_file(), for !is_shmem case, current check cannot guarantee the locked page is up-to-date. Specifically, xas_unlock_irq() should not be called before lock_page() and get_page(); and it is necessary to recheck PageUptodate() after locking the page. With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text may contain corrupted data. This is because khugepaged mistakenly collapses some not up-to-date sub pages into a huge page, and assumes the huge page is up-to-date. This will NOT corrupt data in the disk, because the page is read-only and never written back. Fix this by properly checking PageUptodate() after locking the page. This check replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);". Also, move PageDirty() check after locking the page. Current khugepaged should not try to collapse dirty file THP, because it is limited to read-only .text. The only case we hit a dirty page here is when the page hasn't been written since write. Bail out and retry when this happens. syzbot reported bug on previous version of this patch. Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm: slub: really fix slab walking for init_on_freeLaura Abbott1-30/+9
Commit 1b7e816fc80e ("mm: slub: Fix slab walking for init_on_free") fixed one problem with the slab walking but missed a key detail: When walking the list, the head and tail pointers need to be updated since we end up reversing the list as a result. Without doing this, bulk free is broken. One way this is exposed is a NULL pointer with slub_debug=F: ============================================================================= BUG skbuff_head_cache (Tainted: G T): Object already free ----------------------------------------------------------------------------- INFO: Slab 0x000000000d2d2f8f objects=16 used=3 fp=0x0000000064309071 flags=0x3fff00000000201 BUG: kernel NULL pointer dereference, address: 0000000000000000 Oops: 0000 [#1] PREEMPT SMP PTI RIP: 0010:print_trailer+0x70/0x1d5 Call Trace: <IRQ> free_debug_processing.cold.37+0xc9/0x149 __slab_free+0x22a/0x3d0 kmem_cache_free_bulk+0x415/0x420 __kfree_skb_flush+0x30/0x40 net_rx_action+0x2dd/0x480 __do_softirq+0xf0/0x246 irq_exit+0x93/0xb0 do_IRQ+0xa0/0x110 common_interrupt+0xf/0xf </IRQ> Given we're now almost identical to the existing debugging code which correctly walks the list, combine with that. Link: https://lkml.kernel.org/r/20191104170303.GA50361@gandi.net Link: http://lkml.kernel.org/r/20191106222208.26815-1-labbott@redhat.com Fixes: 1b7e816fc80e ("mm: slub: Fix slab walking for init_on_free") Signed-off-by: Laura Abbott <labbott@redhat.com> Reported-by: Thibaut Sautereau <thibaut.sautereau@clip-os.org> Acked-by: David Rientjes <rientjes@google.com> Tested-by: Alexander Potapenko <glider@google.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <clipos@ssi.gouv.fr> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()Roman Gushchin1-1/+1
An exiting task might belong to an offline cgroup. In this case an attempt to grab a cgroup reference from the task can end up with an infinite loop in hugetlb_cgroup_charge_cgroup(), because neither the cgroup will become online, neither the task will be migrated to a live cgroup. Fix this by switching over to css_tryget(). As css_tryget_online() can't guarantee that the cgroup won't go offline, in most cases the check doesn't make sense. In this particular case users of hugetlb_cgroup_charge_cgroup() are not affected by this change. A similar problem is described by commit 18fa84a2db0e ("cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()"). Link: http://lkml.kernel.org/r/20191106225131.3543616-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()Roman Gushchin1-1/+1
We've encountered a rcu stall in get_mem_cgroup_from_mm(): rcu: INFO: rcu_sched self-detected stall on CPU rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017 (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33 <...> RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90 <...> __memcg_kmem_charge+0x55/0x140 __alloc_pages_nodemask+0x267/0x320 pipe_write+0x1ad/0x400 new_sync_write+0x127/0x1c0 __kernel_write+0x4f/0xf0 dump_emit+0x91/0xc0 writenote+0xa0/0xc0 elf_core_dump+0x11af/0x1430 do_coredump+0xc65/0xee0 get_signal+0x132/0x7c0 do_signal+0x36/0x640 exit_to_usermode_loop+0x61/0xd0 do_syscall_64+0xd4/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The problem is caused by an exiting task which is associated with an offline memcg. We're iterating over and over in the do {} while (!css_tryget_online()) loop, but obviously the memcg won't become online and the exiting task won't be migrated to a live memcg. Let's fix it by switching from css_tryget_online() to css_tryget(). As css_tryget_online() cannot guarantee that the memcg won't go offline, the check is usually useless, except some rare cases when for example it determines if something should be presented to a user. A similar problem is described by commit 18fa84a2db0e ("cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()"). Johannes: : The bug aside, it doesn't matter whether the cgroup is online for the : callers. It used to matter when offlining needed to evacuate all charges : from the memcg, and so needed to prevent new ones from showing up, but we : don't care now. Link: http://lkml.kernel.org/r/20191106225131.3543616-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Shakeel Butt <shakeeb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutn <mkoutny@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocationsLasse Collin1-0/+1
s->dict.allocated was initialized to 0 but never set after a successful allocation, thus the code always thought that the dictionary buffer has to be reallocated. Link: http://lkml.kernel.org/r/20191104185107.3b6330df@tukaani.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Reported-by: Yu Sun <yusun2@cisco.com> Acked-by: Daniel Walker <danielwa@cisco.com> Cc: "Yixia Si (yisi)" <yisi@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm: fix trying to reclaim unevictable lru page when calling madvise_pageoutzhong jiang1-4/+12
Recently, I hit the following issue when running upstream. kernel BUG at mm/vmscan.c:1521! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 0 PID: 23385 Comm: syz-executor.6 Not tainted 5.4.0-rc4+ #1 RIP: 0010:shrink_page_list+0x12b6/0x3530 mm/vmscan.c:1521 Call Trace: reclaim_pages+0x499/0x800 mm/vmscan.c:2188 madvise_cold_or_pageout_pte_range+0x58a/0x710 mm/madvise.c:453 walk_pmd_range mm/pagewalk.c:53 [inline] walk_pud_range mm/pagewalk.c:112 [inline] walk_p4d_range mm/pagewalk.c:139 [inline] walk_pgd_range mm/pagewalk.c:166 [inline] __walk_page_range+0x45a/0xc20 mm/pagewalk.c:261 walk_page_range+0x179/0x310 mm/pagewalk.c:349 madvise_pageout_page_range mm/madvise.c:506 [inline] madvise_pageout+0x1f0/0x330 mm/madvise.c:542 madvise_vma mm/madvise.c:931 [inline] __do_sys_madvise+0x7d2/0x1600 mm/madvise.c:1113 do_syscall_64+0x9f/0x4c0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe madvise_pageout() accesses the specified range of the vma and isolates them, then runs shrink_page_list() to reclaim its memory. But it also isolates the unevictable pages to reclaim. Hence, we can catch the cases in shrink_page_list(). The root cause is that we scan the page tables instead of specific LRU list. and so we need to filter out the unevictable lru pages from our end. Link: http://lkml.kernel.org/r/1572616245-18946-1-git-send-email-zhongjiang@huawei.com Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT") Signed-off-by: zhong jiang <zhongjiang@huawei.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16mm: mempolicy: fix the wrong return value and potential pages leak of mbindYang Shi1-5/+9
Commit d883544515aa ("mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") fixed the return value of mbind() for a couple of corner cases. But, it altered the errno for some other cases, for example, mbind() should return -EFAULT when part or all of the memory range specified by nodemask and maxnode points outside your accessible address space, or there was an unmapped hole in the specified memory range specified by addr and len. Fix this by preserving the errno returned by queue_pages_range(). And, the pagelist may be not empty even though queue_pages_range() returns error, put the pages back to LRU since mbind_range() is not called to really apply the policy so those pages should not be migrated, this is also the old behavior before the problematic commit. Link: http://lkml.kernel.org/r/1572454731-3925-1-git-send-email-yang.shi@linux.alibaba.com Fixes: d883544515aa ("mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com> Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [4.19 and 5.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-11-16Input: synaptics - enable RMI mode for X1 Extreme 2nd GenerationLyude Paul1-0/+1
Just got one of these for debugging some unrelated issues, and noticed that Lenovo seems to have gone back to using RMI4 over smbus with Synaptics touchpads on some of their new systems, particularly this one. So, let's enable RMI mode for the X1 Extreme 2nd Generation. Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20191115221814.31903-1-lyude@redhat.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2019-11-16Merge tag 'for-linus-20191115' of git://git.kernel.dk/linux-blockLinus Torvalds5-17/+59
Pull block fixes from Jens Axboe: "A few fixes that should make it into this release. This contains: - io_uring: - The timeout command assumes sequence == 0 means that we want one completion, but this kind of overloading is unfortunate as it prevents users from doing a pure time based wait. Since this operation was introduced in this cycle, let's correct it now, while we can. (me) - One-liner to fix an issue with dependent links and fixed buffer reads. The actual IO completed fine, but the link got severed since we stored the wrong expected value. (me) - Add TIMEOUT to list of opcodes that don't need a file. (Pavel) - rsxx missing workqueue destry calls. Old bug. (Chuhong) - Fix blk-iocost active list check (Jiufei) - Fix impossible-to-hit overflow merge condition, that still hit some folks very rarely (Junichi) - Fix bfq hang issue from 5.3. This didn't get marked for stable, but will go into stable post this merge (Paolo)" * tag 'for-linus-20191115' of git://git.kernel.dk/linux-block: rsxx: add missed destroy_workqueue calls in remove iocost: check active_list of all the ancestors in iocg_activate() block, bfq: deschedule empty bfq_queues not referred by any process io_uring: ensure registered buffer import returns the IO length io_uring: Fix getting file for timeout block: check bi_size overflow before merge io_uring: make timeout sequence == 0 mean no sequence
2019-11-15Merge branch 'ptp-Validate-the-ancillary-ioctl-flags-more-carefully'David S. Miller11-8/+156
Richard Cochran says: ==================== ptp: Validate the ancillary ioctl flags more carefully. The flags passed to the ioctls for periodic output signals and time stamping of external signals were never checked, and thus formed a useless ABI inadvertently. More recently, a version 2 of the ioctls was introduced in order make the flags meaningful. This series tightens up the checks on the new ioctl flags. - Patch 1 ensures at least one edge flag is set for the new ioctl. - Patches 2-7 are Jacob's recent checks, picking up the tags. - Patch 8 introduces a "strict" flag for passing to the drivers when the new ioctl is used. - Patches 9-12 implement the "strict" checking in the drivers. - Patch 13 extends the test program to exercise combinations of flags. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Extend the test program to check the external time stamp flags.Richard Cochran1-2/+51
Because each driver and hardware has different capabilities, the test cannot provide a simple pass/fail result, but it can at least show what combinations of flags are supported. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: Reject requests to enable time stamping on both edges.Richard Cochran1-0/+6
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15igb: Reject requests that fail to enable time stamping on both edges.Richard Cochran1-0/+6
This hardware always time stamps rising and falling edges, and so this patch validates that the request does contains both edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15dp83640: Reject requests to enable time stamping on both edges.Richard Cochran1-0/+7
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mv88e6xxx: Reject requests to enable time stamping on both edges.Richard Cochran1-0/+6
This driver enables rising edge or falling edge, but not both, and so this patch validates that the request contains only one of the two edges. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Introduce strict checking of external time stamp options.Richard Cochran7-6/+15
User space may request time stamps on rising edges, falling edges, or both. However, the particular mode may or may not be supported in the hardware or in the driver. This patch adds a "strict" flag that tells drivers to ensure that the requested mode will be honored. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15renesas: reject unsupported external timestamp flagsJacob Keller1-0/+6
Fix the renesas PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mlx5: reject unsupported external timestamp flagsJacob Keller1-0/+6
Fix the mlx5 core PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. [ RC: I'm not 100% sure what this driver does, but if I'm not wrong it follows the dp83640: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge ] Cc: Feras Daoud <ferasda@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15igb: reject unsupported external timestamp flagsJacob Keller1-0/+6
Fix the igb PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. This HW always time stamps both edges: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp both edges PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp both edges PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp both edges PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp both edges Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15dp83640: reject unsupported external timestamp flagsJacob Keller1-0/+5
Fix the dp83640 PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. For the record, the semantics of this driver are: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge Cc: Stefan Sørensen <stefan.sorensen@spectralink.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15mv88e6xxx: reject unsupported external timestamp flagsJacob Keller1-0/+6
Fix the mv88e6xxx PTP support to explicitly reject any future flags that get added to the external timestamp request ioctl. In order to maintain currently functioning code, this patch accepts all three current flags. This is because the PTP_RISING_EDGE and PTP_FALLING_EDGE flags have unclear semantics and each driver seems to have interpreted them slightly differently. For the record, the semantics of this driver are: flags Meaning ---------------------------------------------------- -------------------------- PTP_ENABLE_FEATURE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp rising edge Cc: Brandon Streiff <brandon.streiff@ni.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: reject PTP periodic output requests with unsupported flagsJacob Keller7-0/+27
Commit 823eb2a3c4c7 ("PTP: add support for one-shot output") introduced a new flag for the PTP periodic output request ioctl. This flag is not currently supported by any driver. Fix all drivers which implement the periodic output request ioctl to explicitly reject any request with flags they do not understand. This ensures that the driver does not accidentally misinterpret the PTP_PEROUT_ONE_SHOT flag, or any new flag introduced in the future. This is important for forward compatibility: if a new flag is introduced, the driver should reject requests to enable the flag until the driver has actually been modified to support the flag in question. Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Christopher Hall <christopher.s.hall@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15ptp: Validate requests to enable time stamping of external signals.Richard Cochran2-5/+14
Commit 415606588c61 ("PTP: introduce new versions of IOCTLs") introduced a new external time stamp ioctl that validates the flags. This patch extends the validation to ensure that at least one rising or falling edge flag is set when enabling external time stamps. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15tun: fix data-race in gro_normal_list()Petar Penkov1-2/+2
There is a race in the TUN driver between napi_busy_loop and napi_gro_frags. This commit resolves the race by adding the NAPI struct via netif_tx_napi_add, instead of netif_napi_add, which disables polling for the NAPI struct. KCSAN reported: BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop write to 0xffff8880b5d474b0 of 4 bytes by task 11205 on cpu 0: gro_normal_list.part.0+0x77/0xb0 net/core/dev.c:5682 gro_normal_list net/core/dev.c:5678 [inline] gro_normal_one net/core/dev.c:5692 [inline] napi_frags_finish net/core/dev.c:5705 [inline] napi_gro_frags+0x625/0x770 net/core/dev.c:5778 tun_get_user+0x2150/0x26a0 drivers/net/tun.c:1976 tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022 call_write_iter include/linux/fs.h:1895 [inline] do_iter_readv_writev+0x487/0x5b0 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x13b/0x3c0 fs/read_write.c:951 vfs_writev+0x118/0x1c0 fs/read_write.c:1015 do_writev+0xe3/0x250 fs/read_write.c:1058 __do_sys_writev fs/read_write.c:1131 [inline] __se_sys_writev fs/read_write.c:1128 [inline] __x64_sys_writev+0x4e/0x60 fs/read_write.c:1128 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 read to 0xffff8880b5d474b0 of 4 bytes by task 11168 on cpu 1: gro_normal_list net/core/dev.c:5678 [inline] napi_busy_loop+0xda/0x4f0 net/core/dev.c:6126 sk_busy_loop include/net/busy_poll.h:108 [inline] __skb_recv_udp+0x4ad/0x560 net/ipv4/udp.c:1689 udpv6_recvmsg+0x29e/0xe90 net/ipv6/udp.c:288 inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592 sock_recvmsg_nosec net/socket.c:871 [inline] sock_recvmsg net/socket.c:889 [inline] sock_recvmsg+0x92/0xb0 net/socket.c:885 sock_read_iter+0x15f/0x1e0 net/socket.c:967 call_read_iter include/linux/fs.h:1889 [inline] new_sync_read+0x389/0x4f0 fs/read_write.c:414 __vfs_read+0xb1/0xc0 fs/read_write.c:427 vfs_read fs/read_write.c:461 [inline] vfs_read+0x143/0x2c0 fs/read_write.c:446 ksys_read+0xd5/0x1b0 fs/read_write.c:587 __do_sys_read fs/read_write.c:597 [inline] __se_sys_read fs/read_write.c:595 [inline] __x64_sys_read+0x4c/0x60 fs/read_write.c:595 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 943170998b20 ("tun: enable NAPI for TUN/TAP driver") Signed-off-by: Petar Penkov <ppenkov@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15selftests: net: tcp_mmap should create detached threadsEric Dumazet1-1/+6
Since we do not plan using pthread_join() in the server do_accept() loop, we better create detached threads, or risk increasing memory footprint over time. Fixes: 192dc405f308 ("selftests: net: add tcp_mmap program") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: openvswitch: don't call pad_packet if not necessaryTonghao Zhang1-14/+8
The nla_put_u16/nla_put_u32 makes sure that *attrlen is align. The call tree is that: nla_put_u16/nla_put_u32 -> nla_put attrlen = sizeof(u16) or sizeof(u32) -> __nla_put attrlen -> __nla_reserve attrlen -> skb_put(skb, nla_total_size(attrlen)) nla_total_size returns the total length of attribute including padding. Cc: Joe Stringer <joe@ovn.org> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: ep93xx_eth: fix mismatch of request_mem_region in removeChuhong Yuan1-2/+3
The driver calls release_resource in remove to match request_mem_region in probe, which is incorrect. Fix it by using the right one, release_mem_region. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15Merge branch 'DSA-driver-for-Vitesse-Felix-switch'David S. Miller19-599/+2043
Vladimir Oltean says: ==================== DSA driver for Vitesse Felix switch This series builds upon the previous "Accomodate DSA front-end into Ocelot" topic and does the following: - Reworks the Ocelot (VSC7514) driver to support one more switching core (VSC9959), used in NPI mode. Some code which was thought to be SoC-specific (ocelot_board.c) wasn't, and vice versa, so it is being accordingly moved. - Exports ocelot driver structures and functions to include/soc/mscc. - Adds a DSA ocelot front-end for VSC9959, which is a PCI device and uses the exported ocelot functionality for hardware configuration. - Adds a tagger driver for the Vitesse injection/extraction DSA headers. This is known to be compatible with at least Ocelot and Felix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: dsa: ocelot: add driver for Felix switch familyVladimir Oltean8-0/+1066
This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: dsa: ocelot: add tagger for Ocelot/Felix switchesVladimir Oltean5-0/+246
While it is entirely possible that this tagger format is in fact more generic than just these 2 switch families, I don't have that knowledge. The Seville switch in NXP T1040 has a similar frame format, but there are enough differences (e.g. DEST field starts at bit 57 instead of 56) that calling this file tag_vitesse.c is a bit of a stretch at the moment. The frame format has been listed in a comment so that people who add support for further Vitesse switches can rework this tagger while keeping compatibility with Felix. The "ocelot" name was chosen instead of "felix" because even the Ocelot switch can act as a DSA device when it is used in NPI mode, and the Felix tagger format is almost identical. Currently it is only used for the Felix switch embedded in the NXP LS1028A chip. The ABI for this tagger should be considered "not stable" at the moment. The DSA tag is always placed before the Ethernet header and therefore, we are using the long prefix for RX tags to avoid putting the DSA master port in promiscuous mode. Once there will be an API in DSA for drivers to request DSA masters to be in promiscuous mode unconditionally, we will switch to the "no prefix" extraction frame header, which will save 16 padding bytes for each RX frame. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: publish ocelot_sys.h to include/soc/msccVladimir Oltean3-1/+2
The Felix DSA driver needs to write to SYS_RAM_INIT_RAM_INIT for its own chip initialization process. Also update the MAINTAINERS file such that the headers exported by the ocelot driver are under the same maintainers' umbrella as the driver itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: publish structure definitions to include/soc/mscc/ocelot.hVladimir Oltean3-511/+588
We will be registering another switch driver based on ocelot, which lives under drivers/net/dsa. Make sure the Felix DSA front-end has the necessary abstractions to implement a new Ocelot driver instantiation. This includes the function prototypes for implementing DSA callbacks. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: separate the implementation of switch resetVladimir Oltean3-13/+33
The Felix switch has a different reset procedure, so a function pointer needs to be created and added to the ocelot_ops structure. The reset procedure has been moved into ocelot_init. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: adjust MTU on the CPU port in NPI modeVladimir Oltean2-0/+11
When using the NPI port, the DSA tag is passed through Ethernet, so the switch's MAC needs to accept it as it comes from the DSA master. Increase the MTU on the external CPU port to account for the length of the injection header. Without this patch, MTU-sized frames are dropped by the switch's CPU port on xmit, which is especially obvious in TCP sessions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: export a constant for the tag length in bytesVladimir Oltean3-5/+5
This constant will be used in a future patch to increase the MTU on NPI ports, and will also be used in the tagger driver for Felix. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: create a helper for changing the port MTUVladimir Oltean1-17/+23
Since in an NPI/DSA setup, not all ports will have the same MTU, we need to make sure the watermarks for pause frames and/or tail dropping logic that existed in the driver is still coherent for the new MTU values. We need to do this because the NPI (aka external CPU) port needs an increased MTU for the DSA tag. This will be done in a future patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: move invariant configs out of adjust_linkVladimir Oltean1-42/+43
It doesn't make sense to rewrite all these registers every time the PHY library notifies us about a link state change. In a future patch we will customize the MTU for the CPU port, and since the MTU was previously configured from adjust_link, if we don't make this change, its value would have got overridden. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-15net: mscc: ocelot: filter out ocelot SoC specific PCS config from common pathClaudiu Manoil4-20/+39
The adjust_link routine should be generic enough to be (re)used by any SoC that integrates a switch core compatible with the Ocelot core switch driver. Currently all configurations are generic except for the PCS settings that are SoC specific. Move these out to the Ocelot SoC/board instance. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>