summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/amazon/ena/ena_netdev.c
AgeCommit message (Collapse)AuthorFilesLines
2019-02-27net: ena: fix race between link up and device initalizationArthur Kiyanovski1-5/+5
[ Upstream commit e1f1bd9bfbedcfce428ee7e1b82a6ec12d4c3863 ] Fix race condition between ena_update_on_link_change() and ena_restore_device(). This race can occur if link notification arrives while the driver is performing a reset sequence. In this case link can be set up, enabling the device, before it is fully restored. If packets are sent at this time, the driver might access uninitialized data structures, causing kernel crash. Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on() after ena_up() to ensure the device is ready when link is set up. Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-13net: ena: fix crash during failed resume from hibernationArthur Kiyanovski1-1/+1
[ Upstream commit e76ad21d070f79e566ac46ce0b0584c3c93e1b43 ] During resume from hibernation if ena_restore_device fails, ena_com_dev_reset() is called, and uses the readless read mechanism, which was already destroyed by the call to ena_com_mmio_reg_read_request_destroy(). This causes a NULL pointer reference. In this commit we switch the call order of the above two functions to avoid this crash. Fixes: d7703ddbd7c9 ("net: ena: fix rare bug when failed restart/resume is followed by driver removal") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-10-09net: ena: fix NULL dereference due to untimely napi initializationArthur Kiyanovski1-2/+7
napi poll functions should be initialized before running request_irq(), to handle a rare condition where there is a pending interrupt, causing the ISR to fire immediately while the poll function wasn't set yet, causing a NULL dereference. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net: ena: fix rare bug when failed restart/resume is followed by driver removalArthur Kiyanovski1-0/+4
In a rare scenario when ena_device_restore() fails, followed by device remove, an FLR will not be issued. In this case, the device will keep sending asynchronous AENQ keep-alive events, even after driver removal, leading to memory corruption. Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-09net: ena: fix warning in rmmod caused by double iounmapArthur Kiyanovski1-8/+1
Memory mapped with devm_ioremap is automatically freed when the driver is disconnected from the device. Therefore there is no need to explicitly call devm_iounmap. Fixes: 0857d92f71b6 ("net: ena: add missing unmap bars on device removal") Fixes: 411838e7b41c ("net: ena: fix rare kernel crash when bar memory remap fails") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-28net: ena: remove ndo_poll_controllerEric Dumazet1-22/+0
As diagnosed by Song Liu, ndo_poll_controller() can be very dangerous on loaded hosts, since the cpu calling ndo_poll_controller() might steal all NAPI contexts (for all RX/TX queues of the NIC). This capture can last for unlimited amount of time, since one cpu is generally not able to drain all the queues under load. ena uses NAPI for TX completions, so we better let core networking stack call the napi->poll() to avoid the capture. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Netanel Belgazal <netanel@amazon.com> Cc: Saeed Bishara <saeedb@amazon.com> Cc: Zorik Machulsky <zorik@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net: ena: fix incorrect usage of memory barriersNetanel Belgazal1-19/+11
Added memory barriers where they were missing to support multiple architectures, and removed redundant ones. As part of removing the redundant memory barriers and improving performance, we moved to more relaxed versions of memory barriers, as well as to the more relaxed version of writel - writel_relaxed, while maintaining correctness. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net: ena: fix missing lock during device destructionNetanel Belgazal1-13/+7
acquire the rtnl_lock during device destruction to avoid using partially destroyed device. ena_remove() shares almost the same logic as ena_destroy_device(), so use ena_destroy_device() and avoid duplications. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net: ena: fix potential double ena_destroy_device()Netanel Belgazal1-0/+5
ena_destroy_device() can potentially be called twice. To avoid this, check that the device is running and only then proceed destroying it. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net: ena: fix device destruction to gracefully free resourcesNetanel Belgazal1-6/+7
When ena_destroy_device() is called from ena_suspend(), the device is still reachable from the driver. Therefore, the driver can send a command to the device to free all resources. However, in all other cases of calling ena_destroy_device(), the device is potentially in an error state and unreachable from the driver. In these cases the driver must not send commands to the device. The current implementation does not request resource freeing from the device even when possible. We add the graceful parameter to ena_destroy_device() to enable resource freeing when possible, and use it in ena_suspend(). Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net: ena: fix driver when PAGE_SIZE == 64kBNetanel Belgazal1-5/+5
The buffer length field in the ena rx descriptor is 16 bit, and the current driver passes a full page in each ena rx descriptor. When PAGE_SIZE equals 64kB or more, the buffer length field becomes zero. To solve this issue, limit the ena Rx descriptor to use 16kB even when allocating 64kB kernel pages. This change would not impact ena device functionality, as 16kB is still larger than maximum MTU. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-09-09net: ena: fix surprise unplug NULL dereference kernel crashNetanel Belgazal1-2/+2
Starting with driver version 1.5.0, in case of a surprise device unplug, there is a race caused by invoking ena_destroy_device() from two different places. As a result, the readless register might be accessed after it was destroyed. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-09net: allow fallback function to pass netdevAlexander Duyck1-1/+1
For most of these calls we can just pass NULL through to the fallback function as the sb_dev. The only cases where we cannot are the cases where we might be dealing with either an upper device or a driver that would have configured things to support an sb_dev itself. The only driver that has any significant change in this patch set should be ixgbe as we can drop the redundant functionality that existed in both the ndo_select_queue function and the fallback function that was passed through to us. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-07-09net: allow ndo_select_queue to pass netdevAlexander Duyck1-1/+2
This patch makes it so that instead of passing a void pointer as the accel_priv we instead pass a net_device pointer as sb_dev. Making this change allows us to pass the subordinate device through to the fallback function eventually so that we can keep the actual code in the ndo_select_queue call as focused on possible on the exception cases. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-25net: ena: Use pci_sriov_configure_simple() to enable VFsAlexander Duyck1-27/+1
Instead of implementing our own version of a SR-IOV configuration stub in the ena driver, use the existing pci_sriov_configure_simple() function. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2018-03-26net: ena: Eliminate duplicate barriers on weakly-ordered archsSinan Kaya1-2/+3
Code includes barrier() followed by writel(). writel() already has a barrier on some architectures like arm64. This ends up CPU observing two barriers back to back before executing the register write. Create a new wrapper function with relaxed write operator. Use the new wrapper when a write is following a barrier(). Since code already has an explicit barrier call, changing writel() to writel_relaxed() and adding mmiowb() for ordering protection. Signed-off-by: Sinan Kaya <okaya@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-15/+30
2018-01-03net: ena: fix error handling in ena_down() sequenceNetanel Belgazal1-2/+17
ENA admin command queue errors are not handled as part of ena_down(). As a result, in case of error admin queue transitions to non-running state and aborts all subsequent commands including those coming from ena_up(). Reset scheduled by the driver from the timer service context would not proceed due to sharing rtnl with ena_up()/ena_down() Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-03net: ena: unmask MSI-X only after device initialization is completedNetanel Belgazal1-13/+13
Under certain conditions MSI-X interrupt might arrive right after it was unmasked in ena_up(). There is a chance it would be processed by the driver before device ENA_FLAG_DEV_UP flag is set. In such a case the interrupt is ignored. ENA device operates in auto-masked mode, therefore ignoring interrupt leaves it masked for good. Moving unmask of interrupt to be the last step in ena_up(). Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-02net: ena: add detection and recovery mechanism for handling missed/misrouted ↵Netanel Belgazal1-7/+61
MSI-X A mechanism for detection of stuck Rx/Tx rings due to missed or misrouted interrupts. Check if there are unhandled completion descriptors before the first MSI-X interrupt arrived. The check is per queue and per interrupt vector. Once such condition is detected, driver and device reset is scheduled. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-22treewide: setup_timer() -> timer_setup()Kees Cook1-4/+3
This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-20net: ena: fix race condition between device reset and link up setupNetanel Belgazal1-2/+9
In rare cases, ena driver would reset and re-start the device, for example, in case of misbehaving application that causes transmit timeout The first step in the reset procedure is to stop the Tx traffic by calling ena_carrier_off(). After the driver have just started the device reset procedure, device happens to send an asynchronous notification (via AENQ) to the driver than there was a link change (to link-up state). This link change is mapped to a call to netif_carrier_on() which re-activates the Tx queues, violating the assumption of no tx traffic until device reset is completed, as the reset task might still be in the process of queues initialization, leading to an access to uninitialized memory. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-16Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge updates from Andrew Morton: - a few misc bits - ocfs2 updates - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits) memory hotplug: fix comments when adding section mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP mm: simplify nodemask printing mm,oom_reaper: remove pointless kthread_run() error check mm/page_ext.c: check if page_ext is not prepared writeback: remove unused function parameter mm: do not rely on preempt_count in print_vma_addr mm, sparse: do not swamp log with huge vmemmap allocation failures mm/hmm: remove redundant variable align_end mm/list_lru.c: mark expected switch fall-through mm/shmem.c: mark expected switch fall-through mm/page_alloc.c: broken deferred calculation mm: don't warn about allocations which stall for too long fs: fuse: account fuse_inode slab memory as reclaimable mm, page_alloc: fix potential false positive in __zone_watermark_ok mm: mlock: remove lru_add_drain_all() mm, sysctl: make NUMA stats configurable shmem: convert shmem_init_inodecache() to void Unify migrate_pages and move_pages access checks mm, pagevec: rename pagevec drained field ...
2017-11-16mm: remove __GFP_COLDMel Gorman1-1/+1
As the page free path makes no distinction between cache hot and cold pages, there is no real useful ordering of pages in the free list that allocation requests can take advantage of. Juding from the users of __GFP_COLD, it is likely that a number of them are the result of copying other sites instead of actually measuring the impact. Remove the __GFP_COLD parameter which simplifies a number of paths in the page allocator. This is potentially controversial but bear in mind that the size of the per-cpu pagelists versus modern cache sizes means that the whole per-cpu list can often fit in the L3 cache. Hence, there is only a potential benefit for microbenchmarks that alloc/free pages in a tight loop. It's even worse when THP is taken into account which has little or no chance of getting a cache-hot page as the per-cpu list is bypassed and the zeroing of multiple pages will thrash the cache anyway. The truncate microbenchmarks are not shown as this patch affects the allocation path and not the free path. A page fault microbenchmark was tested but it showed no sigificant difference which is not surprising given that the __GFP_COLD branches are a miniscule percentage of the fault path. Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+4
There were quite a few overlapping sets of changes here. Daniel's bug fix for off-by-ones in the new BPF branch instructions, along with the added allowances for "data_end > ptr + x" forms collided with the metadata additions. Along with those three changes came veritifer test cases, which in their final form I tried to group together properly. If I had just trimmed GIT's conflict tags as-is, this would have split up the meta tests unnecessarily. In the socketmap code, a set of preemption disabling changes overlapped with the rename of bpf_compute_data_end() to bpf_compute_data_pointers(). Changes were made to the mv88e6060.c driver set addr method which got removed in net-next. The hyperv transport socket layer had a locking change in 'net' which overlapped with a change of socket state macro usage in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19net: ena: add statistics for missed tx packetsNetanel Belgazal1-13/+17
Add a new statistic to ethtool stats that show the number of packets without transmit acknowledgement from ENA device. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19net: ena: add power management ops to the ENA driverNetanel Belgazal1-27/+90
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19net: ena: remove legacy suspend suspend/resume supportNetanel Belgazal1-50/+0
Remove ena_device_io_suspend/resume() methods Those methods were intend to be used by the device to trigger suspend/resume but eventually it was dropped. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19net: ena: fix rare kernel crash when bar memory remap failsNetanel Belgazal1-1/+2
This failure is rare and only found on testing where deliberately fail devm_ioremap() [ 451.170464] ena 0000:04:00.0: failed to remap regs bar 451.170549] Workqueue: pciehp-1 pciehp_power_thread [ 451.170551] task: ffff88085a5f2d00 task.stack: ffffc9000756c000 [ 451.170552] RIP: 0010:devm_iounmap+0x2d/0x40 [ 451.170553] RSP: 0018:ffffc9000756fac0 EFLAGS: 00010282 [ 451.170554] RAX: 00000000fffffffe RBX: 0000000000000000 RCX: 0000000000000000 [ 451.170555] RDX: ffffffff813a7e00 RSI: 0000000000000282 RDI: 0000000000000282 [ 451.170556] RBP: ffffc9000756fac8 R08: 00000000fffffffe R09: 00000000000009b7 [ 451.170557] R10: 0000000000000005 R11: 00000000000009b6 R12: ffff880856c9d0a0 [ 451.170558] R13: ffffc9000f5c90c0 R14: ffff880856c9d0a0 R15: 0000000000000028 [ 451.170559] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [ 451.170560] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 451.170561] CR2: 00007f169038b000 CR3: 0000000001c09000 CR4: 00000000003406f0 [ 451.170562] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 451.170562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 451.170563] Call Trace: [ 451.170572] ena_release_bars.isra.48+0x34/0x60 [ena] [ 451.170574] ena_probe+0x144/0xd90 [ena] [ 451.170579] ? ida_simple_get+0x98/0x100 [ 451.170585] ? kernfs_next_descendant_post+0x40/0x50 [ 451.170591] local_pci_probe+0x45/0xa0 [ 451.170592] pci_device_probe+0x157/0x180 [ 451.170599] driver_probe_device+0x2a8/0x460 [ 451.170600] __device_attach_driver+0x7e/0xe0 [ 451.170602] ? driver_allows_async_probing+0x30/0x30 [ 451.170603] bus_for_each_drv+0x68/0xb0 [ 451.170605] __device_attach+0xdd/0x160 [ 451.170607] device_attach+0x10/0x20 [ 451.170610] pci_bus_add_device+0x4f/0xa0 [ 451.170611] pci_bus_add_devices+0x39/0x70 [ 451.170613] pciehp_configure_device+0x96/0x120 [ 451.170614] pciehp_enable_slot+0x1b3/0x290 [ 451.170616] pciehp_power_thread+0x3b/0xb0 [ 451.170622] process_one_work+0x149/0x360 [ 451.170623] worker_thread+0x4d/0x3c0 [ 451.170626] kthread+0x109/0x140 [ 451.170627] ? rescuer_thread+0x380/0x380 [ 451.170628] ? kthread_park+0x60/0x60 [ 451.170632] ret_from_fork+0x25/0x30 Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-19net: ena: reduce the severity of some printoutsNetanel Belgazal1-2/+2
Decrease log level of checksum errors as these messages can be triggered remotely by bad packets. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: update driver's rx drop statisticsNetanel Belgazal1-0/+9
rx drop counter is reported by the device in the keep-alive event. update the driver's counter with the device counter. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: separate skb allocation to dedicated functionNetanel Belgazal1-17/+27
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: use napi_schedule_irqoff when possibleNetanel Belgazal1-1/+1
Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: allow the driver to work with small number of msix vectorsNetanel Belgazal1-19/+46
Current driver tries to allocate msix vectors as the number of the negotiated io queues. (with another msix vector for management). If pci_alloc_irq_vectors() fails, the driver aborts the probe and the ENA network device is never brought up. With this patch, the driver's logic will reduce the number of IO queues to the number of allocated msix vectors (minus one for management) instead of failing probe(). Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: add support for out of order rx buffers refillNetanel Belgazal1-15/+68
ENA driver post Rx buffers through the Rx submission queue for the ENA device to fill them with receive packets. Each Rx buffer is marked with req_id in the Rx descriptor. Newer ENA devices could consume the posted Rx buffer in out of order, and as result the corresponding Rx completion queue will have Rx completion descriptors with non contiguous req_id(s) In this change the driver holds two rings. The first ring (called free_rx_ids) is a mapping ring. It holds all the unused request ids. The values in this ring are from 0 to ring_size -1. When the driver wants to allocate a new Rx buffer it uses the head of free_rx_ids and uses it's value as the index for rx_buffer_info ring. The req_id is also written to the Rx descriptor Upon Rx completion, The driver took the req_id from the completion descriptor and uses it as index in rx_buffer_info. The req_id is then return to the free_rx_ids ring. This patch also adds statistics to inform when the driver receive out of range or unused req_id. Note: free_rx_ids is only accessible from the napi handler, so no locking is required Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: add reset reason for each device FLRNetanel Belgazal1-4/+13
For each device reset, log to the device what is the cause the reset occur. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: add hardware hints capability to the driverNetanel Belgazal1-5/+61
With this patch, ENA device can update the ena driver about the desired timeout values: These values are part of the "hardware hints" which are transmitted to the driver as Asynchronous event through ENA async event notification queue. In case the ENA device does not support this capability, the driver will use its own default values. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-23net: ena: change return value for unsupported features unsupported return valueNetanel Belgazal1-10/+10
return -EOPNOTSUPP instead of -EPERM. Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11net: ena: bug fix in lost tx packets detection mechanismNetanel Belgazal1-28/+38
check_for_missing_tx_completions() is called from a timer task and looking for lost tx packets. The old implementation accumulate all the lost tx packets and did not check if those packets were retrieved on a later stage. This cause to a situation where the driver reset the device for no reason. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11net: ena: fix theoretical Rx hang on low memory systemsNetanel Belgazal1-0/+55
For the rare case where the device runs out of free rx buffer descriptors (in case of pressure on kernel memory), and the napi handler continuously fail to refill new Rx descriptors until device rx queue totally runs out of all free rx buffers to post incoming packet, leading to a deadlock: * The device won't send interrupts since all the new Rx packets will be dropped. * The napi handler won't try to allocate new Rx descriptors since allocation is part of NAPI that's not being invoked any more The fix involves detecting this scenario and rescheduling NAPI (to refill buffers) by the keepalive/watchdog task. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11net: ena: add missing unmap bars on device removalNetanel Belgazal1-4/+11
This patch also change the mapping functions to devm_ functions Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11net: ena: add missing return when ena_com_get_io_handlers() failsNetanel Belgazal1-0/+2
Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-11net: ena: fix bug that might cause hang after consecutive open/close interface.Netanel Belgazal1-15/+26
Fixing a bug that the driver does not unmask the IO interrupts in ndo_open(): occasionally, the MSI-X interrupt (for one or more IO queues) can be masked when ndo_close() was called. If that is followed by ndo open(), then the MSI-X will be still masked so no interrupt will be received by the driver. Fixes: 1738cd3ed342 ("Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Netanel Belgazal <netanel@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-11net/ena: switch to pci_alloc_irq_vectorsChristoph Hellwig1-43/+12
Remove the deprecated pci_enable_msix API in favour of its successor. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-20net: ena: remove superfluous check in ena_remove()Lino Sanfilippo1-6/+0
The check in ena_remove() for the pci driver data not being NULL is not needed, since it is always set in the probe() function. Remove the superfluous check. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/ena: change condition for host attribute configurationNetanel Belgazal1-2/+3
Move the host info config to be the first admin command that is executed. This change require the driver to remove the 'feature check' from host info configuration flow. The check is removed since the supported features bitmask field is retrieved only after calling ENA_ADMIN_DEVICE_ATTRIBUTES admin command. If set host info is not supported an error will be returned by the device. Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/ena: reduce the severity of ena printoutsNetanel Belgazal1-3/+11
Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/ena: use napi_complete_done() return valueNetanel Belgazal1-15/+29
Do not unamsk interrupts if we are in busy poll mode. Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/ena: fix potential access to freed memory during device resetNetanel Belgazal1-13/+43
If the ena driver detects that the device is not behave as expected, it tries to reset the device. The reset flow calls ena_down, which will frees all the resources the driver allocates and then it will reset the device. This flow can cause memory corruption if the device is still writes to the driver's memory space. To overcome this potential race, move the reset before the device resources are freed. Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/ena: refactor ena_get_stats64 to be atomic context safeNetanel Belgazal1-15/+33
ndo_get_stat64() can be called from atomic context, but the current implementation sends an admin command to retrieve the statistics from the device. This admin command can sleep. This patch re-factors the implementation of ena_get_stats64() to use the {rx,tx}bytes/count from the driver's inner counters, and to obtain the rx drop counter from the asynchronous keep alive (heart bit) event. Signed-off-by: Netanel Belgazal <netanel@annapurnalabs.com> Signed-off-by: David S. Miller <davem@davemloft.net>