Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f:
md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.
Causes an RCW cycle to be forced even when the array is degraded.
A degraded array cannot support RCW as that requires reading all data
blocks, and one may be missing.
Forcing an RCW when it is not possible causes a live-lock and the code
spins, repeatedly deciding to do something that cannot succeed.
So change the condition to only force RCW on non-degraded arrays.
Reported-by: Manibalan P <pmanibalan@amiindia.co.in>
Bisected-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Tested-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f
Cc: stable@vger.kernel.org (v3.7+)
|
|
RAID10 version of earlier fix for RAID1. We must never initiate
IO with sizes less that logical_block_size.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This modifies raid1's narrow_write_error to round up block_sectors to the
device's logical block size.
This prevents sd complaining about "Bad block number requested" for non-512-byte
sector disks.
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
A RAID0 array (like a LINEAR array) does not have a concept
of 'size' being the amount of each device that is in use.
Rather, as much of each device as is available is used.
So the 'size' is set to 0 and ignored.
RAID10 does have this concept and needs it to be set correctly.
So when we convert RAID0 to RAID10 we must determine the
'size' (that being the size of the first 'strip_zone' in the
RAID0), and set it correctly.
Reported-and-tested-by: Xiao Ni <xni@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
After each call to rdev_dec_pending() we should wakeup the
md thread if the device is found to be faulty.
Otherwise we'll incur heavy delays on failing devices.
Signed-off-by: Neil Brown <nfbrown@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
|
|
Rather than using mddev_lock() to take the reconfig_mutex
when writing to any md sysfs file, we only take mddev_lock()
in the particular _store() functions that require it.
Admittedly this is most, but it isn't all.
This also allows us to remove special-case handling for new_dev_store
(in md_attr_store).
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The one which is not inline (mddev_unlock) gets EXPORTed.
This makes the locking available to personality modules so that it
doesn't have to be imposed upon them.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
There are interdependencies between these two sysfs attributes
and whether a resync is currently running.
Rather than depending on reconfig_mutex to ensure no races when
testing these interdependencies are met, use the spinlock.
This will allow the mutex to be remove from protecting this
code in a subsequent patch.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
There isn't really much room for races with ->safemode_delay.
But as I am trying to clean up any racy code and will soon
be removing reconfig_mutex protection from most _store()
functions:
- only set mddev->safemode_delay once, to ensure no code
can see an intermediate value
- use safemode_timer to call md_safemode_timeout() rather than
calling it directly, to ensure it never races with itself.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
It makes more sense to report bitmap_info->file, rather than
bitmap->file (the later is only available once the array is
active).
With that change, use mddev->lock to protect bitmap_info being
set to NULL, and we can call get_bitmap_file() without taking
the mutex.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
1/ delay setting mddev->bitmap_info.file until 'f' looks
usable, so we don't have to unset it.
2/ Don't allow bitmap file to be set if bitmap_info.file
is already set.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
'buf' is only used because d_path fills from the end of the
buffer instead of from the start.
We don't need a separate buf to handle that, we just need to use
memmove() to move the string to the start.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
No rdev attributes need locking for 'show', though
state_show() might benefit from ensuring it sees a
consistent set of flags.
None even use rdev->mddev, so testing for it isn't really
needed and it certainly doesn't need to be held constant.
So improve state_show() and remove the locking.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Most attributes can be read safely without any locking.
A race might lead to a slightly out-dated value, but nothing wrong.
We already have locking in some places where needed.
All that remains is can_clear_show(), behind_writes_used_show()
and action_show() which are easily fixed.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
It is important that mddev->private isn't freed while
a sysfs attribute function is accessing it.
So use mddev->lock to protect the setting of ->private to NULL, and
take that lock when checking ->private for NULL and de-referencing it
in the sysfs access functions.
This only applies to the read ('show') side of access. Write
access will be handled separately.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The only access in md_seq_show that could suffer from races
not protected by ->lock is walking the rdev list.
This can receive sufficient protection from 'rcu'.
So use rdev_for_each_rcu() and get rid of mddev_lock().
Now reading /proc/mdstat will never block in md_seq_show.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This makes it safe to inspect the struct while holding only
the spinlock.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
->pers is already protected by ->reconfig_mutex, and
cannot possibly change when there are threads running or
outstanding IO.
However there are some places where we access ->pers
not in a thread or IO context, and where ->reconfig_mutex
is unnecessarily heavy-weight: level_show and md_seq_show().
So protect all changes, and those accesses, with ->lock.
This is a step toward taking those accesses out from under
reconfig_mutex.
[Fixed missing "mddev->pers" -> "pers" conversion, thanks to
Dan Carpenter <dan.carpenter@oracle.com>]
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Gather all the changes that can happen atomically and might
be relevant to other code into one place. This will
make it easier to refine the locking.
Note that this puts quite a few things between mddev_detach()
and ->free(). Enabling this was the point of some recent patches.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Now that the ->stop function only frees the private data,
rename is accordingly.
Also pass in the private pointer as an arg rather than using
mddev->private. This flexibility will be useful in level_store().
Finally, don't clear ->private. It doesn't make sense to clear
it seeing that isn't what we free, and it is no longer necessary
to clear ->private (it was some time ago before ->to_remove was
introduced).
Setting ->to_remove in ->free() is a bit of a wart, but not a
big problem at the moment.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Each md personality has a 'stop' operation which does two
things:
1/ it finalizes some aspects of the array to ensure nothing
is accessing the ->private data
2/ it frees the ->private data.
All the steps in '1' can apply to all arrays and so can be
performed in common code.
This is useful as in the case where we change the personality which
manages an array (in level_store()), it would be helpful to do
step 1 early, and step 2 later.
So split the 'step 1' functionality out into a new mddev_detach().
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
The use of 'rcu' to protect accesses to ->private_data so that
the ->private_data could be updated predates the introduction
of mddev_suspend/mddev_resume.
These are a cleaner mechanism for providing stability while
swapping in a new ->private data - it is used by level_store()
to support changing of raid levels.
So get rid of the RCU stuff and just use mddev_suspend, mddev_resume.
As these function call ->quiesce(), we add an empty function for
linear just like for raid0.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
There is no locking around calls to merge_bvec_fn(), so
it is possible that calls which coincide with a level (or personality)
change could go wrong.
So create a central dispatch point for these functions and use
rcu_read_lock().
If the array is suspended, reject any merge that can be rejected.
If not, we know it is safe to call the function.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
There is currently no locking around calls to the 'congested'
bdi function. If called at an awkward time while an array is
being converted from one level (or personality) to another, there
is a tiny chance of running code in an unreferenced module etc.
So add a 'congested' function to the md_personality operations
structure, and call it with appropriate locking from a central
'mddev_congested'.
When the array personality is changing the array will be 'suspended'
so no IO is processed.
If mddev_congested detects this, it simply reports that the
array is congested, which is a safe guess.
As mddev_suspend calls synchronize_rcu(), mddev_congested can
avoid races by included the whole call inside an rcu_read_lock()
region.
This require that the congested functions for all subordinate devices
can be run under rcu_lock. Fortunately this is the case.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
This lock is used for (slightly) more than helping with writing
superblocks, and it will soon be extended further. So the
name is inappropriate.
Also, the _irq variant hasn't been needed since 2.6.37 as it is
never taking from interrupt or bh context.
So:
-rename write_lock to lock
-document what it protects
-remove _irq ... except in md_flush_request() as there
is no wait_event_lock() (with no _irq). This can be
cleaned up after appropriate changes to wait.h.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
That last condition is unclear and over cautious.
There are two related issues here.
If a partial write is destined for a missing device, then
either RMW or RCW can work. We must read all the available
block. Only then can the missing blocks be calculated, and
then the parity update performed.
If RMW is not an option, then there is a complication even
without partial writes. If we would need to read a missing
device to perform the reconstruction, then we must first read every
block so the missing device data can be computed.
This is the case for RAID6 (Which currently does not support
RMW) and for times when we don't trust the parity (after a crash)
and so are in the process of resyncing it.
So make these two cases more clear and separate, and perform
the relevant tests more thoroughly.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Both the last two cases are only relevant if something has failed and
something needs to be written (but not over-written), and if it is OK
to pre-read blocks at this point. So factor out those tests and
explain them.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Some of the conditions in need_this_block have very straight
forward motivation. Separate those out and document them.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
fetch_block() has a very large and hard to read 'if' condition.
Separate it into its own function so that it can be
made more readable.
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
67f455486d2ea20b2d94d6adf5b9b783d079e321 introduced a call to
md_wakeup_thread() when adding to the delayed_list. However the md
thread is woken up unconditionally just below.
Remove the unnecessary wakeup call.
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
commit 8eb23b9f35aae413140d3fda766a98092c21e9b0
sched: Debug nested sleeps
causes false-positive warnings in RAID5 code.
This annotation removes them and adds a comment
explaining why there is no real problem.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
If a non-page-aligned write is destined for a device which
is missing/faulty, we can deadlock.
As the target device is missing, a read-modify-write cycle
is not possible.
As the write is not for a full-page, a recontruct-write cycle
is not possible.
This should be handled by logic in fetch_block() which notices
there is a non-R5_OVERWRITE write to a missing device, and so
loads all blocks.
However since commit 67f455486d2ea2, that code requires
STRIPE_PREREAD_ACTIVE before it will active, and those circumstances
never set STRIPE_PREREAD_ACTIVE.
So: in handle_stripe_dirtying, if neither rmw or rcw was possible,
set STRIPE_DELAYED, which will cause STRIPE_PREREAD_ACTIVE be set
after a suitable delay.
Fixes: 67f455486d2ea20b2d94d6adf5b9b783d079e321
Cc: stable@vger.kernel.org (v3.16+)
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Tested-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>
|
|
Pull networking fixes from David Miller:
1) Don't OOPS on socket AIO, from Christoph Hellwig.
2) Scheduled scans should be aborted upon RFKILL, from Emmanuel
Grumbach.
3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish.
4) Fix RCU locking across copy_to_user() in bpf code, from Alexei
Starovoitov.
5) Lots of crash, memory leak, short TX packet et al bug fixes in
sh_eth from Ben Hutchings.
6) Fix memory corruption in SCTP wrt. INIT collitions, from Daniel
Borkmann.
7) Fix return value logic for poll handlers in netxen, enic, and bnx2x.
From Eric Dumazet and Govindarajulu Varadarajan.
8) Header length calculation fix in mac80211 from Fred Chou.
9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths.
From Ezequiel Garcia.
10) udp_diag has bogus logic in it's hash chain skipping, copy same fix
tcp diag used. From Herbert Xu.
11) amd-xgbe programs wrong rx flow control register, from Thomas
Lendacky.
12) Fix race leading to use after free in ping receive path, from Subash
Abhinov Kasiviswanathan.
13) Cache redirect routes otherwise we can get a heavy backlog of rcu
jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
net: don't OOPS on socket aio
stmmac: prevent probe drivers to crash kernel
bnx2x: fix napi poll return value for repoll
ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
sh_eth: Fix DMA-API usage for RX buffers
sh_eth: Check for DMA mapping errors on transmit
sh_eth: Ensure DMA engines are stopped before freeing buffers
sh_eth: Remove RX overflow log messages
ping: Fix race in free in receive path
udp_diag: Fix socket skipping within chain
can: kvaser_usb: Fix state handling upon BUS_ERROR events
can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
can: kvaser_usb: Send correct context to URB completion
can: kvaser_usb: Do not sleep in atomic context
ipv4: try to cache dst_entries which would cause a redirect
samples: bpf: relax test_maps check
bpf: rcu lock must not be held when calling copy_to_user()
net: sctp: fix slab corruption from use after free on INIT collisions
net: mv643xx_eth: Fix highmem support in non-TSO egress path
sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers
...
|
|
In the case when alloc_netdev fails we return NULL to a caller. But there is no
check for NULL in the probe drivers. This patch changes NULL to an error
pointer. The function description is amended to reflect what we may get
returned.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull one more module fix from Rusty Russell:
"SCSI was using module_refcount() to figure out when the module was
unloading: this broke with new atomic refcounting. The code is still
suspicious, but this solves the WARN_ON()"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
scsi: always increment reference count
|
|
With the commit d75b1ade567ffab ("net: less interrupt masking in NAPI") napi
repoll is done only when work_done == budget. When in busy_poll is we return 0
in napi_poll. We should return budget.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
- Use the return value of dma_map_single(), rather than calling
virt_to_page() separately
- Check for mapping failue
- Call dma_unmap_single() rather than dma_sync_single_for_cpu()
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dma_map_single() may fail if an IOMMU or swiotlb is in use, so
we need to check for this.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently we try to clear EDRRR and EDTRR and immediately continue to
free buffers. This is unsafe because:
- In general, register writes are not serialised with DMA, so we still
have to wait for DMA to complete somehow
- The R8A7790 (R-Car H2) manual states that the TX running flag cannot
be cleared by writing to EDTRR
- The same manual states that clearing the RX running flag only stops
RX DMA at the next packet boundary
I applied this patch to the driver to detect DMA writes to freed
buffers:
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -1098,7 +1098,14 @@ static void sh_eth_ring_free(struct net_device *ndev)
> /* Free Rx skb ringbuffer */
> if (mdp->rx_skbuff) {
> for (i = 0; i < mdp->num_rx_ring; i++)
> + memcpy(mdp->rx_skbuff[i]->data,
> + "Hello, world", 12);
> + msleep(100);
> + for (i = 0; i < mdp->num_rx_ring; i++) {
> + WARN_ON(memcmp(mdp->rx_skbuff[i]->data,
> + "Hello, world", 12));
> dev_kfree_skb(mdp->rx_skbuff[i]);
> + }
> }
> kfree(mdp->rx_skbuff);
> mdp->rx_skbuff = NULL;
then ran the loop:
while ethtool -G eth0 rx 128 ; ethtool -G eth0 rx 64; do echo -n .; done
and 'ping -f' toward the sh_eth port from another machine. The
warning fired several times a minute.
To fix these issues:
- Deactivate all TX descriptors rather than writing to EDTRR
- As there seems to be no way of telling when RX DMA is stopped,
perform a soft reset to ensure that both DMA enginess are stopped
- To reduce the possibility of the reset truncating a transmitted
frame, disable egress and wait a reasonable time to reach a
packet boundary before resetting
- Update statistics before resetting
(The 'reasonable time' does not allow for CS/CD in half-duplex
mode, but half-duplex no longer seems reasonable!)
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If RX traffic is overflowing the FIFO or DMA ring, logging every time
this happens just makes things worse. These errors are visible in the
statistics anyway.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While being in an ERROR_WARNING state, and receiving further
bus error events with error counters still in the ERROR_WARNING
range of 97-127 inclusive, the state handling code erroneously
reverts back to ERROR_ACTIVE.
Per the CAN standard, only revert to ERROR_ACTIVE when the
error counters are less than 96.
Moreover, in certain Kvaser models, the BUS_ERROR flag is
always set along with undefined bits in the M16C status
register. Thus use bitwise operators instead of full equality
for checking that register against bus errors.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
On some x86 laptops, plugging a Kvaser device again after an
unplug makes the firmware always ignore the very first command.
For such a case, provide some room for retries instead of
completely exiting the driver init code.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Send expected argument to the URB completion hander: a CAN
netdevice instead of the network interface private context
`kvaser_usb_net_priv'.
This was discovered by having some garbage in the kernel
log in place of the netdevice names: can0 and can1.
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Upon receiving a hardware event with the BUS_RESET flag set,
the driver kills all of its anchored URBs and resets all of
its transmit URB contexts.
Unfortunately it does so under the context of URB completion
handler `kvaser_usb_read_bulk_callback()', which is often
called in an atomic context.
While the device is flooded with many received error packets,
usb_kill_urb() typically sleeps/reschedules till the transfer
request of each killed URB in question completes, leading to
the sleep in atomic bug. [3]
In v2 submission of the original driver patch [1], it was
stated that the URBs kill and tx contexts reset was needed
since we don't receive any tx acknowledgments later and thus
such resources will be locked down forever. Fortunately this
is no longer needed since an earlier bugfix in this patch
series is now applied: all tx URB contexts are reset upon CAN
channel close. [2]
Moreover, a BUS_RESET is now treated _exactly_ like a BUS_OFF
event, which is the recommended handling method advised by
the device manufacturer.
[1] http://article.gmane.org/gmane.linux.network/239442
http://www.webcitation.org/6Vr2yagAQ
[2] can: kvaser_usb: Reset all URB tx contexts upon channel close
889b77f7fd2bcc922493d73a4c51d8a851505815
[3] Stacktrace:
<IRQ> [<ffffffff8158de87>] dump_stack+0x45/0x57
[<ffffffff8158b60c>] __schedule_bug+0x41/0x4f
[<ffffffff815904b1>] __schedule+0x5f1/0x700
[<ffffffff8159360a>] ? _raw_spin_unlock_irqrestore+0xa/0x10
[<ffffffff81590684>] schedule+0x24/0x70
[<ffffffff8147d0a5>] usb_kill_urb+0x65/0xa0
[<ffffffff81077970>] ? prepare_to_wait_event+0x110/0x110
[<ffffffff8147d7d8>] usb_kill_anchored_urbs+0x48/0x80
[<ffffffffa01f4028>] kvaser_usb_unlink_tx_urbs+0x18/0x50 [kvaser_usb]
[<ffffffffa01f45d0>] kvaser_usb_rx_error+0xc0/0x400 [kvaser_usb]
[<ffffffff8108b14a>] ? vprintk_default+0x1a/0x20
[<ffffffffa01f5241>] kvaser_usb_read_bulk_callback+0x4c1/0x5f0 [kvaser_usb]
[<ffffffff8147a73e>] __usb_hcd_giveback_urb+0x5e/0xc0
[<ffffffff8147a8a1>] usb_hcd_giveback_urb+0x41/0x110
[<ffffffffa0008748>] finish_urb+0x98/0x180 [ohci_hcd]
[<ffffffff810cd1a7>] ? acct_account_cputime+0x17/0x20
[<ffffffff81069f65>] ? local_clock+0x15/0x30
[<ffffffffa000a36b>] ohci_work+0x1fb/0x5a0 [ohci_hcd]
[<ffffffff814fbb31>] ? process_backlog+0xb1/0x130
[<ffffffffa000cd5b>] ohci_irq+0xeb/0x270 [ohci_hcd]
[<ffffffff81479fc1>] usb_hcd_irq+0x21/0x30
[<ffffffff8108bfd3>] handle_irq_event_percpu+0x43/0x120
[<ffffffff8108c0ed>] handle_irq_event+0x3d/0x60
[<ffffffff8108ec84>] handle_fasteoi_irq+0x74/0x110
[<ffffffff81004dfd>] handle_irq+0x1d/0x30
[<ffffffff81004727>] do_IRQ+0x57/0x100
[<ffffffff8159482a>] common_interrupt+0x6a/0x6a
Signed-off-by: Ahmed S. Darwish <ahmed.darwish@valeo.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
|
|
Merge misc fixes from Andrew Morton:
"Six fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
drivers/rtc/rtc-s5m.c: terminate s5m_rtc_id array with empty element
printk: add dummy routine for when CONFIG_PRINTK=n
mm/vmscan: fix highidx argument type
memcg: remove extra newlines from memcg oom kill log
x86, build: replace Perl script with Shell script
mm: page_alloc: embed OOM killing naturally into allocation slowpath
|
|
Commit 69ad0dd7af22b61d9e0e68e56b6290121618b0fb
Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Date: Mon May 19 13:59:59 2014 -0300
net: mv643xx_eth: Use dma_map_single() to map the skb fragments
caused a nasty regression by removing the support for highmem skb
fragments. By using page_address() to get the address of a fragment's
page, we are assuming a lowmem page. However, such assumption is incorrect,
as fragments can be in highmem pages, resulting in very nasty issues.
This commit fixes this by using the skb_frag_dma_map() helper,
which takes care of mapping the skb fragment properly. Additionally,
the type of mapping is now tracked, so it can be unmapped using
dma_unmap_page or dma_unmap_single when appropriate.
This commit also fixes the error path in txq_init() to release the
resources properly.
Fixes: 69ad0dd7af22 ("net: mv643xx_eth: Use dma_map_single() to map the skb fragments")
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to stop the RX path accessing the RX ring while it's being
stopped or resized, we clear the interrupt mask (EESIPR) and then call
free_irq() or synchronise_irq(). This is insufficient because the
interrupt handler or NAPI poller may set EESIPR again after we clear
it. Also, in sh_eth_set_ringparam() we currently don't disable NAPI
polling at all.
I could easily trigger a crash by running the loop:
while ethtool -G eth0 rx 128 && ethtool -G eth0 rx 64; do echo -n .; done
and 'ping -f' toward the sh_eth port from another machine.
To fix this:
- Add a software flag (irq_enabled) to signal whether interrupts
should be enabled
- In the interrupt handler, if the flag is clear then clear EESIPR
and return
- In the NAPI poller, if the flag is clear then don't set EESIPR
- Set the flag before enabling interrupts in sh_eth_dev_init() and
sh_eth_set_ringparam()
- Clear the flag and serialise with the interrupt and NAPI
handlers before clearing EESIPR in sh_eth_close() and
sh_eth_set_ringparam()
After this, I could run the loop for 100,000 iterations successfully.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If the device is down then no packet buffers should be allocated.
We also must not touch its registers as it may be powered off.
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We must only ever stop TX queues when they are full or the net device
is not 'ready' so far as the net core, and specifically the watchdog,
is concerned. Otherwise, the watchdog may fire *immediately* if no
packets have been added to the queue in the last 5 seconds.
What's more, sh_eth_tx_timeout() will likely crash if called while
we're resizing the TX ring.
I could easily trigger this by running the loop:
while ethtool -G eth0 rx 128 && ethtool -G eth0 rx 64; do echo -n .; done
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If an skb to be transmitted is shorter than the minimum Ethernet frame
length, we currently set the DMA descriptor length to the minimum but
do not add zero-padding. This could result in leaking sensitive
data. We also pass different lengths to dma_map_single() and
dma_unmap_single().
Use skb_padto() to pad properly, before calling dma_map_single().
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|