summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-02-15libata: disable a disk via libata.force paramsRobin H. Johnson2-0/+3
commit b8bd6dc36186fe99afa7b73e9e2d9a98ad5c4865 upstream. A user on StackExchange had a failing SSD that's soldered directly onto the motherboard of his system. The BIOS does not give any option to disable it at all, so he can't just hide it from the OS via the BIOS. The old IDE layer had hdX=noprobe override for situations like this, but that was never ported to the libata layer. This patch implements a disable flag for libata.force. Example use: libata.force=2.0:disable [v2 of the patch, removed the nodisable flag per Tejun Heo] Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://unix.stackexchange.com/questions/102648/how-to-tell-linux-kernel-3-0-to-completely-ignore-a-failing-disk Link: http://askubuntu.com/questions/352836/how-can-i-tell-linux-kernel-to-completely-ignore-a-disk-as-if-it-was-not-even-co Link: http://superuser.com/questions/599333/how-to-disable-kernel-probing-for-drive [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ftrace: Initialize the ftrace profiler for each possible cpuMiao Xie1-1/+1
commit c4602c1c818bd6626178d6d3fcc152d9f2f48ac0 upstream. Ftrace currently initializes only the online CPUs. This implementation has two problems: - If we online a CPU after we enable the function profile, and then run the test, we will lose the trace information on that CPU. Steps to reproduce: # echo 0 > /sys/devices/system/cpu/cpu1/online # cd <debugfs>/tracing/ # echo <some function name> >> set_ftrace_filter # echo 1 > function_profile_enabled # echo 1 > /sys/devices/system/cpu/cpu1/online # run test - If we offline a CPU before we enable the function profile, we will not clear the trace information when we enable the function profile. It will trouble the users. Steps to reproduce: # cd <debugfs>/tracing/ # echo <some function name> >> set_ftrace_filter # echo 1 > function_profile_enabled # run test # cat trace_stat/function* # echo 0 > /sys/devices/system/cpu/cpu1/online # echo 0 > function_profile_enabled # echo 1 > function_profile_enabled # cat trace_stat/function* # run test # cat trace_stat/function* So it is better that we initialize the ftrace profiler for each possible cpu every time we enable the function profile instead of just the online ones. Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15radiotap: fix bitmap-end-finding buffer overrunJohannes Berg1-0/+4
commit bd02cd2549cfcdfc57cb5ce57ffc3feb94f70575 upstream. Evan Huus found (by fuzzing in wireshark) that the radiotap iterator code can access beyond the length of the buffer if the first bitmap claims an extension but then there's no data at all. Fix this. Reported-by: Evan Huus <eapache@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15gpio: msm: Fix irq mask/unmask by writing bits instead of numbersStephen Boyd1-2/+2
commit 4cc629b7a20945ce35628179180329b6bc9e552b upstream. We should be writing bits here but instead we're writing the numbers that correspond to the bits we want to write. Fix it by wrapping the numbers in the BIT() macro. This fixes gpios acting as interrupts. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ALSA: hda - Add enable_msi=0 workaround for four HP machinesDavid Henningsson1-0/+4
commit 693e0cb052c607e2d41edf9e9f1fa99ff8c266c1 upstream. While enabling these machines, we found we would sometimes lose an interrupt if we change hardware volume during playback, and that disabling msi fixed this issue. (Losing the interrupt caused underruns and crackling audio, as the one second timeout is usually bigger than the period size.) The machines were all machines from HP, running AMD Hudson controller, and Realtek ALC282 codec. BugLink: https://bugs.launchpad.net/bugs/1260225 Signed-off-by: David Henningsson <david.henningsson@canonical.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15drm/radeon: Fix sideport problems on certain RS690 boardsAlex Deucher1-0/+10
commit 8333f0fe133be420ce3fcddfd568c3a559ab274e upstream. Some RS690 boards with 64MB of sideport memory show up as having 128MB sideport + 256MB of UMA. In this case, just skip the sideport memory and use UMA. This fixes rendering corruption and should improve performance. bug: https://bugs.freedesktop.org/show_bug.cgi?id=35457 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15iscsi-target: Fix-up all zero data-length CDBs with R/W_BIT setNicholas Bellinger1-14/+12
commit 4454b66cb67f14c33cd70ddcf0ff4985b26324b7 upstream. This patch changes special case handling for ISCSI_OP_SCSI_CMD where an initiator sends a zero length Expected Data Transfer Length (EDTL), but still sets the WRITE and/or READ flag bits when no payload transfer is requested. Many, many moons ago two special cases where added for an ancient version of ESX that has long since been fixed, so instead of adding a new special case for the reported bug with a Broadcom 57800 NIC, go ahead and always strip off the incorrect WRITE + READ flag bits. Also, avoid sending a reject here, as RFC-3720 does mandate this case be handled without protocol error. Reported-by: Witold Bazakbal <865perl@wp.pl> Tested-by: Witold Bazakbal <865perl@wp.pl> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15xhci: Limit the spurious wakeup fix only to HP machinesTakashi Iwai1-1/+6
commit 6962d914f317b119e0db7189199b21ec77a4b3e0 upstream. We've got regression reports that my previous fix for spurious wakeups after S5 on HP Haswell machines leads to the automatic reboot at shutdown on some machines. It turned out that the fix for one side triggers another BIOS bug in other side. So, it's exclusive. Since the original S5 wakeups have been confirmed only on HP machines, it'd be safer to apply it only to limited machines. As a wild guess, limiting to machines with HP PCI SSID should suffice. This patch should be backported to kernels as old as 3.12, that contain the commit 638298dc66ea36623dbc2757a24fc2c4ab41b016 "xhci: Fix spurious wakeups after S5 on Haswell". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Tested-by: <dashing.meng@gmail.com> Reported-by: Niklas Schnelle <niklas@komani.de> Reported-by: Giorgos <ganastasiouGR@gmail.com> Reported-by: <art1@vhex.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ext4: fix del_timer() misuse for ->s_err_reportAl Viro1-2/+2
commit 9105bb149bbbc555d2e11ba5166dfe7a24eae09e upstream. That thing should be del_timer_sync(); consider what happens if ext4_put_super() call of del_timer() happens to come just as it's getting run on another CPU. Since that timer reschedules itself to run next day, you are pretty much guaranteed that you'll end up with kfree'd scheduled timer, with usual fun consequences. AFAICS, that's -stable fodder all way back to 2010... [the second del_timer_sync() is almost certainly not needed, but it doesn't hurt either] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ext2: Fix oops in ext2_get_block() called from ext2_quota_write()Jan Kara1-0/+1
commit df4e7ac0bb70abc97fbfd9ef09671fc084b3f9db upstream. ext2_quota_write() doesn't properly setup bh it passes to ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in ext2_get_blocks() (or we could actually ask for mapping arbitrary number of blocks depending on whatever value was on stack). Fix ext2_quota_write() to properly fill in number of blocks to map. Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Christoph Hellwig <hch@lst.de> Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ext4: check for overlapping extents in ext4_valid_extent_entries()Eryu Guan1-1/+18
commit 5946d089379a35dda0e531710b48fca05446a196 upstream. A corrupted ext4 may have out of order leaf extents, i.e. extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT ^^^^ overlap with previous extent Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). BUG_ON(end < lblk); The problem is that __read_extent_tree_block() tries to cache holes as well but assumes 'lblk' is greater than 'prev' and passes underflowed length to ext4_es_cache_extent(). Fix it by checking for overlapping extents in ext4_valid_extent_entries(). I hit this when fuzz testing ext4, and am able to reproduce it by modifying the on-disk extent by hand. Also add the check for (ee_block + len - 1) in ext4_valid_extent() to make sure the value is not overflow. Ran xfstests on patched ext4 and no regression. Cc: Lukáš Czerner <lczerner@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ext4: fix use-after-free in ext4_mb_new_blocksJunho Ryu1-3/+8
commit 4e8d2139802ce4f41936a687f06c560b12115247 upstream. ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count. While ext4_mb_use_preallocated checks pa->pa_deleted first and then increments pa->count later, ext4_mb_put_pa decrements pa->pa_count before holding pa->pa_lock and then sets pa->pa_deleted. * Free sequence ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa * Use sequence ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_use_preallocated (4): increase pa->pa_count ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_release_context: access pa * Use-after-free sequence [initial status] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count [pa_count decremented] <pa->pa_deleted = 0, pa_count = 0> ext4_mb_use_preallocated (4): increase pa->pa_count [pa_count incremented] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 [race condition!] <pa->pa_deleted = 1, pa_count = 1> ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa ext4_mb_release_context: access pa AddressSanitizer has detected use-after-free in ext4_mb_new_blocks Bug report: http://goo.gl/rG1On3 Signed-off-by: Junho Ryu <jayr@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() failsTheodore Ts'o1-0/+4
commit ae1495b12df1897d4f42842a7aa7276d920f6290 upstream. While it's true that errors can only happen if there is a bug in jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt the kernel or remount the file system read-only in order to avoid further data loss. The ext4_journal_abort_handle() function doesn't do any of this, and while it's likely that this call (since it doesn't adjust refcounts) will likely result in the file system eventually deadlocking since the current transaction will never be able to close, it's much cleaner to call let ext4's error handling system deal with this situation. There's a separate bug here which is that if certain jbd2 errors errors occur and file system is mounted errors=continue, the file system will probably eventually end grind to a halt as described above. But things have been this way in a long time, and usually when we have these sorts of errors it's pretty much a disaster --- and that's why the jbd2 layer aggressively retries memory allocations, which is the most likely cause of these jbd2 errors. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> [bwh: Backported to 3.2: drop logging of missing transaction debug data] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8Michele Baldessari1-0/+3
commit 87809942d3fa60bafb7a58d0bdb1c79e90a6821d upstream. We've received multiple reports in Fedora via (BZ 907193) that the Seagate Momentus SpinPoint M8 errors out when enabling AA: [ 2.555905] ata2.00: failed to enable AA (error_mask=0x1) [ 2.568482] ata2.00: failed to enable AA (error_mask=0x1) Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk. Reported-by: Nicholas <arealityfarbetween@googlemail.com> Signed-off-by: Michele Baldessari <michele@acksyn.org> Tested-by: Nicholas <arealityfarbetween@googlemail.com> Acked-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15sh: always link in helper functions extracted from libgccGeert Uytterhoeven1-1/+1
commit 84ed8a99058e61567f495cc43118344261641c5f upstream. E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m: ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined! For "lib-y", if no symbols in a compilation unit are referenced by other units, the compilation unit will not be included in vmlinux. This breaks modules that do reference those symbols. Use "obj-y" instead to fix this. http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/ This doesn't fix all cases. There are others, e.g. udivsi3. This is also not limited to sh, many architectures handle this in the same way. A simple solution is to unconditionally include all helper functions. A more complex solution is to make the choice of "lib-y" or "obj-y" depend on CONFIG_MODULES: obj-$(CONFIG_MODULES) += ... lib-y($CONFIG_MODULES) += ... Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Paul Mundt <lethal@linux-sh.org> Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Reviewed-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ceph: wake up 'safe' waiters when unregistering requestYan, Zheng1-1/+2
commit fc55d2c9448b34218ca58733a6f51fbede09575b upstream. We also need to wake up 'safe' waiters if error occurs or request aborted. Otherwise sync(2)/fsync(2) may hang forever. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ceph: cleanup aborted requests when re-sending requests.Yan, Zheng1-1/+4
commit eb1b8af33c2e42a9a57fc0a7588f4a7b255d2e79 upstream. Aborted requests usually get cleared when the reply is received. If MDS crashes, no reply will be received. So we need to cleanup aborted requests when re-sending requests. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15mm: ensure get_unmapped_area() returns higher address than mmap_min_addrAkira Takeuchi1-7/+8
commit 2afc745f3e3079ab16c826be4860da2529054dd2 upstream. This patch fixes the problem that get_unmapped_area() can return illegal address and result in failing mmap(2) etc. In case that the address higher than PAGE_SIZE is set to /proc/sys/vm/mmap_min_addr, the address lower than mmap_min_addr can be returned by get_unmapped_area(), even if you do not pass any virtual address hint (i.e. the second argument). This is because the current get_unmapped_area() code does not take into account mmap_min_addr. This leads to two actual problems as follows: 1. mmap(2) can fail with EPERM on the process without CAP_SYS_RAWIO, although any illegal parameter is not passed. 2. The bottom-up search path after the top-down search might not work in arch_get_unmapped_area_topdown(). Note: The first and third chunk of my patch, which changes "len" check, are for more precise check using mmap_min_addr, and not for solving the above problem. [How to reproduce] --- test.c ------------------------------------------------- #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <sys/errno.h> int main(int argc, char *argv[]) { void *ret = NULL, *last_map; size_t pagesize = sysconf(_SC_PAGESIZE); do { last_map = ret; ret = mmap(0, pagesize, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); // printf("ret=%p\n", ret); } while (ret != MAP_FAILED); if (errno != ENOMEM) { printf("ERR: unexpected errno: %d (last map=%p)\n", errno, last_map); } return 0; } --------------------------------------------------------------- $ gcc -m32 -o test test.c $ sudo sysctl -w vm.mmap_min_addr=65536 vm.mmap_min_addr = 65536 $ ./test (run as non-priviledge user) ERR: unexpected errno: 1 (last map=0x10000) Signed-off-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com> Signed-off-by: Kiyoshi Owada <owada.kiyoshi@jp.panasonic.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: As we do not have vm_unmapped_area(), make arch_get_unmapped_area_topdown() calculate the lower limit for the new area's end address and then compare addresses with this instead of with len. In the process, fix an off-by-one error which could result in returning 0 if mm->mmap_base == len.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15x86, fpu, amd: Clear exceptions in AMD FXSAVE workaroundLinus Torvalds1-6/+7
commit 26bef1318adc1b3a530ecc807ef99346db2aa8b0 upstream. Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> [bwh: Backported to 3.2: adjust filename, context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368)Andy Honig3-44/+15
commit fda4e2e85589191b123d31cdc21fd33ee70f50fd upstream. In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the potential to corrupt kernel memory if userspace provides an address that is at the end of a page. This patches concerts those functions to use kvm_write_guest_cached and kvm_read_guest_cached. It also checks the vapic_address specified by userspace during ioctl processing and returns an error to userspace if the address is not a valid GPA. This is generally not guest triggerable, because the required write is done by firmware that runs before the guest. Also, it only affects AMD processors and oldish Intel that do not have the FlexPriority feature (unless you disable FlexPriority, of course; then newer processors are also affected). Fixes: b93463aa59d6 ('KVM: Accelerated apic support') Reported-by: Andrew Honig <ahonig@google.com> Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [dannf: backported to Debian's 3.2] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ath9k_htc: properly set MAC address and BSSID maskMathy Vanhoef2-10/+20
commit 657eb17d87852c42b55c4b06d5425baa08b2ddb3 upstream. Pick the MAC address of the first virtual interface as the new hardware MAC address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579. Signed-off-by: Mathy Vanhoef <vanhoefm@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15hpfs: fix warnings when the filesystem fills upMikulas Patocka1-1/+4
commit bbd465df73f0d8ba41b8a0732766a243d0f5b356 upstream. This patch fixes warnings due to missing lock on write error path. WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]() Hardware name: empty Pid: 26563, comm: dd Tainted: P O 3.9.4 #12 Call Trace: hpfs_truncate+0x75/0x80 [hpfs] hpfs_write_begin+0x84/0x90 [hpfs] _hpfs_bmap+0x10/0x10 [hpfs] generic_file_buffered_write+0x121/0x2c0 __generic_file_aio_write+0x1c7/0x3f0 generic_file_aio_write+0x7c/0x100 do_sync_write+0x98/0xd0 hpfs_file_write+0xd/0x50 [hpfs] vfs_write+0xa2/0x160 sys_write+0x51/0xa0 page_fault+0x22/0x30 system_call_fastpath+0x1a/0x1f Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [Mikulas Patocka: This is backport of upstream commit bbd465df73f0d8ba41b8a0732766a243d0f5b356, modified for stable kernels 2.6.39 - 3.7.] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15Fix warning from machine_kexec.cPaul Gortmaker1-1/+1
commit c19ce0ab53ad9698968a154647f3dc22aad6c45b upstream. Use proper cpp defined(...) constructs to avoid this: arch/ia64/kernel/machine_kexec.c: In function 'arch_crash_save_vmcoreinfo': arch/ia64/kernel/machine_kexec.c:160:8: warning: "CONFIG_PGTABLE_4" is not defined Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15staging: comedi: cb_pcidio: fix for newer PCI-DIO48HIan Abbott1-13/+8
commit 0283f7a100882684ad32b768f9f1ad81658a0b92 upstream. At some point, Measurement Computing / ComputerBoards redesigned the PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip. This meant they had to put their hardware registers in the PCI BAR 2 region instead of PCI BAR 1. Unfortunately, they kept the same PCI device ID for the new design. This means the driver recognizes the newer cards, but doesn't work (and is likely to screw up the local configuration registers of the PLX chip) because it's using the wrong region. Since all the supported boards have the DIO registers in the PCI BAR 2 region except for older PCI-DIO48H boards which have an empty PCI BAR 2 region and the DIO registers in PCI BAR 1, determine which PCI BAR region to use based on whether the PCI BAR 2 region is empty or not. This change makes the `dioregs_badrindex` member of `struct pcidio_board` redundant. The `pcicontroler_badrindex` member is also unused, so remove both members. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Cc: kernel-team@lists.ubuntu.com Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfullyJianguo Wu1-4/+12
commit a49ecbcd7b0d5a1cda7d60e03df402dd0ef76ac8 upstream. After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy(over-commit page). If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [<ffffffff81163761>] dequeue_hwpoisoned_huge_page+0x131/0x1d0 PGD c23762067 PUD c24be2067 PMD 0 Oops: 0000 [#1] SMP So check PageHuge(page) after call migrate_pages() successfully. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Tested-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [wujg: backport to 3.4: - adjust context - s/num_poisoned_pages/mce_bad_pages/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15PCI: Enable ARI if dev and upstream bridge support it; disable otherwiseYijing Wang1-7/+7
commit b0cc6020e1cc62f1253215f189611b34be4a83c7 upstream. Currently, we enable ARI in a device's upstream bridge if the bridge and the device support it. But we never disable ARI, even if the device is removed and replaced with a device that doesn't support ARI. This means that if we hot-remove an ARI device and replace it with a non-ARI multi-function device, we find only function 0 of the new device because the upstream bridge still has ARI enabled, and next_ari_fn() only returns function 0 for the new non-ARI device. This patch disables ARI in the upstream bridge if the device doesn't support ARI. See the PCIe spec, r3.0, sec 6.13. [bhelgaas: changelog, function comment] [yijing: replace PCIe Cap accessor with legacy PCI accessor] Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15xfs: Account log unmount transaction correctlyDave Chinner1-1/+2
commit 3948659e30808fbaa7673bbe89de2ae9769e20a7 upstream. There have been a few reports of this warning appearing recently: XFS (dm-4): xlog_space_left: head behind tail tail_cycle = 129, tail_bytes = 20163072 GH cycle = 129, GH bytes = 20162880 The common cause appears to be lots of freeze and unfreeze cycles, and the output from the warnings indicates that we are leaking around 8 bytes of log space per freeze/unfreeze cycle. When we freeze the filesystem, we write an unmount record and that uses xlog_write directly - a special type of transaction, effectively. What it doesn't do, however, is correctly account for the log space it uses. The unmount record writes an 8 byte structure with a special magic number into the log, and the space this consumes is not accounted for in the log ticket tracking the operation. Hence we leak 8 bytes every unmount record that is written. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: avoid reference counter overflows on fib_rules in multicast forwardingHannes Frederic Sowa2-4/+10
[ Upstream commit 95f4a45de1a0f172b35451fc52283290adb21f6e ] Bob Falken reported that after 4G packets, multicast forwarding stopped working. This was because of a rule reference counter overflow which freed the rule as soon as the overflow happend. This patch solves this by adding the FIB_LOOKUP_NOREF flag to fib_rules_lookup calls. This is safe even from non-rcu locked sections as in this case the flag only implies not taking a reference to the rule, which we don't need at all. Rules only hold references to the namespace, which are guaranteed to be available during the call of the non-rcu protected function reg_vif_xmit because of the interface reference which itself holds a reference to the net namespace. Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables") Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables") Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Julian Anastasov <ja@ssi.bg> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15inet_diag: fix inet_diag_dump_icsk() timewait socket state logicNeal Cardwell1-1/+3
[ Based upon upstream commit 70315d22d3c7383f9a508d0aab21e2eb35b2303a ] Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and FIN_WAIT2 connections are represented by inet_timewait_sock (not just TIME_WAIT). Thus: (a) We need to iterate through the time_wait buckets if the user wants either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state fin-wait-2" would not return any sockets, even if there were some in FIN_WAIT2.) (b) We need to check tw_substate to see if the user wants to dump sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a given connection is in. (Before fixing this, "ss -nemoi state time-wait" would actually return sockets in state FIN_WAIT2.) An analogous fix is in v3.13: 70315d22d3c7383f9a508d0aab21e2eb35b2303a ("inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets") but that patch is quite different because 3.13 code is very different in this area due to the unification of TCP hash tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1. I tested that this applies cleanly between v3.3 and v3.12, and tested that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2 and earlier (though it makes semantic sense), and semantically is not the right fix for 3.13 and beyond (as mentioned above). Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15bnx2x: fix DMA unmapping of TSO split BDsMichal Schmidt1-4/+9
[ Upstream commit 95e92fd40c967c363ad66b2fd1ce4dcd68132e54 ] bnx2x triggers warnings with CONFIG_DMA_API_DEBUG=y: WARNING: CPU: 0 PID: 2253 at lib/dma-debug.c:887 check_unmap+0xf8/0x920() bnx2x 0000:28:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x00000000da2b389e] [map size=1490 bytes] [unmap size=66 bytes] The reason is that bnx2x splits a TSO BD into two BDs (headers + data) using one DMA mapping for both, but it uses only the length of the first BD when unmapping. This patch fixes the bug by unmapping the whole length of the two BDs. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15bridge: use spin_lock_bh() in br_multicast_set_hash_maxCurt Brune1-2/+2
[ Upstream commit fe0d692bbc645786bce1a98439e548ae619269f5 ] br_multicast_set_hash_max() is called from process context in net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function. br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock), which can deadlock the CPU if a softirq that also tries to take the same lock interrupts br_multicast_set_hash_max() while the lock is held . This can happen quite easily when any of the bridge multicast timers expire, which try to take the same lock. The fix here is to use spin_lock_bh(), preventing other softirqs from executing on this CPU. Steps to reproduce: 1. Create a bridge with several interfaces (I used 4). 2. Set the "multicast query interval" to a low number, like 2. 3. Enable the bridge as a multicast querier. 4. Repeatedly set the bridge hash_max parameter via sysfs. # brctl addbr br0 # brctl addif br0 eth1 eth2 eth3 eth4 # brctl setmcqi br0 2 # brctl setmcquerier br0 1 # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done Signed-off-by: Curt Brune <curt@cumulusnetworks.com> Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: llc: fix use after free in llc_ui_recvmsgDaniel Borkmann1-2/+3
[ Upstream commit 4d231b76eef6c4a6bd9c96769e191517765942cb ] While commit 30a584d944fb fixes datagram interface in LLC, a use after free bug has been introduced for SOCK_STREAM sockets that do not make use of MSG_PEEK. The flow is as follow ... if (!(flags & MSG_PEEK)) { ... sk_eat_skb(sk, skb, false); ... } ... if (used + offset < skb->len) continue; ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache original length and work on skb_len to check partial reads. Fixes: 30a584d944fb ("[LLX]: SOCK_DGRAM interface fixes") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15vlan: Fix header ops passthru when doing TX VLAN offload.David S. Miller2-1/+26
[ Upstream commit 2205369a314e12fcec4781cc73ac9c08fc2b47de ] When the vlan code detects that the real device can do TX VLAN offloads in hardware, it tries to arrange for the real device's header_ops to be invoked directly. But it does so illegally, by simply hooking the real device's header_ops up to the VLAN device. This doesn't work because we will end up invoking a set of header_ops routines which expect a device type which matches the real device, but will see a VLAN device instead. Fix this by providing a pass-thru set of header_ops which will arrange to pass the proper real device instead. To facilitate this add a dev_rebuild_header(). There are implementations which provide a ->cache and ->create but not a ->rebuild (f.e. PLIP). So we need a helper function just like dev_hard_header() to avoid crashes. Use this helper in the one existing place where the header_ops->rebuild was being invoked, the neighbour code. With lots of help from Florian Westphal. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: rose: restore old recvmsg behaviorFlorian Westphal1-12/+4
[ Upstream commit f81152e35001e91997ec74a7b4e040e6ab0acccf ] recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen. After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic), we now always take the else branch due to namelen being initialized to 0. Digging in netdev-vger-cvs git repo shows that msg_namelen was initialized with a fixed-size since at least 1995, so the else branch was never taken. Compile tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15rds: prevent dereference of a NULL deviceSasha Levin1-1/+2
[ Upstream commit c2349758acf1874e4c2b93fe41d072336f1a31d0 ] Binding might result in a NULL device, which is dereferenced causing this BUG: [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097 4 [ 1317.261847] IP: [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0 [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1317.264179] Dumping ftrace buffer: [ 1317.264774] (ftrace buffer empty) [ 1317.265220] Modules linked in: [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4- next-20131218-sasha-00013-g2cebb9b-dirty #4159 [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000 [ 1317.268399] RIP: 0010:[<ffffffff84225f52>] [<ffffffff84225f52>] rds_ib_laddr_check+ 0x82/0x110 [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246 [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000 [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 0000000000000286 [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000 [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031 [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000 0000 [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0 [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 [ 1317.270230] Stack: [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000 [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160 [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280 [ 1317.270230] Call Trace: [ 1317.270230] [<ffffffff84223542>] ? rds_trans_get_preferred+0x42/0xa0 [ 1317.270230] [<ffffffff84223556>] rds_trans_get_preferred+0x56/0xa0 [ 1317.270230] [<ffffffff8421c9c3>] rds_bind+0x73/0xf0 [ 1317.270230] [<ffffffff83e4ce62>] SYSC_bind+0x92/0xf0 [ 1317.270230] [<ffffffff812493f8>] ? context_tracking_user_exit+0xb8/0x1d0 [ 1317.270230] [<ffffffff8119313d>] ? trace_hardirqs_on+0xd/0x10 [ 1317.270230] [<ffffffff8107a852>] ? syscall_trace_enter+0x32/0x290 [ 1317.270230] [<ffffffff83e4cece>] SyS_bind+0xe/0x10 [ 1317.270230] [<ffffffff843a6ad0>] tracesys+0xdd/0xe2 [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7 4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02 [ 1317.270230] RIP [<ffffffff84225f52>] rds_ib_laddr_check+0x82/0x110 [ 1317.270230] RSP <ffff8803cd31bdf8> [ 1317.270230] CR2: 0000000000000974 Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15hamradio/yam: fix info leak in ioctlSalva Peiró1-0/+1
[ Upstream commit 8e3fbf870481eb53b2d3a322d1fc395ad8b367ed ] The yam_ioctl() code fails to initialise the cmd field of the struct yamdrv_ioctl_cfg. Add an explicit memset(0) before filling the structure to avoid the 4-byte info leak. Signed-off-by: Salva Peiró <speiro@ai2.upv.es> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl()Wenliang Fan1-0/+2
[ Upstream commit e9db5c21d3646a6454fcd04938dd215ac3ab620a ] The local variable 'bi' comes from userspace. If userspace passed a large number to 'bi.data.calibrate', there would be an integer overflow in the following line: s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; Signed-off-by: Wenliang Fan <fanwlexca@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: inet_diag: zero out uninitialized idiag_{src,dst} fieldsDaniel Borkmann1-0/+16
[ Upstream commit b1aac815c0891fe4a55a6b0b715910142227700f ] Jakub reported while working with nlmon netlink sniffer that parts of the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6. That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3]. In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab] memory through this. At least, in udp_dump_one(), we allocate a skb in ... rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL); ... and then pass that to inet_sk_diag_fill() that puts the whole struct inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0], r->id.idiag_dst[0] and leave the rest untouched: r->id.idiag_src[0] = inet->inet_rcv_saddr; r->id.idiag_dst[0] = inet->inet_daddr; struct inet_diag_msg embeds struct inet_diag_sockid that is correctly / fully filled out in IPv6 case, but for IPv4 not. So just zero them out by using plain memset (for this little amount of bytes it's probably not worth the extra check for idiag_family == AF_INET). Similarly, fix also other places where we fill that out. Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: unix: allow bind to fail on mutex lockSasha Levin1-2/+6
[ Upstream commit 37ab4fa7844a044dc21fde45e2a0fc2f3c3b6490 ] This is similar to the set_peek_off patch where calling bind while the socket is stuck in unix_dgram_recvmsg() will block and cause a hung task spew after a while. This is also the last place that did a straightforward mutex_lock(), so there shouldn't be any more of these patches. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15tg3: Initialize REG_BASE_ADDR at PCI config offset 120 to 0Nat Gurumoorthy1-0/+3
[ Upstream commit 388d3335575f4c056dcf7138a30f1454e2145cd8 ] The new tg3 driver leaves REG_BASE_ADDR (PCI config offset 120) uninitialized. From power on reset this register may have garbage in it. The Register Base Address register defines the device local address of a register. The data pointed to by this location is read or written using the Register Data register (PCI config offset 128). When REG_BASE_ADDR has garbage any read or write of Register Data Register (PCI 128) will cause the PCI bus to lock up. The TCO watchdog will fire and bring down the system. Signed-off-by: Nat Gurumoorthy <natg@google.com> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: drop_monitor: fix the value of maxattrChangli Gao1-1/+0
[ Upstream commit d323e92cc3f4edd943610557c9ea1bb4bb5056e8 ] maxattr in genl_family should be used to save the max attribute type, but not the max command type. Drop monitor doesn't support any attributes, so we should leave it as zero. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15ipv6: don't count addrconf generated routes against gc limitHannes Frederic Sowa1-6/+2
[ Upstream commit a3300ef4bbb1f1e33ff0400e1e6cf7733d988f4f ] Brett Ciphery reported that new ipv6 addresses failed to get installed because the addrconf generated dsts where counted against the dst gc limit. We don't need to count those routes like we currently don't count administratively added routes. Because the max_addresses check enforces a limit on unbounded address generation first in case someone plays with router advertisments, we are still safe here. Reported-by: Brett Ciphery <brett.ciphery@windriver.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15rds: prevent BUG_ON triggered on congestion update to loopbackVenkat Venkatsubra1-3/+2
[ Upstream commit 18fc25c94eadc52a42c025125af24657a93638c0 ] After congestion update on a local connection, when rds_ib_xmit returns less bytes than that are there in the message, rds_send_xmit calls back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE) to trigger. For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually the message contains 8240 bytes. rds_send_xmit thinks there is more to send and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header) =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k]. The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f "rds: prevent BUG_ON triggering on congestion map updates" introduced this regression. That change was addressing the triggering of a different BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE: BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); This was the sequence it was going through: (rds_ib_xmit) /* Do not send cong updates to IB loopback */ if (conn->c_loopback && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) { rds_cong_map_updated(conn->c_fcong, ~(u64) 0); return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES; } rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time) = 8192 c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again and so on (c_xmit_data_off 24672,32912,41152,49392,57632) rds_ib_xmit returns 8240 On this iteration this sequence causes the BUG_ON in rds_send_xmit: while (ret) { tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off); [tmp = 65536 - 57632 = 7904] conn->c_xmit_data_off += tmp; [c_xmit_data_off = 57632 + 7904 = 65536] ret -= tmp; [ret = 8240 - 7904 = 336] if (conn->c_xmit_data_off == sg->length) { conn->c_xmit_data_off = 0; sg++; conn->c_xmit_sg++; BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); [c_xmit_sg = 1, rm->data.op_nents = 1] What the current fix does: Since the congestion update over loopback is not actually transmitted as a message, all that rds_ib_xmit needs to do is let the caller think the full message has been transmitted and not return partial bytes. It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb. And 64Kb+48 when page size is 64Kb. Reported-by: Josh Hunt <joshhunt00@gmail.com> Tested-by: Honggang Li <honli@redhat.com> Acked-by: Bang Nguyen <bang.nguyen@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-02-15net: do not pretend FRAGLIST supportEric Dumazet3-3/+2
[ Upstream commit 28e24c62ab3062e965ef1b3bcc244d50aee7fa85 ] Few network drivers really supports frag_list : virtual drivers. Some drivers wrongly advertise NETIF_F_FRAGLIST feature. If skb with a frag_list is given to them, packet on the wire will be corrupt. Remove this flag, as core networking stack will make sure to provide packets that can be sent without corruption. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: Anirudha Sarangi <anirudh@xilinx.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03Linux 3.2.54v3.2.54Ben Hutchings1-1/+1
2014-01-03mmc: block: fix a bug of error handling in MMC driverKOBAYASHI Yoshitake1-3/+44
commit c8760069627ad3b0dbbea170f0c4c58b16e18d3d upstream. Current MMC driver doesn't handle generic error (bit19 of device status) in write sequence. As a result, write data gets lost when generic error occurs. For example, a generic error when updating a filesystem management information causes a loss of write data and corrupts the filesystem. In the worst case, the system will never boot. This patch includes the following functionality: 1. To enable error checking for the response of CMD12 and CMD13 in write command sequence 2. To retry write sequence when a generic error occurs Messages are added for v2 to show what occurs. [Backported to 3.4-stable] Signed-off-by: KOBAYASHI Yoshitake <yoshitake.kobayashi@toshiba.co.jp> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03ftrace: Fix function graph with loading of modulesSteven Rostedt (Red Hat)1-34/+34
commit 8a56d7761d2d041ae5e8215d20b4167d8aa93f51 upstream. Commit 8c4f3c3fa9681 "ftrace: Check module functions being traced on reload" fixed module loading and unloading with respect to function tracing, but it missed the function graph tracer. If you perform the following # cd /sys/kernel/debug/tracing # echo function_graph > current_tracer # modprobe nfsd # echo nop > current_tracer You'll get the following oops message: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9() Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000 0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668 ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000 Call Trace: [<ffffffff814fe193>] dump_stack+0x4f/0x7c [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9 [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9 [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364 [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78 [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10 [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e [<ffffffff81122aed>] vfs_write+0xab/0xf6 [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85 [<ffffffff81122dbd>] SyS_write+0x59/0x82 [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85 [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b ---[ end trace 940358030751eafb ]--- The above mentioned commit didn't go far enough. Well, it covered the function tracer by adding checks in __register_ftrace_function(). The problem is that the function graph tracer circumvents that (for a slight efficiency gain when function graph trace is running with a function tracer. The gain was not worth this). The problem came with ftrace_startup() which should always be called after __register_ftrace_function(), if you want this bug to be completely fixed. Anyway, this solution moves __register_ftrace_function() inside of ftrace_startup() and removes the need to call them both. Reported-by: Dave Wysochanski <dwysocha@redhat.com> Fixes: ed926f9b35cd ("ftrace: Use counters to enable functions to trace") Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03ftrace: Check module functions being traced on reloadSteven Rostedt (Red Hat)1-9/+61
commit 8c4f3c3fa9681dc549cd35419b259496082fef8b upstream. There's been a nasty bug that would show up and not give much info. The bug displayed the following warning: WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230() Pid: 20903, comm: bash Tainted: G O 3.6.11+ #38405.trunk Call Trace: [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0 [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20 [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230 [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0 [<ffffffff811401cc>] ? kfree+0x2c/0x110 [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150 [<ffffffff81149f1e>] __fput+0xae/0x220 [<ffffffff8114a09e>] ____fput+0xe/0x10 [<ffffffff8105fa22>] task_work_run+0x72/0x90 [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0 [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff815c0f88>] int_signal+0x12/0x17 ---[ end trace 793179526ee09b2c ]--- It was finally narrowed down to unloading a module that was being traced. It was actually more than that. When functions are being traced, there's a table of all functions that have a ref count of the number of active tracers attached to that function. When a function trace callback is registered to a function, the function's record ref count is incremented. When it is unregistered, the function's record ref count is decremented. If an inconsistency is detected (ref count goes below zero) the above warning is shown and the function tracing is permanently disabled until reboot. The ftrace callback ops holds a hash of functions that it filters on (and/or filters off). If the hash is empty, the default means to filter all functions (for the filter_hash) or to disable no functions (for the notrace_hash). When a module is unloaded, it frees the function records that represent the module functions. These records exist on their own pages, that is function records for one module will not exist on the same page as function records for other modules or even the core kernel. Now when a module unloads, the records that represents its functions are freed. When the module is loaded again, the records are recreated with a default ref count of zero (unless there's a callback that traces all functions, then they will also be traced, and the ref count will be incremented). The problem is that if an ftrace callback hash includes functions of the module being unloaded, those hash entries will not be removed. If the module is reloaded in the same location, the hash entries still point to the functions of the module but the module's ref counts do not reflect that. With the help of Steve and Joern, we found a reproducer: Using uinput module and uinput_release function. cd /sys/kernel/debug/tracing modprobe uinput echo uinput_release > set_ftrace_filter echo function > current_tracer rmmod uinput modprobe uinput # check /proc/modules to see if loaded in same addr, otherwise try again echo nop > current_tracer [BOOM] The above loads the uinput module, which creates a table of functions that can be traced within the module. We add uinput_release to the filter_hash to trace just that function. Enable function tracincg, which increments the ref count of the record associated to uinput_release. Remove uinput, which frees the records including the one that represents uinput_release. Load the uinput module again (and make sure it's at the same address). This recreates the function records all with a ref count of zero, including uinput_release. Disable function tracing, which will decrement the ref count for uinput_release which is now zero because of the module removal and reload, and we have a mismatch (below zero ref count). The solution is to check all currently tracing ftrace callbacks to see if any are tracing any of the module's functions when a module is loaded (it already does that with callbacks that trace all functions). If a callback happens to have a module function being traced, it increments that records ref count and starts tracing that function. There may be a strange side effect with this, where tracing module functions on unload and then reloading a new module may have that new module's functions being traced. This may be something that confuses the user, but it's not a big deal. Another approach is to disable all callback hashes on module unload, but this leaves some ftrace callbacks that may not be registered, but can still have hashes tracing the module's function where ftrace doesn't know about it. That situation can cause the same bug. This solution solves that case too. Another benefit of this solution, is it is possible to trace a module's function on unload and load. Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com Reported-by: Jörn Engel <joern@logfs.org> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Steve Hodgson <steve@purestorage.com> Tested-by: Steve Hodgson <steve@purestorage.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [bwh: Backported to 3.2: adjust context, indentation] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03ftrace: Create ftrace_hash_empty() helper routineSteven Rostedt1-11/+17
commit 06a51d9307380c78bb5c92e68fc80ad2c7d7f890 upstream. There are two types of hashes in the ftrace_ops; one type is the filter_hash and the other is the notrace_hash. Either one may be null, meaning it has no elements. But when elements are added, the hash is allocated. Throughout the code, a check needs to be made to see if a hash exists or the hash has elements, but the check if the hash exists is usually missing causing the possible "NULL pointer dereference bug". Add a helper routine called "ftrace_hash_empty()" that returns true if the hash doesn't exist or its count is zero. As they mean the same thing. Last-bug-reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2014-01-03ftrace: Fix ftrace hash record update with notraceSteven Rostedt1-1/+1
commit c842e975520f8ab09e293cc92f51a1f396251fd5 upstream. When disabling the "notrace" records, that means we want to trace them. If the notrace_hash is zero, it means that we want to trace all records. But to disable a zero notrace_hash means nothing. The check for the notrace_hash count was incorrect with: if (hash && !hash->count) return With the correct comment above it that states that we do nothing if the notrace_hash has zero count. But !hash also means that the notrace hash has zero count. I think this was done to protect against dereferencing NULL. But if !hash is true, then we go through the following loop without doing a single thing. Fix it to: if (!hash || !hash->count) return; Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>