kernel/linux.git - Linux kernel stable tree (mirror)

Age	Commit message (Collapse)	Author	Files	Lines
2020-09-09	libata: implement ATA_HORKAGE_MAX_TRIM_128M and apply to Sandisks	Tejun Heo	1	-1/+7
	commit 3b5455636fe26ea21b4189d135a424a6da016418 upstream. All three generations of Sandisk SSDs lock up hard intermittently. Experiments showed that disabling NCQ lowered the failure rate significantly and the kernel has been disabling NCQ for some models of SD7's and 8's, which is obviously undesirable. Karthik worked with Sandisk to root cause the hard lockups to trim commands larger than 128M. This patch implements ATA_HORKAGE_MAX_TRIM_128M which limits max trim size to 128M and applies it to all three generations of Sandisk SSDs. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Karthik Shivaram <karthikgs@fb.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-06-30	ata/libata: Fix usage of page address by page_address in ↵	Ye Bin	1	-3/+6
	ata_scsi_mode_select_xlat function [ Upstream commit f650ef61e040bcb175dd8762164b00a5d627f20e ] BUG: KASAN: use-after-free in ata_scsi_mode_select_xlat+0x10bd/0x10f0 drivers/ata/libata-scsi.c:4045 Read of size 1 at addr ffff88803b8cd003 by task syz-executor.6/12621 CPU: 1 PID: 12621 Comm: syz-executor.6 Not tainted 4.19.95 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xac/0xee lib/dump_stack.c:118 print_address_description+0x60/0x223 mm/kasan/report.c:253 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report mm/kasan/report.c:409 [inline] kasan_report.cold+0xae/0x2d8 mm/kasan/report.c:393 ata_scsi_mode_select_xlat+0x10bd/0x10f0 drivers/ata/libata-scsi.c:4045 ata_scsi_translate+0x2da/0x680 drivers/ata/libata-scsi.c:2035 __ata_scsi_queuecmd drivers/ata/libata-scsi.c:4360 [inline] ata_scsi_queuecmd+0x2e4/0x790 drivers/ata/libata-scsi.c:4409 scsi_dispatch_cmd+0x2ee/0x6c0 drivers/scsi/scsi_lib.c:1867 scsi_queue_rq+0xfd7/0x1990 drivers/scsi/scsi_lib.c:2170 blk_mq_dispatch_rq_list+0x1e1/0x19a0 block/blk-mq.c:1186 blk_mq_do_dispatch_sched+0x147/0x3d0 block/blk-mq-sched.c:108 blk_mq_sched_dispatch_requests+0x427/0x680 block/blk-mq-sched.c:204 __blk_mq_run_hw_queue+0xbc/0x200 block/blk-mq.c:1308 __blk_mq_delay_run_hw_queue+0x3c0/0x460 block/blk-mq.c:1376 blk_mq_run_hw_queue+0x152/0x310 block/blk-mq.c:1413 blk_mq_sched_insert_request+0x337/0x6c0 block/blk-mq-sched.c:397 blk_execute_rq_nowait+0x124/0x320 block/blk-exec.c:64 blk_execute_rq+0xc5/0x112 block/blk-exec.c:101 sg_scsi_ioctl+0x3b0/0x6a0 block/scsi_ioctl.c:507 sg_ioctl+0xd37/0x23f0 drivers/scsi/sg.c:1106 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:501 [inline] do_vfs_ioctl+0xae6/0x1030 fs/ioctl.c:688 ksys_ioctl+0x76/0xa0 fs/ioctl.c:705 __do_sys_ioctl fs/ioctl.c:712 [inline] __se_sys_ioctl fs/ioctl.c:710 [inline] __x64_sys_ioctl+0x6f/0xb0 fs/ioctl.c:710 do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x45c479 Code: ad b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fb0e9602c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fb0e96036d4 RCX: 000000000045c479 RDX: 0000000020000040 RSI: 0000000000000001 RDI: 0000000000000003 RBP: 000000000076bfc0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000046d R14: 00000000004c6e1a R15: 000000000076bfcc Allocated by task 12577: set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc mm/kasan/kasan.c:553 [inline] kasan_kmalloc+0xbf/0xe0 mm/kasan/kasan.c:531 __kmalloc+0xf3/0x1e0 mm/slub.c:3749 kmalloc include/linux/slab.h:520 [inline] load_elf_phdrs+0x118/0x1b0 fs/binfmt_elf.c:441 load_elf_binary+0x2de/0x4610 fs/binfmt_elf.c:737 search_binary_handler fs/exec.c:1654 [inline] search_binary_handler+0x15c/0x4e0 fs/exec.c:1632 exec_binprm fs/exec.c:1696 [inline] __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820 do_execveat_common fs/exec.c:1866 [inline] do_execve fs/exec.c:1883 [inline] __do_sys_execve fs/exec.c:1964 [inline] __se_sys_execve fs/exec.c:1959 [inline] __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959 do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 12577: set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x129/0x170 mm/kasan/kasan.c:521 slab_free_hook mm/slub.c:1370 [inline] slab_free_freelist_hook mm/slub.c:1397 [inline] slab_free mm/slub.c:2952 [inline] kfree+0x8b/0x1a0 mm/slub.c:3904 load_elf_binary+0x1be7/0x4610 fs/binfmt_elf.c:1118 search_binary_handler fs/exec.c:1654 [inline] search_binary_handler+0x15c/0x4e0 fs/exec.c:1632 exec_binprm fs/exec.c:1696 [inline] __do_execve_file.isra.0+0xf52/0x1a90 fs/exec.c:1820 do_execveat_common fs/exec.c:1866 [inline] do_execve fs/exec.c:1883 [inline] __do_sys_execve fs/exec.c:1964 [inline] __se_sys_execve fs/exec.c:1959 [inline] __x64_sys_execve+0x8a/0xb0 fs/exec.c:1959 do_syscall_64+0xa0/0x2e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The buggy address belongs to the object at ffff88803b8ccf00 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 259 bytes inside of 512-byte region [ffff88803b8ccf00, ffff88803b8cd100) The buggy address belongs to the page: page:ffffea0000ee3300 count:1 mapcount:0 mapping:ffff88806cc03080 index:0xffff88803b8cc780 compound_mapcount: 0 flags: 0x100000000008100(slab\|head) raw: 0100000000008100 ffffea0001104080 0000000200000002 ffff88806cc03080 raw: ffff88803b8cc780 00000000800c000b 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88803b8ccf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88803b8ccf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff88803b8cd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88803b8cd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88803b8cd100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc You can refer to "https://www.lkml.org/lkml/2019/1/17/474" reproduce this error. The exception code is "bd_len = p[3];", "p" value is ffff88803b8cd000 which belongs to the cache kmalloc-512 of size 512. The "page_address(sg_page(scsi_sglist(scmd)))" maybe from sg_scsi_ioctl function "buffer" which allocated by kzalloc, so "buffer" may not page aligned. This also looks completely buggy on highmem systems and really needs to use a kmap_atomic. --Christoph Hellwig To address above bugs, Paolo Bonzini advise to simpler to just make a char array of size CACHE_MPAGE_LEN+8+8+4-2(or just 64 to make it easy), use sg_copy_to_buffer to copy from the sglist into the buffer, and workthere. Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-24	libata: Remove extra scsi_host_put() in ata_scsi_add_hosts()	John Garry	1	-6/+3
	[ Upstream commit 1d72f7aec3595249dbb83291ccac041a2d676c57 ] If the call to scsi_add_host_with_dma() in ata_scsi_add_hosts() fails, then we may get use-after-free KASAN warns: ================================================================== BUG: KASAN: use-after-free in kobject_put+0x24/0x180 Read of size 1 at addr ffff0026b8c80364 by task swapper/0/1 CPU: 1 PID: 1 Comm: swapper/0 Tainted: G W 5.6.0-rc3-00004-g5a71b206ea82-dirty #1765 Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V3.B160.01 02/24/2020 Call trace: dump_backtrace+0x0/0x298 show_stack+0x14/0x20 dump_stack+0x118/0x190 print_address_description.isra.9+0x6c/0x3b8 __kasan_report+0x134/0x23c kasan_report+0xc/0x18 __asan_load1+0x5c/0x68 kobject_put+0x24/0x180 put_device+0x10/0x20 scsi_host_put+0x10/0x18 ata_devres_release+0x74/0xb0 release_nodes+0x2d0/0x470 devres_release_all+0x50/0x78 really_probe+0x2d4/0x560 driver_probe_device+0x7c/0x148 device_driver_attach+0x94/0xa0 __driver_attach+0xa8/0x110 bus_for_each_dev+0xe8/0x158 driver_attach+0x30/0x40 bus_add_driver+0x220/0x2e0 driver_register+0xbc/0x1d0 __pci_register_driver+0xbc/0xd0 ahci_pci_driver_init+0x20/0x28 do_one_initcall+0xf0/0x608 kernel_init_freeable+0x31c/0x384 kernel_init+0x10/0x118 ret_from_fork+0x10/0x18 Allocated by task 5: save_stack+0x28/0xc8 __kasan_kmalloc.isra.8+0xbc/0xd8 kasan_kmalloc+0xc/0x18 __kmalloc+0x1a8/0x280 scsi_host_alloc+0x44/0x678 ata_scsi_add_hosts+0x74/0x268 ata_host_register+0x228/0x488 ahci_host_activate+0x1c4/0x2a8 ahci_init_one+0xd18/0x1298 local_pci_probe+0x74/0xf0 work_for_cpu_fn+0x2c/0x48 process_one_work+0x488/0xc08 worker_thread+0x330/0x5d0 kthread+0x1c8/0x1d0 ret_from_fork+0x10/0x18 Freed by task 5: save_stack+0x28/0xc8 __kasan_slab_free+0x118/0x180 kasan_slab_free+0x10/0x18 slab_free_freelist_hook+0xa4/0x1a0 kfree+0xd4/0x3a0 scsi_host_dev_release+0x100/0x148 device_release+0x7c/0xe0 kobject_put+0xb0/0x180 put_device+0x10/0x20 scsi_host_put+0x10/0x18 ata_scsi_add_hosts+0x210/0x268 ata_host_register+0x228/0x488 ahci_host_activate+0x1c4/0x2a8 ahci_init_one+0xd18/0x1298 local_pci_probe+0x74/0xf0 work_for_cpu_fn+0x2c/0x48 process_one_work+0x488/0xc08 worker_thread+0x330/0x5d0 kthread+0x1c8/0x1d0 ret_from_fork+0x10/0x18 There is also refcount issue, as well: WARNING: CPU: 1 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x170 The issue is that we make an erroneous extra call to scsi_host_put() for that host: So in ahci_init_one()->ata_host_alloc_pinfo()->ata_host_alloc(), we setup a device release method - ata_devres_release() - which intends to release the SCSI hosts: static void ata_devres_release(struct device gendev, void res) { ... for (i = 0; i < host->n_ports; i++) { struct ata_port *ap = host->ports[i]; if (!ap) continue; if (ap->scsi_host) scsi_host_put(ap->scsi_host); } ... } However in the ata_scsi_add_hosts() error path, we also call scsi_host_put() for the SCSI hosts. Fix by removing the the scsi_host_put() calls in ata_scsi_add_hosts() and leave this to ata_devres_release(). Fixes: f31871951b38 ("libata: separate out ata_host_alloc() and ata_host_register()") Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-08-29	libata: have ata_scsi_rw_xlat() fail invalid passthrough requests	Jens Axboe	1	-0/+21
	[ Upstream commit 2d7271501720038381d45fb3dcbe4831228fc8cc ] For passthrough requests, libata-scsi takes what the user passes in as gospel. This can be problematic if the user fills in the CDB incorrectly. One example of that is in request sizes. For read/write commands, the CDB contains fields describing the transfer length of the request. These should match with the SG_IO header fields, but libata-scsi currently does no validation of that. Check that the number of blocks in the CDB for passthrough requests matches what was mapped into the request. If the CDB asks for more data then the validated SG_IO header fields, error it. Reported-by: Krishna Ram Prakash R <krp@gtux.in> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-07-17	ata: Fix ZBC_OUT all bit handling	Damien Le Moal	1	-3/+8
	commit 6edf1d4cb0acde3a0a5dac849f33031bd7abb7b1 upstream. If the ALL bit is set in the ZBC_OUT command, the command zone ID field (block) should be ignored. Reported-by: David Butterfield <david.butterfield@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-17	ata: Fix ZBC_OUT command block check	Damien Le Moal	1	-6/+7
	commit b320a0a9f23c98f21631eb27bcbbca91c79b1c6e upstream. The block (LBA) specified must not exceed the last addressable LBA, which is dev->nr_sectors - 1. So fix the correct check is "if (block >= dev->n_sectors)" and not "if (block > dev->n_sectords)". Additionally, the asc/ascq to return for an LBA that is not a zone start LBA should be ILLEGAL REQUEST, regardless if the bad LBA is out of range. Reported-by: David Butterfield <david.butterfield@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Cc: stable@vger.kernel.org Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30	libata: Fix compile warning with ATA_DEBUG enabled	Dong Bo	1	-1/+1
	[ Upstream commit 0d3e45bc6507bd1f8728bf586ebd16c2d9e40613 ] This fixs the following comile warnings with ATA_DEBUG enabled, which detected by Linaro GCC 5.2-2015.11: drivers/ata/libata-scsi.c: In function 'ata_scsi_dump_cdb': ./include/linux/kern_levels.h:5:18: warning: format '%d' expects argument of type 'int', but argument 6 has type 'u64 {aka long long unsigned int}' [-Wformat=] tj: Patch hand-applied and description trimmed. Signed-off-by: Dong Bo <dongbo4@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28	libata: don't try to pass through NCQ commands to non-NCQ devices	Eric Biggers	1	-0/+6
	commit 2c1ec6fda2d07044cda922ee25337cf5d4b429b3 upstream. syzkaller hit a WARN() in ata_bmdma_qc_issue() when writing to /dev/sg0. This happened because it issued an ATA pass-through command (ATA_16) where the protocol field indicated that NCQ should be used -- but the device did not support NCQ. We could just remove the WARN() from libata-sff.c, but the real problem seems to be that the SCSI -> ATA translation code passes through NCQ commands without verifying that the device actually supports NCQ. Fix this by adding the appropriate check to ata_scsi_pass_thru(). Here's reproducer that works in QEMU when /dev/sg0 refers to a disk of the default type ("82371SB PIIX3 IDE"): #include <fcntl.h> #include <unistd.h> int main() { char buf[53] = { 0 }; buf[36] = 0x85; /* ATA_16 / buf[37] = (12 << 1); / FPDMA / buf[38] = 0x1; / Has data / buf[51] = 0xC8; / ATA_CMD_READ */ write(open("/dev/sg0", O_RDWR), buf, sizeof(buf)); } Fixes: ee7fb331c3ac ("libata: add support for NCQ commands for SG interface") Reported-by: syzbot+2f69ca28df61bdfc77cd36af2e789850355a221e@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-28	libata: fix length validation of ATAPI-relayed SCSI commands	Eric Biggers	1	-1/+3
	commit 058f58e235cbe03e923b30ea7c49995a46a8725f upstream. syzkaller reported a crash in ata_bmdma_fill_sg() when writing to /dev/sg1. The immediate cause was that the ATA command's scatterlist was not DMA-mapped, which causes 'pi - 1' to underflow, resulting in a write to 'qc->ap->bmdma_prd[0xffffffff]'. Strangely though, the flag ATA_QCFLAG_DMAMAP was set in qc->flags. The root cause is that when __ata_scsi_queuecmd() is preparing to relay a SCSI command to an ATAPI device, it doesn't correctly validate the CDB length before copying it into the 16-byte buffer 'cdb' in 'struct ata_queued_cmd'. Namely, it validates the fixed CDB length expected based on the SCSI opcode but not the actual CDB length, which can be larger due to the use of the SG_NEXT_CMD_LEN ioctl. Since 'flags' is the next member in ata_queued_cmd, a buffer overflow corrupts it. Fix it by requiring that the actual CDB length be <= 16 (ATAPI_CDB_LEN). [Really it seems the length should be required to be <= dev->cdb_len, but the current behavior seems to have been intentionally introduced by commit 607126c2a21c ("libata-scsi: be tolerant of 12-byte ATAPI commands in 16-byte CDBs") to work around a userspace bug in mplayer. Probably the workaround is no longer needed (mplayer was fixed in 2007), but continuing to allow lengths to up 16 appears harmless for now.] Here's a reproducer that works in QEMU when /dev/sg1 refers to the CD-ROM drive that qemu-system-x86_64 creates by default: #include <fcntl.h> #include <sys/ioctl.h> #include <unistd.h> #define SG_NEXT_CMD_LEN 0x2283 int main() { char buf[53] = { [36] = 0x7e, [52] = 0x02 }; int fd = open("/dev/sg1", O_RDWR); ioctl(fd, SG_NEXT_CMD_LEN, &(int){ 17 }); write(fd, buf, sizeof(buf)); } The crash was: BUG: unable to handle kernel paging request at ffff8cb97db37ffc IP: ata_bmdma_fill_sg drivers/ata/libata-sff.c:2623 [inline] IP: ata_bmdma_qc_prep+0xa4/0xc0 drivers/ata/libata-sff.c:2727 PGD fb6c067 P4D fb6c067 PUD 0 Oops: 0002 [#1] SMP CPU: 1 PID: 150 Comm: syz_ata_bmdma_q Not tainted 4.15.0-next-20180202 #99 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [...] Call Trace: ata_qc_issue+0x100/0x1d0 drivers/ata/libata-core.c:5421 ata_scsi_translate+0xc9/0x1a0 drivers/ata/libata-scsi.c:2024 __ata_scsi_queuecmd drivers/ata/libata-scsi.c:4326 [inline] ata_scsi_queuecmd+0x8c/0x210 drivers/ata/libata-scsi.c:4375 scsi_dispatch_cmd+0xa2/0xe0 drivers/scsi/scsi_lib.c:1727 scsi_request_fn+0x24c/0x530 drivers/scsi/scsi_lib.c:1865 __blk_run_queue_uncond block/blk-core.c:412 [inline] __blk_run_queue+0x3a/0x60 block/blk-core.c:432 blk_execute_rq_nowait+0x93/0xc0 block/blk-exec.c:78 sg_common_write.isra.7+0x272/0x5a0 drivers/scsi/sg.c:806 sg_write+0x1ef/0x340 drivers/scsi/sg.c:677 __vfs_write+0x31/0x160 fs/read_write.c:480 vfs_write+0xa7/0x160 fs/read_write.c:544 SYSC_write fs/read_write.c:589 [inline] SyS_write+0x4d/0xc0 fs/read_write.c:581 do_syscall_64+0x5e/0x110 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x21/0x86 Fixes: 607126c2a21c ("libata-scsi: be tolerant of 12-byte ATAPI commands in 16-byte CDBs") Reported-by: syzbot+1ff6f9fcc3c35f1c72a95e26528c8e7e3276e4da@syzkaller.appspotmail.com Cc: <stable@vger.kernel.org> # v2.6.24+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-19	libata: array underflow in ata_find_dev()	Dan Carpenter	1	-2/+4
	My static checker complains that "devno" can be negative, meaning that we read before the start of the loop. I've looked at the code, and I think the warning is right. This come from /proc so it's root only or it would be quite a quite a serious bug. The call tree looks like this: proc_scsi_write() <- gets id and channel from simple_strtoul() -> scsi_add_single_device() <- calls shost->transportt->user_scan() -> ata_scsi_user_scan() -> ata_find_dev() Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # all versions at this point
2017-06-29	sd: add support for TCG OPAL self encrypting disks	Christoph Hellwig	1	-0/+3
	Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver and the Security In/Out commands. Note that I don't know of any actual SCSI disks that do support TCG OPAL, but this is required to support ATA disks through libata. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-29	libata: fix build warning from unused goto label	Tejun Heo	1	-1/+0
	b1ffbf854e08 ("libata: Support for an ATA PASS-THROUGH(32) command.") introduced an unused goto label. Remove it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-27	libata: Support for an ATA PASS-THROUGH(32) command.	Minwoo Im	1	-5/+67
	SAT-4(SCSI/ATA Translation) supports for an ata pass-thru(32). This patch will allow to translate an ata pass-thru(32) SCSI cmd to an ATA cmd. Signed-off-by: Minwoo Im <dn3108@gmail.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-20	libata: make the function name in comment match the actual function	Minwoo Im	1	-1/+1
	The function name used to be ata_scsiop_mode_select() but renamed to ata_scsi_mode_select_xlat(). Update the comment accordingly. tj: Minor commit desc update. Signed-off-by: Minwoo Im <dn3108@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-12	libata: Convert bare printks to pr_cont	Joe Perches	1	-21/+20
	Linus Torvalds changed the behavior of printks without KERN_<LEVEL>. Convert the continuation prints to use pr_cont. At the same time, convert the existing printks with KERN_<LEVEL> to pr_<level> Miscellanea: o Coalesce a multiline format Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-06-05	libata: implement SECURITY PROTOCOL IN/OUT	Christoph Hellwig	1	-0/+76
	This allows us to use the generic OPAL code with ATA devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-05-16	ata: update references for libata documentation	Mauro Carvalho Chehab	1	-1/+1
	The libata documentation is now using ReST. Update references to it to point to the new place. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-05-16	libata: fix identation on a kernel-doc markup	Mauro Carvalho Chehab	1	-3/+4
	Sphinx got confused with the markup identation: ./drivers/ata/libata-scsi.c:3402: ERROR: Unexpected indentation. No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-04-29	libata: remove SCT WRITE SAME support	Christoph Hellwig	1	-103/+29
	This was already disabled a while ago because it caused I/O errors, and it's severly getting into the way of the discard / write zeroes rework. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-04-29	libata: reject passthrough WRITE SAME requests	Christoph Hellwig	1	-0/+8
	The WRITE SAME to TRIM translation rewrites the DATA OUT buffer. While the SCSI code accomodates for this by passing a read-writable buffer userspace applications don't cater for this behavior. In fact it can be used to rewrite e.g. a readonly file through mmap and should be considered as a security fix. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-02-24	scsi: merge __scsi_execute into scsi_execute	Christoph Hellwig	1	-8/+4
	All but one caller want the decoded sense header, so offer the existing __scsi_execute helper as the public scsi_execute API to simply the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-22	Merge branch 'for-4.11' of ↵	Linus Torvalds	1	-61/+36
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: - Bartlomiej added pata_falcon - Christoph is trying to remove use of static 4k buf. It's still WIP - config cleanup around HAS_DMA - other fixes and driver-specific changes * 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: (29 commits) ata: pata_of_platform: using of_property_read_u32() helper pata_atiixp: Don't use unconnected secondary port on SB600/SB700 libata-sff: Don't scan disabled ports when checking for legacy mode. pata_octeon_cf: remove unused local variables from octeon_cf_set_piomode() ahci: qoriq: added ls2088a platforms support ahci: qoriq: report error when ecc register address is missing in dts ahci: qoriq: added a condition to enable dma coherence Revert "libata: switch to dynamic allocation instead of ata_scsi_rbuf" ahci: imx: fix building without hwmon or thermal ata: add Atari Falcon PATA controller driver ata: pass queued command to ->sff_data_xfer method ata: allow subsystem to be used on m68k arch libata: switch to dynamic allocation instead of ata_scsi_rbuf libata: don't call ata_scsi_rbuf_fill for command without a response buffer libata: call ->scsi_done from ata_scsi_simulate libata: remove the done callback from ata_scsi_args libata: move struct ata_scsi_args to libata-scsi.c libata: avoid global response buffer in atapi_qc_complete libata-eh: Use switch() instead of sparse array for protocol strings ata: sata_mv: Convert to devm_ioremap_resource() ...
2017-02-01	block: introduce blk_rq_is_passthrough	Christoph Hellwig	1	-1/+1
	This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-28	block: split scsi_request out of struct request	Christoph Hellwig	1	-1/+1
	And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-18	Revert "libata: switch to dynamic allocation instead of ata_scsi_rbuf"	Tejun Heo	1	-36/+86
	This reverts commit a234f7395c9301a5048cb2daa4c86f15c6f02de8. The commit tried to get rid of the shared global SCSI response buffer. Unfortunately, it added blocking allocation to atomic path. Revert it for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Hellwig <hch@lst.de>
2017-01-10	libata: switch to dynamic allocation instead of ata_scsi_rbuf	Christoph Hellwig	1	-86/+36
	Note of the emulated commands in the pageout/pagein path, so just do a GFP_NOIO dynamic allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-10	libata: don't call ata_scsi_rbuf_fill for command without a response buffer	Christoph Hellwig	1	-21/+1
	No need to copy a zeroed buffer to the caller if the command is defined to not have a response in the SCSI spec. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-10	libata: call ->scsi_done from ata_scsi_simulate	Christoph Hellwig	1	-15/+7
	We always need to call ->scsi_done after we've finished emulating a command, so do it in a single place at the end of ata_scsi_simulate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-10	libata: remove the done callback from ata_scsi_args	Christoph Hellwig	1	-3/+1
	It's always the scsi_done callback, and we can get at that easily in the place where ->done is called. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-10	libata: move struct ata_scsi_args to libata-scsi.c	Christoph Hellwig	1	-0/+7
	It's only used in libata-scsi.c, so move it closer to the users. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2017-01-10	libata: avoid global response buffer in atapi_qc_complete	Christoph Hellwig	1	-24/+22
	We only need to look at 4 bytes of the inquiry response for ATAPI devices. Instead of using the global ata_scsi_rbuf just use a a stack buffer. Also factor the fixup into it's own little helper function to make it more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-14	Merge branch 'for-4.10' of ↵	Linus Torvalds	1	-17/+21
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull another libata patch from Tejun Heo: "One more patch from Adam added. It makes libata skip probing for NCQ prio unless the feature is explicitly requested by the user. This is necessary because some controllers lock up after the optional feature is probed" * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: avoid probing NCQ Prio Support if not explicitly requested
2016-12-14	ata: avoid probing NCQ Prio Support if not explicitly requested	Adam Manzanares	1	-17/+21
	Previously, when the ata device was being initialized we were probing for NCQ prio support by checking the identify information and also checking the log page that holds information about ncq prio support. This caused an error on an Intel HBA so the code is now updated to only probe for NCQ prio support when the sysfs variable controlling NCQ prio support is enabled. tj: Update formatting, switch to spin_[un]lock_irq() and update locking a bit, use REVALIDATE instead of RESET, and return -EIO instead of -EINVAL on config failure. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-12-14	Merge branch 'for-4.10' of ↵	Linus Torvalds	1	-1/+79
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: - Adam added opt-in ATA command priority support. - There are machines which hide multiple nvme devices behind an ahci BAR. Dan Williams proposed a solution to force-switch the mode but deemed too hackishd. People are gonna discuss the proper way to handle the situation in nvme standard meetings. For now, detect and warn about the situation. - Low level driver specific changes. Christoph Hellwig pipes in about the hidden nvme warning: "I wish that was the case. We've pretty much agreed that we'll want to implement it as a virtual PCIe root bridge, similar to Intels other 'innovation' VMD that we work around that way. But Intel management has apparently decided that they don't want to spend more cycles on this now that Lenovo has an optional BIOS that doesn't force this broken mode anymore, and no one outside of Intel has enough information to implement something like this. So for now I guess this warning is it, until Intel reconsideres and spends resources on fixing up the damage their Chipset people caused" * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ahci: warn about remapped NVMe devices ahci-remap.h: add ahci remapping definitions nvme: move NVMe class code to pci_ids.h pata: imx: support controller modes up to PIO4 pata: imx: add support of setting timings for PIO modes pata: imx: set controller PIO mode with .set_piomode callback pata: imx: sort headers out ata: set ncq_prio_enabled iff device has support ata: ATA Command Priority Disabled By Default ata: Enabling ATA Command Priorities block: Add iocontext priority to request ahci: qoriq: added ls1046a platform support
2016-12-08	libata-scsi: disable SCT Write Same for the moment	Nicolai Stange	1	-0/+1
	SCT Write Same support had been introduced with commit 7b2030942859 ("libata: Add support for SCT Write Same") Some problems, namely excessive userspace segfaults, had been reported at http://lkml.kernel.org/r/20160908192736.GA4356@gmail.com This lead to commit 0ce1b18c42a5 ("libata: Some drives failing on SCT Write Same") which strived to disable SCT Write Same on !ZAC devices. Due to the way this was done and to the logic in sd_config_write_same(), this didn't work for those devices that have ->max_ws_blocks > SD_MAX_WS10_BLOCKS: for these, ->no_write_same and ->max_write_same_sectors would still be non-zero, but ->ws10 == ->ws16 == 0. This would cause sd_setup_write_same_cmnd() to demultiplex REQ_OP_WRITE_SAME requests to WRITE_SAME, and these in turn aren't supported by libata-scsi: EXT4-fs (dm-1): Delayed block allocation failed for inode 2625094 at logical offset 2032 with max blocks 2 with error 121 EXT4-fs (dm-1): This should not happen!! Data will be lost 121 == EREMOTEIO is what scsi_io_completion() asserts in case of invalid opcodes. Back to the original problem of userspace segfaults: this can be tracked down to ata_format_sct_write_same() overwriting the input page. Sometimes, this page is ZERO_PAGE(0) which ceases to be filled with zeros from that point on. Since ZERO_PAGE(0) is used for userspace .bss mappings, code of the following is doomed: static char a = NULL; / .bss / ... if (a) a = 'a'; This problem is not solved by disabling SCT Write Same for !ZAC devices only. It can certainly be fixed, but the final release is quite close -- so disable SCT Write Same for all ATA devices rather than introducing some SCT key buffer allocation schemes at this point. Fixes: 7b2030942859 ("libata: Add support for SCT Write Same") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-11-01	libata-scsi: Fixup ata_gen_passthru_sense()	Hannes Reinecke	1	-1/+1
	There's a typo in ata_gen_passthru_sense(), where the first byte would be overwritten incorrectly later on. Reported-by: Charles Machalow <csm10495@gmail.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Fixes: 11093cb1ef56 ("libata-scsi: generate correct ATA pass-through sense") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Tejun Heo <tj@kernel.org>
2016-10-20	ata: set ncq_prio_enabled iff device has support	Adam Manzanares	1	-2/+8
	We previously had a check to see if the device has support for prioritized ncq commands and a check to see if a device flag is set, through a sysfs variable, in order to send a prioritized command. This patch only allows the sysfs variable to be set if the device supports prioritized commands enabling one check in ata_build_rw_tf in order to determine whether or not to send a prioritized command. This patch depends on ata: ATA Command Priority Disabled By Default tj: Minor subject and formatting updates. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-10-19	ata: ATA Command Priority Disabled By Default	Adam Manzanares	1	-0/+68
	Add a sysfs entry to turn on priority information being passed to a ATA device. By default this feature is turned off. This patch depends on ata: Enabling ATA Command Priorities tj: Renamed ncq_prio_on to ncq_prio_enable and removed trivial ata_ncq_prio_on() and open-coded the test. Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-10-19	ata: Enabling ATA Command Priorities	Adam Manzanares	1	-1/+5
	This patch checks to see if an ATA device supports NCQ command priorities. If so and the user has specified an iocontext that indicates IO_PRIO_CLASS_RT then we build a tf with a high priority command. This is done to improve the tail latency of commands that are high priority by passing priority to the device. tj: Removed trivial ata_ncq_prio_enabled() and open-coded the test. Signed-off-by: Adam Manzanares <adam.manzanares@hgst.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-09-09	libata: Some drives failing on SCT Write Same	Shaun Tancheff	1	-3/+3
	Restrict support SCT Write Same to devices which also support ZAC where support is required. Reported-by: Mike Krinkin <krinkin.m.u@gmail.com> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-25	libata: SCT Write Same handle ATA_DFLAG_PIO	Shaun Tancheff	1	-0/+2
	Use non DMA write log when ATA_DFLAG_PIO is set. Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tejun Heo <tj@kernel.org>
2016-08-25	libata: SCT Write Same / DSM Trim	Shaun Tancheff	1	-28/+57
	Correct handling of devices with sector_size other that 512 bytes. In the case of a 4Kn device sector_size it is possible to describe a much larger DSM Trim than the current fixed default of 512 bytes. This patch assumes the minimum descriptor is sector_size and fills out the descriptor accordingly. The ACS-2 specification is quite clear that the DSM command payload is sized as number of 512 byte transfers so a 4Kn device will operate correctly without this patch. Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com> Acked-by: Tejun Heo <tj@kernel.org>
2016-08-25	libata: Add support for SCT Write Same	Shaun Tancheff	1	-29/+170
	SATA drives may support write same via SCT. This is useful for setting the drive contents to a specific pattern (0's). Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM command or an SCT Write Same command. Based on the UNMAP flag: - When set translate to DSM TRIM - When not set translate to SCT Write Same Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tejun Heo <tj@kernel.org>
2016-08-25	libata: Safely overwrite attached page in WRITE SAME xlat	Shaun Tancheff	1	-5/+51
	Safely overwriting the attached page to ATA format from the SCSI formatted variant. Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Acked-by: Tejun Heo <tj@kernel.org>
2016-08-10	libata-scsi: fix MODE SELECT translation for Control mode page	Tom Yan	1	-2/+2
	scsi_done() was called repeatedly and apparently because of that, the kernel would call trace when we touch the Control mode page: Call Trace: [<ffffffff812ea0d2>] dump_stack+0x63/0x81 [<ffffffff81079cfb>] __warn+0xcb/0xf0 [<ffffffff81079e2d>] warn_slowpath_null+0x1d/0x20 [<ffffffffa00f51b0>] ata_eh_finish+0xe0/0xf0 [libata] [<ffffffffa00fb830>] sata_pmp_error_handler+0x640/0xa50 [libata] [<ffffffffa00470ed>] ahci_error_handler+0x1d/0x70 [libahci] [<ffffffffa00f55f0>] ata_scsi_port_error_handler+0x430/0x770 [libata] [<ffffffffa00eff8d>] ? ata_scsi_cmd_error_handler+0xdd/0x160 [libata] [<ffffffffa00f59d7>] ata_scsi_error+0xa7/0xf0 [libata] [<ffffffffa00913ba>] scsi_error_handler+0xaa/0x560 [scsi_mod] [<ffffffffa0091310>] ? scsi_eh_get_sense+0x180/0x180 [scsi_mod] [<ffffffff81098eb8>] kthread+0xd8/0xf0 [<ffffffff815d913f>] ret_from_fork+0x1f/0x40 [<ffffffff81098de0>] ? kthread_worker_fn+0x170/0x170 ---[ end trace 8b7501047e928a17 ]--- Removed the unnecessary code and let ata_scsi_translate() do the job. Also, since ata_mselect_control() has no ATA command to send to the device, ata_scsi_mode_select_xlat() should return 1 for it, so that ata_scsi_translate() will finish early to avoid ata_qc_issue(). Signed-off-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-08-09	libata-scsi: use u8 array to store mode page copy	Tom Yan	1	-2/+2
	ata_mselect_*() would initialize a char array for storing a copy of the current mode page. However, char could be signed char. In that case, bytes larger than 127 would be converted to negative number. For example, 0xff from def_control_mpage[] would become -1. This prevented ata_mselect_control() from working at all, since when it did the read-only bits check, there would always be a mismatch. Signed-off-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-27	Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block	Linus Torvalds	1	-1/+1
	Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...
2016-07-20	libata-scsi: better style in ata_msense_*()	Tom Yan	1	-6/+13
	`changeable` is the "version" of mode page requested by the user. It will be less confusing/misleading if we do not check it "together" with the setting bits of the drive. Not to mention that we currently have ata_mselect_() implemented in a way that each of them will serve exclusively a particular bit on each page. The old style will hence make the condition look even more unnecessarily arcane if the ata_msense_() is reflecting more than one bit. Signed-off-by: Tom Yan <tom.ty89@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-15	libata-scsi: minor cleanup for ata_scsi_zbc_out_xlat	Damien Le Moal	1	-4/+4
	The reset_all variable name is misleading as this bit is also applicable to open, close, and finish actions. So rename that variable to "all" and remove the unnecessary mask operation that's already done earlier. Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> [hch: split from the previous patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>
2016-07-15	libata-scsi: Fix ZBC management out command translation	Damien Le Moal	1	-2/+2
	The subcommand for NCQ NON-DATA must be specified in the feature (low byte), not the high-order count byte. Also make sure to properly cast the all bit to a u16 before shiting it by 8 to avoid undefined behavior. Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> [hch: split the original patch into two, updated changelog] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>