summaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-11-19md/r5cache: r5cache recovery: part 2Song Liu2-187/+26
1. In previous patch, we: - add new data to r5l_recovery_ctx - add new functions to recovery write-back cache The new functions are not used in this patch, so this patch does not change the behavior of recovery. 2. In this patchpatch, we: - modify main recovery procedure r5l_recovery_log() to call new functions - remove old functions Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: r5cache recovery: part 1Song Liu1-0/+602
Recovery of write-back cache has different logic to write-through only cache. Specifically, for write-back cache, the recovery need to scan through all active journal entries before flushing data out. Therefore, large portion of the recovery logic is rewritten here. To make the diffs cleaner, we split the rewrite as follows: 1. In this patch, we: - add new data to r5l_recovery_ctx - add new functions to recovery write-back cache The new functions are not used in this patch, so this patch does not change the behavior of recovery. 2. In next patch, we: - modify main recovery procedure r5l_recovery_log() to call new functions - remove old functions With cache feature, there are 2 different scenarios of recovery: 1. Data-Parity stripe: a stripe with complete parity in journal. 2. Data-Only stripe: a stripe with only data in journal (or partial parity). The code differentiate Data-Parity stripe from Data-Only stripe with flag STRIPE_R5C_CACHING. For Data-Parity stripes, we use the same procedure as raid5 journal, where all the data and parity are replayed to the RAID devices. For Data-Only strips, we need to finish complete calculate parity and finish the full reconstruct write or RMW write. For simplicity, in the recovery, we load the stripe to stripe cache. Once the array is started, the stripe cache state machine will handle these stripes through normal write path. r5c_recovery_flush_log contains the main procedure of recovery. The recovery code first scans through the journal and loads data to stripe cache. The code keeps tracks of all these stripes in a list (use sh->lru and ctx->cached_list), stripes in the list are organized in the order of its first appearance on the journal. During the scan, the recovery code assesses each stripe as Data-Parity or Data-Only. During scan, the array may run out of stripe cache. In these cases, the recovery code will also call raid5_set_cache_size to increase stripe cache size. If the array still runs out of stripe cache because there isn't enough memory, the array will not assemble. At the end of scan, the recovery code replays all Data-Parity stripes, and sets proper states for Data-Only stripes. The recovery code also increases seq number by 10 and rewrites all Data-Only stripes to journal. This is to avoid confusion after repeated crashes. More details is explained in raid5-cache.c before r5c_recovery_rewrite_data_only_stripes(). Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: refactoring journal recovery codeSong Liu1-9/+18
1. rename r5l_read_meta_block() as r5l_recovery_read_meta_block(); 2. pull the code that initialize r5l_meta_block from r5l_log_write_empty_meta_block() to a separate function r5l_recovery_create_empty_meta_block(), so that we can reuse this piece of code. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: sysfs entry journal_modeSong Liu3-0/+67
With write cache, journal_mode is the knob to switch between write-back and write-through. Below is an example: root@virt-test:~/# cat /sys/block/md0/md/journal_mode [write-through] write-back root@virt-test:~/# echo write-back > /sys/block/md0/md/journal_mode root@virt-test:~/# cat /sys/block/md0/md/journal_mode write-through [write-back] Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: write-out phase and reclaim supportSong Liu3-41/+430
There are two limited resources, stripe cache and journal disk space. For better performance, we priotize reclaim of full stripe writes. To free up more journal space, we free earliest data on the journal. In current implementation, reclaim happens when: 1. Periodically (every R5C_RECLAIM_WAKEUP_INTERVAL, 30 seconds) reclaim if there is no reclaim in the past 5 seconds. 2. when there are R5C_FULL_STRIPE_FLUSH_BATCH (256) cached full stripes, or cached stripes is enough for a full stripe (chunk size / 4k) (r5c_check_cached_full_stripe) 3. when there is pressure on stripe cache (r5c_check_stripe_cache_usage) 4. when there is pressure on journal space (r5l_write_stripe, r5c_cache_data) r5c_do_reclaim() contains new logic of reclaim. For stripe cache: When stripe cache pressure is high (more than 3/4 stripes are cached, or there is empty inactive lists), flush all full stripe. If fewer than R5C_RECLAIM_STRIPE_GROUP (NR_STRIPE_HASH_LOCKS * 2) full stripes are flushed, flush some paritial stripes. When stripe cache pressure is moderate (1/2 to 3/4 of stripes are cached), flush all full stripes. For log space: To avoid deadlock due to log space, we need to reserve enough space to flush cached data. The size of required log space depends on total number of cached stripes (stripe_in_journal_count). In current implementation, the writing-out phase automatically include pending data writes with parity writes (similar to write through case). Therefore, we need up to (conf->raid_disks + 1) pages for each cached stripe (1 page for meta data, raid_disks pages for all data and parity). r5c_log_required_to_flush_cache() calculates log space required to flush cache. In the following, we refer to the space calculated by r5c_log_required_to_flush_cache() as reclaim_required_space. Two flags are added to r5conf->cache_state: R5C_LOG_TIGHT and R5C_LOG_CRITICAL. R5C_LOG_TIGHT is set when free space on the log device is less than 3x of reclaim_required_space. R5C_LOG_CRITICAL is set when free space on the log device is less than 2x of reclaim_required_space. r5c_cache keeps all data in cache (not fully committed to RAID) in a list (stripe_in_journal_list). These stripes are in the order of their first appearance on the journal. So the log tail (last_checkpoint) should point to the journal_start of the first item in the list. When R5C_LOG_TIGHT is set, r5l_reclaim_thread starts flushing out stripes at the head of stripe_in_journal. When R5C_LOG_CRITICAL is set, the state machine only writes data that are already in the log device (in stripe_in_journal_list). This patch includes a fix to improve performance by Shaohua Li <shli@fb.com>. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: caching phase of r5cacheSong Liu3-32/+381
As described in previous patch, write back cache operates in two phases: caching and writing-out. The caching phase works as: 1. write data to journal (r5c_handle_stripe_dirtying, r5c_cache_data) 2. call bio_endio (r5c_handle_data_cached, r5c_return_dev_pending_writes). Then the writing-out phase is as: 1. Mark the stripe as write-out (r5c_make_stripe_write_out) 2. Calcualte parity (reconstruct or RMW) 3. Write parity (and maybe some other data) to journal device 4. Write data and parity to RAID disks This patch implements caching phase. The cache is integrated with stripe cache of raid456. It leverages code of r5l_log to write data to journal device. Writing-out phase of the cache is implemented in the next patch. With r5cache, write operation does not wait for parity calculation and write out, so the write latency is lower (1 write to journal device vs. read and then write to raid disks). Also, r5cache will reduce RAID overhead (multipile IO due to read-modify-write of parity) and provide more opportunities of full stripe writes. This patch adds 2 flags to stripe_head.state: - STRIPE_R5C_PARTIAL_STRIPE, - STRIPE_R5C_FULL_STRIPE, Instead of inactive_list, stripes with cached data are tracked in r5conf->r5c_full_stripe_list and r5conf->r5c_partial_stripe_list. STRIPE_R5C_FULL_STRIPE and STRIPE_R5C_PARTIAL_STRIPE are flags for stripes in these lists. Note: stripes in r5c_full/partial_stripe_list are not considered as "active". For RMW, the code allocates an extra page for each data block being updated. This is stored in r5dev->orig_page and the old data is read into it. Then the prexor calculation subtracts ->orig_page from the parity block, and the reconstruct calculation adds the ->page data back into the parity block. r5cache naturally excludes SkipCopy. When the array has write back cache, async_copy_data() will not skip copy. There are some known limitations of the cache implementation: 1. Write cache only covers full page writes (R5_OVERWRITE). Writes of smaller granularity are write through. 2. Only one log io (sh->log_io) for each stripe at anytime. Later writes for the same stripe have to wait. This can be improved by moving log_io to r5dev. 3. With writeback cache, read path must enter state machine, which is a significant bottleneck for some workloads. 4. There is no per stripe checkpoint (with r5l_payload_flush) in the log, so recovery code has to replay more than necessary data (sometimes all the log from last_checkpoint). This reduces availability of the array. This patch includes a fix proposed by ZhengYuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: State machine for raid5-cache write back modeSong Liu3-8/+211
This patch adds state machine for raid5-cache. With log device, the raid456 array could operate in two different modes (r5c_journal_mode): - write-back (R5C_MODE_WRITE_BACK) - write-through (R5C_MODE_WRITE_THROUGH) Existing code of raid5-cache only has write-through mode. For write-back cache, it is necessary to extend the state machine. With write-back cache, every stripe could operate in two different phases: - caching - writing-out In caching phase, the stripe handles writes as: - write to journal - return IO In writing-out phase, the stripe behaviors as a stripe in write through mode R5C_MODE_WRITE_THROUGH. STRIPE_R5C_CACHING is added to sh->state to differentiate caching and writing-out phase. Please note: this is a "no-op" patch for raid5-cache write-through mode. The following detailed explanation is copied from the raid5-cache.c: /* * raid5 cache state machine * * With rhe RAID cache, each stripe works in two phases: * - caching phase * - writing-out phase * * These two phases are controlled by bit STRIPE_R5C_CACHING: * if STRIPE_R5C_CACHING == 0, the stripe is in writing-out phase * if STRIPE_R5C_CACHING == 1, the stripe is in caching phase * * When there is no journal, or the journal is in write-through mode, * the stripe is always in writing-out phase. * * For write-back journal, the stripe is sent to caching phase on write * (r5c_handle_stripe_dirtying). r5c_make_stripe_write_out() kicks off * the write-out phase by clearing STRIPE_R5C_CACHING. * * Stripes in caching phase do not write the raid disks. Instead, all * writes are committed from the log device. Therefore, a stripe in * caching phase handles writes as: * - write to log device * - return IO * * Stripes in writing-out phase handle writes as: * - calculate parity * - write pending data and parity to journal * - write data and parity to raid disks * - return IO for pending writes */ Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: move some code to raid5.hSong Liu2-71/+77
Move some define and inline functions to raid5.h, so they can be used in raid5-cache.c Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-19md/r5cache: Check array size in r5l_init_logSong Liu1-10/+16
Currently, r5l_write_stripe checks meta size for each stripe write, which is not necessary. With this patch, r5l_init_log checks maximal meta size of the array, which is (r5l_meta_block + raid_disks x r5l_payload_data_parity). If this is too big to fit in one page, r5l_init_log aborts. With current meta data, r5l_log support raid_disks up to 203. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-18md: add blktrace event for writes to superblockShaohua Li1-0/+3
superblock write is an expensive operation. With raid5-cache, it can be called regularly. Tracing to help performance debug. Signed-off-by: Shaohua Li <shli@fb.com> Cc: NeilBrown <neilb@suse.com>
2016-11-18md/raid1, raid10: add blktrace records when IO is delayedNeilBrown2-0/+16
Both raid1 and raid10 will sometimes delay handling an IO request, such as when resync is happening or there are too many requests queued. Add some blktrace messsages so we can see when that is happening when looking for performance artefacts. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-18md/bitmap: add blktrace event for writes to the bitmapNeilBrown1-1/+10
We trace wheneven bitmap_unplug() finds that it needs to write to the bitmap, or when bitmap_daemon_work() find there is work to do. This makes it easier to correlate bitmap updates with data writes. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-18md: add block tracing for bio_remappingNeilBrown4-13/+75
The block tracing infrastructure (accessed with blktrace/blkparse) supports the tracing of mapping bios from one device to another. This is currently used when a bio in a partition is mapped to the whole device, when bios are mapped by dm, and for mapping in md/raid5. Other md personalities do not include this tracing yet, so add it. When a read-error is detected we redirect the request to a different device. This could justifiably be seen as a new mapping for the originial bio, or a secondary mapping for the bio that errors. This patch uses the second option. When md is used under dm-raid, the mappings are not traced as we do not have access to the block device number of the parent. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-17raid5-cache: fix lockdep warningShaohua Li1-2/+14
lockdep reports warning of the rcu_dereference usage. Using normal rdev access pattern to avoid the warning. Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-10md: remove md_super_wait() call after bitmap_flush()NeilBrown1-1/+0
bitmap_flush() finishes with bitmap_update_sb(), and that finishes with write_page(..., 1), so write_page() will wait for all writes to complete. So there is no point calling md_super_wait() immediately afterwards. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-09md: define mddev flags, recovery flags and r1bio state bits using enumsNeilBrown2-48/+46
This is less error prone than using individual #defines. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-09md/raid1: fix: IO can block resync indefinitelyNeilBrown1-1/+1
While performing a resync/recovery, raid1 divides the array space into three regions: - before the resync - at or shortly after the resync point - much further ahead of the resync point. Write requests to the first or third do not need to wait. Write requests to the middle region do need to wait if resync requests are pending. If there are any active write requests in the middle region, resync will wait for them. Due to an accounting error, there is a small range of addresses, between conf->next_resync and conf->start_next_window, where write requests will *not* be blocked, but *will* be counted in the middle region. This can effectively block resync indefinitely if filesystem writes happen repeatedly to this region. As ->next_window_requests is incremented when the sector is after conf->start_next_window + NEXT_NORMALIO_DISTANCE the same boundary should be used for determining when write requests should wait. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/bitmap: Don't write bitmap while earlier writes might be in-flightNeilBrown1-5/+22
As we don't wait for writes to complete in bitmap_daemon_work, they could still be in-flight when bitmap_unplug writes again. Or when bitmap_daemon_work tries to write again. This can be confusing and could risk the wrong data being written last. So make sure we wait for old writes to complete before new writes start. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/raid10: abort delayed writes when device fails.NeilBrown1-6/+16
When writing to an array with a bitmap enabled, the writes are grouped in batches which are preceded by an update to the bitmap. It is quite likely if that a drive develops a problem which is not media related, that the bitmap write will be the first to report an error and cause the device to be marked faulty (as the bitmap write is at the start of a batch). In this case, there is point submiting the subsequent writes to the failed device - that just wastes times. So re-check the Faulty state of a device before submitting a delayed write. This requires that we keep the 'rdev', rather than the 'bdev' in the bio, then swap in the bdev just before final submission. Reported-by: Hannes Reinecke <hare@suse.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/raid1: abort delayed writes when device fails.NeilBrown1-5/+15
When writing to an array with a bitmap enabled, the writes are grouped in batches which are preceded by an update to the bitmap. It is quite likely if that a drive develops a problem which is not media related, that the bitmap write will be the first to report an error and cause the device to be marked faulty (as the bitmap write is at the start of a batch). In this case, there is point submiting the subsequent writes to the failed device - that just wastes times. So re-check the Faulty state of a device before submitting a delayed write. This requires that we keep the 'rdev', rather than the 'bdev' in the bio, then swap in the bdev just before final submission. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md: perform async updates for metadata where possible.NeilBrown1-4/+12
When adding devices to, or removing device from, an array we need to update the metadata. However we don't need to do it synchronously as data integrity doesn't depend on these changes being recorded instantly. So avoid the synchronous call to md_update_sb and just set a flag so that the thread will do it. This can reduce the number of updates performed when lots of devices are being added or removed. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08raid5-cache: restrict the use area of the log_offset variableJackieLiu1-11/+8
We can calculate this offset by using ctx->meta_total_blocks, without passing in from the function Signed-off-by: JackieLiu <liuyun01@kylinos.cn> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/raid5: change printk() to pr_*()NeilBrown1-121/+86
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/raid10: change printk() to pr_*()NeilBrown1-85/+56
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/raid1: change printk() to pr_*()NeilBrown1-58/+41
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/raid0: replace printk() with pr_*()NeilBrown1-45/+44
This makes md/raid0 much less verbose as the messages about the array geometry are now pr_debug() Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/multipath: replace printk() with pr_*()NeilBrown1-57/+33
Also remove all messages about memory allocation failure. page_alloc() reports those. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/linear: replace printk() with pr_*()NeilBrown1-8/+5
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/bitmap: change all printk() to pr_*()NeilBrown1-57/+50
Follow err/warn distinction introduced in md.c Join multi-part strings into single string. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md: change all printk() to pr_err() or pr_warn() etc.NeilBrown1-219/+170
1/ using pr_debug() for a number of messages reduces the noise of md, but still allows them to be enabled when needed. 2/ try to be consistent in the usage of pr_err() and pr_warn(), and document the intention 3/ When strings have been split onto multiple lines, rejoin into a single string. The cost of having lines > 80 chars is less than the cost of not being able to easily search for a particular message. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md: fix some issues with alloc_disk_sb()NeilBrown1-10/+10
1/ don't print a warning if allocation fails. page_alloc() does that already. 2/ always check return status for error. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md/bitmap: call bitmap_file_unmap once bitmap_storage_alloc returns -ENOMEMGuoqing Jiang1-1/+3
It is possible that bitmap_storage_alloc could return -ENOMEM, and some member inside store could be allocated such as filemap. To avoid memory leak, we need to call bitmap_file_unmap to free those members in the bitmap_resize. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08raid5: revert commit 11367799f3d1Tomasz Majchrzak1-3/+1
Revert commit 11367799f3d1 ("md: Prevent IO hold during accessing to faulty raid5 array") as it doesn't comply with commit c3cce6cda162 ("md/raid5: ensure device failure recorded before write request returns."). That change is not required anymore as the problem is resolved by commit 16f889499a52 ("md: report 'write_pending' state when array in sync") - read request is stuck as array state is not reported correctly via sysfs attribute. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md: wake up personality thread after array state updateTomasz Majchrzak1-0/+1
When raid1/raid10 array fails to write to one of the drives, the request is added to bio_end_io_list and finished by personality thread. The thread doesn't handle it as long as MD_CHANGE_PENDING flag is set. In case of external metadata this flag is cleared, however the thread is not woken up. It causes request to be blocked for few seconds (until another action on the array wakes up the thread) or to get stuck indefinitely. Wake up personality thread once MD_CHANGE_PENDING has been cleared. Moving 'restart_array' call after the flag is cleared it not a solution because in read-write mode the call doesn't wake up the thread. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md: don't fail an array if there are unacknowledged bad blocksTomasz Majchrzak1-1/+3
If external metadata handler supports bad blocks and unacknowledged bad blocks are present, don't report disk via sysfs as faulty. Such situation can be still handled so disk just has to be blocked for a moment. It makes it consistent with kernel state as corresponding rdev flag is also not set. When the disk in being unblocked there are few cases: 1. Disk has been in blocked and faulty state, it is being unblocked but it still remains in faulty state. Metadata handler will remove it from array in the next call. 2. There is no bad block support in external metadata handler and bad blocks are present - put the disk in blocked and faulty state (see case 1). 3. There is bad block support in external metadata handler and all bad blocks are acknowledged - clear all flags, continue. 4. There is bad block support in external metadata handler but there are still unacknowledged bad blocks - clear all flags, continue. It is fine to clear Blocked flag because it was probably not set anyway (if it was it is case 1). BlockedBadBlocks flag can also be cleared because the request waiting for it will set it again when it finds out that some bad block is still not acknowledged. Recovery is not necessary but there are no problems if the flag is set. Sysfs rdev state is still reported as blocked (due to unacknowledged bad blocks) so metadata handler will process remaining bad blocks and unblock disk again. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08md: add bad block support for external metadataTomasz Majchrzak2-39/+42
Add new rdev flag which external metadata handler can use to switch on/off bad block support. If new bad block is encountered, notify it via rdev 'unacknowledged_bad_blocks' sysfs file. If bad block has been cleared, notify update to rdev 'bad_blocks' sysfs file. When bad blocks support is being removed, just clear rdev flag. It is not necessary to reset badblocks->shift field. If there are bad blocks cleared or added at the same time, it is ok for those changes to be applied to the structure. The array is in blocked state and the drive which cannot handle bad blocks any more will be removed from the array before it is unlocked. Simplify state_show function by adding a separator at the end of each string and overwrite last separator with new line. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-08lib/raid6: Add AVX2 optimized xor_syndrome functionsGayatri Kammela1-3/+229
Implement the AVX2 optimization of RAID6 xor_syndrome functions which is simply based on sse2.c written by hpa. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Yuanhan Liu <yuanhan.liu@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: Shaohua Li <shli@fb.com>
2016-11-07Merge tag 'arm64-fixes' of ↵Linus Torvalds4-21/+42
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Will Deacon: "It's been pretty quiet on the fixes side of things for us, but Artem reported a build failure introduced during the merge window that appears with older GCCs that do not support asm goto. The fix is bigger than I'd like, but it's a mechnical move of some constants to break an include dependency between atomic.h and jump_label.h when !HAVE_JUMP_LABEL. Summary: - Fix build failure on compilers without asm goto" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix circular include of asm/lse.h through linux/jump_label.h
2016-11-07Merge tag 'openrisc-for-linus-v4.9-rc5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull openrisc fix from Guenter Roeck: "Fix openrisc crash caused by ro_init changes" * tag 'openrisc-for-linus-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: openrisc: Define __ro_after_init to avoid crash
2016-11-07Merge tag 'hwmon-for-linus-v4.9-rc5' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fix from Guenter Roeck: "Fix resource leak on devm_kcalloc failure" * tag 'hwmon-for-linus-v4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (core) fix resource leak on devm_kcalloc failure
2016-11-07Merge branch 'for-linus' of ↵Linus Torvalds6-36/+95
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: - modprobe-after-rmmod load failure bugfix for intel-ish, from Even Xu - IRQ probing bugfix for intel-ish, from Srinivas Pandruvada - attribute parsing fix in hid-sensor, from Ooi, Joyce - other small misc fixes / quirky device additions * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: sensor: fix attributes in HID sensor interface HID: intel-ish-hid: request_irq failure HID: intel-ish-hid: Fix driver reinit failure HID: intel-ish-hid: Move DMA disable code to new function HID: intel-ish-hid: consolidate ish wake up operation HID: usbhid: add ATEN CS962 to list of quirky devices HID: intel-ish-hid: Fix !CONFIG_PM build warning HID: sensor-hub: Fix packing of result buffer for feature report
2016-11-06openrisc: Define __ro_after_init to avoid crashGuenter Roeck1-0/+2
openrisc qemu tests fail with the following crash. Unable to handle kernel access at virtual address 0xc0300c34 Oops#: 0001 CPU #: 0 PC: c016c710 SR: 0000ae67 SP: c1017e04 GPR00: 00000000 GPR01: c1017e04 GPR02: c0300c34 GPR03: c0300c34 GPR04: 00000000 GPR05: c0300cb0 GPR06: c0300c34 GPR07: 000000ff GPR08: c107f074 GPR09: c0199ef4 GPR10: c1016000 GPR11: 00000000 GPR12: 00000000 GPR13: c107f044 GPR14: c0473774 GPR15: 07ce0000 GPR16: 00000000 GPR17: c107ed8a GPR18: 00009600 GPR19: c107f044 GPR20: c107ee74 GPR21: 00000003 GPR22: c0473770 GPR23: 00000033 GPR24: 000000bf GPR25: 00000019 GPR26: c046400c GPR27: 00000001 GPR28: c0464028 GPR29: c1018000 GPR30: 00000006 GPR31: ccf37483 RES: 00000000 oGPR11: ffffffff Process swapper (pid: 1, stackpage=c1001960) Stack: Stack dump [0xc1017cf8]: sp + 00: 0xc1017e04 sp + 04: 0xc0300c34 sp + 08: 0xc0300c34 sp + 12: 0x00000000 ... Bisect points to commit d2ec3f77de8e ("pty: make ptmx file ops read-only after init"). Fix by defining __ro_after_init for the openrisc architecture, similar to parisc. Fixes: d2ec3f77de8e ("pty: make ptmx file ops read-only after init") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Stafford Horne <shorne@gmail.com>
2016-11-06Linux 4.9-rc4v4.9-rc4Linus Torvalds1-1/+1
2016-11-06Merge branch 'i2c/for-current' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fix from Wolfram Sang: "A bugfix for the I2C core fixing a (rare) race condition" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: core: fix NULL pointer dereference under race condition
2016-11-05arm64: Fix circular include of asm/lse.h through linux/jump_label.hCatalin Marinas4-21/+42
Commit efd9e03facd0 ("arm64: Use static keys for CPU features") introduced support for static keys in asm/cpufeature.h, including linux/jump_label.h. When CC_HAVE_ASM_GOTO is not defined, this causes a circular dependency via linux/atomic.h, asm/lse.h and asm/cpufeature.h. This patch moves the capability macros out out of asm/cpufeature.h into a separate asm/cpucaps.h and modifies some of the #includes accordingly. Fixes: efd9e03facd0 ("arm64: Use static keys for CPU features") Reported-by: Artem Savkov <asavkov@redhat.com> Tested-by: Artem Savkov <asavkov@redhat.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-11-05Merge branches 'sched-urgent-for-linus' and 'core-urgent-for-linus' of ↵Linus Torvalds2-9/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull stack vmap fixups from Thomas Gleixner: "Two small patches related to sched_show_task(): - make sure to hold a reference on the task stack while accessing it - remove the thread_saved_pc printout .. and add a sanity check into release_task_stack() to catch problems with task stack references" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Remove pointless printout in sched_show_task() sched/core: Fix oops in sched_show_task() * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fork: Add task stack refcounting sanity check and prevent premature task stack freeing
2016-11-05Merge tag 'md/4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds4-19/+30
Pull MD fixes from Shaohua Li: "There are several bug fixes queued: - fix raid5-cache recovery bugs - fix discard IO error handling for raid1/10 - fix array sync writes bogus position to superblock - fix IO error handling for raid array with external metadata" * tag 'md/4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md: be careful not lot leak internal curr_resync value into metadata. -- (all) raid1: handle read error also in readonly mode raid5-cache: correct condition for empty metadata write md: report 'write_pending' state when array in sync md/raid5: write an empty meta-block when creating log super-block md/raid5: initialize next_checkpoint field before use RAID10: ignore discard error RAID1: ignore discard error
2016-11-05Merge tag 'scsi-fixes' of ↵Linus Torvalds3-17/+6
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two more important data integrity fixes related to RAID device drivers which wrongly throw away the SYNCHRONIZE CACHE command in the non-RAID path and a memory leak in the scsi_debug driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
2016-11-05Merge branch 'for-linus' of ↵Linus Torvalds2-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input subsystem updates from Dmitry Torokhov. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: psmouse - cleanup Focaltech code Input: i8042 - add XMG C504 to keyboard reset table
2016-11-05Merge tag 'firewire-fixes' of ↵Linus Torvalds1-20/+39
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull FireWire (IEEE 1394) fixes from Stefan Richter: - add missing input validation to the firewire-net driver. Invalid IP-over-1394 encapsulation headers could trigger buffer overflows (CVE 2016-8633). - IP-over-1394 link fragmentation headers were read and written incorrectly, breaking fragmented RX/TX with other OS's stacks. * tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: net: fix fragmented datagram_size off-by-one firewire: net: guard against rx buffer overflows