summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-12-10dm thin: allow pool in read-only mode to transition to read-write modeJoe Thornber
A thin-pool may be in read-only mode because the pool's data or metadata space was exhausted. To allow for recovery, by adding more space to the pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE mode. Otherwise, running out of space will render the pool permanently read-only. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-12-10dm thin: re-establish read-only state when switching to fail modeJoe Thornber
If the thin-pool transitioned to fail mode and the thin-pool's table were reloaded for some reason: the new table's default pool mode would be read-write, though it will transition to fail mode during resume. When the pool mode transitions directly from PM_WRITE to PM_FAIL we need to re-establish the intermediate read-only state in both the metadata and persistent-data block manager (as is usually done with the normal pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-12-10dm thin: always fallback the pool mode if commit failsJoe Thornber
Rename commit_or_fallback() to commit(). Now all previous calls to commit() will trigger the pool mode to fallback if the commit fails. Also, check the error returned from commit() in alloc_data_block(). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-12-10dm thin: switch to read-only mode if metadata space is exhaustedMike Snitzer
Switch the thin pool to read-only mode in alloc_data_block() if dm_pool_alloc_data_block() fails because the pool's metadata space is exhausted. Differentiate between data and metadata space in messages about no free space available. This issue was noticed with the device-mapper-test-suite using: dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/ The quantity of errors logged in this case must be reduced. before patch: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event. device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed <snip ... these repeat for a _very_ long while ... > device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: 253:4: commit failed: error = -28 device-mapper: thin: 253:4: switching pool to read-only mode after patch: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event. device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: 253:4: no free metadata space available. device-mapper: thin: 253:4: switching pool to read-only mode Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
2013-12-10dm thin: switch to read only mode if a mapping insert failsJoe Thornber
Switch the thin pool to read-only mode when dm_thin_insert_block() fails since there is little reason to expect the cause of the failure to be resolved without further action by user space. This issue was noticed with the device-mapper-test-suite using: dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/ The quantity of errors logged in this case must be reduced. before patch: device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: dm_thin_insert_block() failed device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map metadata: unable to allocate new metadata block <snip ... these repeat for a long while ... > device-mapper: space map metadata: unable to allocate new metadata block device-mapper: space map common: dm_tm_shadow_block() failed device-mapper: thin: 253:4: no free metadata space available. device-mapper: thin: 253:4: switching pool to read-only mode after patch: device-mapper: space map metadata: unable to allocate new metadata block device-mapper: thin: 253:4: dm_thin_insert_block() failed: error = -28 device-mapper: thin: 253:4: switching pool to read-only mode Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-12-10dm space map metadata: return on failure in sm_metadata_new_blockMike Snitzer
Commit 2fc48021f4afdd109b9e52b6eef5db89ca80bac7 ("dm persistent metadata: add space map threshold callback") introduced a regression to the metadata block allocation path that resulted in errors being ignored. This regression was uncovered by running the following device-mapper-test-suite test: dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/ The ignored error codes in sm_metadata_new_block() could crash the kernel through use of either the dm-thin or dm-cache targets, e.g.: device-mapper: thin: 253:4: reached low water mark for metadata device: sending event. device-mapper: space map metadata: unable to allocate new metadata block general protection fault: 0000 [#1] SMP ... Workqueue: dm-thin do_worker [dm_thin_pool] task: ffff880035ce2ab0 ti: ffff88021a054000 task.ti: ffff88021a054000 RIP: 0010:[<ffffffffa0331385>] [<ffffffffa0331385>] metadata_ll_load_ie+0x15/0x30 [dm_persistent_data] RSP: 0018:ffff88021a055a68 EFLAGS: 00010202 RAX: 003fc8243d212ba0 RBX: ffff88021a780070 RCX: ffff88021a055a78 RDX: ffff88021a055a78 RSI: 0040402222a92a80 RDI: ffff88021a780070 RBP: ffff88021a055a68 R08: ffff88021a055ba4 R09: 0000000000000010 R10: 0000000000000000 R11: 00000002a02e1000 R12: ffff88021a055ad4 R13: 0000000000000598 R14: ffffffffa0338470 R15: ffff88021a055ba4 FS: 0000000000000000(0000) GS:ffff88033fca0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f467c0291b8 CR3: 0000000001a0b000 CR4: 00000000000007e0 Stack: ffff88021a055ab8 ffffffffa0332020 ffff88021a055b30 0000000000000001 ffff88021a055b30 0000000000000000 ffff88021a055b18 0000000000000000 ffff88021a055ba4 ffff88021a055b98 ffff88021a055ae8 ffffffffa033304c Call Trace: [<ffffffffa0332020>] sm_ll_lookup_bitmap+0x40/0xa0 [dm_persistent_data] [<ffffffffa033304c>] sm_metadata_count_is_more_than_one+0x8c/0xc0 [dm_persistent_data] [<ffffffffa0333825>] dm_tm_shadow_block+0x65/0x110 [dm_persistent_data] [<ffffffffa0331b00>] sm_ll_mutate+0x80/0x300 [dm_persistent_data] [<ffffffffa0330e60>] ? set_ref_count+0x10/0x10 [dm_persistent_data] [<ffffffffa0331dba>] sm_ll_inc+0x1a/0x20 [dm_persistent_data] [<ffffffffa0332270>] sm_disk_new_block+0x60/0x80 [dm_persistent_data] [<ffffffff81520036>] ? down_write+0x16/0x40 [<ffffffffa001e5c4>] dm_pool_alloc_data_block+0x54/0x80 [dm_thin_pool] [<ffffffffa001b23c>] alloc_data_block+0x9c/0x130 [dm_thin_pool] [<ffffffffa001c27e>] provision_block+0x4e/0x180 [dm_thin_pool] [<ffffffffa001fe9a>] ? dm_thin_find_block+0x6a/0x110 [dm_thin_pool] [<ffffffffa001c57a>] process_bio+0x1ca/0x1f0 [dm_thin_pool] [<ffffffff8111e2ed>] ? mempool_free+0x8d/0xa0 [<ffffffffa001d755>] process_deferred_bios+0xc5/0x230 [dm_thin_pool] [<ffffffffa001d911>] do_worker+0x51/0x60 [dm_thin_pool] [<ffffffff81067872>] process_one_work+0x182/0x3b0 [<ffffffff81068c90>] worker_thread+0x120/0x3a0 [<ffffffff81068b70>] ? manage_workers+0x160/0x160 [<ffffffff8106eb2e>] kthread+0xce/0xe0 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff8152af6c>] ret_from_fork+0x7c/0xb0 [<ffffffff8106ea60>] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org # v3.10+
2013-12-10dm table: fail dm_table_create on dm_round_up overflowMikulas Patocka
The dm_round_up function may overflow to zero. In this case, dm_table_create() must fail rather than go on to allocate an empty array with alloc_targets(). This fixes a possible memory corruption that could be caused by passing too large a number in "param->target_count". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-12-10dm snapshot: avoid snapshot space leak on crashMikulas Patocka
There is a possible leak of snapshot space in case of crash. The reason for space leaking is that chunks in the snapshot device are allocated sequentially, but they are finished (and stored in the metadata) out of order, depending on the order in which copying finished. For example, supposed that the metadata contains the following records SUPERBLOCK METADATA (blocks 0 ... 250) DATA 0 DATA 1 DATA 2 ... DATA 250 Now suppose that you allocate 10 new data blocks 251-260. Suppose that copying of these blocks finish out of order (block 260 finished first and the block 251 finished last). Now, the snapshot device looks like this: SUPERBLOCK METADATA (blocks 0 ... 250, 260, 259, 258, 257, 256) DATA 0 DATA 1 DATA 2 ... DATA 250 DATA 251 DATA 252 DATA 253 DATA 254 DATA 255 METADATA (blocks 255, 254, 253, 252, 251) DATA 256 DATA 257 DATA 258 DATA 259 DATA 260 Now, if the machine crashes after writing the first metadata block but before writing the second metadata block, the space for areas DATA 250-255 is leaked, it contains no valid data and it will never be used in the future. This patch makes dm-snapshot complete exceptions in the same order they were allocated, thus fixing this bug. Note: when backporting this patch to the stable kernel, change the version field in the following way: * if version in the stable kernel is {1, 11, 1}, change it to {1, 12, 0} * if version in the stable kernel is {1, 10, 0} or {1, 10, 1}, change it to {1, 10, 2} Userspace reads the version to determine if the bug was fixed, so the version change is needed. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-11-18dm delay: fix a possible deadlock due to shared workqueueMikulas Patocka
The dm-delay target uses a shared workqueue for multiple instances. This can cause deadlock if two or more dm-delay targets are stacked on the top of each other. This patch changes dm-delay to use a per-instance workqueue. Cc: stable@vger.kernel.org # 2.6.22+ Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-12dm cache: resolve small nits and improve DocumentationMike Snitzer
Document passthrough mode, cache shrinking, and cache invalidation. Also, use strcasecmp() and hlist_unhashed(). Reported-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-11dm cache: add cache block invalidation supportJoe Thornber
Cache block invalidation is removing an entry from the cache without writing it back. Cache blocks can be invalidated via the 'invalidate_cblocks' message, which takes an arbitrary number of cblock ranges: invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]* E.g. dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789 Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-11dm cache: add remove_cblock method to policy interfaceJoe Thornber
Implement policy_remove_cblock() and add remove_cblock method to the mq policy. These methods will be used by the following cache block invalidation patch which adds the 'invalidate_cblocks' message to the cache core. Also, update some comments in dm-cache-policy.h Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-11dm cache policy mq: reduce memory requirementsJoe Thornber
Rather than storing the cblock in each cache entry, we allocate all entries in an array and infer the cblock from the entry position. Saves 4 bytes of memory per cache block. In addition, this gives us an easy way of looking up cache entries by cblock. We no longer need to keep an explicit bitset to track which cblocks have been allocated. And no searching is needed to find free cblocks. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-11dm cache metadata: check the metadata version when reading the superblockJoe Thornber
Need to check the version to verify on-disk metadata is supported. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-11dm cache: add passthrough modeJoe Thornber
"Passthrough" is a dm-cache operating mode (like writethrough or writeback) which is intended to be used when the cache contents are not known to be coherent with the origin device. It behaves as follows: * All reads are served from the origin device (all reads miss the cache) * All writes are forwarded to the origin device; additionally, write hits cause cache block invalidates This mode decouples cache coherency checks from cache device creation, largely to avoid having to perform coherency checks while booting. Boot scripts can create cache devices in passthrough mode and put them into service (mount cached filesystems, for example) without having to worry about coherency. Coherency that exists is maintained, although the cache will gradually cool as writes take place. Later, applications can perform coherency checks, the nature of which will depend on the type of the underlying storage. If coherency can be verified, the cache device can be transitioned to writethrough or writeback mode while still warm; otherwise, the cache contents can be discarded prior to transitioning to the desired operating mode. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-11dm cache: cache shrinking supportJoe Thornber
Allow a cache to shrink if the blocks being removed from the cache are not dirty. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: promotion optimisation for writesJoe Thornber
If a write block triggers promotion and covers a whole block we can avoid a copy. Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio fields (bi_private is now used by overwrite). Switch writethrough support over to using these helpers too. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: be much more aggressive about promoting writes to discarded blocksJoe Thornber
Previously these promotions only got priority if there were unused cache blocks. Now we give them priority if there are any clean blocks in the cache. The fio_soak_test in the device-mapper-test-suite now gives uniform performance across subvolumes (~16 seconds). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty()Joe Thornber
There are now two multiqueues for in cache blocks. A clean one and a dirty one. writeback_work comes from the dirty one. Demotions come from the clean one. There are two benefits: - Performance improvement, since demoting a clean block is a noop. - The cache cleans itself when io load is light. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: optimize commit_if_neededHeinz Mauelshagen
Check commit_requested flag _before_ calling dm_cache_changed_this_transaction() superfluously. Also, be sure to set last_commit_jiffies _after_ dm_cache_commit() completes. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm space map disk: optimise sm_disk_dec_blockJoe Thornber
Don't waste time spotting blocks that have been allocated and then freed in the same transaction. The extra lookup is expensive, and I don't think it really gives us much. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09MAINTAINERS: add reference to device-mapper's linux-dm.git treeMike Snitzer
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm: fix Kconfig menu indentationMikulas Patocka
The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it right after DM_MIRROR. Doing so fixes various other Device mapper targets/features to be properly nested under "Device mapper support". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm: allow remove to be deferredMikulas Patocka
This patch allows the removal of an open device to be deferred until it is closed. (Previously such a removal attempt would fail.) The deferred remove functionality is enabled by setting the flag DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or DM_REMOVE_ALL ioctl. On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if the device was removed immediately or flagged to be removed on close - if the flag is clear, the device was removed. On return from DM_DEV_STATUS and other ioctls, the flag DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on closure. A device that is scheduled to be deleted can be revived using the message "@cancel_deferred_remove". This message clears the DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm table: print error on preresume failureMike Snitzer
If preresume fails it is worth logging an error given that a device is left suspended due to the failure. This change was motivated by local preresume error logging that was added to the cache target ("preresume failed"). Elevating this target-agnostic context for the where the target-specific error occurred relative to the DM core's callouts makes sense. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com>
2013-11-09dm crypt: add TCW IV mode for old CBC TCRYPT containersMilan Broz
dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers in LRW or XTS block encryption mode. TCRYPT containers prior to version 4.1 use CBC mode with some additional tweaks, this patch adds support for these containers. This new mode is implemented using special IV generator named TCW (TrueCrypt IV with whitening). TCW IV only supports containers that are encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and TripleDES). While this mode is legacy and is known to be vulnerable to some watermarking attacks (e.g. revealing of hidden disk existence) it can still be useful to activate old containers without using 3rd party software or for independent forensic analysis of such containers. (Both the userspace and kernel code is an independent implementation based on the format documentation and it completely avoids use of original source code.) The TCW IV generator uses two additional keys: Kw (whitening seed, size is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is always the IV size of the selected cipher). These keys are concatenated at the end of the main encryption key provided in mapping table. While whitening is completely independent from IV, it is implemented inside IV generator for simplification. The whitening value is always 16 bytes long and is calculated per sector from provided Kw as initial seed, xored with sector number and mixed with CRC32 algorithm. Resulting value is xored with ciphertext sector content. IV is calculated from the provided Kiv as initial IV seed and xored with sector number. Detailed calculation can be found in the Truecrypt documentation for version < 4.1 and will also be described on dm-crypt site, see: http://code.google.com/p/cryptsetup/wiki/DMCrypt The experimental support for activation of these containers is already present in git devel brach of cryptsetup. Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm crypt: properly handle extra key string in initializationMilan Broz
Some encryption modes use extra keys (e.g. loopAES has IV seed) which are not used in block cipher initialization but are part of key string in table constructor. This patch adds an additional field which describes the length of the extra key(s) and substracts it before real key encryption setting. The key_size always includes the size, in bytes, of the key provided in mapping table. The key_parts describes how many parts (usually keys) are contained in the whole key buffer. And key_extra_size contains size in bytes of additional keys part (this number of bytes must be subtracted because it is processed by the IV generator). | K1 | K2 | .... | K64 | Kiv | |----------- key_size ----------------- | | |-key_extra_size-| | [64 keys] | [1 key] | => key_parts = 65 Example where key string contains main key K, whitening key Kw and IV seed Kiv: | K | Kiv | Kw | |--------------- key_size --------------| | |-----key_extra_size------| | [1 key] | [1 key] | [1 key] | => key_parts = 3 Because key_extra_size is calculated during IV mode setting, key initialization is moved after this step. For now, this change has no effect to supported modes (thanks to ilog2 rounding) but it is required by the following patch. Also, fix a sparse warning in crypt_iv_lmk_one(). Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: log error message if dm_kcopyd_copy() failsHeinz Mauelshagen
A migration failure should be logged (albeit limited). Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: use cell_defer() boolean argument consistentlyHeinz Mauelshagen
Fix a few cell_defer() calls that weren't passing a bool. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: return -EINVAL if the user specifies unknown cache policyMikulas Patocka
Return -EINVAL when the specified cache policy is unknown rather than returning -ENOMEM. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache metadata: return bool from __superblock_all_zeroesJoe Thornber
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache policy mq: a few small fixesJoe Thornber
Rename takeout_queue to concat_queue. Fix a harmless bug in mq policies pop() function. Currently pop() always succeeds, with up coming changes this wont be the case. Fix typo in comment above pre_cache_to_cache prototype. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache policy: remove return from void policy_remove_mappingJoe Thornber
No need to return from a void function. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: improve efficiency of quiescing flag managementJoe Thornber
Make the quiescing flag an atomic_t and stop protecting it with a spin lock. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache: fix a race condition between queuing new migrations and quiescing ↵Joe Thornber
for a shutdown The code that was trying to do this was inadequate. The postsuspend method (in ioctl context), needs to wait for the worker thread to acknowledge the request to quiesce. Otherwise the migration count may drop to zero temporarily before the worker thread realises we're quiescing. In this case the target will be taken down, but the worker thread may have issued a new migration, which will cause an oops when it completes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
2013-11-09dm cache: io destined for the cache device can now serve as tick biosJoe Thornber
Previously only origin bios could trigger ticks, which meant if all the io was destined for the cache no ticks were generated. If no ticks are generated then multiple hits, and movements in general, are attributed to the same tick. Only a stop gap fix, we need a better solution. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09dm cache policy mq: protect residency method with existing mutexJoe Thornber
It is safe to use a mutex in mq_residency() at this point since it is only called from ioctl context. But future-proof mq_residency() by using might_sleep() to catch new contexts that cannot sleep. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-05dm array: fix bug in growing arrayJoe Thornber
Entries would be lost if the old tail block was partially filled. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # 3.9+
2013-11-05dm mpath: requeue I/O during pg_initHannes Reinecke
When pg_init is running no I/O can be submitted to the underlying devices, as the path priority etc might change. When using queue_io for this, requests will be piling up within multipath as the block I/O scheduler just sees a _very fast_ device. All of this queued I/O has to be resubmitted from within multipathing once pg_init is done. This approach has the problem that it's virtually impossible to abort I/O when pg_init is running, and we're adding heavy load to the devices after pg_init since all of the queued I/O needs to be resubmitted _before_ any requests can be pulled off of the request queue and normal operation continues. This patch will requeue the I/O that triggers the pg_init call, and return 'busy' when pg_init is in progress. With these changes the block I/O scheduler will stop submitting I/O during pg_init, resulting in a quicker path switch and less I/O pressure (and memory consumption) after pg_init. Signed-off-by: Hannes Reinecke <hare@suse.de> [patch header edited for clarity and typos by Mike Snitzer] Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-01dm mpath: fix race condition between multipath_dtr and pg_init_doneShiva Krishna Merla
Whenever multipath_dtr() is happening we must prevent queueing any further path activation work. Implement this by adding a new 'pg_init_disabled' flag to the multipath structure that denotes future path activation work should be skipped if it is set. By disabling pg_init and then re-enabling in flush_multipath_work() we also avoid the potential for pg_init to be initiated while suspending an mpath device. Without this patch a race condition exists that may result in a kernel panic: 1) If after pg_init_done() decrements pg_init_in_progress to 0, a call to wait_for_pg_init_completion() assumes there are no more pending path management commands. 2) If pg_init_required is set by pg_init_done(), due to retryable mode_select errors, then process_queued_ios() will again queue the path activation work. 3) If free_multipath() completes before activate_path() work is called a NULL pointer dereference like the following can be seen when accessing members of the recently destructed multipath: BUG: unable to handle kernel NULL pointer dereference at 0000000000000090 RIP: 0010:[<ffffffffa003db1b>] [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath] [<ffffffff81090ac0>] worker_thread+0x170/0x2a0 [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40 [switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer] Signed-off-by: Shiva Krishna Merla <shivakrishna.merla@netapp.com> Reviewed-by: Krishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com> Tested-by: Speagle Andy <Andy.Speagle@netapp.com> Acked-by: Junichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2013-10-31dm: allocate buffer for messages with small number of arguments using GFP_NOIOMikulas Patocka
dm-mpath and dm-thin must process messages even if some device is suspended, so we allocate argv buffer with GFP_NOIO. These messages have a small fixed number of arguments. On the other hand, dm-switch needs to process bulk data using messages so excessive use of GFP_NOIO could cause trouble. The patch also lowers the default number of arguments from 64 to 8, so that there is smaller load on GFP_NOIO allocations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-10-13Linux 3.12-rc5Linus Torvalds
2013-10-13Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds
Pull watchdog fixes from Wim Van Sebroeck: "This will fix a deadlock on the ts72xx_wdt driver, fix bitmasks in the kempld_wdt driver and fix a section mismatch in the sunxi_wdt driver" * git://www.linux-watchdog.org/linux-watchdog: watchdog: sunxi: Fix section mismatch watchdog: kempld_wdt: Fix bit mask definition watchdog: ts72xx_wdt: locking bug in ioctl
2013-10-13watchdog: sunxi: Fix section mismatchMaxime Ripard
This driver has a section mismatch, for probe and remove functions, leading to the following warning during the compilation. WARNING: drivers/watchdog/built-in.o(.data+0x24): Section mismatch in reference from the variable sunxi_wdt_driver to the function .init.text:sunxi_wdt_probe() The variable sunxi_wdt_driver references the function __init sunxi_wdt_probe() Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2013-10-13watchdog: kempld_wdt: Fix bit mask definitionJingoo Han
STAGE_CFG bits are defined as [5:4] bits. However, '(((x) & 0x30) << 4)' handles [9:8] bits. Thus, it should be fixed in order to handle [5:4] bits. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2013-10-13watchdog: ts72xx_wdt: locking bug in ioctlDan Carpenter
Calling the WDIOC_GETSTATUS & WDIOC_GETBOOTSTATUS and twice will cause a interruptible deadlock. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2013-10-13Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "A small batch of fixes this week, mostly OMAP related. Nothing stands out as particularly controversial. Also a fix for a 3.12-rc1 timer regression for Exynos platforms, including the Chromebooks" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: exynos: dts: Update 5250 arch timer node with clock frequency ARM: OMAP2: RX-51: Add missing max_current to rx51_lp5523_led_config ARM: mach-omap2: board-generic: fix undefined symbol ARM: dts: Fix pinctrl mask for omap3 ARM: OMAP3: Fix hardware detection for omap3630 when booted with device tree ARM: OMAP2: gpmc-onenand: fix sync mode setup with DT
2013-10-13ARM: exynos: dts: Update 5250 arch timer node with clock frequencyYuvaraj Kumar C D
Without the "clock-frequency" property in arch timer node, could able to see the below crash dump. [<c0014e28>] (unwind_backtrace+0x0/0xf4) from [<c0011808>] (show_stack+0x10/0x14) [<c0011808>] (show_stack+0x10/0x14) from [<c036ac1c>] (dump_stack+0x7c/0xb0) [<c036ac1c>] (dump_stack+0x7c/0xb0) from [<c01ab760>] (Ldiv0_64+0x8/0x18) [<c01ab760>] (Ldiv0_64+0x8/0x18) from [<c0062f60>] (clockevents_config.part.2+0x1c/0x74) [<c0062f60>] (clockevents_config.part.2+0x1c/0x74) from [<c0062fd8>] (clockevents_config_and_register+0x20/0x2c) [<c0062fd8>] (clockevents_config_and_register+0x20/0x2c) from [<c02b8e8c>] (arch_timer_setup+0xa8/0x134) [<c02b8e8c>] (arch_timer_setup+0xa8/0x134) from [<c04b47b4>] (arch_timer_init+0x1f4/0x24c) [<c04b47b4>] (arch_timer_init+0x1f4/0x24c) from [<c04b40d8>] (clocksource_of_init+0x34/0x58) [<c04b40d8>] (clocksource_of_init+0x34/0x58) from [<c049ed8c>] (time_init+0x20/0x2c) [<c049ed8c>] (time_init+0x20/0x2c) from [<c049b95c>] (start_kernel+0x1e0/0x39c) THis is because the Exynos u-boot, for example on the Chromebooks, doesn't set up the CNTFRQ register as expected by arch_timer. Instead, we have to specify the frequency in the device tree like this. Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@samsung.com> [olof: Changed subject, added comment, elaborated on commit message] Signed-off-by: Olof Johansson <olof@lixom.net>
2013-10-13Merge tag 'fixes-against-v3.12-rc3-take2' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes From Tony Lindgren: Few fixes for omap3 related hangs and errors that people have noticed now that people are actually using the device tree based booting for omap3. Also one regression fix for timer compile for dra7xx when omap5 is not selected, and a LED regression fix for n900. * tag 'fixes-against-v3.12-rc3-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP2: RX-51: Add missing max_current to rx51_lp5523_led_config ARM: mach-omap2: board-generic: fix undefined symbol ARM: dts: Fix pinctrl mask for omap3 ARM: OMAP3: Fix hardware detection for omap3630 when booted with device tree ARM: OMAP2: gpmc-onenand: fix sync mode setup with DT Signed-off-by: Olof Johansson <olof@lixom.net>
2013-10-13Merge branch 'parisc-3.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "This patchset includes a bugfix to prevent a kernel crash when memory in page zero is accessed by the kernel itself, e.g. via probe_kernel_read(). Furthermore we now export flush_cache_page() which is needed (indirectly) by the lustre filesystem. The other patches remove unused functions and optimizes the page fault handler to only evaluate variables if needed, which again protects against possible kernel crashes" * 'parisc-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: let probe_kernel_read() capture access to page zero parisc: optimize variable initialization in do_page_fault parisc: fix interruption handler to respect pagefault_disable() parisc: mark parisc_terminate() noreturn and cold. parisc: remove unused syscall_ipi() function. parisc: kill SMP single function call interrupt parisc: Export flush_cache_page() (needed by lustre)