path: root/drivers/md
Age  Commit message  Author
2013-03-23  block: Add bio_alloc_pages()  Kent Overstreet
More utility code to replace stuff that's getting open coded. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23  block: Convert some code to bio_for_each_segment_all()  Kent Overstreet
More prep work for immutable bvecs: A few places in the code were either open coding or using the wrong version - fix. After we introduce the bvec iter, it'll no longer be possible to modify the biovec through bio_for_each_segment() - it doesn't increment a pointer to the current bvec, you pass in a struct bio_vec (not a pointer) which is updated with what the current biovec would be (taking into account bi_bvec_done and bi_size). So because of that it's more worthwhile to be consistent about bio_for_each_segment()/bio_for_each_segment_all() usage. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Alexander Viro <viro@zeniv.linux.org.uk>
2013-03-23  block: Add bio_for_each_segment_all()  Kent Overstreet
__bio_for_each_segment() iterates bvecs from the specified index instead of bio->bi_idx. Currently, the only usage is to walk all the bvecs after the bio has been advanced by specifying index 0. For immutable bvecs, we need to split these apart; bio_for_each_segment() is going to have a different implementation. This will also help document the intent of code that's using it - bio_for_each_segment_all() is only legal to use for code that owns the bio. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Neil Brown <neilb@suse.de> CC: Boaz Harrosh <bharrosh@panasas.com>
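The distinction drawn in this entry can be sketched in userspace C. This is only an illustration under assumptions: `struct toy_bio`, `struct toy_bvec`, and the `toy_*` macros are hypothetical stand-ins, not the kernel's definitions. One iterator starts at the bio's current index, the other always starts at 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct bio_vec and struct bio. */
struct toy_bvec {
    unsigned int len;            /* bytes in this segment */
};

struct toy_bio {
    struct toy_bvec *vecs;       /* the whole biovec array */
    unsigned int vcnt;           /* number of entries in vecs */
    unsigned int idx;            /* first segment not yet completed (like bi_idx) */
};

/* Walk only the segments that remain, starting at bio->idx. */
#define toy_for_each_segment(bv, bio, i)                      \
    for ((i) = (bio)->idx, (bv) = (bio)->vecs + (i);          \
         (i) < (bio)->vcnt; (i)++, (bv)++)

/* Walk every segment from index 0, however far the bio has advanced. */
#define toy_for_each_segment_all(bv, bio, i)                  \
    for ((i) = 0, (bv) = (bio)->vecs;                         \
         (i) < (bio)->vcnt; (i)++, (bv)++)

/* Sum the bytes still outstanding. */
unsigned int toy_remaining_bytes(struct toy_bio *bio)
{
    struct toy_bvec *bv;
    unsigned int i, total = 0;

    assert(bio != NULL && bio->idx <= bio->vcnt);
    toy_for_each_segment(bv, bio, i)
        total += bv->len;
    return total;
}

/* Sum the bytes of the whole biovec; meant for the code that owns the bio. */
unsigned int toy_total_bytes(struct toy_bio *bio)
{
    struct toy_bvec *bv;
    unsigned int i, total = 0;

    assert(bio != NULL);
    toy_for_each_segment_all(bv, bio, i)
        total += bv->len;
    return total;
}
```

For a bio with three segments of 512, 1024, and 2048 bytes that has advanced past the first one, the two helpers give different totals, which is why the choice of iterator documents intent.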
2013-03-23  raid1: use bio_copy_data()  Kent Overstreet
This doesn't really delete any code _yet_, but once immutable bvecs are done we can just delete the rest of the code in that loop. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23  raid1: Refactor narrow_write_error() to not use bi_idx  Kent Overstreet
More bi_idx removal. This code was just open coding bio_clone(). This could probably be further improved by using bio_advance() instead of skipping over null pages, but that'd be a larger rework. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23  raid5: use bio_reset()  Kent Overstreet
Had to shuffle the code around a bit (where bi_rw and bi_end_io were set), but shouldn't really be anything tricky here. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23  raid1: use bio_reset()  Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de>
2013-03-23  raid10: Use bio_reset()  Kent Overstreet
More prep work for immutable bio vecs, mainly getting rid of references to bi_idx. bio_reset was being open coded in a few places. The one in sync_request was a bit nontrivial to convert, so could use some extra eyeballs. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: NeilBrown <neilb@suse.de>
2013-03-23  block: Add submit_bio_wait(), remove from md  Kent Overstreet
Random cleanup - this code was duplicated and it's not really specific to md. Also added the ability to return the actual error code. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: Tejun Heo <tj@kernel.org>
2013-03-23  block: Remove bi_idx references  Kent Overstreet
For immutable bvecs, all bi_idx usage needs to be audited - so here we're removing all the unnecessary uses. Most of these are places where it was being initialized on a bio that was just allocated, a few others are conversions to standard macros. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk>
2013-03-23  block: Change bio_split() to respect the current value of bi_idx  Kent Overstreet
In the current code bio_split() won't be seeing partially completed bios so this doesn't change any behaviour, but this makes the code a bit clearer as to what bio_split() actually requires. The immediate purpose of the patch is removing unnecessary bi_idx references, but the end goal is to allow partially completed bios to be submitted, which along with immutable biovecs enables efficient bio splitting. Some of the callers were (double) checking that bios could be split, so update their checks too. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Neil Brown <neilb@suse.de> CC: Martin K. Petersen <martin.petersen@oracle.com>
2013-03-23  block: Use bio_sectors() more consistently  Kent Overstreet
Bunch of places in the code weren't using it where they could be - this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: "Ed L. Cashin" <ecashin@coraid.com> CC: Nick Piggin <npiggin@kernel.dk> CC: Jiri Kosina <jkosina@suse.cz> CC: Jim Paris <jim@jtan.com> CC: Geoff Levand <geoff@infradead.org> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Acked-by: Ed Cashin <ecashin@coraid.com>
2013-03-23  block: Add bio_end_sector()  Kent Overstreet
Just a little convenience macro - main reason to add it now is preparing for immutable bio vecs, it'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx into a struct bvec_iter. Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: Lars Ellenberg <drbd-dev@lists.linbit.com> CC: Jiri Kosina <jkosina@suse.cz> CC: Alasdair Kergon <agk@redhat.com> CC: dm-devel@redhat.com CC: Neil Brown <neilb@suse.de> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org CC: Chris Mason <chris.mason@fusionio.com> CC: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com>
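As a rough illustration of what such convenience helpers compute, the sketch below derives a sector count and an end sector from a start sector and a byte size. It assumes 512-byte sectors; `struct toy_bio` and the `toy_*` functions are hypothetical names for this example, not the kernel's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_sector_t;

/* Hypothetical stand-in holding only the two fields the helpers need. */
struct toy_bio {
    toy_sector_t sector;   /* first sector of the I/O (like bi_sector) */
    unsigned int size;     /* remaining I/O size in bytes (like bi_size) */
};

#define TOY_SECTOR_SHIFT 9 /* assumes 512-byte sectors */

/* Number of sectors covered by the bio. */
toy_sector_t toy_bio_sectors(const struct toy_bio *bio)
{
    assert(bio != NULL);
    return bio->size >> TOY_SECTOR_SHIFT;
}

/* First sector past the end of the bio. */
toy_sector_t toy_bio_end_sector(const struct toy_bio *bio)
{
    return bio->sector + toy_bio_sectors(bio);
}
```

Routing every caller through helpers like these means a later change to where the sector and size live only has to touch the helpers.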
2013-03-23  md: Convert md_trim_bio() to use bio_advance()  Kent Overstreet
Signed-off-by: Kent Overstreet <koverstreet@google.com> CC: Jens Axboe <axboe@kernel.dk> CC: NeilBrown <neilb@suse.de> Acked-by: NeilBrown <neilb@suse.de>
2013-03-06  Merge tag 'md-3.9' of git://neil.brown.name/md  Linus Torvalds
Pull md updates from NeilBrown: "Mostly little bugfixes. Only "feature" is a new RAID10 layout which slightly improves the number of sets of devices that can concurrently fail, without data loss." * tag 'md-3.9' of git://neil.brown.name/md: md: expedite metadata update when switching read-auto -> active md: remove CONFIG_MULTICORE_RAID456 md/raid1,raid10: fix deadlock with freeze_array() md/raid0: improve error message when converting RAID4-with-spares to RAID0 md: raid0: fix error return from create_stripe_zones. md: fix two bugs when attempting to resize RAID0 array. DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 2) MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) MD RAID10: Minor non-functional code changes md: raid1,10: Handle REQ_WRITE_SAME flag in write bios md: protect against crash upon fsync on ro array
2013-03-01  dm cache: add cleaner policy  Heinz Mauelshagen
A simple cache policy that writes back all data to the origin. This is used to decommission a dm cache by emptying it. Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm cache: add mq policy  Joe Thornber
A cache policy that uses a multiqueue ordered by recent hit count to select which blocks should be promoted and demoted. This is meant to be a general purpose policy. It prioritises reads over writes. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: add cache target  Joe Thornber
Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm persistent data: add bitset  Joe Thornber
Add a persistent bitset as a wrapper around dm-array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm persistent data: add transactional array  Joe Thornber
Add a transactional array. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm thin: remove cells from stack  Joe Thornber
This patch takes advantage of the new bio-prison interface where the memory is now passed in rather than using a mempool in bio-prison. This allows the map function to avoid performing potentially-blocking allocations that could lead to deadlocks: We want to avoid the cell allocation that is done in bio_detain. (The potential for mempool deadlocks still remains in other functions that use bio_detain.) Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm bio prison: pass cell memory in  Joe Thornber
Change the dm_bio_prison interface so that instead of allocating memory internally, dm_bio_detain is supplied with a pre-allocated cell each time it is called. This enables a subsequent patch to move the allocation of the struct dm_bio_prison_cell outside the thin target's mapping function so it can no longer block there. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
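The interface change described here, where the caller supplies the cell memory instead of the prison allocating it, might look roughly like the sketch below. Everything in it is an assumption made for illustration: `struct toy_prison`, `struct toy_cell`, `toy_detain()`, and its return convention are hypothetical and much simpler than the real dm_bio_prison code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cell: tracks how many bios are held under one key. */
struct toy_cell {
    int key;
    unsigned int inmates;
    struct toy_cell *next;
};

/* Hypothetical prison: a plain linked list of occupied cells. */
struct toy_prison {
    struct toy_cell *cells;
};

/*
 * Detain a bio under 'key' without allocating anything here.
 * Returns 1 if a cell for 'key' already existed (prealloc is left unused
 * and the caller may reuse it), or 0 if prealloc was inserted as a new cell.
 * In both cases *result points at the cell now holding the bio.
 */
int toy_detain(struct toy_prison *prison, int key,
               struct toy_cell *prealloc, struct toy_cell **result)
{
    struct toy_cell *c;

    assert(prison != NULL && prealloc != NULL && result != NULL);

    for (c = prison->cells; c != NULL; c = c->next) {
        if (c->key == key) {
            c->inmates++;
            *result = c;
            return 1;
        }
    }

    prealloc->key = key;
    prealloc->inmates = 1;
    prealloc->next = prison->cells;
    prison->cells = prealloc;
    *result = prealloc;
    return 0;
}
```

Because the function never allocates, it cannot block on memory; the caller decides where and when the cell is obtained, which is the property the entry is after.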
2013-03-01  dm persistent data: add btree_walk  Joe Thornber
Add dm_btree_walk to iterate through the contents of a btree. This will be used by the dm cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: add target num_write_bios fn  Alasdair G Kergon
Add a num_write_bios function to struct target. If an instance of a target sets this, it will be queried before the target's mapping function is called on a write bio, and the response controls the number of copies of the write bio that the target will receive. This provides a convenient way for a target to send the same data to more than one device. The new cache target uses this in writethrough mode, to send the data both to the cache and the backing device. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
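A minimal sketch of the optional-callback pattern described in this entry follows. The types and names (`struct toy_target`, `toy_copies_for_write()`, the `writethrough` field) are hypothetical, and the real callback signature in device-mapper may differ; the point is only that the core asks the target how many copies of a write it wants and falls back to one when no callback is set.

```c
#include <assert.h>
#include <stddef.h>

struct toy_target;

/* Optional per-target callback: how many copies of a write bio to send. */
typedef unsigned int (*toy_num_write_bios_fn)(const struct toy_target *ti);

/* Hypothetical target instance. */
struct toy_target {
    toy_num_write_bios_fn num_write_bios; /* NULL means "one copy" */
    int writethrough;                     /* illustrative cache mode flag */
};

/* Core side: query the target before mapping a write. */
unsigned int toy_copies_for_write(const struct toy_target *ti)
{
    assert(ti != NULL);
    return ti->num_write_bios ? ti->num_write_bios(ti) : 1;
}

/* Target side: a cache in writethrough mode wants cache + backing device. */
unsigned int toy_cache_num_write_bios(const struct toy_target *ti)
{
    return ti->writethrough ? 2 : 1;
}
```

Keeping the callback optional leaves targets that do not set it unchanged.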
2013-03-01  dm kcopyd: introduce configurable throttling  Mikulas Patocka
This patch allows the administrator to reduce the rate at which kcopyd issues I/O. Each module that uses kcopyd acquires a throttle parameter that can be set in /sys/module/*/parameters. We maintain a history of kcopyd usage by each module in the variables io_period and total_period in struct dm_kcopyd_throttle. The actual kcopyd activity is calculated as a percentage of time equal to "(100 * io_period / total_period)". This is compared with the user-defined throttle percentage threshold and if it is exceeded, we sleep. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
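The percentage check described above can be sketched as follows. The field names io_period and total_period come from the entry; the struct layout, the `toy_*` function names, and the handling of an empty history are assumptions for this example, and the actual kcopyd code may compute the comparison differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the throttle bookkeeping. */
struct toy_throttle {
    unsigned int throttle;            /* user-defined threshold, in percent */
    unsigned long long io_period;     /* time spent doing kcopyd I/O */
    unsigned long long total_period;  /* total time observed */
};

/* Activity as a percentage: 100 * io_period / total_period. */
unsigned int toy_activity_percent(const struct toy_throttle *t)
{
    assert(t != NULL);
    if (t->total_period == 0)
        return 0; /* no history yet: treat as idle (assumption) */
    return (unsigned int)(100 * t->io_period / t->total_period);
}

/* Sleep only when measured activity exceeds the threshold. */
int toy_should_sleep(const struct toy_throttle *t)
{
    return toy_activity_percent(t) > t->throttle;
}
```

With a threshold of 50, a history of 60 busy units out of 100 triggers a sleep, while 30 out of 100 does not.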
2013-03-01  dm ioctl: allow message to return data  Mikulas Patocka
This patch introduces enhanced message support that allows the device-mapper core to recognise messages that are common to all devices, and for messages to return data to userspace. Core messages are processed by the function "message_for_md". If the device mapper doesn't support the message, it is passed to the target driver. If the message returns data, the kernel sets the flag DM_MESSAGE_OUT_FLAG. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm ioctl: optimize functions without variable params  Mikulas Patocka
Device-mapper ioctls receive and send data in a buffer supplied by userspace. The buffer has two parts. The first part contains a 'struct dm_ioctl' and has a fixed size. The second part depends on the ioctl and has a variable size. This patch recognises the specific ioctls that do not use the variable part of the buffer and skips allocating memory for it. In particular, when a device is suspended and a resume ioctl is sent, this now avoids memory allocation completely. The variable "struct dm_ioctl tmp" is moved from the function copy_params to its caller ctl_ioctl and renamed to param_kernel. It is used directly when the ioctl function doesn't need any arguments. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm ioctl: introduce ioctl_flags  Mikulas Patocka
This patch introduces flags for each ioctl function. So far, one flag is defined, IOCTL_FLAGS_NO_PARAMS. It is set if the function processing the ioctl doesn't take or produce any parameters in the section of the data buffer that has a variable size. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
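One way to picture a per-ioctl flag table, and how a flag such as IOCTL_FLAGS_NO_PARAMS could let the caller skip allocating the variable-size part of the buffer, is the sketch below. The command numbers, table contents, and function name are hypothetical and chosen only for illustration; they are not the device-mapper ioctl table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_IOCTL_FLAGS_NO_PARAMS 1u

/* Hypothetical command numbers. */
#define TOY_CMD_RESUME     0u
#define TOY_CMD_TABLE_LOAD 1u
#define TOY_CMD_COUNT      2u

/* One flags word per ioctl, indexed by command number. */
static const unsigned int toy_ioctl_flags[TOY_CMD_COUNT] = {
    TOY_IOCTL_FLAGS_NO_PARAMS, /* TOY_CMD_RESUME: fixed header only */
    0u                         /* TOY_CMD_TABLE_LOAD: uses variable part */
};

/*
 * How many bytes to allocate for the request buffer.
 * Commands flagged NO_PARAMS work from the fixed-size header the caller
 * already holds, so nothing needs to be allocated for them.
 */
size_t toy_bytes_to_allocate(unsigned int cmd, size_t data_size,
                             size_t fixed_size)
{
    assert(cmd < TOY_CMD_COUNT && data_size >= fixed_size);
    if (toy_ioctl_flags[cmd] & TOY_IOCTL_FLAGS_NO_PARAMS)
        return 0;
    return data_size;
}
```

A table lookup keeps the decision in one place, so supporting another parameterless command is a one-line change.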
2013-03-01  dm: merge io_pool and tio_pool  Jun'ichi Nomura
This patch merges io_pool and tio_pool into io_pool and cleans up related functions. Though device-mapper used to have 2 pools of objects for each dm device, the use of bioset front_pad for per-bio data has shrunk the number of pools to 1 for both bio-based and request-based device types. (See c0820cf5 "dm: introduce per_bio_data" and 94818742 "dm: Use bioset's front_pad for dm_rq_clone_bio_info") So dm no longer has to maintain 2 different pointers. No functional changes. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: remove unused _rq_bio_info_cache  Jun'ichi Nomura
Remove _rq_bio_info_cache, which is no longer used. No functional changes. Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: fix limits initialization when there are no data devices  Mike Christie
dm_calculate_queue_limits will first reset the provided limits to defaults using blk_set_stacking_limits, thereby defeating the purpose of retaining the original live table's limits -- as was intended via commit 3ae706561637331aa578e52bb89ecbba5edcb7a9 ("dm: retain table limits when swapping to new table with no devices"). Fix this improper limits initialization (in the no data devices case) by avoiding the call to dm_calculate_queue_limits. [patch header revised by Mike Snitzer] Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.6+ Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm snapshot: add missing module aliases  Mikulas Patocka
Add module aliases so that autoloading works correctly if the user tries to activate "snapshot-origin" or "snapshot-merge" targets. Reference: https://bugzilla.redhat.com/889973 Reported-by: Chao Yang <chyang@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm persistent data: set some btree fn parms const  Mike Snitzer
Mark some constant parameters constant in some dm-btree functions. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: refactor bio cloning  Alasdair G Kergon
Refactor part of the bio splitting and cloning code to try to make it easier to understand. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: rename bio cloning functions  Alasdair G Kergon
Rename functions involved in splitting and cloning bios. The sequence of functions is now:
(1) __split_and_process* - entry point that selects the processing strategy
(2) __send* - prepare the details for each bio needed and loop through them
(3) __clone_and_map* - creates a clone and maps it
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: rename request variables to bios  Alasdair G Kergon
Use 'bio' in the name of variables and functions that deal with bios rather than 'request' to avoid confusion with the normal block layer use of 'request'. No functional changes. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: clean up clone_bio  Alasdair G Kergon
Remove the no-longer-used struct bio_set argument from clone_bio and split_bvec. Use tio->ti in __map_bio() instead of passing in ti. Factor out some code for setting up cloned bios. Take target_request_nr as a parameter to alloc_tio(). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm persistent data: remove CONFIG_EXPERIMENTAL  Kees Cook
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: remove CONFIG_EXPERIMENTAL  Alasdair G Kergon
Remove EXPERIMENTAL from all existing device-mapper targets. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm thin: use block_size_is_power_of_two  Mike Snitzer
Use block_size_is_power_of_two() rather than checking sectors_per_block_shift directly. Also introduce local pool variable in get_bio_block() to eliminate redundant tc->pool dereferences. No functional change. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
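The idea behind a power-of-two check on the block size can be sketched like this: when the number of sectors per block is a power of two, a sector maps to its block with a shift, otherwise a division is needed. The function names and the way the shift is derived here are assumptions for illustration, not the dm-thin implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True when n has exactly one bit set. */
int toy_is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Map a sector number to the index of the block that contains it. */
uint64_t toy_sector_to_block(uint64_t sector, uint32_t sectors_per_block)
{
    assert(sectors_per_block != 0);

    if (toy_is_power_of_two(sectors_per_block)) {
        unsigned int shift = 0;

        /* Find log2(sectors_per_block); real code would cache this. */
        while ((UINT32_C(1) << shift) != sectors_per_block)
            shift++;
        return sector >> shift;
    }

    /* General case: block size is not a power of two. */
    return sector / sectors_per_block;
}
```

Hiding the test behind a named predicate means callers no longer have to know which sentinel value of the cached shift means "not a power of two".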
2013-03-01  dm bufio: use WRITE_FLUSH instead of REQ_FLUSH  Mikulas Patocka
Use WRITE_FLUSH instead of REQ_FLUSH for submitted requests to make it consistent with the rest of the kernel. There is no functional change. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm table: remove superfluous variable reset  Wang Sheng-Hui
If allocation fails, the local variable *t is not used any more after kfree(), so there is no need to reset it to NULL. Remove the unnecessary NULL assignment. Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm thin: support a non power of 2 discard_granularity  Mike Snitzer
Support a non-power-of-2 discard granularity in dm-thin, now that the block layer supports this (via 8dd2cb7e880d2f77fba53b523c99133ad5054cfd "block: discard granularity might not be power of 2" and 59771079c18c44e39106f0f30054025acafadb41 "blk: avoid divide-by-zero with zero discard granularity"). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01  dm: fix truncated status strings  Mikulas Patocka
Avoid returning a truncated table or status string instead of setting the DM_BUFFER_FULL_FLAG when the last target of a table fills the buffer. When processing a table or status request, the function retrieve_status calls ti->type->status. If ti->type->status returns non-zero, retrieve_status assumes that the buffer overflowed and sets DM_BUFFER_FULL_FLAG. However, targets don't return non-zero values from their status method on overflow. Most targets always return zero. If a buffer overflow happens in a target that is not the last in the table, it gets noticed during the next iteration of the loop in retrieve_status; but if a buffer overflow happens in the last target, it goes unnoticed and erroneously truncated data is returned. In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened. This is incorrect because retrieve_status doesn't check the error code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.
This patch changes the ti->type->status function to return void (because most targets don't use the return code). Overflow is detected in retrieve_status: if the status method fills up the remaining space completely, it is assumed that buffer overflow happened. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
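The detection rule at the end of this entry (a status method that returns nothing, with the caller inferring overflow when the remaining space is completely used) can be sketched in userspace as follows. The function names and the use of snprintf() are assumptions for illustration and are not the retrieve_status code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical target status method: writes what fits, reports nothing. */
void toy_status(char *buf, size_t maxlen, const char *text)
{
    snprintf(buf, maxlen, "%s", text); /* silently truncates on overflow */
}

/*
 * Caller side: returns 1 when the buffer should be treated as full.
 * If the string plus its terminating NUL occupies all remaining space,
 * truncation cannot be ruled out, so overflow is assumed.
 */
int toy_status_overflowed(char *buf, size_t remaining, const char *text)
{
    size_t used;

    assert(buf != NULL && text != NULL);
    if (remaining == 0)
        return 1;

    buf[0] = '\0';
    toy_status(buf, remaining, text);
    used = strlen(buf) + 1; /* count the terminating NUL */
    return used == remaining;
}
```

The check is deliberately conservative: a string that fits exactly is also reported as overflow, which at worst costs a retry with a larger buffer instead of returning truncated data unnoticed.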
2013-03-01  dm: do not replace bioset for request based dm  Jun'ichi Nomura
This patch fixes a regression introduced in v3.8, which causes oops like this when dm-multipath is used:
  general protection fault: 0000 [#1] SMP
  RIP: 0010:[<ffffffff810fe754>] [<ffffffff810fe754>] mempool_free+0x24/0xb0
  Call Trace:
  <IRQ>
  [<ffffffff81187417>] bio_put+0x97/0xc0
  [<ffffffffa02247a5>] end_clone_bio+0x35/0x90 [dm_mod]
  [<ffffffff81185efd>] bio_endio+0x1d/0x30
  [<ffffffff811f03a3>] req_bio_endio.isra.51+0xa3/0xe0
  [<ffffffff811f2f68>] blk_update_request+0x118/0x520
  [<ffffffff811f3397>] blk_update_bidi_request+0x27/0xa0
  [<ffffffff811f343c>] blk_end_bidi_request+0x2c/0x80
  [<ffffffff811f34d0>] blk_end_request+0x10/0x20
  [<ffffffffa000b32b>] scsi_io_completion+0xfb/0x6c0 [scsi_mod]
  [<ffffffffa000107d>] scsi_finish_command+0xbd/0x120 [scsi_mod]
  [<ffffffffa000b12f>] scsi_softirq_done+0x13f/0x160 [scsi_mod]
  [<ffffffff811f9fd0>] blk_done_softirq+0x80/0xa0
  [<ffffffff81044551>] __do_softirq+0xf1/0x250
  [<ffffffff8142ee8c>] call_softirq+0x1c/0x30
  [<ffffffff8100420d>] do_softirq+0x8d/0xc0
  [<ffffffff81044885>] irq_exit+0xd5/0xe0
  [<ffffffff8142f3e3>] do_IRQ+0x63/0xe0
  [<ffffffff814257af>] common_interrupt+0x6f/0x6f
  <EOI>
  [<ffffffffa021737c>] srp_queuecommand+0x8c/0xcb0 [ib_srp]
  [<ffffffffa0002f18>] scsi_dispatch_cmd+0x148/0x310 [scsi_mod]
  [<ffffffffa000a38e>] scsi_request_fn+0x31e/0x520 [scsi_mod]
  [<ffffffff811f1e57>] __blk_run_queue+0x37/0x50
  [<ffffffff811f1f69>] blk_delay_work+0x29/0x40
  [<ffffffff81059003>] process_one_work+0x1c3/0x5c0
  [<ffffffff8105b22e>] worker_thread+0x15e/0x440
  [<ffffffff8106164b>] kthread+0xdb/0xe0
  [<ffffffff8142db9c>] ret_from_fork+0x7c/0xb0
The regression was introduced by the change c0820cf5 "dm: introduce per_bio_data", where dm started to replace bioset during table replacement. For bio-based dm, it is good because clone bios do not exist during the table replacement. For request-based dm, however, (not-yet-mapped) clone bios may stay in request queue and survive during the table replacement. So freeing the old bioset could cause the oops in bio_put(). Since the size of front_pad may change only with bio-based dm, it is not necessary to replace bioset for request-based dm. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-02-28  Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block  Linus Torvalds
Pull block IO core bits from Jens Axboe: "Below are the core block IO bits for 3.9. It was delayed a few days since my workstation kept crashing every 2-8h after pulling it into current -git, but turns out it is a bug in the new pstate code (divide by zero, will report separately). In any case, it contains: - The big cfq/blkcg update from Tejun and Vivek. - Additional block and writeback tracepoints from Tejun. - Improvement of the should sort (based on queues) logic in the plug flushing. - _io() variants of the wait_for_completion() interface, using io_schedule() instead of schedule() to contribute to io wait properly. - Various little fixes. You'll get two trivial merge conflicts, which should be easy enough to fix up" Fix up the trivial conflicts due to hlist traversal cleanups (commit b67bfe0d42ca: "hlist: drop the node parameter from iterators"). * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits) block: remove redundant check to bd_openers() block: use i_size_write() in bd_set_size() cfq: fix lock imbalance with failed allocations drivers/block/swim3.c: fix null pointer dereference block: don't select PERCPU_RWSEM block: account iowait time when waiting for completion of IO request sched: add wait_for_completion_io[_timeout] writeback: add more tracepoints block: add block_{touch|dirty}_buffer tracepoint buffer: make touch_buffer() an exported function block: add @req to bio_{front|back}_merge tracepoints block: add missing block_bio_complete() tracepoint block: Remove should_sort judgement when flush blk_plug block,elevator: use new hashtable implementation cfq-iosched: add hierarchical cfq_group statistics cfq-iosched: collect stats from dead cfqgs cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats() blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock block: RCU free request_queue blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge() ...
2013-02-28  hlist: drop the node parameter from iterators  Sasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... 
when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28  dm: convert to idr_alloc()  Tejun Heo
Convert to the much saner new idr interface. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28  dm: don't use idr_remove_all()  Tejun Heo
idr_destroy() can destroy idr by itself and idr_remove_all() is being deprecated. Drop its usage. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-28  md: expedite metadata update when switching read-auto -> active  NeilBrown
If something has failed while the array was read-auto, then when we switch to 'active' we need to update the metadata. This will happen anyway but it is good to expedite it, and also to ensure any failed device has been released by the underlying device before we try to action the ioctl which caused us to switch to 'active' mode. Reported-by: Joe Lawrence <Joe.Lawrence@stratus.com> Signed-off-by: NeilBrown <neilb@suse.de>