summaryrefslogtreecommitdiff
path: root/drivers/block/drbd/drbd_worker.c
AgeCommit message (Collapse)Author
2011-06-30drbd: Use the correct max_bio_size when creating resync requestsPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-05-24drbd: Fix spellingBart Van Assche
Found these with the help of ispell -l. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2011-05-24drbd: fix potential distributed deadlockLars Ellenberg
We limit ourselves to a configurable maximum number of pages used as temporary bio pages. If the configured "max_buffers" is not big enough to match the bandwidth of the respective deployment, a distributed deadlock could be triggered by e.g. fast online verify and heavy application IO. TCP connections would block on congestion, because both receivers would wait on pages to become available. Fortunately the respective senders in this case would be able to give back some pages already. So do that. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Fixed handling of read errors on a 'VerifyS' nodePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Fixed handling of read errors on a 'VerifyT' nodePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Remove unused function atodb_endio()Andreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: don't BUG_ON, if bio_add_page of a single page to an empty bio failsLars Ellenberg
Just deal with it more gracefully, if we fail to add even a single page to an empty bio. We used to BUG_ON() there, but it has been observed in some Xen deployment, so we need to handle that case more robustly now. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: log UUIDs whenever they changeLars Ellenberg
All decisions about sync, sync direction, and wether or not to allow a connect or attach are based on our set of UUIDs to tag a data generation. Log changes to the UUIDs whenever they occur, logging "new current UUID P:Q:R:S" is more useful than "Creating new current UUID". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: only generate and send a new sync uuid after a successful state changeLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: cleaned up __set_current_state() followed by schedule_timeout() callsPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Work on the Ahead -> SyncSource transitionPhilipp Reisner
The test if rs_pending_cnt == 0 was too weak. Using Test for unacked_cnt == 0 instead. Moved that into the worker. Since unacked_cnt gets already increased when an P_RS_DATA_REQ comes in. Also using a timer to make Ahead -> SyncSource -> Ahead cycles slower... Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Cleaned up the resync timer logicPhilipp Reisner
Besides removed a few lines of code, this moves the inspection of the state from before the queuing process to after the queuing. I.e. more closely to the actual invocation of the work. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: No longer answer P_RS_DATA_REQUEST packets when in C_AHEAD modePhilipp Reisner
When the sync source node replies to a P_RS_DATA_REQUEST packet when it is already in ahead mode. I.e. those two packets crossed each other on the wire, that may lead to diverging bitmaps. This never happens in a well-tuned-system. In a well-tuned- system the resync controller has reduced the resync speed to zero long before we got into ahead-mode. But we have to be prepared for the not-well-tuned-system of course as well. Because -> diverging bitmaps = non terminating resync. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: ratelimit io error messagesLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: serialize sending of resync uuid with pending w_send_oosLars Ellenberg
To improve the latency of IO requests during bitmap exchange, we recently allowed writes while waiting for the bitmap, sending "set out-of-sync" information packets for any newly dirtied bits. We have to make sure that the new resync-uuid does not overtake these "set oos" packets. Once the resync-uuid is received, the sync target starts the resync process, and expects the bitmap to only be cleared, not re-set. If we use this protocol extension, we queue the generation and sending of the resync-uuid on the worker, which naturally serializes with all previously queued packets. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: move bitmap write from resync_finished to after_state_changeLars Ellenberg
We must not call it directly from resync_finished, as we may be in either receiver or worker context there. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: allow petabyte storage on 64bit archLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: bitmap keep track of changes vs on-disk bitmapLars Ellenberg
When we set or clear bits in a bitmap page, also set a flag in the page->private pointer. This allows us to skip writes of unchanged pages. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Use the standard bool, true, and false keywordsAndreas Gruenbacher
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Implemented the before-resync-source handlerPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Make some functions staticPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Implemented priority inheritance for resync requestsPhilipp Reisner
We only issue resync requests if there is no significant application IO going on. = Application IO has higher priority than resnyc IO. If application IO can not be started because the resync process locked an resync_lru entry, start the IO operations necessary to release the lock ASAP. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: Do not cleanup resync LRU for the Ahead/Behind SyncSource/SyncTarget ↵Philipp Reisner
transitions This one should be replaced with moving this cleanup to the 'right' position. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: When proxy's buffer drained off go into regular resync modePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: New packet for Ahead/Behind mode: P_OUT_OF_SYNCPhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: properly use max_hw_sectors to limit the our bio sizeLars Ellenberg
To ease tracking of bios in some hash tables, we want it to not cross certain boundaries (128k, used to be 32k). We limit the maximum bio size using queue parameters. Historically some defines and variables we use there have been named max_segment_size, which was misguided. Rename them to max_bio_size, and use [blk_]queue_max_hw_sectors where appropriate. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: use the resync controller for online-verify requests as wellLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: factor out drbd_rs_number_requestsLars Ellenberg
Preparation patch to be able to use the auto-throttling resync controller for online-verify requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: factor out drbd_rs_controller_resetLars Ellenberg
Preparation patch to be able to use the auto-throttling resync controller for online-verify requests as well. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: advance progress step marks for online-verifyLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10drbd: only reset online-verify start sector if verify completedLars Ellenberg
For network hickups during online-verify, on the next verify triggered, we by default want to resume where it left off. After any replication link interruption, there will be a (possibly empty) resync. Do not reset online-verify start sector if some resync completed, that would defeats the purpose. Only reset the start sector once a verify run is completed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2011-03-10block: remove per-queue pluggingJens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-11-27drbd: fix for spin_lock_irqsave in endio callbackLars Ellenberg
In commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb, Author: Lars Ellenberg <lars.ellenberg@linbit.com> Date: Wed Aug 11 23:40:24 2010 +0200 drbd: new configuration parameter c-min-rate a bad chunk slipped through, which is now reverted as well, restoring the correct irqsave for the endio callback. This patch also add comments at both req_mod() and in the endio callback so it should not happen again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-11-17BKL: remove extraneous #include <smp_lock.h>Arnd Bergmann
The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-23drbd: Removed the BIO_RW_BARRIER support form the receiver/epoch codePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-22drbd: fix a misleading printkLars Ellenberg
This codepath used to be called only for failed kmalloc GFP_ATOMIC, but is now also triggered by other things. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: add explicit drbd_md_sync to drbd_resync_finishedLars Ellenberg
As we usually update the generation UUIDs here, we should explicitly sync them to disk. So far this has been done only implicitly by related code paths. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: fix unlikely access after free and list corruptionLars Ellenberg
Various cleanup paths have been incomplete, for the very unlikely case that we cannot allocate enough bios from process context when submitting on behalf of the peer or resync process. Never observed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: fix for spurious fullsync (uuids rotated too fast)Lars Ellenberg
If it was an "empty" resync, the SyncSource may have already "finished" the resync and rotated the UUIDs, before noticing the connection loss (and generating a new uuid, if Primary, rotating again), while the SyncTarget did not change its uuids at all, or only got to the previous sync-uuid. This would then again lead to a full sync on next handshake (see also Bug #251). Fix: Use explicit resync finished notification even for empty resyncs, do not finish an empty resync implicitly on the SyncSource. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Fixed a stupid copy and paste errorPhilipp Reisner
This caused rs_planed to be not in sync with the content of the fifo. That in turn could cause that the resync comes to a complete halt. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: DIV_ROUND_UP not needed hereLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Fixed compatibility with protocol versions smaller than 95Philipp Reisner
Forgot to consider the max size for the resync requests. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: fix potential kernel BUG (NULL deref)Lars Ellenberg
BUG trace would look like: lc_find drbd_rs_complete_io got_OVResult drbd_asender Could be triggered by explicit, or IO-error policy based, detach during online-verify. We may only dereference mdev->resync, if we first get_ldev(), as the disk may break any time, causing mdev->resync to disappear once all ldev references have been returned. Already in flight online-verify requests or replies may still come in, which we then need to ignore. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: don't count sendpage()d pages only referenced by tcp as in useLars Ellenberg
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Removed a race that could cause unexpected execution of ↵Philipp Reisner
w_make_resync_request() The actual race happened int the drbd_start_resync() function. Where drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and armed the timer. If the timer fired before execution reaches the mod_timer statement at the end of drbd_start_resync() the latter would cause an unexpected call to w_make_resync_request(). Removed the STOP_SYNC_TIMER bit, and base it on the connection state. The STOP_SYNC_TIMER bit probably originates probably the time before the state engine. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Disable activity log updates when the whole device is out of syncPhilipp Reisner
When the complete device is marked as out of sync, we can disable updates of the on disk AL. Currently AL updates are only disabled if one uses the "invalidate-remote" command on an unconnected, primary device, or when at attach time all bits in the bitmap are set. As of now, AL updated do not get disabled when a all bits becomes set due to application writes to an unconnected DRBD device. While this is a missing feature, it is not considered important, and might get added later. BTW, after initializing a "one legged" DRBD device drbdadm create-md resX drbdadm -- --force primary resX AL updates also get disabled, until the first connect. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Sending of big packets, for payloads from 64KByte to 4GBytePhilipp Reisner
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: Bugfix for regression introduced with f9bc8913c06022ePhilipp Reisner
If we intent to use the block_id member of an epoch entry, we may not use the digest member. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: new configuration parameter c-min-rateLars Ellenberg
We now track the data rate of locally submitted resync related requests, and can thus detect non-resync activity on the lower level device. If the current sync rate is above c-min-rate, and the lower level device appears to be busy, we throttle the resyncer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2010-10-14drbd: reduce code duplication when receiving data requestsLars Ellenberg
also canonicalize the return values of read_for_csum and drbd_rs_begin_io to return -ESOMETHING, or 0 for success. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>