summaryrefslogtreecommitdiff
path: root/drivers/block/rbd.c
AgeCommit message (Collapse)Author
2013-09-11block: replace strict_strtoul() with kstrtoul()Jingoo Han
The use of strict_strtoul() is not preferred, because strict_strtoul() is obsolete. Thus, kstrtoul() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph updates from Sage Weil: "This includes both the first pile of Ceph patches (which I sent to torvalds@vger, sigh) and a few new patches that add support for fscache for Ceph. That includes a few fscache core fixes that David Howells asked go through the Ceph tree. (Thanks go to Milosz Tanski for putting this feature together) This first batch of patches (included here) had (has) several important RBD bug fixes, hole punch support, several different cleanups in the page cache interactions, improvements in the truncate code (new truncate mutex to avoid shenanigans with i_mutex), and a series of fixes in the synchronous striping read/write code. On top of that is a random collection of small fixes all across the tree (error code checks and error path cleanup, obsolete wq flags, etc)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (43 commits) ceph: use d_invalidate() to invalidate aliases ceph: remove ceph_lookup_inode() ceph: trivial buildbot warnings fix ceph: Do not do invalidate if the filesystem is mounted nofsc ceph: page still marked private_2 ceph: ceph_readpage_to_fscache didn't check if marked ceph: clean PgPrivate2 on returning from readpages ceph: use fscache as a local presisent cache fscache: Netfs function for cleanup post readpages FS-Cache: Fix heading in documentation CacheFiles: Implement interface to check cache consistency FS-Cache: Add interface to check consistency of a cached object rbd: fix null dereference in dout rbd: fix buffer size for writes to images with snapshots libceph: use pg_num_mask instead of pgp_num_mask for pg.seed calc rbd: fix I/O error propagation for reads ceph: use vfs __set_page_dirty_nobuffers interface instead of doing it inside filesystem ceph: allow sync_read/write return partial successed size of read/write. ceph: fix bugs about handling short-read for sync read mode. ceph: remove useless variable revoked_rdcache ...
2013-09-04rbd: fix null dereference in doutJosh Durgin
The order parameter is sometimes NULL in _rbd_dev_v2_snap_size(), but the dout() always derefences it. Move this to another dout() protected by a check that order is non-NULL. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <alex.elder@linaro.org>
2013-09-04rbd: fix buffer size for writes to images with snapshotsJosh Durgin
rbd_osd_req_create() needs to know the snapshot context size to create a buffer large enough to send it with the message front. It gets this from the img_request, which was not set for the obj_request yet. This resulted in trying to write past the end of the front payload, hitting this BUG: libceph: BUG_ON(p > msg->front.iov_base + msg->front.iov_len); Fix this by associating the obj_request with its img_request immediately after it's created, before the osd request is created. Fixes: http://tracker.ceph.com/issues/5760 Suggested-by: Alex Elder <alex.elder@linaro.org> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <alex.elder@linaro.org>
2013-09-04rbd: fix I/O error propagation for readsJosh Durgin
When a request returns an error, the driver needs to report the entire extent of the request as completed. Writes already did this, since they always set xferred = length, but reads were skipping that step if an error other than -ENOENT occurred. Instead, rbd would end up passing 0 xferred to blk_end_request(), which would always report needing more data. This resulted in an assert failing when more data was required by the block layer, but all the object requests were done: [ 1868.719077] rbd: obj_request read result -108 xferred 0 [ 1868.719077] [ 1868.719518] end_request: I/O error, dev rbd1, sector 0 [ 1868.719739] [ 1868.719739] Assertion failure in rbd_img_obj_callback() at line 1736: [ 1868.719739] [ 1868.719739] rbd_assert(more ^ (which == img_request->obj_request_count)); Without this assert, reads that hit errors would hang forever, since the block layer considered them incomplete. Fixes: http://tracker.ceph.com/issues/5647 CC: stable@vger.kernel.org # v3.10 Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Alex Elder <alex.elder@linaro.org>
2013-08-28rbd: convert bus code to use bus_groupsGreg Kroah-Hartman
The bus_attrs field of struct bus_type is going away soon, dev_groups should be used instead. This converts the RBD bus code to use the correct field. Cc: Yehuda Sadeh <yehuda@inktank.com> Cc: Sage Weil <sage@inktank.com> Acked-by: Alex Elder <elder@linaro.org> Cc: <ceph-devel@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-10block: rbd: use NULL instead of 0Jingoo Han
The local variables such as 'bio_list', and 'pages' are pointers; thus, use NULL instead of 0 to fix the following sparse warnings. drivers/block/rbd.c:2166:32: warning: Using plain integer as NULL pointer drivers/block/rbd.c:2168:31: warning: Using plain integer as NULL pointer Signed-off-by: Jingoo Han <jg1.han@samsung.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-07-03rbd: fix a couple warningsSage Weil
gcc isn't quite smart enough and generates these warnings: drivers/block/rbd.c: In function 'rbd_img_request_fill': drivers/block/rbd.c:1266:22: warning: 'bio_list' may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/block/rbd.c:2186:14: note: 'bio_list' was declared here drivers/block/rbd.c:2247:10: warning: 'pages' may be used uninitialized in this function [-Wmaybe-uninitialized] even though they are initialized for their respective code paths. Signed-off-by: Sage Weil <sage@inktank.com>
2013-07-03rbd: take a little creditAlex Elder
Add a name to the list of authors. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: use rwsem to protect header updatesAlex Elder
Updating an image header needs to be protected to ensure it's done consistently. However distinct headers can be updated concurrently without a problem. Instead of using the global control lock to serialize headder updates, just rely on the header semaphore. (It's already used, this just moves it out to cover a broader section of the code.) That leaves the control mutex protecting only the creation of rbd clients, so rename it. This resolves: http://tracker.ceph.com/issues/5222 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: don't hold ctl_mutex to get/put deviceAlex Elder
When an rbd device is first getting mapped, its device registration is protected the control mutex. There is no need to do that though, because the device has already been assigned an id that's guaranteed to be unique. An unmap of an rbd device won't proceed if the device has a non-zero open count or is already being unmapped. So there's no need to hold the control mutex in that case either. Finally, an rbd device can't be opened if it is being removed, and it won't go away if there is a non-zero open count. So here too there's no need to hold the control mutex while getting or putting a reference to an rbd device's Linux device structure. Drop the mutex calls in these cases. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: protect against concurrent unmapsAlex Elder
Make sure two concurrent unmap operations on the same rbd device won't collide, by only proceeding with the removal and cleanup of a device if is not already underway. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: set removing flag while holding list lockAlex Elder
When unmapping a device, its id is supplied, and that is used to look up which rbd device should be unmapped. Looking up the device involves searching the rbd device list while holding a spinlock that protects access to that list. Currently all of this is done under protection of the control lock, but that protection is going away soon. To ensure the rbd_dev is still valid (still on the list) while setting its REMOVING flag, do so while still holding the list lock. To do so, get rid of __rbd_get_dev(), and open code what it did in the one place it was used. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: protect against duplicate client creationAlex Elder
If more than one rbd image has the same ceph cluster configuration (same options, same set of monitors, same keys) they normally share a single rbd client. When an image is getting mapped, rbd looks to see if an existing client can be used, and creates a new one if not. The lookup and creation are not done under a common lock though, so mapping two images concurrently could lead to duplicate clients getting set up needlessly. This isn't a major problem, but it's wasteful and different from what's intended. This patch fixes that by using the control mutex to protect both the lookup and (if needed) creation of the client. It was previously used just when creating. This resolves: http://tracker.ceph.com/issues/3094 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: clean up a few things in the refresh pathAlex Elder
This includes a few relatively small fixes I found while examining the code that refreshes image information. This resolves: http://tracker.ceph.com/issues/5040 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-03rbd: flush dcache after zeroing page dataAlex Elder
Neither zero_bio_chain() nor zero_pages() contains a call to flush caches after zeroing a portion of a page. This can cause problems on architectures that have caches that allow virtual address aliasing. This resolves: http://tracker.ceph.com/issues/4777 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-01rbd: drop original request earlier for existence checkAlex Elder
The reference to the original request dropped at the end of rbd_img_obj_exists_callback() corresponds to the reference taken in rbd_img_obj_exists_submit() to account for the stat request referring to it. Move the put of that reference up right after clearing that pointer to make its purpose more obvious. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-07-01rbd: Use min_t() to fix comparison of distinct pointer types warningGeert Uytterhoeven
drivers/block/rbd.c: In function ‘zero_pages’: drivers/block/rbd.c:1102: warning: comparison of distinct pointer types lacks a cast Remove the hackish casts and use min_t() to fix this. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Alex Elder <elder@inktank.com>
2013-06-29Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This is a recently spotted regression in the snapshot behavior... It turns out several tests weren't being run in the nightlies so this took a while to spot" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: send snapshot context with writes
2013-06-27rbd: send snapshot context with writesJosh Durgin
Sending the right snapshot context with each write is required for snapshots to work. Due to the ordering of calls, the snapshot context is never set for any requests. This causes writes to the current version of the image to be reflected in all snapshots, which are supposed to be read-only. This happens because rbd_osd_req_format_write() sets the snapshot context based on obj_request->img_request. At this point, however, obj_request->img_request has not been set yet, to the snapshot context is set to NULL. Fix this by moving rbd_img_obj_request_add(), which sets obj_request->img_request, before the osd request formatting calls. This resolves: http://tracker.ceph.com/issues/5465 Reported-by: Karol Jurak <karol.jurak@gmail.com> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@linaro.org>
2013-06-26Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This fixes another problem with using v2 images on 3.10 due to the order in which fields are read from the image header. Hopefully this is the last one" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fetch object order before using it
2013-06-25rbd: fetch object order before using itJosh Durgin
rbd_dev_v2_header_onetime() fetches striping information, and checks whether the image can be read by compariing the stripe unit to the object size. It determines the object size by shifting the object order, which is 0 at this point since it has not been read yet. Move the call to get the image size and object order before rbd_dev_v2_header_onetime() so it is set before use. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This fixes a problem preventing the kernel and userland librbd libraries from sharing data with the new format 2 images" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: use the correct length for format 2 object names
2013-06-13rbd: use the correct length for format 2 object namesJosh Durgin
Format 2 objects use 16 characters for the object name suffix to be able to express the full 64-bit range of object numbers. Format 1 images only use 12 characters for this. Using 12-character names for format 2 caused userspace and kernel rbd clients to read differently named objects, which made an image written by one client look empty to the other client. CC: stable@vger.kernel.org # 3.9+ Reported-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
2013-06-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a pair of fixes for double-frees in the recent bundle for 3.10, a couple of fixes for long-standing bugs (sleep while atomic and an endianness fix), and a locking fix that can be triggered when osds are going down" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: fix cleanup in rbd_add() rbd: don't destroy ceph_opts in rbd_add() ceph: ceph_pagelist_append might sleep while atomic ceph: add cpu_to_le32() calls when encoding a reconnect capability libceph: must hold mutex for reset_changed_osds()
2013-05-17rbd: fix cleanup in rbd_add()Alex Elder
Bjorn Helgaas pointed out that a recent commit introduced a use-after-free condition in an error path for rbd_add(). He correctly stated: I think b536f69a3a5 "rbd: set up devices only for mapped images" introduced a use-after-free error in rbd_add(): ... If rbd_dev_device_setup() returns an error, we call rbd_dev_image_release(), which ultimately kfrees rbd_dev. Then we call rbd_dev_destroy(), which references fields in the already-freed rbd_dev struct before kfreeing it again. The simple fix is to return the error code after the call to rbd_dev_image_release(). Closer examination revealed that there's no need to clean up rbd_opts in that function, so fix that too. Update some other comments that have also become out of date. Reported-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-17rbd: don't destroy ceph_opts in rbd_add()Alex Elder
Whether rbd_client_create() successfully creates a new client or not, it takes responsibility for getting the ceph_opts structure it's passed destroyed. If successful, the structure becomes associated with the created client; if not, rbd_client_create() will destroy it. Previously, rbd_get_client() would call ceph_destroy_options() if rbd_get_client() failed, and that meant it got called twice. That led freeing various pointers more than once, which is never a good idea. This resolves: http://tracker.ceph.com/issues/4559 Cc: stable@vger.kernel.org # 3.8+ Reported-by: Dan van der Ster <dan@vanderster.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "Yes, this is a much larger pull than I would like after -rc1. There are a few things included: - a few fixes for leaks and incorrect assertions - a few patches fixing behavior when mapped images are resized - handling for cloned/layered images that are flattened out from underneath the client The last bit was non-trivial, and there is some code movement and associated cleanup mixed in. This was ready and was meant to go in last week but I missed the boat on Friday. My only excuse is that I was waiting for an all clear from the testing and there were many other shiny things to distract me. Strictly speaking, handling the flatten case isn't a regression and could wait, so if you like we can try to pull the series apart, but Alex and I would much prefer to have it all in as it is a case real users will hit with 3.10." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (33 commits) rbd: re-submit flattened write request (part 2) rbd: re-submit write request for flattened clone rbd: re-submit read request for flattened clone rbd: detect when clone image is flattened rbd: reference count parent requests rbd: define parent image request routines rbd: define rbd_dev_unparent() rbd: don't release write request until necessary rbd: get parent info on refresh rbd: ignore zero-overlap parent rbd: support reading parent page data for writes rbd: fix parent request size assumption libceph: init sent and completed when starting rbd: kill rbd_img_request_get() rbd: only set up watch for mapped images rbd: set mapping read-only flag in rbd_add() rbd: support reading parent page data rbd: fix an incorrect assertion condition rbd: define rbd_dev_v2_header_info() rbd: get rid of trivial v1 header wrappers ...
2013-05-13rbd: re-submit flattened write request (part 2)Alex Elder
Add code to rbd_img_obj_exists_callback() to detect when a clone's parent image has disappeared, and re-submit the original write request in that case. Kill off some redundant assertions. This completes the resolution for: http://tracker.ceph.com/issues/3763 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: re-submit write request for flattened cloneAlex Elder
Add code to rbd_img_parent_read_full_callback() to detect when a clone's parent image has disappeared, and re-submit the original write request in that case. (See the previous commit for more reasoning about why this is appropriate.) Rename some variables in rbd_img_obj_parent_read_full_callback() to match the convention used in the previous patch. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: re-submit read request for flattened cloneAlex Elder
If a clone image gets flattened while a parent read request is underway, the original rbd object request needs to be resubmitted. The reason is that by the time we get the response to the parent read request, the data read from the parent may be out of date. In other words, we could see this sequence of events: rbd client parent image/osd ---------- ---------------- original object ENOENT; issue parent read respond to parent read child image flattened original image header refresh <--- original object written independently here parent read response received Add code to rbd_img_parent_read_callback() to detect when a clone's parent image has disappeared (as evidenced by its parent overlap becoming 0), and re-submit the original read request in that case. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: detect when clone image is flattenedAlex Elder
A format 2 clone image can be the subject of a "flatten" operation, during which all of its data gets "copied up" from its parent image, leaving the image fully populated. Once this is complete, the clone's association with the parent is abolished. Since this can occur when a clone is mapped, we need to detect when it has occurred and handle it accordingly. We know an image has been flattened when we know it at one time had a parent, but we have learned (via a "get_parent" object class method call) it no longer has one. There might be in-flight requests at the point we learn an image has been flattened, so we can't simply clean up parent data structures right away. Instead, we'll drop the initial parent reference when the parent has disappeared (rather than when the image gets destroyed), which will allow the last in-flight reference to clean things up when it's complete. We leverage the fact that a zero parent overlap renders an image effectively unlayered. We set the overlap to 0 at the point we detect the clone image has flattened, which allows the unlayered behavior to take effect immediately, while keeping other parent structures in place until in-flight requests to complete. This and the next few patches resolve: http://tracker.ceph.com/issues/3763 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: reference count parent requestsAlex Elder
Keep a reference count for uses of the parent information for an rbd device. An initial reference is set in rbd_img_request_create() if the target image has a parent (with non-zero overlap). Each image request for an image with a non-zero parent overlap gets another reference when it's created, and that reference is dropped when the request is destroyed. The initial reference is dropped when the image gets torn down. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: define parent image request routinesAlex Elder
Define rbd_parent_request_create() and rbd_parent_request_destroy() to handle the creation of parent image requests submitted for layered image objects. For simplicity, let rbd_img_request_put() handle dropping the reference to any image request (parent or not), and call whichever destructor is appropriate on the last put. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: define rbd_dev_unparent()Alex Elder
Define rbd_dev_unparent() to encapsulate cleaning up parent data structures from a layered rbd image. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: don't release write request until necessaryAlex Elder
Previously when a layered write was going to involve a copyup request, the original osd request was released before submitting the parent full-object read. The osd request for the copyup would then be allocated in rbd_img_obj_parent_read_full_callback(). Shortly we will be handling the event of mapped layered images getting flattened, and when that occurs we need to resubmit the original request. We therefore don't want to release the osd request until we really konw we're going to replace it--in the callback function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: get parent info on refreshAlex Elder
Get parent info for format 2 images on every refresh (rather than just during the initial probe). This will be needed to detect the disappearance of the parent image in the event a mapped image becomes unlayered (i.e., flattened). Avoid leaking the previous parent spec on the second and subsequent times this information is requested by dropping the previous one (if any) before updating it. (Also, extract the pool id into a local variable before assigning it into the parent spec.) Switch to using a non-zero parent overlap value rather than the existence of a parent (a non-null parent_spec pointer) to determine whether to mark a request layered. It will soon be possible for a layered image to become unlayered while a request is in flight. This means that the layered flag for an image request indicates that there was a non-zero parent overlap at the time the image request was created. The parent overlap can change thereafter, which may lead to special handling at request submission or completion time. This and the next several patches are related to: http://tracker.ceph.com/issues/3763 NOTE: If an error occurs while refreshing the parent info (i.e., requesting it after initial probe), the old parent info will persist. This is not really correct, and is a scenario that needs to be addressed. For now we'll assert that the failure mode is unlikely, but the issue has been documented in tracker issue 5040. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: ignore zero-overlap parentAlex Elder
An rbd clone image that has an overlap with its parent of 0 is effectively not a layered image at all. Detect this case and treat such an image as non-layered. Issue a warning to be sure the user knows what's going on. This resolves: http://tracker.ceph.com/issues/5028 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: support reading parent page data for writesAlex Elder
Currently, rbd_img_obj_parent_read_full() assumes the incoming object request contains bio data. But if a layered image is part of a multi-layer stack of images it will result in read requests of page data to parent images. This is handling the same kind of issue as was resolved by this commit: 5b2ab72d rbd: support reading parent page data This resolves: http://tracker.ceph.com/issues/5027 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-13rbd: fix parent request size assumptionAlex Elder
The code that reads object data from the parent for a copyup on write request currently assumes that the size of that request is the size of a "full" object from the original target image. That is not necessarily the case. The parent overlap could reduce the request size below that. To fix that assumption we need to record the number of pages in the copyup_pages array, for both an image request and an object request. Rename a local variable in rbd_img_obj_parent_read_full_callback() to reflect we're recording the length of the parent read request, not the size of the target object. This resolves: http://tracker.ceph.com/issues/5038 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-09rbd: kill rbd_img_request_get()Alex Elder
Get rid of rbd_img_request_get(), because it isn't used, and maybe won't ever be needed. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-09rbd: only set up watch for mapped imagesAlex Elder
Any changes to parent images are immaterial to any mapped clone. So there is no need to have a watch event registered on header objects except for the header object of an image that is mapped. In fact, a watch request is a write operation, and we may only have read access to a parent image. We can't set up the watch request until we know the name of the header object though. So pass a flag to rbd_dev_image_probe() to indicate whether this probe is for a mapping or for a parent image. Change the second parameter to rbd_dev_header_watch_sync() be Boolean while we're at it. This resolves: http://tracker.ceph.com/issues/4941 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-09rbd: set mapping read-only flag in rbd_add()Alex Elder
The rbd_dev->mapping field for a parent image is not meaningful. Since rbd_image_probe() is used both for images being mapped and their parents, it doesn't make sense to set that flag in that function. So move the setting of the mapping.read_only flag out of rbd_dev_image_probe() and into rbd_add() instead. This resolves: http://tracker.ceph.com/issues/4940 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-09rbd: support reading parent page dataAlex Elder
Currently, rbd_img_parent_read() assumes the incoming object request contains bio data. But if a layered image is part of a multi-layer stack of images it will result in read requests of page data to parent images. Fortunately, it's not hard to add support for page data. This resolves: http://tracker.ceph.com/issues/4939 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-09rbd: fix an incorrect assertion conditionAlex Elder
In rbd_img_obj_parent_read_full_callback() there is an assertion intended to verify the size of the image request for a full parent read was the size of the original request's target object. But assertion was looking at the parent image order rather than the original one, and these values can differ. Fix that. This resolves: http://tracker.ceph.com/issues/4938 Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-08rbd: define rbd_dev_v2_header_info()Alex Elder
This rearranges rbd_dev_v2_refresh() so it works more like rbd_dev_v1_header_info(). While format 1 images need to read the whole header object to get any information, format 2 can collect almost all information selectively. So the one-time initialization will remain in a separate function--based on rbd_dev_v2_probe(). Rename rbd_dev_v2_refresh() to be rbd_dev_v2_header_info(), and have it call rbd_dev_v2_header_onetime() if it's being called for the first time for the given rbd device. Rename rbd_dev_v2_probe() to be rbd_dev_v2_header_onetime() and remove the image size and snapshot context calls it held in common with the refresh function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-08rbd: get rid of trivial v1 header wrappersAlex Elder
Get rid of the trivial wrapper functions rbd_dev_v1_refresh() and rbd_dev_v1_probe(), substituting rbd_dev_v1_header_read() calls in their place. Rename rbd_dev_v1_header_read() to be rbd_dev_v1_header_info(), to be more generic (it will better reflect what happens with format 2 images). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-08rbd: simplify rbd_dev_v1_probe()Alex Elder
An rbd_dev structure's fields are all zero-filled for an initial probe, so there's no need to explicitly zero the parent_spec and parent_overlap fields in rbd_dev_v1_probe(). Removing these assignments makes rbd_dev_v1_probe() *almost* trivial. Move the dout() message that announces discovery of an image into rbd_dev_image_probe(), generalize to support images in either format and only show it if an image is fully discovered. This highlights that are some unnecessary cleanups in the error path for rbd_dev_v1_probe(), so they can be removed. Now rbd_dev_v1_probe() *is* a trivial wrapper function. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-08rbd: update in-core header directlyAlex Elder
Now that rbd_header_from_disk() only fills in one-time fields once, we can extend it slightly so it releases the other fields before replacing their values. This way there's no need to pass a temporary buffer and then copy all the results in. Just use the rbd device header structure in rbd_header_from_disk() so its values get updated directly. Note that this means we need to take the header semaphore at the point we update things. So pass the rbd_dev rather than the address of its header as its first argument to rbd_header_from_disk(), and have it return an error code. As a result, rbd_dev_v1_header_read() does all the work, rbd_read_header() becomes unnecessary, and rbd_dev_v1_refresh() becomes a very simple wrapper. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-08rbd: refactor rbd_header_from_disk()Alex Elder
This rearranges rbd_header_from_disk so that it: - allocates the snapshot context right away - keeps results in local variables, not changing the passed-in header until it's known we'll succeed - does initialization of set-once fields in a header only if they have not already been set The last point is moot at the moment, because rbd_read_header() (the only caller) always supplies a zero-filled header buffer. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>