summaryrefslogtreecommitdiff
path: root/fs/nfs/pnfs.c
AgeCommit message (Collapse)Author
2013-03-01NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDSWeston Andros Adamson
The client will currently try LAYOUTGETs forever if a server is returning NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no longer needs the layout (ie process killed, unmounted). This patch uses the DS timeout value (module parameter 'dataserver_timeo' via rpc layer) to set an upper limit of how long the client tries LATOUTGETs in this situation. Once the timeout is reached, IO is redirected to the MDS. This also changes how the client checks if a layout is on the clp list to avoid a double list_add. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-24pnfs: fix resend_to_mds for directioBenny Halevy
Pass the directio request on pageio_init to clean up the API. Percolate pg_dreq from original nfs_pageio_descriptor to the pnfs_{read,write}_done_resend_to_mds and use it on respective call to nfs_pageio_init_{read,write} on the newly created nfs_pageio_descriptor. Reproduced by command: mount -o vers=4.1 server:/ /mnt dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] PGD 34786067 PUD 34794067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4 CPU 1 Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs RIP: 0010:[<ffffffffa021a3a8>] [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP: 0018:ffff880038f8fa68 EFLAGS: 00010206 RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000 RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028 RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510 R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40 R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878 FS: 0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480) Stack: ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48 Call Trace: [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs] [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs] [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs] [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs] [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs] [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4] [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc] [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files] [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files] [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4] [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs] [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs] [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files] [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc] [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc] [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc] [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff8105f8c1>] process_one_work+0x226/0x422 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422 [<ffffffff81094757>] ? lock_acquired+0x210/0x249 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc] [<ffffffff810600d8>] worker_thread+0x126/0x1c4 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240 [<ffffffff81064ef8>] kthread+0xb1/0xb9 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65 Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f RIP [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs] RSP <ffff880038f8fa68> CR2: 0000000000000028 Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [>= 3.6] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-14NFSv4.1: Fix bulk recall and destroy of layoutsTrond Myklebust
The current code in pnfs_destroy_all_layouts() assumes that removing the layout from the server->layouts list is sufficient to make it invisible to other processes. This ignores the fact that most users access the layout through the nfs_inode->layout... There is further breakage due to lack of reference counting of the layouts, meaning that the whole thing Oopses at the drop of a hat. The code in initiate_bulk_draining() is almost correct, and can be used as a model for pnfs_destroy_all_layouts(), so move that code to pnfs.c, and refactor the code to allow us to choose between a single filesystem bulk recall, and a recall of all layouts. Also note that initiate_bulk_draining() currently calls iput() while holding locks. Fix that too. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2013-01-04pnfs: Increase the refcount when LAYOUTGET fails the first timeYanchuan Nian
The layout will be set unusable if LAYOUTGET fails. Is it reasonable to increase the refcount iff LAYOUTGET fails the first time? Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.7]
2012-11-04NFSv4.1: Remove assertion BUG_ON()s from the files and generic layout codeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-11-04NFSv4.1: Remove unused function last_byte_offsetTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-31nfs: Check whether a layout pointer is NULL before free itYanchuan Nian
The new layout pointer in pnfs_find_alloc_layout() may be NULL because of out of memory. we must do some check work, otherwise pnfs_free_layout_hdr() will go wrong because it can not deal with a NULL pointer. Signed-off-by: Yanchuan Nian <ycnian@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08NFS41: send real read size in layoutgetPeng Tao
For buffer read, use offst-to-isize. For direct read, use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-08NFS41: send real write size in layoutgetPeng Tao
For buffer write, block layout client scan inode mapping to find next hole and use offset-to-hole as layoutget length. Object layout client uses offset-to-isize as layoutget length. For direct write, both block layout and object layout use dreq->bytes_left. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-05NFSv4.1: Cleanup ugliness in pnfs_layoutgets_blocked()Trond Myklebust
Split it into two functions, one which checks if layoutgets are blocked, and one which checks if the layout stateid has expired. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04NFSv4.1: Ensure that the layout sequence id stays 'close' to the currentTrond Myklebust
Clamp the layout barrier sequence id to the current sequence id minus the maximum number of outstanding layoutget requests. Also ensure that we correctly initialise lo->plh_barrier if there are no layout segments associated to this layout header. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-04NFSv4.1: Deal with seqid wraparound in the pNFS return-on-close codeTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-03NFSv4.1: Deal with wraparound when updating the layout "barrier" seqidTrond Myklebust
...and fix a bug in pnfs_set_layout_stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02NFSv4.1: Deal with wraparound issues when updating the layout stateidTrond Myklebust
...and add a helper function. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02NFSv4.1: Always set the layout stateid if this is the first layoutgetTrond Myklebust
If the list of layout segments is empty, we must unconditionally set the layout stateid. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-10-02NFSv4.1: Fix another refcount issue in pnfs_find_alloc_layoutTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: _pnfs_return_layout() shouldn't invalidate the layout on failureTrond Myklebust
Failure of the layoutreturn allocation fails is not a good reason to mark the pnfs_layout_hdr as having failed a layoutget or i/o. Just exit cleanly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Remove the NFS_LAYOUT_RETURNED stateTrond Myklebust
It serves no purpose that the test for whether or not we have valid layout segments doesn't already serve. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Clear NFS_LAYOUT_BULK_RECALL when the layout segments are freedTrond Myklebust
Once all the affected layout segments have been freed up, clear the NFS_LAYOUT_BULK_RECALL flag so that we can reuse the pnfs_layout_hdr Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Get rid of the NFS_LAYOUT_DESTROYED stateTrond Myklebust
We already have a mechanism for blocking LAYOUTGET by means of the plh_block_lgets counter. The only "service" that NFS_LAYOUT_DESTROYED provides at this point is to block layoutget once the layout segment list is empty, which basically means that you have to wait until the pnfs_layout_hdr is destroyed before you can do pNFS on that file again. This patch enables the reuse of the pnfs_layout_hdr if the layout segment list is empty. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Remove unused 'default allocation' for pnfs_alloc_layout_hdr()Trond Myklebust
...and ditto for pnfs_free_layout_hdr() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Get rid of pNFS spin lock debugging asserts...Trond Myklebust
These are all in static declared functions that are called only once. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Balance pnfs_layout_hdr refcount in pnfs_layout_(insert|remove)_lsegTrond Myklebust
Ensure that the reference count for pnfs_layout_hdr reverts to the original value after a call to pnfs_layout_remove_lseg(). Note that the caller is expected to hold a reference to the struct pnfs_layout_hdr. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Clean up pnfs_put_lseg()Trond Myklebust
There is no longer a need to use pnfs_free_lseg_list(). Just call pnfs_free_lseg() directly. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Clean up the removal of pnfs_layout_hdr from the server listTrond Myklebust
Move the code into pnfs_free_layout_hdr(), and add checks to get_layout_by_fh_locked to ensure that they don't reference a layout that is being freed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Free the pnfs_layout_hdr outside the inode->i_lockTrond Myklebust
None of the existing pNFS layout drivers seem to require the inode to be locked while they free the layout header. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Remove redundant reference to the pnfs_layout_hdrTrond Myklebust
Each layout segment already holds a reference to the pnfs_layout_hdr, so there is no need to hold an extra reference that is released once the last layout segment is freed. Ensure that pnfs_find_alloc_layout() always returns a reference to the pnfs_layout_hdr, which will be matched by the final call to pnfs_put_layout_hdr() in pnfs_update_layout(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Rename the pnfs_put_lseg_common to pnfs_layout_remove_lsegTrond Myklebust
The latter name is more descriptive of the actual function. Also rename pnfs_insert_layout to pnfs_layout_insert_lseg. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: reset the inode MDS threshold counters on layout destructionTrond Myklebust
Instead of resetting the inode MDS threshold counters when we mark the layout for destruction, do it as part of freeing the layout. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Fix a race in the pNFS return-on-close codeTrond Myklebust
If we sleep after dropping the inode->i_lock, then we are no longer atomic with respect to the rpc_wake_up() call in pnfs_layout_remove_lseg(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: pnfs_layout_io_set_failed must clear invalid lsegsTrond Myklebust
If pnfs_layout_io_test_failed() authorises a retry of the failed layoutgets, we should clear the existing layout segments so that we start afresh. Do this in pnfs_layout_io_set_failed(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Don't drop the pnfs_layout_hdr after a layoutget failureTrond Myklebust
We want to cache the pnfs_layout_hdr after a layoutget or i/o failure so that pnfs_update_layout() can find it and know when it is time to retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Fix a reference leak in pnfs_update_layoutTrond Myklebust
If we exit after the call to pnfs_find_alloc_layout(), we have to ensure that we put the struct pnfs_layout_hdr. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Retry pNFS after a 2 minute timeoutTrond Myklebust
If we had to fall back to read/write through MDS, then assume that we should retry pNFS after a suitable timeout period. The following patch sets a timeout of 2 minutes. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Add helpers for setting/reading the I/O fail bitTrond Myklebust
...and make them local to the pnfs.c file. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggyTrond Myklebust
Dereferencing nfsi->layout in order to read plh_flags without holding a spin lock is bug prone. Furthermore, the dprintk() tells you nothing about whether or not the call succeeded. Replace it with something that tells you about whether or not a valid layout segment was returned for the inode in question. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr()Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalidTrond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-09-28NFS: Clean up the pNFS layoutget interfaceTrond Myklebust
Ensure that we do return errors from nfs4_proc_layoutget() and that we don't mark the layout as having failed if the error was due to a signal or resource problem on the client side. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-08-02pnfs: defer release of pages in layoutgetIdan Kedar
we have encountered a bug whereby reading a lot of files (copying fedora's /bin) from a pNFS mount and hitting Ctrl+C in the middle caused a general protection fault in xdr_shrink_bufhead. this function is called when decoding the response from LAYOUTGET. the decoding is done by a worker thread, and the caller of LAYOUTGET waits for the worker thread to complete. hitting Ctrl+C caused the synchronous wait to end and the next thing the caller does is to free the pages, so when the worker thread calls xdr_shrink_bufhead, the pages are gone. therefore, the cleanup of these pages has been moved to nfs4_layoutget_release. Signed-off-by: Idan Kedar <idank@tonian.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Convert v4 into a moduleBryan Schumaker
This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs listAndy Adamson
mark_matching_lsegs_invalid() resets the mds_threshold counters and can dereference the layout hdr on an initial empty plh_segs list. It returns 0 both in the case of an initial empty list and in a non-emtpy list that was cleared by calls to mark_lseg_invalid. Don't send a LAYOUTRETURN if the list was initially empty. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16NFSv4.1 mark layout when already returnedAndy Adamson
When the file layout driver is fencing a DS, _pnfs_return_layout can be called mulitple times per inode due to in-flight i/o referencing lsegs on it's plh_segs list. Remember that LAYOUTRETURN has been called, and do not call it again. Allow LAYOUTRETURNs after a subsequent LAYOUTGET. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29NFS: Create an write_pageio_init() functionBryan Schumaker
pNFS needs to select a write function based on the layout driver currently in use, so I let each NFS version decide how to best handle initializing writes. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29NFS: Create an read_pageio_init() functionBryan Schumaker
pNFS needs to select a read function based on the layout driver currently in use, so I let each NFS version decide how to best handle initializing reads. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-19NFSv4.1: Fix a race in set_pnfs_layoutdriverTrond Myklebust
The call to try_module_get() dereferences ld_type outside the spin locks, which means that it may be pointing to garbage if a module unload was in progress. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-18NFSv4.1: Fix umount when filelayout DS is also the MDSTrond Myklebust
Currently there is a 'chicken and egg' issue when the DS is also the mounted MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client is not called to dereference the MDS usage of a deviceid which holds a reference to the DS nfs_client. The result is the umount program returns, but the nfs_client is not freed, and the cl_session hearbeat continues. The MDS (and all other nfs mounts) lose their last nfs_client reference in nfs_free_server when the last nfs_server (fsid) is umounted. The file layout DS lose their last nfs_client reference in destroy_ds when the last deviceid referencing the data server is put and destroy_ds is called. This is triggered by a call to nfs4_deviceid_purge_client which removes references to a pNFS deviceid used by an MDS mount. The fix is to track how many pnfs enabled filesystems are mounted from this server, and then to purge the device id cache once that count reaches zero. Reported-by: Jorge Mora <Jorge.Mora@netapp.com> Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24NFSv4.1 test the mdsthreshold hint parametersAndy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24NFSv4.1 add nfs_inode book keeping for mdsthresholdAndy Adamson
Keep track of the number of bytes read or written via buffered, direct, and mem-mapped i/o for use by mdsthreshold size_io hints. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>