path: root/fs/nfs/pnfs.c
Age | Commit message | Author
2012-06-19 | NFSv4.1: Fix a race in set_pnfs_layoutdriver | Trond Myklebust
The call to try_module_get() dereferences ld_type outside the spin locks, which means that it may be pointing to garbage if a module unload was in progress. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
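The pattern behind this fix can be sketched in user-space C. The lookup of the driver and the taking of the reference happen inside one critical section, so an unload cannot free the entry in between. This is a hedged model, not the kernel code. `struct layout_driver`, `find_and_get_driver()` and the other names are hypothetical. A pthread mutex stands in for the spinlock, and a plain counter stands in for try_module_get().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a registered layout driver (not ld_type itself). */
struct layout_driver {
    int id;
    int refcount;              /* stands in for the module reference count */
    bool unloading;            /* set once an unload has started */
    struct layout_driver *next;
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct layout_driver *registry;   /* list of registered drivers */

static void register_driver(struct layout_driver *ld)
{
    pthread_mutex_lock(&registry_lock);
    ld->next = registry;
    registry = ld;
    pthread_mutex_unlock(&registry_lock);
}

/* Unload path: mark and unlink under the same lock the lookup uses. */
static void unregister_driver(struct layout_driver *ld)
{
    struct layout_driver **pp;

    pthread_mutex_lock(&registry_lock);
    ld->unloading = true;
    for (pp = &registry; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == ld) {
            *pp = ld->next;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
}

/* Stand-in for try_module_get(); the caller must hold registry_lock. */
static bool try_get_locked(struct layout_driver *ld)
{
    if (ld->unloading)
        return false;
    ld->refcount++;
    return true;
}

/* Lookup and reference in one critical section: ld cannot vanish in between. */
static struct layout_driver *find_and_get_driver(int id)
{
    struct layout_driver *ld, *found = NULL;

    pthread_mutex_lock(&registry_lock);
    for (ld = registry; ld != NULL; ld = ld->next) {
        if (ld->id == id) {
            if (try_get_locked(ld))
                found = ld;
            break;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return found;
}

static void put_driver(struct layout_driver *ld)
{
    pthread_mutex_lock(&registry_lock);
    assert(ld->refcount > 0);
    ld->refcount--;
    pthread_mutex_unlock(&registry_lock);
}
```

The unsafe variant finds the entry under the lock, drops the lock, and only then takes the reference. That window is the one the commit describes.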
2012-06-18 | NFSv4.1: Fix umount when filelayout DS is also the MDS | Trond Myklebust
Currently there is a 'chicken and egg' issue when the DS is also the mounted MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client is not called to dereference the MDS usage of a deviceid which holds a reference to the DS nfs_client. The result is that the umount program returns, but the nfs_client is not freed, and the cl_session heartbeat continues. The MDS (and every other NFS mount) loses its last nfs_client reference in nfs_free_server when the last nfs_server (fsid) is unmounted. The file layout DS loses its last nfs_client reference in destroy_ds when the last deviceid referencing the data server is put and destroy_ds is called. This is triggered by a call to nfs4_deviceid_purge_client, which removes references to a pNFS deviceid used by an MDS mount. The fix is to track how many pNFS-enabled filesystems are mounted from this server, and then to purge the device id cache once that count reaches zero. Reported-by: Jorge Mora <Jorge.Mora@netapp.com> Reported-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
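Here is a minimal model of the fix's bookkeeping. It assumes a simple per-client counter. Each pNFS-enabled mount increments the counter and each unmount decrements it. The device ID cache is purged only when the counter reaches zero, which drops the references that kept the DS client alive. All names (`struct client_model`, `pnfs_mount()`, `pnfs_umount()`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-server client state; not the kernel's struct nfs_client. */
struct client_model {
    int pnfs_mounts;     /* pNFS-enabled filesystems mounted from this server */
    int deviceid_refs;   /* cached device IDs, each pinning a DS client */
    bool purged;
};

static void purge_deviceid_cache(struct client_model *clp)
{
    clp->deviceid_refs = 0;   /* would release the DS nfs_client references */
    clp->purged = true;
}

static void pnfs_mount(struct client_model *clp)
{
    clp->pnfs_mounts++;
}

static void pnfs_umount(struct client_model *clp)
{
    assert(clp->pnfs_mounts > 0);
    if (--clp->pnfs_mounts == 0)
        purge_deviceid_cache(clp);   /* last pNFS user gone: break the cycle */
}
```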
2012-05-24 | NFSv4.1 test the mdsthreshold hint parameters | Andy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-24 | NFSv4.1 add nfs_inode book keeping for mdsthreshold | Andy Adamson
Keep track of the number of bytes read or written via buffered, direct, and mem-mapped i/o for use by mdsthreshold size_io hints. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
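The bookkeeping can be pictured as two per-inode byte counters that every I/O path feeds, plus a check against the server's io-size hint. This is a sketch under assumptions. The struct and function names are hypothetical. The policy is illustrative, not the kernel's exact rule: stay on the MDS while the bytes transferred so far are below the hint, and treat a hint of 0 as "no hint".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum io_kind { IO_READ, IO_WRITE };

/* Hypothetical per-inode counters, fed by buffered, direct and mmap I/O. */
struct inode_io_stats {
    uint64_t read_io;
    uint64_t write_io;
};

static void account_io(struct inode_io_stats *st, enum io_kind kind, uint64_t bytes)
{
    assert(kind == IO_READ || kind == IO_WRITE);
    if (kind == IO_READ)
        st->read_io += bytes;
    else
        st->write_io += bytes;
}

/* Assumed policy: stay on the MDS while bytes so far are below the hint (0 = no hint). */
static bool below_io_threshold(const struct inode_io_stats *st, enum io_kind kind,
                               uint64_t io_hint)
{
    uint64_t done = (kind == IO_READ) ? st->read_io : st->write_io;

    return io_hint != 0 && done < io_hint;
}
```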
2012-05-24 | NFSv4.1 cache mdsthreshold values on OPEN | Andy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 | NFSv4.1 resend LAYOUTGET on data server invalid layout errors | Andy Adamson
The "invalid layout" class of errors is handled by destroying the layout and getting a new layout from the server. Currently, the layout must be destroyed before a new layout can be obtained. This means that all references (e.g. lsegs) to the "to be destroyed" layout header must be dropped before it can be destroyed. This in turn means waiting for all in-flight RPCs using the old layout, as well as draining the data server session slot table wait queue. Set the NFS_LAYOUT_INVALID flag to redirect I/O to the MDS while waiting for the old layout to be destroyed. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
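Here is a hedged model of the redirect. Once the layout header carries an "invalid" flag, new I/O goes through the MDS. The header only becomes destroyable when its last segment reference is dropped. `LAYOUT_INVALID`, `struct layout_hdr_model` and the helpers are hypothetical stand-ins for NFS_LAYOUT_INVALID and the real layout header.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit mirroring NFS_LAYOUT_INVALID. */
#define LAYOUT_INVALID (1u << 0)

struct layout_hdr_model {
    unsigned int flags;
    int lseg_refs;   /* outstanding references, e.g. from in-flight RPCs */
};

enum io_path { PATH_DS, PATH_MDS };

static enum io_path choose_path(const struct layout_hdr_model *lo)
{
    /* While the old layout drains, send new I/O through the MDS. */
    if (lo->flags & LAYOUT_INVALID)
        return PATH_MDS;
    return PATH_DS;
}

static void mark_invalid(struct layout_hdr_model *lo)
{
    lo->flags |= LAYOUT_INVALID;
}

/* Drops one reference; returns true once an invalid layout may be destroyed. */
static bool put_lseg_ref(struct layout_hdr_model *lo)
{
    assert(lo->lseg_refs > 0);
    return --lo->lseg_refs == 0 && (lo->flags & LAYOUT_INVALID) != 0;
}
```

The flag decouples "stop using this layout" from "this layout is gone". I/O keeps flowing through the MDS during the drain instead of stalling until every reference is dropped.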
2012-05-19 | NFSv4.1 send layoutreturn to fence disconnected data server | Andy Adamson
Let the MDS know that you are redirecting I/O from pNFS to MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-19 | NFSv4.1: mark deviceid invalid on filelayout DS connection errors | Andy Adamson
This prevents the use of any layout for i/o that references the deviceid. I/O is redirected through the MDS. Redirect the unhandled failed I/O to the MDS without marking either the layout or the deviceid invalid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-01 | NFS: Clean up nfs read and write error paths | Trond Myklebust
Move the error handling for nfs_generic_pagein() into a single function. Ditto for nfs_generic_flush(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
2012-04-27 | NFS: Fix a use-before-initialised warning in fs/nfs/write.c and fs/nfs/pnfs.c | Trond Myklebust
If the allocation of nfs_write_header fails, the list of nfs_pages that needs to be cleaned up is still on desc->pg_list... Reported-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Fred Isaman <iisaman@netapp.com>
2012-04-27 | NFS: prepare coalesce testing for directio | Fred Isaman
The coalesce code made assumptions that will no longer be true once non-page aligned io occurs. This introduces no change in current behavior, but allows for more general situations to come. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: create completion structure to pass into page_init functions | Fred Isaman
Factors out the code that will need to change when directio starts using these code paths. This will allow directio to use the generic pagein and flush routines. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: merge _full and _partial write rpc_ops | Fred Isaman
Decouple nfs_pgio_header and nfs_write_data, and have (possibly multiple) nfs_write_datas each take a refcount on nfs_pgio_header. For the moment, keep nfs_write_header as a way to preallocate a single nfs_write_data with the nfs_pgio_header. The code doesn't need this, and would be prettier without it, but given the amount of churn I am already introducing I didn't want to play with tuning new mempools. This also fixes a bug in pnfs_ld_handle_write_error. In the case of desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing the replay attempt to do nothing. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: merge _full and _partial read rpc_ops | Fred Isaman
Decouple nfs_pgio_header and nfs_read_data, and have (possibly multiple) nfs_read_datas each take a refcount on nfs_pgio_header. For the moment, keep nfs_read_header as a way to preallocate a single nfs_read_data with the nfs_pgio_header. The code doesn't need this, and would be prettier without it, but given the amount of churn I am already introducing I didn't want to play with tuning new mempools. This also fixes a bug in pnfs_ld_handle_read_error. In the case of desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing the replay attempt to do nothing. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-27 | NFS: create common nfs_pgio_header for both read and write | Fred Isaman
In order to avoid duplicating all the data in nfs_read_data whenever we split it up into multiple RPC calls (either due to a short read result or due to rsize < PAGE_SIZE), we split out the bits that are the same per RPC call into a separate "header" structure. The goal this patch moves towards is to have a single header refcounted by several rpc_data structures. Thus, we want to always refer from rpc_data to the header, and not the other way round. This patch comes close to that ideal, but the directio code currently needs some special casing, isolated in the nfs_direct_[read_write]hdr_release() functions. This will be dealt with in a future patch. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
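The one-way ownership described above looks roughly like this: each RPC's data points at a shared, refcounted header. It is a user-space sketch. `struct pgio_header` and `struct rpc_data` are simplified, hypothetical stand-ins for nfs_pgio_header and nfs_read_data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for nfs_pgio_header: state shared by all RPCs of one request. */
struct pgio_header {
    int refcount;   /* one reference per outstanding rpc_data */
    int error;      /* e.g. first error seen by any RPC */
};

/* Hypothetical stand-in for nfs_read_data: one RPC's worth of work. */
struct rpc_data {
    struct pgio_header *header;   /* data -> header only, never header -> data */
    size_t count;
};

static struct rpc_data *rpc_data_alloc(struct pgio_header *hdr, size_t count)
{
    struct rpc_data *data = calloc(1, sizeof(*data));

    if (data == NULL)
        return NULL;
    hdr->refcount++;
    data->header = hdr;
    data->count = count;
    return data;
}

/* Frees one RPC's data; returns true when it held the header's last reference. */
static bool rpc_data_release(struct rpc_data *data)
{
    struct pgio_header *hdr = data->header;

    free(data);
    assert(hdr->refcount > 0);
    return --hdr->refcount == 0;
}
```

Splitting one request into several RPCs then just means allocating more rpc_data objects against the same header. Whichever RPC drops the last reference completes the request.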
2012-04-27 | NFS4.1: make pnfs_ld_[read|write]_done consistent | Fred Isaman
The two functions had diverged quite a bit, with the write function being a bit more robust than the read. However, these still break badly in the desc->pg_bsize < PAGE_CACHE_SIZE case, as then there is nothing hanging on the data->pages list, and the resend ends up doing nothing. This will be fixed in a patch later in the series. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-04-26 | NFSv4.1 fix page number calculation bug for filelayout decode buffers | Andy Adamson
Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-17 | NFSv4.1: Clean ups and bugfixes for the pNFS read/writeback/commit code | Trond Myklebust
Move more pnfs-isms out of the generic commit code. Bugfixes: - filelayout_scan_commit_lists doesn't need to get/put the lseg. In fact since it is run under the inode->i_lock, the lseg_put() can deadlock. - Ensure that we distinguish between what needs to be done for commit-to-data server and what needs to be done for commit-to-MDS using the new flag PG_COMMIT_TO_DS. Otherwise we may end up calling put_lseg() on a bucket for a struct nfs_page that got written through the MDS. - Fix a case where we were using list_del() on an nfs_page->wb_list instead of list_del_init(). - filelayout_initiate_commit needs to call filelayout_commit_release on error instead of the mds_ops->rpc_release(). Otherwise it won't clear the commit lock. Cleanups: - Let the files layout manage the commit lists for the pNFS case. Don't expose stuff like pnfs_choose_commit_list, and the fact that the commit buckets hold references to the layout segment in common code. - Cast out the put_lseg() calls for the struct nfs_read/write_data->lseg into the pNFS layer from whence they came. - Let the pNFS layer manage the NFS_INO_PNFS_COMMIT bit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Fred Isaman <iisaman@netapp.com>
2012-03-06 | NFSv4: Simplify the struct nfs4_stateid | Trond Myklebust
Replace the union with the common struct stateid4 as defined in both RFC3530 and RFC5661. This makes it easier to access the sequence id, which will again make implementing support for parallel OPEN calls easier. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-06 | NFSv4: Add helpers for basic copying of stateids | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-03-01 | NFSv4.1: Get rid of redundant NFS4CLNT_LAYOUTRECALL tests | Trond Myklebust
The NFS4CLNT_LAYOUTRECALL tests in pnfs_layout_process and pnfs_update_layout are redundant. In the case of a bulk layout recall, we're always testing for the NFS_LAYOUT_BULK_RECALL flag anyway. In the case of a file or segment recall, the call to pnfs_set_layout_stateid() updates the layout_header 'barrier' sequence id, which triggers the test in pnfs_layoutgets_blocked() and is less race-prone than NFS4CLNT_LAYOUTRECALL anyway. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
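A barrier test of this kind depends on comparing 32-bit sequence ids in a wraparound-safe way. The minimal sketch below assumes that a LAYOUTGET reply whose seqid is not newer than the barrier is treated as predating the recall. The struct and function names are hypothetical. The `(int32_t)` cast relies on modulo conversion of out-of-range values, which GCC and Clang define.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is newer than b" for 32-bit sequence ids. */
static bool seqid_is_newer(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Hypothetical layout header holding the recall barrier. */
struct layout_barrier_model {
    uint32_t barrier;   /* seqid recorded when a recall was processed */
};

/* Assumed rule: a reply at or before the barrier predates the recall. */
static bool layoutget_blocked(const struct layout_barrier_model *lo, uint32_t reply_seqid)
{
    return !seqid_is_newer(reply_seqid, lo->barrier);
}

/* Recording a recall moves the barrier forward, never backward. */
static void set_barrier(struct layout_barrier_model *lo, uint32_t seqid)
{
    if (seqid_is_newer(seqid, lo->barrier))
        lo->barrier = seqid;
}
```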
2012-02-06 | NFS: start printks w/ NFS: even if __func__ shown | Weston Andros Adamson
This patch addresses printks that have some context to show that they are from fs/nfs/, but for the sake of consistency now start with NFS: Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-02-01 | NFS: Use kcalloc() when allocating arrays | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 | NFS: Remove pNFS bloat from the generic write path | Trond Myklebust
We have no business doing any of this in the standard write release path. Get rid of it, and put it in the pNFS layer. Also, while we're at it, get rid of the completely bogus unlock/relock semantics that were present in nfs_writeback_release_full(). It is not only unnecessary, but actually dangerous to release the write lock just in order to take it again in nfs_page_async_flush(). Better just to open code the pgio operations in a pnfs helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-01-06 | pnfs-obj: Must return layout on IO error | Boaz Harrosh
As mandated by the standard, in case of an IO error a pNFS objects layout driver must return its layout. This is because all device errors are reported to the server as part of the layout return buffer. This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR is done, through a bit flag on the pnfs_layoutdriver_type->flags member. The flag is set by the layout driver that wants a layout_return performed at pnfs_ld_{write,read}_done in case of an error. (I have not defined a wrapper like pnfs_ld_layoutret_on_setattr, because this code is never called outside of pnfs.c and the pnfs IO paths.) Without this patch, 3.[0-2] kernels leak memory and hit an annoying WARN_ON after every IO error when using the pnfs-obj driver. [This patch is for the 3.2 kernel. 3.1/3.0 kernels need a different patch.] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
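Per-driver policy bits like these can be modelled as a flags word that is tested at I/O completion. The enum and struct names below are hypothetical. The commit text does not name the new flag, so `LD_RETURN_ON_ERROR` is an illustrative placeholder, not the kernel's identifier.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy bits modelled on pnfs_layoutdriver_type->flags. */
enum ld_policy_flags {
    LD_RETURN_ON_SETATTR = 1u << 0,   /* analogue of PNFS_LAYOUTRET_ON_SETATTR */
    LD_RETURN_ON_ERROR   = 1u << 1,   /* placeholder for the new per-error bit */
};

struct layoutdriver_model {
    const char *name;
    unsigned int flags;
};

/* Called at read/write completion: should a failed I/O trigger a layout return? */
static bool want_layoutreturn_on_io_done(const struct layoutdriver_model *ld, int status)
{
    return status < 0 && (ld->flags & LD_RETURN_ON_ERROR) != 0;
}
```

Keeping the policy in a flags word lets generic completion code stay driver-agnostic. Only drivers that opt in pay for the extra LAYOUTRETURN.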
2011-11-22 | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs | Linus Torvalds
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Revert pnfs ugliness from the generic NFS read code path SUNRPC: destroy freshly allocated transport in case of sockaddr init error NFS: Fix a regression in the referral code nfs: move nfs_file_operations declaration to bottom of file.c (try #2) nfs: when attempting to open a directory, fall back on normal lookup (try #5)
2011-11-10 | NFS: Revert pnfs ugliness from the generic NFS read code path | Trond Myklebust
pNFS-specific code belongs in the pnfs layer. It should not be hijacking generic NFS read or write code paths. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-11-07 | Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux | Linus Torvalds
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-10-31 | fs: add module.h to files that were implicitly using it | Paul Gortmaker
Some files were using the complete module.h infrastructure without actually including the header at all. Fix them up in advance so once the implicit presence is removed, we won't get failures like this: CC [M] fs/nfsd/nfssvc.o fs/nfsd/nfssvc.c: In function 'nfsd_create_serv': fs/nfsd/nfssvc.c:335: error: 'THIS_MODULE' undeclared (first use in this function) fs/nfsd/nfssvc.c:335: error: (Each undeclared identifier is reported only once fs/nfsd/nfssvc.c:335: error: for each function it appears in.) fs/nfsd/nfssvc.c: In function 'nfsd': fs/nfsd/nfssvc.c:555: error: implicit declaration of function 'module_put_and_exit' make[3]: *** [fs/nfsd/nfssvc.o] Error 1 Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 | nfs4: serialize layoutcommit | Peng Tao
The current pnfs_layoutcommit_inode cannot handle parallel layoutcommit. And, as Trond suggested, there is no need for the client to optimize for parallel layoutcommit. So add the NFS_INO_LAYOUTCOMMITTING flag to mark an in-flight layoutcommit, and serialize layoutcommit with it. Also call mark_inode_dirty_sync if pnfs_layoutcommit_inode fails to issue layoutcommit. Reported-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
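Serializing LAYOUTCOMMIT with a single in-flight bit is a test-and-set pattern. The sketch below uses C11 `atomic_flag` in place of the kernel's inode flag bits. The struct and helpers are hypothetical. Re-marking the inode dirty when a commit is already in flight is an assumed retry policy, not necessarily what pnfs_layoutcommit_inode does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inode model; the flag mirrors NFS_INO_LAYOUTCOMMITTING. */
struct inode_model {
    atomic_flag layoutcommitting;
    bool dirty;   /* stands in for mark_inode_dirty_sync() */
};

static void inode_model_init(struct inode_model *ino)
{
    atomic_flag_clear(&ino->layoutcommitting);   /* give the flag a defined state */
    ino->dirty = false;
}

/* Returns true if this caller may send LAYOUTCOMMIT now. */
static bool layoutcommit_begin(struct inode_model *ino)
{
    if (atomic_flag_test_and_set(&ino->layoutcommitting)) {
        /* Another LAYOUTCOMMIT is in flight: leave the inode dirty to retry later. */
        ino->dirty = true;
        return false;
    }
    return true;
}

static void layoutcommit_end(struct inode_model *ino)
{
    atomic_flag_clear(&ino->layoutcommitting);
}
```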
2011-10-18 | pnfs: recoalesce when ld read pagelist fails | Peng Tao
For pnfs pagelist read failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 | pnfs: recoalesce when ld write pagelist fails | Peng Tao
For pnfs pagelist write failure, we need to pg_recoalesce and resend IO to mds. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-10-18 | pnfs: make _set_lo_fail generic | Peng Tao
File layout and block layout both use it to set the layout I/O failure bit. So make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 | pnfs: cleanup_layoutcommit | Andy Adamson
This gives the layout driver a chance to clean up the structures it set up in encode_layoutcommit. Signed-off-by: Andy Adamson <andros@netapp.com> [fixup layout header pointer for layoutcommit] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> [rm inode and pnfs_layout_hdr args from cleanup_layoutcommit()] Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 | pnfs: add set-clear layoutdriver interface | Benny Halevy
To allow the layout driver to issue getdevicelist at mount time, and to clean up at umount time. [fixup non NFS_V4_1 set_pnfs_layoutdriver definition] [pnfs: pass mntfh down the init_pnfs path] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 | pnfs: let layoutcommit handle a list of lseg | Peng Tao
There can be multiple lseg per file, so layoutcommit should be able to handle it. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 | pnfs: save layoutcommit cred at layout header init | Peng Tao
No need to save it for every lseg. No need to save it at every pnfs_set_layoutcommit. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-31 | pnfs: save layoutcommit lwb at layout header | Peng Tao
No need to save it for every lseg. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 | NFS: Clean up - simplify the switch to read/write-through-MDS | Trond Myklebust
Use nfs_pageio_reset_read_mds and nfs_pageio_reset_write_mds instead of completely reinitialising the struct nfs_pageio_descriptor. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 | NFS: Move the pnfs write code into pnfs.c | Trond Myklebust
...and ensure that we recoalesce to take into account differences in block sizes when falling back to write through the MDS. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-15 | NFS: Move the pnfs read code into pnfs.c | Trond Myklebust
...and ensure that we recoalesce to take into account differences in block sizes when falling back to read through the MDS. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 | NFSv4.1: do not use deviceids after MDS clientid invalidation | Andy Adamson
Mark all deviceids established under an expired MDS clientid as invalid. Stop all new i/o through the DS and send it through the MDS. Don't use any new LAYOUTGETs that use the invalid deviceid. Purge all layouts established under the expired MDS clientid. Remove the MDS clientid deviceid and data servers reference. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 | NFSv4.1: Clean up layoutreturn | Trond Myklebust
Since we take a reference to it, we really ought to pass a pointer to the layout header in the arguments instead of assuming that NFS_I(inode)->layout will forever point to the correct object. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 | NFSv4.1: File layout only supports whole file layouts | Andy Adamson
Ask for whole file layouts. Until layout segments are fully supported in the file layout code, discard non-whole file layouts. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 | NFSv4.1: Fall back to ordinary i/o through the mds if we have no layout segment | Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 | NFSv4.1: Add an initialisation callback for pNFS | Trond Myklebust
Ensure that we always get a layout before setting up the i/o request. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-07-12 | NFS: Cleanup of the nfs_pageio code in preparation for a pnfs bugfix | Trond Myklebust
We need to ensure that the layouts are set up before we can decide to coalesce requests. To do so, we want to further split up the struct nfs_pageio_descriptor operations into an initialisation callback, a coalescing test callback, and a 'do i/o' callback. This patch cleans up the existing callback methods before adding the 'initialisation' callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
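The three-callback split can be sketched as an ops table that generic code drives in a fixed order. It initialises before the first request, tests each further request, and issues I/O when the test fails. All names are hypothetical. `demo_ops` is a toy implementation, and its init step is where a pNFS driver would obtain its layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pageio_desc;

/* Hypothetical callback table: initialise, coalescing test, do I/O. */
struct pageio_ops {
    void (*init)(struct pageio_desc *d, size_t first_offset);
    bool (*test)(struct pageio_desc *d, size_t offset, size_t len);
    int  (*doio)(struct pageio_desc *d);
};

struct pageio_desc {
    const struct pageio_ops *ops;
    size_t start, count, max_bytes;
    bool initialised;
    int rpcs_sent;
};

static void demo_init(struct pageio_desc *d, size_t first_offset)
{
    d->start = first_offset;   /* a pNFS driver would fetch a layout here */
    d->count = 0;
    d->max_bytes = 8192;       /* e.g. derived from that layout */
}

static bool demo_test(struct pageio_desc *d, size_t offset, size_t len)
{
    return offset == d->start + d->count && d->count + len <= d->max_bytes;
}

static int demo_doio(struct pageio_desc *d)
{
    d->rpcs_sent++;
    return 0;
}

static const struct pageio_ops demo_ops = {
    .init = demo_init, .test = demo_test, .doio = demo_doio,
};

/* Generic code only sequences the callbacks; policy lives in the ops. */
static void pageio_add(struct pageio_desc *d, size_t offset, size_t len)
{
    assert(d->ops != NULL);
    if (!d->initialised) {
        d->ops->init(d, offset);
        d->initialised = true;
    } else if (!d->ops->test(d, offset, len)) {
        d->ops->doio(d);           /* flush what has been coalesced so far */
        d->ops->init(d, offset);   /* start a new run at this request */
    }
    d->count += len;
}

static void pageio_complete(struct pageio_desc *d)
{
    if (d->initialised && d->count > 0)
        d->ops->doio(d);
}
```

Because init runs before the first coalescing test, a driver can make the test depend on a layout it has already obtained. That ordering is the one the commit message says the later bugfix needs.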
2011-07-12 | NFS: move pnfs layouts to nfs_server structure | Weston Andros Adamson
Layouts should be tracked per nfs_server (aka superblock) instead of per struct nfs_client, which may have multiple FSIDs associated with it. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 | NFSv4.1: Fix an off-by-one error in pnfs_generic_pg_test | Trond Myklebust
And document what is going on there... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2011-06-21 | NFSv4.1: Fix some issues with pnfs_generic_pg_test | Trond Myklebust
1. If the intention is to coalesce requests 'prev' and 'req' then we have to ensure at least that we have a layout starting at req_offset(prev). 2. If we're only requesting a minimal layout of length desc->pg_count, we need to test the length actually returned by the server before we allow the coalescing to occur. 3. We need to deal correctly with (pgio->lseg == NULL) 4. Fixup the test guarding the pnfs_update_layout. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
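Points 1 and 2 can be modelled with half-open byte ranges. A request may join the run only if it is contiguous and the segment covers everything from the start of `prev` to the end of `req`. This is a hedged sketch with hypothetical types. The commit does not spell out what the real code does when no segment is present, so in that case this model simply falls back to the contiguity check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout segment covering bytes [offset, offset + length). */
struct lseg_range {
    uint64_t offset;
    uint64_t length;
};

/* Hypothetical request: bytes [offset, offset + bytes) of the file. */
struct req_range {
    uint64_t offset;
    uint64_t bytes;
};

/* 'end' is exclusive, so a request ending exactly at the segment end is covered. */
static bool lseg_covers(const struct lseg_range *lseg, uint64_t start, uint64_t end)
{
    return start >= lseg->offset && end <= lseg->offset + lseg->length;
}

/* May 'req' be coalesced onto the run that began at 'prev'? */
static bool can_coalesce(const struct lseg_range *lseg,
                         const struct req_range *prev, const struct req_range *req)
{
    if (req->offset != prev->offset + prev->bytes)
        return false;   /* not contiguous */
    if (lseg == NULL)
        return true;    /* assumed fallback: contiguity only */
    /* The segment must start at or before prev and reach the end of req. */
    return lseg_covers(lseg, prev->offset, req->offset + req->bytes);
}
```

With an exclusive end the comparison is `<=`. Mixing inclusive and exclusive conventions here is exactly where an off-by-one tends to creep in.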