summaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)Author
2012-08-01Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge Andrew's second set of patches: - MM - a few random fixes - a couple of RTC leftovers * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits) rtc/rtc-88pm80x: remove unneed devm_kfree rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables tmpfs: distribute interleave better across nodes mm: remove redundant initialization mm: warn if pg_data_t isn't initialized with zero mips: zero out pg_data_t when it's allocated memcg: gix memory accounting scalability in shrink_page_list mm/sparse: remove index_init_lock mm/sparse: more checks on mem_section number mm/sparse: optimize sparse_index_alloc memcg: add mem_cgroup_from_css() helper memcg: further prevent OOM with too many dirty pages memcg: prevent OOM with too many dirty pages mm: mmu_notifier: fix freed page still mapped in secondary MMU mm: memcg: only check anon swapin page charges for swap cache mm: memcg: only check swap cache pages for repeated charging mm: memcg: split swapin charge function into private and public part mm: memcg: remove needless !mm fixup to init_mm when charging mm: memcg: remove unneeded shmem charge type ...
2012-08-01Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull second wave of NFS client updates from Trond Myklebust: - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into separate modules. - Fix Oopses in the NFSv4 idmapper - Fix a deadlock whereby rpciod tries to allocate a new socket and ends up recursing into the NFS code due to memory reclaim. - Increase the number of permitted callback connections. * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: explicitly reject LOCK_MAND flock() requests nfs: increase number of permitted callback connections. SUNRPC: return negative value in case rpcbind client creation error NFS: Convert v4 into a module NFS: Convert v3 into a module NFS: Convert v2 into a module NFS: Keep module parameters in the generic NFS client NFS: Split out remaining NFS v4 inode functions NFS: Pass super operations and xattr handlers in the nfs_subversion NFS: Only initialize the ACL client in the v3 case NFS: Create a try_mount rpc op NFS: Remove the NFS v4 xdev mount function NFS: Add version registering framework NFS: Fix a number of bugs in the idmapper nfs: skip commit in releasepage if we're freeing memory for fs-related reasons sunrpc: clarify comments on rpc_make_runnable pnfsblock: bail out partial page IO
2012-08-01nfs: prevent page allocator recursions with swap over NFS.Mel Gorman
GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO, just not of any filesystem data. The problem is that previously NOFS was correct because that avoids recursion into the NFS code. With swap-over-NFS, it is no longer correct as swap IO can lead to this recursion. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01nfs: enable swap on NFSMel Gorman
Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01nfs: disable data cache revalidation for swapfilesMel Gorman
The VM does not like PG_private set on PG_swapcache pages. As suggested by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS data cache revalidation on swap files. as it does not make sense to have other clients change the file while it is being used as swap. This avoids setting PG_private on swap pages, since there ought to be no further races with invalidate_inode_pages2() to deal with. Since we cannot set PG_private we cannot use page->private which is already used by PG_swapcache pages to store the nfs_page. Thus augment the new nfs_page_find_request logic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-08-01nfs: teach the NFS client how to treat PG_swapcache pagesMel Gorman
Replace all relevant occurences of page->index and page->mapping in the NFS client with the new page_file_index() and page_file_mapping() functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-31Merge branch 'nfsd-next' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd changes from J. Bruce Fields: "This has been an unusually quiet cycle--mostly bugfixes and cleanup. The one large piece is Stanislav's work to containerize the server's grace period--but that in itself is just one more step in a not-yet-complete project to allow fully containerized nfs service. There are a number of outstanding delegation, container, v4 state, and gss patches that aren't quite ready yet; 3.7 may be wilder." * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits) NFSd: make boot_time variable per network namespace NFSd: make grace end flag per network namespace Lockd: move grace period management from lockd() to per-net functions LockD: pass actual network namespace to grace period management functions LockD: manage grace list per network namespace SUNRPC: service request network namespace helper introduced NFSd: make nfsd4_manager allocated per network namespace context. LockD: make lockd manager allocated per network namespace LockD: manage grace period per network namespace Lockd: add more debug to host shutdown functions Lockd: host complaining function introduced LockD: manage used host count per networks namespace LockD: manage garbage collection timeout per networks namespace LockD: make garbage collector network namespace aware. LockD: mark host per network namespace on garbage collect nfsd4: fix missing fault_inject.h include locks: move lease-specific code out of locks_delete_lock locks: prevent side-effects of locks_release_private before file_lock is initialized NFSd: set nfsd_serv to NULL after service destruction NFSd: introduce nfsd_destroy() helper ...
2012-07-31nfs: explicitly reject LOCK_MAND flock() requestsJeff Layton
We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly return -EINVAL if someone requests it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-31nfs: increase number of permitted callback connections.NeilBrown
By default a sunrpc service is limited to (N+3)*20 connections where N is the number of threads. This is 80 when N==1. If this number is exceeded a warning is printed suggesting that the number of threads be increased. However with services which run a single thread, this is impossible. For such services there is a ->sv_maxconn setting that can be used to forcibly increase the limit, and silence the message. This is used by lockd. The nfs client uses a sunrpc service to handle callbacks and it too is single-threaded, so to avoid the useless messages, and to allow a reasonable number of concurrent connections, we need to set ->sv_maxconn. 1024 seems like a good number. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-31Merge tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Features include: - More preparatory patches for modularising NFSv2/v3/v4. Split out the various NFSv2/v3/v4-specific code into separate files - More preparation for the NFSv4 migration code - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold parameters - pNFS fast failover when the data servers are down - Various cleanups and debugging patches" * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits) nfs: fix fl_type tests in NFSv4 code NFS: fix pnfs regression with directio writes NFS: fix pnfs regression with directio reads sunrpc: clnt: Add missing braces nfs: fix stub return type warnings NFS: exit_nfs_v4() shouldn't be an __exit function SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors NFS: Split out NFS v4 client functions NFS: Split out the NFS v4 filesystem types NFS: Create a single nfs_clone_super() function NFS: Split out NFS v4 server creating code NFS: Initialize the NFS v4 client from init_nfs_v4() NFS: Move the v4 getroot code to nfs4getroot.c NFS: Split out NFS v4 file operations NFS: Initialize v4 sysctls from nfs_init_v4() NFS: Create an init_nfs_v4() function NFS: Split out NFS v4 inode operations NFS: Split out NFS v3 inode operations NFS: Split out NFS v2 inode operations NFS: Clean up nfs4_proc_setclientid() and friends ...
2012-07-30NFS: Convert v4 into a moduleBryan Schumaker
This patch exports symbols needed by the v4 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or CONFIG_NFS_V4_MODULE are set. The module (nfs4.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Convert v3 into a moduleBryan Schumaker
This patch exports symbols and moves over the final structures needed by the v3 module. In addition, I also switch over to using IS_ENABLED() to check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set. The module (nfs3.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Convert v2 into a moduleBryan Schumaker
The module (nfs2.ko) will be created in the same directory as nfs.ko and will be automatically loaded the first time you try to mount over NFS v2. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Keep module parameters in the generic NFS clientBryan Schumaker
Otherwise we break backwards compatibility when v4 becomes a modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Split out remaining NFS v4 inode functionsBryan Schumaker
Somehow I missed this in my previous patch series, but these functions are only needed by the v4 code and should be moved to a v4-only file. I wasn't exactly sure where I should put these functions, so I moved them into nfs4super.c where I could make them static. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Pass super operations and xattr handlers in the nfs_subversionBryan Schumaker
I can set all variables in the nfs_fill_super() function, allowing me to remove the nfs4_fill_super() function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Only initialize the ACL client in the v3 caseBryan Schumaker
v2 and v4 don't use it, so I create two new nfs_rpc_ops functions to initialize the ACL client only when we are using v3. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Create a try_mount rpc opBryan Schumaker
I'm already looking up the nfs subversion in nfs_fs_mount(), so I have easy access to rpc_ops that used to be difficult to reach. This allows me to set up a different mount path for NFS v2/3 and NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Remove the NFS v4 xdev mount functionBryan Schumaker
I can now share this code with the v2 and v3 code by using the NFS subversion structure. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Add version registering frameworkBryan Schumaker
This patch adds in the code to track multiple versions of the NFS protocol. I created default structures for v2, v3 and v4 so that each version can continue to work while I convert them into kernel modules. I also removed the const parameter from the rpc_version array so that I can change it at runtime. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: Fix a number of bugs in the idmapperDavid Howells
Fix a number of bugs in the NFS idmapper code: (1) Only registered key types can be passed to the core keys code, so register the legacy idmapper key type. This is a requirement because the unregister function cleans up keys belonging to that key type so that there aren't dangling pointers to the module left behind - including the key->type pointer. (2) Rename the legacy key type. You can't have two key types with the same name, and (1) would otherwise require that. (3) complete_request_key() must be called in the error path of nfs_idmap_legacy_upcall(). (4) There is one idmap struct for each nfs_client struct. This means that idmap->idmap_key_cons is shared without the use of a lock. This is a problem because key_instantiate_and_link() - as called indirectly by idmap_pipe_downcall() - releases anyone waiting for the key to be instantiated. What happens is that idmap_pipe_downcall() running in the rpc.idmapd thread, releases the NFS filesystem in whatever thread that is running in to continue. This may then make another idmapper call, overwriting idmap_key_cons before idmap_pipe_downcall() gets the chance to call complete_request_key(). I *think* that reading idmap_key_cons only once, before key_instantiate_and_link() is called, and then caching the result in a variable is sufficient. Bug (4) is the cause of: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) PGD 0 Oops: 0010 [#1] SMP CPU 1 Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan] Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1 RIP: 0010:[<0000000000000000>] [< (null)>] (null) RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246 RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80 RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50 RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0 R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80 R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006 FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710) Stack: ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90 ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75 ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8 Call Trace: [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100 [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0 [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs] [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc] [<ffffffff8117f833>] vfs_write+0xb3/0x180 [<ffffffff8117fb5a>] sys_write+0x4a/0x90 [<ffffffff81600329>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [< (null)>] (null) RSP <ffff8801a3645d40> CR2: 0000000000000000 Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org [>= 3.4]
2012-07-30nfs: skip commit in releasepage if we're freeing memory for fs-related reasonsJeff Layton
We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
2012-07-30pnfsblock: bail out partial page IOPeng Tao
Current block layout driver read/write code assumes page aligned IO in many places. Add a checker to validate the assumption. Otherwise there would be data corruption like when application does open(O_WRONLY) and page unaliged write. Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30nfs: fix fl_type tests in NFSv4 codeJeff Layton
fl_type is not a bitmap. Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: fix pnfs regression with directio writesFred Isaman
Commit 57208fa7e51 "NFS: Create an write_pageio_init() function" did not modify the calls in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30NFS: fix pnfs regression with directio readsFred Isaman
Commit 1abb50886af "NFS: Create an read_pageio_init() function" did not modify the call in direct.c, preventing direct io from using pnfs. This reintroduces that capability. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-30nfs: fix stub return type warningsRandy Dunlap
Fix numerous repeated warnings by making the stub function void instead of non-void: fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl': fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-27SUNRPC: service request network namespace helper introducedStanislav Kinsbursky
This is a cleanup patch - makes code looks simplier. It replaces widely used rqstp->rq_xprt->xpt_net by introduced SVC_NET(rqstp). Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-07-23Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull the big VFS changes from Al Viro: "This one is *big* and changes quite a few things around VFS. What's in there: - the first of two really major architecture changes - death to open intents. The former is finally there; it was very long in making, but with Miklos getting through really hard and messy final push in fs/namei.c, we finally have it. Unlike his variant, this one doesn't introduce struct opendata; what we have instead is ->atomic_open() taking preallocated struct file * and passing everything via its fields. Instead of returning struct file *, it returns -E... on error, 0 on success and 1 in "deal with it yourself" case (e.g. symlink found on server, etc.). See comments before fs/namei.c:atomic_open(). That made a lot of goodies finally possible and quite a few are in that pile: ->lookup(), ->d_revalidate() and ->create() do not get struct nameidata * anymore; ->lookup() and ->d_revalidate() get lookup flags instead, ->create() gets "do we want it exclusive" flag. With the introduction of new helper (kern_path_locked()) we are rid of all struct nameidata instances outside of fs/namei.c; it's still visible in namei.h, but not for long. Come the next cycle, declaration will move either to fs/internal.h or to fs/namei.c itself. [me, miklos, hch] - The second major change: behaviour of final fput(). Now we have __fput() done without any locks held by caller *and* not from deep in call stack. That obviously lifts a lot of constraints on the locking in there. Moreover, it's legal now to call fput() from atomic contexts (which has immediately simplified life for aio.c). We also don't need anti-recursion logics in __scm_destroy() anymore. There is a price, though - the damn thing has become partially asynchronous. For fput() from normal process we are guaranteed that pending __fput() will be done before the caller returns to userland, exits or gets stopped for ptrace. For kernel threads and atomic contexts it's done via schedule_work(), so theoretically we might need a way to make sure it's finished; so far only one such place had been found, but there might be more. There's flush_delayed_fput() (do all pending __fput()) and there's __fput_sync() (fput() analog doing __fput() immediately). I hope we won't need them often; see warnings in fs/file_table.c for details. [me, based on task_work series from Oleg merged last cycle] - sync series from Jan - large part of "death to sync_supers()" work from Artem; the only bits missing here are exofs and ext4 ones. As far as I understand, those are going via the exofs and ext4 trees resp.; once they are in, we can put ->write_super() to the rest, along with the thread calling it. - preparatory bits from unionmount series (from dhowells). - assorted cleanups and fixes all over the place, as usual. This is not the last pile for this cycle; there's at least jlayton's ESTALE work and fsfreeze series (the latter - in dire need of fixes, so I'm not sure it'll make the cut this cycle). I'll probably throw symlink/hardlink restrictions stuff from Kees into the next pile, too. Plus there's a lot of misc patches I hadn't thrown into that one - it's large enough as it is..." * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits) ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file() btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() switch dentry_open() to struct path, make it grab references itself spufs: shift dget/mntget towards dentry_open() zoran: don't bother with struct file * in zoran_map ecryptfs: don't reinvent the wheels, please - use struct completion don't expose I_NEW inodes via dentry->d_inode tidy up namei.c a bit unobfuscate follow_up() a bit ext3: pass custom EOF to generic_file_llseek_size() ext4: use core vfs llseek code for dir seeks vfs: allow custom EOF in generic_file_llseek code vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes vfs: Remove unnecessary flushing of block devices vfs: Make sys_sync writeout also block device inodes vfs: Create function for iterating over block devices vfs: Reorder operations during sys_sync quota: Move quota syncing to ->sync_fs method quota: Split dquot_quota_sync() to writeback and cache flushing part vfs: Move noop_backing_dev_info check from sync into writeback ...
2012-07-20Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osdLinus Torvalds
Pull pnfs/ore fixes from Boaz Harrosh: "These are catastrophic fixes to the pnfs objects-layout that were just discovered. They are also destined for @stable. I have found these and worked on them at around RC1 time but unfortunately went to the hospital for kidney stones and had a very slow recovery. I refrained from sending them as is, before proper testing, and surly I have found a bug just yesterday. So now they are all well tested, and have my sign-off. Other then fixing the problem at hand, and assuming there are no bugs at the new code, there is low risk to any surrounding code. And in anyway they affect only these paths that are now broken. That is RAID5 in pnfs objects-layout code. It does also affect exofs (which was not broken) but I have tested exofs and it is lower priority then objects-layout because no one is using exofs, but objects-layout has lots of users." * 'for-linus' of git://git.open-osd.org/linux-open-osd: pnfs-obj: Fix __r4w_get_page when offset is beyond i_size pnfs-obj: don't leak objio_state if ore_write/read fails ore: Unlock r4w pages in exact reverse order of locking ore: Remove support of partial IO request (NFS crash) ore: Fix NFS crash by supporting any unaligned RAID IO
2012-07-20pnfs-obj: Fix __r4w_get_page when offset is beyond i_sizeBoaz Harrosh
It is very common for the end of the file to be unaligned on stripe size. But since we know it's beyond file's end then the XOR should be preformed with all zeros. Old code used to just read zeros out of the OSD devices, which is a great waist. But what scares me more about this situation is that, we now have pages attached to the file's mapping that are beyond i_size. I don't like the kind of bugs this calls for. Fix both birds, by returning a global zero_page, if offset is beyond i_size. TODO: Change the API to ->__r4w_get_page() so a NULL can be returned without being considered as error, since XOR API treats NULL entries as zero_pages. [Bug since 3.2. Should apply the same way to all Kernels since] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-20pnfs-obj: don't leak objio_state if ore_write/read failsBoaz Harrosh
[Bug since 3.2 Kernel] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-07-17NFS: exit_nfs_v4() shouldn't be an __exit functionBryan Schumaker
... yet. Right now, init_nfs() is calling this function if an error is encountered when loading the nfs module. An __exit function can't be called from one declared as __init. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out NFS v4 client functionsBryan Schumaker
These functions are only needed by NFS v4, so they can be moved into a v4 specific file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out the NFS v4 filesystem typesBryan Schumaker
This allows me to move the v4 mounting and unmounting functions out of the generic client and into a file that is only compiled when CONFIG_NFS_V4 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Create a single nfs_clone_super() functionBryan Schumaker
v2 and v3 shared a function for this, but v4 implemented something only slightly different. Might as well share code whenever possible... Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out NFS v4 server creating codeBryan Schumaker
These functions are specific to NFS v4 and can be moved to nfs4client.c to keep them out of the generic client. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Initialize the NFS v4 client from init_nfs_v4()Bryan Schumaker
And split these functions out of the generic client into a v4 specific file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Move the v4 getroot code to nfs4getroot.cBryan Schumaker
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out NFS v4 file operationsBryan Schumaker
This patch moves the NFS v4 file functions into a new file that is only compiled when CONFIG_NFS_V4 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Initialize v4 sysctls from nfs_init_v4()Bryan Schumaker
And split them out of the generic client into their own file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Create an init_nfs_v4() functionBryan Schumaker
I want to initialize all of NFS v4 in a single function that will eventually be used as the v4 module init function. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out NFS v4 inode operationsBryan Schumaker
The NFS v4 file inode operations are already already in nfs4proc.c, so this patch just needs to move the directory operations to the same file. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out NFS v3 inode operationsBryan Schumaker
This patch moves the NFS v3 file and directory inode functions into files that are only compiled whet CONFIG_NFS_V3 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17NFS: Split out NFS v2 inode operationsBryan Schumaker
This patch moves the NFS v2 file and directory inode functions into files that are only compiled whet CONFIG_NFS_V2 is enabled. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16NFS: Clean up nfs4_proc_setclientid() and friendsChuck Lever
Add documenting comments and appropriate debugging messages. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16NFS: Treat NFS4ERR_CLID_INUSE as a fatal errorChuck Lever
For NFSv4 minor version 0, currently the cl_id_uniquifier allows the Linux client to generate a unique nfs_client_id4 string whenever a server replies with NFS4ERR_CLID_INUSE. This implementation seems to be based on a flawed reading of RFC 3530. NFS4ERR_CLID_INUSE actually means that the client has presented this nfs_client_id4 string with a different principal at some time in the past, and that lease is still in use on the server. For a Linux client this might be rather difficult to achieve: the authentication flavor is named right in the nfs_client_id4.id string. If we change flavors, we change strings automatically. So, practically speaking, NFS4ERR_CLID_INUSE means there is some other client using our string. There is not much that can be done to recover automatically. Let's make it a permanent error. Remove the recovery logic in nfs4_proc_setclientid(), and remove the cl_id_uniquifier field from the nfs_client data structure. And, remove the authentication flavor from the nfs_client_id4 string. Keeping the authentication flavor in the nfs_client_id4.id string means that we could have a separate lease for each authentication flavor used by mounts on the client. But we want just one lease for all the mounts on this client. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16NFS: When state recovery fails, waiting tasks should exitChuck Lever
NFSv4 state recovery is not always successful. Failure is signalled by setting the nfs_client.cl_cons_state to a negative (errno) value, then waking waiters. Currently this can happen only during mount processing. I'm about to add an explicit case where state recovery failure during normal operation should force all NFS requests waiting on that state recovery to exit. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16SUNRPC: Add rpcauth_list_flavors()Chuck Lever
The gss_mech_list_pseudoflavors() function provides a list of currently registered GSS pseudoflavors. This list does not include any non-GSS flavors that have been registered with the RPC client. nfs4_find_root_sec() currently adds these extra flavors by hand. Instead, nfs4_find_root_sec() should be looking at the set of flavors that have been explicitly registered via rpcauth_register(). And, other areas of code will soon need the same kind of list that contains all flavors the kernel currently knows about (see below). Rather than cloning the open-coded logic in nfs4_find_root_sec() to those new places, introduce a generic RPC function that generates a full list of registered auth flavors and pseudoflavors. A new rpc_authops method is added that lists a flavor's pseudoflavors, if it has any. I encountered an interesting module loader loop when I tried to get the RPC client to invoke gss_mech_list_pseudoflavors() by name. This patch is a pre-requisite for server trunking discovery, and a pre-requisite for fixing up the in-kernel mount client to do better automatic security flavor selection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-16NFS: nfs_getaclargs.acl_len is a size_tChuck Lever
Squelch compiler warnings: fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’: fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get acl data", Dec 7, 2011. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>