summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-08-13Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - an NVMe fix from Gabriel, fixing a suspend/resume issue on some setups - addition of a few missing entries in the block queue sysfs documentation, from Joe - a fix for a sparse shadow warning for the bvec iterator, from Johannes - a writeback deadlock involving raid issuing barriers, and not flushing the plug when we wakeup the flusher threads. From Konstantin - a set of patches for the NVMe target/loop/rdma code, from Roland and Sagi * 'for-linus' of git://git.kernel.dk/linux-block: bvec: avoid variable shadowing warning doc: update block/queue-sysfs.txt entries nvme: Suspend all queues before deletion mm, writeback: flush plugged IO in wakeup_flusher_threads() nvme-rdma: Remove unused includes nvme-rdma: start async event handler after reconnecting to a controller nvmet: Fix controller serial number inconsistency nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads nvmet-rdma: Correctly handle RDMA device hot removal nvme-rdma: Make sure to shutdown the controller if we can nvme-loop: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Free the I/O tags when we delete the controller nvme-rdma: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Fix device removal handling nvme-rdma: Queue ns scanning after a sucessful reconnection nvme-rdma: Don't leak uninitialized memory in connect request private data
2016-08-12Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for races in the LOCK code which appear to go back to the big nfsd state lock removal from 3.17" * tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux: nfsd: don't return an unhashed lock stateid after taking mutex nfsd: Fix race between FREE_STATEID and LOCK nfsd: fix dentry refcounting on create
2016-08-12nfsd: don't return an unhashed lock stateid after taking mutexJeff Layton
nfsd4_lock will take the st_mutex before working with the stateid it gets, but between the time when we drop the cl_lock and take the mutex, the stateid could become unhashed (a'la FREE_STATEID). If that happens the lock stateid returned to the client will be forgotten. Fix this by first moving the st_mutex acquisition into lookup_or_create_lock_state. Then, have it check to see if the lock stateid is still hashed after taking the mutex. If it's not, then put the stateid and try the find/create again. Signed-off-by: Jeff Layton <jlayton@redhat.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-12Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: - Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user needs multiple different security services (e.g. krb5i and krb5p). - Stable patch to fix a regression introduced by the use of SO_REUSEPORT, and that prevented the use of multiple different NFS versions to the same server. - TCP socket reconnection timer fixes. - Patch from Neil to disable the use of IPv6 temporary addresses" * tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Cap the transport reconnection timer at 1/2 lease period NFSv4: Cleanup the setting of the nfs4 lease period SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout SUNRPC: Fix reconnection timeouts NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED SUNRPC: disable the use of IPv6 temporary addresses. SUNRPC: allow for upcalls for same uid but different gss service SUNRPC: Fix up socket autodisconnect SUNRPC: Handle EADDRNOTAVAIL on connection failures
2016-08-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "7 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats mm, oom: fix uninitialized ret in task_will_free_mem() kasan: remove the unnecessary WARN_ONCE from quarantine.c mm: memcontrol: fix memcg id ref counter on swap charge move mm: memcontrol: fix swap counter leak on swapout from offline cgroup proc, meminfo: use correct helpers for calculating LRU sizes in meminfo mm/hugetlb: fix incorrect hugepages count during mem hotplug
2016-08-11proc, meminfo: use correct helpers for calculating LRU sizes in meminfoMel Gorman
meminfo_proc_show() and si_mem_available() are using the wrong helpers for calculating the size of the LRUs. The user-visible impact is that there appears to be an abnormally high number of unevictable pages. Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-11Merge tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch for a NULL dereference bug introduced in 4.8-rc1 and a handful of static checker fixes" * tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client: ceph: initialize pathbase in the !dentry case in encode_caps_cb() rbd: nuke the 32-bit pool id check rbd: destroy header_oloc in rbd_dev_release() ceph: fix null pointer dereference in ceph_flush_snaps() libceph: using kfree_rcu() to simplify the code libceph: make cancel_generic_request() static libceph: fix return value check in alloc_msg_with_page_vector()
2016-08-11nfsd: Fix race between FREE_STATEID and LOCKChuck Lever
When running LTP's nfslock01 test, the Linux client can send a LOCK and a FREE_STATEID request at the same time. The outcome is: Frame 324 R OPEN stateid [2,O] Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64 Frame 115008 R LOCK stateid [1,L] Frame 115012 C WRITE stateid [0,L] offset 672000 len 64 Frame 115016 R WRITE NFS4_OK Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64 Frame 115022 R LOCKU NFS4_OK Frame 115025 C FREE_STATEID stateid [2,L] Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64 Frame 115029 R FREE_STATEID NFS4_OK Frame 115030 R LOCK stateid [3,L] Frame 115034 C WRITE stateid [0,L] offset 672128 len 64 Frame 115038 R WRITE NFS4ERR_BAD_STATEID In other words, the server returns stateid L in a successful LOCK reply, but it has already released it. Subsequent uses of stateid L fail. To address this, protect the generation check in nfsd4_free_stateid with the st_mutex. This should guarantee that only one of two outcomes occurs: either LOCK returns a fresh valid stateid, or FREE_STATEID returns NFS4ERR_LOCKS_HELD. Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Fix-suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-11nfsd: fix dentry refcounting on createJosef Bacik
b44061d0b9 introduced a dentry ref counting bug. Previously we were grabbing one ref to dchild in nfsd_create(), but with the creation of nfsd_create_locked() we have a ref for dchild from the lookup in nfsd_create(), and then another ref in nfsd_create_locked(). The ref from the lookup in nfsd_create() is never dropped and results in dentries still in use at unmount. Signed-off-by: Josef Bacik <jbacik@fb.com> Fixes: b44061d0b9 "nfsd: reorganize nfsd_create" Reported-by: kernel test robot <xiaolong.ye@intel.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-10Merge branch 'for-linus-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Some fixes for btrfs send/recv and fsync from Filipe and Robbie Ko. Bonus points to Filipe for already having xfstests in place for many of these" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() Btrfs: improve performance on fsync against new inode after rename/unlink Btrfs: be more precise on errors when getting an inode from disk Btrfs: send, don't bug on inconsistent snapshots Btrfs: send, avoid incorrect leaf accesses when sending utimes operations Btrfs: send, fix invalid leaf accesses due to incorrect utimes operations Btrfs: send, fix warning due to late freeing of orphan_dir_info structures Btrfs: incremental send, fix premature rmdir operations Btrfs: incremental send, fix invalid paths for rename operations Btrfs: send, add missing error check for calls to path_loop() Btrfs: send, fix failure to move directories with the same name around Btrfs: add missing check for writeback errors on fsync
2016-08-10mm, writeback: flush plugged IO in wakeup_flusher_threads()Konstantin Khlebnikov
I've found funny live-lock between raid10 barriers during resync and memory controller hard limits. Inside mpage_readpages() task holds on to its plug bio which blocks the barrier in raid10. Its memory cgroup have no free memory thus the task goes into reclaimer but all reclaimable pages are dirty and cannot be written because raid10 is rebuilding and stuck on the barrier. Common flush of such IO in schedule() never happens, because the caller doesn't go to sleep. Lock is 'live' because changing memory limit or killing tasks which holds that stuck bio unblock whole progress. That was what happened in 3.18.x but I see no difference in upstream logic. Theoretically this might happen even without memory cgroup. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-09mm: memcontrol: only mark charged pages with PageKmemcgVladimir Davydov
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg, which sets page->_mapcount to -512. Currently, we set/clear PageKmemcg in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated with __GFP_ACCOUNT, including those that aren't actually charged to any cgroup, i.e. allocated from the root cgroup context. To avoid overhead in case cgroups are not used, we only do that if memcg_kmem_enabled() is true. The latter is set iff there are kmem-enabled memory cgroups (online or offline). The root cgroup is not considered kmem-enabled. As a result, if a page is allocated with __GFP_ACCOUNT for the root cgroup when there are kmem-enabled memory cgroups and is freed after all kmem-enabled memory cgroups were removed, e.g. # no memory cgroups has been created yet, create one mkdir /sys/fs/cgroup/memory/test # run something allocating pages with __GFP_ACCOUNT, e.g. # a program using pipe dmesg | tail # remove the memory cgroup rmdir /sys/fs/cgroup/memory/test we'll get bad page state bug complaining about page->_mapcount != -1: BUG: Bad page state in process swapper/0 pfn:1fd945c page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0 flags: 0x1000000000000000() To avoid that, let's mark with PageKmemcg only those pages that are actually charged to and hence pin a non-root memory cgroup. Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths") Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-09ceph: initialize pathbase in the !dentry case in encode_caps_cb()Ilya Dryomov
pathbase is the base inode; set it to 0 if we've got no path. Coverity-id: 146348 Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
2016-08-08ceph: fix null pointer dereference in ceph_flush_snaps()Yan, Zheng
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-08-07block: rename bio bi_rw to bi_opfJens Axboe
Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07block/mm: make bdev_ops->rw_page() take a bool for read/writeJens Axboe
Commit abf545484d31 changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07Merge tag 'binfmt-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc Pull binfmt_misc update from James Bottomley: "This update is to allow architecture emulation containers to function such that the emulation binary can be housed outside the container itself. The container and fs parts both have acks from relevant experts. To use the new feature you have to add an F option to your binfmt_misc configuration" From the docs: "The usual behaviour of binfmt_misc is to spawn the binary lazily when the misc format file is invoked. However, this doesn't work very well in the face of mount namespaces and changeroots, so the F mode opens the binary as soon as the emulation is installed and uses the opened image to spawn the emulator, meaning it is always available once installed, regardless of how the environment changes" * tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc: binfmt_misc: add F option description to documentation binfmt_misc: add persistent opened binary handler for containers fs: add filp_clone_open API
2016-08-07fs: return EPERM on immutable inodeEryu Guan
In most cases, EPERM is returned on immutable inode, and there're only a few places returning EACCES. I noticed this when running LTP on overlayfs, setxattr03 failed due to unexpected EACCES on immutable inode. So converting all EACCES to EPERM on immutable inode. Acked-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-07Merge branch 'for-linus-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted cleanups and fixes. In the "trivial API change" department - ->d_compare() losing 'parent' argument" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: cachefiles: Fix race between inactivating and culling a cache object 9p: use clone_fid() 9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()" vfs: make dentry_needs_remove_privs() internal vfs: remove file_needs_remove_privs() vfs: fix deadlock in file_remove_privs() on overlayfs get rid of 'parent' argument of ->d_compare() cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare() affs ->d_compare(): don't bother with ->d_inode fold _d_rehash() and __d_rehash() together fold dentry_rcuwalk_invalidate() into its only remaining caller
2016-08-06Merge tag 'xfs-rmap-for-linus-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull more xfs updates from Dave Chinner: "This is the second part of the XFS updates for this merge cycle, and contains the new reverse block mapping feature for XFS. Reverse mapping allows us to track the owner of a specific block on disk precisely. It is implemented as a set of btrees (one per allocation group) that track the owners of allocated extents. Effectively it is a "used space tree" that is updated when we allocate or free extents. i.e. it is coherent with the free space btrees we already maintain and never overlaps with them. This reverse mapping infrastructure is the building block of several upcoming features - reflink, copy-on-write data, dedupe, online metadata and data scrubbing, highly accurate bad sector/data loss reporting to users, and significantly improved reconstruction of damaged and corrupted filesystems. There's a lot of new stuff coming along in the next couple of cycles,a nd it all builds in the rmap infrastructure. As such, it's a huge chunk of new code with new on-disk format features and internal infrastructure. It warns at mount time as an experimental feature and that it may eat data (as we do with all new on-disk features until they stabilise). We have not released userspace suport for it yet - userspace support currently requires download from Darrick's xfsprogs repo and build from source, so the access to this feature is really developer/tester only at this point. Initial userspace support will be released at the same time kernel with this code in it is released. The new rmap enabled code regresses 3 xfstests - all are ENOSPC related corner cases, one of which Darrick posted a fix for a few hours ago. The other two are fixed by infrastructure that is part of the upcoming reflink patchset. This new ENOSPC infrastructure requires a on-disk format tweak required to keep mount times in check - we need to keep an on-disk count of allocated rmapbt blocks so we don't have to scan the entire btrees at mount time to count them. This is currently being tested and will be part of the fixes sent in the next week or two so users will not be exposed to this change" * tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (52 commits) xfs: move (and rename) the deferred bmap-free tracepoints xfs: collapse single use static functions xfs: remove unnecessary parentheses from log redo item recovery functions xfs: remove the extents array from the rmap update done log item xfs: in btree_lshift, only allocate temporary cursor when needed xfs: remove unnecesary lshift/rshift key initialization xfs: remove the get*keys and update_keys btree ops pointers xfs: enable the rmap btree functionality xfs: don't update rmapbt when fixing agfl xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled xfs: add rmap btree block detection to log recovery xfs: add rmap btree geometry feature flag xfs: propagate bmap updates to rmapbt xfs: enable the xfs_defer mechanism to process rmaps to update xfs: log rmap intent items xfs: create rmap update intent log items xfs: add rmap btree insert and delete helpers xfs: convert unwritten status of reverse mappings xfs: remove an extent from the rmap btree xfs: add an extent to the rmap btree ...
2016-08-06Merge branch 'work.const-qstr' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull qstr constification updates from Al Viro: "Fairly self-contained bunch - surprising lot of places passes struct qstr * as an argument when const struct qstr * would suffice; it complicates analysis for no good reason. I'd prefer to feed that separately from the assorted fixes (those are in #for-linus and with somewhat trickier topology)" * 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: qstr: constify instances in adfs qstr: constify instances in lustre qstr: constify instances in f2fs qstr: constify instances in ext2 qstr: constify instances in vfat qstr: constify instances in procfs qstr: constify instances in fuse qstr constify instances in fs/dcache.c qstr: constify instances in nfs qstr: constify instances in ocfs2 qstr: constify instances in autofs4 qstr: constify instances in hfs qstr: constify instances in hfsplus qstr: constify instances in logfs qstr: constify dentry_init_security
2016-08-06Merge tag 'pstore-v4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fixes from Kees Cook: "Fixes for pstore ramoops driver to catch bad kfree() and to use better DT bindings" * tag 'pstore-v4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: ramoops: use persistent_ram_free() instead of kfree() for freeing prz ramoops: use DT reserved-memory bindings
2016-08-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Here's the second round of block updates for this merge window. It's a mix of fixes for changes that went in previously in this round, and fixes in general. This pull request contains: - Fixes for loop from Christoph - A bdi vs gendisk lifetime fix from Dan, worth two cookies. - A blk-mq timeout fix, when on frozen queues. From Gabriel. - Writeback fix from Jan, ensuring that __writeback_single_inode() does the right thing. - Fix for bio->bi_rw usage in f2fs from me. - Error path deadlock fix in blk-mq sysfs registration from me. - Floppy O_ACCMODE fix from Jiri. - Fix to the new bio op methods from Mike. One more followup will be coming here, ensuring that we don't propagate the block types outside of block. That, and a rename of bio->bi_rw is coming right after -rc1 is cut. - Various little fixes" * 'for-linus' of git://git.kernel.dk/linux-block: mm/block: convert rw_page users to bio op use loop: make do_req_filebacked more robust loop: don't try to use AIO for discards blk-mq: fix deadlock in blk_mq_register_disk() error path Include: blkdev: Removed duplicate 'struct request;' declaration. Fixup direct bi_rw modifiers block: fix bdi vs gendisk lifetime mismatch blk-mq: Allow timeouts to run while queue is freezing nbd: fix race in ioctl block: fix use-after-free in seq file f2fs: drop bio->bi_rw manual assignment block: add missing group association in bio-cloning functions blkcg: kill unused field nr_undestroyed_grps writeback: Write dirty times for WB_SYNC_ALL writeback floppy: fix open(O_ACCMODE) for ioctl-only open
2016-08-05NFSv4: Cap the transport reconnection timer at 1/2 lease periodTrond Myklebust
We don't want to miss a lease period renewal due to the TCP connection failing to reconnect in a timely fashion. To ensure this doesn't happen, cap the reconnection timer so that we retry the connection attempt at least every 1/2 lease period. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05NFSv4: Cleanup the setting of the nfs4 lease periodTrond Myklebust
Make a helper function nfs4_set_lease_period() and have nfs41_setup_state_renewal() and nfs4_do_fsinfo() use it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-05Merge branch 'integration-4.8' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.8
2016-08-05ramoops: use persistent_ram_free() instead of kfree() for freeing przHiraku Toyooka
persistent_ram_zone(=prz) structures are allocated by persistent_ram_new(), which includes vmap() or ioremap(). But they are currently freed by kfree(). This uses persistent_ram_free() for correct this asymmetry usage. Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.kw@hitachi.com> Cc: Mark Salyzyn <salyzyn@android.com> Cc: Seiji Aguchi <seiji.aguchi.tr@hitachi.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2016-08-05ramoops: use DT reserved-memory bindingsKees Cook
Instead of a ramoops-specific node, use a child node of /reserved-memory. This requires that of_platform_device_create() be explicitly called for the node, though, since "/reserved-memory" does not have its own "compatible" property. Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Rob Herring <robh@kernel.org>
2016-08-05NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKEDTrond Myklebust
We should handle those errors in the same way we handle the other stateid errors: by invalidating the faulty layout stateid. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2016-08-04Merge tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Highlights: - Trond made a change to the server's tcp logic that allows a fast client to better take advantage of high bandwidth networks, but may increase the risk that a single client could starve other clients; a new sunrpc.svc_rpc_per_connection_limit parameter should help mitigate this in the (hopefully unlikely) event this becomes a problem in practice. - Tom Haynes added a minimal flex-layout pnfs server, which is of no use in production for now--don't build it unless you're doing client testing or further server development" * tag 'nfsd-4.8' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd: remove some dead code in nfsd_create_locked() nfsd: drop unnecessary MAY_EXEC check from create nfsd: clean up bad-type check in nfsd_create_locked nfsd: remove unnecessary positive-dentry check nfsd: reorganize nfsd_create nfsd: check d_can_lookup in fh_verify of directories nfsd: remove redundant zero-length check from create nfsd: Make creates return EEXIST instead of EACCES SUNRPC: Detect immediate closure of accepted sockets SUNRPC: accept() may return sockets that are still in SYN_RECV nfsd: allow nfsd to advertise multiple layout types nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock nfsd/blocklayout: Make sure calculate signature/designator length aligned xfs: abstract block export operations from nfsd layouts SUNRPC: Remove unused callback xpo_adjust_wspace() SUNRPC: Change TCP socket space reservation SUNRPC: Add a server side per-connection limit SUNRPC: Micro optimisation for svc_data_ready SUNRPC: Call the default socket callbacks instead of open coding SUNRPC: lock the socket while detaching it ...
2016-08-04Merge branch 'for-linus-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This is part two of my btrfs pull, which is some cleanups and a batch of fixes. Most of the code here is from Jeff Mahoney, making the pointers we pass around internally more consistent and less confusing overall. I noticed a small problem right before I sent this out yesterday, so I fixed it up and re-tested overnight" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits) Btrfs: fix __MAX_CSUM_ITEMS btrfs: btrfs_abort_transaction, drop root parameter btrfs: add btrfs_trans_handle->fs_info pointer btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction btrfs: convert nodesize macros to static inlines btrfs: introduce BTRFS_MAX_ITEM_SIZE btrfs: cleanup, remove prototype for btrfs_find_root_ref btrfs: copy_to_sk drop unused root parameter btrfs: simpilify btrfs_subvol_inherit_props btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root btrfs: tests, require fs_info for root btrfs: tests, move initialization into tests/ btrfs: btrfs_test_opt and friends should take a btrfs_fs_info btrfs: prefix fsid to all trace events btrfs: plumb fs_info into btrfs_work btrfs: remove obsolete part of comment in statfs btrfs: hide test-only member under ifdef btrfs: Ratelimit "no csum found" info message btrfs: Add ratelimit to btrfs printing Btrfs: fix unexpected balance crash due to BUG_ON ...
2016-08-04Merge tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI/UBIFS updates from Richard Weinberger: "This contains mostly cleanups and minor improvements of UBI and UBIFS" * tag 'upstream-4.8-rc1' of git://git.infradead.org/linux-ubifs: ubi: Use bitmaps in Fastmap self-check code ubi: Be more paranoid while seaching for the most recent Fastmap ubi: Check whether the Fastmap anchor matches the super block ubi: Rework Fastmap attach base code ubi: Fix whitespace issue in count_fastmap_pebs() ubi: Introduce vol_ignored() ubi: Fix scan_fast() comment ubifs: switch_gc_head: Remove redondant sync of wbuf ubi: Make volume resize power cut aware ubi: Fix early logging ubi: gluebi: Fix double refcounting ubifs: Silence early error messages if MS_SILENT is set ubi: Fix race condition between ubi device creation and udev ubifs: Update comment for ubifs_errc ubi: Only read necessary size when reading the VID header ubifs: Make xattr structures static ubifs: Silence error output if MS_SILENT is set
2016-08-04Merge branch 'for-linus-4.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: "Beside of various fixes this also contains patches to enable features such was Kcov, kmemleak and TRACE_IRQFLAGS_SUPPORT on UML" * 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() um: Support kcov um: Enable TRACE_IRQFLAGS_SUPPORT um: Use asm-generic/irqflags.h um: Fix possible deadlock in sig_handler_common() um: Select HAVE_DEBUG_KMEMLEAK um: Setup physical memory in setup_arch() um: Eliminate null test after alloc_bootmem
2016-08-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu updates from Greg Ungerer: "This series is all about Nicolas flat format support for MMU systems. Traditional m68k no-MMU flat format binaries can now be run on m68k MMU enabled systems too. The series includes some nice cleanups of the binfmt_flat code and converts it to using proper user space accessor functions. With all this in place you can boot and run a complete no-MMU flat format based user space on an MMU enabled system" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: enable binfmt_flat on systems with an MMU binfmt_flat: allow compressed flat binary format to work on MMU systems binfmt_flat: add MMU-specific support binfmt_flat: update libraries' data segment pointer with userspace accessors binfmt_flat: use clear_user() rather than memset() to clear .bss binfmt_flat: use proper user space accessors with old relocs code binfmt_flat: use proper user space accessors with relocs processing code binfmt_flat: clean up create_flat_tables() and stack accesses binfmt_flat: use generic transfer_args_to_stack() elf_fdpic_transfer_args_to_stack(): make it generic binfmt_flat: prevent kernel dammage from corrupted executable headers binfmt_flat: convert printk invocations to their modern form binfmt_flat: assorted cleanups m68k: use same start_thread() on MMU and no-MMU m68k: fix file path comment m68k: fix bFLT executable running on MMU enabled systems
2016-08-04nfsd: remove some dead code in nfsd_create_locked()Dan Carpenter
We changed this around in f135af1041f ('nfsd: reorganize nfsd_create') so "dchild" can't be an error pointer any more. Also, dchild can't be NULL here (and dput would already handle this even if it was). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: drop unnecessary MAY_EXEC check from createJ. Bruce Fields
We need an fh_verify to make sure we at least have a dentry, but actual permission checks happen later. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: clean up bad-type check in nfsd_create_lockedJ. Bruce Fields
Minor cleanup, no change in behavior. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: remove unnecessary positive-dentry checkJ. Bruce Fields
vfs_{create,mkdir,mknod} each begin with a call to may_create(), which returns EEXIST if the object already exists. This check is therefore unnecessary. (In the NFSv2 case, nfsd_proc_create also has such a check. Contrary to RFC 1094, our code seems to believe that a CREATE of an existing file should succeed. I'm leaving that behavior alone.) Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: reorganize nfsd_createJ. Bruce Fields
There's some odd logic in nfsd_create() that allows it to be called with the parent directory either locked or unlocked. The only already-locked caller is NFSv2's nfsd_proc_create(). It's less confusing to split out the unlocked case into a separate function which the NFSv2 code can call directly. Also fix some comments while we're here. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: check d_can_lookup in fh_verify of directoriesJ. Bruce Fields
Create and other nfsd ops generally assume we can call lookup_one_len on inodes with S_IFDIR set. Al says that this assumption isn't true in general, though it should be for the filesystem objects nfsd sees. Add a check just to make sure our assumption isn't violated. Remove a couple checks for i_op->lookup in create code. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: remove redundant zero-length check from createJ. Bruce Fields
lookup_one_len already has this check. The only effect of this patch is to return access instead of perm in the 0-length-filename case. I actually prefer nfserr_perm (or _inval?), but I doubt anyone cares. The isdotent check seems redundant too, but I worry that some client might actually care about that strange nfserr_exist error. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04nfsd: Make creates return EEXIST instead of EACCESOleg Drokin
When doing a create (mkdir/mknod) on a name, it's worth checking the name exists first before returning EACCES in case the directory is not writeable by the user. This makes return values on the client more consistent regardless of whenever the entry there is cached in the local cache or not. Another positive side effect is certain programs only expect EEXIST in that case even despite POSIX allowing any valid error to be returned. Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-08-04mm/block: convert rw_page users to bio op useMike Christie
The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixes: 4e1b2d52a80d ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04Fixup direct bi_rw modifiersShaun Tancheff
bi_rw should be using bio_set_op_attrs to set bi_rw. Signed-off-by: Shaun Tancheff <shaun@tancheff.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04f2fs: drop bio->bi_rw manual assignmentJens Axboe
Merge 4fc29c1aa375 included this extra line, but it's not needed (or useful) since we'll bio_set_op_attrs() right after to properly set the op and flags for the bio. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04block: add missing group association in bio-cloning functionsPaolo Valente
When a bio is cloned, the newly created bio must be associated with the same blkcg as the original bio (if BLK_CGROUP is enabled). If this operation is not performed, then the new bio is not associated with any group, and the group of the current task is returned when the group of the bio is requested. Depending on the cloning frequency, this may cause a large percentage of the bios belonging to a given group to be treated as if belonging to other groups (in most cases as if belonging to the root group). The expected group isolation may thereby be broken. This commit adds the missing association in bio-cloning functions. Fixes: da2f0f74cf7d ("Btrfs: add support for blkio controllers") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Reviewed-by: Nikolay Borisov <kernel@kyup.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04writeback: Write dirty times for WB_SYNC_ALL writebackJan Kara
Currently we take care to handle I_DIRTY_TIME in vfs_fsync() and queue_io() so that inodes which have only dirty timestamps are properly written on fsync(2) and sync(2). However there are other call sites - most notably going through write_inode_now() - which expect inode to be clean after WB_SYNC_ALL writeback. This is not currently true as we do not clear I_DIRTY_TIME in __writeback_single_inode() even for WB_SYNC_ALL writeback in all the cases. This then resulted in the following oops because bdev_write_inode() did not clean the inode and writeback code later stumbled over a dirty inode with detached wb. general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN Modules linked in: CPU: 3 PID: 32 Comm: kworker/u10:1 Not tainted 4.6.0-rc3+ #349 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-11:0) task: ffff88006ccf1840 ti: ffff88006cda8000 task.ti: ffff88006cda8000 RIP: 0010:[<ffffffff818884d2>] [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750 RSP: 0018:ffff88006cdaf7d0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88006ccf2050 RDX: 0000000000000000 RSI: 000000114c8a8484 RDI: 0000000000000286 RBP: ffff88006cdaf820 R08: ffff88006ccf1840 R09: 0000000000000000 R10: 000229915090805f R11: 0000000000000001 R12: ffff88006a72f5e0 R13: dffffc0000000000 R14: ffffed000d4e5eed R15: ffffffff8830cf40 FS: 0000000000000000(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000003301bf8 CR3: 000000006368f000 CR4: 00000000000006e0 DR0: 0000000000001ec9 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600 Stack: ffff88006a72f680 ffff88006a72f768 ffff8800671230d8 03ff88006cdaf948 ffff88006a72f668 ffff88006a72f5e0 ffff8800671230d8 ffff88006cdaf948 ffff880065b90cc8 ffff880067123100 ffff88006cdaf970 ffffffff8188e12e Call Trace: [< inline >] inode_to_wb_and_lock_list fs/fs-writeback.c:309 [<ffffffff8188e12e>] writeback_sb_inodes+0x4de/0x1250 fs/fs-writeback.c:1554 [<ffffffff8188efa4>] __writeback_inodes_wb+0x104/0x1e0 fs/fs-writeback.c:1600 [<ffffffff8188f9ae>] wb_writeback+0x7ce/0xc90 fs/fs-writeback.c:1709 [< inline >] wb_do_writeback fs/fs-writeback.c:1844 [<ffffffff81891079>] wb_workfn+0x2f9/0x1000 fs/fs-writeback.c:1884 [<ffffffff813bcd1e>] process_one_work+0x78e/0x15c0 kernel/workqueue.c:2094 [<ffffffff813bdc2b>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2228 [<ffffffff813cdeef>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303 [<ffffffff867bc5d2>] ret_from_fork+0x22/0x50 arch/x86/entry/entry_64.S:392 Code: 05 94 4a a8 06 85 c0 0f 85 03 03 00 00 e8 07 15 d0 ff 41 80 3e 00 0f 85 64 06 00 00 49 8b 9c 24 88 01 00 00 48 89 d8 48 c1 e8 03 <42> 80 3c 28 00 0f 85 17 06 00 00 48 8b 03 48 83 c0 50 48 39 c3 RIP [< inline >] wb_get include/linux/backing-dev-defs.h:212 RIP [<ffffffff818884d2>] locked_inode_to_wb_and_lock_list+0xa2/0x750 fs/fs-writeback.c:281 RSP <ffff88006cdaf7d0> ---[ end trace 986a4d314dcb2694 ]--- Fix the problem by making sure __writeback_single_inode() writes inode only with dirty times in WB_SYNC_ALL mode. Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-04block: remove BLK_DEV_DAX config optionRoss Zwisler
The functionality for block device DAX was already removed with commit acc93d30d7d4 ("Revert "block: enable dax for raw block devices"") However, we still had a config option hanging around that was always disabled because it depended on CONFIG_BROKEN. This config option was introduced in commit 03cdadb04077 ("block: disable block device DAX by default") This change reverts that commit, removing the dead config option. Link: http://lkml.kernel.org/r/20160729182314.6368-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-03hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()Dan Carpenter
We can't pass error pointers to kfree() or it causes an oops. Fixes: 52b209f7b848 ('get rid of hostfs_read_inode()') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2016-08-03Btrfs: fix __MAX_CSUM_ITEMSChris Mason
Jeff Mahoney's cleanup commit (14a1e067b4) wasn't correct for csums on machines where the pagesize >= metadata blocksize. This just reverts the relevant hunks to bring the old math back. Signed-off-by: Chris Mason <clm@fb.com>