Age | Commit message | Author
2014-02-14 | kernfs: fix kernfs_node_from_dentry() | Li Zefan
Currently kernfs_node_from_dentry() returns NULL for the root dentry, because root_dentry->d_op == NULL. Due to this bug, cgroupstats_build() returns -EINVAL for the root cgroup.
  # mount -t cgroup -o cpuacct /cgroup
  # Documentation/accounting/getdelays -C /cgroup
  fatal reply error, errno -22
With this fix:
  # Documentation/accounting/getdelays -C /cgroup
  sleeping 305, blocked 0, running 1, stopped 0, uninterruptible 1
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
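A minimal userspace model of this failure mode, assuming (as the description implies) that kernfs dentries were recognized by their d_op pointer, which the root dentry lacks, and that a fix can instead key off the owning superblock's operations. Every struct and variable name below is a hypothetical stand-in, not the kernel's definition.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's VFS types. */
struct super_operations_model { int unused; };
struct dentry_operations_model { int unused; };

struct super_block_model {
	const struct super_operations_model *s_op;
};

struct dentry_model {
	const struct dentry_operations_model *d_op; /* NULL for the root dentry */
	struct super_block_model *d_sb;
	void *d_fsdata;                              /* would hold the kernfs_node */
};

static const struct super_operations_model kernfs_sops_model;
static const struct dentry_operations_model kernfs_dops_model;

/* Buggy check: the root dentry has no d_op, so it is never recognized. */
static void *node_from_dentry_by_dop(const struct dentry_model *d)
{
	return d->d_op == &kernfs_dops_model ? d->d_fsdata : NULL;
}

/* Sketch of a fix: every dentry, root included, sits on a kernfs superblock. */
static void *node_from_dentry_by_sb(const struct dentry_model *d)
{
	return d->d_sb->s_op == &kernfs_sops_model ? d->d_fsdata : NULL;
}
```

The superblock-based test works for the root because the root dentry belongs to the same superblock as every other dentry, even though it carries no dentry operations.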
2014-02-14 | ACPI / platform: drop redundant ACPI_HANDLE check | Josh Cartwright
The acpi_dev_pm_attach/_detach functions perform their own checks to ensure the device has an ACPI companion. It is not necessary for the caller to do so. This mirrors what other busses with ACPI dev PM support do (i2c, spi, sdio).
Cc: Len Brown <lenb@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Josh Cartwright <joshc@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-11 | Merge branch 'master' into driver-core-next-test-merge-rc2 | Tejun Heo
da9846ae1518 ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag") in driver-core-linus conflicts with kernfs_drain() updates in driver-core-next. The former just adds the missing KERNFS_LOCKDEP checks, which are already handled by kernfs_lockdep() checks in driver-core-next. The conflict can be resolved by taking code from driver-core-next.
Conflicts:
  fs/kernfs/dir.c
2014-02-11 | kernfs: fix hash calculation in kernfs_rename_ns() | Tejun Heo
3eef34ad7dc3 ("kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends") restructured kernfs_rename_ns() such that new name assignment happens under kernfs_rename_lock; unfortunately, it mistakenly passed NULL to kernfs_name_hash() to calculate the new hash if the name hasn't changed, which can lead to an oops. Fix it by using kn->name and kn->ns when calculating the new hash.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
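The shape of the fix can be sketched in a small standalone model: when the name is unchanged, the "new name" is NULL, so the hash must be computed from what the node actually holds afterwards. The node struct, `rename_model()`, and `name_hash_model()` (an FNV-1a variant) are hypothetical; the real kernfs_name_hash() uses a different function.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified node: only the fields the hash depends on. */
struct node_model {
	const char *name;
	const void *ns;
	uint32_t hash;
};

/* Illustrative name+namespace hash (FNV-1a), NOT the kernel's kernfs_name_hash(). */
static uint32_t name_hash_model(const char *name, const void *ns)
{
	uint32_t h = 2166136261u;

	for (const char *p = name; *p; p++) {   /* dereferences name: NULL would crash */
		h ^= (unsigned char)*p;
		h *= 16777619u;
	}
	return h ^ (uint32_t)(uintptr_t)ns;
}

/* Rename sketch: new_name becomes NULL when the name does not change. */
static void rename_model(struct node_model *kn, const char *new_name, const void *new_ns)
{
	if (new_name && strcmp(kn->name, new_name) == 0)
		new_name = NULL;          /* name unchanged: keep the old string */
	if (new_name)
		kn->name = new_name;
	kn->ns = new_ns;
	/* The fix: hash what the node now holds, never the possibly-NULL new_name. */
	kn->hash = name_hash_model(kn->name, kn->ns);
}
```

A namespace-only move (same name, different `ns`) is exactly the case where `new_name` is NULL, and it still yields a correct hash here.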
2014-02-10 | Linux 3.14-rc2 | Linus Torvalds
2014-02-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security | Linus Torvalds
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  SELinux: Fix kernel BUG on empty security contexts.
  selinux: add SOCK_DIAG_BY_FAMILY to the list of netlink message types
2014-02-10 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull vfs fixes from Al Viro:
  "A couple of fixes, both -stable fodder. The O_SYNC bug is fairly old..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a kmap leak in virtio_console
  fix O_SYNC|O_APPEND syncing the wrong range on write()
2014-02-10 | Merge branch 'stable-3.14' of git://git.infradead.org/users/pcmoore/selinux into for-linus | James Morris
2014-02-09 | fix a kmap leak in virtio_console | Al Viro
While we are at it, don't do kmap() under kmap_atomic(), *especially* for a page we'd allocated with GFP_KERNEL. It's spelled "page_address", and had that been more than that, we'd be in real trouble - kmap_high() can block, and doing that while holding kmap_atomic() is a Bad Idea(tm).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-02-09 | fix O_SYNC|O_APPEND syncing the wrong range on write() | Al Viro
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support) when sync_page_range() was introduced; generic_file_write{,v}() correctly synced pos_after_write - written .. pos_after_write - 1, but generic_file_aio_write() synced pos_before_write .. pos_before_write + written - 1 instead. With O_APPEND, obviously, those are not the same thing. A couple of years later, the correct variant was killed off when everything switched to generic_file_aio_write(). All users of generic_file_aio_write() are affected, and the same bug has been copied into other instances of ->aio_write(). The fix is trivial; the only subtle point is that generic_write_sync() ought to be inlined to avoid calculations that are useless for the majority of calls.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
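The two range formulas can be compared directly. This sketch assumes a 100-byte O_APPEND write to a 4096-byte file through a descriptor whose position was 0, so the data actually lands at EOF; `struct sync_range` and both helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

/* Inclusive byte range [start, end] that an O_SYNC write must flush. */
struct sync_range {
	off_t start;
	off_t end;
};

/* Buggy variant: anchored at the position the caller passed in. With O_APPEND
 * the data is written at EOF instead, so this can flush the wrong bytes. */
static struct sync_range range_from_pos_before(off_t pos_before, size_t written)
{
	struct sync_range r = { pos_before, pos_before + (off_t)written - 1 };
	return r;
}

/* Correct variant: anchored at where the write actually ended. */
static struct sync_range range_from_pos_after(off_t pos_after, size_t written)
{
	struct sync_range r = { pos_after - (off_t)written, pos_after - 1 };
	return r;
}
```

Deriving the range from the end position works regardless of where O_APPEND moved the write, because the position after the write is always the true end of the written data.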
2014-02-09 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs | Linus Torvalds
Pull btrfs fixes from Chris Mason:
  "This is a small collection of fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix data corruption when reading/updating compressed extents
  Btrfs: don't loop forever if we can't run because of the tree mod log
  btrfs: reserve no transaction units in btrfs_ioctl_set_features
  btrfs: commit transaction after setting label and features
  Btrfs: fix assert screwup for the pending move stuff
2014-02-09 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull perf fixes from Ingo Molnar:
  "Tooling fixes, mostly related to the KASLR fallout, but also other fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf buildid-cache: Check relocation when checking for existing kcore
  perf tools: Adjust kallsyms for relocated kernel
  perf tests: No need to set up ref_reloc_sym
  perf symbols: Prevent the use of kcore if the kernel has moved
  perf record: Get ref_reloc_sym from kernel map
  perf machine: Set up ref_reloc_sym in machine__create_kernel_maps()
  perf machine: Add machine__get_kallsyms_filename()
  perf tools: Add kallsyms__get_function_start()
  perf symbols: Fix symbol annotation for relocated kernel
  perf tools: Fix include for non x86 architectures
  perf tools: Fix AAAAARGH64 memory barriers
  perf tools: Demangle kernel and kernel module symbols too
  perf/doc: Remove mention of non-existent set_perf_event_pending() from design.txt
2014-02-09 | Btrfs: fix data corruption when reading/updating compressed extents | Filipe David Borba Manana
When using a mix of compressed file extents and prealloc extents, it is possible to fill a page of a file with random, garbage data from some unrelated previous use of the page, instead of a sequence of zeroes. A simple sequence of steps to get into such a case, taken from the test case I made for xfstests, is:
  _scratch_mkfs
  _scratch_mount "-o compress-force=lzo"
  $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
  $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
  $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
This results in the following file items in the fs tree:
  item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
      inode generation 6 transid 6 size 542872 block group 0 mode 100600
  item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
      inode ref index 2 namelen 6 name: foobar
  item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
      extent data disk byte 0 nr 0 gen 6
      extent data offset 0 nr 24576 ram 266240
      extent compression 0
  item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
      prealloc data disk byte 12849152 nr 241664 gen 6
      prealloc data offset 0 nr 241664
  item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
      extent data disk byte 12845056 nr 4096 gen 6
      extent data offset 0 nr 20480 ram 20480
      extent compression 2
  item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
      prealloc data disk byte 13090816 nr 405504 gen 6
      prealloc data offset 0 nr 258048
The on-disk extent at offset 266240 (which corresponds to 1 single disk block) contains 5 compressed chunks of file data. Each of the first 4 compresses 4096 bytes of file data, while the last one only compresses 3024 bytes of file data. Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 = 1072 bytes) should always return zeroes (our next extent is a prealloc one).
The solution here is for the compression code path to zero the remaining (untouched) bytes of the last page it uncompressed data into, as the information about how much space the file data consumes in the last page is not known in the upper layer fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing the remainder of the page, but only if it corresponds to the last page of the inode and if the inode's size is not a multiple of the page size. This would cause not only returning random data on reads, but also permanently storing random data when updating parts of the region that should be zeroed. For the example above, it means updating a single byte in the region [285648 ; 286720[ would store that byte correctly but also store random data on disk.
A test case for xfstests follows soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
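The arithmetic behind the example, together with the tail-zeroing idea, can be sketched in a few lines: 4 * 4096 + 3024 = 19408 bytes of file data, so valid data ends at 266240 + 19408 = 285648, and the last page has 4096 - 3024 = 1072 bytes that nobody wrote. `zero_tail_of_last_page()` and `PAGE_SIZE_MODEL` are hypothetical names assuming 4 KiB pages; this is not the btrfs decompression code.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_MODEL 4096u   /* assumed page size */

/* After decompressing bytes_out bytes of file data, zero the part of the last
 * page the decompressor did not write, so stale contents from a previous user
 * of the page can never reach readers or be written back to disk. */
static void zero_tail_of_last_page(unsigned char *last_page, size_t bytes_out)
{
	size_t used = bytes_out % PAGE_SIZE_MODEL;   /* bytes used in the last page */

	if (used)                                     /* 0 means the page is full */
		memset(last_page + used, 0, PAGE_SIZE_MODEL - used);
}
```

Doing this in the compression path matches the reasoning above: only that layer knows how many bytes of the last page it actually filled.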
2014-02-09 | Btrfs: don't loop forever if we can't run because of the tree mod log | Josef Bacik
A user reported a 100% CPU hang with my new delayed ref code. Turns out I forgot to increase the count check when we can't run a delayed ref because of the tree mod log. If we can't run any delayed refs during this, there is no point in continuing to look, and we need to break out. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
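The "break out when nothing can run" rule can be illustrated generically. This is a hypothetical model, not the btrfs delayed-ref code: `run_pending_refs()` is invented, and the `blocked` array stands in for refs that are held back (for example by the tree mod log).

```c
#include <stdbool.h>
#include <stddef.h>

/* Each pass tries every pending ref; runnable ones are marked done. Without
 * the progress check, a pass in which every pending ref is blocked would
 * simply repeat forever. Returns the number of refs that ran. */
static size_t run_pending_refs(const bool *blocked, bool *done, size_t nr)
{
	size_t total = 0;

	for (;;) {
		size_t ran = 0, pending = 0;

		for (size_t i = 0; i < nr; i++) {
			if (done[i])
				continue;
			pending++;
			if (blocked[i])
				continue;      /* cannot run now: skip it */
			done[i] = true;
			ran++;
		}
		total += ran;
		if (pending == 0 || ran == 0)
			break;                 /* nothing left, or no progress possible */
	}
	return total;
}
```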
2014-02-09 | btrfs: reserve no transaction units in btrfs_ioctl_set_features | David Sterba
The superblock modifications added in patch "btrfs: add ioctls to query/change feature bits online" don't need to reserve metadata blocks when starting a transaction.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-09 | btrfs: commit transaction after setting label and features | Jeff Mahoney
The set_fslabel ioctl uses btrfs_end_transaction, which means it's possible that the change will be lost if the system crashes, same for the newly set features. Let's use btrfs_commit_transaction instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
2014-02-09 | Btrfs: fix assert screwup for the pending move stuff | Josef Bacik
Wang noticed that he was failing btrfs/030 even though Filipe and I couldn't reproduce it. Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set, which meant that a key part of Filipe's original patch was not being built in. This appears to be a mess-up from merging Filipe's patch, as it does not exist in his original patch. Fix this by changing how we make sure del_waiting_dir_move asserts that it did not error, and take the function out of the ifdef check. This makes btrfs/030 pass with the assert on or off. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Filipe Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
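The general pitfall (a call with a required side effect that only runs when an assert option is enabled) can be shown in a standalone sketch. It does not reproduce the btrfs source: `CONFIG_MODEL_ASSERT`, `MODEL_ASSERT()`, and `del_waiting_move_model()` are hypothetical names, assuming a helper whose side effect must happen regardless of the assert configuration.

```c
#include <assert.h>

/* Assumed config switch; remove this line to model the assert option being off. */
#define CONFIG_MODEL_ASSERT 1

#ifdef CONFIG_MODEL_ASSERT
#define MODEL_ASSERT(expr) assert(expr)
#else
#define MODEL_ASSERT(expr) ((void)0)   /* expr is never evaluated */
#endif

static int waiting_moves = 1;   /* hypothetical state the helper must update */

/* Hypothetical helper with a side effect that must always happen. */
static int del_waiting_move_model(void)
{
	if (waiting_moves == 0)
		return -1;
	waiting_moves--;
	return 0;
}

/* Fragile: with asserts compiled out, the helper is never called at all. */
void finish_move_fragile(void)
{
	MODEL_ASSERT(del_waiting_move_model() == 0);
}

/* Robust: always call the helper and assert only on its result. */
void finish_move_robust(void)
{
	int ret = del_waiting_move_model();

	MODEL_ASSERT(ret == 0);
	(void)ret;   /* avoids an unused-variable warning when asserts are off */
}
```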
2014-02-08 | Merge tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl | Linus Torvalds
Pull pinctrl fixes from Linus Walleij:
  "First round of pin control fixes for v3.14:
   - Protect pinctrl_list_add() with the proper mutex. This was identified by RedHat; it caused nasty locking warnings and was root-caused by Stanislaw Gruszka.
   - Avoid adding dangerous debugfs files when either half of the subsystem is unused: pinmux or pinconf.
   - Various fixes to various drivers: locking, hardware particulars, DT parsing, error codes"
* tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: tegra: return correct error type
  pinctrl: do not init debugfs entries for unimplemented functionalities
  pinctrl: protect pinctrl_list add
  pinctrl: sirf: correct the pin index of ac97_pins group
  pinctrl: imx27: fix offset calculation in imx_read_2bit
  pinctrl: vt8500: Change devicetree data parsing
  pinctrl: imx27: fix wrong offset to ICONFB
  pinctrl: at91: use locked variant of irq_set_handler
2014-02-08 | Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull irq fix from Thomas Gleixner:
  "Add a missing Kconfig dependency"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Generic irq chip requires IRQ_DOMAIN
2014-02-08 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds
Pull x86 fixes from Peter Anvin:
  "Quite a varied little collection of fixes. Most of them are relatively small or isolated; the biggest one is Mel Gorman's fixes for TLB range flushing. A couple of AMD-related fixes (including not crashing when given an invalid microcode image) and a fix for a crash when compiled with gcov"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data
2014-02-08 | Merge tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy | Linus Torvalds
Pull jfs fix from David Kleikamp:
  "Fix regression"
* tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix generic posix ACL regression
2014-02-08 | jfs: fix generic posix ACL regression | Dave Kleikamp
I missed a couple of errors in reviewing the patches converting jfs to use the generic posix ACL function. Setting ACLs currently fails with -EOPNOTSUPP.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-02-08 | watchdog: dw_wdt: Add dependency on HAS_IOMEM | Richard Weinberger
On archs like S390 or um, this driver can neither build nor work. Make it depend on HAS_IOMEM to avoid build failures.
  drivers/built-in.o: In function `dw_wdt_drv_probe':
  drivers/watchdog/dw_wdt.c:302: undefined reference to `devm_ioremap_resource'
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
2014-02-08 | kernfs: add CONFIG_KERNFS | Tejun Heo
As sysfs was kernfs's only user, kernfs has been piggybacking on CONFIG_SYSFS; however, kernfs is scheduled to grow a new user very soon. Introduce a separate config option CONFIG_KERNFS which is to be selected by kernfs users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-08 | sysfs, kobject: add sysfs wrapper for kernfs_enable_ns() | Tejun Heo
Currently, kobject is invoking kernfs_enable_ns() directly. This is fine now as sysfs and kernfs are enabled and disabled together. If sysfs is disabled, kernfs_enable_ns() is switched to a dummy implementation too and everything is fine; however, kernfs will soon have its own config option CONFIG_KERNFS and !SYSFS && KERNFS will be possible, which can make kobject call into the non-dummy kernfs_enable_ns() with NULL kernfs_node pointers, leading to an oops. Introduce sysfs_enable_ns(), a wrapper around kernfs_enable_ns(), so that it can be made a noop depending only on CONFIG_SYSFS regardless of the planned CONFIG_KERNFS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
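A standalone sketch of the wrapper pattern: the wrapper's no-op variant depends only on the sysfs switch, so a caller without a node never reaches the real kernfs code even when kernfs itself is built. `CONFIG_SYSFS_MODEL`, `kernfs_enable_ns_model()`, and `sysfs_enable_ns_model()` are hypothetical names modelling the !SYSFS && KERNFS configuration.

```c
#include <stddef.h>

/* Assumed configuration for this sketch: sysfs off, kernfs on.
 * Uncomment to model CONFIG_SYSFS=y. */
/* #define CONFIG_SYSFS_MODEL 1 */

struct kernfs_node_model { int ns_enabled; };

int kernfs_calls;   /* counts calls that reached the real kernfs code */

/* Non-dummy kernfs implementation: dereferences kn, so NULL would oops. */
void kernfs_enable_ns_model(struct kernfs_node_model *kn)
{
	kernfs_calls++;
	kn->ns_enabled = 1;
}

#ifdef CONFIG_SYSFS_MODEL
static inline void sysfs_enable_ns_model(struct kernfs_node_model *kn)
{
	kernfs_enable_ns_model(kn);
}
#else
/* With sysfs disabled, kobjects have no kernfs_node: the wrapper does nothing. */
static inline void sysfs_enable_ns_model(struct kernfs_node_model *kn)
{
	(void)kn;
}
#endif
```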
2014-02-08 | kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends | Tejun Heo
kernfs_node->parent and ->name are currently marked as "published", indicating that kernfs users may access them directly; however, those fields may get updated by kernfs_rename[_ns]() and unrestricted access may lead to erroneous values or an oops. Protect ->parent and ->name updates with an irq-safe spinlock, kernfs_rename_lock, and implement the following accessors for these fields.
* kernfs_name() - format the node's name into the specified buffer
* kernfs_path() - format the node's path into the specified buffer
* pr_cont_kernfs_name() - pr_cont a node's name (doesn't need buffer)
* pr_cont_kernfs_path() - pr_cont a node's path (doesn't need buffer)
* kernfs_get_parent() - pin and return a node's parent
All can be called under any context. The recursive sysfs_pathname() in fs/sysfs/dir.c is replaced with kernfs_path(), and sysfs_rename_dir_ns() is updated to use kernfs_get_parent() instead of dereferencing parent directly.
v2: The dummy definition of kernfs_path() for !CONFIG_KERNFS was missing static inline, making it cause a lot of build warnings. Add it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
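A userspace sketch of a kernfs_path()-style accessor: the whole parent walk happens under the same lock that renames take, so the result never mixes an old parent with a new name. The `knode` struct, the C11 `atomic_flag` spinlock (a plain spinlock with no irq handling, standing in for kernfs_rename_lock), and every function name are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

struct knode {
	const char *name;
	struct knode *parent;   /* NULL for the root */
};

static atomic_flag rename_lock_model = ATOMIC_FLAG_INIT;

static void lock_model(void)
{
	while (atomic_flag_test_and_set_explicit(&rename_lock_model, memory_order_acquire))
		;   /* spin */
}

static void unlock_model(void)
{
	atomic_flag_clear_explicit(&rename_lock_model, memory_order_release);
}

/* Rename under the lock so readers never see a half-updated parent/name pair. */
static void rename_node_model(struct knode *kn, struct knode *new_parent, const char *new_name)
{
	lock_model();
	kn->parent = new_parent;
	kn->name = new_name;
	unlock_model();
}

/* Format kn's path ("/a/b") into buf; returns buf, or NULL if it doesn't fit. */
static char *path_model(const struct knode *kn, char *buf, size_t buflen)
{
	const struct knode *k;
	size_t need = 1;   /* terminating NUL */
	char *p;

	lock_model();
	for (k = kn; k->parent; k = k->parent)
		need += strlen(k->name) + 1;   /* "/" + name for each non-root level */
	if (need == 1)
		need = 2;                      /* the root itself is "/" */
	if (need > buflen) {
		unlock_model();
		return NULL;
	}
	p = buf + need - 1;
	*p = '\0';
	for (k = kn; k->parent; k = k->parent) {   /* fill from the end backwards */
		size_t len = strlen(k->name);

		p -= len;
		memcpy(p, k->name, len);
		*--p = '/';
	}
	if (p != buf)
		*--p = '/';                    /* only reached for the root */
	unlock_model();
	return buf;
}
```

Measuring and filling within one critical section is what makes the length computed in the first loop valid for the second.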
2014-02-08 | kernfs: implement kernfs_node_from_dentry(), kernfs_root_from_sb() and kernfs_rename() | Tejun Heo
Implement helpers to determine node from dentry and root from super_block. Also add a kernfs_rename_ns() wrapper which assumes NULL namespace. These generally make sense and will be used by cgroup.
v2: Some dummy implementations for !CONFIG_SYSFS were missing. Fixed. Reported by kbuild test robot.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-08 | kernfs: add kernfs_open_file->priv | Tejun Heo
Add a private data field to be used by kernfs file operations. This generally makes sense and will be used by cgroup.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | kernfs: implement kernfs_ops->atomic_write_len | Tejun Heo
A write to a kernfs_node is buffered through a kernel buffer. Writes <= PAGE_SIZE are performed atomically, while larger ones are executed in PAGE_SIZE chunks. While this is enough for sysfs, cgroup, which is scheduled to be converted to use kernfs, needs a bit more control over it.
This patch adds kernfs_ops->atomic_write_len. If not set (zero), the behavior stays the same. If set, writes up to the size are executed atomically and larger writes are rejected with -E2BIG.
A different implementation strategy would be to allow configuring the chunking size while making the original write size available to the write method; however, such a strategy, while being more complicated, doesn't really buy anything. If the write implementation has to handle chunking, the specific chunk size shouldn't matter all that much.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
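The two write policies can be modelled side by side. This is a sketch under stated assumptions: a 4 KiB page, a hypothetical `write_op_model()` handler that records its calls, and an internal loop standing in for the sequence of page-sized writes the default path ends up performing; none of these names are kernfs API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

#define PAGE_SIZE_MODEL 4096u   /* assumed page size */

static size_t calls, last_len;

/* Hypothetical per-file write method; records how it was invoked. */
static ssize_t write_op_model(const char *buf, size_t len)
{
	(void)buf;
	calls++;
	last_len = len;
	return (ssize_t)len;
}

/* atomic_write_len == 0: old behaviour, PAGE_SIZE chunks.
 * atomic_write_len > 0: one call up to that size, -E2BIG beyond it. */
static ssize_t kernfs_write_model(const char *buf, size_t len, size_t atomic_write_len)
{
	size_t done = 0;

	if (atomic_write_len) {
		if (len > atomic_write_len)
			return -E2BIG;
		return write_op_model(buf, len);
	}
	while (done < len) {
		size_t left = len - done;
		size_t chunk = left < PAGE_SIZE_MODEL ? left : PAGE_SIZE_MODEL;
		ssize_t ret = write_op_model(buf + done, chunk);

		if (ret <= 0)
			return done ? (ssize_t)done : ret;
		done += (size_t)ret;
	}
	return (ssize_t)done;
}
```

With the default policy a 10000-byte write reaches the handler as 4096 + 4096 + 1808 bytes; with `atomic_write_len` set the handler either sees the whole buffer at once or is never called.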
2014-02-07 | kernfs: allow nodes to be created in the deactivated state | Tejun Heo
Currently, kernfs_nodes are made visible to userland on creation, which makes it difficult for kernfs users to atomically succeed or fail creation of multiple nodes. In addition, if something fails after creating some nodes, the created nodes might already be in use and their active refs need to be drained for removal, which has the potential to introduce tricky reverse locking dependency on active_ref depending on how the error path is synchronized.
This patch introduces the per-root flag KERNFS_ROOT_CREATE_DEACTIVATED. If set, all nodes under the root are created in the deactivated state and stay invisible to userland until explicitly enabled by the new kernfs_activate() API. Also, nodes which have never been activated are guaranteed to bypass draining on removal, thus allowing error paths to not worry about locking dependency on active_ref draining.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
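The described lifecycle (create hidden, activate a whole subtree at once, skip draining for never-activated nodes) can be modelled with two booleans per node. The flag value, `struct node_m`, and all function names are hypothetical; real kernfs tracks this state differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define ROOT_CREATE_DEACTIVATED_MODEL 0x1u   /* hypothetical per-root flag */

struct node_m {
	struct node_m *first_child, *next_sibling;
	bool active;          /* visible to userland */
	bool ever_activated;  /* never-activated nodes skip draining on removal */
};

/* Creation: start deactivated only if the root asks for it. */
static void node_init_model(struct node_m *kn, unsigned int root_flags)
{
	kn->first_child = kn->next_sibling = NULL;
	kn->active = !(root_flags & ROOT_CREATE_DEACTIVATED_MODEL);
	kn->ever_activated = kn->active;
}

/* kernfs_activate()-like step: make a whole subtree visible together. */
static void activate_subtree_model(struct node_m *kn)
{
	kn->active = true;
	kn->ever_activated = true;
	for (struct node_m *c = kn->first_child; c; c = c->next_sibling)
		activate_subtree_model(c);
}

/* Removal only has to drain active refs for nodes that were ever visible. */
static bool removal_needs_drain_model(const struct node_m *kn)
{
	return kn->ever_activated;
}
```

An error path that fails before activation can therefore tear the subtree down without ever waiting on active references.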
2014-02-07 | kernfs: add missing kernfs_active() checks in directory operations | Tejun Heo
kernfs_iop_lookup(), kernfs_dir_pos() and kernfs_dir_next_pos() were missing kernfs_active() tests before using the found kernfs_node. As the deactivated state is currently visible only while a node is being removed, this doesn't pose an actual problem. For example, a lookup succeeding on a deactivated node doesn't harm anything, as the eventual file operations are gonna fail and those failures are indistinguishable from the cases in which the lookups had happened before the node was deactivated.
However, we're gonna allow new nodes to be created deactivated and then activated explicitly by the kernfs user when it sees fit. This is to support atomically making multiple nodes visible to userland, and thus those nodes must not be visible to userland before being activated. Let's plug the lookup and readdir holes so that deactivated nodes are invisible to userland.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
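The plugged holes amount to filtering on the active state in both lookup and directory iteration. A minimal sketch over a flat array; `struct entry_model`, `lookup_model()`, and `readdir_count_model()` are hypothetical stand-ins for the rbtree-based kernfs code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct entry_model {
	const char *name;
	bool active;   /* false while deactivated (not yet activated, or being removed) */
};

/* Lookup: a name that exists but is deactivated is reported as absent,
 * exactly as if it had not been created yet. */
static const struct entry_model *lookup_model(const struct entry_model *ents, size_t n,
					      const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(ents[i].name, name) == 0)
			return ents[i].active ? &ents[i] : NULL;
	return NULL;
}

/* Readdir: count only the entries userland is allowed to see. */
static size_t readdir_count_model(const struct entry_model *ents, size_t n)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++)
		if (ents[i].active)
			visible++;
	return visible;
}
```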
2014-02-07 | kernfs: implement kernfs_syscall_ops->remount_fs() and ->show_options() | Tejun Heo
Add two super_block related syscall callbacks ->remount_fs() and ->show_options() to kernfs_syscall_ops. These simply forward the matching super_operations.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | kernfs: rename kernfs_dir_ops to kernfs_syscall_ops | Tejun Heo
We're gonna need non-dir syscall callbacks, which will make dir_ops a misnomer. Let's rename kernfs_dir_ops to kernfs_syscall_ops. This is a pure rename.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | kernfs: invoke dir_ops while holding active ref of the target node | Tejun Heo
kernfs_dir_ops are currently being invoked without any active reference, which makes it tricky for the invoked operations to determine whether the objects associated with those nodes are safe to access and will remain that way for the duration of such operations. kernfs already has the active_ref mechanism to deal with this, which makes the removal of a given node the synchronization point for gating the file operations. There's no reason for dir_ops to be any different.
Update the dir_ops handling so that active_ref is held while the dir_ops are executing. This guarantees that the target nodes stay alive while a dir_ops is executing. As kernfs_dir_ops doesn't have any in-kernel user at this point, this doesn't affect anybody.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
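Holding an active ref around a callback can be sketched with C11 atomics: taking a ref succeeds only while the counter is non-negative (a negative value models "removal in progress"), and the callback runs between get and put. `struct anode`, the helper names, and the `-ENODEV` choice are assumptions made for this illustration.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct anode { atomic_int active; };   /* < 0 once removal has started */

/* Increment unless negative: refuse new users of a node being removed. */
static bool get_active_model(struct anode *kn)
{
	int v = atomic_load(&kn->active);

	while (v >= 0)
		if (atomic_compare_exchange_weak(&kn->active, &v, v + 1))
			return true;
	return false;
}

static void put_active_model(struct anode *kn)
{
	atomic_fetch_sub(&kn->active, 1);
}

/* Run a directory callback only while holding an active ref on the target,
 * so removal (which drains active refs) cannot complete underneath it. */
static int call_dir_op_model(struct anode *kn, int (*op)(struct anode *))
{
	int ret;

	if (!get_active_model(kn))
		return -ENODEV;
	ret = op(kn);
	put_active_model(kn);
	return ret;
}

/* Example callback: reports how many active refs it observes. */
static int observe_refs_op(struct anode *kn)
{
	return atomic_load(&kn->active);
}
```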
2014-02-07 | sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() | Tejun Heo
All device_schedule_callback_owner() users are converted to use device_remove_file_self(). Remove the now unused {sysfs|device}_schedule_callback_owner().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | s390: use device_remove_file_self() instead of device_schedule_callback() | Tejun Heo
driver-core now supports synchronous self-deletion of attributes, and the asynchronous removal mechanism is scheduled for removal. Use it instead of device_schedule_callback().
* Conversions in arch/s390/pci/pci_sysfs.c and drivers/s390/block/dcssblk.c are straightforward.
* drivers/s390/cio/ccwgroup.c is a bit more tricky because ccwgroup_notifier() was (ab)using device_schedule_callback() purely to obtain a process context from a notifier callback, in order to kick off an ungroup operation which may block. Rename ccwgroup_ungroup_callback() to ccwgroup_ungroup() and make it take ccwgroup_device * instead. The new function is now called directly from ccwgroup_ungroup_store(). The ccwgroup_notifier() chain is updated to explicitly bounce through ccwgroup_device->ungroup_work. This also removes a possible failure under memory pressure.
Only compile-tested.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: linux-s390@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | scsi: use device_remove_file_self() instead of device_schedule_callback() | Tejun Heo
driver-core now supports synchronous self-deletion of attributes, and the asynchronous removal mechanism is scheduled for removal. Use it instead of device_schedule_callback(). This makes "delete" behave synchronously.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | pci: use device_remove_file_self() instead of device_schedule_callback() | Tejun Heo
driver-core now supports synchronous self-deletion of attributes, and the asynchronous removal mechanism is scheduled for removal. Use it instead of device_schedule_callback(). This makes "remove" behave synchronously.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers | Tejun Heo
Sometimes it's necessary to implement a node which wants to delete nodes including itself. This isn't straightforward because of the kernfs active reference. While a file operation is in progress, an active reference is held and kernfs_remove() waits for all such references to drain before completing. For a self-deleting node, this is a deadlock, as kernfs_remove() ends up waiting for an active reference that it itself is sitting on top of.
This currently is worked around in the sysfs layer using sysfs_schedule_callback(), which makes such removals asynchronous. While it works, it's rather cumbersome and inherently breaks synchronicity of the operation - the file operation which triggered the operation may complete before the removal is finished (or even started) and the removal may fail asynchronously. If a removal operation is immediately followed by another operation which expects the specific name to be available (e.g. removal followed by rename onto the same name), there's no way to make the latter operation reliable.
The thing is, there's no inherent reason for this to be asynchronous. All that's necessary to make this synchronous is a dedicated operation which drops its own active ref and deactivates self. This patch implements kernfs_remove_self() and its wrappers in sysfs and driver core. kernfs_remove_self() is to be called from one of the file operations; it drops the active ref the task is holding, removes the self node, and restores the active ref to the dead node so that the ref is balanced afterwards. __kernfs_remove() is updated to take an early exit if the target node is already fully removed, so that the active ref restored by kernfs_remove_self() after removal doesn't confuse the deactivation path.
This makes implementing self-deleting nodes very easy. The normal removal path doesn't even need to be changed to use kernfs_remove_self() for the self-deleting node. The method can invoke kernfs_remove_self() on itself before proceeding with the normal removal path. kernfs_remove() invoked on the node by the normal deletion path will simply be ignored.
This will replace sysfs_schedule_callback(). A subtle feature of sysfs_schedule_callback() is that it collapses multiple invocations - even if multiple removals are triggered, the removal callback is run only once. An equivalent effect can be achieved by testing the return value of kernfs_remove_self() - only the one which gets a %true return value should proceed with actual deletion. All other instances of kernfs_remove_self() will wait till the enclosing kernfs operation which invoked the winning instance of kernfs_remove_self() finishes and then return %false. This trivially makes all users of kernfs_remove_self() automatically show correct synchronous behavior even when there are multiple concurrent operations - all "echo 1 > delete" instances will finish only after the whole operation is completed by one of the instances.
Note that manipulation of the active ref is implemented in separate public functions - kernfs_[un]break_active_protection(). kernfs_remove_self() is the only user at the moment, but these will be used to cater to more complex cases.
v2: For !CONFIG_SYSFS, the dummy version of kernfs_remove_self() was missing and sysfs_remove_file_self() had an incorrect return type. Fix it. Reported by kbuild test bot.
v3: kernfs_[un]break_active_protection() separated out from kernfs_remove_self() and exposed as public API.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
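The "only the %true caller deletes" pattern can be sketched single-threaded with an atomic exchange: the first caller to claim the node wins, every later caller gets false. `remove_self_model()` and `delete_store_model()` are hypothetical; unlike the real kernfs_remove_self(), this model does not make losing callers wait for the winner to finish.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct self_node {
	atomic_bool removed;   /* set by whichever caller wins the removal */
	int deletions;         /* how many times the real teardown ran */
};

/* Returns true only for the caller that actually claimed the removal. */
static bool remove_self_model(struct self_node *kn)
{
	return !atomic_exchange(&kn->removed, true);
}

/* A "delete" attribute's store() method following the described pattern. */
static int delete_store_model(struct self_node *kn)
{
	if (remove_self_model(kn))
		kn->deletions++;   /* only the winner performs the actual deletion */
	return 0;
}
```

Three back-to-back "echo 1 > delete" invocations therefore trigger the teardown exactly once, matching the collapsing behaviour of sysfs_schedule_callback().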
2014-02-07 | kernfs: remove KERNFS_REMOVED | Tejun Heo
KERNFS_REMOVED is used to mark half-initialized and dying nodes so that they don't show up in lookups and to deny adding new nodes under, or renaming, them; however, its role overlaps that of deactivation. It's necessary to deny addition of new children while removal is in progress; however, this role considerably intersects with deactivation - KERNFS_REMOVED prevents new children while deactivation prevents new file operations. There's no reason to keep them separate, which makes things more complex than necessary.
This patch removes KERNFS_REMOVED.
* Instead of KERNFS_REMOVED, each node now starts its life deactivated. This means that we now use both atomic_add() and atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN. The compiler generates an overflow warning when negating INT_MIN, as the negation can't be represented as a positive number. Nothing is actually broken, but let's bump BIAS by one to avoid the warnings for archs which negate the subtrahend.
* A new helper kernfs_active(), which tests whether kn->active >= 0, is added for convenience and lockdep annotation. All KERNFS_REMOVED tests are replaced with negated kernfs_active() tests.
* __kernfs_remove() is updated to deactivate, but not drain, all nodes in the subtree instead of setting KERNFS_REMOVED. This removes deactivation from kernfs_deactivate(), which is now renamed to kernfs_drain().
* The sanity check on KERNFS_REMOVED in kernfs_put() is replaced with checks on the active ref.
* Some comment style updates in the affected area.
v2: Reordered before removal path restructuring. kernfs_active() dropped and kernfs_get/put_active() used instead. RB_EMPTY_NODE() used in the lookup paths.
v3: Reverted most of v2 except for creating a new node with KN_DEACTIVATED_BIAS.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
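The bias arithmetic can be checked in a small C11 model. Using INT_MIN + 1 as the bias keeps its negation (INT_MAX) representable, adding and subtracting the bias round-trips cleanly, and the number of in-flight references stays recoverable while the node is deactivated. `DEACTIVATED_BIAS_MODEL`, `struct bnode`, and the helper names are hypothetical stand-ins for the kernfs definitions.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* INT_MIN + 1 rather than INT_MIN, so that -BIAS == INT_MAX is representable. */
#define DEACTIVATED_BIAS_MODEL (INT_MIN + 1)

struct bnode { atomic_int active; };

/* kernfs_active()-style test: the node is active while the counter is >= 0. */
static bool active_model(struct bnode *kn)
{
	return atomic_load(&kn->active) >= 0;
}

/* Every node starts its life deactivated. */
static void init_model(struct bnode *kn)
{
	atomic_init(&kn->active, DEACTIVATED_BIAS_MODEL);
}

static void activate_model(struct bnode *kn)
{
	atomic_fetch_sub(&kn->active, DEACTIVATED_BIAS_MODEL);
}

/* Deactivation keeps the count of in-flight refs: active - BIAS. */
static void deactivate_model(struct bnode *kn)
{
	atomic_fetch_add(&kn->active, DEACTIVATED_BIAS_MODEL);
}
```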
2014-02-07 | kernfs: remove KERNFS_ACTIVE_REF and add kernfs_lockdep() | Tejun Heo
There currently are two mechanisms gating active ref lockdep annotations - the KERNFS_LOCKDEP flag and the KERNFS_ACTIVE_REF type mask. The former disables lockdep annotations in kernfs_get/put_active() while the latter disables all of kernfs_deactivate(). While KERNFS_ACTIVE_REF also behaves as an optimization to skip the deactivation step for non-file nodes, the benefit is marginal and it needlessly diverges code paths. Let's drop KERNFS_ACTIVE_REF.
While at it, add a test helper kernfs_lockdep() to test the KERNFS_LOCKDEP flag so that it's more convenient and the related code can be compiled out when not enabled.
v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag"). As the earlier patch already added KERNFS_LOCKDEP tests to kernfs_deactivate(), those additions are dropped from this patch and the existing ones are simply converted to kernfs_lockdep().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
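A sketch of the "test helper that can be compiled out" idea: when the debug option is off, the helper returns a constant false and the compiler can drop every annotation guarded by it. `DEBUG_LOCK_ALLOC_MODEL`, `KERNFS_LOCKDEP_MODEL`, and the function names are hypothetical; the counter stands in for a lockdep annotation.

```c
#include <stdbool.h>

/* Assumed switch modelling the lockdep debug option; set to 0 to compile out. */
#define DEBUG_LOCK_ALLOC_MODEL 1
#define KERNFS_LOCKDEP_MODEL 0x0100u   /* hypothetical flag bit */

struct lnode { unsigned int flags; };

static inline bool lockdep_model(const struct lnode *kn)
{
#if DEBUG_LOCK_ALLOC_MODEL
	return kn->flags & KERNFS_LOCKDEP_MODEL;
#else
	(void)kn;
	return false;   /* constant false: guarded annotation code is dropped */
#endif
}

static int annotations;

/* Caller pattern: annotate only when the helper says so. */
static void get_active_annotated_model(struct lnode *kn)
{
	if (lockdep_model(kn))
		annotations++;   /* stands in for a lockdep annotation */
}
```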
2014-02-07 | kernfs: remove kernfs_addrm_cxt | Tejun Heo
kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were added because there were operations which should be performed outside kernfs_mutex after adding and removing kernfs_nodes. The necessary operations were recorded in kernfs_addrm_cxt and performed by kernfs_addrm_finish(); however, after the recent changes which relocated deactivation and unmapping so that they're performed directly during removal, the only operation kernfs_addrm_finish() performs is kernfs_put(), which can be moved inside the removal path too.
This patch moves the kernfs_put() of the base ref to __kernfs_remove() and removes kernfs_addrm_cxt and kernfs_addrm_start/finish().
* kernfs_add_one() is updated to grab and release kernfs_mutex itself. sysfs_addrm_start/finish() invocations around it are removed from all users.
* __kernfs_remove() puts an unlinked node directly instead of chaining it to kernfs_addrm_cxt. Its callers are updated to grab and release kernfs_mutex instead of calling kernfs_addrm_start/finish() around it.
v2: Rebased on top of "kernfs: associate a new kernfs_node with its parent on creation", which dropped @parent from kernfs_add_one().
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | kernfs: invoke kernfs_unmap_bin_file() directly from kernfs_deactivate() | Tejun Heo
kernfs_unmap_bin_file() is supposed to unmap all memory mappings of the target file before kernfs_remove() finishes. However, it is currently called from kernfs_addrm_finish(). It therefore has the same race problem as the original implementation of deactivation when there are multiple removers: only the remover that snatches the node onto its addrm_cxt->removed list is guaranteed to wait for unmapping to complete before returning.

This can easily be fixed by moving the kernfs_unmap_bin_file() invocation from kernfs_addrm_finish() to kernfs_deactivate(). The function may be called multiple times, but that shouldn't do any harm.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
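A small model shows why repeated calls are harmless: once no mappings remain, a further unmap call finds nothing to do. Every remover can therefore call it from its deactivation step and know that the mappings are gone before it returns. The struct and the names bin_file, unmap_bin_file() and deactivate() are hypothetical, and clearing a counter stands in for tearing down real mappings.

```c
#include <assert.h>

/* Hypothetical model of a bin file with live memory mappings. */
struct bin_file {
	int live_mappings;
	int unmap_calls;   /* instrumentation for the example */
};

/* Idempotent: after the first call there is nothing left to unmap. */
static void unmap_bin_file(struct bin_file *f)
{
	f->unmap_calls++;
	if (f->live_mappings == 0)
		return;            /* nothing mapped: harmless no-op */
	f->live_mappings = 0;      /* stands in for zapping every mapping */
}

/* Run by every remover, not just the one that won the race, so each
 * caller returns only after the mappings are gone. */
static void deactivate(struct bin_file *f)
{
	/* ... draining of active references would happen here ... */
	unmap_bin_file(f);
}
```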
2014-02-07 | kernfs: restructure removal path to fix possible premature return | Tejun Heo
The recursive nature of kernfs_remove() means that race conditions are possible between the removal of a parent and the removal of its descendants, even if kernfs_remove() may not be called more than once on the same node. We could declare that kernfs_remove() shouldn't be called on a descendant while the removal of an ancestor is in progress. However, such a rule is unnecessarily restrictive and very difficult to enforce. It's better to simply allow invoking kernfs_remove() as the caller sees fit, as long as the caller ensures that the node is accessible.

The current behavior in such situations is broken. Whoever enters the removal path first takes the node off the hierarchy and then deactivates it. Subsequent removers either return as soon as they notice that they're not the first one, or can't find the target node at all because it has already been removed from the hierarchy. In both cases, the subsequent removers may finish prematurely while the first remover is still removing and draining the nodes.

This patch restructures the removal path so that multiple removers, whether through recursion or direct invocation, always follow these rules:

* When there are multiple concurrent removers, only one puts the base ref.
* Regardless of which one puts the base ref, all removers are blocked until the target node is fully deactivated and removed.

To achieve this, the removal path now first marks all descendants, including self, REMOVED. It then deactivates and unlinks the leftmost descendant, one node at a time. kernfs_deactivate() is called directly from __kernfs_remove() and drops and regrabs kernfs_mutex for each descendant to drain active refs. This means that multiple removers can enter kernfs_deactivate() for the same node, so the function is updated to handle multiple deactivators of the same node: only one actually deactivates, but all wait until the drain completes.

The restructured removal path guarantees that a removed node gets unlinked only after the node is deactivated and drained. Combined with proper handling of multiple deactivators, this guarantees that any invocation of kernfs_remove() returns only after the node itself and all its descendants are deactivated, drained and removed.

v2: Draining moved into a separate loop (it used to be in the same loop as unlink) and done from __kernfs_deactivate(). This is to allow exposing deactivation as a separate interface later. Root node removal was broken in the v1 patch; fixed.

v3: Revert most of v2 except for the root node removal fix and the simplification of the KERNFS_REMOVED setting loop.

v4: Refreshed on top of ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag").

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
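The two-phase order described above can be sketched on a plain first-child/next-sibling tree. Phase one marks the whole subtree REMOVED. Phase two repeatedly takes the leftmost descendant, so that children are always handled before their parent. All names are hypothetical, and the recursive marking walk is a simplification. A comment marks where the real code would deactivate and drain each node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node using first-child / next-sibling links. */
struct node {
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
	bool removed;
	int unlink_order;   /* instrumentation: position in removal order */
};

/* Follow first children down to a node that has none. */
static struct node *leftmost_descendant(struct node *pos)
{
	while (pos->first_child)
		pos = pos->first_child;
	return pos;
}

/* Phase 1: mark @n and all its descendants REMOVED (recursive for brevity). */
static void mark_removed(struct node *n)
{
	n->removed = true;
	for (struct node *c = n->first_child; c; c = c->next_sibling)
		mark_removed(c);
}

/* Detach @n from its parent's child list. */
static void unlink_node(struct node *n)
{
	struct node **pos = &n->parent->first_child;

	while (*pos != n)
		pos = &(*pos)->next_sibling;
	*pos = n->next_sibling;
	n->next_sibling = NULL;
	n->parent = NULL;
}

/* Phase 2: repeatedly take the leftmost descendant, so every child is
 * handled before its parent and @kn itself comes last. Returns the
 * number of nodes processed. */
static int remove_subtree(struct node *kn)
{
	struct node *pos;
	int count = 0;

	mark_removed(kn);
	do {
		pos = leftmost_descendant(kn);
		/* the real code would deactivate and drain @pos here */
		pos->unlink_order = count++;
		if (pos->parent)   /* a root node has no parent to unlink from */
			unlink_node(pos);
	} while (pos != kn);
	return count;
}
```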
2014-02-07 | kernfs: replace kernfs_node->u.completion with kernfs_root->deactivate_waitq | Tejun Heo
kernfs_node->u.completion is used to notify deactivation completion from kernfs_put_active() to kernfs_deactivate(). We now allow multiple racing removals of the same node, and the current removal scheme is no longer correct: a kernfs_remove() invocation may return before the node is properly deactivated if it races against another removal. The removal path will be restructured to address the issue.

That restructuring requires supporting multiple waiters. To prepare for it, this patch replaces kernfs_node->u.completion with kernfs_root->deactivate_waitq. Deactivation event notifications now share a per-root waitqueue_head. This is acceptable because the wait path is quite cold, and it also allows shaving one pointer off kernfs_node.

v2: Refreshed on top of ("kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag").

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
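A per-root wait queue shared by every node can be modeled with one condition variable per root. Because the queue is shared, the waker broadcasts and each waiter rechecks its own node's condition. The same pattern also lets several waiters block on the same node. In this sketch, pthread condition variables stand in for the kernel wait queue, and the struct layouts and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical root owning the single shared wait queue. */
struct root {
	pthread_mutex_t lock;
	pthread_cond_t deactivate_waitq;
};

/* Nodes carry no per-node completion object, only a root pointer. */
struct node {
	struct root *root;
	int active;          /* outstanding active references */
	bool deactivating;
};

static void root_init(struct root *r)
{
	pthread_mutex_init(&r->lock, NULL);
	pthread_cond_init(&r->deactivate_waitq, NULL);
}

/* Drop an active ref; wake every waiter on the root once this node drains. */
static void put_active(struct node *kn)
{
	struct root *r = kn->root;

	pthread_mutex_lock(&r->lock);
	if (--kn->active == 0 && kn->deactivating)
		pthread_cond_broadcast(&r->deactivate_waitq);
	pthread_mutex_unlock(&r->lock);
}

/* Any number of callers may wait on the same node concurrently. */
static void wait_for_drain(struct node *kn)
{
	struct root *r = kn->root;

	pthread_mutex_lock(&r->lock);
	kn->deactivating = true;
	while (kn->active > 0)   /* recheck: a wakeup may be for another node */
		pthread_cond_wait(&r->deactivate_waitq, &r->lock);
	pthread_mutex_unlock(&r->lock);
}
```

The broadcast can wake waiters that belong to other nodes, who then go back to sleep. This is the cost of sharing one queue per root, and it is tolerable on a cold path.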
2014-02-07 | kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag | Tejun Heo
kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set before performing lockdep annotations. It therefore feeds an uninitialized lockdep_map to lockdep, triggering warnings like the following on USB stick hot-unplug:

  usb 1-2: USB disconnect, device number 2
  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
  Hardware name: empty empty/S3992, BIOS 080011 10/26/2007
   ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
   ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
   ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
  Call Trace:
   [<ffffffff81cfb6bd>] dump_stack+0x4e/0x7a
   [<ffffffff810f8530>] __lock_acquire+0x1910/0x1e70
   [<ffffffff810f931a>] lock_acquire+0x9a/0x1d0
   [<ffffffff8127c75e>] kernfs_deactivate+0xee/0x130
   [<ffffffff8127d4c8>] kernfs_addrm_finish+0x38/0x60
   [<ffffffff8127d701>] kernfs_remove_by_name_ns+0x51/0xa0
   [<ffffffff8127b4f1>] remove_files.isra.1+0x41/0x80
   [<ffffffff8127b7e7>] sysfs_remove_group+0x47/0xa0
   [<ffffffff8127b873>] sysfs_remove_groups+0x33/0x50
   [<ffffffff8177d66d>] device_remove_attrs+0x4d/0x80
   [<ffffffff8177e25e>] device_del+0x12e/0x1d0
   [<ffffffff819722c2>] usb_disconnect+0x122/0x1a0
   [<ffffffff819749b5>] hub_thread+0x3c5/0x1290
   [<ffffffff810c6a6d>] kthread+0xed/0x110
   [<ffffffff81d0a56c>] ret_from_fork+0x7c/0xb0

Fix it by making kernfs_deactivate() perform lockdep annotations only if KERNFS_LOCKDEP is set.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-07 | dma-buf: avoid using IS_ERR_OR_NULL | Colin Cross
dma_buf_map_attachment and dma_buf_vmap can return either NULL or an ERR_PTR on error. This encourages a common buggy pattern in callers:

  sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
  if (IS_ERR_OR_NULL(sgt))
          return PTR_ERR(sgt);

Because PTR_ERR(NULL) is 0, this causes the caller to return 0 on an error. IS_ERR_OR_NULL is almost always a sign of poorly-defined error handling.

This patch converts dma_buf_map_attachment to always return ERR_PTR on error, and fixes the callers that incorrectly handled NULL. A few more callers were not checking for NULL at all and would have dereferenced a NULL pointer later. A few others correctly handled NULL and ERR_PTR differently. I left those alone, but they could also be modified to delete the NULL check.

This patch also converts dma_buf_vmap to always return NULL on error. All the callers of dma_buf_vmap only check for NULL, and would have dereferenced an ERR_PTR and panicked if one was ever returned. This is not consistent with the rest of the dma-buf APIs, but it matches the expectations of all of the callers.

Signed-off-by: Colin Cross <ccross@android.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
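The pitfall is easy to reproduce outside the kernel. The helpers that follow are userspace re-creations of ERR_PTR(), PTR_ERR(), IS_ERR() and IS_ERR_OR_NULL() from <linux/err.h>. They assume that pointers round-trip through intptr_t, as on Linux. map_old() and map_fixed() are hypothetical stand-ins, not the dma-buf API, and the -ENOMEM chosen for the NULL case is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creations of the <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* error codes live in the top MAX_ERRNO addresses */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return ptr == NULL || IS_ERR(ptr);
}

struct sg_table { int nents; };   /* placeholder type */

/* Hypothetical old-style API: mode 1 fails with NULL, mode 2 with ERR_PTR. */
static struct sg_table *map_old(int mode)
{
	static struct sg_table table = { 1 };

	if (mode == 1)
		return NULL;
	if (mode == 2)
		return ERR_PTR(-ENOMEM);
	return &table;
}

/* The buggy caller pattern: PTR_ERR(NULL) is 0, so a NULL failure
 * is reported to the caller as success. */
static int buggy_caller(int mode)
{
	struct sg_table *sgt = map_old(mode);

	if (IS_ERR_OR_NULL(sgt))
		return (int)PTR_ERR(sgt);
	return 0;
}

/* After the change: the API itself converts NULL into an ERR_PTR. */
static struct sg_table *map_fixed(int mode)
{
	struct sg_table *sgt = map_old(mode);

	if (sgt == NULL)
		sgt = ERR_PTR(-ENOMEM);   /* errno choice is an assumption */
	return sgt;
}

/* Callers then need only IS_ERR(). */
static int fixed_caller(int mode)
{
	struct sg_table *sgt = map_fixed(mode);

	if (IS_ERR(sgt))
		return (int)PTR_ERR(sgt);
	return 0;
}
```

With the old API, buggy_caller(1) returns 0 even though the mapping failed. fixed_caller(1) returns -ENOMEM because the API now gives a single kind of error value.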
2014-02-07 | Merge tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core | Linus Torvalds
Pull driver core fix from Greg KH:
 "Here is a single kernfs fix to resolve a much-reported lockdep issue with the removal of entries in sysfs"

* tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag
2014-02-07 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client | Linus Torvalds
Pull ceph fixes from Sage Weil:
 "There is an RBD fix for a crash due to the immutable bio changes, an error path fix, and a locking fix in the recent redirect support"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: do not dereference a NULL bio pointer
  libceph: take map_sem for read in handle_reply()
  libceph: factor out logic from ceph_osdc_start_request()
  libceph: fix error handling in ceph_osdc_init()
2014-02-07 | Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Linus Torvalds
Pull arm64 fixes from Catalin Marinas:
 - Relax VDSO alignment requirements so that the kernel-picked one (4K) does not conflict with the dynamic linker's one (64K)
 - VDSO gettimeofday fix
 - Barrier fixes for atomic operations and cache flushing
 - TLB invalidation when overriding early page mappings during boot
 - Wired up new 32-bit arm (compat) syscalls
 - LSM_MMAP_MIN_ADDR when COMPAT is enabled
 - defconfig update
 - Clean-up (comments, pgd_alloc).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: defconfig: Expand default enabled features
  arm64: asm: remove redundant "cc" clobbers
  arm64: atomics: fix use of acquire + release for full barrier semantics
  arm64: barriers: allow dsb macro to take option parameter
  security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64
  arm64: compat: Wire up new AArch32 syscalls
  arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
  arm64: vdso: fix coarse clock handling
  arm64: simplify pgd_alloc
  arm64: fix typo: s/SERRROR/SERROR/
  arm64: Invalidate the TLB when replacing pmd entries during boot
  arm64: Align CMA sizes to PAGE_SIZE
  arm64: add DSB after icache flush in __flush_icache_all()
  arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k