summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2013-07-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "This is a re-do of the net-next pull request for the current merge window. The only difference from the one I made the other day is that this has Eliezer's interface renames and the timeout handling changes made based upon your feedback, as well as a few bug fixes that have trickeled in. Highlights: 1) Low latency device polling, eliminating the cost of interrupt handling and context switches. Allows direct polling of a network device from socket operations, such as recvmsg() and poll(). Currently ixgbe, mlx4, and bnx2x support this feature. Full high level description, performance numbers, and design in commit 0a4db187a999 ("Merge branch 'll_poll'") From Eliezer Tamir. 2) With the routing cache removed, ip_check_mc_rcu() gets exercised more than ever before in the case where we have lots of multicast addresses. Use a hash table instead of a simple linked list, from Eric Dumazet. 3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski, Marek Puzyniak, Michal Kazior, and Sujith Manoharan. 4) Support reporting the TUN device persist flag to userspace, from Pavel Emelyanov. 5) Allow controlling network device VF link state using netlink, from Rony Efraim. 6) Support GRE tunneling in openvswitch, from Pravin B Shelar. 7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from Daniel Borkmann and Eric Dumazet. 8) Allow controlling of TCP quickack behavior on a per-route basis, from Cong Wang. 9) Several bug fixes and improvements to vxlan from Stephen Hemminger, Pravin B Shelar, and Mike Rapoport. In particular, support receiving on multiple UDP ports. 10) Major cleanups, particular in the area of debugging and cookie lifetime handline, to the SCTP protocol code. From Daniel Borkmann. 11) Allow packets to cross network namespaces when traversing tunnel devices. From Nicolas Dichtel. 12) Allow monitoring netlink traffic via AF_PACKET sockets, in a manner akin to how we monitor real network traffic via ptype_all. From Daniel Borkmann. 13) Several bug fixes and improvements for the new alx device driver, from Johannes Berg. 14) Fix scalability issues in the netem packet scheduler's time queue, by using an rbtree. From Eric Dumazet. 15) Several bug fixes in TCP loss recovery handling, from Yuchung Cheng. 16) Add support for GSO segmentation of MPLS packets, from Simon Horman. 17) Make network notifiers have a real data type for the opaque pointer that's passed into them. Use this to properly handle network device flag changes in arp_netdev_event(). From Jiri Pirko and Timo Teräs. 18) Convert several drivers over to module_pci_driver(), from Peter Huewe. 19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a O(1) calculation instead. From Eric Dumazet. 20) Support setting of explicit tunnel peer addresses in ipv6, just like ipv4. From Nicolas Dichtel. 21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet. 22) Prevent a single high rate flow from overruning an individual cpu during RX packet processing via selective flow shedding. From Willem de Bruijn. 23) Don't use spinlocks in TCP md5 signing fast paths, from Eric Dumazet. 24) Don't just drop GSO packets which are above the TBF scheduler's burst limit, chop them up so they are in-bounds instead. Also from Eric Dumazet. 25) VLAN offloads are missed when configured on top of a bridge, fix from Vlad Yasevich. 26) Support IPV6 in ping sockets. From Lorenzo Colitti. 27) Receive flow steering targets should be updated at poll() time too, from David Majnemer. 28) Fix several corner case regressions in PMTU/redirect handling due to the routing cache removal, from Timo Teräs. 29) We have to be mindful of ipv4 mapped ipv6 sockets in upd_v6_push_pending_frames(). From Hannes Frederic Sowa. 30) Fix L2TP sequence number handling bugs, from James Chapman." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits) drivers/net: caif: fix wrong rtnl_is_locked() usage drivers/net: enic: release rtnl_lock on error-path vhost-net: fix use-after-free in vhost_net_flush net: mv643xx_eth: do not use port number as platform device id net: sctp: confirm route during forward progress virtio_net: fix race in RX VQ processing virtio: support unlocked queue poll net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit Documentation: Fix references to defunct linux-net@vger.kernel.org net/fs: change busy poll time accounting net: rename low latency sockets functions to busy poll bridge: fix some kernel warning in multicast timer sfc: Fix memory leak when discarding scattered packets sit: fix tunnel update via netlink dt:net:stmmac: Add dt specific phy reset callback support. dt:net:stmmac: Add support to dwmac version 3.610 and 3.710 dt:net:stmmac: Allocate platform data only if its NULL. net:stmmac: fix memleak in the open method ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available net: ipv6: fix wrong ping_v6_sendmsg return value ...
2013-07-09Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge second patch-bomb from Andrew Morton: - misc fixes - audit stuff - fanotify/inotify/dnotify things - most of the rest of MM. The new cache shrinker code from Glauber and Dave Chinner probably isn't quite stabilized yet. - ptrace - ipc - partitions - reboot cleanups - add LZ4 decompressor, use it for kernel compression * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) lib/scatterlist: error handling in __sg_alloc_table() scsi_debug: fix do_device_access() with wrap around range crypto: talitos: use sg_pcopy_to_buffer() lib/scatterlist: introduce sg_pcopy_from_buffer() and sg_pcopy_to_buffer() lib/scatterlist: factor out sg_miter_get_next_page() from sg_miter_next() crypto: add lz4 Cryptographic API lib: add lz4 compressor module arm: add support for LZ4-compressed kernel lib: add support for LZ4-compressed kernel decompressor: add LZ4 decompressor module lib: add weak clz/ctz functions reboot: move arch/x86 reboot= handling to generic kernel reboot: arm: change reboot_mode to use enum reboot_mode reboot: arm: prepare reboot_mode for moving to generic kernel code reboot: arm: remove unused restart_mode fields from some arm subarchs reboot: unicore32: prepare reboot_mode for moving to generic kernel code reboot: x86: prepare reboot_mode for moving to generic kernel code reboot: checkpatch.pl the new kernel/reboot.c file reboot: move shutdown/reboot related functions to kernel/reboot.c reboot: remove -stable friendly PF_THREAD_BOUND define ...
2013-07-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is some follow-on RBD cleanup after the last window's code drop, a series from Yan fixing multi-mds behavior in cephfs, and then a sprinkling of bug fixes all around. Some warnings, sleeping while atomic, a null dereference, and cleanups" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (36 commits) libceph: fix invalid unsigned->signed conversion for timespec encoding libceph: call r_unsafe_callback when unsafe reply is received ceph: fix race between cap issue and revoke ceph: fix cap revoke race ceph: fix pending vmtruncate race ceph: avoid accessing invalid memory libceph: Fix NULL pointer dereference in auth client code ceph: Reconstruct the func ceph_reserve_caps. ceph: Free mdsc if alloc mdsc->mdsmap failed. ceph: remove sb_start/end_write in ceph_aio_write. ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL. ceph: fix sleeping function called from invalid context. ceph: move inode to proper flushing list when auth MDS changes rbd: fix a couple warnings ceph: clear migrate seq when MDS restarts ceph: check migrate seq before changing auth cap ceph: fix race between page writeback and truncate ceph: reset iov_len when discarding cap release messages ceph: fix cap release race libceph: fix truncate size calculation ...
2013-07-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs update from Chris Mason: "These are the usual mixture of bugs, cleanups and performance fixes. Miao has some really nice tuning of our crc code as well as our transaction commits. Josef is peeling off more and more problems related to early enospc, and has a number of important bug fixes in here too" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits) Btrfs: wait ordered range before doing direct io Btrfs: only do the tree_mod_log_free_eb if this is our last ref Btrfs: hold the tree mod lock in __tree_mod_log_rewind Btrfs: make backref walking code handle skinny metadata Btrfs: fix crash regarding to ulist_add_merge Btrfs: fix several potential problems in copy_nocow_pages_for_inode Btrfs: cleanup the code of copy_nocow_pages_for_inode() Btrfs: fix oops when recovering the file data by scrub function Btrfs: make the chunk allocator completely tree lockless Btrfs: cleanup orphaned root orphan item Btrfs: fix wrong mirror number tuning Btrfs: cleanup redundant code in btrfs_submit_direct() Btrfs: remove btrfs_sector_sum structure Btrfs: check if we can nocow if we don't have data space Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc Btrfs: use a percpu to keep track of possibly pinned bytes Btrfs: check for actual acls rather than just xattrs when caching no acl Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate Btrfs: optimize reada_for_balance Btrfs: optimize read_block_for_search ...
2013-07-09net/fs: change busy poll time accountingEliezer Tamir
Suggested by Linus: Changed time accounting for busy-poll: - Make it microsecond based. - Use unsigned longs. - Revert back to use time_after instead of time_in_range. Reorder poll/select busy loop conditions: - Clear busy_flag after one time we can't busy-poll. - Only init busy_end if we actually are going to busy-poll. Added one more missing need_resched() test. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-09Merge tag 'for-linus-v3.11-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs update from Ben Myers: "This includes several bugfixes, part of the work for project quotas and group quotas to be used together, performance improvements for inode creation/deletion, buffer readahead, and bulkstat, implementation of the inode change count, an inode create transaction, and the removal of a bunch of dead code. There are also some duplicate commits that you already have from the 3.10-rc series. - part of the work to allow project quotas and group quotas to be used together - inode change count - inode create transaction - block queue plugging in buffer readahead and bulkstat - ordered log vector support - removal of dead code in and around xfs_sync_inode_grab, xfs_ialloc_get_rec, XFS_MOUNT_RETERR, XFS_ALLOCFREE_LOG_RES, XFS_DIROP_LOG_RES, xfs_chash, ctl_table, and xfs_growfs_data_private - don't keep silent if sunit/swidth can not be changed via mount - fix a leak of remote symlink blocks into the filesystem when xattrs are used on symlinks - fix for fiemap to return FIEMAP_EXTENT_UNKOWN flag on delay extents - part of a fix for xfs_fsr - disable speculative preallocation with small files - performance improvements for inode creates and deletes" * tag 'for-linus-v3.11-rc1' of git://oss.sgi.com/xfs/xfs: (61 commits) xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD xfs: Change xfs_dquot_acct to be a 2-dimensional array xfs: Code cleanup and removal of some typedef usage xfs: Replace macro XFS_DQ_TO_QIP with a function xfs: Replace macro XFS_DQUOT_TREE with a function xfs: Define a new function xfs_is_quota_inode() xfs: implement inode change count xfs: Use inode create transaction xfs: Inode create item recovery xfs: Inode create transaction reservations xfs: Inode create log items xfs: Introduce an ordered buffer item xfs: Introduce ordered log vector support xfs: xfs_ifree doesn't need to modify the inode buffer xfs: don't do IO when creating an new inode xfs: don't use speculative prealloc for small files xfs: plug directory buffer readahead xfs: add pluging for bulkstat readahead xfs: Remove dead function prototype xfs_sync_inode_grab() xfs: Remove the left function variable from xfs_ialloc_get_rec() ...
2013-07-09Merge tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Feature highlights include: - Add basic client support for NFSv4.2 - Add basic client support for Labeled NFS (selinux for NFSv4.2) - Fix the use of credentials in NFSv4.1 stateful operations, and add support for NFSv4.1 state protection. Bugfix highlights: - Fix another NFSv4 open state recovery race - Fix an NFSv4.1 back channel session regression - Various rpc_pipefs races - Fix another issue with NFSv3 auth negotiation Please note that Labeled NFS does require some additional support from the security subsystem. The relevant changesets have all been reviewed and acked by James Morris." * tag 'nfs-for-3.11-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (54 commits) NFS: Set NFS_CS_MIGRATION for NFSv4 mounts NFSv4.1 Refactor nfs4_init_session and nfs4_init_channel_attrs nfs: have NFSv3 try server-specified auth flavors in turn nfs: have nfs_mount fake up a auth_flavs list when the server didn't provide it nfs: move server_authlist into nfs_try_mount_request nfs: refactor "need_mount" code out of nfs_try_mount SUNRPC: PipeFS MOUNT notification optimization for dying clients SUNRPC: split client creation routine into setup and registration SUNRPC: fix races on PipeFS UMOUNT notifications SUNRPC: fix races on PipeFS MOUNT notifications NFSv4.1 use pnfs_device maxcount for the objectlayout gdia_maxcount NFSv4.1 use pnfs_device maxcount for the blocklayout gdia_maxcount NFSv4.1 Fix gdia_maxcount calculation to fit in ca_maxresponsesize NFS: Improve legacy idmapping fallback NFSv4.1 end back channel session draining NFS: Apply v4.1 capabilities to v4.2 NFSv4.1: Clean up layout segment comparison helper names NFSv4.1: layout segment comparison helpers should take 'const' parameters NFSv4: Move the DNS resolver into the NFSv4 module rpc_pipefs: only set rpc_dentry_ops if d_op isn't already set ...
2013-07-09Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext3 fix and quota cleanup from Jan Kara: "A fix of ext3 error reporting from fsync and a quota cleanup" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Convert use of typedef ctl_table to struct ctl_table ext3: Fix fsync error handling after filesystem abort.
2013-07-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull third set of VFS updates from Al Viro: "Misc stuff all over the place. There will be one more pile in a couple of days" This is an "evil merge" that also uses the new d_count helper in fs/configfs/dir.c, missed by commit 84d08fa888e7 ("helper for reading ->d_count") * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ncpfs: fix error return code in ncp_parse_options() locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock seq_file: add seq_list_*_percpu helpers f2fs: fix readdir incorrectness mode_t whack-a-mole... lustre: kill the pointless wrapper helper for reading ->d_count
2013-07-09fatfs: add FAT_IOCTL_GET_VOLUME_IDMike Lockwood
This patch, originally from Android kernel, adds vfat ioctl command FAT_IOCTL_GET_VOLUME_ID, with this command we can get the vfat volume ID using following code: ioctl(fd, FAT_IOCTL_GET_VOLUME_ID, &volume_ID) This patch is a modified version of the patch by Mike Lockwood, with changes from Dmitry Pervushin, who noticed the original patch makes some volume IDs abiguous with error returns: for example, if volume id is 0xFFFFFDAD, that matches -ENOIOCTLCMD, we get "FFFFFFFF" from the user space. So add a parameter to ioctl to get the correct volume ID. Android uses vfat volume ID to identify different sd card, when a new sd card is inserted to device, android can scan the media on it and pop up new contents. Signed-off-by: Bintian Wang <bintian.wang@linaro.org> Cc: dmitry pervushin <dpervushin@gmail.com> Cc: Mike Lockwood <lockwood@android.com> Cc: Colin Cross <ccross@android.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: John Stultz <john.stultz@linaro.org> Cc: Sean McNeil <sean@mcneil.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09ncpfs: fix error return code in ncp_parse_options()Wei Yongjun
Fix to return -EINVAL from the option parse error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09mm/writeback: don't check force_wait to handle bdi->work_listWanpeng Li
After commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), bdi_writeback_workfn runs off bdi_writeback->dwork, on each execution, it processes bdi->work_list and reschedules if there are more things to do instead of flush any work that race with us existing. It is unecessary to check force_wait in wb_do_writeback since it is always 0 after the mentioned commit. This patch remove the force_wait in wb_do_writeback. Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fs/fs-writeback.c: : make wb_do_writeback() as staticHaicheng Li
It's not used globally and could be static. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fsnotify: update comments concerning locking schemeLino Sanfilippo
There have been changes in the locking scheme of fsnotify but the comments in the source code have not been updated yet. This patch corrects this. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09inotify: fix race when adding a new watchLino Sanfilippo
In inotify_new_watch() the number of watches for a group is compared against the max number of allowed watches and increased afterwards. The check and incrementation is not done atomically, so it is possible for multiple concurrent threads to pass the check and increment the number of marks above the allowed max. This patch uses an inotify groups mark_lock to ensure that both check and incrementation are done atomic. Furthermore we dont have to worry about the race that allows a concurrent thread to add a watch just after inotify_update_existing_watch() returned with -ENOENT anymore, since this is also synchronized by the groups mark mutex now. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09dnotify: replace dnotify_mark_mutex with mark mutex of dnotify_groupLino Sanfilippo
There is no need to use a special mutex to protect against the fcntl/close race (see dnotify.c for a description of this race). Instead the dnotify_groups mark mutex can be used. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fanotify: put duplicate code for adding vfsmount/inode marks into an own ↵Lino Sanfilippo
function The code under the groups mark_mutex in fanotify_add_inode_mark() and fanotify_add_vfsmount_mark() is almost identical. So put it into a seperate function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fanotify: fix races when adding/removing marksLino Sanfilippo
For both adding an event to an existing mark and destroying a mark we first have to find it via fsnotify_find_[inode|vfsmount]_mark(). But getting the mark and adding an event (or destroying it) is not done atomically. This opens a race where a thread is about to destroy a mark while another thread still finds the same mark and adds an event to its mask although it will be destroyed. Another race exists concerning the excess of a groups number of marks limit: When a mark is added the number of group marks is checked against the max number of marks per group and increased afterwards. Since check and increment is also not done atomically, this may result in 2 or more processes passing the check at the same time and increasing the number of group marks above the allowed limit. With this patch both races are avoided by doing the concerning operations with the groups mark mutex locked. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09fanotify: info leak in copy_event_to_user()Dan Carpenter
The ->reserved field isn't cleared so we leak one byte of stack information to userspace. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Eric Paris <eparis@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09net: rename low latency sockets functions to busy pollEliezer Tamir
Rename functions in include/net/ll_poll.h to busy wait. Clarify documentation about expected power use increase. Rename POLL_LL to POLL_BUSY_LOOP. Add need_resched() testing to poll/select busy loops. Note, that in select and poll can_busy_poll is dynamic and is updated continuously to reflect the existence of supported sockets with valid queue information. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-08ncpfs: fix error return code in ncp_parse_options()Wei Yongjun
Fix to return -EINVAL from the option parse error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-08locks: move file_lock_list to a set of percpu hlist_heads and convert ↵Jeff Layton
file_lock_lock to an lglock The file_lock_list is only used for /proc/locks. The vastly common case is for locks to be put onto the list and come off again, without ever being traversed. Help optimize for this use-case by moving to percpu hlist_head-s. At the same time, we can make the locking less contentious by moving to an lglock. When iterating over the lists for /proc/locks, we must take the global lock and then iterate over each CPU's list in turn. This change necessitates a new fl_link_cpu field to keep track of which CPU the entry is on. On x86_64 at least, this field is placed within an existing hole in the struct to avoid growing the size. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-08seq_file: add seq_list_*_percpu helpersJeff Layton
When we convert the file_lock_list to a set of percpu lists, we'll need a way to iterate over them in order to output /proc/locks info. Add some seq_list_*_percpu helpers to handle that. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-08f2fs: fix readdir incorrectnessJaegeuk Kim
In the previous Al Viro's readdir patch set, there occurs a bug when running xfstest: 006 as follows. [Error output] alpha size = 4, name length = 6, total files = 4096, nproc=1 1023 files created rm: cannot remove `/mnt/f2fs/permname.15150/a': Directory not empty [Correct output] alpha size = 4, name length = 6, total files = 4096, nproc=1 4097 files created This bug is due to the misupdate of directory position in ctx. So, this patch fixes this. [AV: fixed a braino] CC: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-06Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "The timer changes contain: - posix timer code consolidation and fixes for odd corner cases - sched_clock implementation moved from ARM to core code to avoid duplication by other architectures - alarm timer updates - clocksource and clockevents unregistration facilities - clocksource/events support for new hardware - precise nanoseconds RTC readout (Xen feature) - generic support for Xen suspend/resume oddities - the usual lot of fixes and cleanups all over the place The parts which touch other areas (ARM/XEN) have been coordinated with the relevant maintainers. Though this results in an handful of trivial to solve merge conflicts, which we preferred over nasty cross tree merge dependencies. The patches which have been committed in the last few days are bug fixes plus the posix timer lot. The latter was in akpms queue and next for quite some time; they just got forgotten and Frederic collected them last minute." * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits) hrtimer: Remove unused variable hrtimers: Move SMP function call to thread context clocksource: Reselect clocksource when watchdog validated high-res capability posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting posix_timers: fix racy timer delta caching on task exit posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule() selftests: add basic posix timers selftests posix_cpu_timers: consolidate expired timers check posix_cpu_timers: consolidate timer list cleanups posix_cpu_timer: consolidate expiry time type tick: Sanitize broadcast control logic tick: Prevent uncontrolled switch to oneshot mode tick: Make oneshot broadcast robust vs. CPU offlining x86: xen: Sync the CMOS RTC as well as the Xen wallclock x86: xen: Sync the wallclock when the system time is set timekeeping: Indicate that clock was set in the pvclock gtod notifier timekeeping: Pass flags instead of multiple bools to timekeeping_update() xen: Remove clock_was_set() call in the resume path hrtimers: Support resuming with two or more CPUs online (but stopped) timer: Fix jiffies wrap behavior of round_jiffies_common() ...
2013-07-05Merge tag 'upstream-3.11-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull ubifs fix from Artem Bityutskiy: "Only a single patch which fixes a message" * tag 'upstream-3.11-rc1' of git://git.infradead.org/linux-ubifs: UBIFS: correct mount message
2013-07-05helper for reading ->d_countAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-07-04Merge branch 'timers/posix-cpu-timers-for-tglx' ofThomas Gleixner
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core Frederic sayed: "Most of these patches have been hanging around for several month now, in -mmotm for a significant chunk. They already missed a few releases." Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-07-04Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "The usual stuff from trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits) treewide: relase -> release Documentation/cgroups/memory.txt: fix stat file documentation sysctl/net.txt: delete reference to obsolete 2.4.x kernel spinlock_api_smp.h: fix preprocessor comments treewide: Fix typo in printk doc: device tree: clarify stuff in usage-model.txt. open firmware: "/aliasas" -> "/aliases" md: bcache: Fixed a typo with the word 'arithmetic' irq/generic-chip: fix a few kernel-doc entries frv: Convert use of typedef ctl_table to struct ctl_table sgi: xpc: Convert use of typedef ctl_table to struct ctl_table doc: clk: Fix incorrect wording Documentation/arm/IXP4xx fix a typo Documentation/networking/ieee802154 fix a typo Documentation/DocBook/media/v4l fix a typo Documentation/video4linux/si476x.txt fix a typo Documentation/virtual/kvm/api.txt fix a typo Documentation/early-userspace/README fix a typo Documentation/video4linux/soc-camera.txt fix a typo lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment ...
2013-07-04Merge branch 'hpfs' from Mikulas PatockaLinus Torvalds
Merge hpfs patches from Mikulas Patocka. * emailed patches from Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz>: hpfs: implement prefetch to improve performance hpfs: use mpage hpfs: better test for errors
2013-07-04hpfs: implement prefetch to improve performanceMikulas Patocka
This patch implements prefetch to improve performance. It helps mostly when scanning the bitmaps to calculate free space. Signed-off-by: Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04hpfs: use mpageMikulas Patocka
Use the mpage interface to improve performance. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04hpfs: better test for errorsMikulas Patocka
The test if bitmap access is out of bound could errorneously pass if the device size is divisible by 16384 sectors and we are asking for one bitmap after the end. Check for invalid size in the superblock. Invalid size could cause integer overflows in the rest of the code. Signed-off-by: Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-04Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
2013-07-04quota: Convert use of typedef ctl_table to struct ctl_tableJoe Perches
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-07-04ext3: Fix fsync error handling after filesystem abort.Dmitry Monakhov
If filesystem was aborted we will return success due to (sb->s_flags & MS_RDONLY) which is incorrect and results in data loss. In order to handle fs abort correctly we have to check fs state once we discover that it is in MS_RDONLY state Test case: http://patchwork.ozlabs.org/patch/244297/ Changes from V1: - fix spelling - fix smp_rmb()/debug order Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz>
2013-07-04Merge branch 'akpm' (updates from Andrew Morton)Linus Torvalds
Merge first patch-bomb from Andrew Morton: - various misc bits - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been distracted. There has been quite a bit of activity. - About half the MM queue - Some backlight bits - Various lib/ updates - checkpatch updates - zillions more little rtc patches - ptrace - signals - exec - procfs - rapidio - nbd - aoe - pps - memstick - tools/testing/selftests updates * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits) tools/testing/selftests: don't assume the x bit is set on scripts selftests: add .gitignore for kcmp selftests: fix clean target in kcmp Makefile selftests: add .gitignore for vm selftests: add hugetlbfstest self-test: fix make clean selftests: exit 1 on failure kernel/resource.c: remove the unneeded assignment in function __find_resource aio: fix wrong comment in aio_complete() drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode drivers/memstick/host/r592.c: convert to module_pci_driver drivers/memstick/host/jmb38x_ms: convert to module_pci_driver pps-gpio: add device-tree binding and support drivers/pps/clients/pps-gpio.c: convert to module_platform_driver drivers/pps/clients/pps-gpio.c: convert to devm_* helpers drivers/parport/share.c: use kzalloc Documentation/accounting/getdelays.c: avoid strncpy in accounting tool aoe: update internal version number to v83 aoe: update copyright date aoe: perform I/O completions in parallel ...
2013-07-03aio: fix wrong comment in aio_complete()Tang Chen
ctx->ctx_lock should be ctx->completion_lock. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/exec.c:de_thread: mt-exec should update ->real_start_timeOleg Nesterov
924b42d5 ("Use boot based time for process start time and boot time in /proc") updated copy_process/do_task_stat but forgot about de_thread(). This breaks "ps axOT" if a sub-thread execs. Note: I think that task->start_time should die. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/exec.c: do_execve_common(): use current_user()Oleg Nesterov
Trivial cleanup. do_execve_common() can use current_user() and avoid the unnecessary "struct cred *cred" var. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/proc/kcore.c: using strlcpy() instead of strncpy()Zhao Hongjiang
For NUL terminated string, set '\0' at the end. Signed-off-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/proc/uptime.c:uptime_proc_show(): use get_monotonic_boottime()Oleg Nesterov
Change uptime_proc_show() to use get_monotonic_boottime() instead of do_posix_clock_monotonic_gettime() + monotonic_to_bootbased(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Tomas Janousek <tjanouse@redhat.com> Cc: Tomas Smetana <tsmetana@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pidOleg Nesterov
de_thread() can use change_pid() instead of detach + attach. This looks better and this ensures that, say, next_thread() can never see a task with ->pid == NULL. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03coredump: '% at the end' shouldn't bypass core_uses_pid logicOleg Nesterov
"goto end" should not bypass the "Backward compatibility with core_uses_pid" code, move this label up. While at it, - It is ugly to copy '|' into cn->corename and then inc the pointer for argv_split(). Change format_corename() to increment pat_ptr instead. - Remove the dead "if (*pat_ptr == 0)" in format_corename(), we already checked it is not zero. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03coredump: kill call_count, add core_name_sizeOleg Nesterov
Imho, "atomic_t call_count" is ugly and should die. It buys nothing and in fact it can grow more than necessary, expand doesn't check if it was already incremented by another task. Kill it, and introduce "static int core_name_size" updated by expand_corename(). This is obviously racy too but harmless, and core_name_size never grows for no reason. We do not bother to to calculate the "right" new size, we simply do kmalloc(size_we_need) and use ksize() to rely on kmalloc_index's decision. Finally change format_corename() to use expand_corename(), krealloc(NULL) is fine. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03coredump: kill cn_escape(), introduce cn_esc_printf()Oleg Nesterov
The usage of cn_escape() looks really annoying, imho this sequence needs a wrapper. And it is buggy. If cn_printf() does expand_corename() cn_escape() writes to the freed memory. Introduce cn_esc_printf() which hopefully does this all right. It records the index before cn_vprintf(), not "char *" which is no longer valid (in general) after krealloc(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03coredump: cn_vprintf() has no reason to call vsnprintf() twiceOleg Nesterov
cn_vprintf() looks really overcomplicated and sub-optimal. We do not need vsnprintf(NULL) to calculate the size we need, we can simply try to print into the current buffer and expand/retry only if necessary. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03coredump: introduce cn_vprintf()Oleg Nesterov
Turn cn_printf(...) into cn_vprintf(va_list args), reintroduce cn_printf() as a trivial wrapper. This simplifies the next change and cn_vprintf() will have more callers. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03coredump: format_corename() can leak cn->corenameOleg Nesterov
do_coredump() assumes that format_corename() can only fail if expand_corename() fails and frees cn->corename. This is not true, for example cn_print_exe_file() can fail and in this case nobody frees cn->corename. Change do_coredump() to always do kfree(cn->corename) after it calls format_corename() (NULL is fine), change expand_corename() to do nothing if kmalloc() fails. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Colin Walters <walters@verbum.org> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Lennart Poettering <mzxreary@0pointer.de> Cc: Lucas De Marchi <lucas.de.marchi@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-03signals: eventpoll: do not use sigprocmask()Oleg Nesterov
sigprocmask() should die. None of the current callers actually need this strange interface. Change fs/eventpoll.c to use set_current_blocked(). This also means we should not worry about SIGKILL/SIGSTOP. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric Wong <normalperson@yhbt.net> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>