summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2014-04-03printk: fix one circular lockdep warning about console_lockJane Li
Fix a warning about possible circular locking dependency. If do in following sequence: enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online) lockdep will show warning as following: ====================================================== [ INFO: possible circular locking dependency detected ] 3.10.0 #2 Tainted: G O ------------------------------------------------------- sh/1271 is trying to acquire lock: (console_lock){+.+.+.}, at: console_cpu_notify+0x20/0x2c but task is already holding lock: (cpu_hotplug.lock){+.+.+.}, at: cpu_hotplug_begin+0x2c/0x58 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (cpu_hotplug.lock){+.+.+.}: lock_acquire+0x98/0x12c mutex_lock_nested+0x50/0x3d8 cpu_hotplug_begin+0x2c/0x58 _cpu_up+0x24/0x154 cpu_up+0x64/0x84 smp_init+0x9c/0xd4 kernel_init_freeable+0x78/0x1c8 kernel_init+0x8/0xe4 ret_from_fork+0x14/0x2c -> #1 (cpu_add_remove_lock){+.+.+.}: lock_acquire+0x98/0x12c mutex_lock_nested+0x50/0x3d8 disable_nonboot_cpus+0x8/0xe8 suspend_devices_and_enter+0x214/0x448 pm_suspend+0x1e4/0x284 try_to_suspend+0xa4/0xbc process_one_work+0x1c4/0x4fc worker_thread+0x138/0x37c kthread+0xa4/0xb0 ret_from_fork+0x14/0x2c -> #0 (console_lock){+.+.+.}: __lock_acquire+0x1b38/0x1b80 lock_acquire+0x98/0x12c console_lock+0x54/0x68 console_cpu_notify+0x20/0x2c notifier_call_chain+0x44/0x84 __cpu_notify+0x2c/0x48 cpu_notify_nofail+0x8/0x14 _cpu_down+0xf4/0x258 cpu_down+0x24/0x40 store_online+0x30/0x74 dev_attr_store+0x18/0x24 sysfs_write_file+0x16c/0x19c vfs_write+0xb4/0x190 SyS_write+0x3c/0x70 ret_fast_syscall+0x0/0x48 Chain exists of: console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(cpu_hotplug.lock); lock(cpu_add_remove_lock); lock(cpu_hotplug.lock); lock(console_lock); *** DEADLOCK *** There are three locks involved in two sequence: a) pm suspend: console_lock (@suspend_console()) cpu_add_remove_lock (@disable_nonboot_cpus()) cpu_hotplug.lock (@_cpu_down()) b) Plug-out CPUx: cpu_add_remove_lock (@(cpu_down()) cpu_hotplug.lock (@_cpu_down()) console_lock (@console_cpu_notify()) => Lockdeps prints warning log. There should be not real deadlock, as flag of console_suspended can protect this. Although console_suspend() releases console_sem, it doesn't tell lockdep about it. That results in the lockdep warning about circular locking when doing the following: enter suspend -> resume -> plug-out CPUx (echo 0 > cpux/online) Fix the problem by telling lockdep we actually released the semaphore in console_suspend() and acquired it again in console_resume(). Signed-off-by: Jane Li <jiel@marvell.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03printk: do not compute the size of the message twicePetr Mladek
This is just a tiny optimization. It removes duplicate computation of the message size. Signed-off-by: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03printk: use also the last bytes in the ring bufferPetr Mladek
It seems that we have newer used the last byte in the ring buffer. In fact, we have newer used the last 4 bytes because of padding. First problem is in the check for free space. The exact number of free bytes is enough to store the length of data. Second problem is in the check where the ring buffer is rotated. The left side counts the first unused index. It is unused, so it might be the same as the size of the buffer. Note that the first problem has to be fixed together with the second one. Otherwise, the buffer is rotated even when there is enough space on the end of the buffer. Then the beginning of the buffer is rewritten and valid entries get corrupted. Signed-off-by: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03printk: add comment about tricky check for text buffer sizePetr Mladek
There is no check for potential "text_len" overflow. It is not needed because only valid level is detected. It took me some time to understand why. It would deserve a comment ;-) Signed-off-by: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03printk: remove obsolete check for log level "c"Petr Mladek
The kernel log level "c" was removed in commit 61e99ab8e35a ("printk: remove the now unnecessary "C" annotation for KERN_CONT"). It is no longer detected in printk_get_level(). Hence we do not need to check it in vprintk_emit. Signed-off-by: Petr Mladek <pmladek@suse.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kay Sievers <kay@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03kernel/resource.c: make reallocate_resource() staticDaeseok Youn
sparse says: kernel/resource.c:518:5: warning: symbol 'reallocate_resource' was not declared. Should it be static? Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03kernel: audit/fix non-modular users of module_init in core codePaul Gortmaker
Code that is obj-y (always built-in) or dependent on a bool Kconfig (built-in or absent) can never be modular. So using module_init as an alias for __initcall can be somewhat misleading. Fix these up now, so that we can relocate module_init from init.h into module.h in the future. If we don't do this, we'd have to add module.h to obviously non-modular code, and that would be a worse thing. The audit targets the following module_init users for change: kernel/user.c obj-y kernel/kexec.c bool KEXEC (one instance per arch) kernel/profile.c bool PROFILING kernel/hung_task.c bool DETECT_HUNG_TASK kernel/sched/stats.c bool SCHEDSTATS kernel/user_namespace.c bool USER_NS Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of subsys_initcall (which makes sense for these files) will thus change this registration from level 6-device to level 4-subsys (i.e. slightly earlier). However no observable impact of that difference has been observed during testing. Also, two instances of missing ";" at EOL are fixed in kexec. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03fs, kernel: permit disabling the uselib syscallJosh Triplett
uselib hasn't been used since libc5; glibc does not use it. Support turning it off. When disabled, also omit the load_elf_library implementation from binfmt_elf.c, which only uselib invokes. bloat-o-meter: add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785) function old new delta padzero 39 36 -3 uselib_flags 20 - -20 sys_uselib 168 - -168 SyS_uselib 168 - -168 load_elf_library 426 - -426 The new CONFIG_USELIB defaults to `y'. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03kernel/groups.c: remove return value of set_groupsWang YanQing
After commit 6307f8fee295 ("security: remove dead hook task_setgroups"), set_groups will always return zero, so we could just remove return value of set_groups. This patch reduces code size, and simplfies code to use set_groups, because we don't need to check its return value any more. [akpm@linux-foundation.org: remove obsolete claims from set_groups() comment] Signed-off-by: Wang YanQing <udknight@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03sys_sysfs: Add CONFIG_SYSFS_SYSCALLFabian Frederick
sys_sysfs is an obsolete system call no longer supported by libc. - This patch adds a default CONFIG_SYSFS_SYSCALL=y - Option can be turned off in expert mode. - cond_syscall added to kernel/sys_ni.c [akpm@linux-foundation.org: tweak Kconfig help text] Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03drop_caches: add some documentation and info messageDave Hansen
There is plenty of anecdotal evidence and a load of blog posts suggesting that using "drop_caches" periodically keeps your system running in "tip top shape". Perhaps adding some kernel documentation will increase the amount of accurate data on its use. If we are not shrinking caches effectively, then we have real bugs. Using drop_caches will simply mask the bugs and make them harder to find, but certainly does not fix them, nor is it an appropriate "workaround" to limit the size of the caches. On the contrary, there have been bug reports on issues that turned out to be misguided use of cache dropping. Dropping caches is a very drastic and disruptive operation that is good for debugging and running tests, but if it creates bug reports from production use, kernel developers should be aware of its use. Add a bit more documentation about it, a syslog message to track down abusers, and vmstat drop counters to help analyze problem reports. [akpm@linux-foundation.org: checkpatch fixes] [hannes@cmpxchg.org: add runtime suppression control] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03mm: optimize put_mems_allowed() usageMel Gorman
Since put_mems_allowed() is strictly optional, its a seqcount retry, we don't need to evaluate the function if the allocation was in fact successful, saving a smp_rmb some loads and comparisons on some relative fast-paths. Since the naming, get/put_mems_allowed() does suggest a mandatory pairing, rename the interface, as suggested by Mel, to resemble the seqcount interface. This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(), where it is important to note that the return value of the latter call is inverted from its previous incarnation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03kernel/watchdog.c: touch_nmi_watchdog should only touch local cpu not every oneBen Zhang
I ran into a scenario where while one cpu was stuck and should have panic'd because of the NMI watchdog, it didn't. The reason was another cpu was spewing stack dumps on to the console. Upon investigation, I noticed that when writing to the console and also when dumping the stack, the watchdog is touched. This causes all the cpus to reset their NMI watchdog flags and the 'stuck' cpu just spins forever. This change causes the semantics of touch_nmi_watchdog to be changed slightly. Previously, I accidentally changed the semantics and we noticed there was a codepath in which touch_nmi_watchdog could be touched from a preemtible area. That caused a BUG() to happen when CONFIG_DEBUG_PREEMPT was enabled. I believe it was the acpi code. My attempt here re-introduces the change to have the touch_nmi_watchdog() code only touch the local cpu instead of all of the cpus. But instead of using __get_cpu_var(), I use the __raw_get_cpu_var() version. This avoids the preemption problem. However my reasoning wasn't because I was trying to be lazy. Instead I rationalized it as, well if preemption is enabled then interrupts should be enabled to and the NMI watchdog will have no reason to trigger. So it won't matter if the wrong cpu is touched because the percpu interrupt counters the NMI watchdog uses should still be incrementing. Don said: : I'm ok with this patch, though it does alter the behaviour of how : touch_nmi_watchdog works. For the most part I don't think most callers : need to touch all of the watchdogs (on each cpu). Perhaps a corner case : will pop up (the scheduler?? to mimic touch_all_softlockup_watchdogs() ). : : But this does address an issue where if a system is locked up and one cpu : is spewing out useful debug messages (or error messages), the hard lockup : will fail to go off. We have seen this on RHEL also. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Ben Zhang <benzh@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03kthread: ensure locality of task_struct allocationsNishanth Aravamudan
In the presence of memoryless nodes, numa_node_id() will return the current CPU's NUMA node, but that may not be where we expect to allocate from memory from. Instead, we should rely on the fallback code in the memory allocator itself, by using NUMA_NO_NODE. Also, when calling kthread_create_on_node(), use the nearest node with memory to the cpu in question, rather than the node it is running on. Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Reviewed-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Anton Blanchard <anton@samba.org> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Ben Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "Apart from reordering the SELinux mmap code to ensure DAC is called before MAC, these are minor maintenance updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits) selinux: correctly label /proc inodes in use before the policy is loaded selinux: put the mmap() DAC controls before the MAC controls selinux: fix the output of ./scripts/get_maintainer.pl for SELinux evm: enable key retention service automatically ima: skip memory allocation for empty files evm: EVM does not use MD5 ima: return d_name.name if d_path fails integrity: fix checkpatch errors ima: fix erroneous removal of security.ima xattr security: integrity: Use a more current logging style MAINTAINERS: email updates and other misc. changes ima: reduce memory usage when a template containing the n field is used ima: restore the original behavior for sending data with ima template Integrity: Pass commname via get_task_comm() fs: move i_readcount ima: use static const char array definitions security: have cap_dentry_init_security return error ima: new helper: file_inode(file) kernel: Mark function as static in kernel/seccomp.c capability: Use current logging styles ...
2014-04-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Here is my initial pull request for the networking subsystem during this merge window: 1) Support for ESN in AH (RFC 4302) from Fan Du. 2) Add full kernel doc for ethtool command structures, from Ben Hutchings. 3) Add BCM7xxx PHY driver, from Florian Fainelli. 4) Export computed TCP rate information in netlink socket dumps, from Eric Dumazet. 5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas Dichtel. 6) Convert many drivers to pci_enable_msix_range(), from Alexander Gordeev. 7) Record SKB timestamps more efficiently, from Eric Dumazet. 8) Switch to microsecond resolution for TCP round trip times, also from Eric Dumazet. 9) Clean up and fix 6lowpan fragmentation handling by making use of the existing inet_frag api for it's implementation. 10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss. 11) Auto size SKB lengths when composing netlink messages based upon past message sizes used, from Eric Dumazet. 12) qdisc dumps can take a long time, add a cond_resched(), From Eric Dumazet. 13) Sanitize netpoll core and drivers wrt. SKB handling semantics. Get rid of never-used-in-tree netpoll RX handling. From Eric W Biederman. 14) Support inter-address-family and namespace changing in VTI tunnel driver(s). From Steffen Klassert. 15) Add Altera TSE driver, from Vince Bridgers. 16) Optimizing csum_replace2() so that it doesn't adjust the checksum by checksumming the entire header, from Eric Dumazet. 17) Expand BPF internal implementation for faster interpreting, more direct translations into JIT'd code, and much cleaner uses of BPF filtering in non-socket ocntexts. From Daniel Borkmann and Alexei Starovoitov" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits) netpoll: Use skb_irq_freeable to make zap_completion_queue safe. net: Add a test to see if a skb is freeable in irq context qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port' net: ptp: move PTP classifier in its own file net: sxgbe: make "core_ops" static net: sxgbe: fix logical vs bitwise operation net: sxgbe: sxgbe_mdio_register() frees the bus Call efx_set_channels() before efx->type->dimension_resources() xen-netback: disable rogue vif in kthread context net/mlx4: Set proper build dependancy with vxlan be2net: fix build dependency on VxLAN mac802154: make csma/cca parameters per-wpan mac802154: allow only one WPAN to be up at any given time net: filter: minor: fix kdoc in __sk_run_filter netlink: don't compare the nul-termination in nla_strcmp can: c_can: Avoid led toggling for every packet. can: c_can: Simplify TX interrupt cleanup can: c_can: Store dlc private can: c_can: Reduce register access can: c_can: Make the code readable ...
2014-04-02Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "Usual rocket science -- mostly documentation and comment updates" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: sparse: fix comment doc: fix double words isdn: capi: fix "CAPI_VERSION" comment doc: DocBook: Fix typos in xml and template file Bluetooth: add module name for btwilink driver core: unexport static function create_syslog_header mmc: core: typo fix in printk specifier ARM: spear: clean up editing mistake net-sysfs: fix comment typo 'CONFIG_SYFS' doc: Insert MODULE_ in module-signing macros Documentation: update URL to hfsplus Technote 1150 gpio: update path to documentation ixgbe: Fix format string in ixgbe_fcoe. Kconfig: Remove useless "default N" lines user_namespace.c: Remove duplicated word in comment CREDITS: fix formatting treewide: Fix typo in Documentation/DocBook mm: Fix warning on make htmldocs caused by slab.c ata: ata-samsung_cf: cleanup in header file idr: remove unused prototype of idr_free()
2014-04-02Merge branch 'sched-idle-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched/idle changes from Ingo Molnar: "More idle code reorganization, to prepare for more integration. (Sent separately because it depended on pending timer work, which is now upstream)" * 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/idle: Add more comments to the code sched/idle: Move idle conditions in cpuidle_idle main function sched/idle: Reorganize the idle loop cpuidle/idle: Move the cpuidle_idle_call function to idle.c idle/cpuidle: Split cpuidle_idle_call main function into smaller functions
2014-04-02pid_namespace: pidns_get() should check task_active_pid_ns() != NULLOleg Nesterov
pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't go away, but task_active_pid_ns(task) is NULL if release_task(task) was already called. Alternatively we could change get_pid_ns(ns) to check ns != NULL, but it seems that other callers are fine. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman ebiederm@xmission.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-02Merge branch 'x86-x32-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull compat time conversion changes from Peter Anvin: "Despite the branch name this is really neither an x86 nor an x32-specific patchset, although it the implementation of the discussions that followed the x32 security hole a few months ago. This removes get/put_compat_timespec/val() and replaces them with compat_get/put_timespec/val() which are savvy as to the current status of COMPAT_USE_64BIT_TIME. It removes several unused and/or incorrect/misleading functions (like compat_put_timeval_convert which doesn't in fact do any conversion) and also replaces several open-coded implementations what is now called compat_convert_timespec() with that function" * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: compat: Fix sparse address space warnings compat: Get rid of (get|put)_compat_time(val|spec)
2014-04-02Merge branch 'for-3.15/core' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block layer updates from Jens Axboe: "This is the pull request for the core block IO bits for the 3.15 kernel. It's a smaller round this time, it contains: - Various little blk-mq fixes and additions from Christoph and myself. - Cleanup of the IPI usage from the block layer, and associated helper code. From Frederic Weisbecker and Jan Kara. - Duplicate code cleanup in bio-integrity from Gu Zheng. This will give you a merge conflict, but that should be easy to resolve. - blk-mq notify spinlock fix for RT from Mike Galbraith. - A blktrace partial accounting bug fix from Roman Pen. - Missing REQ_SYNC detection fix for blk-mq from Shaohua Li" * 'for-3.15/core' of git://git.kernel.dk/linux-block: (25 commits) blk-mq: add REQ_SYNC early rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock blk-mq: support partial I/O completions blk-mq: merge blk_mq_insert_request and blk_mq_run_request blk-mq: remove blk_mq_alloc_rq blk-mq: don't dump CPU -> hw queue map on driver load blk-mq: fix wrong usage of hctx->state vs hctx->flags blk-mq: allow blk_mq_init_commands() to return failure block: remove old blk_iopoll_enabled variable blktrace: fix accounting of partially completed requests smp: Rename __smp_call_function_single() to smp_call_function_single_async() smp: Remove wait argument from __smp_call_function_single() watchdog: Simplify a little the IPI call smp: Move __smp_call_function_single() below its safe version smp: Consolidate the various smp_call_function_single() declensions smp: Teach __smp_call_function_single() to check for offline cpus smp: Remove unused list_head from csd smp: Iterate functions through llist_for_each_entry_safe() block: Stop abusing rq->csd.list in blk-softirq block: Remove useless IPI struct initialization ...
2014-04-01Merge tag 'pci-v3.15-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI changes from Bjorn Helgaas: "Enumeration - Increment max correctly in pci_scan_bridge() (Andreas Noever) - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever) - Assign CardBus bus number only during the second pass (Andreas Noever) - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever) - Make sure bus number resources stay within their parents bounds (Andreas Noever) - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever) - Check for child busses which use more bus numbers than allocated (Andreas Noever) - Don't scan random busses in pci_scan_bridge() (Andreas Noever) - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas) - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas) - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas) - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas) NUMA - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas) - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas) - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas) - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas) - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas) - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas) - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas) Resource management - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas) - Add resource_contains() (Bjorn Helgaas) - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas) - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas) - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas) - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas) - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas) - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas) - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas) - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas) - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas) - s390: Use generic pci_enable_resources() (Bjorn Helgaas) - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas) - Set type in __request_region() (Bjorn Helgaas) - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas) - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas) - Log IDE resource quirk in dmesg (Bjorn Helgaas) - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas) PCI device hotplug - Make check_link_active() non-static (Rajat Jain) - Use link change notifications for hot-plug and removal (Rajat Jain) - Enable link state change notifications (Rajat Jain) - Don't disable the link permanently during removal (Rajat Jain) - Don't check adapter or latch status while disabling (Rajat Jain) - Disable link notification across slot reset (Rajat Jain) - Ensure very fast hotplug events are also processed (Rajat Jain) - Add hotplug_lock to serialize hotplug events (Rajat Jain) - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain) - Don't turn slot off when hot-added device already exists (Yijing Wang) MSI - Keep pci_enable_msi() documentation (Alexander Gordeev) - ahci: Fix broken single MSI fallback (Alexander Gordeev) - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev) - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman) - Fix leak of msi_attrs (Greg Kroah-Hartman) - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida) Virtualization - Device-specific ACS support (Alex Williamson) Freescale i.MX6 - Wait for retraining (Marek Vasut) Marvell MVEBU - Use Device ID and revision from underlying endpoint (Andrew Lunn) - Fix incorrect size for PCI aperture resources (Jason Gunthorpe) - Call request_resource() on the apertures (Jason Gunthorpe) - Fix potential issue in range parsing (Jean-Jacques Hiblot) Renesas R-Car - Check platform_get_irq() return code (Ben Dooks) - Add error interrupt handling (Ben Dooks) - Fix bridge logic configuration accesses (Ben Dooks) - Register each instance independently (Magnus Damm) - Break out window size handling (Magnus Damm) - Make the Kconfig dependencies more generic (Magnus Damm) Synopsys DesignWare - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar) Miscellaneous - Remove unused SR-IOV VF Migration support (Bjorn Helgaas) - Enable INTx if BIOS left them disabled (Bjorn Helgaas) - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter) - Clean up par-arch object file list (Liviu Dudau) - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom) - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang) - Fix pci_bus_b() build failure (Paul Gortmaker)" * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits) Revert "[PATCH] Insert GART region into resource map" PCI: Log IDE resource quirk in dmesg PCI: Change pci_bus_alloc_resource() type_mask to unsigned long PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() resources: Set type in __request_region() PCI: Don't check resource_size() in pci_bus_alloc_resource() s390/PCI: Use generic pci_enable_resources() tile PCI RC: Use default pcibios_enable_device() sparc/PCI: Use default pcibios_enable_device() (Leon only) sh/PCI: Use default pcibios_enable_device() microblaze/PCI: Use default pcibios_enable_device() alpha/PCI: Use default pcibios_enable_device() PCI: Add "weak" generic pcibios_enable_device() implementation PCI: Don't enable decoding if BAR hasn't been assigned an address PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit PCI: Don't try to claim IORESOURCE_UNSET resources PCI: Check IORESOURCE_UNSET before updating BAR PCI: Don't clear IORESOURCE_UNSET when updating BAR PCI: Mark resources as IORESOURCE_UNSET if we can't assign them ... Conflicts: arch/x86/include/asm/topology.h drivers/ata/ahci.c
2014-04-01Merge tag 'pm+acpi-3.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "The majority of this material spent some time in linux-next, some of it even several weeks. There are a few relatively fresh commits in it, but they are mostly fixes and simple cleanups. ACPI took the lead this time, both in terms of the number of commits and the number of modified lines of code, cpufreq follows and there are a few changes in the PM core and in cpuidle too. A new feature that already got some LWN.net's attention is the device PM QoS extension allowing latency tolerance requirements to be propagated from leaf devices to their ancestors with hardware interfaces for specifying latency tolerance. That should help systems with hardware-driven power management to avoid going too far with it in cases when there are latency tolerance constraints. There also are some significant changes in the ACPI core related to the way in which hotplug notifications are handled. They affect PCI hotplug (ACPIPHP) and the ACPI dock station code too. The bottom line is that all those notification now go through the root notify handler and are propagated to the interested subsystems by means of callbacks instead of having to install a notify handler for each device object that we can potentially get hotplug notifications for. In addition to that ACPICA will now advertise "Windows 2013" compatibility for _OSI, because some systems out there don't work correctly if that is not done (some of them don't even boot). On the system suspend side of things, all of the device suspend and resume callbacks, except for ->prepare() and ->complete(), are now going to be executed asynchronously as that turns out to speed up system suspend and resume on some platforms quite significantly and we have a few more optimizations in that area. Apart from that, there are some new device IDs and fixes and cleanups all over. In particular, the system suspend and resume handling by cpufreq should be improved and the cpuidle menu governor should be a bit more robust now. Specifics: - Device PM QoS support for latency tolerance constraints on systems with hardware interfaces allowing such constraints to be specified. That is necessary to prevent hardware-driven power management from becoming overly aggressive on some systems and to prevent power management features leading to excessive latencies from being used in some cases. - Consolidation of the handling of ACPI hotplug notifications for device objects. This causes all device hotplug notifications to go through the root notify handler (that was executed for all of them anyway before) that propagates them to individual subsystems, if necessary, by executing callbacks provided by those subsystems (those callbacks are associated with struct acpi_device objects during device enumeration). As a result, the code in question becomes both smaller in size and more straightforward and all of those changes should not affect users. - ACPICA update, including fixes related to the handling of _PRT in cases when it is broken and the addition of "Windows 2013" to the list of supported "features" for _OSI (which is necessary to support systems that work incorrectly or don't even boot without it). Changes from Bob Moore and Lv Zheng. - Consolidation of ACPI _OST handling from Jiang Liu. - ACPI battery and AC fixes allowing unusual system configurations to be handled by that code from Alexander Mezin. - New device IDs for the ACPI LPSS driver from Chiau Ee Chew. - ACPI fan and thermal optimizations related to system suspend and resume from Aaron Lu. - Cleanups related to ACPI video from Jean Delvare. - Assorted ACPI fixes and cleanups from Al Stone, Hanjun Guo, Lan Tianyu, Paul Bolle, Tomasz Nowicki. - Intel RAPL (Running Average Power Limits) driver cleanups from Jacob Pan. - intel_pstate fixes and cleanups from Dirk Brandewie. - cpufreq fixes related to system suspend/resume handling from Viresh Kumar. - cpufreq core fixes and cleanups from Viresh Kumar, Stratos Karafotis, Saravana Kannan, Rashika Kheria, Joe Perches. - cpufreq drivers updates from Viresh Kumar, Zhuoyu Zhang, Rob Herring. - cpuidle fixes related to the menu governor from Tuukka Tikkanen. - cpuidle fix related to coupled CPUs handling from Paul Burton. - Asynchronous execution of all device suspend and resume callbacks, except for ->prepare and ->complete, during system suspend and resume from Chuansheng Liu. - Delayed resuming of runtime-suspended devices during system suspend for the PCI bus type and ACPI PM domain. - New set of PM helper routines to allow device runtime PM callbacks to be used during system suspend and resume more easily from Ulf Hansson. - Assorted fixes and cleanups in the PM core from Geert Uytterhoeven, Prabhakar Lad, Philipp Zabel, Rashika Kheria, Sebastian Capella. - devfreq fix from Saravana Kannan" * tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits) PM / devfreq: Rewrite devfreq_update_status() to fix multiple bugs PM / sleep: Correct whitespace errors in <linux/pm.h> intel_pstate: Set core to min P state during core offline cpufreq: Add stop CPU callback to cpufreq_driver interface cpufreq: Remove unnecessary braces cpufreq: Fix checkpatch errors and warnings cpufreq: powerpc: add cpufreq transition latency for FSL e500mc SoCs MAINTAINERS: Reorder maintainer addresses for PM and ACPI PM / Runtime: Update runtime_idle() documentation for return value meaning video / output: Drop display output class support fujitsu-laptop: Drop unneeded include acer-wmi: Stop selecting VIDEO_OUTPUT_CONTROL ACPI / gpu / drm: Stop selecting VIDEO_OUTPUT_CONTROL ACPI / video: fix ACPI_VIDEO dependencies cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE|RESUMECHANGE} cpufreq: Do not allow ->setpolicy drivers to provide ->target cpufreq: arm_big_little: set 'physical_cluster' for each CPU cpufreq: arm_big_little: make vexpress driver depend on bL core driver ACPI / button: Add ACPI Button event via netlink routine ACPI: Remove duplicate definitions of PREFIX ...
2014-04-01Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq code updates from Thomas Gleixner: "The irq department proudly presents: - Another tree wide sweep of irq infrastructure abuse. Clear winner of the trainwreck engineering contest was: #include "../../../kernel/irq/settings.h" - Tree wide update of irq_set_affinity() callbacks which miss a cpu online check when picking a single cpu out of the affinity mask. - Tree wide consolidation of interrupt statistics. - Updates to the threaded interrupt infrastructure to allow explicit wakeup of the interrupt thread and a variant of synchronize_irq() which synchronizes only the hard interrupt handler. Both are needed to replace the homebrewn thread handling in the mmc/sdhci code. - New irq chip callbacks to allow proper support for GPIO based irqs. The GPIO based interrupts need to request/release GPIO resources from request/free_irq. - A few new ARM interrupt chips. No revolutionary new hardware, just differently wreckaged variations of the scheme. - Small improvments, cleanups and updates all over the place" I was hoping that that trainwreck engineering contest was a April Fools' joke. But no. * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits) irqchip: sun7i/sun6i: Disable NMI before registering the handler ARM: sun7i/sun6i: dts: Fix IRQ number for sun6i NMI controller ARM: sun7i/sun6i: irqchip: Update the documentation ARM: sun7i/sun6i: dts: Add NMI irqchip support ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller genirq: Export symbol no_action() arm: omap: Fix typo in ams-delta-fiq.c m68k: atari: Fix the last kernel_stat.h fallout irqchip: sun4i: Simplify sun4i_irq_ack irqchip: sun4i: Use handle_fasteoi_irq for all interrupts genirq: procfs: Make smp_affinity values go+r softirq: Add linux/irq.h to make it compile again m68k: amiga: Add linux/irq.h to make it compile again irqchip: sun4i: Don't ack IRQs > 0, fix acking of IRQ 0 irqchip: sun4i: Fix a comment about mask register initialization irqchip: sun4i: Fix irq 0 not working genirq: Add a new IRQCHIP_EOI_THREADED flag genirq: Document IRQCHIP_ONESHOT_SAFE flag ARM: sunxi: dt: Convert to the new irq controller compatibles irqchip: sunxi: Change compatibles ...
2014-04-01Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer changes from Thomas Gleixner: "This assorted collection provides: - A new timer based timer broadcast feature for systems which do not provide a global accessible timer device. That allows those systems to put CPUs into deep idle states where the per cpu timer device stops. - A few NOHZ_FULL related improvements to the timer wheel - The usual updates to timer devices found in ARM SoCs - Small improvements and updates all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) tick: Remove code duplication in tick_handle_periodic() tick: Fix spelling mistake in tick_handle_periodic() x86: hpet: Use proper destructor for delayed work workqueue: Provide destroy_delayed_work_on_stack() clocksource: CMT, MTU2, TMU and STI should depend on GENERIC_CLOCKEVENTS timer: Remove code redundancy while calling get_nohz_timer_target() hrtimer: Rearrange comments in the order struct members are declared timer: Use variable head instead of &work_list in __run_timers() clocksource: exynos_mct: silence a static checker warning arm: zynq: Add support for cpufreq arm: zynq: Don't use arm_global_timer with cpufreq clocksource/cadence_ttc: Overhaul clocksource frequency adjustment clocksource/cadence_ttc: Call clockevents_update_freq() with IRQs enabled clocksource: Add Kconfig entries for CMT, MTU2, TMU and STI sh: Remove Kconfig entries for TMU, CMT and MTU2 ARM: shmobile: Remove CMT, TMU and STI Kconfig entries clocksource: armada-370-xp: Use atomic access for shared registers clocksource: orion: Use atomic access for shared registers clocksource: timer-keystone: Delete unnecessary variable clocksource: timer-keystone: introduce clocksource driver for Keystone ...
2014-04-01Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main purpose is to fix a full dynticks bug related to virtualization, where steal time accounting appears to be zero in /proc/stat even after a few seconds of competing guests running busy loops in a same host CPU. It's not a regression though as it was there since the beginning. The other commits are preparatory work to fix the bug and various cleanups" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arch: Remove stub cputime.h headers sched: Remove needless round trip nsecs <-> tick conversion of steal time cputime: Fix jiffies based cputime assumption on steal accounting cputime: Bring cputime -> nsecs conversion cputime: Default implementation of nsecs -> cputime conversion cputime: Fix nsecs_to_cputime() return type cast
2014-03-31Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull ARM64 updates from Catalin Marinas: - KGDB support for arm64 - PCI I/O space extended to 16M (in preparation of PCIe support patches) - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the time being), together with swiotlb late initialisation to correctly setup the bounce buffer - DMA API cache maintenance support (not all ARMv8 platforms have hardware cache coherency) - Crypto extensions advertising via ELF_HWCAP2 for compat user space - Perf support for dwarf unwinding in compat mode - asm/tlb.h converted to the generic mmu_gather code - asm-generic rwsem implementation - Code clean-up * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits) arm64: Remove pgprot_dmacoherent() arm64: Support DMA_ATTR_WRITE_COMBINE arm64: Implement custom mmap functions for dma mapping arm64: Fix __range_ok macro arm64: Fix duplicated Kconfig entries arm64: mm: Route pmd thp functions through pte equivalents arm64: rwsem: use asm-generic rwsem implementation asm-generic: rwsem: de-PPCify rwsem.h arm64: enable generic CPU feature modalias matching for this architecture arm64: smp: make local symbol static arm64: debug: make local symbols static ARM64: perf: support dwarf unwinding in compat mode ARM64: perf: add support for frame pointer unwinding in compat mode ARM64: perf: add support for perf registers API arm64: Add boot time configuration of Intermediate Physical Address size arm64: Do not synchronise I and D caches for special ptes arm64: Make DMA coherent and strongly ordered mappings not executable arm64: barriers: add dmb barrier arm64: topology: Implement basic CPU topology support arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries ...
2014-03-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "There are two memory management related changes, the CMMA support for KVM to avoid swap-in of freed pages and the split page table lock for the PMD level. These two come with common code changes in mm/. A fix for the long standing theoretical TLB flush problem, this one comes with a common code change in kernel/sched/. Another set of changes is Heikos uaccess work, included is the initial set of patches with more to come. And fixes and cleanups as usual" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (36 commits) s390/con3270: optionally disable auto update s390/mm: remove unecessary parameter from pgste_ipte_notify s390/mm: remove unnecessary parameter from gmap_do_ipte_notify s390/mm: fixing comment so that parameter name match s390/smp: limit number of cpus in possible cpu mask hypfs: Add clarification for "weight_min" attribute s390: update defconfigs s390/ptrace: add support for PTRACE_SINGLEBLOCK s390/perf: make print_debug_cf() static s390/topology: Remove call to update_cpu_masks() s390/compat: remove compat exec domain s390: select CONFIG_TTY for use of tty in unconditional keyboard driver s390/appldata_os: fix cpu array size calculation s390/checksum: remove memset() within csum_partial_copy_from_user() s390/uaccess: remove copy_from_user_real() s390/sclp_early: Return correct HSA block count also for zero s390: add some drivers/subsystems to the MAINTAINERS file s390: improve debug feature usage s390/airq: add support for irq ranges s390/mm: enable split page table lock for PMD level ...
2014-03-31Merge branch 'compat' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 compat wrapper rework from Heiko Carstens: "S390 compat system call wrapper simplification work. The intention of this work is to get rid of all hand written assembly compat system call wrappers on s390, which perform proper sign or zero extension, or pointer conversion of compat system call parameters. Instead all of this should be done with C code eg by using Al's COMPAT_SYSCALL_DEFINEx() macro. Therefore all common code and s390 specific compat system calls have been converted to the COMPAT_SYSCALL_DEFINEx() macro. In order to generate correct code all compat system calls may only have eg compat_ulong_t parameters, but no unsigned long parameters. Those patches which change parameter types from unsigned long to compat_ulong_t parameters are separate in this series, but shouldn't cause any harm. The only compat system calls which intentionally have 64 bit parameters (preadv64 and pwritev64) in support of the x86/32 ABI haven't been changed, but are now only available if an architecture defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64. System calls which do not have a compat variant but still need proper zero extension on s390, like eg "long sys_brk(unsigned long brk)" will get a proper wrapper function with the new s390 specific COMPAT_SYSCALL_WRAPx() macro: COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk); which generates the following code (simplified): asmlinkage long sys_brk(unsigned long brk); asmlinkage long compat_sys_brk(long brk) { return sys_brk((u32)brk); } Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines includes both linux/syscall.h and linux/compat.h, it will generate build errors, if the declaration of sys_brk() doesn't match, or if there exists a non-matching compat_sys_brk() declaration. In addition this will intentionally result in a link error if somewhere else a compat_sys_brk() function exists, which probably should have been used instead. Two more BUILD_BUG_ONs make sure the size and type of each compat syscall parameter can be handled correctly with the s390 specific macros. I converted the compat system calls step by step to verify the generated code is correct and matches the previous code. In fact it did not always match, however that was always a bug in the hand written asm code. In result we get less code, less bugs, and much more sanity checking" * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits) s390/compat: add copyright statement compat: include linux/unistd.h within linux/compat.h s390/compat: get rid of compat wrapper assembly code s390/compat: build error for large compat syscall args mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types ipc/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: convert to COMPAT_SYSCALL_DEFINE security/compat: convert to COMPAT_SYSCALL_DEFINE mm/compat: convert to COMPAT_SYSCALL_DEFINE net/compat: convert to COMPAT_SYSCALL_DEFINE kernel/compat: convert to COMPAT_SYSCALL_DEFINE fs/compat: optional preadv64/pwrite64 compat system calls ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t s390/compat: partial parameter conversion within syscall wrappers s390/compat: automatic zero, sign and pointer conversion of syscalls s390/compat: add sync_file_range and fallocate compat syscalls ...
2014-03-31Merge branch 'x86-asmlinkage-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 LTO changes from Peter Anvin: "More infrastructure work in preparation for link-time optimization (LTO). Most of these changes is to make sure symbols accessed from assembly code are properly marked as visible so the linker doesn't remove them. My understanding is that the changes to support LTO are still not upstream in binutils, but are on the way there. This patchset should conclude the x86-specific changes, and remaining patches to actually enable LTO will be fed through the Kbuild tree (other than keeping up with changes to the x86 code base, of course), although not necessarily in this merge window" * 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) Kbuild, lto: Handle basic LTO in modpost Kbuild, lto: Disable LTO for asm-offsets.c Kbuild, lto: Add a gcc-ld script to let run gcc as ld Kbuild, lto: add ld-version and ld-ifversion macros Kbuild, lto: Drop .number postfixes in modpost Kbuild, lto, workaround: Don't warn for initcall_reference in modpost lto: Disable LTO for sys_ni lto: Handle LTO common symbols in module loader lto, workaround: Add workaround for initcall reordering lto: Make asmlinkage __visible x86, lto: Disable LTO for the x86 VDSO initconst, x86: Fix initconst mistake in ts5500 code initconst: Fix initconst mistake in dcdbas asmlinkage: Make trace_hardirqs_on/off_caller visible asmlinkage, x86: Fix 32bit memcpy for LTO asmlinkage Make __stack_chk_failed and memcmp visible asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage asmlinkage: Make main_extable_sort_needed visible asmlinkage, mutex: Mark __visible asmlinkage: Make trace_hardirq visible ...
2014-03-31Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu handling changes from Ingo Molnar: "Bigger changes: - Intel CPU hardware-enablement: new vector instructions support (AVX-512), by Fenghua Yu. - Support the clflushopt instruction and use it in appropriate places. clflushopt is similar to clflush but with more relaxed ordering, by Ross Zwisler. - MSR accessor cleanups, by Borislav Petkov. - 'forcepae' boot flag for those who have way too much time to spend on way too old Pentium-M systems and want to live way too dangerously, by Chris Bainbridge" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic x86, Intel: Convert to the new bit access MSR accessors x86, AMD: Convert to the new bit access MSR accessors x86: Add another set of MSR accessor functions x86: Use clflushopt in drm_clflush_virt_range x86: Use clflushopt in drm_clflush_page x86: Use clflushopt in clflush_cache_range x86: Add support for the clflushopt instruction x86, AVX-512: Enable AVX-512 States Context Switch x86, AVX-512: AVX-512 Feature Detection
2014-03-31Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Bigger changes: - sched/idle restructuring: they are WIP preparation for deeper integration between the scheduler and idle state selection, by Nicolas Pitre. - add NUMA scheduling pseudo-interleaving, by Rik van Riel. - optimize cgroup context switches, by Peter Zijlstra. - RT scheduling enhancements, by Thomas Gleixner. The rest is smaller changes, non-urgnt fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits) sched: Clean up the task_hot() function sched: Remove double calculation in fix_small_imbalance() sched: Fix broken setscheduler() sparc64, sched: Remove unused sparc64_multi_core sched: Remove unused mc_capable() and smt_capable() sched/numa: Move task_numa_free() to __put_task_struct() sched/fair: Fix endless loop in idle_balance() sched/core: Fix endless loop in pick_next_task() sched/fair: Push down check for high priority class task into idle_balance() sched/rt: Fix picking RT and DL tasks from empty queue trace: Replace hardcoding of 19 with MAX_NICE sched: Guarantee task priority in pick_next_task() sched/idle: Remove stale old file sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED cpuidle/arm64: Remove redundant cpuidle_idle_call() cpuidle/powernv: Remove redundant cpuidle_idle_call() sched, nohz: Exclude isolated cores from load balancing sched: Fix select_task_rq_fair() description comments workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE ...
2014-03-31Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Main changes: Kernel side changes: - Add SNB/IVB/HSW client uncore memory controller support (Stephane Eranian) - Fix various x86/P4 PMU driver bugs (Don Zickus) Tooling, user visible changes: - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso) - Speed up thread map generation (Don Zickus) - Introduce 'perf kvm --list-cmds' command line option for use by scripts (Ramkumar Ramachandra) - Print the evsel name in the annotate stdio output, prep to fix support outputting annotation for multiple events, not just for the first one (Arnaldo Carvalho de Melo) - Allow setting preferred callchain method in .perfconfig (Jiri Olsa) - Show in what binaries/modules 'perf probe's are set (Masami Hiramatsu) - Support distro-style debuginfo for uprobe in 'perf probe' (Masami Hiramatsu) Tooling, internal changes and fixes: - Use tid in mmap/mmap2 events to find maps (Don Zickus) - Record the reason for filtering an address_location (Namhyung Kim) - Apply all filters to an addr_location (Namhyung Kim) - Merge al->filtered with hist_entry->filtered in report/hists (Namhyung Kim) - Fix memory leak when synthesizing thread records (Namhyung Kim) - Use ui__has_annotation() in 'report' (Namhyung Kim) - hists browser refactorings to reuse code accross UIs (Namhyung Kim) - Add support for the new DWARF unwinder library in elfutils (Jiri Olsa) - Fix build race in the generation of bison files (Jiri Olsa) - Further streamline the feature detection display, trimming it a bit to show just the libraries detected, using VF=1 gets a more verbose output, showing the less interesting feature checks as well (Jiri Olsa). - Check compatible symtab type before loading dso (Namhyung Kim) - Check return value of filename__read_debuglink() (Stephane Eranian) - Move some hashing and fs related code from tools/perf/util/ to tools/lib/ so that it can be used by more tools/ living utilities (Borislav Petkov) - Prepare DWARF unwinding code for using an elfutils alternative unwinding library (Jiri Olsa) - Fix DWARF unwind max_stack processing (Jiri Olsa) - Add dwarf unwind 'perf test' entry (Jiri Olsa) - 'perf probe' improvements including memory leak fixes, sharing the intlist class with other tools, uprobes/kprobes code sharing and use of ref_reloc_sym (Masami Hiramatsu) - Shorten sample symbol resolving by adding cpumode to struct addr_location (Arnaldo Carvalho de Melo) - Fix synthesizing mmaps for threads (Don Zickus) - Fix invalid output on event group stdio report (Namhyung Kim) - Fixup header alignment in 'perf sched latency' output (Ramkumar Ramachandra) - Fix off-by-one error in 'perf timechart record' argv handling (Ramkumar Ramachandra) Tooling, cleanups: - Remove unused thread__find_map function (Jiri Olsa) - Remove unused simple_strtoul() function (Ramkumar Ramachandra) Tooling, documentation updates: - Update function names in debug messages (Ramkumar Ramachandra) - Update some code references in design.txt (Ramkumar Ramachandra) - Clarify load-latency information in the 'perf mem' docs (Andi Kleen) - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits) perf tools: Remove unused simple_strtoul() function perf tools: Update some code references in design.txt perf evsel: Update function names in debug messages perf tools: Remove thread__find_map function perf annotate: Print the evsel name in the stdio output perf report: Use ui__has_annotation() perf tools: Fix memory leak when synthesizing thread records perf tools: Use tid in mmap/mmap2 events to find maps perf report: Merge al->filtered with hist_entry->filtered perf symbols: Apply all filters to an addr_location perf symbols: Record the reason for filtering an address_location perf sched: Fixup header alignment in 'latency' output perf timechart: Fix off-by-one error in 'record' argv handling perf machine: Factor machine__find_thread to take tid argument perf tools: Speed up thread map generation perf kvm: introduce --list-cmds for use by scripts perf ui hists: Pass evsel to hpp->header/width functions explicitly perf symbols: Introduce thread__find_cpumode_addr_location perf session: Change header.misc dump from decimal to hex perf ui/tui: Reuse generic __hpp__fmt() code ...
2014-03-31Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "Main changes: - Torture-test changes, including refactoring of rcutorture and introduction of a vestigial locktorture. - Real-time latency fixes. - Documentation updates. - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits) rcu: Provide grace-period piggybacking API rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c notifier: Substitute rcu_access_pointer() for rcu_dereference_raw() Documentation/memory-barriers.txt: Clarify release/acquire ordering rcutorture: Save kvm.sh output to log rcutorture: Add a lock_busted to test the test rcutorture: Place kvm-test-1-run.sh output into res directory rcutorture: Rename TREE_RCU-Kconfig.txt locktorture: Add kvm-recheck.sh plug-in for locktorture rcutorture: Gracefully handle NULL cleanup hooks locktorture: Add vestigial locktorture configuration rcutorture: Introduce "rcu" directory level underneath configs rcutorture: Rename kvm-test-1-rcu.sh rcutorture: Remove RCU dependencies from ver_functions.sh API rcutorture: Create CFcommon file for common Kconfig parameters rcutorture: Create config files for scripted test-the-test testing rcutorture: Add an rcu_busted to test the test locktorture: Add a lock-torture kernel module rcutorture: Abstract kvm-recheck.sh ...
2014-03-31Merge branch 'core-locking-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking updates from Ingo Molnar: "The biggest change is the MCS spinlock generalization changes from Tim Chen, Peter Zijlstra, Jason Low et al. There's also lockdep fixes/enhancements from Oleg Nesterov, in particular a false negative fix related to lockdep_set_novalidate_class() usage" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits) locking/mutex: Fix debug checks locking/mutexes: Add extra reschedule point locking/mutexes: Introduce cancelable MCS lock for adaptive spinning locking/mutexes: Unlock the mutex without the wait_lock locking/mutexes: Modify the way optimistic spinners are queued locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner() locking: Move mcs_spinlock.h into kernel/locking/ m68k: Skip futex_atomic_cmpxchg_inatomic() test futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test Revert "sched/wait: Suppress Sparse 'variable shadowing' warning" lockdep: Change lockdep_set_novalidate_class() to use _and_name lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate lockdep: Don't create the wrong dependency on hlock->check == 0 lockdep: Make held_lock->check and "int check" argument bool locking/mcs: Allow architecture specific asm files to be used for contended case locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order sched/wait: Suppress Sparse 'variable shadowing' warning hung_task/Documentation: Fix hung_task_warnings description locking/mcs: Allow architectures to hook in to contended paths locking/mcs: Micro-optimize the MCS code, add extra comments ...
2014-03-31net: filter: rework/optimize internal BPF interpreter's instruction setAlexei Starovoitov
This patch replaces/reworks the kernel-internal BPF interpreter with an optimized BPF instruction set format that is modelled closer to mimic native instruction sets and is designed to be JITed with one to one mapping. Thus, the new interpreter is noticeably faster than the current implementation of sk_run_filter(); mainly for two reasons: 1. Fall-through jumps: BPF jump instructions are forced to go either 'true' or 'false' branch which causes branch-miss penalty. The new BPF jump instructions have only one branch and fall-through otherwise, which fits the CPU branch predictor logic better. `perf stat` shows drastic difference for branch-misses between the old and new code. 2. Jump-threaded implementation of interpreter vs switch statement: Instead of single table-jump at the top of 'switch' statement, gcc will now generate multiple table-jump instructions, which helps CPU branch predictor logic. Note that the verification of filters is still being done through sk_chk_filter() in classical BPF format, so filters from user- or kernel space are verified in the same way as we do now, and same restrictions/constraints hold as well. We reuse current BPF JIT compilers in a way that this upgrade would even be fine as is, but nevertheless allows for a successive upgrade of BPF JIT compilers to the new format. The internal instruction set migration is being done after the probing for JIT compilation, so in case JIT compilers are able to create a native opcode image, we're going to use that, and in all other cases we're doing a follow-up migration of the BPF program's instruction set, so that it can be transparently run in the new interpreter. In short, the *internal* format extends BPF in the following way (more details can be taken from the appended documentation): - Number of registers increase from 2 to 10 - Register width increases from 32-bit to 64-bit - Conditional jt/jf targets replaced with jt/fall-through - Adds signed > and >= insns - 16 4-byte stack slots for register spill-fill replaced with up to 512 bytes of multi-use stack space - Introduction of bpf_call insn and register passing convention for zero overhead calls from/to other kernel functions - Adds arithmetic right shift and endianness conversion insns - Adds atomic_add insn - Old tax/txa insns are replaced with 'mov dst,src' insn Performance of two BPF filters generated by libpcap resp. bpf_asm was measured on x86_64, i386 and arm32 (other libpcap programs have similar performance differences): fprog #1 is taken from Documentation/networking/filter.txt: tcpdump -i eth0 port 22 -dd fprog #2 is taken from 'man tcpdump': tcpdump -i eth0 'tcp port 22 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -dd Raw performance data from BPF micro-benchmark: SK_RUN_FILTER on the same SKB (cache-hit) or 10k SKBs (cache-miss); time in ns per call, smaller is better: --x86_64-- fprog #1 fprog #1 fprog #2 fprog #2 cache-hit cache-miss cache-hit cache-miss old BPF 90 101 192 202 new BPF 31 71 47 97 old BPF jit 12 34 17 44 new BPF jit TBD --i386-- fprog #1 fprog #1 fprog #2 fprog #2 cache-hit cache-miss cache-hit cache-miss old BPF 107 136 227 252 new BPF 40 119 69 172 --arm32-- fprog #1 fprog #1 fprog #2 fprog #2 cache-hit cache-miss cache-hit cache-miss old BPF 202 300 475 540 new BPF 180 270 330 470 old BPF jit 26 182 37 202 new BPF jit TBD Thus, without changing any userland BPF filters, applications on top of AF_PACKET (or other families) such as libpcap/tcpdump, cls_bpf classifier, netfilter's xt_bpf, team driver's load-balancing mode, and many more will have better interpreter filtering performance. While we are replacing the internal BPF interpreter, we also need to convert seccomp BPF in the same step to make use of the new internal structure since it makes use of lower-level API details without being further decoupled through higher-level calls like sk_unattached_filter_{create,destroy}(), for example. Just as for normal socket filtering, also seccomp BPF experiences a time-to-verdict speedup: 05-sim-long_jumps.c of libseccomp was used as micro-benchmark: seccomp_rule_add_exact(ctx,... seccomp_rule_add_exact(ctx,... rc = seccomp_load(ctx); for (i = 0; i < 10000000; i++) syscall(199, 100); 'short filter' has 2 rules 'large filter' has 200 rules 'short filter' performance is slightly better on x86_64/i386/arm32 'large filter' is much faster on x86_64 and i386 and shows no difference on arm32 --x86_64-- short filter old BPF: 2.7 sec 39.12% bench libc-2.15.so [.] syscall 8.10% bench [kernel.kallsyms] [k] sk_run_filter 6.31% bench [kernel.kallsyms] [k] system_call 5.59% bench [kernel.kallsyms] [k] trace_hardirqs_on_caller 4.37% bench [kernel.kallsyms] [k] trace_hardirqs_off_caller 3.70% bench [kernel.kallsyms] [k] __secure_computing 3.67% bench [kernel.kallsyms] [k] lock_is_held 3.03% bench [kernel.kallsyms] [k] seccomp_bpf_load new BPF: 2.58 sec 42.05% bench libc-2.15.so [.] syscall 6.91% bench [kernel.kallsyms] [k] system_call 6.25% bench [kernel.kallsyms] [k] trace_hardirqs_on_caller 6.07% bench [kernel.kallsyms] [k] __secure_computing 5.08% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp --arm32-- short filter old BPF: 4.0 sec 39.92% bench [kernel.kallsyms] [k] vector_swi 16.60% bench [kernel.kallsyms] [k] sk_run_filter 14.66% bench libc-2.17.so [.] syscall 5.42% bench [kernel.kallsyms] [k] seccomp_bpf_load 5.10% bench [kernel.kallsyms] [k] __secure_computing new BPF: 3.7 sec 35.93% bench [kernel.kallsyms] [k] vector_swi 21.89% bench libc-2.17.so [.] syscall 13.45% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp 6.25% bench [kernel.kallsyms] [k] __secure_computing 3.96% bench [kernel.kallsyms] [k] syscall_trace_exit --x86_64-- large filter old BPF: 8.6 seconds 73.38% bench [kernel.kallsyms] [k] sk_run_filter 10.70% bench libc-2.15.so [.] syscall 5.09% bench [kernel.kallsyms] [k] seccomp_bpf_load 1.97% bench [kernel.kallsyms] [k] system_call new BPF: 5.7 seconds 66.20% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp 16.75% bench libc-2.15.so [.] syscall 3.31% bench [kernel.kallsyms] [k] system_call 2.88% bench [kernel.kallsyms] [k] __secure_computing --i386-- large filter old BPF: 5.4 sec new BPF: 3.8 sec --arm32-- large filter old BPF: 13.5 sec 73.88% bench [kernel.kallsyms] [k] sk_run_filter 10.29% bench [kernel.kallsyms] [k] vector_swi 6.46% bench libc-2.17.so [.] syscall 2.94% bench [kernel.kallsyms] [k] seccomp_bpf_load 1.19% bench [kernel.kallsyms] [k] __secure_computing 0.87% bench [kernel.kallsyms] [k] sys_getuid new BPF: 13.5 sec 76.08% bench [kernel.kallsyms] [k] sk_run_filter_int_seccomp 10.98% bench [kernel.kallsyms] [k] vector_swi 5.87% bench libc-2.17.so [.] syscall 1.77% bench [kernel.kallsyms] [k] __secure_computing 0.93% bench [kernel.kallsyms] [k] sys_getuid BPF filters generated by seccomp are very branchy, so the new internal BPF performance is better than the old one. Performance gains will be even higher when BPF JIT is committed for the new structure, which is planned in future work (as successive JIT migrations). BPF has also been stress-tested with trinity's BPF fuzzer. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Moore <pmoore@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: linux-kernel@vger.kernel.org Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31AUDIT: Allow login in non-init namespacesEric Paris
It its possible to configure your PAM stack to refuse login if audit messages (about the login) were unable to be sent. This is common in many distros and thus normal configuration of many containers. The PAM modules determine if audit is enabled/disabled in the kernel based on the return value from sending an audit message on the netlink socket. If userspace gets back ECONNREFUSED it believes audit is disabled in the kernel. If it gets any other error else it refuses to let the login proceed. Just about ever since the introduction of namespaces the kernel audit subsystem has returned EPERM if the task sending a message was not in the init user or pid namespace. So many forms of containers have never worked if audit was enabled in the kernel. BUT if the container was not in net_init then the kernel network code would send ECONNREFUSED (instead of the audit code sending EPERM). Thus by pure accident/dumb luck/bug if an admin configured the PAM stack to reject all logins that didn't talk to audit, but then ran the login untility in the non-init_net namespace, it would work!! Clearly this was a bug, but it is a bug some people expected. With the introduction of network namespace support in 3.14-rc1 the two bugs stopped cancelling each other out. Now, containers in the non-init_net namespace refused to let users log in (just like PAM was configfured!) Obviously some people were not happy that what used to let users log in, now didn't! This fix is kinda hacky. We return ECONNREFUSED for all non-init relevant namespaces. That means that not only will the old broken non-init_net setups continue to work, now the broken non-init_pid or non-init_user setups will 'work'. They don't really work, since audit isn't logging things. But it's what most users want. In 3.15 we should have patches to support not only the non-init_net (3.14) namespace but also the non-init_pid and non-init_user namespace. So all will be right in the world. This just opens the doors wide open on 3.14 and hopefully makes users happy, if not the audit system... Reported-by: Andre Tomt <andre@tomt.net> Reported-by: Adam Richter <adam_richter2004@yahoo.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-28time: Revert to calling clock_was_set_delayed() while in irq contextJohn Stultz
In commit 47a1b796306356f35 ("tick/timekeeping: Call update_wall_time outside the jiffies lock"), we moved to calling clock_was_set() due to the fact that we were no longer holding the timekeeping or jiffies lock. However, there is still the problem that clock_was_set() triggers an IPI, which cannot be done from the timer's hard irq context, and will generate WARN_ON warnings. Apparently in my earlier testing, I'm guessing I didn't bump the dmesg log level, so I somehow missed the WARN_ONs. Thus we need to revert back to calling clock_was_set_delayed(). Signed-off-by: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-26Merge tag 'trace-fixes-v3.14-rc7-v2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "While on my flight to Linux Collaboration Summit, I was working on my slides for the event trigger tutorial. I booted a 3.14-rc7 kernel to perform what I wanted to teach and cut and paste it into my slides. When I tried the traceon event trigger with a condition attached to it (turns tracing on only if a field of the trigger event matches a condition set by the user), nothing happened. Tracing would not turn on. I stopped working on my presentation in order to find what was wrong. It ended up being the way trace event triggers work when they have conditions. Instead of copying the fields, the condition code just looks at the fields that were copied into the ring buffer. This works great, unless tracing is off. That's because when the event is reserved on the ring buffer, the ring buffer returns a NULL pointer, this tells the tracing code that the ring buffer is disabled. This ends up being a problem for the traceon trigger if it is using this information to check its condition. Luckily the code that checks if tracing is on returns the ring buffer to use (because the ring buffer is determined by the event file also passed to that field). I was able to easily solve this bug by checking in that helper function if the returned ring buffer entry is NULL, and if so, also check the file flag if it has a trace event trigger condition, and if so, to pass back a temp ring buffer to use. This will allow the trace event trigger condition to still test the event fields, but nothing will be recorded" * tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix traceon trigger condition to actually turn tracing on
2014-03-26tracing: Fix traceon trigger condition to actually turn tracing onSteven Rostedt (Red Hat)
While working on my tutorial for 2014 Linux Collaboration Summit I found that the traceon trigger did not work when conditions were used. The other triggers worked fine though. Looking into it, it is because of the way the triggers use the ring buffer to store the fields it will use for the condition. But if tracing is off, nothing is stored in the buffer, and the tracepoint exits before calling the trigger to test the condition. This is fine for all the triggers that only work when tracing is on, but for traceon trigger that is to work when tracing is off, nothing happens. The fix is simple, just use a temp ring buffer to record the event if tracing is off and the event has a trace event conditional trigger enabled. The rest of the tracepoint code will work just fine, but the tracepoint wont be recorded in the other buffers. Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-25tick: Remove code duplication in tick_handle_periodic()Viresh Kumar
tick_handle_periodic() is calling ktime_add() at two places, first before the infinite loop and then at the end of infinite loop. We can rearrange code a bit to fix code duplication here. It looks quite simple and shouldn't break anything, I guess :) Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/r/be3481e8f3f71df694a4b43623254fc93ca51b59.1395735873.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25tick: Fix spelling mistake in tick_handle_periodic()Viresh Kumar
One of the comments in tick_handle_periodic() had 'when' instead of 'which' (My guess :)). Fix it. Also fix spelling mistake in 'Possible'. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: skarafotis@gmail.com Link: http://lkml.kernel.org/r/2b29ca4230c163e44179941d7c7a16c1474385c2.1395743878.git.viresh.kumar@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25workqueue: Provide destroy_delayed_work_on_stack()Thomas Gleixner
If a delayed or deferrable work is on stack we need to tell debug objects that we are destroying the timer and the work. Otherwise we leak the tracking object. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-25Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU update from Paul E. McKenney: " [...] one late-breaking commit. This one was requested for 3.15 by Peter Zijlstra. It is low risk because it adds a new in-kernel API with minimal changes to the existing code. Those minimal changes are the addition of memory barriers and ACCESS_ONCE() macro calls, neither of which should be able to break things. This commit has passed significant rcutorture testing, with these additional additions to rcutorture slated for 3.16. This commit has also been exposed to -next testing. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-22genirq: Export symbol no_action()Alexander Shiyan
This will allow to use the dummy IRQ handler no_action() from drivers compiled as module. Drivers which use ARM FIQ interrupts can use this to request the interrupt via the normal request_irq() mechanism w/o having to copy the dummy handler to their own code. Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Link: http://lkml.kernel.org/r/1395476431-16070-1-git-send-email-shc_work@mail.ru Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-21futex: revert back to the explicit waiter counting codeLinus Torvalds
Srikar Dronamraju reports that commit b0c29f79ecea ("futexes: Avoid taking the hb->lock if there's nothing to wake up") causes java threads getting stuck on futexes when runing specjbb on a power7 numa box. The cause appears to be that the powerpc spinlocks aren't using the same ticket lock model that we use on x86 (and other) architectures, which in turn result in the "spin_is_locked()" test in hb_waiters_pending() occasionally reporting an unlocked spinlock even when there are pending waiters. So this reinstates Davidlohr Bueso's original explicit waiter counting code, which I had convinced Davidlohr to drop in favor of figuring out the pending waiters by just using the existing state of the spinlock and the wait queue. Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Original-code-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-21Merge tag 'trace-fixes-v3.14-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull trace fix from Steven Rostedt: "Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the array index in the trace event format bogus. He supplied an elegant solution that uses __stringify() and also removes the need for the event_storage and event_storage_mutex and also cuts off a few K of overhead from the trace events" * tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix array size mismatch in format string
2014-03-21rcu: Provide grace-period piggybacking APIPaul E. McKenney
The following pattern is currently not well supported by RCU: 1. Make data element inaccessible to RCU readers. 2. Do work that probably lasts for more than one grace period. 3. Do something to make sure RCU readers in flight before #1 above have completed. Here are some things that could currently be done: a. Do a synchronize_rcu() unconditionally at either #1 or #3 above. This works, but imposes needless work and latency. b. Post an RCU callback at #1 above that does a wakeup, then wait for the wakeup at #3. This works well, but likely results in an extra unneeded grace period. Open-coding this is also a bit more semi-tricky code than would be good. This commit therefore adds get_state_synchronize_rcu() and cond_synchronize_rcu() APIs. Call get_state_synchronize_rcu() at #1 above and pass its return value to cond_synchronize_rcu() at #3 above. This results in a call to synchronize_rcu() if no grace period has elapsed between #1 and #3, but requires only a load, comparison, and memory barrier if a full grace period did elapse. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org>
2014-03-20Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPECDave Jones
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose the flag to encompass a wider range of pushing the CPU beyond its warrany. Signed-off-by: Dave Jones <davej@fedoraproject.org> Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20tracing: Fix array size mismatch in format stringVaibhav Nagarnaik
In event format strings, the array size is reported in two locations. One in array subscript and then via the "size:" attribute. The values reported there have a mismatch. For e.g., in sched:sched_switch the prev_comm and next_comm character arrays have subscript values as [32] where as the actual field size is 16. name: sched_switch ID: 301 format: field:unsigned short common_type; offset:0; size:2; signed:0; field:unsigned char common_flags; offset:2; size:1; signed:0; field:unsigned char common_preempt_count; offset:3; size:1;signed:0; field:int common_pid; offset:4; size:4; signed:1; field:char prev_comm[32]; offset:8; size:16; signed:1; field:pid_t prev_pid; offset:24; size:4; signed:1; field:int prev_prio; offset:28; size:4; signed:1; field:long prev_state; offset:32; size:8; signed:1; field:char next_comm[32]; offset:40; size:16; signed:1; field:pid_t next_pid; offset:56; size:4; signed:1; field:int next_prio; offset:60; size:4; signed:1; After bisection, the following commit was blamed: 92edca0 tracing: Use direct field, type and system names This commit removes the duplication of strings for field->name and field->type assuming that all the strings passed in __trace_define_field() are immutable. This is not true for arrays, where the type string is created in event_storage variable and field->type for all array fields points to event_storage. Use __stringify() to create a string constant for the type string. Also, get rid of event_storage and event_storage_mutex that are not needed anymore. also, an added benefit is that this reduces the overhead of events a bit more: text data bss dec hex filename 8424787 2036472 1302528 11763787 b3804b vmlinux 8420814 2036408 1302528 11759750 b37086 vmlinux.patched Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com Cc: Laurent Chavey <chavey@google.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>