summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2013-01-11kernel/audit.c: avoid negative sleep durationsAndrew Morton
audit_log_start() performs the same jiffies comparison in two places. If sufficient time has elapsed between the two comparisons, the second one produces a negative sleep duration: schedule_timeout: wrong timeout value fffffffffffffff0 Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43 Call Trace: schedule_timeout+0x305/0x340 audit_log_start+0x311/0x470 audit_log_exit+0x4b/0xfb0 __audit_syscall_exit+0x25f/0x2c0 sysret_audit+0x17/0x21 Fix it by performing the comparison a single time. Reported-by: Dave Jones <davej@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11audit: catch possible NULL audit buffersKees Cook
It's possible for audit_log_start() to return NULL. Handle it in the various callers. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Julien Tinnes <jln@google.com> Cc: Will Drewry <wad@google.com> Cc: Steve Grubb <sgrubb@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11audit: create explicit AUDIT_SECCOMP event typeKees Cook
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1 could only kill a process. While we still want to make sure an audit record is forced on a kill, this should use a separate record type since seccomp mode 2 introduces other behaviors. In the case of "handled" behaviors (process wasn't killed), only emit a record if the process is under inspection. This change also fixes userspace examination of seccomp audit events, since it was considered malformed due to missing fields of the AUDIT_ANOM_ABEND event type. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: Jeff Layton <jlayton@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Julien Tinnes <jln@google.com> Acked-by: Will Drewry <wad@chromium.org> Acked-by: Steve Grubb <sgrubb@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11lockdep, rwsem: provide down_write_nest_lock()Jiri Kosina
down_write_nest_lock() provides a means to annotate locking scenario where an outer lock is guaranteed to serialize the order nested locks are being acquired. This is analogoue to already existing mutex_lock_nest_lock() and spin_lock_nest_lock(). Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mel@csn.ul.ie> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10audit: fix auditfilter.c kernel-doc warningsRandy Dunlap
Fix new kernel-doc warning in auditfilter.c: Warning(kernel/auditfilter.c:1157): Excess function parameter 'uid' description in 'audit_receive_filter' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Cc: linux-audit@redhat.com (subscribers-only) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10Merge tag 'trace-3.8-rc2-regression-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing regression fix from Steven Rostedt: "A change that came in this merge window broke the writing to the trace_options file. It causes garbage to be read during the compare of option names, and breaks setting options via the trace_options file, although options can still be set via the options/<option> files." * tag 'trace-3.8-rc2-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix regression of trace_options file setting
2013-01-10tracing: Fix regression of trace_options file settingSteven Rostedt
The latest change to allow trace options to be set on the command line also broke the trace_options file. The zeroing of the last byte of the option name that is echoed into the trace_option file was removed with the consolidation of some of the code. The compare between the option and what was written to the trace_options file fails because the string holding the data written doesn't terminate with a null character. A zero needs to be added to the end of the string copied from user space. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-01-07Merge branch 'akpm' (fixes from Andrew)Linus Torvalds
Merge emailed fixes from Andrew Morton: "Bunch of fixes: - delayed IPC updates. I held back on this because of some possible outstanding bug reports, but they appear to have been addressed in later versions - A bunch of MAINTAINERS updates - Yet Another RTC driver. I'd held this back while a couple of little issues were being worked out. I'm expecting an intrusive-but-simple patchset from Joe Perches which splits up printk.c into kernel/printk/*. That will be a pig to maintain for two months so if it passes testing I'd like to get it upstream after a week or so." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (35 commits) printk: fix incorrect length from print_time() when seconds > 99999 drivers/rtc/rtc-vt8500.c: fix handling of data passed in struct rtc_time drivers/rtc/rtc-vt8500.c: correct handling of CR_24H bitfield rtc: add RTC driver for TPS6586x MAINTAINERS: fix drivers/staging/sm7xx/ MAINTAINERS: remove include/linux/of_pwm.h MAINTAINERS: remove arch/*/lib/perf_event*.c MAINTAINERS: remove drivers/mmc/host/imxmmc.* MAINTAINERS: fix Documentation/mei/ MAINTAINERS: remove arch/x86/platform/mrst/pmu.* MAINTAINERS: remove firmware/isci/ MAINTAINERS: fix drivers/ieee802154/ MAINTAINERS: fix .../plat-mxc/include/mach/imxfb.h MAINTAINERS: remove drivers/video/epson1355fb.c MAINTAINERS: fix drivers/media/usb/dvb-usb/cxusb* MAINTAINERS: adjust for UAPI MAINTAINERS: fix drivers/media/platform/atmel-isi.c MAINTAINERS: fix arch/arm/mach-at91/include/mach/at_hdmac.h MAINTAINERS: fix drivers/rtc/rtc-vt8500.c MAINTAINERS: remove arch/arm/plat-s5p/ ...
2013-01-06signals: set_current_blocked() can use __set_current_blocked()Oleg Nesterov
Cleanup. And I think we need more cleanups, in particular __set_current_blocked() and sigprocmask() should die. Nobody should ever block SIGKILL or SIGSTOP. - Change set_current_blocked() to use __set_current_blocked() - Change sys_sigprocmask() to use set_current_blocked(), this way it should not worry about SIGKILL/SIGSTOP. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-06signals: sys_ssetmask() uses uninitialized newmaskOleg Nesterov
Commit 77097ae503b1 ("most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set") removed the initialization of newmask by accident, causing ltp to complain like this: ssetmask01 1 TFAIL : sgetmask() failed: TEST_ERRNO=???(0): Success Restore the proper initialization. Reported-and-tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@kernel.org # v3.5+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-05printk: fix incorrect length from print_time() when seconds > 99999Roland Dreier
print_prefix() passes a NULL buf to print_time() to get the length of the time prefix; when printk times are enabled, the current code just returns the constant 15, which matches the format "[%5lu.%06lu] " used to print the time value. However, this is obviously incorrect when the whole seconds part of the time gets beyond 5 digits (100000 seconds is a bit more than a day of uptime). The simple fix is to use snprintf(NULL, 0, ...) to calculate the actual length of the time prefix. This could be micro-optimized but it seems better to have simpler, more readable code here. The bug leads to the syslog system call miscomputing which messages fit into the userspace buffer. If there are enough messages to fill log_buf_len and some have a timestamp >= 100000, dmesg may fail with: # dmesg klogctl: Bad address When this happens, strace shows that the failure is indeed EFAULT due to the kernel mistakenly accessing past the end of dmesg's buffer, since dmesg asks the kernel how big a buffer it needs, allocates a bit more, and then gets an error when it asks the kernel to fill it: syslog(0xa, 0, 0) = 1048576 mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa4d25d2000 syslog(0x3, 0x7fa4d25d2010, 0x100008) = -1 EFAULT (Bad address) As far as I can see, the bug has been there as long as print_time(), which comes from commit 084681d14e42 ("printk: flush continuation lines immediately to console") in 3.5-rc5. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Joe Perches <joe@perches.com> Cc: Sylvain Munaut <s.munaut@whatever-company.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-26pidns: Stop pid allocation when init diesEric W. Biederman
Oleg pointed out that in a pid namespace the sequence. - pid 1 becomes a zombie - setns(thepidns), fork,... - reaping pid 1. - The injected processes exiting. Can lead to processes attempting access their child reaper and instead following a stale pointer. That waitpid for init can return before all of the processes in the pid namespace have exited is also unfortunate. Avoid these problems by disabling the allocation of new pids in a pid namespace when init dies, instead of when the last process in a pid namespace is reaped. Pointed-out-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-25pidns: Outlaw thread creation after unshare(CLONE_NEWPID)Eric W. Biederman
The sequence: unshare(CLONE_NEWPID) clone(CLONE_THREAD|CLONE_SIGHAND|CLONE_VM) Creates a new process in the new pid namespace without setting pid_ns->child_reaper. After forking this results in a NULL pointer dereference. Avoid this and other nonsense scenarios that can show up after creating a new pid namespace with unshare by adding a new check in copy_prodcess. Pointed-out-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-21Merge branch 'for-next' of git://git.infradead.org/users/eparis/notifyLinus Torvalds
Pull filesystem notification updates from Eric Paris: "This pull mostly is about locking changes in the fsnotify system. By switching the group lock from a spin_lock() to a mutex() we can now hold the lock across things like iput(). This fixes a problem involving unmounting a fs and having inodes be busy, first pointed out by FAT, but reproducible with tmpfs. This also restores signal driven I/O for inotify, which has been broken since about 2.6.32." Ugh. I *hate* the timing of this. It was rebased after the merge window opened, and then left to sit with the pull request coming the day before the merge window closes. That's just crap. But apparently the patches themselves have been around for over a year, just gathering dust, so now it's suddenly critical. Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen Rothwell's fixes from -next. * 'for-next' of git://git.infradead.org/users/eparis/notify: inotify: automatically restart syscalls inotify: dont skip removal of watch descriptor if creation of ignored event failed fanotify: dont merge permission events fsnotify: make fasync generic for both inotify and fanotify fsnotify: change locking order fsnotify: dont put marks on temporary list when clearing marks by group fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() fsnotify: pass group to fsnotify_destroy_mark() fsnotify: use a mutex instead of a spinlock to protect a groups mark list fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed fsnotify: take groups mark_lock before mark lock fsnotify: use reference counting for groups fsnotify: introduce fsnotify_get_group() inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
2012-12-21Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge the rest of Andrew's patches for -rc1: "A bunch of fixes and misc missed-out-on things. That'll do for -rc1. I still have a batch of IPC patches which still have a possible bug report which I'm chasing down." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits) keys: use keyring_alloc() to create module signing keyring keys: fix unreachable code sendfile: allows bypassing of notifier events SGI-XP: handle non-fatal traps fat: fix incorrect function comment Documentation: ABI: remove testing/sysfs-devices-node proc: fix inconsistent lock state linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors memcg: don't register hotcpu notifier from ->css_alloc() checkpatch: warn on uapi #includes that #include <uapi/... revert "rtc: recycle id when unloading a rtc driver" mm: clean up transparent hugepage sysfs error messages hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method hfsplus: rework processing of hfs_btree_write() returned error hfsplus: rework processing errors in hfsplus_free_extents() hfsplus: avoid crash on failed block map free kcmp: include linux/ptrace.h drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h> mm: cma: WARN if freed memory is still in use exec: do not leave bprm->interp on stack ...
2012-12-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "sigaltstack infrastructure + conversion for x86, alpha and um, COMPAT_SYSCALL_DEFINE infrastructure. Note that there are several conflicts between "unify SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline; resolution is trivial - just remove definitions of SS_ONSTACK and SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and include/uapi/linux/signal.h contains the unified variant." Fixed up conflicts as per Al. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to generic sigaltstack new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those generic compat_sys_sigaltstack() introduce generic sys_sigaltstack(), switch x86 and um to it new helper: compat_user_stack_pointer() new helper: restore_altstack() unify SS_ONSTACK/SS_DISABLE definitions new helper: current_user_stack_pointer() missing user_stack_pointer() instances Bury the conditionals from kernel_thread/kernel_execve series COMPAT_SYSCALL_DEFINE: infrastructure
2012-12-21keys: use keyring_alloc() to create module signing keyringDavid Howells
Use keyring_alloc() to create special keyrings now that it has a permissions parameter rather than using key_alloc() + key_instantiate_and_link(). Signed-off-by: David Howells <dhowells@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-21kcmp: include linux/ptrace.hCyrill Gorcunov
This makes it compile on s390. After all the ptrace_may_access (which we use this file) is declared exactly in linux/ptrace.h. This is preparatory work to wire this syscall up on all archs. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Alexander Kartashov <alekskartashov@parallels.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-20sched: numa: ksm: fix oops in task_numa_placment()Hugh Dickins
task_numa_placement() oopsed on NULL p->mm when task_numa_fault() got called in the handling of break_ksm() for ksmd. That might be a peculiar case, which perhaps KSM could takes steps to avoid? but it's more robust if task_numa_placement() allows for such a possibility. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-20Merge tag 'random_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull random updates from Ted Ts'o: "A few /dev/random improvements for the v3.8 merge window." * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: Mix cputime from each thread that exits to the pool random: prime last_data value per fips requirements random: fix debug format strings random: make it possible to enable debugging without rebuild
2012-12-19new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to thoseAl Viro
note that they are relying on access_ok() already checked by caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19generic compat_sys_sigaltstack()Al Viro
Again, conditional on CONFIG_GENERIC_SIGALTSTACK Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19introduce generic sys_sigaltstack(), switch x86 and um to itAl Viro
Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not select it are completely unaffected Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19new helper: restore_altstack()Al Viro
to be used by rt_sigreturn instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19Bury the conditionals from kernel_thread/kernel_execve seriesAl Viro
All architectures have CONFIG_GENERIC_KERNEL_THREAD CONFIG_GENERIC_KERNEL_EXECVE __ARCH_WANT_SYS_EXECVE None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers of kernel_execve() (which is a trivial wrapper for do_execve() now) left. Kill the conditionals and make both callers use do_execve(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-19watchdog: Fix disable/enable regressionBjørn Mork
Commit 8d4516904b39 ("watchdog: Fix CPU hotplug regression") causes an oops or hard lockup when doing echo 0 > /proc/sys/kernel/nmi_watchdog echo 1 > /proc/sys/kernel/nmi_watchdog and the kernel is booted with nmi_watchdog=1 (default) Running laptop-mode-tools and disconnecting/connecting AC power will cause this to trigger, making it a common failure scenario on laptops. Instead of bailing out of watchdog_disable() when !watchdog_enabled we can initialize the hrtimer regardless of watchdog_enabled status. This makes it safe to call watchdog_disable() in the nmi_watchdog=0 case, without the negative effect on the enabled => disabled => enabled case. All these tests pass with this patch: - nmi_watchdog=1 echo 0 > /proc/sys/kernel/nmi_watchdog echo 1 > /proc/sys/kernel/nmi_watchdog - nmi_watchdog=0 echo 0 > /sys/devices/system/cpu/cpu1/online - nmi_watchdog=0 echo mem > /sys/power/state Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51661 Cc: <stable@vger.kernel.org> # v3.7 Cc: Norbert Warmuth <nwarmuth@t-online.de> Cc: Joseph Salisbury <joseph.salisbury@canonical.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-19Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "Nothing all that exciting; a new module-from-fd syscall for those who want to verify the source of the module (ChromeOS) and/or use standard IMA on it or other security hooks." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Fix kbuild output when using default extra_certificates MODSIGN: Avoid using .incbin in C source modules: don't hand 0 to vmalloc. module: Remove a extra null character at the top of module->strtab. ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants ASN.1: Define indefinite length marker constant moduleparam: use __UNIQUE_ID() __UNIQUE_ID() MODSIGN: Add modules_sign make target powerpc: add finit_module syscall. ima: support new kernel module syscall add finit_module syscall to asm-generic ARM: add finit_module syscall to ARM security: introduce kernel_module_from_file hook module: add flags arg to sys_finit_module() module: add syscall to load module from fd
2012-12-18Merge branch 'akpm' (more patches from Andrew)Linus Torvalds
Merge patches from Andrew Morton: "Most of the rest of MM, plus a few dribs and drabs. I still have quite a few irritating patches left around: ones with dubious testing results, lack of review, ones which should have gone via maintainer trees but the maintainers are slack, etc. I need to be more activist in getting these things wrapped up outside the merge window, but they're such a PITA." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits) mm/vmscan.c: avoid possible deadlock caused by too_many_isolated() vmscan: comment too_many_isolated() mm/kmemleak.c: remove obsolete simple_strtoul mm/memory_hotplug.c: improve comments mm/hugetlb: create hugetlb cgroup file in hugetlb_init mm/mprotect.c: coding-style cleanups Documentation: ABI: /sys/devices/system/node/ slub: drop mutex before deleting sysfs entry memcg: add comments clarifying aspects of cache attribute propagation kmem: add slab-specific documentation about the kmem controller slub: slub-specific propagation changes slab: propagate tunable values memcg: aggregate memcg cache values in slabinfo memcg/sl[au]b: shrink dead caches memcg/sl[au]b: track all the memcg children of a kmem_cache memcg: destroy memcg caches sl[au]b: allocate objects from memcg cache sl[au]b: always get the cache from its page in kmem_cache_free() memcg: skip memcg kmem allocations in specified code regions memcg: infrastructure to match an allocation to the right cache ...
2012-12-18fork: protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombsGlauber Costa
Because those architectures will draw their stacks directly from the page allocator, rather than the slab cache, we can directly pass __GFP_KMEMCG flag, and issue the corresponding free_pages. This code path is taken when the architecture doesn't define CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the remaining architectures fall in this category. This will guarantee that every stack page is accounted to the memcg the process currently lives on, and will have the allocations to fail if they go over limit. For the time being, I am defining a new variant of THREADINFO_GFP, not to mess with the other path. Once the slab is also tracked by memcg, we can get rid of that flag. Tested to successfully protect against :(){ :|:& };: Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Frederic Weisbecker <fweisbec@redhat.com> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18res_counter: return amount of charges after res_counter_uncharge()Glauber Costa
It is useful to know how many charges are still left after a call to res_counter_uncharge. While it is possible to issue a res_counter_read after uncharge, this can be racy. If we need, for instance, to take some action when the counters drop down to 0, only one of the callers should see it. This is the same semantics as the atomic variables in the kernel. Since the current return value is void, we don't need to worry about anything breaking due to this change: nobody relied on that, and only users appearing from now on will be checking this value. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18irq: tsk->comm is an arrayAlan Cox
The array check is useless so remove it. [akpm@linux-foundation.org: remove comment, per David] Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18Merge branch 'tip/perf/core-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull minor tracing updates and fixes from Steven Rostedt: "It seems that one of my old pull requests have slipped through. The changes are contained to just the files that I maintain, and are changes from others that I told I would get into this merge window. They have already been in linux-next for several weeks, and should be well tested." * 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read tracing: Remove unneeded checks from the stack tracer tracing: Add a resize function to make one buffer equivalent to another buffer
2012-12-18Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull (again) user namespace infrastructure changes from Eric Biederman: "Those bugs, those darn embarrasing bugs just want don't want to get fixed. Linus I just updated my mirror of your kernel.org tree and it appears you successfully pulled everything except the last 4 commits that fix those embarrasing bugs. When you get a chance can you please repull my branch" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Fix typo in description of the limitation of userns_install userns: Add a more complete capability subset test to commit_creds userns: Require CAP_SYS_ADMIN for most uses of setns. Fix cap_capable to only allow owners in the parent user namespace to have caps.
2012-12-18Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge misc patches from Andrew Morton: "Incoming: - lots of misc stuff - backlight tree updates - lib/ updates - Oleg's percpu-rwsem changes - checkpatch - rtc - aoe - more checkpoint/restart support I still have a pile of MM stuff pending - Pekka should be merging later today after which that is good to go. A number of other things are twiddling thumbs awaiting maintainer merges." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits) scatterlist: don't BUG when we can trivially return a proper error. docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output fs, fanotify: add @mflags field to fanotify output docs: add documentation about /proc/<pid>/fdinfo/<fd> output fs, notify: add procfs fdinfo helper fs, exportfs: add exportfs_encode_inode_fh() helper fs, exportfs: escape nil dereference if no s_export_op present fs, epoll: add procfs fdinfo helper fs, eventfd: add procfs fdinfo helper procfs: add ability to plug in auxiliary fdinfo providers tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test breakpoint selftests: print failure status instead of cause make error kcmp selftests: print fail status instead of cause make error kcmp selftests: make run_tests fix mem-hotplug selftests: print failure status instead of cause make error cpu-hotplug selftests: print failure status instead of cause make error mqueue selftests: print failure status instead of cause make error vm selftests: print failure status instead of cause make error ubifs: use prandom_bytes mtd: nandsim: use prandom_bytes ...
2012-12-18pidns: remove unused is_container_init()Gao feng
Since commit 1cdcbec1a337 ("CRED: Neuter sys_capset()") is_container_init() has no callers. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: James Morris <jmorris@namei.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18ptrace: introduce PTRACE_O_EXITKILLOleg Nesterov
Ptrace jailers want to be sure that the tracee can never escape from the control. However if the tracer dies unexpectedly the tracee continues to run in potentially unsafe mode. Add the new ptrace option PTRACE_O_EXITKILL. If the tracer exits it sends SIGKILL to every tracee which has this bit set. Note that the new option is not equal to the last-option << 1. Because currently all options have an event, and the new one starts the eventless group. It uses the random 20 bit, so we have the room for 12 more events, but we can also add the new eventless options below this one. Suggested by Amnon Shiloh. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Amnon Shiloh <u3557@miso.sublimeip.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Chris Evans <scarybeasts@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18compat: generic compat_sys_sched_rr_get_interval() implementationCatalin Marinas
This function is used by sparc, powerpc tile and arm64 for compat support. The patch adds a generic implementation with a wrapper for PowerPC to do the u32->int sign extension. The reason for a single patch covering powerpc, tile, sparc and arm64 is to keep it bisectable, otherwise kernel building may fail with mismatched function declarations. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile] Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18trace: use kbasename()Andy Shevchenko
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18printk: boot_delay should only affect outputAndrew Cooks
The boot_delay parameter affects all printk(), even if the log level prevents visible output from the call. It results in delays greater than the user intended without purpose. This patch changes the behaviour of boot_delay to only delay output. Signed-off-by: Andrew Cooks <acooks@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18watchdog: store the watchdog sample period as a variableChuansheng Liu
Currently getting the sample period is always thru a complex calculation: get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5). We can store the sample period as a variable, and set it as __read_mostly type. Signed-off-by: liu chuansheng <chuansheng.liu@intel.com> Cc: Don Zickus <dzickus@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18lseek: the "whence" argument is called "whence"Andrew Morton
But the kernel decided to call it "origin" instead. Fix most of the sites. Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18kernel: remove reference to feature-removal-schedule.txtTao Ma
In commit 9c0ece069b32 ("Get rid of Documentation/feature-removal.txt"), Linus removed feature-removal-schedule.txt from Documentation, but there is still some reference to this file. So remove them. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "While small this set of changes is very significant with respect to containers in general and user namespaces in particular. The user space interface is now complete. This set of changes adds support for unprivileged users to create user namespaces and as a user namespace root to create other namespaces. The tyranny of supporting suid root preventing unprivileged users from using cool new kernel features is broken. This set of changes completes the work on setns, adding support for the pid, user, mount namespaces. This set of changes includes a bunch of basic pid namespace cleanups/simplifications. Of particular significance is the rework of the pid namespace cleanup so it no longer requires sending out tendrils into all kinds of unexpected cleanup paths for operation. At least one case of broken error handling is fixed by this cleanup. The files under /proc/<pid>/ns/ have been converted from regular files to magic symlinks which prevents incorrect caching by the VFS, ensuring the files always refer to the namespace the process is currently using and ensuring that the ptrace_mayaccess permission checks are always applied. The files under /proc/<pid>/ns/ have been given stable inode numbers so it is now possible to see if different processes share the same namespaces. Through the David Miller's net tree are changes to relax many of the permission checks in the networking stack to allowing the user namespace root to usefully use the networking stack. Similar changes for the mount namespace and the pid namespace are coming through my tree. Two small changes to add user namespace support were commited here adn in David Miller's -net tree so that I could complete the work on the /proc/<pid>/ns/ files in this tree. Work remains to make it safe to build user namespaces and 9p, afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the Kconfig guard remains in place preventing that user namespaces from being built when any of those filesystems are enabled. Future design work remains to allow root users outside of the initial user namespace to mount more than just /proc and /sys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits) proc: Usable inode numbers for the namespace file descriptors. proc: Fix the namespace inode permission checks. proc: Generalize proc inode allocation userns: Allow unprivilged mounts of proc and sysfs userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file procfs: Print task uids and gids in the userns that opened the proc file userns: Implement unshare of the user namespace userns: Implent proc namespace operations userns: Kill task_user_ns userns: Make create_new_namespaces take a user_ns parameter userns: Allow unprivileged use of setns. userns: Allow unprivileged users to create new namespaces userns: Allow setting a userns mapping to your current uid. userns: Allow chown and setgid preservation userns: Allow unprivileged users to create user namespaces. userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped userns: fix return value on mntns_install() failure vfs: Allow unprivileged manipulation of the mount namespace. vfs: Only support slave subtrees across different user namespaces vfs: Add a user namespace reference from struct mnt_namespace ...
2012-12-17sched: numa: Fix build error if CONFIG_NUMA_BALANCING && ↵Mel Gorman
!CONFIG_TRANSPARENT_HUGEPAGE Michal Hocko reported that the following build error occurs if CONFIG_NUMA_BALANCING is set without THP support kernel/sched/fair.c: In function ‘task_numa_work’: kernel/sched/fair.c:932:55: error: call to ‘__build_bug_failed’ declared with attribute error: BUILD_BUG failed The problem is that HPAGE_PMD_SHIFT triggers a BUILD_BUG() on !CONFIG_TRANSPARENT_HUGEPAGE. This patch addresses the problem. Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17random: Mix cputime from each thread that exits to the poolNick Kossifidis
When a thread exits mix it's cputime (userspace + kernelspace) to the entropy pool. We don't know how "random" this is, so we use add_device_randomness that doesn't mess with entropy count. Signed-off-by: Nick Kossifidis <mickflemm@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2012-12-16Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "A quiet cycle for the security subsystem with just a few maintenance updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: create a sysfs mount point for smackfs Smack: use select not depends in Kconfig Yama: remove locking from delete path Yama: add RCU to drop read locking drivers/char/tpm: remove tasklet and cleanup KEYS: Use keyring_alloc() to create special keyrings KEYS: Reduce initial permissions on keys KEYS: Make the session and process keyrings per-thread seccomp: Make syscall skipping and nr changes more consistent key: Fix resource leak keys: Fix unreachable code KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-16Merge tag 'balancenuma-v11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...
2012-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto update from Herbert Xu: - Added aesni/avx/x86_64 implementations for camellia. - Optimised AVX code for cast5/serpent/twofish/cast6. - Fixed vmac bug with unaligned input. - Allow compression algorithms in FIPS mode. - Optimised crc32c implementation for Intel. - Misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (32 commits) crypto: caam - Updated SEC-4.0 device tree binding for ERA information. crypto: testmgr - remove superfluous initializers for xts(aes) crypto: testmgr - allow compression algs in fips mode crypto: testmgr - add larger crc32c test vector to test FPU path in crc32c_intel crypto: testmgr - clean alg_test_null entries in alg_test_descs[] crypto: testmgr - remove fips_allowed flag from camellia-aesni null-tests crypto: cast5/cast6 - move lookup tables to shared module padata: use __this_cpu_read per-cpu helper crypto: s5p-sss - Fix compilation error crypto: picoxcell - Add terminating entry for platform_device_id table crypto: omap-aes - select BLKCIPHER2 crypto: camellia - add AES-NI/AVX/x86_64 assembler implementation of camellia cipher crypto: camellia-x86_64 - share common functions and move structures and function definitions to header file crypto: tcrypt - add async speed test for camellia cipher crypto: tegra-aes - fix error-valued pointer dereference crypto: tegra - fix missing unlock on error case crypto: cast5/avx - avoid using temporary stack buffers crypto: serpent/avx - avoid using temporary stack buffers crypto: twofish/avx - avoid using temporary stack buffers crypto: cast6/avx - avoid using temporary stack buffers ...
2012-12-15userns: Fix typo in description of the limitation of userns_installEric W. Biederman
Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-15userns: Add a more complete capability subset test to commit_credsEric W. Biederman
When unsharing a user namespace we reduce our credentials to just what can be done in that user namespace. This is a subset of the credentials we previously had. Teach commit_creds to recognize this is a subset of the credentials we have had before and don't clear the dumpability flag. This allows an unprivileged program to do: unshare(CLONE_NEWUSER); fd = open("/proc/self/uid_map", O_RDWR); Where previously opening the uid_map writable would fail because the the task had been made non-dumpable. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>