summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2012-06-01Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds
Merge misc patches from Andrew Morton: - the "misc" tree - stuff from all over the map - checkpatch updates - fatfs - kmod changes - procfs - cpumask - UML - kexec - mqueue - rapidio - pidns - some checkpoint-restore feature work. Reluctantly. Most of it delayed a release. I'm still rather worried that we don't have a clear roadmap to completion for this work. * emailed from Andrew Morton <akpm@linux-foundation.org>: (78 patches) kconfig: update compression algorithm info c/r: prctl: add ability to set new mm_struct::exe_file c/r: prctl: extend PR_SET_MM to set up more mm_struct entries c/r: procfs: add arg_start/end, env_start/end and exit_code members to /proc/$pid/stat syscalls, x86: add __NR_kcmp syscall fs, proc: introduce /proc/<pid>/task/<tid>/children entry sysctl: make kernel.ns_last_pid control dependent on CHECKPOINT_RESTORE aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() eventfd: change int to __u64 in eventfd_signal() fs/nls: add Apple NLS pidns: make killed children autoreap pidns: use task_active_pid_ns in do_notify_parent rapidio/tsi721: add DMA engine support rapidio: add DMA engine support for RIO data transfers ipc/mqueue: add rbtree node caching support tools/selftests: add mq_perf_tests ipc/mqueue: strengthen checks on mqueue creation ipc/mqueue: correct mq_attr_ok test ipc/mqueue: improve performance of send/recv selftests: add mq_open_tests ...
2012-06-01syscalls, x86: add __NR_kcmp syscallCyrill Gorcunov
While doing the checkpoint-restore in the user space one need to determine whether various kernel objects (like mm_struct-s of file_struct-s) are shared between tasks and restore this state. The 2nd step can be solved by using appropriate CLONE_ flags and the unshare syscall, while there's currently no ways for solving the 1st one. One of the ways for checking whether two tasks share e.g. mm_struct is to provide some mm_struct ID of a task to its proc file, but showing such info considered to be not that good for security reasons. Thus after some debates we end up in conclusion that using that named 'comparison' syscall might be the best candidate. So here is it -- __NR_kcmp. It takes up to 5 arguments - the pids of the two tasks (which characteristics should be compared), the comparison type and (in case of comparison of files) two file descriptors. Lookups for pids are done in the caller's PID namespace only. At moment only x86 is supported and tested. [akpm@linux-foundation.org: fix up selftests, warnings] [akpm@linux-foundation.org: include errno.h] [akpm@linux-foundation.org: tweak comment text] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Glauber Costa <glommer@parallels.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tejun Heo <tj@kernel.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Valdis.Kletnieks@vt.edu Cc: Michal Marek <mmarek@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-31Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull two small kvm fixes from Avi Kivity: "A build fix for non-kvm archs and a transparent hugepage refcount bugfix on hosts with 4M pages." * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Export asm-generic/kvm_para.h KVM: MMU: fix huge page adapted on non-PAE host
2012-05-31Merge tag 'please-pull-mce' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull mce cleanup from Tony Luck: "One more mce cleanup before the 3.5 merge window closes" * tag 'please-pull-mce' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Cleanup timer mess
2012-05-30x86, amd, xen: Avoid NULL pointer paravirt referencesKonrad Rzeszutek Wilk
Stub out MSR methods that aren't actually needed. This fixes a crash as Xen Dom0 on AMD Trinity systems. A bigger patch should be added to remove the paravirt machinery completely for the methods which apparently have no users! Reported-by: Andre Przywara <andre.przywara@amd.com> Link: http://lkml.kernel.org/r/20120530222356.GA28417@andromeda.dapyr.net Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
2012-05-30x86/mce: Cleanup timer messThomas Gleixner
Use unsigned long for dealing with jiffies not int. Rename the callback to something sensible. Use __this_cpu_read/write for accessing per cpu data. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-05-30x86, mtrr: Fix a type overflow in range_to_mtrr funczhenzhong.duan
When boot on sun G5+ with 4T mem, see an overflow in mtrr cleanup as below. *BAD*gran_size: 2G chunk_size: 2G num_reg: 10 lose cover RAM: -18014398505283592M This is because 1<<31 sign extended. Use an unsigned long constant to fix it. Useful for mem larger than or equal to 4T. -v2: Use 64bit constant instead of explicit type conversion as suggested by Yinghai. Description updated too. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: http://lkml.kernel.org/r/4FC5A77F.6060505@oracle.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30Merge branch 'x86/trampoline' into x86/urgentH. Peter Anvin
x86/trampoline contains an urgent commit which is necessarily on a newer baseline. Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30x86, realmode: Unbreak the ia64 build of drivers/acpi/sleep.cH. Peter Anvin
Revert usage of acpi_wakeup_address and move definition to x86 architecture code in order to make compilation work in ia64. [jsakkine: tested compilation in ia64/x86-64 and added proper commit message] Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Originally-by: H. Peter Anvin <hpa@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Link: http://lkml.kernel.org/r/1338370421-27735-1-git-send-email-jarkko.sakkinen@intel.com Cc: Tony Luck <tony.luck@intel.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2012-05-30Merge branch 'x86/mce' into x86/urgentIngo Molnar
Merge in these fixlets. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30x86/mm/pat: Improve scaling of pat_pagerange_is_ram()John Dykstra
Function pat_pagerange_is_ram() scales poorly to large address ranges, because it probes the resource tree for each page. On a 2.6 GHz Opteron, this function consumes 34 ms for a 1 GB range. It is called twice during untrack_pfn_vma(), slowing process cleanup and handicapping the OOM killer. This replacement consumes less than 1ms, under the same conditions. Signed-off-by: John Dykstra <jdykstra@cray.com> on behalf of Cray Inc. Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337980366.1979.6.camel@redwood [ Small stylistic cleanups and renames ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-30Merge branch 'x86-trampoline-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 trampoline rework from H. Peter Anvin: "This code reworks all the "trampoline"/"realmode" code (various bits that need to live in the first megabyte of memory, most but not all of which runs in real mode at some point) in the kernel into a single object. The main reason for doing this is that it eliminates the last place in the kernel where we needed pages to be mapped RWX. This code separates all that code into proper R/RW/RX pages." Fix up conflicts in arch/x86/kernel/Makefile (mca removed next to reboot code), and arch/x86/kernel/reboot.c (reboot code moved around in one branch, modified in this one), and arch/x86/tools/relocs.c (mostly same code came in earlier due to working around the ld bugs just before the 3.4 release). Also remove stale x86-relocs entry from scripts/.gitignore as per Peter Anvin. * commit '61f5446169046c217a5479517edac3a890c3bee7': (36 commits) x86, realmode: Move end signature into header.S x86, relocs: When printing an error, say relative or absolute x86, relocs: More relocations which may end up as absolute x86, relocs: Workaround for binutils 2.22.52.0.1 section bug xen-acpi-processor: Add missing #include <xen/xen.h> acpi, bgrd: Add missing <linux/io.h> to drivers/acpi/bgrt.c x86, realmode: Change EFER to a single u64 field x86, realmode: Move kernel/realmode.c to realmode/init.c x86, realmode: Move not-common bits out of trampoline_common.S x86, realmode: Mask out EFER.LMA when saving trampoline EFER x86, realmode: Fix no cache bits test in reboot_32.S x86, realmode: Make sure all generated files are listed in targets x86, realmode: build fix: remove duplicate build x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline x86, realmode: fixes compilation issue in tboot.c x86, realmode: move relocs from scripts/ to arch/x86/tools x86, realmode: header for trampoline code x86, realmode: flattened rm hierachy x86, realmode: don't copy real_mode_header x86, realmode: fix 64-bit wakeup sequence ...
2012-05-29mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race conditionAndrea Arcangeli
When holding the mmap_sem for reading, pmd_offset_map_lock should only run on a pmd_t that has been read atomically from the pmdp pointer, otherwise we may read only half of it leading to this crash. PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic" #0 [f06a9dd8] crash_kexec at c049b5ec #1 [f06a9e2c] oops_end at c083d1c2 #2 [f06a9e40] no_context at c0433ded #3 [f06a9e64] bad_area_nosemaphore at c043401a #4 [f06a9e6c] __do_page_fault at c0434493 #5 [f06a9eec] do_page_fault at c083eb45 #6 [f06a9f04] error_code (via page_fault) at c083c5d5 EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000 DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0 CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246 #7 [f06a9f38] _spin_lock at c083bc14 #8 [f06a9f44] sys_mincore at c0507b7d #9 [f06a9fb0] system_call at c083becd start len EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00 SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033 CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286 This should be a longstanding bug affecting x86 32bit PAE without THP. Only archs with 64bit large pmd_t and 32bit unsigned long should be affected. With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad() would partly hide the bug when the pmd transition from none to stable, by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is enabled a new set of problem arises by the fact could then transition freely in any of the none, pmd_trans_huge or pmd_trans_stable states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad() unconditional isn't good idea and it would be a flakey solution. This should be fully fixed by introducing a pmd_read_atomic that reads the pmd in order with THP disabled, or by reading the pmd atomically with cmpxchg8b with THP enabled. Luckily this new race condition only triggers in the places that must already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix is localized there but this bug is not related to THP. NOTE: this can trigger on x86 32bit systems with PAE enabled with more than 4G of ram, otherwise the high part of the pmd will never risk to be truncated because it would be zero at all times, in turn so hiding the SMP race. This bug was discovered and fully debugged by Ulrich, quote: ---- [..] pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and eax. 496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) 497 { 498 /* depend on compiler for an atomic pmd read */ 499 pmd_t pmdval = *pmd; // edi = pmd pointer 0xc0507a74 <sys_mincore+548>: mov 0x8(%esp),%edi ... // edx = PTE page table high address 0xc0507a84 <sys_mincore+564>: mov 0x4(%edi),%edx ... // eax = PTE page table low address 0xc0507a8e <sys_mincore+574>: mov (%edi),%eax [..] Please note that the PMD is not read atomically. These are two "mov" instructions where the high order bits of the PMD entry are fetched first. Hence, the above machine code is prone to the following race. - The PMD entry {high|low} is 0x0000000000000000. The "mov" at 0xc0507a84 loads 0x00000000 into edx. - A page fault (on another CPU) sneaks in between the two "mov" instructions and instantiates the PMD. - The PMD entry {high|low} is now 0x00000003fda38067. The "mov" at 0xc0507a8e loads 0xfda38067 into eax. ---- Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29x86: print physical addresses consistently with other parts of kernelBjorn Helgaas
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -found SMP MP-table at [ffff8800000fce90] fce90 +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90] -initial memory mapped : 0 - 20000000 +initial memory mapped: [mem 0x00000000-0x1fffffff] -Base memory trampoline at [ffff88000009c000] 9c000 size 8192 +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000] -SRAT: Node 0 PXM 0 0-80000000 +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29x86: print e820 physical addresses consistently with other parts of kernelBjorn Helgaas
Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -BIOS-provided physical RAM map: +e820: BIOS-provided physical RAM map: - BIOS-e820: 0000000000000100 - 000000000009e000 (usable) +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000) +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices -reserve RAM buffer: 000000000009e000 - 000000000009ffff +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29Merge tag 'mfd-3.5-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 Pull MFD changes from Samuel Ortiz: "Besides the usual cleanups, this one brings: * Support for 5 new chipsets: Intel's ICH LPC and SCH Centerton, ST-E's STAX211, Samsung's MAX77693 and TI's LM3533. * Device tree support for the twl6040, tps65910, da9502 and ab8500 drivers. * Fairly big tps56910, ab8500 and db8500 updates. * i2c support for mc13xxx. * Our regular update for the wm8xxx driver from Mark." Fix up various conflicts with other trees, largely due to ab5500 removal etc. * tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (106 commits) mfd: Fix build break of max77693 by adding REGMAP_I2C option mfd: Fix twl6040 build failure mfd: Fix max77693 build failure mfd: ab8500-core should depend on MFD_DB8500_PRCMU gpio: tps65910: dt: process gpio specific device node info mfd: Remove the parsing of dt info for tps65910 gpio mfd: Save device node parsed platform data for tps65910 sub devices mfd: Add r_select to lm3533 platform data gpio: Add Intel Centerton support to gpio-sch mfd: Emulate active low IRQs as well as active high IRQs for wm831x mfd: Mark two lm3533 zone registers as volatile mfd: Fix return type of lm533 attribute is_visible mfd: Enable Device Tree support in the ab8500-pwm driver mfd: Enable Device Tree support in the ab8500-sysctrl driver mfd: Add support for Device Tree to twl6040 mfd: Register the twl6040 child for the ASoC codec unconditionally mfd: Allocate twl6040 IRQ numbers dynamically mfd: twl6040 code cleanup in interrupt initialization part mfd: Enable ab8500-gpadc driver for Device Tree mfd: Prevent unassigned pointer from being used in ab8500-gpadc driver ...
2012-05-28KVM: MMU: fix huge page adapted on non-PAE hostXiao Guangrong
The huge page size is 4M on non-PAE host, but 2M page size is used in transparent_hugepage_adjust(), so the page we get after adjust the mapping level is not the head page, the BUG_ON() will be triggered Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2012-05-26Merge branch 'generic-string-functions'Linus Torvalds
This makes <asm/word-at-a-time.h> actually live up to its promise of allowing architectures to help tune the string functions that do their work a word at a time. David had already taken the x86 strncpy_from_user() function, modified it to work on sparc, and then done the extra work to make it generically useful. This then expands on that work by making x86 use that generic version, completing the circle. But more importantly, it fixes up the word-at-a-time interfaces so that it's now easy to also support things like strnlen_user(), and pretty much most random string functions. David reports that it all works fine on sparc, and Jonas Bonn reported that an earlier version of this worked on OpenRISC too. It's pretty easy for architectures to add support for this and just replace their private versions with the generic code. * generic-string-functions: sparc: use the new generic strnlen_user() function x86: use the new generic strnlen_user() function lib: add generic strnlen_user() function word-at-a-time: make the interfaces truly generic x86: use generic strncpy_from_user routine
2012-05-26x86: use the new generic strnlen_user() functionLinus Torvalds
This throws away the old x86-specific functions in favor of the generic optimized version. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-26word-at-a-time: make the interfaces truly genericLinus Torvalds
This changes the interfaces in <asm/word-at-a-time.h> to be a bit more complicated, but a lot more generic. In particular, it allows us to really do the operations efficiently on both little-endian and big-endian machines, pretty much regardless of machine details. For example, if you can rely on a fast population count instruction on your architecture, this will allow you to make your optimized <asm/word-at-a-time.h> file with that. NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is not truly generic, it actually only works on big-endian. Why? Because on little-endian the generic algorithms are wasteful, since you can inevitably do better. The x86 implementation is an example of that. (The only truly non-generic part of the asm-generic implementation is the "find_zero()" function, and you could make a little-endian version of it. And if the Kbuild infrastructure allowed us to pick a particular header file, that would be lovely) The <asm/word-at-a-time.h> functions are as follows: - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm uses. - has_zero(): take a word, and determine if it has a zero byte in it. It gets the word, the pointer to the constant pool, and a pointer to an intermediate "data" field it can set. This is the "quick-and-dirty" zero tester: it's what is run inside the hot loops. - "prep_zero_mask()": take the word, the data that has_zero() produced, and the constant pool, and generate an *exact* mask of which byte had the first zero. This is run directly *outside* the loop, and allows the "has_zero()" function to answer the "is there a zero byte" question without necessarily getting exactly *which* byte is the first one to contain a zero. If you do multiple byte lookups concurrently (eg "hash_name()", which looks for both NUL and '/' bytes), after you've done the prep_zero_mask() phase, the result of those can be or'ed together to get the "either or" case. - The result from "prep_zero_mask()" can then be fed into "find_zero()" (to find the byte offset of the first byte that was zero) or into "zero_bytemask()" (to find the bytemask of the bytes preceding the zero byte). The existence of zero_bytemask() is optional, and is not necessary for the normal string routines. But dentry name hashing needs it, so if you enable DENTRY_WORD_AT_A_TIME you need to expose it. This changes the generic strncpy_from_user() function and the dentry hashing functions to use these modified word-at-a-time interfaces. This gets us back to the optimized state of the x86 strncpy that we lost in the previous commit when moving over to the generic version. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-26x86: use generic strncpy_from_user routineLinus Torvalds
The generic strncpy_from_user() is not really optimal, since it is designed to work on both little-endian and big-endian. And on little-endian you can simplify much of the logic to find the first zero byte, since little-endian arithmetic doesn't have to worry about the carry bit propagating into earlier bytes (only later bytes, which we don't care about). But I have patches to make the generic routines use the architecture- specific <asm/word-at-a-time.h> infrastructure, so that we can regain the little-endian optimizations. But before we do that, switch over to the generic routines to make the patches each do just one well-defined thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-25Merge tag 'x86-mce-merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull x86/mce merge window patches from Tony Luck: "Including two that make error_context() checks less sucky" * tag 'x86-mce-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Add instruction recovery signatures to mce-severity table x86/mce: Fix check for processor context when machine check was taken. MCE: Fix vm86 handling for 32bit mce handler x86/mce Add validation check before GHES error is recorded x86/mce: Avoid reading every machine check bank register twice.
2012-05-25Merge branch 'for-linus' of ↵Linus Torvalds
git://git.linaro.org/people/mszyprowski/linux-dma-mapping Pull CMA and ARM DMA-mapping updates from Marek Szyprowski: "These patches contain two major updates for DMA mapping subsystem (mainly for ARM architecture). First one is Contiguous Memory Allocator (CMA) which makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted. The main difference from the similar frameworks is the fact that CMA allows to transparently reuse the memory region reserved for the big chunk allocation as a system memory, so no memory is wasted when no big chunk is allocated. Once the alloc request is issued, the framework migrates system pages to create space for the required big chunk of physically contiguous memory. For more information one can refer to nice LWN articles: - 'A reworked contiguous memory allocator': http://lwn.net/Articles/447405/ - 'CMA and ARM': http://lwn.net/Articles/450286/ - 'A deep dive into CMA': http://lwn.net/Articles/486301/ - and the following thread with the patches and links to all previous versions: https://lkml.org/lkml/2012/4/3/204 The main client for this new framework is ARM DMA-mapping subsystem. The second part provides a complete redesign in ARM DMA-mapping subsystem. The core implementation has been changed to use common struct dma_map_ops based infrastructure with the recent updates for new dma attributes merged in v3.4-rc2. This allows to use more than one implementation of dma-mapping calls and change/select them on the struct device basis. The first client of this new infractructure is dmabounce implementation which has been completely cut out of the core, common code. The last patch of this redesign update introduces a new, experimental implementation of dma-mapping calls on top of generic IOMMU framework. This lets ARM sub-platform to transparently use IOMMU for DMA-mapping calls if one provides required IOMMU hardware. For more information please refer to the following thread: http://www.spinics.net/lists/arm-kernel/msg175729.html The last patch merges changes from both updates and provides a resolution for the conflicts which cannot be avoided when patches have been applied on the same files (mainly arch/arm/mm/dma-mapping.c)." Acked by Andrew Morton <akpm@linux-foundation.org>: "Yup, this one please. It's had much work, plenty of review and I think even Russell is happy with it." * 'for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping: (28 commits) ARM: dma-mapping: use PMD size for section unmap cma: fix migration mode ARM: integrate CMA with DMA-mapping subsystem X86: integrate CMA with DMA-mapping subsystem drivers: add Contiguous Memory Allocator mm: trigger page reclaim in alloc_contig_range() to stabilise watermarks mm: extract reclaim code from __alloc_pages_direct_reclaim() mm: Serialize access to min_free_kbytes mm: page_isolation: MIGRATE_CMA isolation functions added mm: mmzone: MIGRATE_CMA migration type added mm: page_alloc: change fallbacks array handling mm: page_alloc: introduce alloc_contig_range() mm: compaction: export some of the functions mm: compaction: introduce isolate_freepages_range() mm: compaction: introduce map_pages() mm: compaction: introduce isolate_migratepages_range() mm: page_alloc: remove trailing whitespace ARM: dma-mapping: add support for IOMMU mapper ARM: dma-mapping: use alloc, mmap, free from dma_ops ARM: dma-mapping: remove redundant code and do the cleanup ... Conflicts: arch/x86/include/asm/dma-mapping.h
2012-05-25x86: hpet: Fix copy-and-paste mistake in earlier changeJan Beulich
This fixes an oversight in 396e2c6fed4ff13b53ce0e573105531cf53b0cad ("x86: Clear HPET configuration registers on startup"), noticed by Thomas Gleixner. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4FBF7DA902000078000861EE@nat28.tlf.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-05-24Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
2012-05-24Merge tag 'stable/for-linus-3.5-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: * Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly. * Fix self-ballooning code, and balloon logic when booting as initial domain. * Move array printing code to generic debugfs * Support XenBus domains. * Lazily free grants when a domain is dead/non-existent. * In M2P code use batching calls Bug-fixes: * Fix NULL dereference in allocation failure path (hvc_xen) * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one." Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of apic ipi interface next to the new apic_id functions. * tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: do not map the same GSI twice in PVHVM guests. hvc_xen: NULL dereference on allocation failure xen: Add selfballoning memory reservation tunable. xenbus: Add support for xenbus backend in stub domain xen/smp: unbind irqworkX when unplugging vCPUs. xen: enter/exit lazy_mmu_mode around m2p_override calls xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep xen: implement IRQ_WORK_VECTOR handler xen: implement apic ipi interface xen/setup: update VA mapping when releasing memory during setup xen/setup: Combine the two hypercall functions - since they are quite similar. xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 xen/gnttab: add deferred freeing logic debugfs: Add support to print u32 array in debugfs xen/p2m: An early bootup variant of set_phys_to_machine xen/p2m: Collapse early_alloc_p2m_middle redundant checks. xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument xen/p2m: Move code around to allow for better re-usage.
2012-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds
Pull sparc changes from David S. Miller: "This has the generic strncpy_from_user() implementation architectures can now use, which we've been developing on linux-arch over the past few days. For good measure I ran both a 32-bit and a 64-bit glibc testsuite run, and the latter of which pointed out an adjustment I needed to make to sparc's user_addr_max() definition. Linus, you were right, STACK_TOP was not the right thing to use, even on sparc itself :-) From Sam Ravnborg, we have a conversion of sparc32 over to the common alloc_thread_info_node(), since the aspect which originally blocked our doing so (sun4c) has been removed." Fix up trivial arch/sparc/Kconfig and lib/Makefile conflicts. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: Fix user_addr_max() definition. lib: Sparc's strncpy_from_user is generic enough, move under lib/ kernel: Move REPEAT_BYTE definition into linux/kernel.h sparc: Increase portability of strncpy_from_user() implementation. sparc: Optimize strncpy_from_user() zero byte search. sparc: Add full proper error handling to strncpy_from_user(). sparc32: use the common implementation of alloc_thread_info_node()
2012-05-24Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more relocation fixes from Peter Anvin: "These are additional symbols that have been found to either be absolute or look like they might end up being absolute on one version of GNU ld or another. Unfortunately we have since that a different GNU ld version, 2.21, can generate bogus absolute symbols; again, this would have caused a malfunctioning kernel on x86-32 if relocated. The relocs.c changes changed silent corruption to a build time error. It is worth noting that if the various barrier symbols we use were more consistent in the namespace used this probably could be reduced to a single regexp; if nothing else it looks like there is migration toward a common __(start|stop)___.* namespace." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, relocs: Add jiffies and jiffies_64 to the relative whitelist x86-32, relocs: Whitelist more symbols for ld bug workaround
2012-05-24Merge tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
Pull GPIO driver changes from Grant Likely: "Lots of gpio changes, both to core code and drivers. Changes do touch architecture code to remove the need for separate arm/gpio.h includes in most architectures. Some new drivers are added, and a number of gpio drivers are converted to use irq_domains for gpio inputs used as interrupts. Device tree support has been amended to allow multiple gpio_chips to use the same device tree node. Remaining changes are primarily bug fixes." * tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6: (33 commits) gpio/generic: initialize basic_mmio_gpio shadow variables properly gpiolib: Remove 'const' from data argument of gpiochip_find() gpio/rc5t583: add gpio driver for RICOH PMIC RC5T583 gpiolib: quiet gpiochip_add boot message noise gpio: mpc8xxx: Prevent NULL pointer deref in demux handler gpio/lpc32xx: Add device tree support gpio: Adjust of_xlate API to support multiple GPIO chips gpiolib: Implement devm_gpio_request_one() gpio-mcp23s08: dbg_show: fix pullup configuration display Add support for TCA6424A gpio/omap: (re)fix wakeups on level-triggered GPIOs gpio/omap: fix broken context restore for non-OFF mode transitions gpio/omap: fix missing check in *_runtime_suspend() gpio/omap: remove cpu_is_omapxxxx() checks from *_runtime_resume() gpio/omap: remove suspend/resume callbacks gpio/omap: remove retrigger variable in gpio_irq_handler gpio/omap: remove saved_wakeup field from struct gpio_bank gpio/omap: remove suspend_wakeup field from struct gpio_bank gpio/omap: remove saved_fallingdetect, saved_risingdetect gpio/omap: remove virtual_irq_start variable ... Conflicts: drivers/gpio/gpio-samsung.c
2012-05-24Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner. Various trivial conflict fixups in arch Kconfig due to addition of unrelated entries nearby. And one slightly more subtle one for sparc32 (new user of GENERIC_CLOCKEVENTS), fixed up as per Thomas. * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) timekeeping: Fix a few minor newline issues. time: remove obsolete declaration ntp: Fix a stale comment and a few stray newlines. ntp: Correct TAI offset during leap second timers: Fixup the Kconfig consolidation fallout x86: Use generic time config unicore32: Use generic time config um: Use generic time config tile: Use generic time config sparc: Use: generic time config sh: Use generic time config score: Use generic time config s390: Use generic time config openrisc: Use generic time config powerpc: Use generic time config mn10300: Use generic time config mips: Use generic time config microblaze: Use generic time config m68k: Use generic time config m32r: Use generic time config ...
2012-05-24kernel: Move REPEAT_BYTE definition into linux/kernel.hDavid S. Miller
And make sure that everything using it explicitly includes that header file. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-24Merge branch 'drm-core-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull main drm updates from Dave Airlie: "This is the main merge window request for the drm. It's big, but jam packed will lots of features and of course 0 regressions. (okay maybe there'll be one). Highlights: - new KMS drivers for server GPU chipsets: ast, mgag200 and cirrus (qemu only). These drivers use the generic modesetting drivers. - initial prime/dma-buf support for i915, nouveau, radeon, udl and exynos - switcheroo audio support: so GPUs with HDMI can turn off the sound driver without crashing stuff. - There are some patches drifting outside drivers/gpu into x86 and EFI for better handling of multiple video adapters in Apple Macs, they've got correct acks except one trivial fixup. - Core: edid parser has better DMT and reduced blanking support, crtc properties, plane properties, - Drivers: exynos: add 2D core accel support, prime support, hdmi features intel: more Haswell support, initial Valleyview support, more hdmi infoframe fixes, update MAINTAINERS for Daniel, lots of cleanups and fixes radeon: more HDMI audio support, improved GPU lockup recovery support, remove nested mutexes, less memory copying on PCIE, fix bus master enable race (kexec), improved fence handling gma500: cleanups, 1080p support, acpi fixes nouveau: better nva3 memory reclocking, kepler accel (needs external firmware rip), async buffer moves on nv84+ hw. I've some more dma-buf patches that rely on the dma-buf merge for vmap stuff, and I've a few fixes building up, but I'd decided I'd better get rid of the main pull sooner rather than later, so the audio guys are also unblocked." Fix up trivial conflict due to some duplicated changes in drivers/gpu/drm/i915/intel_ringbuffer.c * 'drm-core-next' of git://people.freedesktop.org/~airlied/linux: (605 commits) drm/nouveau/nvd9: Fix GPIO initialisation sequence. drm/nouveau: Unregister switcheroo client on exit drm/nouveau: Check dsm on switcheroo unregister drm/nouveau: fix a minor annoyance in an output string drm/nouveau: turn a BUG into a WARN drm/nv50: decode PGRAPH DATA_ERROR = 0x24 drm/nouveau/disp: fix dithering not being enabled on some eDP macbooks drm/nvd9/copy: initialise copy engine, seems to work like nvc0 drm/nvc0/ttm: use copy engines for async buffer moves drm/nva3/ttm: use copy engine for async buffer moves drm/nv98/ttm: add in a (disabled) crypto engine buffer copy method drm/nv84/ttm: use crypto engine for async buffer copies drm/nouveau/ttm: untangle code to support accelerated buffer moves drm/nouveau/fbcon: use fence for sync, rather than notifier drm/nv98/crypt: non-stub implementation of the engine hooks drm/nouveau/fifo: turn all fifo modules into engine modules drm/nv50/graph: remove ability to do interrupt-driven context switching drm/nv50: remove manual context unload on context destruction drm/nv50: remove execution engine context saves on suspend drm/nv50/fifo: use hardware channel kickoff functionality ...
2012-05-24Merge branch 'perf-uprobes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull user-space probe instrumentation from Ingo Molnar: "The uprobes code originates from SystemTap and has been used for years in Fedora and RHEL kernels. This version is much rewritten, reviews from PeterZ, Oleg and myself shaped the end result. This tree includes uprobes support in 'perf probe' - but SystemTap (and other tools) can take advantage of user probe points as well. Sample usage of uprobes via perf, for example to profile malloc() calls without modifying user-space binaries. First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled. If you don't know which function you want to probe you can pick one from 'perf top' or can get a list all functions that can be probed within libc (binaries can be specified as well): $ perf probe -F -x /lib/libc.so.6 To probe libc's malloc(): $ perf probe -x /lib64/libc.so.6 malloc Added new event: probe_libc:malloc (on 0x7eac0) You can now use it in all perf tools, such as: perf record -e probe_libc:malloc -aR sleep 1 Make use of it to create a call graph (as the flat profile is going to look very boring): $ perf record -e probe_libc:malloc -gR make [ perf record: Woken up 173 times to write data ] [ perf record: Captured and wrote 44.190 MB perf.data (~1930712 $ perf report | less 32.03% git libc-2.15.so [.] malloc | --- malloc 29.49% cc1 libc-2.15.so [.] malloc | --- malloc | |--0.95%-- 0x208eb1000000000 | |--0.63%-- htab_traverse_noresize 11.04% as libc-2.15.so [.] malloc | --- malloc | 7.15% ld libc-2.15.so [.] malloc | --- malloc | 5.07% sh libc-2.15.so [.] malloc | --- malloc | 4.99% python-config libc-2.15.so [.] malloc | --- malloc | 4.54% make libc-2.15.so [.] malloc | --- malloc | |--7.34%-- glob | | | |--93.18%-- 0x41588f | | | --6.82%-- glob | 0x41588f ... Or: $ perf report -g flat | less # Overhead Command Shared Object Symbol # ........ ............. ............. .......... # 32.03% git libc-2.15.so [.] malloc 27.19% malloc 29.49% cc1 libc-2.15.so [.] malloc 24.77% malloc 11.04% as libc-2.15.so [.] malloc 11.02% malloc 7.15% ld libc-2.15.so [.] malloc 6.57% malloc ... The core uprobes design is fairly straightforward: uprobes probe points register themselves at (inode:offset) addresses of libraries/binaries, after which all existing (or new) vmas that map that address will have a software breakpoint injected at that address. vmas are COW-ed to preserve original content. The probe points are kept in an rbtree. If user-space executes the probed inode:offset instruction address then an event is generated which can be recovered from the regular perf event channels and mmap-ed ring-buffer. Multiple probes at the same address are supported, they create a dynamic callback list of event consumers. The basic model is further complicated by the XOL speedup: the original instruction that is probed is copied (in an architecture specific fashion) and executed out of line when the probe triggers. The XOL area is a single vma per process, with a fixed number of entries (which limits probe execution parallelism). The API: uprobes are installed/removed via /sys/kernel/debug/tracing/uprobe_events, the API is integrated to align with the kprobes interface as much as possible, but is separate to it. Injecting a probe point is privileged operation, which can be relaxed by setting perf_paranoid to -1. You can use multiple probes as well and mix them with kprobes and regular PMU events or tracepoints, when instrumenting a task." Fix up trivial conflicts in mm/memory.c due to previous cleanup of unmap_single_vma(). * 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf probe: Detect probe target when m/x options are absent perf probe: Provide perf interface for uprobes tracing: Fix kconfig warning due to a typo tracing: Provide trace events interface for uprobes tracing: Extract out common code for kprobes/uprobes trace events tracing: Modify is_delete, is_return from int to bool uprobes/core: Decrement uprobe count before the pages are unmapped uprobes/core: Make background page replacement logic account for rss_stat counters uprobes/core: Optimize probe hits with the help of a counter uprobes/core: Allocate XOL slots for uprobes use uprobes/core: Handle breakpoint and singlestep exceptions uprobes/core: Rename bkpt to swbp uprobes/core: Make order of function parameters consistent across functions uprobes/core: Make macro names consistent uprobes: Update copyright notices uprobes/core: Move insn to arch specific structure uprobes/core: Remove uprobe_opcode_sz uprobes/core: Make instruction tables volatile uprobes: Move to kernel/events/ uprobes/core: Clean up, refactor and improve the code ...
2012-05-24x86, relocs: Add jiffies and jiffies_64 to the relative whitelistH. Peter Anvin
The symbol jiffies is created in the linker script as an alias to jiffies_64. Unfortunately this is done outside any section, and apparently GNU ld 2.21 doesn't carry the section with it, so we end up with an absolute symbol and therefore a broken kernel. Add jiffies and jiffies_64 to the whitelist. The most disturbing bit with this discovery is that it shows that we have had multiple linker bugs in this area crossing multiple generations, and have been silently building bad kernels for some time. Link: http://lkml.kernel.org/r/20120524171604.0d98284f3affc643e9714470@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org> v3.4
2012-05-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull first series of signal handling cleanups from Al Viro: "This is just the first part of the queue (about a half of it); assorted fixes all over the place in signal handling. This one ends with all sigsuspend() implementations switched to generic one (->saved_sigmask-based). With this, a bunch of assorted old buglets are fixed and most of the missing bits of NOTIFY_RESUME hookup are in place. Two more fixes sit in arm and um trees respectively, and there's a couple of broken ones that need obvious fixes - parisc and avr32 check TIF_NOTIFY_RESUME only on one of two codepaths; fixes for that will happen in the next series" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (55 commits) unicore32: if there's no handler we need to restore sigmask, syscall or no syscall xtensa: add handling of TIF_NOTIFY_RESUME microblaze: drop 'oldset' argument of do_notify_resume() microblaze: handle TIF_NOTIFY_RESUME score: add handling of NOTIFY_RESUME to do_notify_resume() m68k: add TIF_NOTIFY_RESUME and handle it. sparc: kill ancient comment in sparc_sigaction() h8300: missing checks of __get_user()/__put_user() return values frv: missing checks of __get_user()/__put_user() return values cris: missing checks of __get_user()/__put_user() return values powerpc: missing checks of __get_user()/__put_user() return values sh: missing checks of __get_user()/__put_user() return values sparc: missing checks of __get_user()/__put_user() return values avr32: struct old_sigaction is never used m32r: struct old_sigaction is never used xtensa: xtensa_sigaction doesn't exist alpha: tidy signal delivery up score: don't open-code force_sigsegv() cris: don't open-code force_sigsegv() blackfin: don't open-code force_sigsegv() ...
2012-05-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures the if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these value specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...
2012-05-24Merge branch 'delete-mca' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull the MCA deletion branch from Paul Gortmaker: "It was good that we could support MCA machines back in the day, but realistically, nobody is using them anymore. They were mostly limited to 386-sx 16MHz CPU and some 486 class machines and never more than 64MB of RAM. Even the enthusiast hobbyist community seems to have dried up close to ten years ago, based on what you can find searching various websites dedicated to the relatively short lived hardware. So lets remove the support relating to CONFIG_MCA. There is no point carrying this forward, wasting cycles doing routine maintenance on it; wasting allyesconfig build time on validating it, wasting I/O on git grep'ping over it, and so on." Let's see if anybody screams. It generally has compiled, and James Bottomley pointed out that there was a MCA extension from NCR that allowed for up to 4GB of memory and PPro-class machines. So in *theory* there may be users out there. But even James (technically listed as a maintainer) doesn't actually have a system, and while Alan Cox claims to have a machine in his cellar that he offered to anybody who wants to take it off his hands, he didn't argue for keeping MCA support either. So we could bring it back. But somebody had better speak up and talk about how they have actually been using said MCA hardware with modern kernels for us to do that. And David already took the patch to delete all the networking driver code (commit a5e371f61ad3: "drivers/net: delete all code/drivers depending on CONFIG_MCA"). * 'delete-mca' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: MCA: delete all remaining traces of microchannel bus support. scsi: delete the MCA specific drivers and driver code serial: delete the MCA specific 8250 support. arm: remove ability to select CONFIG_MCA
2012-05-24Merge tag 'md-3.5' of git://neil.brown.name/mdLinus Torvalds
Pull md updates from NeilBrown: "It's been a busy cycle for md - lots of fun stuff here.. if you like this kind of thing :-) Main features: - RAID10 arrays can be reshaped - adding and removing devices and changing chunks (not 'far' array though) - allow RAID5 arrays to be reshaped with a backup file (not tested yet, but the priciple works fine for RAID10). - arrays can be reshaped while a bitmap is present - you no longer need to remove it first - SSSE3 support for RAID6 syndrome calculations and of course a number of minor fixes etc." * tag 'md-3.5' of git://neil.brown.name/md: (56 commits) md/bitmap: record the space available for the bitmap in the superblock. md/raid10: Remove extras after reshape to smaller number of devices. md/raid5: improve removal of extra devices after reshape. md: check the return of mddev_find() MD RAID1: Further conditionalize 'fullsync' DM RAID: Use md_error() in place of simply setting Faulty bit DM RAID: Record and handle missing devices DM RAID: Set recovery flags on resume md/raid5: Allow reshape while a bitmap is present. md/raid10: resize bitmap when required during reshape. md: allow array to be resized while bitmap is present. md/bitmap: make sure reshape request are reflected in superblock. md/bitmap: add bitmap_resize function to allow bitmap resizing. md/bitmap: use DIV_ROUND_UP instead of open-code md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap' md/bitmap: make bitmap bitops atomic. md/bitmap: make _page_attr bitops atomic. md/bitmap: merge bitmap_file_unmap and bitmap_file_put. md/bitmap: remove async freeing of bitmap file. md/bitmap: convert some spin_lock_irqsave to spin_lock_irq ...
2012-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto updates from Herbert Xu: - New cipher/hash driver for ARM ux500. - Code clean-up for aesni-intel. - Misc fixes. Fixed up conflicts in arch/arm/mach-ux500/devices-common.h, where quite frankly some of it made no sense at all (the pull brought in a declaration for the dbx500_add_platform_device_noirq() function, which neither exists nor is used anywhere). Also some trivial add-add context conflicts in the Kconfig file in drivers/{char/hw_random,crypto}/ * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: aesni-intel - move more common code to ablk_init_common crypto: aesni-intel - use crypto_[un]register_algs crypto: ux500 - Cleanup hardware identification crypto: ux500 - Update DMA handling for 3.4 mach-ux500: crypto - core support for CRYP/HASH module. crypto: ux500 - Add driver for HASH hardware crypto: ux500 - Add driver for CRYP hardware hwrng: Kconfig - modify default state for atmel-rng driver hwrng: omap - use devm_request_and_ioremap crypto: crypto4xx - move up err_request_irq label crypto, xor: Sanitize checksumming function selection output crypto: caam - add backward compatible string sec4.0
2012-05-23x86/mce: Add instruction recovery signatures to mce-severity tableTony Luck
Instruction recovery cases are very similar to the data recovery one we already have. Just trade out for a new MCACOD value. Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-05-23x86/mce: Fix check for processor context when machine check was taken.Tony Luck
Linus pointed out that there was no value is checking whether m->ip was zero - because zero is a legimate value. If we have a reliable (or faked in the VM86 case) "m->cs" we can use it to tell whether we were in user mode or kernelwhen the machine check hit. Reported-by: Linus Torvalds <torvalds@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-05-23MCE: Fix vm86 handling for 32bit mce handlerAndi Kleen
When running on 32bit the mce handler could misinterpret vm86 mode as ring 0. This can affect whether it does recovery or not; it was possible to panic when recovery was actually possible. Fix this by always forcing vm86 to look like ring 3. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-05-23x86-32, relocs: Whitelist more symbols for ld bug workaroundH. Peter Anvin
As noted in checkin: a3e854d95 x86, relocs: Workaround for binutils 2.22.52.0.1 section bug ld version 2.22.52.0.[12] can incorrectly promote relative symbols to absolute, if the output section they appear in is otherwise empty. Since checkin: 6520fe55 x86, realmode: 16-bit real-mode code support for relocs tool we actually check for this and error out rather than silently creating a kernel which will malfunction if relocated. Ingo found a configuration in which __start_builtin_fw triggered the warning. Go through the linker script sources and look for more symbols that could plausibly get bogusly promoted to absolute, and add them to the whitelist. In general, if the following error triggers: Invalid absolute R_386_32 relocation: <symbol> ... then we should verify that <symbol> is really meant to be relocated, and add it and any related symbols manually to the S_REL regexp. Please note that 6520fe55 does not introduce the error, only the check for the error -- without 6520fe55 this version of ld will simply produce a corrupt kernel if CONFIG_RELOCATABLE is set on x86-32. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org> v3.4
2012-05-23Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: - Leftover AMD PMU driver fix fix from the end of the v3.4 stabilization cycle. - Late tools/perf/ changes that missed the first round: * endianness fixes * event parsing improvements * libtraceevent fixes factored out from trace-cmd * perl scripting engine fixes related to libtraceevent, * testcase improvements * perf inject / pipe mode fixes * plus a kernel side fix * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Update event scheduling constraints for AMD family 15h models * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched, perf: Use a single callback into the scheduler" perf evlist: Show event attribute details perf tools: Bump default sample freq to 4 kHz perf buildid-list: Work better with pipe mode perf tools: Fix piped mode read code perf inject: Fix broken perf inject -b perf tools: rename HEADER_TRACE_INFO to HEADER_TRACING_DATA perf tools: Add union u64_swap type for swapping u64 data perf tools: Carry perf_event_attr bitfield throught different endians perf record: Fix documentation for branch stack sampling perf target: Add cpu flag to sample_type if target has cpu perf tools: Always try to build libtraceevent perf tools: Rename libparsevent to libtraceevent in Makefile perf script: Rename struct event to struct event_format in perl engine perf script: Explicitly handle known default print arg type perf tools: Add hardcoded name term for pmu events perf tools: Separate 'mem:' event scanner bits perf tools: Use allocated list for each parsed event perf tools: Add support for displaying event parser debug info perf test: Move parse event automated tests to separated object
2012-05-23Merge branch 'x86-reboot-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 reboot changes from Ingo Molnar: "The biggest change is a gentler method of rebooting/stopping via IRQs first and then via NMIs. There are several cleanups in the tree as well." * 'x86-reboot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/reboot: Update nonmi_ipi parameter x86/reboot: Use NMI to assist in shutting down if IRQ fails Revert "x86, reboot: Use NMI instead of REBOOT_VECTOR to stop cpus" x86/reboot: Clean up coding style x86/reboot: Reduce to a single DMI table for reboot quirks
2012-05-23Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "This tree includes assorted platform driver updates and a preparatory series for a platform with custom DMA remapping semantics (sta2x11 I/O hub)." * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vsmp: Fix number of CPUs when vsmp is disabled keyboard: Use BIOS Keyboard variable to set Numlock x86/olpc/xo1/sci: Report RTC wakeup events x86/olpc/xo1/sci: Produce wakeup events for buttons and switches x86, platform: Initial support for sta2x11 I/O hub x86: Introduce CONFIG_X86_DMA_REMAP x86-32: Introduce CONFIG_X86_DEV_DMA_OPS
2012-05-23Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "This tree includes a micro-optimization that avoids cr3 switches during idling; it fixes corner cases and there's also small cleanups" Fix up trivial context conflict with the percpu_xx -> this_cpu_xx changes. * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix accounting in kernel_physical_mapping_init() x86/tlb: Clean up and unify TLB_FLUSH_ALL definition x86: Drop obsolete ARCH_BOOTMEM support x86, tlb: Switch cr3 in leave_mm() only when needed x86/mm: Fix the size calculation of mapping tables
2012-05-23Merge branch 'x86-mce-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MCE updates from Ingo Molnar: "This tree updates/fixes MCE hardware support, it makes the APIC LVT thresholding interrupt optional because a subset of AMD F15h models don't support it." * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, MCE, AMD: Disable error thresholding bank 4 on some models x86, MCE, AMD: Hide interrupt_enable sysfs node x86, MCE, AMD: Make APIC LVT thresholding interrupt optional
2012-05-23Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull fpu state cleanups from Ingo Molnar: "This tree streamlines further aspects of FPU handling by eliminating the prepare_to_copy() complication and moving that logic to arch_dup_task_struct(). It also fixes the FPU dumps in threaded core dumps, removes and old (and now invalid) assumption plus micro-optimizes the exit path by avoiding an FPU save for dead tasks." Fixed up trivial add-add conflict in arch/sh/kernel/process.c that came in because we now do the FPU handling in arch_dup_task_struct() rather than the legacy (and now gone) prepare_to_copy(). * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, fpu: drop the fpu state during thread exit x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() coredump: ensure the fpu state is flushed for proper multi-threaded core dump fork: move the real prepare_to_copy() users to arch_dup_task_struct()
2012-05-23Merge branch 'x86-extable-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull exception table generation updates from Ingo Molnar: "The biggest change here is to allow the build-time sorting of the exception table, to speed up booting. This is achieved by the architecture enabling BUILDTIME_EXTABLE_SORT. This option is enabled for x86 and MIPS currently. On x86 a number of fixes and changes were needed to allow build-time sorting of the exception table, in particular a relocation invariant exception table format was needed. This required the abstracting out of exception table protocol and the removal of 20 years of accumulated assumptions about the x86 exception table format. While at it, this tree also cleans up various other aspects of exception handling, such as early(er) exception handling for rdmsr_safe() et al. All in one, as the result of these changes the x86 exception code is now pretty nice and modern. As an added bonus any regressions in this code will be early and violent crashes, so if you see any of those, you'll know whom to blame!" Fix up trivial conflicts in arch/{mips,x86}/Kconfig files due to nearby modifications of other core architecture options. * 'x86-extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) Revert "x86, extable: Disable presorted exception table for now" scripts/sortextable: Handle relative entries, and other cleanups x86, extable: Switch to relative exception table entries x86, extable: Disable presorted exception table for now x86, extable: Add _ASM_EXTABLE_EX() macro x86, extable: Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/xsave.h x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/kvm_host.h x86, extable: Remove the now-unused __ASM_EX_SEC macros x86, extable: Remove open-coded exception table entries in arch/x86/xen/xen-asm_32.S x86, extable: Remove open-coded exception table entries in arch/x86/um/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/usercopy_32.c x86, extable: Remove open-coded exception table entries in arch/x86/lib/putuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/getuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/csum-copy_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_nocache_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/kernel/test_rodata.c x86, extable: Remove open-coded exception table entries in arch/x86/kernel/entry_64.S ...