summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2013-02-24acpi, memory-hotplug: support getting hotplug info from SRATTang Chen
We now provide an option for users who don't want to specify physical memory address in kernel commandline. /* * For movablemem_map=acpi: * * SRAT: |_____| |_____| |_________| |_________| ...... * node id: 0 1 1 2 * hotpluggable: n y y n * movablemem_map: |_____| |_________| * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. */ So user just specify movablemem_map=acpi, and the kernel will use hotpluggable info in SRAT to determine which memory ranges should be set as ZONE_MOVABLE. If all the memory ranges in SRAT is hotpluggable, then no memory can be used by kernel. But before parsing SRAT, memblock has already reserve some memory ranges for other purposes, such as for kernel image, and so on. We cannot prevent kernel from using these memory. So we need to exclude these ranges even if these memory is hotpluggable. Furthermore, there could be several memory ranges in the single node which the kernel resides in. We may skip one range that have memory reserved by memblock, but if the rest of memory is too small, then the kernel will fail to boot. So, make the whole node which the kernel resides in un-hotpluggable. Then the kernel has enough memory to use. NOTE: Using this way will cause NUMA performance down because the whole node will be set as ZONE_MOVABLE, and kernel cannot use memory on it. If users don't want to lose NUMA performance, just don't use it. [akpm@linux-foundation.org: fix warning] [akpm@linux-foundation.org: use strcmp()] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24acpi, memory-hotplug: extend movablemem_map ranges to the end of nodeTang Chen
When implementing movablemem_map boot option, we introduced an array movablemem_map.map[] to store the memory ranges to be set as ZONE_MOVABLE. Since ZONE_MOVABLE is the latst zone of a node, if user didn't specify the whole node memory range, we need to extend it to the node end so that we can use it to prevent memblock from allocating memory in the ranges user didn't specify. We now implement movablemem_map boot option like this: /* * For movablemem_map=nn[KMG]@ss[KMG]: * * SRAT: |_____| |_____| |_________| |_________| ...... * node id: 0 1 1 2 * user specified: |__| |___| * movablemem_map: |___| |_________| |______| ...... * * Using movablemem_map, we can prevent memblock from allocating memory * on ZONE_MOVABLE at boot time. * * NOTE: In this case, SRAT info will be ingored. */ [akpm@linux-foundation.org: clean up code, fix build warning] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24acpi, memory-hotplug: parse SRAT before memblock is readyTang Chen
On linux, the pages used by kernel could not be migrated. As a result, if a memory range is used by kernel, it cannot be hot-removed. So if we want to hot-remove memory, we should prevent kernel from using it. The way now used to prevent this is specify a memory range by movablemem_map boot option and set it as ZONE_MOVABLE. But when the system is booting, memblock will allocate memory, and reserve the memory for kernel. And before we parse SRAT, and know the node memory ranges, memblock is working. And it may allocate memory in ranges to be set as ZONE_MOVABLE. This memory can be used by kernel, and never be freed. So, let's parse SRAT before memblock is called first. And it is early enough. The first call of memblock_find_in_range_node() is in: setup_arch() |-->setup_real_mode() so, this patch add a function early_parse_srat() to parse SRAT, and call it before setup_real_mode() is called. NOTE: 1) early_parse_srat() is called before numa_init(), and has initialized numa_meminfo. So DO NOT clear numa_nodes_parsed in numa_init() and DO NOT zero numa_meminfo in numa_init(), otherwise we will lose memory numa info. 2) I don't know why using count of memory affinities parsed from SRAT as a return value in original acpi_numa_init(). So I add a static variable srat_mem_cnt to remember this count and use it as the return value of the new acpi_numa_init() [mhocko@suse.cz: parse SRAT before memblock is ready fix] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Len Brown <lenb@kernel.org> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24page_alloc: bootmem limit with movablecore_mapTang Chen
Ensure the bootmem will not allocate memory from areas that may be ZONE_MOVABLE. The map info is from movablecore_map boot option. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com> Tested-by: Lin Feng <linfeng@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24page_alloc: add movable_memmap kernel parameterTang Chen
Add functions to parse movablemem_map boot option. Since the option could be specified more then once, all the maps will be stored in the global variable movablemem_map.map array. And also, we keep the array in monotonic increasing order by start_pfn. And merge all overlapped ranges. [akpm@linux-foundation.org: improve comment] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: remove unneeded parens] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Wen Congyang <wency@cn.fujitsu.com> Tested-by: Lin Feng <linfeng@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: export the function try_offline_node()Wen Congyang
try_offline_node() will be needed in the tristate drivers/acpi/processor_driver.c. The node will be offlined when all memory/cpu on the node have been hotremoved. So we need the function try_offline_node() in cpu-hotplug path. If the memory-hotplug is disabled, and cpu-hotplug is enabled 1. no memory no the node we don't online the node, and cpu's node is the nearest node. 2. the node contains some memory the node has been onlined, and cpu's node is still needed to migrate the sleep task on the cpu to the same node. So we do nothing in try_offline_node() in this case. [rientjes@google.com: export the function try_offline_node() fix] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <lenb@kernel.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: remove sysfs file of nodeTang Chen
Introduce a new function try_offline_node() to remove sysfs file of node when all memory sections of this node are removed. If some memory sections of this node are not removed, this function does nothing. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: remove memmap of sparse-vmemmapTang Chen
Introduce a new API vmemmap_free() to free and remove vmemmap pagetables. Since pagetable implements are different, each architecture has to provide its own version of vmemmap_free(), just like vmemmap_populate(). Note: vmemmap_free() is not implemented for ia64, ppc, s390, and sparc. [mhocko@suse.cz: fix implicit declaration of remove_pagetable] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: common APIs to support page tables hot-removeWen Congyang
When memory is removed, the corresponding pagetables should alse be removed. This patch introduces some common APIs to support vmemmap pagetable and x86_64 architecture direct mapping pagetable removing. All pages of virtual mapping in removed memory cannot be freed if some pages used as PGD/PUD include not only removed memory but also other memory. So this patch uses the following way to check whether a page can be freed or not. 1) When removing memory, the page structs of the removed memory are filled with 0FD. 2) All page structs are filled with 0xFD on PT/PMD, PT/PMD can be cleared. In this case, the page used as PT/PMD can be freed. For direct mapping pages, update direct_pages_count[level] when we freed their pagetables. And do not free the pages again because they were freed when offlining. For vmemmap pages, free the pages and their pagetables. For larger pages, do not split them into smaller ones because there is no way to know if the larger page has been split. As a result, there is no way to decide when to split. We deal the larger pages in the following way: 1) For direct mapped pages, all the pages were freed when they were offlined. And since menmory offline is done section by section, all the memory ranges being removed are aligned to PAGE_SIZE. So only need to deal with unaligned pages when freeing vmemmap pages. 2) For vmemmap pages being used to store page_struct, if part of the larger page is still in use, just fill the unused part with 0xFD. And when the whole page is fulfilled with 0xFD, then free the larger page. [akpm@linux-foundation.org: fix typo in comment] [tangchen@cn.fujitsu.com: do not calculate direct mapping pages when freeing vmemmap pagetables] [tangchen@cn.fujitsu.com: do not free direct mapping pages twice] [tangchen@cn.fujitsu.com: do not free page split from hugepage one by one] [tangchen@cn.fujitsu.com: do not split pages when freeing pagetable pages] [akpm@linux-foundation.org: use pmd_page_vaddr()] [akpm@linux-foundation.org: fix used-uninitialised bug] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmapYasuaki Ishimatsu
For removing memmap region of sparse-vmemmap which is allocated bootmem, memmap region of sparse-vmemmap needs to be registered by get_page_bootmem(). So the patch searches pages of virtual mapping and registers the pages by get_page_bootmem(). NOTE: register_page_bootmem_memmap() is not implemented for ia64, ppc, s390, and sparc. So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node() when platform doesn't support it. It's implemented by adding a new Kconfig option named CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected by memory-hotplug feature fully supported archs(currently only on x86_64). Since we have 2 config options called MEMORY_HOTPLUG and MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately, and codes in function register_page_bootmem_info_node() are only used for collecting infomation for hot-remove, so reside it under MEMORY_HOTREMOVE. Besides page_isolation.c selected by MEMORY_ISOLATION under MEMORY_HOTPLUG is also such case, move it too. [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE] [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()] [mhocko@suse.cz: remove the arch specific functions without any implementation] [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed] [rientjes@google.com: fix defined but not used warning] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Wu Jianguo <wujianguo@huawei.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: introduce new arch_remove_memory() for removing page tableWen Congyang
For removing memory, we need to remove page tables. But it depends on architecture. So the patch introduce arch_remove_memory() for removing page table. Now it only calls __remove_pages(). Note: __remove_pages() for some archtecuture is not implemented (I don't know how to implement it for s390). Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: remove /sys/firmware/memmap/X sysfsYasuaki Ishimatsu
When (hot)adding memory into system, /sys/firmware/memmap/X/{end, start, type} sysfs files are created. But there is no code to remove these files. This patch implements the function to remove them. We cannot free firmware_map_entry which is allocated by bootmem because there is no way to do so when the system is up. But we can at least remember the address of that memory and reuse the storage when the memory is added next time. This patch also introduces a new list map_entries_bootmem to link the map entries allocated by bootmem when they are removed, and a lock to protect it. And these entries will be reused when the memory is hot-added again. The idea is suggestted by Andrew Morton. NOTE: It is unsafe to return an entry pointer and release the map_entries_lock. So we should not hold the map_entries_lock separately in firmware_map_find_entry() and firmware_map_remove_entry(). Hold the map_entries_lock across find and remove /sys/firmware/memmap/X operation. And also, users of these two functions need to be careful to hold the lock when using these two functions. [tangchen@cn.fujitsu.com: Hold spinlock across find|remove /sys operation] [tangchen@cn.fujitsu.com: fix the wrong comments of map_entries] [tangchen@cn.fujitsu.com: reuse the storage of /sys/firmware/memmap/X/ allocated by bootmem] [tangchen@cn.fujitsu.com: fix section mismatch problem] [tangchen@cn.fujitsu.com: fix the doc format in drivers/firmware/memmap.c] Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Julian Calaby <julian.calaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24memory-hotplug: check whether all memory blocks are offlined or not when ↵Yasuaki Ishimatsu
removing memory We remove the memory like this: 1. lock memory hotplug 2. offline a memory block 3. unlock memory hotplug 4. repeat 1-3 to offline all memory blocks 5. lock memory hotplug 6. remove memory(TODO) 7. unlock memory hotplug All memory blocks must be offlined before removing memory. But we don't hold the lock in the whole operation. So we should check whether all memory blocks are offlined before step6. Otherwise, kernel maybe panicked. Offlining a memory block and removing a memory device can be two different operations. Users can just offline some memory blocks without removing the memory device. For this purpose, the kernel has held lock_memory_hotplug() in __offline_pages(). To reuse the code for memory hot-remove, we repeat step 1-3 to offline all the memory blocks, repeatedly lock and unlock memory hotplug, but not hold the memory hotplug lock in the whole operation. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Wu Jianguo <wujianguo@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: make do_mmap_pgoff return populate as a size in bytes, not as a boolMichel Lespinasse
do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE multiple, however there was no equivalent code in mm_populate(), which caused issues. This could be fixed by introduced the same rounding in mm_populate(), however I think it's preferable to make do_mmap_pgoff() return populate as a size rather than as a boolean, so we don't have to duplicate the size rounding logic in mm_populate(). Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: introduce VM_POPULATE flag to better deal with racy userspace programsMichel Lespinasse
The vm_populate() code populates user mappings without constantly holding the mmap_sem. This makes it susceptible to racy userspace programs: the user mappings may change while vm_populate() is running, and in this case vm_populate() may end up populating the new mapping instead of the old one. In order to reduce the possibility of userspace getting surprised by this behavior, this change introduces the VM_POPULATE vma flag which gets set on vmas we want vm_populate() to work on. This way vm_populate() may still end up populating the new mapping after such a race, but only if the new mapping is also one that the user has requested (using MAP_SHARED, MAP_LOCKED or mlock) to be populated. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: directly use __mlock_vma_pages_range() in find_extend_vma()Michel Lespinasse
In find_extend_vma(), we don't need mlock_vma_pages_range() to verify the vma type - we know we're working with a stack. So, we can call directly into __mlock_vma_pages_range(), and remove the last make_pages_present() call site. Note that we don't use mm_populate() here, so we can't release the mmap_sem while allocating new stack pages. This is deemed acceptable, because the stack vmas grow by a bounded number of pages at a time, and these are anon pages so we don't have to read from disk to populate them. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: remove flags argument to mmap_regionMichel Lespinasse
After the MAP_POPULATE handling has been moved to mmap_region() call sites, the only remaining use of the flags argument is to pass the MAP_NORESERVE flag. This can be just as easily handled by do_mmap_pgoff(), so do that and remove the mmap_region() flags parameter. [akpm@linux-foundation.org: remove double parens] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: introduce mm_populate() for populating new vmasMichel Lespinasse
When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or with MCL_FUTURE in effect), we want to populate the pages within the newly created vmas. This may take a while as we may have to read pages from disk, so ideally we want to do this outside of the write-locked mmap_sem region. This change introduces mm_populate(), which is used to defer populating such mappings until after the mmap_sem write lock has been released. This is implemented as a generalization of the former do_mlock_pages(), which accomplished the same task but was using during mlock() / mlockall(). Signed-off-by: Michel Lespinasse <walken@google.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: compaction: make __compact_pgdat() and compact_pgdat() return voidAndrew Morton
These functions always return 0. Formalise this. Cc: Jason Liu <r64343@freescale.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-24mm: vmscan: save work scanning (almost) empty LRU listsJohannes Weiner
In certain cases (kswapd reclaim, memcg target reclaim), a fixed minimum amount of pages is scanned from the LRU lists on each iteration, to make progress. Do not make this minimum bigger than the respective LRU list size, however, and save some busy work trying to isolate and reclaim pages that are not there. Empty LRU lists are quite common with memory cgroups in NUMA environments because there exists a set of LRU lists for each zone for each memory cgroup, while the memory of a single cgroup is expected to stay on just one node. The number of expected empty LRU lists is thus memcgs * (nodes - 1) * lru types Each attempt to reclaim from an empty LRU list does expensive size comparisons between lists, acquires the zone's lru lock etc. Avoid that. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Satoru Moriya <satoru.moriya@hds.com> Cc: Simon Jeons <simon.jeons@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Peter Anvin: "This is a huge set of several partly interrelated (and concurrently developed) changes, which is why the branch history is messier than one would like. The *really* big items are two humonguous patchsets mostly developed by Yinghai Lu at my request, which completely revamps the way we create initial page tables. In particular, rather than estimating how much memory we will need for page tables and then build them into that memory -- a calculation that has shown to be incredibly fragile -- we now build them (on 64 bits) with the aid of a "pseudo-linear mode" -- a #PF handler which creates temporary page tables on demand. This has several advantages: 1. It makes it much easier to support things that need access to data very early (a followon patchset uses this to load microcode way early in the kernel startup). 2. It allows the kernel and all the kernel data objects to be invoked from above the 4 GB limit. This allows kdump to work on very large systems. 3. It greatly reduces the difference between Xen and native (Xen's equivalent of the #PF handler are the temporary page tables created by the domain builder), eliminating a bunch of fragile hooks. The patch series also gets us a bit closer to W^X. Additional work in this pull is the 64-bit get_user() work which you were also involved with, and a bunch of cleanups/speedups to __phys_addr()/__pa()." * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (105 commits) x86, mm: Move reserving low memory later in initialization x86, doc: Clarify the use of asm("%edx") in uaccess.h x86, mm: Redesign get_user with a __builtin_choose_expr hack x86: Be consistent with data size in getuser.S x86, mm: Use a bitfield to mask nuisance get_user() warnings x86/kvm: Fix compile warning in kvm_register_steal_time() x86-32: Add support for 64bit get_user() x86-32, mm: Remove reference to alloc_remap() x86-32, mm: Remove reference to resume_map_numa_kva() x86-32, mm: Rip out x86_32 NUMA remapping code x86/numa: Use __pa_nodebug() instead x86: Don't panic if can not alloc buffer for swiotlb mm: Add alloc_bootmem_low_pages_nopanic() x86, 64bit, mm: hibernate use generic mapping_init x86, 64bit, mm: Mark data/bss/brk to nx x86: Merge early kernel reserve for 32bit and 64bit x86: Add Crash kernel low reservation x86, kdump: Remove crashkernel range find limit for 64bit memblock: Add memblock_mem_size() x86, boot: Not need to check setup_header version for setup_data ...
2013-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 update from Martin Schwidefsky: "The most prominent change in this patch set is the software dirty bit patch for s390. It removes __HAVE_ARCH_PAGE_TEST_AND_CLEAR_DIRTY and the page_test_and_clear_dirty primitive which makes the common memory management code a bit less obscure. Heiko fixed most of the PCI related fallout, more often than not missing GENERIC_HARDIRQS dependencies. Notable is one of the 3270 patches which adds an export to tty_io to be able to resize a tty. The rest is the usual bunch of cleanups and bug fixes." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits) s390/module: Add missing R_390_NONE relocation type drivers/gpio: add missing GENERIC_HARDIRQ dependency drivers/input: add couple of missing GENERIC_HARDIRQS dependencies s390/cleanup: rename SPP to LPP s390/mm: implement software dirty bits s390/mm: Fix crst upgrade of mmap with MAP_FIXED s390/linker skript: discard exit.data at runtime drivers/media: add missing GENERIC_HARDIRQS dependency s390/bpf,jit: add vlan tag support drivers/net,AT91RM9200: add missing GENERIC_HARDIRQS dependency iucv: fix kernel panic at reboot s390/Kconfig: sort list of arch selected config options phylib: remove !S390 dependeny from Kconfig uio: remove !S390 dependency from Kconfig dasd: fix sysfs cleanup in dasd_generic_remove s390/pci: fix hotplug module init s390/pci: cleanup clp page allocation s390/pci: cleanup clp inline assembly s390/perf: cpum_cf: fallback to software sampling events s390/mm: provide PAGE_SHARED define ...
2013-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID subsystem updates from Jiri Kosina: "HID subsystem and drivers update. Highlights: - new support of a group of Win7/Win8 multitouch devices, from Benjamin Tissoires - fix for compat interface brokenness in uhid, from Dmitry Torokhov - conversion of drivers to use hid_driver helper, by H Hartley Sweeten - HID over I2C transport received ACPI enumeration support, written by Mika Westerberg - there is an ongoing effort to make HID sensor hubs independent of USB transport. The first self-contained part of this work is provided here, done by Mika Westerberg - a few smaller fixes here and there, support for a couple new devices added" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (43 commits) HID: Correct Logitech order in hid-ids.h HID: LG4FF: Remove unnecessary deadzone code HID: LG: Prevent the Logitech Gaming Wheels deadzone HID: LG: Fix detection of Logitech Speed Force Wireless (WiiWheel) HID: LG: Add support for Logitech Momo Force (Red) Wheel HID: hidraw: print message when succesfully initialized HID: logitech: split accel, brake for Driving Force wheel HID: logitech: add report descriptor for Driving Force wheel HID: add ThingM blink(1) USB RGB LED support HID: uhid: make creating devices work on 64/32 systems HID: wiimote: fix nunchuck button parser HID: blacklist Velleman data acquisition boards HID: sensor-hub: don't limit the driver only to USB bus HID: sensor-hub: get rid of unused sensor_hub_grabbed_usages[] table HID: extend autodetect to handle I2C sensors as well HID: ntrig: use input_configured() callback to set the name HID: multitouch: do not use pointers towards hid-core HID: add missing GENERIC_HARDIRQ dependency HID: multitouch: make MT_CLS_ALWAYS_TRUE the new default class HID: multitouch: fix protocol for Elo panels ...
2013-02-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Assorted tiny fixes queued in trivial tree" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits) DocBook: update EXPORT_SYMBOL entry to point at export.h Documentation: update top level 00-INDEX file with new additions ARM: at91/ide: remove unsused at91-ide Kconfig entry percpu_counter.h: comment code for better readability x86, efi: fix comment typo in head_32.S IB: cxgb3: delay freeing mem untill entirely done with it net: mvneta: remove unneeded version.h include time: x86: report_lost_ticks doesn't exist any more pcmcia: avoid static analysis complaint about use-after-free fs/jfs: Fix typo in comment : 'how may' -> 'how many' of: add missing documentation for of_platform_populate() btrfs: remove unnecessary cur_trans set before goto loop in join_transaction sound: soc: Fix typo in sound/codecs treewide: Fix typo in various drivers btrfs: fix comment typos Update ibmvscsi module name in Kconfig. powerpc: fix typo (utilties -> utilities) of: fix spelling mistake in comment h8300: Fix home page URL in h8300/README xtensa: Fix home page URL in Kconfig ...
2013-02-22Merge branch 'akpm' (incoming from Andrew)Linus Torvalds
Merge misc patches from Andrew Morton: - Florian has vanished so I appear to have become fbdev maintainer again :( - Joel and Mark are distracted to welcome to the new OCFS2 maintainer - The backlight queue - Small core kernel changes - lib/ updates - The rtc queue - Various random bits * akpm: (164 commits) rtc: rtc-davinci: use devm_*() functions rtc: rtc-max8997: use devm_request_threaded_irq() rtc: rtc-max8907: use devm_request_threaded_irq() rtc: rtc-da9052: use devm_request_threaded_irq() rtc: rtc-wm831x: use devm_request_threaded_irq() rtc: rtc-tps80031: use devm_request_threaded_irq() rtc: rtc-lp8788: use devm_request_threaded_irq() rtc: rtc-coh901331: use devm_clk_get() rtc: rtc-vt8500: use devm_*() functions rtc: rtc-tps6586x: use devm_request_threaded_irq() rtc: rtc-imxdi: use devm_clk_get() rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug() rtc: rtc-pcf8583: use dev_warn() instead of printk() rtc: rtc-sun4v: use pr_warn() instead of printk() rtc: rtc-vr41xx: use dev_info() instead of printk() rtc: rtc-rs5c313: use pr_err() instead of printk() rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug() rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug() rtc: rtc-ds2404: use dev_err() instead of printk() rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk() ...
2013-02-22backlight: lp855x_bl: support new LP8557 deviceKim, Milo
LP8557 is one of LP855x family device, but it has different register map and initialization process. To support this device, device specific configuration is done through the lp855x_device_config structure. Few register definitions are fixed for better readability. BRIGHTNESS_CTRL -> LP855X_BRIGHTNESS_CTRL DEVICE_CTRL -> LP855X_DEVICE_CTRL EEPROM_START -> LP855X_EEPROM_START EEPROM_END -> LP855X_EEPROM_END EPROM_START -> LP8556_EPROM_START EPROM_END -> LP8556_EPROM_END And LP8557 register definitions are added. New register function, lp855x_update_bit() is added. Signed-off-by: Milo(Woogyom) Kim <milo.kim@ti.com> Acked-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22printk: add pr_devel_once and pr_devel_ratelimitedMikhail Gruzdev
Standardize pr_devel logging macros family by adding pr_devel_once and pr_devel_ratelimited. Signed-off-by: Mikhail Gruzdev <michail.gruzdev@gmail.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22sun.com documentation fixesChristian Kujau
After I came across a help text for SUNGEM mentioning a broken sun.com URL, I felt like fixing those up, as they are now pointing to oracle.com URLs. Signed-off-by: Christian Kujau <lists@nerdbynature.de> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22smp: make smp_call_function_many() use logic similar to ↵Shaohua Li
smp_call_function_single() I'm testing swapout workload in a two-socket Xeon machine. The workload has 10 threads, each thread sequentially accesses separate memory region. TLB flush overhead is very big in the workload. For each page, page reclaim need move it from active lru list and then unmap it. Both need a TLB flush. And this is a multthread workload, TLB flush happens in 10 CPUs. In X86, TLB flush uses generic smp_call)function. So this workload stress smp_call_function_many heavily. Without patch, perf shows: + 24.49% [k] generic_smp_call_function_interrupt - 21.72% [k] _raw_spin_lock - _raw_spin_lock + 79.80% __page_check_address + 6.42% generic_smp_call_function_interrupt + 3.31% get_swap_page + 2.37% free_pcppages_bulk + 1.75% handle_pte_fault + 1.54% put_super + 1.41% grab_super_passive + 1.36% __swap_duplicate + 0.68% blk_flush_plug_list + 0.62% swap_info_get + 6.55% [k] flush_tlb_func + 6.46% [k] smp_call_function_many + 5.09% [k] call_function_interrupt + 4.75% [k] default_send_IPI_mask_sequence_phys + 2.18% [k] find_next_bit swapout throughput is around 1300M/s. With the patch, perf shows: - 27.23% [k] _raw_spin_lock - _raw_spin_lock + 80.53% __page_check_address + 8.39% generic_smp_call_function_single_interrupt + 2.44% get_swap_page + 1.76% free_pcppages_bulk + 1.40% handle_pte_fault + 1.15% __swap_duplicate + 1.05% put_super + 0.98% grab_super_passive + 0.86% blk_flush_plug_list + 0.57% swap_info_get + 8.25% [k] default_send_IPI_mask_sequence_phys + 7.55% [k] call_function_interrupt + 7.47% [k] smp_call_function_many + 7.25% [k] flush_tlb_func + 3.81% [k] _raw_spin_lock_irqsave + 3.78% [k] generic_smp_call_function_single_interrupt swapout throughput is around 1400M/s. So there is around a 7% improvement, and total cpu utilization doesn't change. Without the patch, cfd_data is shared by all CPUs. generic_smp_call_function_interrupt does read/write cfd_data several times which will create a lot of cache ping-pong. With the patch, the data becomes per-cpu. The ping-pong is avoided. And from the perf data, this doesn't make call_single_queue lock contend. Next step is to remove generic_smp_call_function_interrupt() from arch code. Signed-off-by: Shaohua Li <shli@fusionio.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22block: optionally snapshot page contents to provide stable pages during writeDarrick J. Wong
This provides a band-aid to provide stable page writes on jbd without needing to backport the fixed locking and page writeback bit handling schemes of jbd2. The band-aid works by using bounce buffers to snapshot page contents instead of waiting. For those wondering about the ext3 bandage -- fixing the jbd locking (which was done as part of ext4dev years ago) is a lot of surgery, and setting PG_writeback on data pages when we actually hold the page lock dropped ext3 performance by nearly an order of magnitude. If we're going to migrate iscsi and raid to use stable page writes, the complaints about high latency will likely return. We might as well centralize their page snapshotting thing to one place. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Tested-by: Andy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22mm: only enforce stable page writes if the backing device requires itDarrick J. Wong
Create a helper function to check if a backing device requires stable page writes and, if so, performs the necessary wait. Then, make it so that all points in the memory manager that handle making pages writable use the helper function. This should provide stable page write support to most filesystems, while eliminating unnecessary waiting for devices that don't require the feature. Before this patchset, all filesystems would block, regardless of whether or not it was necessary. ext3 would wait, but still generate occasional checksum errors. The network filesystems were left to do their own thing, so they'd wait too. After this patchset, all the disk filesystems except ext3 and btrfs will wait only if the hardware requires it. ext3 (if necessary) snapshots pages instead of blocking, and btrfs provides its own bdi so the mm will never wait. Network filesystems haven't been touched, so either they provide their own stable page guarantees or they don't block at all. The blocking behavior is back to what it was before 3.0 if you don't have a disk requiring stable page writes. Here's the result of using dbench to test latency on ext2: 3.8.0-rc3: Operation Count AvgLat MaxLat ---------------------------------------- WriteX 109347 0.028 59.817 ReadX 347180 0.004 3.391 Flush 15514 29.828 287.283 Throughput 57.429 MB/sec 4 clients 4 procs max_latency=287.290 ms 3.8.0-rc3 + patches: WriteX 105556 0.029 4.273 ReadX 335004 0.005 4.112 Flush 14982 30.540 298.634 Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms As you can see, the maximum write latency drops considerably with this patch enabled. The other filesystems (ext3/ext4/xfs/btrfs) behave similarly, but see the cover letter for those results. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22bdi: allow block devices to say that they require stable page writesDarrick J. Wong
This patchset ("stable page writes, part 2") makes some key modifications to the original 'stable page writes' patchset. First, it provides creators (devices and filesystems) of a backing_dev_info a flag that declares whether or not it is necessary to ensure that page contents cannot change during writeout. It is no longer assumed that this is true of all devices (which was never true anyway). Second, the flag is used to relaxed the wait_on_page_writeback calls so that wait only occurs if the device needs it. Third, it fixes up the remaining disk-backed filesystems to use this improved conditional-wait logic to provide stable page writes on those filesystems. It is hoped that (for people not using checksumming devices, anyway) this patchset will give back unnecessary performance decreases since the original stable page write patchset went into 3.0. Sorry about not fixing it sooner. Complaints were registered by several people about the long write latencies introduced by the original stable page write patchset. Generally speaking, the kernel ought to allocate as little extra memory as possible to facilitate writeout, but for people who simply cannot wait, a second page stability strategy is (re)introduced: snapshotting page contents. The waiting behavior is still the default strategy; to enable page snapshotting, a superblock flag (MS_SNAP_STABLE) must be set. This flag is used to bandaid^Henable stable page writeback on ext3[1], and is not used anywhere else. Given that there are already a few storage devices and network FSes that have rolled their own page stability wait/page snapshot code, it would be nice to move towards consolidating all of these. It seems possible that iscsi and raid5 may wish to use the new stable page write support to enable zero-copy writeout. Thank you to Jan Kara for helping fix a couple more filesystems. Per Andrew Morton's request, here are the result of using dbench to measure latencies on ext2: 3.8.0-rc3: Operation Count AvgLat MaxLat ---------------------------------------- WriteX 109347 0.028 59.817 ReadX 347180 0.004 3.391 Flush 15514 29.828 287.283 Throughput 57.429 MB/sec 4 clients 4 procs max_latency=287.290 ms 3.8.0-rc3 + patches: WriteX 105556 0.029 4.273 ReadX 335004 0.005 4.112 Flush 14982 30.540 298.634 Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms As you can see, for ext2 the maximum write latency decreases from ~60ms on a laptop hard disk to ~4ms. I'm not sure why the flush latencies increase, though I suspect that being able to dirty pages faster gives the flusher more work to do. On ext4, the average write latency decreases as well as all the maximum latencies: 3.8.0-rc3: WriteX 85624 0.152 33.078 ReadX 272090 0.010 61.210 Flush 12129 36.219 168.260 Throughput 44.8618 MB/sec 4 clients 4 procs max_latency=168.276 ms 3.8.0-rc3 + patches: WriteX 86082 0.141 30.928 ReadX 273358 0.010 36.124 Flush 12214 34.800 165.689 Throughput 44.9941 MB/sec 4 clients 4 procs max_latency=165.722 ms XFS seems to exhibit similar latency improvements as ext2: 3.8.0-rc3: WriteX 125739 0.028 104.343 ReadX 399070 0.005 4.115 Flush 17851 25.004 131.390 Throughput 66.0024 MB/sec 4 clients 4 procs max_latency=131.406 ms 3.8.0-rc3 + patches: WriteX 123529 0.028 6.299 ReadX 392434 0.005 4.287 Flush 17549 25.120 188.687 Throughput 64.9113 MB/sec 4 clients 4 procs max_latency=188.704 ms ...and btrfs, just to round things out, also shows some latency decreases: 3.8.0-rc3: WriteX 67122 0.083 82.355 ReadX 212719 0.005 2.828 Flush 9547 47.561 147.418 Throughput 35.3391 MB/sec 4 clients 4 procs max_latency=147.433 ms 3.8.0-rc3 + patches: WriteX 64898 0.101 71.631 ReadX 206673 0.005 7.123 Flush 9190 47.963 219.034 Throughput 34.0795 MB/sec 4 clients 4 procs max_latency=219.044 ms Before this patchset, all filesystems would block, regardless of whether or not it was necessary. ext3 would wait, but still generate occasional checksum errors. The network filesystems were left to do their own thing, so they'd wait too. After this patchset, all the disk filesystems except ext3 and btrfs will wait only if the hardware requires it. ext3 (if necessary) snapshots pages instead of blocking, and btrfs provides its own bdi so the mm will never wait. Network filesystems haven't been touched, so either they provide their own wait code, or they don't block at all. The blocking behavior is back to what it was before 3.0 if you don't have a disk requiring stable page writes. This patchset has been tested on 3.8.0-rc3 on x64 with ext3, ext4, and xfs. I've spot-checked 3.8.0-rc4 and seem to be getting the same results as -rc3. [1] The alternative fixes to ext3 include fixing the locking order and page bit handling like we did for ext4 (but then why not just use ext4?), or setting PG_writeback so early that ext3 becomes extremely slow. I tried that, but the number of write()s I could initiate dropped by nearly an order of magnitude. That was a bit much even for the author of the stable page series! :) This patch: Creates a per-backing-device flag that tracks whether or not pages must be held immutable during writeout. Eventually it will be used to waive wait_for_page_writeback() if nothing requires stable pages. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22lockdep: make lockdep_assert_held() not have a return valueJohannes Berg
I recently made the mistake of writing: foo = lockdep_dereference_protected(..., lockdep_assert_held(...)); which is clearly bogus. If lockdep is disabled in the config this would cause a compile failure, if it is enabled then it compiles and causes a puzzling warning about dereferencing without the correct protection. Wrap the macro in "do { ... } while (0)" to also fail compile for this when lockdep is enabled. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22video: s3c-fb: fix typo in definition of VIDCON1_VSTATUS_FRONTPORCH valueTomasz Figa
The correct value for VIDCON1_VSTATUS_FRONTPORCH is 3, not 0. Signed-off-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22video: s3c-fb: add the bit definitions for CSC EQ709 and EQ601Jingoo Han
Add the bit definitions for CSC EQ709 and EQ601. These definitons are used to control the CSC parameter such as equation 709 and equation 601. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Tomasz Figa <t.figa@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22video: s3c-fb: remove unnecessary bracketsJingoo Han
Remove unnecessary brackets and the duplicated VIDTCON2 definition. Also, header comment is modified, because EXYNOS series is supported and <mach/regs-fb.h> is not available. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Tomasz Figa <t.figa@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22video: s3c-fb: remove duplicated S3C_FB_MAX_WINJingoo Han
S3C_FB_MAX_WIN is already defined in 'plat-samsung/include/plat/fb.h'. So, this definition in 'include/video/samsung_fimd.h' should be removed to avoid the duplication. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Tomasz Figa <t.figa@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22drivers/video/exynos/exynos_mipi_dsi.c: use devm_* APIsSachin Kamat
devm_* APIs are device managed and make exit and cleanup code simpler. While at it also remove some unused labels and fix an error path. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Acked-by: Donghwa Lee <dh09.lee@samsung.com> Cc: Inki Dae <inki.dae@samsung.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22video: mmp display subsystemZhou Zhu
Add mmp display subsystem to support Marvell MMP display controllers. This subsystem contains 4 parts: --fb folder --core.c --hw folder --panel folder 1. fb folder contains implementation of fb. fb get path and overlay from common interface and operates on these structures. 2. core.c provides common interface for a hardware abstraction. Major parts of this interface are: a) Path: path is a output device connected to a panel or HDMI TV. Main operations of the path is set/get timing/output color. fb operates output device through path structure. b) Ovly: Ovly is a buffer shown on the path. Ovly describes frame buffer and its source/destination size, offset, input color, buffer address, z-order, and so on. Each fb device maps to one overlay. 3. hw folder contains implementation of hardware operations defined by core.c. It registers paths for fb use. 4. panel folder contains implementation of panels. It's connected to path. Panel drivers would also regiester panels and linked to path when probe. Signed-off-by: Zhou Zhu <zzhu3@marvell.com> Signed-off-by: Lisa Du <cldu@marvell.com> Cc: Guoqing Li <ligq@marvell.com> Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22bug.h, compiler.h: introduce compiletime_assert & BUILD_BUG_ON_MSGDaniel Santos
Introduce compiletime_assert to compiler.h, which moves the details of how to break a build and emit an error message for a specific compiler to the headers where these details should be. Following in the tradition of the POSIX assert macro, compiletime_assert creates a build-time error when the supplied condition is *false*. Next, we add BUILD_BUG_ON_MSG to bug.h which simply wraps compiletime_assert, inverting the logic, so that it fails when the condition is *true*, consistent with the language "build bug on." This macro allows you to specify the error message you want emitted when the supplied condition is true. Finally, we remove all other code from bug.h that mucks with these details (BUILD_BUG & BUILD_BUG_ON), and have them all call BUILD_BUG_ON_MSG. This not only reduces source code bloat, but also prevents the possibility of code being changed for one macro and not for the other (which was previously the case for BUILD_BUG and BUILD_BUG_ON). Since __compiletime_error_fallback is now only used in compiler.h, I'm considering it a private macro and removing the double negation that's now extraneous. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22compiler.h, bug.h: prevent double error messages with BUILD_BUG{,_ON}Daniel Santos
Prior to the introduction of __attribute__((error("msg"))) in gcc 4.3, creating compile-time errors required a little trickery. BUILD_BUG{,_ON} uses this attribute when available to generate compile-time errors, but also uses the negative-sized array trick for older compilers, resulting in two error messages in some cases. The reason it's "some" cases is that as of gcc 4.4, the negative-sized array will not create an error in some situations, like inline functions. This patch replaces the negative-sized array code with the new __compiletime_error_fallback() macro which expands to the same thing unless the the error attribute is available, in which case it expands to do{}while(0), resulting in exactly one compile-time error on all versions of gcc. Note that we are not changing the negative-sized array code for the unoptimized version of BUILD_BUG_ON, since it has the potential to catch problems that would be disabled in later versions of gcc were __compiletime_error_fallback used. The reason is that that an unoptimized build can't always remove calls to an error-attributed function call (like we are using) that should effectively become dead code if it were optimized. However, using a negative-sized array with a similar value will not result in an false-positive (error). The only caveat being that it will also fail to catch valid conditions, which we should be expecting in an unoptimized build anyway. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22bug.h: make BUILD_BUG_ON generate compile-time errorDaniel Santos
Negative sized arrays wont create a compile-time error in some cases starting with gcc 4.4 (e.g., inlined functions), but gcc 4.3 introduced the error function attribute that will. This patch modifies BUILD_BUG_ON to behave like BUILD_BUG already does, using the error function attribute so that you don't have to build the entire kernel to discover that you have a problem, and then enjoy trying to track it down from a link-time error. Also, we are only including asm/bug.h and then expecting that linux/compiler.h will eventually be included to define __linktime_error (used in BUILD_BUG_ON). This patch includes it directly for clarity and to avoid the possibility of changes in <arch>/*/include/asm/bug.h being changed or not including linux/compiler.h for some reason. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22bug.h: prevent double evaulation of `condition' in BUILD_BUG_ONDaniel Santos
When calling BUILD_BUG_ON in an optimized build using gcc 4.3 and later, the condition will be evaulated twice, possibily with side-effects. This patch eliminates that error. [akpm@linux-foundation.org: tweak code layout] Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22bug.h: fix BUILD_BUG_ON macro in __CHECKER__Daniel Santos
When __CHECKER__ is defined, we disable all of the BUILD_BUG.* macros. However, both BUILD_BUG_ON_NOT_POWER_OF_2 and BUILD_BUG_ON was evaluating to nothing in this case, and we want (0) since this is a function-like macro that will be followed by a semicolon. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22compiler{,-gcc4}.h, bug.h: Remove duplicate macrosDaniel Santos
__linktime_error() does the same thing as __compiletime_error() and is only used in bug.h. Since the macro defines a function attribute that will cause a failure at compile-time (not link-time), it makes more sense to keep __compiletime_error(), which is also neatly mated with __compiletime_warning(). Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22compiler-gcc{3,4}.h: Use GCC_VERSION macroDaniel Santos
Using GCC_VERSION reduces complexity, is easier to read and is GCC's recommended mechanism for doing version checks. (Just don't ask me why they didn't define it in the first place.) This also makes it easy to merge compiler-gcc{,3,4}.h should somebody want to. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22compiler-gcc.h: Add gcc-recommended GCC_VERSION macroDaniel Santos
Throughout compiler*.h, many version checks are made. These can be simplified by using the macro that gcc's documentation recommends. However, my primary reason for adding this is that I need bug-check macros that are enabled at certain gcc versions and it's cleaner to use this macro than the tradition method: #if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ => 2) If you add patch level, it gets this ugly: #if __GNUC__ > 4 || (__GNUC__ == 4 && (__GNUC_MINOR__ > 2 || \ __GNUC_MINOR__ == 2 __GNUC_PATCHLEVEL__ >= 1)) As opposed to: #if GCC_VERSION >= 40201 While having separate headers for gcc 3 & 4 eliminates some of this verbosity, they can still be cleaned up by this. See also: http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: Borislav Petkov <bp@alien8.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-22compiler-gcc4.h: Reorder macros based upon gcc verDaniel Santos
This helps to keep the file from getting confusing, removes one duplicate version check and should encourage future editors to put new macros where they belong. Signed-off-by: Daniel Santos <daniel.santos@pobox.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Borislav Petkov <bp@alien8.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Joe Perches <joe@perches.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-21Merge tag 'virt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM virtualization changes: "This contains parts of the ARM KVM support that have dependencies on other patches merged through the arm-soc tree. In combination with patches coming through Russell's tree, this will finally add full support for the kernel based virtual machine on ARM, which has been awaited for some time now. Further, we now have a separate platform for virtual machines and qemu booting that is used by both Xen and KVM, separating these from the Versatile Express reference implementation. Obviously, this new platform is multiplatform capable so it can be combined with existing machines in the same kernel." * tag 'virt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits) ARM: arch_timer: include linux/errno.h arm: arch_timer: add missing inline in stub function ARM: KVM: arch_timers: Wire the init code and config option ARM: KVM: arch_timers: Add timer world switch ARM: KVM: arch_timers: Add guest timer core support ARM: KVM: Add VGIC configuration option ARM: KVM: VGIC initialisation code ARM: KVM: VGIC control interface world switch ARM: KVM: VGIC interrupt injection ARM: KVM: vgic: retire queued, disabled interrupts ARM: KVM: VGIC virtual CPU interface management ARM: KVM: VGIC distributor handling ARM: KVM: VGIC accept vcpu and dist base addresses from user space ARM: KVM: Initial VGIC infrastructure code ARM: KVM: Keep track of currently running vcpus KVM: ARM: Introduce KVM_ARM_SET_DEVICE_ADDR ioctl ARM: gic: add __ASSEMBLY__ guard to C definitions ARM: gic: define GICH offsets for VGIC support ARM: gic: add missing distributor defintions ARM: mach-virt: fixup machine descriptor after removal of sys_timer ...
2013-02-21Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Pull ARM SoC-specific updates from Arnd Bergmann: "This is a larger set of new functionality for the existing SoC families, including: - vt8500 gains support for new CPU cores, notably the Cortex-A9 based wm8850 - prima2 gains support for the "marco" SoC family, its SMP based cousin - tegra gains support for the new Tegra4 (Tegra114) family - socfpga now supports a newer version of the hardware including SMP - i.mx31 and bcm2835 are now using DT probing for their clocks - lots of updates for sh-mobile - OMAP updates for clocks, power management and USB - i.mx6q and tegra now support cpuidle - kirkwood now supports PCIe hot plugging - tegra clock support is updated - tegra USB PHY probing gets implemented diffently" * tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (148 commits) ARM: prima2: remove duplicate v7_invalidate_l1 ARM: shmobile: r8a7779: Correct TMU clock support again ARM: prima2: fix __init section for cpu hotplug ARM: OMAP: Consolidate OMAP USB-HS platform data (part 3/3) ARM: OMAP: Consolidate OMAP USB-HS platform data (part 1/3) arm: socfpga: Add SMP support for actual socfpga harware arm: Add v7_invalidate_l1 to cache-v7.S arm: socfpga: Add entries to enable make dtbs socfpga arm: socfpga: Add new device tree source for actual socfpga HW ARM: tegra: sort Kconfig selects for Tegra114 ARM: tegra: enable ARCH_REQUIRE_GPIOLIB for Tegra114 ARM: tegra: Fix build error w/ ARCH_TEGRA_114_SOC w/o ARCH_TEGRA_3x_SOC ARM: tegra: Fix build error for gic update ARM: tegra: remove empty tegra_smp_init_cpus() ARM: shmobile: Register ARM architected timer ARM: MARCO: fix the build issue due to gic-vic-to-irqchip move ARM: shmobile: r8a7779: Correct TMU clock support ARM: mxs_defconfig: Select CONFIG_DEVTMPFS_MOUNT ARM: mxs: decrease mxs_clockevent_device.min_delta_ns to 2 clock cycles ARM: mxs: use apbx bus clock to drive the timers on timrotv2 ...