summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2011-02-16x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsedTejun Heo
It's no longer necessary to keep both cpu_nodes_parsed and mem_nodes_parsed. In preparation for merge, rename cpu_nodes_parsed to numa_nodes_parsed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Kill numa_nodes[]Tejun Heo
numa_nodes[] doesn't carry any information which isn't present in numa_meminfo. Each entry is simply min/max range of all the memblks for the node. This is not only redundant but also inaccurate when memblks for different nodes interleave - for example, find_node_by_addr() can return the wrong nodeid. Kill numa_nodes[] and always use numa_meminfo instead. * nodes_cover_memory() is renamed to numa_meminfo_cover_memory() and now operations on numa_meminfo and returns bool. * setup_node_bootmem() needs min/max range. Compute the range on the fly. setup_node_bootmem() invocation is restructured to use outer loop instead of hardcoding the double invocations. * find_node_by_addr() now operates on numa_meminfo. * setup_physnodes() builds physnodes[] from memblks. This will go away when emulation code is updated to use struct numa_meminfo. This patch also makes the following misc changes. * Clearing of nodes_add[] clearing is converted to memset(). * numa_add_memblk() in amd_numa_init() is moved down a bit for consistency. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Add common find_node_by_addr()Tejun Heo
srat_64.c and amdtopology_64.c had their own versions of find_node_by_addr() which were basically the same. Add common one in numa_64.c and remove the duplicates. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: consolidate and improve memblk sanity checksTejun Heo
memblk sanity check was scattered around and incomplete. Consolidate and improve. * Confliction detection and cutoff_node() logic are moved to numa_cleanup_meminfo(). * numa_cleanup_meminfo() clears the unused memblks before returning. * Check and warn about invalid input parameters in numa_add_memblk(). * Check the maximum number of memblk isn't exceeded in numa_add_memblk(). * numa_cleanup_meminfo() is now called before numa_emulation() so that the emulation code also uses the cleaned up version. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: make numa_cleanup_meminfo() prettierTejun Heo
* Factor out numa_remove_memblk_from(). * Hole detection doesn't need separate start/end. Calculate start/end once. * Relocate comment. * Define iterators at the top and remove unnecessary prefix increments. This prepares for further improvements to the function. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Separate out numa_cleanup_meminfo()Tejun Heo
Separate out numa_cleanup_meminfo() from numa_register_memblks(). node_possible_map initialization is moved to the top of the split numa_register_memblks(). This patch doesn't cause behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Introduce struct numa_meminfoTejun Heo
Arrays for memblks and nodeids and their length lived in separate variables making things unnecessarily cumbersome. Introduce struct numa_meminfo which contains all memory configuration info. This patch doesn't cause any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift()Tejun Heo
numa_emulation() called compute_hash_shift() with %NULL @nodeids which meant identity mapping between index and nodeid. Make numa_emulation() build identity array and drop %NULL @nodeids handling from populate_memnodemap() and thus from compute_hash_shift(). This is to prepare for transition to using memblks instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes()Tejun Heo
They are empty now. Kill them. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Unify the rest of memblk registrationTejun Heo
Move the remaining memblk registration logic from acpi_scan_nodes() to numa_register_memblks() and initmem_init(). This applies nodes_cover_memory() sanity check, memory node sorting and node_online() checking, which were only applied to acpi, to all init methods. As all memblk registration is moved to common code, active range clearing is moved to initmem_init() too and removed from bad_srat(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Unify use of memblk in all init methodsTejun Heo
Make both amd and dummy use numa_add_memblk() to describe the detected memory blocks. This allows initmem_init() to call numa_register_memblk() regardless of init method in use. Drop custom memory registration codes from amd and dummy. After this change, memblk merge/cleanup in numa_register_memblks() is applied to all init methods. As this makes compute_hash_shift() and numa_register_memblks() used only inside numa_64.c, make them static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk()Tejun Heo
Factor out memblk handling from srat_64.c into two functions in numa_64.c. This patch doesn't introduce any behavior change. The next patch will make all init methods use these functions. - v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS. Reported by Ingo. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Kill {acpi|amd}_get_nodes()Tejun Heo
With common numa_nodes[], common code in numa_64.c can access it directly. Copy directly and kill {acpi|amd}_get_nodes(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Use common numa_nodes[]Tejun Heo
ACPI and amd are using separate nodes[] array. Add numa_nodes[] and use them in all NUMA init methods. cutoff_node() cleanup is moved from srat_64.c to numa_64.c and applied in initmem_init() regardless of init methods. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Move apicid to numa mapping initialization from ↵Tejun Heo
amd_scan_nodes() to amd_numa_init() This brings amd initialization behavior closer to that of acpi. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Remove local variable found from amd_numa_init()Tejun Heo
Use weight count on mem_nodes_parsed instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Use common {cpu|mem}_nodes_parsedTejun Heo
ACPI and amd are using separate nodes_parsed masks. Add {cpu|mem}_nodes_parsed and use them in all NUMA init methods. Initialization of the masks and building node_possible_map are now handled commonly by initmem_init(). dummy_numa_init() is updated to set node 0 on both masks. While at it, move the info messages from scan to init. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Restructure initmem_init()Tejun Heo
Reorganize initmem_init() such that, * Different NUMA init methods are iterated in a consistent way. * Each iteration re-initializes all the parameters and different method can be tried after a failure. * Dummy init is handled the same as other methods. Apart from how retry after failure, this patch doesn't change the behavior. The call sequences are kept equivalent across the conversion. After the change, bad_srat() doesn't need to clear apic to node mapping or worry about numa_off. Simplified accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86, NUMA: Move *_numa_init() invocations into initmem_init()Tejun Heo
There's no reason for these to live in setup_arch(). Move them inside initmem_init(). - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Ankita. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by ↵Tejun Heo
return value Because of the way ACPI tables are parsed, the generic acpi_numa_init() couldn't return failure when error was detected by arch hooks. Instead, the failure state was recorded and later arch dependent init hook - acpi_scan_nodes() - would fail. Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be indicated as return value immediately. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return ↵Tejun Heo
values The functions used during NUMA initialization - *_numa_init() and *_scan_nodes() - have different arguments and return values. Unify them such that they all take no argument and return 0 on success and -errno on failure. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86, NUMA: Drop @start/last_pfn from initmem_init()Tejun Heo
initmem_init() extensively accesses and modifies global data structures and the parameters aren't even followed depending on which path is being used. Drop @start/last_pfn and let it deal with @max_pfn directly. This is in preparation for further NUMA init cleanups. - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Yinghai. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init()Tejun Heo
Hotplug node handling in acpi_numa_memory_affinity_init() was unnecessarily complicated with storing the original nodes[] entry and restoring it afterwards. Simplify it by not modifying the nodes[] entry for hotplug nodes from the beginning. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16x86-64, NUMA: Make dummy node initialization path similar to non-dummy onesTejun Heo
Dummy node initialization in initmem_init() didn't initialize apicid to node mapping and set cpu to node mapping directly by caling numa_set_node(), which is different from non-dummy init paths. Update it such that they behave similarly. Initialize apicid to node mapping and call numa_init_array(). The actual cpu to node mapping is handled by init_cpu_to_node() later. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>
2011-02-16Merge branch 'x86/amd-nb' into x86/mmIngo Molnar
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts with ongoing NUMA work. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-16Merge branch 'x86/numa' into x86/mmIngo Molnar
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-16Merge branch 'x86/bootmem' into x86/mmIngo Molnar
Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree and prevent conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-15x86, amd: Initialize variable properlyBorislav Petkov
Commit d518573de63f ("x86, amd: Normalize compute unit IDs on multi-node processors") introduced compute unit normalization but causes a compiler warning: arch/x86/kernel/cpu/amd.c: In function 'amd_detect_cmp': arch/x86/kernel/cpu/amd.c:268: warning: 'cores_per_cu' may be used uninitialized in this function arch/x86/kernel/cpu/amd.c:268: note: 'cores_per_cu' was declared here The compiler is right - initialize it with a proper value. Also, fix up a comment while at it. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20110214171451.GB10076@kryptos.osrc.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14x86, numa: Add error handling for bad cpu-to-node mappingsDavid Rientjes
CONFIG_DEBUG_PER_CPU_MAPS may return NUMA_NO_NODE when an early_cpu_to_node() mapping hasn't been initialized. In such a case, it emits a warning and continues without an issue but callers may try to use the return value to index into an array. We can catch those errors and fail silently since a warning has already been emitted. No current user of numa_add_cpu() requires this error checking to avoid a crash, but it's better to be proactive in case a future user happens to have a bug and a user tries to diagnose it with CONFIG_DEBUG_PER_CPU_MAPS. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <alpine.DEB.2.00.1102071407250.7812@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14Merge commit 'v2.6.38-rc4' into x86/numaIngo Molnar
Merge reason: Merge latest fixes before applying new patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14x86: Emit "mem=nopentium ignored" warning when not supportedKamal Mostafa
Emit warning when "mem=nopentium" is specified on any arch other than x86_32 (the only that arch supports it). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <1296783486-23033-2-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org>
2011-02-14x86: Fix panic when handling "mem={invalid}" paramKamal Mostafa
Avoid removing all of memory and panicing when "mem={invalid}" is specified, e.g. mem=blahblah, mem=0, or mem=nopentium (on platforms other than x86_32). Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/553464 Cc: Yinghai Lu <yinghai@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: <stable@kernel.org> # .3x: as far back as it applies LKML-Reference: <1296783486-23033-1-git-send-email-kamal@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14x86: Avoid tlbstate lock if not enough cpusShaohua Li
This one isn't related to previous patch. If online cpus are below NUM_INVALIDATE_TLB_VECTORS, we don't need the lock. The comments in the code declares we don't need the check, but a hot lock still needs an atomic operation and expensive, so add the check here. Uses nr_cpu_ids here as suggested by Eric Dumazet. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1295232730.1949.710.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32Shaohua Li
Make the maxium TLB invalidate vectors depend on NR_CPUS linearly, with a maximum of 32 vectors. We currently only have 8 vectors for TLB invalidation and that is clearly inadequate. If we have a lot of CPUs, the CPUs need share the 8 vectors and tlbstate_lock is used to protect them. flush_tlb_page() is heavily used in page reclaim, which will cause a lot of lock contention for tlbstate_lock. Andi Kleen suggested increasing the vectors number to 32, which should be good for current typical systems to reduce the tlbstate_lock contention. My test system has 4 sockets and 64G memory, and 64 CPUs. My workload creates 64 processes. Each process mmap reads a big empty sparse file. The total size of the files are 2*total_mem, so this will cause a lot of page reclaim. Below is the result I get from perf call-graph profiling: without the patch: ------------------ 24.25% usemem [kernel] [k] _raw_spin_lock | --- _raw_spin_lock | |--42.15%-- native_flush_tlb_others with the patch: ------------------ 14.96% usemem [kernel] [k] _raw_spin_lock | --- _raw_spin_lock |--13.89%-- native_flush_tlb_others So this heavily reduces the tlbstate_lock contention. Suggested-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1295232727.1949.709.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14x86: Allocate 32 tlb_invalidate_interrupt handler stubsShaohua Li
Add up to 32 invalidate_interrupt handlers. How many handlers are added depends on NUM_INVALIDATE_TLB_VECTORS. So if NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code size. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <1295232725.1949.708.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14x86: Cleanup vector usageShaohua Li
Cleanup the vector usage and make them continuous if possible. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <1295232722.1949.707.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-14Merge branch 'linus' into x86/bootmemIngo Molnar
Conflicts: arch/x86/mm/numa_64.c Merge reason: fix the conflict, update to latest -rc and pick up this dependent fix from Yinghai: e6d2e2b2b1e1: memblock: don't adjust size in memblock_find_base() Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-12Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index
2011-02-12Merge branch 's5p-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung * 's5p-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.h ARM: S5PV310: Cleanup System MMU ARM: S5PV310: Add support System MMU on SMDKV310
2011-02-12Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
* 'next' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Fix msr instruction detection microblaze: Fix pte_update function microblaze: Fix asm compilation warning microblaze: Fix IRQ flag handling for MSR=0
2011-02-11ARM: SAMSUNG: Ensure struct sys_device is declared in plat/pm.hMark Brown
Previously we were relying on it being pulled in by other headers for the prototype of s3c24xx_irq_suspend() and s3c24xx_irq_resume(). Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-02-11ARM: S5PV310: Cleanup System MMUKukjin Kim
This patch cleans following up. - Moved definition of System MMU IPNUM into mach/sysmmu.h - Removed useless SYSMMU_DEBUG configuration - Removed useless header file plat/sysmmu.h Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-02-11ARM: S5PV310: Add support System MMU on SMDKV310Thomas Abraham
The 's5pv310_device_sysmmu' is used on SMDKV310. But since it is not compiled now, there is a build error. To fix this compilation error, S5PV310_DEV_SYSMMU needs to be selected for SMDKV310 board. This patch enables System MMU support on SMDKV310. Signed-off-by: Thomas Abraham <thomas.ab@samsung.com> [kgene.kim@samsung.com: Adding description] Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
2011-02-10Merge branch 'tty-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: serial: bfin_5xx: split uart RX lock from uart port lock to avoid deadlock 68360serial: Plumb in rs_360_get_icount() n_gsm: copy mtu over when configuring via ioctl interface virtio: console: Move file back to drivers/char/
2011-02-10x86: Adjust section placement in AMD northbridge related codeJan Beulich
amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs() can be moved into .cpuinit.text. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <4D525DDD0200007800030F07@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-02-09KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_indexJoerg Roedel
The gs_index loading code uses the swapgs instruction to switch to the user gs_base temporarily. This is unsave in an lightweight exit-path in KVM on AMD because the KERNEL_GS_BASE MSR is switches lazily. An NMI happening in the critical path of load_gs_index may use the wrong GS_BASE value then leading to unpredictable behavior, e.g. a triple-fault. This patch fixes the issue by making sure that load_gs_index is called only with a valid KERNEL_GS_BASE value loaded in KVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
2011-02-07Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds
* master.kernel.org:/home/rmk/linux-2.6-arm: ALSA: AACI: allow writes to MAINCR to take effect ARM: Update mach-types ARM: 6652/1: ep93xx: correct the end address of the AC97 memory resource ARM: mxs/imx28: remove now unused clock lookup "fec.0" ARM: mxs: fix clock base address missing ARM: mxs: acknowledge gpio irq ARM: mach-imx/mach-mx25_3ds: Fix section type ARM: imx: Add VPR200 and MX51_3DS entries to uncompress.h ARM i.MX23: use correct register for setting the rate ARM i.MX23/28: remove secondary field from struct clk. It's unused ARM i.MX28: use correct register for setting the rate ARM i.MX28: fix bit operation
2011-02-07Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc: Fix hcall tracepoint recursion powerpc/numa: Fix bug in unmap_cpu_from_node powerpc/numa: Disable VPHN on dedicated processor partitions powerpc/numa: Add length when creating OF properties via VPHN powerpc/numa: Check for all VPHN changes powerpc/numa: Only use active VPHN count fields powerpc/pseries: Remove unnecessary variable initializations in numa.c powerpc/pseries: Fix brace placement in numa.c powerpc/pseries: Fix typo in VPHN comments powerpc: Fix some 6xx/7xxx CPU setup functions powerpc: Pass the right cpu_spec to ->setup_cpu() on 64-bit powerpc/book3e: Protect complex macro args in mmu-book3e.h powerpc: Fix pfn_valid() when memory starts at a non-zero address
2011-02-07Merge branch 'omap-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6 * 'omap-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: arm: omap4: panda: remove usb_nop_xceiv_register(v1) OMAP1: Fix non-working LCD on OMAP310 OMAP3: Devkit8000: Change lcd power pin omap1: remove duplicated #include arm: mach-omap2: mux: free allocated memory on error exit arm: mach-omap2: board-rm680: fix rm680_vemmc regulator constraints OMAP: PM: SmartReflex: Fix possible null pointer read access OMAP: PM: SmartReflex: Fix possible memory leak arm: mach-omap2: voltage: debugfs: fix memory leak OMAP3: PM: fix save secure RAM to restore MPU power state OMAP: PM: SmartReflex: Add missing IS_ERR test
2011-02-07Merge branch 'fixes'Russell King