summaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)Author
2014-12-11init: allow CONFIG_INIT_FALLBACK=n to disable defaults if init= failsAndy Lutomirski
If a user puts init=/whatever on the command line and /whatever can't be run, then the kernel will try a few default options before giving up. If init=/whatever came from a bootloader prompt, then this is unexpected but probably harmless. On the other hand, if it comes from a script (e.g. a tool like virtme or perhaps a future kselftest script), then the fallbacks are likely to exist, but they'll do the wrong thing. For example, they might unexpectedly invoke systemd. This adds a config option CONFIG_INIT_FALLBACK. If unset, then a failure to run the specified init= process be fatal. The tentative plan is to remove CONFIG_INIT_FALLBACK for 3.20. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Rob Landley <rob@landley.net> Cc: Chuck Ebbert <cebbert.lkml@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: Frank Rowand <frowand.list@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11mm: move page->mem_cgroup bad page handling into generic codeJohannes Weiner
Now that the external page_cgroup data structure and its lookup is gone, let the generic bad_page() check for page->mem_cgroup sanity. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11mm: embed the memcg pointer directly into struct pageJohannes Weiner
Memory cgroups used to have 5 per-page pointers. To allow users to disable that amount of overhead during runtime, those pointers were allocated in a separate array, with a translation layer between them and struct page. There is now only one page pointer remaining: the memcg pointer, that indicates which cgroup the page is associated with when charged. The complexity of runtime allocation and the runtime translation overhead is no longer justified to save that *potential* 0.19% of memory. With CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding after the page->private member and doesn't even increase struct page, and then this patch actually saves space. Remaining users that care can still compile their kernels without CONFIG_MEMCG. text data bss dec hex filename 8828345 1725264 983040 11536649 b00909 vmlinux.old 8827425 1725264 966656 11519345 afc571 vmlinux.new [mhocko@suse.cz: update Documentation/cgroups/memory.txt] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11mm/numa balancing: rearrange Kconfig entryAneesh Kumar K.V
Add the default enable config option after the NUMA_BALANCING option so that it appears related in the nconfig interface. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11kernel: res_counter: remove the unused APIJohannes Weiner
All memory accounting and limiting has been switched over to the lockless page counters. Bye, res_counter! [akpm@linux-foundation.org: update Documentation/cgroups/memory.txt] [mhocko@suse.cz: ditch the last remainings of res_counter] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11mm: hugetlb_cgroup: convert to lockless page countersJohannes Weiner
Abandon the spinlock-protected byte counters in favor of the unlocked page counters in the hugetlb controller as well. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-11mm: memcontrol: lockless page countersJohannes Weiner
Memory is internally accounted in bytes, using spinlock-protected 64-bit counters, even though the smallest accounting delta is a page. The counter interface is also convoluted and does too many things. Introduce a new lockless word-sized page counter API, then change all memory accounting over to it. The translation from and to bytes then only happens when interfacing with userspace. The removed locking overhead is noticable when scaling beyond the per-cpu charge caches - on a 4-socket machine with 144-threads, the following test shows the performance differences of 288 memcgs concurrently running a page fault benchmark: vanilla: 18631648.500498 task-clock (msec) # 140.643 CPUs utilized ( +- 0.33% ) 1,380,638 context-switches # 0.074 K/sec ( +- 0.75% ) 24,390 cpu-migrations # 0.001 K/sec ( +- 8.44% ) 1,843,305,768 page-faults # 0.099 M/sec ( +- 0.00% ) 50,134,994,088,218 cycles # 2.691 GHz ( +- 0.33% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 8,049,712,224,651 instructions # 0.16 insns per cycle ( +- 0.04% ) 1,586,970,584,979 branches # 85.176 M/sec ( +- 0.05% ) 1,724,989,949 branch-misses # 0.11% of all branches ( +- 0.48% ) 132.474343877 seconds time elapsed ( +- 0.21% ) lockless: 12195979.037525 task-clock (msec) # 133.480 CPUs utilized ( +- 0.18% ) 832,850 context-switches # 0.068 K/sec ( +- 0.54% ) 15,624 cpu-migrations # 0.001 K/sec ( +- 10.17% ) 1,843,304,774 page-faults # 0.151 M/sec ( +- 0.00% ) 32,811,216,801,141 cycles # 2.690 GHz ( +- 0.18% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 9,999,265,091,727 instructions # 0.30 insns per cycle ( +- 0.10% ) 2,076,759,325,203 branches # 170.282 M/sec ( +- 0.12% ) 1,656,917,214 branch-misses # 0.08% of all branches ( +- 0.55% ) 91.369330729 seconds time elapsed ( +- 0.45% ) On top of improved scalability, this also gets rid of the icky long long types in the very heart of memcg, which is great for 32 bit and also makes the code a lot more readable. Notable differences between the old and new API: - res_counter_charge() and res_counter_charge_nofail() become page_counter_try_charge() and page_counter_charge() resp. to match the more common kernel naming scheme of try_do()/do() - res_counter_uncharge_until() is only ever used to cancel a local counter and never to uncharge bigger segments of a hierarchy, so it's replaced by the simpler page_counter_cancel() - res_counter_set_limit() is replaced by page_counter_limit(), which expects its callers to serialize against themselves - res_counter_memparse_write_strategy() is replaced by page_counter_limit(), which rounds down to the nearest page size - rather than up. This is more reasonable for explicitely requested hard upper limits. - to keep charging light-weight, page_counter_try_charge() charges speculatively, only to roll back if the result exceeds the limit. Because of this, a failing bigger charge can temporarily lock out smaller charges that would otherwise succeed. The error is bounded to the difference between the smallest and the biggest possible charge size, so for memcg, this means that a failing THP charge can send base page charges into reclaim upto 2MB (4MB) before the limit would have been reached. This should be acceptable. [akpm@linux-foundation.org: add includes for WARN_ON_ONCE and memparse] [akpm@linux-foundation.org: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-20Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Streamline RCU's use of per-CPU variables, shifting from "cpu" arguments to functions to "this_"-style per-CPU variable accessors. - Signal-handling RCU updates. - Real-time updates. - Torture-test updates. - Miscellaneous fixes. - Documentation updates. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-11param: fix crash on bad kernel argumentsDaniel Thompson
Currently if the user passes an invalid value on the kernel command line then the kernel will crash during argument parsing. On most systems this is very hard to debug because the console hasn't been initialized yet. This is a regression due to commit 51e158c12aca ("param: hand arguments after -- straight to init") which, in response to the systemd debug controversy, made it possible to explicitly pass arguments to init. To achieve this parse_args() was extended from simply returning an error code to returning a pointer. Regretably the new init args logic does not perform a proper validity check on the pointer resulting in a crash. This patch fixes the validity check. Should the check fail then no arguments will be passed to init. This is reasonable and matches how the kernel treats its own arguments (i.e. no error recovery). Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: stable@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-10-29rcu: Remove redundant TREE_PREEMPT_RCU config optionPranith Kumar
PREEMPT_RCU and TREE_PREEMPT_RCU serve the same function after TINY_PREEMPT_RCU has been removed. This patch removes TREE_PREEMPT_RCU and uses PREEMPT_RCU config option in its place. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-29rcu: Unify boost and kthread prioritiesClark Williams
Rename CONFIG_RCU_BOOST_PRIO to CONFIG_RCU_KTHREAD_PRIO and use this value for both the per-CPU kthreads (rcuc/N) and the rcu boosting threads (rcub/n). Also, create the module_parameter rcutree.kthread_prio to be used on the kernel command line at boot to set a new value (rcutree.kthread_prio=N). Signed-off-by: Clark Williams <clark.williams@gmail.com> [ paulmck: Ported to rcu/dev, applied Paul Bolle and Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-28init/Kconfig: move RCU_NOCB_CPU dependencies to choiceStefan Hengelein
Every choice item of the "Build-forced no-CBs CPUs" choice had a dependency to RCU_NOCB_CPU. It's more comprehensible if the choice itself has the dependency instead of every choice item. The choice itself doesn't need to be visible if there are no items selectable (i.e. on arch/frv) or RCU_NOCB_CPU is not defined. Signed-off-by: Stefan Hengelein <stefan.hengelein@fau.de> Signed-off-by: Andreas Ruprecht <rupran@einserver.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-10-27bpf: split eBPF out of NETAlexei Starovoitov
introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-14kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h>Geert Uytterhoeven
Consolidate the various external const and non-const declarations of __start___param[] and __stop___param in <linux/moduleparam.h>. This requires making a few struct kernel_param pointers in kernel/params.c const. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14init/initramfs.c: resolve shadow warningsMark Rustad
Resolve shadow warnings that are produced in W=2 builds by renaming a global with a too-generic name and renaming a formal parameter. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14printk: don't bother using LOG_CPU_MAX_BUF_SHIFT on !SMPGeert Uytterhoeven
When configuring a uniprocessor kernel, don't bother the user with an irrelevant LOG_CPU_MAX_BUF_SHIFT question, and don't build the unused code. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-13Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave Hansen) - Various sched/idle refinements for better idle handling (Nicolas Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot) - sched/numa updates and optimizations (Rik van Riel) - sysbench speedup (Vincent Guittot) - capacity calculation cleanups/refactoring (Vincent Guittot) - Various cleanups to thread group iteration (Oleg Nesterov) - Double-rq-lock removal optimization and various refactorings (Kirill Tkhai) - various sched/deadline fixes ... and lots of other changes" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits) sched/dl: Use dl_bw_of() under rcu_read_lock_sched() sched/fair: Delete resched_cpu() from idle_balance() sched, time: Fix build error with 64 bit cputime_t on 32 bit systems sched: Improve sysbench performance by fixing spurious active migration sched/x86: Fix up typo in topology detection x86, sched: Add new topology for multi-NUMA-node CPUs sched/rt: Use resched_curr() in task_tick_rt() sched: Use rq->rd in sched_setaffinity() under RCU read lock sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask' sched: Use dl_bw_of() under RCU read lock sched/fair: Remove duplicate code from can_migrate_task() sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW sched: print_rq(): Don't use tasklist_lock sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock() sched: Fix the task-group check in tg_has_rt_tasks() sched/fair: Leverage the idle state info when choosing the "idlest" cpu sched: Let the scheduler see CPU idle states sched/deadline: Fix inter- exclusive cpusets migrations sched/deadline: Clear dl_entity params when setscheduling to different class sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault() ...
2014-10-13Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - changes related to No-CBs CPUs and NO_HZ_FULL - RCU-tasks implementation - torture-test updates - miscellaneous fixes - locktorture updates - RCU documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits) workqueue: Use cond_resched_rcu_qs macro workqueue: Add quiescent state between work items locktorture: Cleanup header usage locktorture: Cannot hold read and write lock locktorture: Fix __acquire annotation for spinlock irq locktorture: Support rwlocks rcu: Eliminate deadlock between CPU hotplug and expedited grace periods locktorture: Document boot/module parameters rcutorture: Rename rcutorture_runnable parameter locktorture: Add test scenario for rwsem_lock locktorture: Add test scenario for mutex_lock locktorture: Make torture scripting account for new _runnable name locktorture: Introduce torture context locktorture: Support rwsems locktorture: Add infrastructure for torturing read locks torture: Address race in module cleanup locktorture: Make statistics generic locktorture: Teach about lock debugging locktorture: Support mutexes locktorture: Add documentation ...
2014-10-10mm: remove misleading ARCH_USES_NUMA_PROT_NONEMel Gorman
ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting fault scanner. This was found to be conceptually confusing with a lot of implicit assumptions and it was asked that an alternative be found. Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE bits and shrunk the maximum possible swap size but it did not go far enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA but the relics still exist. This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary duplication in powerpc vs the generic implementation by defining the types the core NUMA helpers expected to exist from x86 with their ppc64 equivalent. This necessitated that a PTE bit mask be created that identified the bits that distinguish present from NUMA pte entries but it is expected this will only differ between arches based on _PAGE_PROTNONE. The naming for the generic helpers was taken from x86 originally but ppc64 has types that are equivalent for the purposes of the helper so they are mapped instead of duplicating code. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Main changes: - Fix the deadlock reported by Dave Jones et al - Clean up and fix nohz_full interaction with arch abilities - nohz init code consolidation/cleanup" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: nohz full depends on irq work self IPI support nohz: Consolidate nohz full init code arm64: Tell irq work about self IPI support arm: Tell irq work about self IPI support x86: Tell irq work about self IPI support irq_work: Force raised irq work to run on irq work interrupt irq_work: Introduce arch_irq_work_has_interrupt() nohz: Move nohz full init call to tick init
2014-10-08Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull "trivial tree" updates from Jiri Kosina: "Usual pile from trivial tree everyone is so eagerly waiting for" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Remove MN10300_PROC_MN2WS0038 mei: fix comments treewide: Fix typos in Kconfig kprobes: update jprobe_example.c for do_fork() change Documentation: change "&" to "and" in Documentation/applying-patches.txt Documentation: remove obsolete pcmcia-cs from Changes Documentation: update links in Changes Documentation: Docbook: Fix generated DocBook/kernel-api.xml score: Remove GENERIC_HAS_IOMAP gpio: fix 'CONFIG_GPIO_IRQCHIP' comments tty: doc: Fix grammar in serial/tty dma-debug: modify check_for_stack output treewide: fix errors in printk genirq: fix reference in devm_request_threaded_irq comment treewide: fix synchronize_rcu() in comments checkstack.pl: port to AArch64 doc: queue-sysfs: minor fixes init/do_mounts: better syntax description MIPS: fix comment spelling powerpc/simpleboot: fix comment ...
2014-10-08Merge tag 'modules-next-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "Nothing major: support for compressing modules, and auto-tainting params. PS. My virtio-next tree is empty: DaveM took the patches I had. There might be a virtio-rng starvation fix, but so far it's a bit voodoo so I will get to that in the next two days or it will wait" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: moduleparam: Resolve missing-field-initializer warning kbuild: handle module compression while running 'make modules_install'. modinst: wrap long lines in order to enhance cmd_modules_install modsign: lookup lines ending in .ko in .mod files modpost: simplify file name generation of *.mod.c files modpost: reduce visibility of symbols and constify r/o arrays param: check for tainting before calling set op. drm/i915: taint the kernel if unsafe module parameters are set module: add module_param_unsafe and module_param_named_unsafe module: make it possible to have unsafe, tainting module params module: rename KERNEL_PARAM_FL_NOARG to avoid confusion
2014-10-07Merge tag 'tiny/for-3.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux Pull "tinification" patches from Josh Triplett. Work on making smaller kernels. * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux: bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_ mm: Support compiling out madvise and fadvise x86: Support compiling out human-friendly processor feature names x86: Drop support for /proc files when !CONFIG_PROC_FS x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE x86, boot: Use the usual -y -n mechanism for objects in vmlinux x86: Add "make tinyconfig" to configure the tiniest possible kernel x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-04Merge tag 'tiny/kconfig-for-3.17' of ↵Linus Torvalds
https://git.kernel.org/pub/scm/linux/kernel/git/josh/linux Pull kconfig fixes for tiny setups from Josh Triplett: "Two Kconfig bugfixes for 3.17 related to tinification. These fixes make the Kconfig "General Setup" menu much more usable" * tag 'tiny/kconfig-for-3.17' of https://git.kernel.org/pub/scm/linux/kernel/git/josh/linux: init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menu init/Kconfig: Hide printk log config if CONFIG_PRINTK=n
2014-10-03init/Kconfig: Fix HAVE_FUTEX_CMPXCHG to not break up the EXPERT menuJosh Triplett
commit 03b8c7b623c80af264c4c8d6111e5c6289933666 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") added the HAVE_FUTEX_CMPXCHG symbol right below FUTEX. This placed it right in the middle of the options for the EXPERT menu. However, HAVE_FUTEX_CMPXCHG does not depend on EXPERT or FUTEX, so Kconfig stops placing items in the EXPERT menu, and displays the remaining several EXPERT items (starting with EPOLL) directly in the General Setup menu. Since both users of HAVE_FUTEX_CMPXCHG only select it "if FUTEX", make HAVE_FUTEX_CMPXCHG itself depend on FUTEX. With this change, the subsequent items display as part of the EXPERT menu again; the EMBEDDED menu now appears as the next top-level item in the General Setup menu, which makes General Setup much shorter and more usable. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: stable <stable@vger.kernel.org>
2014-10-03init/Kconfig: Hide printk log config if CONFIG_PRINTK=nJosh Triplett
The buffers sized by CONFIG_LOG_BUF_SHIFT and CONFIG_LOG_CPU_MAX_BUF_SHIFT do not exist if CONFIG_PRINTK=n, so don't ask about their size at all. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> Cc: stable <stable@vger.kernel.org>
2014-09-23Merge branch 'rcu/next' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the v3.18 RCU changes from Paul E. McKenney: " * Update RCU documentation. These were posted to LKML at https://lkml.org/lkml/2014/8/28/378. * Miscellaneous fixes. These were posted to LKML at https://lkml.org/lkml/2014/8/28/386. An additional fix that eliminates a documented (but now inconvenient) deadlock between RCU hotplug and expedited grace periods was posted at https://lkml.org/lkml/2014/8/28/573. * Changes related to No-CBs CPUs and NO_HZ_FULL. These were posted to LKML at https://lkml.org/lkml/2014/8/28/412. * Torture-test updates. These were posted to LKML at https://lkml.org/lkml/2014/8/28/546 and at https://lkml.org/lkml/2014/9/11/1114. * RCU-tasks implementation. These were posted to LKML at https://lkml.org/lkml/2014/8/28/540. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-19init/main.c: Give init_task a canaryAaron Tomlin
Tasks get their end of stack set to STACK_END_MAGIC with the aim to catch stack overruns. Currently this feature does not apply to init_task. This patch removes this restriction. Note that a similar patch was posted by Prarit Bhargava some time ago but was never merged: http://marc.info/?l=linux-kernel&m=127144305403241&w=2 Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dzickus@redhat.com Cc: bmr@redhat.com Cc: jcastillo@redhat.com Cc: jgh@redhat.com Cc: minchan@kernel.org Cc: tglx@linutronix.de Cc: hannes@cmpxchg.org Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Fabian Frederick <fabf@skynet.be> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-17Revert "init: make rootdelay=N consistent with rootwait behaviour"Paul Gortmaker
This reverts commit 4dfe694f616e00e6fd83e5bbcd7a3c4d7113493d. In that, we did: Here we move the rootdelay code to be right beside the rootwait code, so that their behaviour is consistent. ...which is fine, but in hindsight, perhaps moving the rootwait to be beside the rootdelay would have been better. We also indicated: It should be noted that in doing so, the actions based on the saved_root_name[0] and initrd_load() were previously put on hold by rootdelay=N and now currently will not be delayed. However, I think consistent behaviour is more important than matching historical behaviour of delaying the above two operations. But Pavel reported an instance where an ARM target with root on MMC was failing to mount root, and Russell diagnosed it to the fact that the call to set ROOT_DEV within the saved_root_name[0] processing block mentioned above was no longer being delayed. Rather than moving both wait clauses to the original position of rootdelay and risking unearthing other possible corner case breakage at this point in time, we simply revert now and we can revisit trying the alternate/earlier location in another development cycle. Cc: Pavel Machek <pavel@denx.de> Cc: Russell King <linux@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-16Merge branch 'rcu-tasks.2014.09.10a' into HEADPaul E. McKenney
rcu-tasks.2014.09.10a: Add RCU-tasks flavor of RCU.
2014-09-16rcu: Fix attempt to avoid unsolicited offloading of callbacksPaul E. McKenney
Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically requested) failed to adjust the callback lists of the CPUs that are known to be no-CBs CPUs only because they are also nohz_full= CPUs. This failure can result in callbacks that are posted during early boot getting stranded on nxtlist for CPUs whose no-CBs property becomes apparent late, and there can also be spurious warnings about offline CPUs posting callbacks. This commit fixes these problems by adding an early-boot rcu_init_nohz() that properly initializes the no-CBs CPUs. Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels booted without the nohz_full= boot parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2014-09-13nohz: Move nohz full init call to tick initFrederic Weisbecker
This way we unbloat a bit main.c and more importantly we initialize nohz full after init_IRQ(). This dependency will be needed in further patches because nohz full needs irq work to raise its own IRQ. Information about the support for this ability on ARM64 is obtained on init_IRQ() which initialize the pointer to __smp_call_function. Since tick_init() is called right after init_IRQ(), this is a good place to call tick_nohz_init() and prepare for that dependency. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-07rcu: Add call_rcu_tasks()Paul E. McKenney
This commit adds a new RCU-tasks flavor of RCU, which provides call_rcu_tasks(). This RCU flavor's quiescent states are voluntary context switch (not preemption!) and userspace execution (not the idle loop -- use some sort of schedule_on_each_cpu() if you need to handle the idle tasks. Note that unlike other RCU flavors, these quiescent states occur in tasks, not necessarily CPUs. Includes fixes from Steven Rostedt. This RCU flavor is assumed to have very infrequent latency-tolerant updaters. This assumption permits significant simplifications, including a single global callback list protected by a single global lock, along with a single task-private linked list containing all tasks that have not yet passed through a quiescent state. If experience shows this assumption to be incorrect, the required additional complexity will be added. Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-08-28init/do_mounts: better syntax descriptionPavel Machek
Specify hex device number unambiquously. Signed-off-by: Pavel Machek <pavel@denx.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-27kbuild: handle module compression while running 'make modules_install'.Bertrand Jacquin
Since module-init-tools (gzip) and kmod (gzip and xz) support compressed modules, it could be useful to include a support for compressing modules right after having them installed. Doing this in kbuild instead of per distro can permit to make this kind of usage more generic. This patch add a Kconfig entry to "Enable loadable module support" menu and let you choose to compress using gzip (default) or xz. Both gzip and xz does not used any extra -[1-9] option since Andi Kleen and Rusty Russell prove no gain is made using them. gzip is called with -n argument to avoid storing original filename inside compressed file, that way we can save some more bytes. On a v3.16 kernel, 'make allmodconfig' generated 4680 modules for a total of 378MB (no strip, no sign, no compress), the following table shows observed disk space gain based on the allmodconfig .config : | time | +-------------+-----------------+ | manual .ko | make | size | percent | compression | modules_install | | gain +-------------+-----------------+------+-------- - | | 18.61s | 378M | GZIP | 3m16s | 3m37s | 102M | 73.41% XZ | 5m22s | 5m39s | 77M | 79.83% The gain for restricted environnement seems to be interesting while uncompress can be time consuming but happens only while loading a module, that is generally done only once. This is fully compatible with signed modules while the signed module is compressed. module-init-tools or kmod handles decompression and provide to other layer the uncompressed but signed payload. Reviewed-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Bertrand Jacquin <beber@meleeweb.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-08-26mm: Fix CROSS_MEMORY_ATTACH help text grammarGeert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2014-08-18mm: Support compiling out madvise and fadviseJosh Triplett
Many embedded systems will not need these syscalls, and omitting them saves space. Add a new EXPERT config option CONFIG_ADVISE_SYSCALLS (default y) to support compiling them out. bloat-o-meter: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-2250 (-2250) function old new delta sys_fadvise64 57 - -57 sys_fadvise64_64 691 - -691 sys_madvise 1502 - -1502 Signed-off-by: Josh Triplett <josh@joshtriplett.org>
2014-08-14mm: fix CROSS_MEMORY_ATTACH help text grammarGeert Uytterhoeven
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08Merge branch 'akpm' (second patchbomb from Andrew Morton)Linus Torvalds
Merge more incoming from Andrew Morton: "Two new syscalls: memfd_create in "shm: add memfd_create() syscall" kexec_file_load in "kexec: implementation of new syscall kexec_file_load" And: - Most (all?) of the rest of MM - Lots of the usual misc bits - fs/autofs4 - drivers/rtc - fs/nilfs - procfs - fork.c, exec.c - more in lib/ - rapidio - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs, fs/cramfs, fs/romfs, fs/qnx6. - initrd/initramfs work - "file sealing" and the memfd_create() syscall, in tmpfs - add pci_zalloc_consistent, use it in lots of places - MAINTAINERS maintenance - kexec feature work" * emailed patches from Andrew Morton <akpm@linux-foundation.org: (193 commits) MAINTAINERS: update nomadik patterns MAINTAINERS: update usb/gadget patterns MAINTAINERS: update DMA BUFFER SHARING patterns kexec: verify the signature of signed PE bzImage kexec: support kexec/kdump on EFI systems kexec: support for kexec on panic using new system call kexec-bzImage64: support for loading bzImage using 64bit entry kexec: load and relocate purgatory at kernel load time purgatory: core purgatory functionality purgatory/sha256: provide implementation of sha256 in purgaotory context kexec: implementation of new syscall kexec_file_load kexec: new syscall kexec_file_load() declaration kexec: make kexec_segment user buffer pointer a union resource: provide new functions to walk through resources kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc() kexec: move segment verification code in a separate function kexec: rename unusebale_pages to unusable_pages kernel: build bin2c based on config option CONFIG_BUILD_BIN2C bin2c: move bin2c in scripts/basic shm: wait for pins to be released when sealing ...
2014-08-08kernel: build bin2c based on config option CONFIG_BUILD_BIN2CVivek Goyal
currently bin2c builds only if CONFIG_IKCONFIG=y. But bin2c will now be used by kexec too. So make it compilation dependent on CONFIG_BUILD_BIN2C and this config option can be selected by CONFIG_KEXEC and CONFIG_IKCONFIG. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08init/main.c: code clean-upFabian Frederick
Fixing some checkpatch warnings(remove global initialization, move __initdata, coalesce formats ...) Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08initramfs: add write error checksDavid Engraf
On a system with low memory extracting the initramfs may fail. If this happens the user gets "Failed to execute /init" instead of an initramfs error. Check return value of sys_write and call error() when the write was incomplete or failed. Signed-off-by: David Engraf <david.engraf@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08initramfs: support initramfs that is bigger than 2GiBYinghai Lu
Now with 64bit bzImage and kexec tools, we support ramdisk that size is bigger than 2g, as we could put it above 4G. Found compressed initramfs image could not be decompressed properly. It turns out that image length is int during decompress detection, and it will become < 0 when length is more than 2G. Furthermore, during decompressing len as int is used for inbuf count, that has problem too. Change len to long, that should be ok as on 32 bit platform long is 32bits. Tested with following compressed initramfs image as root with kexec. gzip, bzip2, xz, lzma, lzop, lz4. run time for populate_rootfs(): size name Nehalem-EX Westmere-EX Ivybridge-EX 9034400256 root_img : 26s 24s 30s 3561095057 root_img.lz4 : 28s 27s 27s 3459554629 root_img.lzo : 29s 29s 28s 3219399480 root_img.gz : 64s 62s 49s 2251594592 root_img.xz : 262s 260s 183s 2226366598 root_img.lzma: 386s 376s 277s 2901482513 root_img.bz2 : 635s 599s Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Rashika Kheria <rashika.kheria@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Kyungsik Lee <kyungsik.lee@lge.com> Cc: P J P <ppandit@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: "Daniel M. Weeks" <dan@danweeks.net> Cc: Alexandre Courbot <acourbot@nvidia.com> Cc: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08initramfs: support initrd that is bigger than 2GiBYinghai Lu
When initrd (compressed or not) is used, kernel report data corrupted with /dev/ram0. The root cause: During initramfs checking, if it is initrd, it will be transferred to /initrd.image with sys_write. sys_write only support 2G-4K write, so if the initrd ram is more than that, /initrd.image will not complete at all. Add local xwrite to loop calling sys_write to workaround the problem. Also need to use xwrite in write_buffer() to handle: image is uncompressed cpio and there is one big file (>2G) in it. unpack_to_rootfs ===> write_buffer ===> actions[]/do_copy At the same time, we don't need to worry about sys_read/sys_write in do_mounts_rd.c::crd_load. As decompressor will have fill/flush and local buffer that is smaller than 2G. Test with uncompressed initrd, and compressed ones with gz, bz2, lzma,xz, lzop. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: "Daniel M. Weeks" <dan@danweeks.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08init: make rootdelay=N consistent with rootwait behaviourPaul Gortmaker
Currently rootdelay=N and rootwait behave differently (aside from the obvious unbounded wait duration) because they are at different places in the init sequence. The difference manifests itself for md devices because the call to md_run_setup() lives between rootdelay and rootwait, so if you try to use rootdelay=20 to try and allow a slow RAID0 array to assemble, you get this: [ 4.526011] sd 6:0:0:0: [sdc] Attached SCSI removable disk [ 22.972079] md: Waiting for all devices to be available before autodetect i.e. you've achieved nothing other than delaying the probing 20s, when what you wanted was a 20s delay _after_ the probing for md devices was initiated. Here we move the rootdelay code to be right beside the rootwait code, so that their behaviour is consistent. It should be noted that in doing so, the actions based on the saved_root_name[0] and initrd_load() were previously put on hold by rootdelay=N and now currently will not be delayed. However, I think consistent behaviour is more important than matching historical behaviour of delaying the above two operations. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08Merge tag 'soc-for-3.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC platform changes from Olof Johansson: "This is the bulk of new SoC enablement and other platform changes for 3.17: - Samsung S5PV210 has been converted to DT and multiplatform - Clock drivers and bindings for some of the lower-end i.MX 1/2 platforms - Kirkwood, one of the popular Marvell platforms, is folded into the mvebu platform code, removing mach-kirkwood - Hwmod data for TI AM43xx and DRA7 platforms - More additions of Renesas shmobile platform support - Removal of plat-samsung contents that can be removed with S5PV210 being multiplatform/DT-enabled and the other two old platforms being removed New platforms (most with only basic support right now): - Hisilicon X5HD2 settop box chipset is introduced - Mediatek MT6589 (mobile chipset) is introduced - Broadcom BCM7xxx settop box chipset is introduced + as usual a lot other pieces all over the platform code" * tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits) ARM: hisi: remove smp from machine descriptor power: reset: move hisilicon reboot code ARM: dts: Add hix5hd2-dkb dts file. ARM: debug: Rename Hi3716 to HIX5HD2 ARM: hisi: enable hix5hd2 SoC ARM: hisi: add ARCH_HISI MAINTAINERS: add entry for Broadcom ARM STB architecture ARM: brcmstb: select GISB arbiter and interrupt drivers ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs ARM: configs: enable SMP in bcm_defconfig ARM: add SMP support for Broadcom mobile SoCs Documentation: arm: misc updates to Marvell EBU SoC status Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC ARM: mvebu: fix build without platforms selected ARM: mvebu: add cpuidle support for Armada 38x ARM: mvebu: add cpuidle support for Armada 370 cpuidle: mvebu: add Armada 38x support cpuidle: mvebu: add Armada 370 support cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7 ARM: mvebu: export the SCU address ...
2014-08-07printk: allow increasing the ring buffer depending on the number of CPUsLuis R. Rodriguez
The default size of the ring buffer is too small for machines with a large amount of CPUs under heavy load. What ends up happening when debugging is the ring buffer overlaps and chews up old messages making debugging impossible unless the size is passed as a kernel parameter. An idle system upon boot up will on average spew out only about one or two extra lines but where this really matters is on heavy load and that will vary widely depending on the system and environment. There are mechanisms to help increase the kernel ring buffer for tracing through debugfs, and those interfaces even allow growing the kernel ring buffer per CPU. We also have a static value which can be passed upon boot. Relying on debugfs however is not ideal for production, and relying on the value passed upon bootup is can only used *after* an issue has creeped up. Instead of being reactive this adds a proactive measure which lets you scale the amount of contributions you'd expect to the kernel ring buffer under load by each CPU in the worst case scenario. We use num_possible_cpus() to avoid complexities which could be introduced by dynamically changing the ring buffer size at run time, num_possible_cpus() lets us use the upper limit on possible number of CPUs therefore avoiding having to deal with hotplugging CPUs on and off. This introduces the kernel configuration option LOG_CPU_MAX_BUF_SHIFT which is used to specify the maximum amount of contributions to the kernel ring buffer in the worst case before the kernel ring buffer flips over, the size is specified as a power of 2. The total amount of contributions made by each CPU must be greater than half of the default kernel ring buffer size (1 << LOG_BUF_SHIFT bytes) in order to trigger an increase upon bootup. The kernel ring buffer is increased to the next power of two that would fit the required minimum kernel ring buffer size plus the additional CPU contribution. For example if LOG_BUF_SHIFT is 18 (256 KB) you'd require at least 128 KB contributions by other CPUs in order to trigger an increase of the kernel ring buffer. With a LOG_CPU_BUF_SHIFT of 12 (4 KB) you'd require at least anything over > 64 possible CPUs to trigger an increase. If you had 128 possible CPUs the amount of minimum required kernel ring buffer bumps to: ((1 << 18) + ((128 - 1) * (1 << 12))) / 1024 = 764 KB Since we require the ring buffer to be a power of two the new required size would be 1024 KB. This CPU contributions are ignored when the "log_buf_len" kernel parameter is used as it forces the exact size of the ring buffer to an expected power of two value. [pmladek@suse.cz: fix build] Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.cz> Tested-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Petr Mladek <pmladek@suse.cz> Reviewed-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Stephen Warren <swarren@wwwdotorg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Petr Mladek <pmladek@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Arun KS <arunks.linux@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-09Merge branches 'doc.2014.07.08a', 'fixes.2014.07.09a', ↵Paul E. McKenney
'maintainers.2014.07.08b', 'nocbs.2014.07.07a' and 'torture.2014.07.07a' into HEAD doc.2014.07.08a: Documentation updates. fixes.2014.07.09a: Miscellaneous fixes. maintainers.2014.07.08b: Maintainership updates. nocbs.2014.07.07a: Callback-offloading fixes. torture.2014.07.07a: Torture-test updates.
2014-07-09rcu: Handle obsolete references to TINY_PREEMPT_RCUPaul E. McKenney
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2014-07-07rcu: Don't offload callbacks unless specifically requestedPaul E. McKenney
Enabling NO_HZ_FULL currently has the side effect of enabling callback offloading on all CPUs. This results in lots of additional rcuo kthreads, and can also increase context switching and wakeups, even in cases where callback offloading is neither needed nor particularly desirable. This commit therefore enables callback offloading on a given CPU only if specifically requested at build time or boot time, or if that CPU has been specifically designated (again, either at build time or boot time) as a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>