summaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)Author
2008-07-26make init/do_mounts.c:root_device_name staticAdrian Bunk
This patch makes the needlessly global root_device_name static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26Full conversion to early_initcall() interface, remove old interfaceEduard - Gabriel Munteanu
A previous patch added the early_initcall(), to allow a cleaner hooking of pre-SMP initcalls. Now we remove the older interface, converting all existing users to the new one. [akpm@linux-foundation.org: cleanups] [akpm@linux-foundation.org: build fix] [kosaki.motohiro@jp.fujitsu.com: warning fix] [kosaki.motohiro@jp.fujitsu.com: warning fix] Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26Better interface for hooking early initcallsEduard - Gabriel Munteanu
Added early initcall (pre-SMP) support, using an identical interface to that of regular initcalls. Functions called from do_pre_smp_initcalls() could be converted to use this cleaner interface. This is required by CPU hotplug, because early users have to register notifiers before going SMP. One such CPU hotplug user is the relay interface with buffer-only channels, which needs to register such a notifier, to be usable in early code. This in turn is used by kmemtrace. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25kconfig: fix typos: "Suport" -> "Support"Heikki Orsila
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-25init: fix URL of "The GNU Accounting Utilities"S.Çağlar Onur
Following patch corrects URL of "The GNU Accounting Utilities" in init/Kconfig. Noticed by: Bart Van Assche" <bart.vanassche@gmail.com> Signed-off-by: S.Çağlar Onur <caglar@pardus.org.tr> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2008-07-25proper pid{hash,map}_init() prototypesAdrian Bunk
This patch adds proper prototypes for pid{hash,map}_init() in include/linux/pid_namespace.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25init/version.c: define version_string only if CONFIG_KALLSYMS is not definedDaniel Guilak
int Version_* is only used with ksymoops, which is only needed (according to README and Documentation/Changes) if CONFIG_KALLSYMS is NOT defined. Therefore this patch defines version_string only if CONFIG_KALLSYMS is not defined. Signed-off-by: Daniel Guilak <daniel@danielguilak.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25init/version.c: silence sparse warning by declaring the version stringDaniel Guilak
Signed-off-by: Daniel Guilak <daniel@danielguilak.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25inflate: refactor inflate malloc codeThomas Petazzoni
Inflate requires some dynamic memory allocation very early in the boot process and this is provided with a set of four functions: malloc/free/gzip_mark/gzip_release. The old inflate code used a mark/release strategy rather than implement free. This new version instead keeps a count on the number of outstanding allocations and when it hits zero, it resets the malloc arena. This allows removing all the mark and release implementations and unifying all the malloc/free implementations. The architecture-dependent code must define two addresses: - free_mem_ptr, the address of the beginning of the area in which allocations should be made - free_mem_end_ptr, the address of the end of the area in which allocations should be made. If set to 0, then no check is made on the number of allocations, it just grows as much as needed The architecture-dependent code can also provide an arch_decomp_wdog() function call. This function will be called several times during the decompression process, and allow to notify the watchdog that the system is still running. If an architecture provides such a call, then it must define ARCH_HAS_DECOMP_WDOG so that the generic inflate code calls arch_decomp_wdog(). Work initially done by Matt Mackall, updated to a recent version of the kernel and improved by me. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Mikael Starvik <mikael.starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25init/: delete hard-coded setting and testing of BUILD_CRAMDISKRobert P. J. Day
There seems to be little point in explicitly setting, then testing the macro BUILD_CRAMDISK within the context of a single source file. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25init/do_mounts.c should #include <linux/initrd.h>Adrian Bunk
Every file should include the headers containing the externs for its global code (in this case for rd_doload). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24Merge branch 'sched/for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: hrtick_enabled() should use cpu_active() sched, x86: clean up hrtick implementation sched: fix build error, provide partition_sched_domains() unconditionally sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed cpu hotplug: Make cpu_active_map synchronization dependency clear cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) sched: rework of "prioritize non-migratable tasks over migratable ones" sched: reduce stack size in isolated_cpu_setup() Revert parts of "ftrace: do not trace scheduler functions" Fixed up conflicts in include/asm-x86/thread_info.h (due to the TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr() introduction).
2008-07-22make CONFIG_KMOD invisibleJohannes Berg
... as preparation for removing it completely, make it an invisible bool defaulting to yes. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-22Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefsDenys Vlasenko
module.c and module.h conatains code for finding exported symbols which are declared with EXPORT_UNUSED_SYMBOL, and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway (because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then). This patch adds required #ifdefs. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-07-21initrd: Fix virtual/physical mix-up in overwrite testGeert Uytterhoeven
On recent kernels, I get the following error when using an initrd: | initrd overwritten (0x00b78000 < 0x07668000) - disabling it. My Amiga 4000 has 12 MiB of RAM at physical address 0x07400000 (virtual 0x00000000). The initrd is located at the end of RAM: 0x00b78000 - 0x00c00000 (virtual). The overwrite test compares the (virtual) initrd location to the (physical) first available memory location, which fails. This patch converts initrd_start to a page frame number, so it can safely be compared with min_low_pfn. Before the introduction of discontiguous memory support on m68k (12d810c1b8c2b913d48e629e2b5c01d105029839), min_low_pfn was just left untouched by the m68k-specific code (zero, I guess), and everything worked fine. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-18Merge branch 'linus' into core/generic-dma-coherentIngo Molnar
Conflicts: kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment ↵Max Krasnyansky
(take 2) This is based on Linus' idea of creating cpu_active_map that prevents scheduler load balancer from migrating tasks to the cpu that is going down. It allows us to simplify domain management code and avoid unecessary domain rebuilds during cpu hotplug event handling. Please ignore the cpusets part for now. It needs some more work in order to avoid crazy lock nesting. Although I did simplfy and unify domain reinitialization logic. We now simply call partition_sched_domains() in all the cases. This means that we're using exact same code paths as in cpusets case and hence the test below cover cpusets too. Cpuset changes to make rebuild_sched_domains() callable from various contexts are in the separate patch (right next after this one). This not only boots but also easily handles while true; do make clean; make -j 8; done and while true; do on-off-cpu 1; done at the same time. (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing). Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing this on right now in gnome-terminal and things are moving just fine. Also this is running with most of the debug features enabled (lockdep, mutex, etc) no BUG_ONs or lockdep complaints so far. I believe I addressed all of the Dmitry's comments for original Linus' version. I changed both fair and rt balancer to mask out non-active cpus. And replaced cpu_is_offline() with !cpu_active() in the main scheduler code where it made sense (to me). Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: dmitry.adamushko@gmail.com Cc: pj@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-16Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6Linus Torvalds
* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6: UBIFS: include to compilation UBIFS: add new flash file system UBIFS: add brief documentation MAINTAINERS: add UBIFS section do_mounts: allow UBI root device name VFS: export sync_sb_inodes VFS: move inode_lock into sync_sb_inodes
2008-07-15Merge branch 'generic-ipi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) generic-ipi: more merge fallout generic-ipi: merge fix x86, visws: use mach-default/entry_arch.h x86, visws: fix generic-ipi build generic-ipi: fixlet generic-ipi: fix s390 build bug generic-ipi: fix linux-next tree build failure fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix "smp_call_function: get rid of the unused nonatomic/retry argument" on_each_cpu(): kill unused 'retry' parameter smp_call_function: get rid of the unused nonatomic/retry argument sh: convert to generic helpers for IPI function calls parisc: convert to generic helpers for IPI function calls mips: convert to generic helpers for IPI function calls m32r: convert to generic helpers for IPI function calls arm: convert to generic helpers for IPI function calls alpha: convert to generic helpers for IPI function calls ia64: convert to generic helpers for IPI function calls powerpc: convert to generic helpers for IPI function calls ... Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
2008-07-15Merge branch 'generic-ipi' into generic-ipi-for-linusIngo Molnar
Conflicts: arch/powerpc/Kconfig arch/s390/kernel/time.c arch/x86/kernel/apic_32.c arch/x86/kernel/cpu/perfctr-watchdog.c arch/x86/kernel/i8259_64.c arch/x86/kernel/ldt.c arch/x86/kernel/nmi_64.c arch/x86/kernel/smpboot.c arch/x86/xen/smp.c include/asm-x86/hw_irq_32.h include/asm-x86/hw_irq_64.h include/asm-x86/mach-default/irq_vectors.h include/asm-x86/mach-voyager/irq_vectors.h include/asm-x86/smp.h kernel/Makefile Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-15Merge branch 'core/rcu' into core/rcu-for-linusIngo Molnar
2008-07-14do_mounts: allow UBI root device nameAdrian Hunter
Similarly to MTD devices, allow UBI devices. Signed-off-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
2008-06-30generic: per-device coherent dma allocatorDmitry Baryshkov
Currently x86_32, sh and cris-v32 provide per-device coherent dma memory allocator. However their implementation is nearly identical. Refactor out common code to be reused by them. Signed-off-by: Dmitry Baryshkov <dbaryshkov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-26Add generic helpers for arch IPI function callsJens Axboe
This adds kernel/smp.c which contains helpers for IPI function calls. In addition to supporting the existing smp_call_function() in a more efficient manner, it also adds a more scalable variant called smp_call_function_single() for calling a given function on a single CPU only. The core of this is based on the x86-64 patch from Nick Piggin, lots of changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has contributed lots of fixes and suggestions as well. Also thanks to Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage and getting rid of the data allocation fallback deadlock. Acked-by: Ingo Molnar <mingo@elte.hu> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2008-06-24x86: use cpu_khz for loops_per_jiffy calculation, cleanupAlok Kataria
As suggested by Ingo, remove all references to tsc from init/calibrate.c TSC is x86 specific, and using tsc in variable names in a generic file should be avoided. lpj_tsc is now called lpj_fine, since it is related to fine tuning of lpj value. Also tsc_rate_* is called timer_rate_* Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Daniel Hecht <dhecht@vmware.com> Cc: Tim Mann <mann@vmware.com> Cc: Zach Amsden <zach@vmware.com> Cc: Sahil Rihan <srihan@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-23x86: use cpu_khz for loops_per_jiffy calculationAlok Kataria
On the x86 platform we can use the value of tsc_khz computed during tsc calibration to calculate the loops_per_jiffy value. Its very important to keep the error in lpj values to minimum as any error in that may result in kernel panic in check_timer. In virtualization environment, On a highly overloaded host the guest delay calibration may sometimes result in errors beyond the ~50% that timer_irq_works can handle, resulting in the guest panicking. Does some formating changes to lpj_setup code to now have a single printk to print the bogomips value. We do this only for the boot processor because the AP's can have different base frequencies or the BIOS might boot a AP at a different frequency. Signed-off-by: Alok N Kataria <akataria@vmware.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Daniel Hecht <dhecht@vmware.com> Cc: Tim Mann <mann@vmware.com> Cc: Zach Amsden <zach@vmware.com> Cc: Sahil Rihan <srihan@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-06-16Merge branch 'linus' into core/rcuIngo Molnar
2008-05-25Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LISTSam Ravnborg
init/Kconfig contains a list of configs that are searched for if 'make *config' are used with no .config present. Extend this list to look at the config identified by ARCH_DEFCONFIG. With this change we now try the defconfig targets last. This fixes a regression reported by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com>
2008-05-24md: proper extern for mdp_majorAdrian Bunk
This patch adds a proper extern for mdp_major in include/linux/raid/md.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-19rcu: add call_rcu_sched()Paul E. McKenney
Fourth cut of patch to provide the call_rcu_sched(). This is again to synchronize_sched() as call_rcu() is to synchronize_rcu(). Should be fine for experimental and -rt use, but not ready for inclusion. With some luck, I will be able to tell Andrew to come out of hiding on the next round. Passes multi-day rcutorture sessions with concurrent CPU hotplugging. Fixes since the first version include a bug that could result in indefinite blocking (spotted by Gautham Shenoy), better resiliency against CPU-hotplug operations, and other minor fixes. Fixes since the second version include reworking grace-period detection to avoid deadlocks that could happen when running concurrently with CPU hotplug, adding Mathieu's fix to avoid the softlockup messages, as well as Mathieu's fix to allow use earlier in boot. Fixes since the third version include a wrong-CPU bug spotted by Andrew, getting rid of the obsolete synchronize_kernel API that somehow snuck back in, merging spin_unlock() and local_irq_restore() in a few places, commenting the code that checks for quiescent states based on interrupting from user-mode execution or the idle loop, removing some inline attributes, and some code-style changes. Known/suspected shortcomings: o I still do not entirely trust the sleep/wakeup logic. Next step will be to use a private snapshot of the CPU online mask in rcu_sched_grace_period() -- if the CPU wasn't there at the start of the grace period, we don't need to hear from it. And the bit about accounting for changes in online CPUs inside of rcu_sched_grace_period() is ugly anyway. o It might be good for rcu_sched_grace_period() to invoke resched_cpu() when a given CPU wasn't responding quickly, but resched_cpu() is declared static... This patch also fixes a long-standing bug in the earlier preemptable-RCU implementation of synchronize_rcu() that could result in loss of concurrent external changes to a task's CPU affinity mask. I still cannot remember who reported this... Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-05-16initcalls: Fix m68k build and possible buffer overflowCyrill Gorcunov
This patch fixes a build bug on m68k - gcc decides to emit a call to the strlen library function, which we don't implement. More importantly - my previous patch "init: don't lose initcall return values" (commit e662e1cfd434aa234b72fbc781f1d70211cb785b) had introduced potential buffer overflow by wrong calculation of string accumulator size. Use strlcat() instead, fixing both bugs. Many thanks Andreas Schwab and Geert Uytterhoeven for helping to catch and fix the bug. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-16Split up 'do_initcalls()' into two simpler functionsLinus Torvalds
One function to just loop over the entries, one function to actually do the call and the associated debugging code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-16Clean up 'print_fn_descriptor_symbol()' typesLinus Torvalds
Everybody wants to pass it a function pointer, and in fact, that is what you _must_ pass it for it to make sense (since it knows that ia64 and ppc64 use descriptors for function pointers and fetches the actual address from there). So don't make the argument be a 'unsigned long' and force everybody to add a cast. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-14block: do_mounts - accept root=<non-existant partition>Kay Sievers
Some devices, like md, may create partitions only at first access, so allow root= to be set to a valid non-existant partition of an existing disk. This applies only to non-initramfs root mounting. This fixes a regression from 2.6.24 which did allow this to happen and broke some users machines :( Acked-by: Neil Brown <neilb@suse.de> Tested-by: Joao Luis Meloni Assirati <assirati@nonada.if.usp.br> Cc: stable <stable@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-13init: don't lose initcall return valuesCyrill Gorcunov
There is an ability to lose an initcall return value if it happened with irq disabled or imbalanced preemption (and if we debug initcall). Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-09module: don't ignore vermagic string if module doesn't have modversionsRusty Russell
Linus found a logic bug: we ignore the version number in a module's vermagic string if we have CONFIG_MODVERSIONS set, but modversions also lets through a module with no __versions section for modprobe --force (with tainting, but still). We should only ignore the start of the vermagic string if the module actually *has* crcs to check. Rather than (say) having an entertaining hissy fit and creating a config option to work around the buggy code. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-07pcspkr: fix dependanciesStas Sergeev
fix pcspkr dependancies: make the pcspkr platform drivers to depend on a platform device, and not the other way around. Signed-off-by: Stas Sergeev <stsp@aknet.ru> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Dmitry Torokhov <dtor@mail.ru> CC: Vojtech Pavlik <vojtech@suse.cz> CC: Michael Opdenacker <michael-lists@free-electrons.com> [fixed for 2.6.26-rc1 by tiwai] Signed-off-by: Takashi Iwai <tiwai@suse.de>
2008-05-05sched: default to n for GROUP_SCHED and FAIR_GROUP_SCHEDParag Warudkar
GROUP_SCHED is confirmed to cause unacceptable latencies, see: http://lkml.org/lkml/2008/5/2/370. Mark it EXPERIMENTAL and default to no for now. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05sched: add optional support for CONFIG_HAVE_UNSTABLE_SCHED_CLOCKPeter Zijlstra
this replaces the rq->clock stuff (and possibly cpu_clock()). - architectures that have an 'imperfect' hardware clock can set CONFIG_HAVE_UNSTABLE_SCHED_CLOCK - the 'jiffie' window might be superfulous when we update tick_gtod before the __update_sched_clock() call in sched_clock_tick() - cpu_clock() might be implemented as: sched_clock_cpu(smp_processor_id()) if the accuracy proves good enough - how far can TSC drift in a single jiffie when considering the filtering and idle hooks? [ mingo@elte.hu: various fixes and cleanups ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05sched, x86: add HAVE_UNSTABLE_SCHED_CLOCKIngo Molnar
add the HAVE_UNSTABLE_SCHED_CLOCK, for architectures to select. the next change utilizes it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-05Make forced module loading optionalLinus Torvalds
The kernel module loader used to be much too happy to allow loading of modules for the wrong kernel version by default. For example, if you had MODVERSIONS enabled, but tried to load a module with no version info, it would happily load it and taint the kernel - whether it was likely to actually work or not! Generally, such forced module loading should be considered a really really bad idea, so make it conditional on a new config option (MODULE_FORCE_LOAD), and make it default to off. If somebody really wants to force module loads, that's their problem, but we should not encourage it. Especially as it happened to me by mistake (ie regular unversioned Fedora modules getting loaded) causing lots of strange behavior. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-01slub: #ifdef simplificationChristoph Lameter
If we make SLUB_DEBUG depend on SYSFS then we can simplify some #ifdefs and avoid others. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2008-04-30infrastructure to debug (dynamic) objectsThomas Gleixner
We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30Deprecate find_task_by_pid()Pavel Emelyanov
There are some places that are known to operate on tasks' global pids only: * the rest_init() call (called on boot) * the kgdb's getthread * the create_kthread() (since the kthread is run in init ns) So use the find_task_by_pid_ns(..., &init_pid_ns) there and schedule the find_task_by_pid for removal. [sukadev@us.ibm.com: Fix warning in kernel/pid.c] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30signals: fix /sbin/init protection from unwanted signalsOleg Nesterov
The global init has a lot of long standing problems with the unhandled fatal signals. - The "is_global_init(current)" check in get_signal_to_deliver() protects only the main thread. Sub-thread can dequee the fatal signal and shutdown the whole thread group except the main thread. If it dequeues SIGSTOP /sbin/init will be stopped, this is not right too. Note that we can't use is_global_init(->group_leader), this breaks exec and this can't solve other problems we have. - Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT on delivery. This breaks exec, has other bad implications, and this is just wrong. Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps to solve some other problems addressed by the subsequent patches. Currently we use this flag for the global init only, but it could also be used by kthreads and (perhaps) by the sub-namespace inits. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29idr: create idr_layer_cache at boot timeAkinobu Mita
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache unconditionary at boot time rather than creating it on-demand when idr_init() is called the first time. This change also enables us to eliminate the check every time idr_init() is called. [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()] [akpm@linux-foundation.org: fix alpha build] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29sysctl: allow embedded targets to disable sysctl_check.cHolger Schurig
Disable sysctl_check.c for embedded targets. This saves about about 11 kB in .text and another 11 kB in .data on a PXA255 embedded platform. Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29cgroups: add an owner to the mm_structBalbir Singh
Remove the mem_cgroup member from mm_struct and instead adds an owner. This approach was suggested by Paul Menage. The advantage of this approach is that, once the mm->owner is known, using the subsystem id, the cgroup can be determined. It also allows several control groups that are virtually grouped by mm_struct, to exist independent of the memory controller i.e., without adding mem_cgroup's for each controller, to mm_struct. A new config option CONFIG_MM_OWNER is added and the memory resource controller selects this config option. This patch also adds cgroup callbacks to notify subsystems when mm->owner changes. The mm_cgroup_changed callback is called with the task_lock() of the new task held and is called just prior to changing the mm->owner. I am indebted to Paul Menage for the several reviews of this patchset and helping me make it lighter and simpler. This patch was tested on a powerpc box, it was compiled with both the MM_OWNER config turned on and off. After the thread group leader exits, it's moved to init_css_state by cgroup_exit(), thus all future charges from runnings threads would be redirected to the init_css_set's subsystem. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Hirokazu Takahashi <taka@valinux.co.jp> Cc: David Rientjes <rientjes@google.com>, Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Paul Menage <menage@google.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29cgroups: implement device whitelistSerge E. Hallyn
Implement a cgroup to track and enforce open and mknod restrictions on device files. A device cgroup associates a device access whitelist with each cgroup. A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block). 'all' means it applies to all types and all major and minor numbers. Major and minor are either an integer or * for all. Access is a composition of r (read), w (write), and m (mknod). The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of the parent. Admins can then remove devices from the whitelist or add new entries. A child cgroup can never receive a device access which is denied its parent. However when a device access is removed from a parent it will not also be removed from the child(ren). An entry is added using devices.allow, and removed using devices.deny. For instance echo 'c 1:3 mr' > /cgroups/1/devices.allow allows cgroup 1 to read and mknod the device usually known as /dev/null. Doing echo a > /cgroups/1/devices.deny will remove the default 'a *:* mrw' entry. CAP_SYS_ADMIN is needed to change permissions or move another task to a new cgroup. A cgroup may not be granted more permissions than the cgroup's parent has. Any task can move itself between cgroups. This won't be sufficient, but we can decide the best way to adequately restrict movement later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix may-be-used-uninitialized warning] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Acked-by: James Morris <jmorris@namei.org> Looks-good-to: Pavel Emelyanov <xemul@openvz.org> Cc: Daniel Hokka Zakrisson <daniel@hozac.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29CGroup API files: make CGROUP_DEBUG default to offPaul Menage
The cgroup debug subsystem isn't generally useful for users. It should default to "n". Signed-off-by: Paul Menage <menage@google.com> Cc: "Li Zefan" <lizf@cn.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>