Age | Commit message (Collapse) | Author |
|
Currently, if RCU CPU stall warnings are enabled, they are enabled
immediately upon boot. They can be manually disabled via /sys (and
also re-enabled via /sys), and are automatically disabled upon panic.
However, some users need RCU CPU stalls to be disabled at boot time,
but to be enabled without rebuilding/rebooting. For example, someone
running a real-time application in production might not want the
additional latency of RCU CPU stall detection in normal operation, but
might need to enable it at any point for fault isolation purposes.
This commit therefore provides a new CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
kernel configuration parameter that maintains the current behavior
(enable at boot) by default, but allows a kernel to be configured
with RCU CPU stall detection built into the kernel, but disabled at
boot time.
Requested-by: Clark Williams <williams@redhat.com>
Requested-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Because both TINY_RCU and TREE_PREEMPT_RCU have been in mainline for
several releases, it is time to restrict the use of TREE_RCU to SMP
non-preemptible systems. This reduces testing/validation effort. This
commit is a first step towards driving the selection of RCU implementation
directly off of the SMP and PREEMPT configuration parameters.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Set the permissions of the rcu_cpu_stall_suppress to 644 to enable RCU
CPU stall warnings to be enabled and disabled at runtime via sysfs.
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Reported-by: Kyle Hubert <khubert@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
RCU heads really don't need to be initialized. Their state before call_rcu()
really does not matter.
We need to keep init/destroy_rcu_head_on_stack() though, since we want
debugobjects to be able to keep track of these objects.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: David S. Miller <davem@davemloft.net>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: akpm@linux-foundation.org
CC: mingo@elte.hu
CC: laijs@cn.fujitsu.com
CC: dipankar@in.ibm.com
CC: josh@joshtriplett.org
CC: dvhltc@us.ibm.com
CC: niv@us.ibm.com
CC: tglx@linutronix.de
CC: peterz@infradead.org
CC: rostedt@goodmis.org
CC: Valdis.Kletnieks@vt.edu
CC: dhowells@redhat.com
CC: eric.dumazet@gmail.com
CC: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This adds annotations for RCU operations in core kernel components
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Make it explicit that new RCU read-side critical sections that start
after call_rcu() and synchronize_rcu() start might still be running
after the end of the relevant grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Although the RCU CPU stall warning messages are a very good way to alert
people to a problem, once alerted, it is sometimes helpful to shut them
off in order to avoid obscuring other messages that might be being used
to track down the problem. Although you can rebuild the kernel with
CONFIG_RCU_CPU_STALL_DETECTOR=n, this is sometimes inconvenient. This
commit therefore adds a boot parameter named "rcu_cpu_stall_suppress"
that shuts these messages off without requiring a rebuild (though a
reboot might be needed for those not brave enough to patch their kernel
while it is running).
This message-suppression was already in place for the panic case, so this
commit need only rename the variable and export it via module_param().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
|
Also set the default to 60 seconds, up from the previous hard-coded timeout
of 10 seconds. This allows people who care to set short timeouts, while
avoiding people with unusual configurations (make randconfig!!!) from being
bothered with spurious CPU stall warnings.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
find_task_by_vpid() says "Must be called under rcu_read_lock().". But due to
commit 3120438 "rcu: Disable lockdep checking in RCU list-traversal primitives",
we are currently unable to catch "find_task_by_vpid() with tasklist_lock held
but RCU lock not held" errors due to the RCU-lockdep checks being
suppressed in the RCU variants of the struct list_head traversals.
This commit therefore places an explicit check for being in an RCU
read-side critical section in find_task_by_pid_ns().
===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
kernel/pid.c:386 invoked rcu_dereference_check() without protection!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by rc.sysinit/1102:
#0: (tasklist_lock){.+.+..}, at: [<c1048340>] sys_setpgid+0x40/0x160
stack backtrace:
Pid: 1102, comm: rc.sysinit Not tainted 2.6.35-rc3-dirty #1
Call Trace:
[<c105e714>] lockdep_rcu_dereference+0x94/0xb0
[<c104b4cd>] find_task_by_pid_ns+0x6d/0x70
[<c104b4e8>] find_task_by_vpid+0x18/0x20
[<c1048347>] sys_setpgid+0x47/0x160
[<c1002b50>] sysenter_do_call+0x12/0x36
Commit updated to use a new rcu_lockdep_assert() exported API rather than
the old internal __do_rcu_dereference().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
&percpu_data is compatible with allocated percpu data.
And we use it and remove the "->rda[NR_CPUS]" array, saving significant
storage on systems with large numbers of CPUs. This does add an additional
level of indirection and thus an additional cache line referenced, but
because ->rda is not used on the read side, this is OK.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Add random preemption to help we to torture the preemptable rcu.
srcu_read_delay() also calls rcu_read_delay() for shorter delays.
Added comment to preempt_schedule() call indicating that no quiescent
states happen if preemption is disabled.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Add a section describing PROVE_RCU, DEBUG_OBJECTS_RCU_HEAD, and
the __rcu sparse checking to the RCU checklist.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This avoids warnings from missing __rcu annotations
in the rculist implementation, making it possible to
use the same lists in both RCU and non-RCU cases.
We can add rculist annotations later, together with
lockdep support for rculist, which is missing as well,
but that may involve changing all the users.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
This commit provides definitions for the __rcu annotation defined earlier.
This annotation permits sparse to check for correct use of RCU-protected
pointers. If a pointer that is annotated with __rcu is accessed
directly (as opposed to via rcu_dereference(), rcu_assign_pointer(),
or one of their variants), sparse can be made to complain. To enable
such complaints, use the new default-disabled CONFIG_SPARSE_RCU_POINTER
kernel configuration option. Please note that these sparse complaints are
intended to be a debugging aid, -not- a code-style-enforcement mechanism.
There are special rcu_dereference_protected() and rcu_access_pointer()
accessors for use when RCU read-side protection is not required, for
example, when no other CPU has access to the data structure in question
or while the current CPU hold the update-side lock.
This patch also updates a number of docbook comments that were showing
their age.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Christopher Li <sparse@chrisli.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
|
|
The task_cls_classid() function applies rcu_dereference() to integers,
which does not work with the shiny new sparse-based checking in
rcu_dereference(). This commit therefore moves to the new RCU API
rcu_dereference_index_check().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.
Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.
[ v2.1: fix !HOTPLUG_CPU build ]
Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
kprobes/x86: Fix the return address of multiple kretprobes
perf tools: Fix build error on read only source.
perf, x86: Fix Intel-nhm PMU programming errata workaround
|
|
Commit c7b28e25cb9beb943aead770ff14551b55fa8c79 ("mtd: nand: refactor BB
marker detection") caused a regression in detection of factory-set bad
block markers, especially for certain small-page NAND. This fix removes
some unneeded constraints on using NAND_SMALL_BADBLOCK_POS, making the
detection code more correct.
This regression can be seen, for example, in Hynix HY27US081G1M and
similar.
Signed-off-by: Brian Norris <norris@broadcom.com>
Tested-by: Michael Guntsche <mike@it-loops.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
|
|
Fix the return address of subsequent kretprobes when multiple
kretprobes are set on the same function.
For example:
# cd /sys/kernel/debug/tracing
# echo "r:event1 sys_symlink" > kprobe_events
# echo "r:event2 sys_symlink" >> kprobe_events
# echo 1 > events/kprobes/enable
# ln -s /tmp/foo /tmp/bar
(without this patch)
# cat trace
ln-897 [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink)
ln-897 [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)
(with this patch)
# cat trace
ln-740 [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink)
ln-740 [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)
Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LKML-Reference: <1281853084.3254.11.camel@camp10-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
|
|
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Fix an Oops in the NFSv4 atomic open code
NFS: Fix the selection of security flavours in Kconfig
NFS: fix the return value of nfs_file_fsync()
rpcrdma: Fix SQ size calculation when memreg is FRMR
xprtrdma: Do not truncate iova_start values in frmr registrations.
nfs: Remove redundant NULL check upon kfree()
nfs: Add "lookupcache" to displayed mount options
NFS: allow close-to-open cache semantics to apply to root of NFS filesystem
SUNRPC: fix NFS client over TCP hangs due to packet loss (Bug 16494)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
USB HID: Add ID for eGalax Multitouch used in JooJoo tablet
HID: hiddev: fix memory corruption due to invalid intfdata
HID: hiddev: protect against disconnect/NULL-dereference race
HID: picolcd: correct ordering of framebuffer freeing
HID: picolcd: testing the wrong variable
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
[IA64] Fix build error: conflicting types for ‘sys_execve’
|
|
Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
|
|
Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
arch/ia64/kernel/process.c:636: error: conflicting types for ‘sys_execve’
commit d7627467b7a8dd6944885290a03a07ceb28c10eb
Make do_execve() take a const filename pointer
Missed the declaration of sys_execve in the ia64 asm/unistd.h (perhaps
because there is no reason for it to be there ... it might be a left over
from the COMPAT code?). Just delete the conflicting version.
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
fs: brlock vfsmount_lock
fs: scale files_lock
lglock: introduce special lglock and brlock spin locks
tty: fix fu_list abuse
fs: cleanup files_lock locking
fs: remove extra lookup in __lookup_hash
fs: fs_struct rwlock to spinlock
apparmor: use task path helpers
fs: dentry allocation consolidation
fs: fix do_lookup false negative
mbcache: Limit the maximum number of cache entries
hostfs ->follow_link() braino
hostfs: dumb (and usually harmless) tpyo - strncpy instead of strlcpy
remove SWRITE* I/O types
kill BH_Ordered flag
vfs: update ctime when changing the file's permission by setfacl
cramfs: only unlock new inodes
fix reiserfs_evict_inode end_writeback second call
|
|
This fixes a build breakage introduced by commit 4c2ef25fe0b8 ("mmc: fix
all hangs related to mmc/sd card insert/removal during suspend/resume")
Cc: David Brownell <david-b@pacbell.net>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-mmc@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Kukjin Kim <kgene.kim@samsung.com>
Acked-by: Maxim Levitsky <maximlevitsky@gmail.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Parts of the build process were generating files outside the specified
O= directory, causing the build to fail on systems where the sources are
in a read only file system.
Fix it by using $(OUTPUT) on these locations.
Also check that $(OUTPUT) actually exists, just like the top level
kernel Makefile does. Otherwise the failure message emitted is
completely misleading.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100817140841.0859362C03A@msa106.auone-net.jp>
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix build on POSIX shells
latencytop: Fix kconfig dependency warnings
perf annotate tui: Fix exit and RIGHT keys handling
tracing: Sanitize value returned from write(trace_marker, "...", len)
tracing/events: Convert format output to seq_file
tracing: Extend recordmcount to better support Blackfin mcount
tracing: Fix ring_buffer_read_page reading out of page boundary
tracing: Fix an unallocated memory access in function_graph
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter)
ALSA: hda - Fix ALC680 base model capture
ASoC: Remove DSP mode support for WM8776
ALSA: hda - Add quirk for Dell Vostro 1220
ALSA: riptide - Fix detection / load of firmware files
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: include sched.h in ColdFire/SPI driver
m68knommu: formatting of pointers in printk()
m68knommu: arch/m68k/include/asm/ide.h fix for nommu
|
|
* 'for-linus' of git://neil.brown.name/md:
md raid-1/10 Fix bio_rw bit manipulations again
md: provide appropriate return value for spare_active functions.
md: Notify sysfs when RAID1/5/10 disk is In_sync.
Update recovery_offset even when external metadata is used.
|
|
* 'merge-devicetree' of git://git.secretlab.ca/git/linux-2.6:
spi.h: missing kernel-doc notation, please fix
of: fix missing headers for of_address_to_resource() in MTD and SysACE drivers
of: Fix missing includes
ata: update for of_device to platform_device replacement
microblaze: Fix of: eliminate of_device->node and dev_archdata->{of,prom}_node
microblaze: Fix of/address: Merge all of the bus translation code
booting-without-of: Remove nonexistent chapters from TOC, fix numbering
|
|
This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:
1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.
2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.
3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.
4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.
5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.
6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.
7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.
There are two ways to fix this issue:
1. Always do a global TLB flush when a new cr3 is loaded and the
old page table was swapper_pg_dir. I consider this a hack hard
to understand and with performance implications
2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
does.
This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.
-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
A bug in the family-model-stepping matching code caused the presence of
errata to go undetected when OSVW was not used. This causes hangs on
some K8 systems because the E400 workaround is not enabled.
Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
Adam Lackorzynski reports:
with 2.6.35.2 I'm getting this reproducible Oops:
[ 110.825396] BUG: unable to handle kernel NULL pointer dereference at
(null)
[ 110.828638] IP: [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] PGD be89f067 PUD bf18f067 PMD 0
[ 110.828638] Oops: 0000 [#1] SMP
[ 110.828638] last sysfs file: /sys/class/net/lo/operstate
[ 110.828638] CPU 2
[ 110.828638] Modules linked in: rtc_cmos rtc_core rtc_lib amd64_edac_mod
i2c_amd756 edac_core i2c_core dm_mirror dm_region_hash dm_log dm_snapshot
sg sr_mod usb_storage ohci_hcd mptspi tg3 mptscsih mptbase usbcore nls_base
[last unloaded: scsi_wait_scan]
[ 110.828638]
[ 110.828638] Pid: 11264, comm: setchecksum Not tainted 2.6.35.2 #1
[ 110.828638] RIP: 0010:[<ffffffff811247b7>] [<ffffffff811247b7>]
encode_attrs+0x1a/0x2a4
[ 110.828638] RSP: 0000:ffff88003bf5b878 EFLAGS: 00010296
[ 110.828638] RAX: ffff8800bddb48a8 RBX: ffff88003bf5bb18 RCX:
0000000000000000
[ 110.828638] RDX: ffff8800be258800 RSI: 0000000000000000 RDI:
ffff88003bf5b9f8
[ 110.828638] RBP: 0000000000000000 R08: ffff8800bddb48a8 R09:
0000000000000004
[ 110.828638] R10: 0000000000000003 R11: ffff8800be779000 R12:
ffff8800be258800
[ 110.828638] R13: ffff88003bf5b9f8 R14: ffff88003bf5bb20 R15:
ffff8800be258800
[ 110.828638] FS: 0000000000000000(0000) GS:ffff880041e00000(0063)
knlGS:00000000556bd6b0
[ 110.828638] CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
[ 110.828638] CR2: 0000000000000000 CR3: 00000000be8ef000 CR4:
00000000000006e0
[ 110.828638] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 110.828638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[ 110.828638] Process setchecksum (pid: 11264, threadinfo
ffff88003bf5a000, task ffff88003f232210)
[ 110.828638] Stack:
[ 110.828638] 0000000000000000 ffff8800bfbcf920 0000000000000000
0000000000000ffe
[ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 110.828638] <0> 0000000000000000 0000000000000000 0000000000000000
0000000000000000
[ 110.828638] Call Trace:
[ 110.828638] [<ffffffff81124c1f>] ? nfs4_xdr_enc_setattr+0x90/0xb4
[ 110.828638] [<ffffffff81371161>] ? call_transmit+0x1c3/0x24a
[ 110.828638] [<ffffffff813774d9>] ? __rpc_execute+0x78/0x22a
[ 110.828638] [<ffffffff81371a91>] ? rpc_run_task+0x21/0x2b
[ 110.828638] [<ffffffff81371b7e>] ? rpc_call_sync+0x3d/0x5d
[ 110.828638] [<ffffffff8111e284>] ? _nfs4_do_setattr+0x11b/0x147
[ 110.828638] [<ffffffff81109466>] ? nfs_init_locked+0x0/0x32
[ 110.828638] [<ffffffff810ac521>] ? ifind+0x4e/0x90
[ 110.828638] [<ffffffff8111e2fb>] ? nfs4_do_setattr+0x4b/0x6e
[ 110.828638] [<ffffffff8111e634>] ? nfs4_do_open+0x291/0x3a6
[ 110.828638] [<ffffffff8111ed81>] ? nfs4_open_revalidate+0x63/0x14a
[ 110.828638] [<ffffffff811056c4>] ? nfs_open_revalidate+0xd7/0x161
[ 110.828638] [<ffffffff810a2de4>] ? do_lookup+0x1a4/0x201
[ 110.828638] [<ffffffff810a4733>] ? link_path_walk+0x6a/0x9d5
[ 110.828638] [<ffffffff810a42b6>] ? do_last+0x17b/0x58e
[ 110.828638] [<ffffffff810a5fbe>] ? do_filp_open+0x1bd/0x56e
[ 110.828638] [<ffffffff811cd5e0>] ? _atomic_dec_and_lock+0x30/0x48
[ 110.828638] [<ffffffff810a9b1b>] ? dput+0x37/0x152
[ 110.828638] [<ffffffff810ae063>] ? alloc_fd+0x69/0x10a
[ 110.828638] [<ffffffff81099f39>] ? do_sys_open+0x56/0x100
[ 110.828638] [<ffffffff81027a22>] ? ia32_sysret+0x0/0x5
[ 110.828638] Code: 83 f1 01 e8 f5 ca ff ff 48 83 c4 50 5b 5d 41 5c c3 41
57 41 56 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 81 ec 18 01 00 00
<8b> 06 89 c2 83 e2 08 83 fa 01 19 db 83 e3 f8 83 c3 18 a8 01 8d
[ 110.828638] RIP [<ffffffff811247b7>] encode_attrs+0x1a/0x2a4
[ 110.828638] RSP <ffff88003bf5b878>
[ 110.828638] CR2: 0000000000000000
[ 112.840396] ---[ end trace 95282e83fd77358f ]---
We need to ensure that the O_EXCL flag is turned off if the user doesn't
set O_CREAT.
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
|