From 17fea54bf0ab34fa09a06bbde2f58ed7bbdf9299 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Mon, 18 May 2015 10:07:17 +0200 Subject: x86/mce: Fix MCE severity messages Derek noticed that a critical MCE gets reported with the wrong error type description: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0 [Hardware Error]: RIP !INEXACT! 10: {intel_idle+0xb1/0x170} [Hardware Error]: TSC 49587b8e321cb [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29 [Hardware Error]: Some CPUs didn't answer in synchronization [Hardware Error]: Machine check: Invalid ^^^^^^^ The last line with 'Invalid' should have printed the high level MCE error type description we get from mce_severity, i.e. something like: [Hardware Error]: Machine check: Action required: data load error in a user process this happens due to the fact that mce_no_way_out() iterates over all MCA banks and possibly overwrites the @msg argument which is used in the panic printing later. Change behavior to take the message of only and the (last) critical MCE it detects. Reported-by: Derek Signed-off-by: Borislav Petkov Cc: Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Tony Luck Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index e535533..20190bd 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -708,6 +708,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, struct pt_regs *regs) { int i, ret = 0; + char *tmp; for (i = 0; i < mca_cfg.banks; i++) { m->status = mce_rdmsrl(MSR_IA32_MCx_STATUS(i)); @@ -716,9 +717,11 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, if (quirk_no_way_out) quirk_no_way_out(i, m, regs); } - if (mce_severity(m, mca_cfg.tolerant, msg, true) >= - MCE_PANIC_SEVERITY) + + if (mce_severity(m, mca_cfg.tolerant, &tmp, true) >= MCE_PANIC_SEVERITY) { + *msg = tmp; ret = 1; + } } return ret; } -- cgit v0.10.2 From ea8e080b604e2c8221af778221d83a59b6529c5b Mon Sep 17 00:00:00 2001 From: Aravind Gopalakrishnan Date: Mon, 18 May 2015 10:07:16 +0200 Subject: x86/Documentation: Update the contact email for L3 cache index disable functionality The mailing list discuss@x86-64.org is now defunct. Use the lkml in its place. Signed-off-by: Aravind Gopalakrishnan Signed-off-by: Borislav Petkov Cc: Greg KH Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-api@vger.kernel.org Cc: sboyd@codeaurora.org Cc: sudeep.holla@arm.com Link: http://lkml.kernel.org/r/1431125098-9470-1-git-send-email-Aravind.Gopalakrishnan@amd.com Link: http://lkml.kernel.org/r/1431936437-25286-2-git-send-email-bp@alien8.de [ Changed the contact email to lkml. ] Signed-off-by: Ingo Molnar diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index 99983e6..da95513 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -162,7 +162,7 @@ Description: Discover CPUs in the same CPU frequency coordination domain What: /sys/devices/system/cpu/cpu*/cache/index3/cache_disable_{0,1} Date: August 2008 KernelVersion: 2.6.27 -Contact: discuss@x86-64.org +Contact: Linux kernel mailing list Description: Disable L3 cache indices These files exist in every CPU's cache/index3 directory. Each -- cgit v0.10.2 From e88221c50cadade0eb4f7f149f4967d760212695 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Wed, 20 May 2015 11:45:30 +0200 Subject: x86/fpu: Disable XSAVES* support for now The kernel's handling of 'compacted' xsave state layout is buggy: http://marc.info/?l=linux-kernel&m=142967852317199 I don't have such a system, and the description there is vague, but from extrapolation I guess that there were two kinds of bugs observed: - boot crashes, due to size calculations being wrong and the dynamic allocation allocating a too small xstate area. (This is now fixed in the new FPU code - but still present in stable kernels.) - FPU state corruption and ABI breakage: if signal handlers try to change the FPU state in standard format, which then the kernel tries to restore in the compacted format. These breakages are scary, but they only occur on a small number of systems that have XSAVES* CPU support. Yet we have had XSAVES support in the upstream kernel for a large number of stable kernel releases, and the fixes are involved and unproven. So do the safe resolution first: disable XSAVES* support and only use the standard xstate format. This makes the code work and is easy to backport. On top of this we can work on enabling (and testing!) proper compacted format support, without backporting pressure, on top of the new, cleaned up FPU code. Cc: Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Fenghua Yu Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c index 00918327..6185d31 100644 --- a/arch/x86/kernel/i387.c +++ b/arch/x86/kernel/i387.c @@ -173,6 +173,21 @@ static void init_thread_xstate(void) xstate_size = sizeof(struct i387_fxsave_struct); else xstate_size = sizeof(struct i387_fsave_struct); + + /* + * Quirk: we don't yet handle the XSAVES* instructions + * correctly, as we don't correctly convert between + * standard and compacted format when interfacing + * with user-space - so disable it for now. + * + * The difference is small: with recent CPUs the + * compacted format is only marginally smaller than + * the standard FPU state format. + * + * ( This is easy to backport while we are fixing + * XSAVES* support. ) + */ + setup_clear_cpu_cap(X86_FEATURE_XSAVES); } /* -- cgit v0.10.2