summaryrefslogtreecommitdiff
path: root/drivers/edac
AgeCommit message (Collapse)Author
2015-06-24EDAC, mce_amd_inj: Set MISCV on injectionBorislav Petkov
When during injection we populate MCi_MISC by writing into misc, we need to set the MiscV bit in the corresponding MCi_STATUS register which denotes that there's valid info in the MCi_MISC register. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, mce_amd_inj: Move bit preparations before the injectionBorislav Petkov
We do get_online_cpus() and then start noodling with the bits. Do that *before* we grab the hotplug lock. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, mce_amd_inj: Cleanup and simplify READMEBorislav Petkov
Save us an indentation level, widen to 80 cols, make the text more succinct and slender. Use i as the bank variable, same as what the documentation uses. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, altera: Do not allow suspend when EDAC is enabledAlan Tull
Suspend-to-RAM and EDAC support are mutually exclusive on SOCFPGA. If EDAC is enabled, it will prevent the platform from going into suspend. The reason is that the IRQ vectors for OCRAM reside on DDR and in Suspend-to-RAM mode we're executing out of OCRAM. If an ECC error occurs, we can't handle it so it was decided to make them mutually exclusive. Signed-off-by: Alan Tull <atull@opensource.altera.com> Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: dinh.linux@gmail.com Cc: dougthompson@xmission.com Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Cc: tthayer@opensource.altera.com Link: http://lkml.kernel.org/r/1433512155-9906-1-git-send-email-dinguyen@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, mce_amd_inj: Make inj_type statickbuild test robot
It is used there only anyway. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Cc: kbuild-all@01.org Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20150605112426.GA97073@lkp-sb04 Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, altera: Add Arria10 EDAC supportThor Thayer
The Arria10 SDRAM and ECC system differs significantly from the Cyclone5 and Arria5 SoCs. This patch adds support for the Arria10 SoC. 1) IRQ handler needs to support SHARED IRQ 2) Support sberr and dberr address reporting. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: m.chehab@samsung.com Cc: mark.rutland@arm.com Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Cc: tthayer.linux@gmail.com Link: http://lkml.kernel.org/r/1433428128-7292-4-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, altera: Refactor for Altera CycloneV SoCThor Thayer
The Arria10 SoC uses a completely different SDRAM controller from the earlier CycloneV and ArriaV SoCs. This patch abstracts the SDRAM bits for the CycloneV/ArriaV SoCs in preparation for the Arria10 support. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: m.chehab@samsung.com Cc: mark.rutland@arm.com Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Cc: tthayer.linux@gmail.com Link: http://lkml.kernel.org/r/1433428128-7292-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, altera: Generalize driver to use DT Memory sizeThor Thayer
The Arria10 SOC uses a completely different SDRAM controller from the earlier CycloneV and ArriaV SoCs. The memory size is calculated in the bootloader and passed via the device tree. Using this device tree size is more generic than using the register fields to calculate the memory size for different SDRAM controllers. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: dinguyen@opensource.altera.com Cc: galak@codeaurora.org Cc: grant.likely@linaro.org Cc: ijc+devicetree@hellion.org.uk Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: m.chehab@samsung.com Cc: mark.rutland@arm.com Cc: pawel.moll@arm.com Cc: robh+dt@kernel.org Cc: tthayer.linux@gmail.com Link: http://lkml.kernel.org/r/1433428128-7292-2-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, mce_amd_inj: Add README fileAravind Gopalakrishnan
Provide information about each injection file and its usage for ease of use and in-band documentation. This is a good idea adapted from ftrace. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1433277362-10911-7-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-24EDAC, mce_amd_inj: Add individual permissions field to dfs_nodeAravind Gopalakrishnan
Add per-file permissions to the dfs_fls[] array. In a later patch, we will add a README file that needs different permissions. Hence the move here to add a perm field. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1433277362-10911-6-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03EDAC, mce_amd_inj: Modify flags attribute to use string argumentsAravind Gopalakrishnan
Use strings such as "hw" or "sw" to indicate the type of error injection to be performed. Current flags attribute derives the meanings of values that can be programmed into it from asm/mce.h. Moving to defined strings for the attribute allows this module to be self-sufficient and removes the dependency. Also, we can introduce new flags as and when needed without having to worry about conflicting with the flags already defined in asm/mce.h. Also, modify do_inject() to use the newly defined injection_type enum to figure out the injection mechanism we need to use Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1433277362-10911-4-git-send-email-Aravind.Gopalakrishnan@amd.com [ Use strstrip() return value. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03EDAC, mce_amd_inj: Read out number of MCE banks from the hardwareAravind Gopalakrishnan
The number of banks for a given processor is encoded in MSR_IA32_MCG_CAP[7:0]. So obtain the value from that MSR and use it for sanity checking in inj_bank_set() instead of doing a family/model check. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Link: http://lkml.kernel.org/r/1432753418-2985-3-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-03EDAC, mce_amd_inj: Use MCE_INJECT_GET macro for bank node tooAravind Gopalakrishnan
inj_bank_get() is generic enough that we can use the MCE_INJECT_GET macro instead. No functionality change. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mchehab@osg.samsung.com Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/1433277362-10911-2-git-send-email-Aravind.Gopalakrishnan@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-06-02EDAC, xgene: Fix cpuid abuseArnd Bergmann
The new x-gene EDAC driver incorrectly tried to figure out the version of one of its IP blocks by looking at the version of the CPU core, which is only vagely related. This removes the incorrect code and instead uses the version of the IP block in the compatible string where it belongs. Found using build testing on x86, which does not provide the arm64 cpuid interface. Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ Changed subnode to "apm,xgene-edac-pmd-v2", adjusted check. ] Signed-off-by: Loc Ho <lho@apm.com> Cc: devicetree@vger.kernel.org Cc: dougthompson@xmission.com Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: mchehab@osg.samsung.com Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/3195065.IK73o60xya@wuerfel Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-31EDAC, mpc85xx: Extend error address to 64 bitYork Sun
Extend err_addr to cover 64 bits for DDR errors. Signed-off-by: York Sun <yorksun@freescale.com> Acked-by: Johannes Thumshirn <morbidrsa@gmail.com> Cc: Mingkai.hu@freescale.com Link: http://lkml.kernel.org/r/1431425022-44766-2-git-send-email-Wenbin.Song@freescale.com Signed-off-by: songwenbin <wenbin.song@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-31EDAC, mpc8xxx: Adapt for FSL SoCYork Sun
Remove mpc83xx and mpc85xx as dependency. Signed-off-by: York Sun <yorksun@freescale.com> Acked-by: Johannes Thumshirn <morbidrsa@gmail.com> Cc: Mingkai.hu@freescale.com Link: http://lkml.kernel.org/r/1431425022-44766-1-git-send-email-Wenbin.Song@freescale.com Signed-off-by: songwenbin <wenbin.song@freescale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-29EDAC, edac_stub: Drop arch-specific includeBorislav Petkov
<asm/edac.h> contains only the arch-specific scrubbing function and is thus not needed in edac_stub.c. Kill it. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-29EDAC: Add APM X-Gene SoC EDAC driverLoc Ho
Add support for the APM X-Gene SoC EDAC driver. Signed-off-by: Loc Ho <lho@apm.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: devicetree@vger.kernel.org Cc: dougthompson@xmission.com Cc: ijc+devicetree@hellion.org.uk Cc: jcm@redhat.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac <linux-edac@vger.kernel.org> Cc: mark.rutland@arm.com Cc: mchehab@osg.samsung.com Cc: patches@apm.com Cc: robh+dt@kernel.org Link: http://lkml.kernel.org/r/1432337580-3750-5-git-send-email-lho@apm.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-28EDAC: Cleanup atomic_scrub messBorislav Petkov
So first of all, this atomic_scrub() function's naming is bad. It looks like an atomic_t helper. Change it to edac_atomic_scrub(). The bigger problem is that this function is arch-specific and every new arch which doesn't necessarily need that functionality still needs to define it, otherwise EDAC doesn't compile. So instead of doing that and including arch-specific headers, have each arch define an EDAC_ATOMIC_SCRUB symbol which can be used in edac_mc.c for ifdeffery. Much cleaner. And we already are doing this with another symbol - EDAC_SUPPORT. This is also much cleaner than having CONFIG_EDAC enumerate all the arches which need/have EDAC support and drivers. This way I can kill the useless edac.h header in tile too. Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Chris Metcalf <cmetcalf@ezchip.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Doug Thompson <dougthompson@xmission.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-edac@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linuxppc-dev@lists.ozlabs.org Cc: "Maciej W. Rozycki" <macro@codesourcery.com> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Paul Mackerras <paulus@samba.org> Cc: "Steven J. Hill" <Steven.Hill@imgtec.com> Cc: x86@kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>
2015-05-03EDAC, altera: Do not build it as a moduleThor Thayer
The SDRAM EDAC requires SDRAM configuration/initialization before SDRAM is accessed (in the preloader) and therefore before Linux is loaded. Having a module compile is not desired so force to be built into kernel. Signed-off-by: Thor Thayer <tthayer@opensource.altera.com> Cc: mchehab@osg.samsung.com Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1429308974-26380-3-git-send-email-tthayer@opensource.altera.com Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-20EDAC: Constify of_device_id arrayFabian Frederick
of_device_id is always used as const. See driver.of_match_table and open firmware functions. Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Robert Richter <rric@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Johannes Thumshirn <johannes.thumshirn@men.de> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Sören Brinkmann <soren.brinkmann@xilinx.com> Cc: linux-edac@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1426535685-25996-10-git-send-email-fabf@skynet.be Signed-off-by: Borislav Petkov <bp@suse.de>
2015-03-11EDAC, i82443bxgx: Don't export static symbolJulia Lawall
The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ type T; identifier f; @@ static T f (...) { ... } @@ identifier r.f; declarer name EXPORT_SYMBOL_GPL; @@ -EXPORT_SYMBOL_GPL(f); // </smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Tim Small <tim@buttersideup.com> Link: http://lkml.kernel.org/r/1426092997-30605-13-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC, amd64_edac: Get rid of per-node driver instancesBorislav Petkov
... and do the proper thing using EDAC core facilities. Cc: Daniel J Blueman <daniel@numascale.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Properly unwind on failure path in edac_init()Alexey Khoroshilov
edac_init() does not deallocate already allocated resources on failure path. Found by Linux Driver Verification project (linuxtesting.org). [ Boris: The unwind path functions have __exit annotation but are being used in an __init function, leading to section mismatches. Drop the section annotation and make them normal functions. ] Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Link: http://lkml.kernel.org/r/1423203162-26368-1-git-send-email-khoroshilov@ispras.ru Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: highbank: Use static attribute groups for sysfs entriesTakashi Iwai
... instead of manual device_create_file() and device_remove_file() calls. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-9-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: octeon: Use static attribute groups for sysfs entriesTakashi Iwai
... instead of manual device_create_file() and device_remove_file() calls. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-8-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: mpc85xx: Use static attribute groups for sysfs entriesTakashi Iwai
... instead of manual device_create_file() and device_remove_file() calls. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Johannes Thumshirn <johannes.thumshirn@men.de> Link: http://lkml.kernel.org/r/1423046938-18111-7-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: i7core: Use static attribute groups for sysfs entriesTakashi Iwai
... instead of manual device_create_file() and device_remove_file() calls. Signed-off-by: Takashi Iwai <tiwai@suse.de> [ Add NULL terminator to i7core_dev_attrs[] caught by the build robot. ] Reported-by: Huang Ying <ying.huang@intel.com> Link: http://lkml.kernel.org/r/1423046938-18111-6-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: i7core: Return proper error codes for kzalloc() errorsTakashi Iwai
... instead of possibly uninitialized return value. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-5-git-send-email-tiwai@suse.de [ Add a commit message, albeit a small one. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: amd64: Use static attribute groupsTakashi Iwai
Instead of calling device_create_file() and device_remove_file() manually, pass the static attribute groups with the new edac_mc_add_mc_with_groups(). The conditional creation of inject sysfs files is done by a proper is_visible callback. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-4-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Allow to pass driver-specific attribute groupsTakashi Iwai
Add edac_mc_add_mc_with_groups() for initializing the mem_ctl_info object with the optional attribute groups. This allows drivers to pass additional sysfs entries without manual (and racy) device_create_file() and co calls. edac_mc_add_mc() is kept as is, just calling edac_mc_add_with_groups() with NULL groups. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-3-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Use static attribute groups for managing sysfs entriesTakashi Iwai
Instead of manual calls of device_create_file() and device_remove_file(), use static attribute groups with proper is_visible callbacks for managing the sysfs entries. This simplifies the code a lot and avoids the possible races. Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1423046938-18111-2-git-send-email-tiwai@suse.de Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23EDAC: Delete unnecessary checks before pci_dev_put()Markus Elfring
The pci_dev_put() function tests whether its argument is NULL and thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: http://lkml.kernel.org/r/54CFC12C.9010002@users.sourceforge.net Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-19Merge tag 'edac_fixes_for_3.20' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull two EDAC fixes from Borislav Petkov: - A fix to sb_edac for proper detection on SNB machines - A fix to amd64_edac to not explode on Numascale machines with more than 16 memory controllers, from Daniel J Blueman. * tag 'edac_fixes_for_3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, amd64_edac: Prevent OOPS with >16 memory controllers sb_edac: Fix detection on SNB machines
2015-02-17EDAC, amd64_edac: Prevent OOPS with >16 memory controllersDaniel J Blueman
When DRAM errors occur on memory controllers after EDAC_MAX_MCS (16), the kernel fatally dereferences unallocated structures, see splat below; this occurs on at least NumaConnect systems. Fix by checking if a memory controller info structure was found. BUG: unable to handle kernel NULL pointer dereference at 0000000000000320 IP: [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 PGD 2f8b5a3067 PUD 2f8b5a2067 PMD 0 Oops: 0000 [#2] SMP Modules linked in: CPU: 224 PID: 11930 Comm: stream_c.exe.gn Tainted: G D 3.19.0 #1 Hardware name: Supermicro H8QGL/H8QGL, BIOS 3.5b 01/28/2015 task: ffff8807dbfb8c00 ti: ffff8807dd16c000 task.ti: ffff8807dd16c000 RIP: 0010:[<ffffffff819f714f>] [<ffffffff819f714f>] decode_bus_error+0x2f/0x2b0 RSP: 0000:ffff8907dfc03c48 EFLAGS: 00010297 RAX: 0000000000000001 RBX: 9c67400010080a13 RCX: 0000000000001dc6 RDX: 000000001dc61dc6 RSI: ffff8907dfc03df0 RDI: 000000000000001c RBP: ffff8907dfc03ce8 R08: 0000000000000000 R09: 0000000000000022 R10: ffff891fffa30380 R11: 00000000001cfc90 R12: 0000000000000008 R13: 0000000000000000 R14: 000000000000001c R15: 00009c6740001000 FS: 00007fa97ee18700(0000) GS:ffff8907dfc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000320 CR3: 0000003f889b8000 CR4: 00000000000407e0 Stack: 0000000000000000 ffff8907dfc03df0 0000000000000008 9c67400010080a13 000000000000001c 00009c6740001000 ffff8907dfc03c88 ffffffff810e4f9a ffff8907dfc03ce8 ffffffff81b375b9 0000000000000000 0000000000000010 Call Trace: <IRQ> ? vprintk_default ? printk amd_decode_mce notifier_call_chain atomic_notifier_call_chain mce_log machine_check_poll mce_timer_fn ? mce_cpu_restart call_timer_fn.isra.29 run_timer_softirq __do_softirq irq_exit smp_apic_timer_interrupt apic_timer_interrupt <EOI> ? down_read_trylock __do_page_fault ? __schedule do_page_fault page_fault Signed-off-by: Daniel J Blueman <daniel@numascale.com> Link: http://lkml.kernel.org/r/1424144078-24589-1-git-send-email-daniel@numascale.com Cc: stable@vger.kernel.org [ Boris: massage commit message ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-09sb_edac: Fix detection on SNB machinesBorislav Petkov
d0585cd815fa ("sb_edac: Claim a different PCI device") changed the probing of sb_edac to look for PCI device 0x3ca0: 3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07) 00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00 ... but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA in sbridge_probe() therefore the probing fails. Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0), .i.e., the 14.0 device, fixes the issue and driver loads successfully again: [ 2449.013120] EDAC DEBUG: sbridge_init: [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0 [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8 [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 ... Add a debug printk while at it to be able to catch the failure in the future and dump driver version on successful load. Fixes: d0585cd815fa ("sb_edac: Claim a different PCI device") Cc: stable@vger.kernel.org # 3.18 Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30EDAC, mv64x60_edac: Fix an error code in probe()Dan Carpenter
If edac_mc_add_mc() fails then we should preserve the error code, but instead the current code returns success. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: http://lkml.kernel.org/r/20150128191351.GC10259@mwanda Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30EDAC: edac_mc_sysfs: Make stuff staticBorislav Petkov
Fix sparse warnings. Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-30EDAC: Fix the leak of mci->bus->name when bus_register failsJunjie Mao
Also use goto labels for all failure paths in edac_create_sysfs_mci_device and update meaningless labels. Signed-off-by: Junjie Mao <junjie.mao@hotmail.com> Link: http://lkml.kernel.org/r/BLU436-SMTP25291B6B612942A212AEBFE95300@phx.gbl [ Boris: Use ! for 0 checks and add newlines for less crammed code. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-12edac: i5100_edac: Remove unused i5100_recmema_dm_buf_idRickard Strandqvist
Remove the function i5100_recmema_dm_buf_id() that is not used anywhere. This was partially found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Link: http://lkml.kernel.org/r/1420999030-21770-1-git-send-email-rickard_strandqvist@spectrumdigital.se Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-07EDAC, synps: Add EDAC support for zynq ddr ecc controllerPunnaiah Choudary Kalluri
Add EDAC support for ecc errors reporting on the synopsys ddr controller. The ddr ecc controller corrects single bit errors and detects double bit errors. Selected important-ish notes from the changelog: - I have not taken care of spliting synps_edac_geterror_info function as it adds additional indentation levels and moreover the existing changes were made as part of the v2 review comments - Removed dt binding info as already there is a binding info available under memorycontroller. so, updated ecc info there. - Shortened the prefix "sysnopsys" to "synps" Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Link: http://lkml.kernel.org/r/a728a8d4678f4dbf9de189a480297c3d@BY2FFO11FD034.protection.gbl [ Boris: massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
2015-01-02mpc85xx_edac: Fix a typo in commentsAlexander Kuleshov
s/kenel/kernel/g [ Boris: massage commit message a bit ] Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Johannes Thumshirn <johannes.thumshirn@men.de> Link: http://lkml.kernel.org/r/1419749085-7128-1-git-send-email-kuleshovmail@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-21EDAC, mce_amd_inj: Fix sparse non static symbol warningWei Yongjun
Fixes the following sparse warnings: drivers/edac/mce_amd_inj.c:204:3: warning: symbol 'dfs_fls' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Link: http://lkml.kernel.org/r/1418087095-14174-1-git-send-email-weiyj_lk@163.com Signed-off-by: Borislav Petkov <bp@suse.de>
2014-12-15Merge tag 'driver-core-3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core update from Greg KH: "Here's the set of driver core patches for 3.19-rc1. They are dominated by the removal of the .owner field in platform drivers. They touch a lot of files, but they are "simple" changes, just removing a line in a structure. Other than that, a few minor driver core and debugfs changes. There are some ath9k patches coming in through this tree that have been acked by the wireless maintainers as they relied on the debugfs changes. Everything has been in linux-next for a while" * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits) Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries" fs: debugfs: add forward declaration for struct device type firmware class: Deletion of an unnecessary check before the function call "vunmap" firmware loader: fix hung task warning dump devcoredump: provide a one-way disable function device: Add dev_<level>_once variants ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries ath: use seq_file api for ath9k debugfs files debugfs: add helper function to create device related seq_file drivers/base: cacheinfo: remove noisy error boot message Revert "core: platform: add warning if driver has no owner" drivers: base: support cpu cache information interface to userspace via sysfs drivers: base: add cpu_device_create to support per-cpu devices topology: replace custom attribute macros with standard DEVICE_ATTR* cpumask: factor out show_cpumap into separate helper function driver core: Fix unbalanced device reference in drivers_probe driver core: fix race with userland in device_add() sysfs/kernfs: make read requests on pre-alloc files use the buffer. sysfs/kernfs: allow attributes to request write buffer be pre-allocated. fs: sysfs: return EGBIG on write if offset is larger than file size ...
2014-12-11Merge tag 'edac/v3.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull edac updates from Mauro Carvalho Chehab: - Broadwell-DE support on sb-edac driver - Some fixes at sb-edac driver * tag 'edac/v3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: sb_edac: Fix typo computing number of banks sb_edac: Add support for Broadwell-DE processor sb_edac: Fix discovery of top-of-low-memory for Haswell sb_edac: Fix erroneous bytes->gigabytes conversion sb_edac: Fix off-by-one error in number of channels
2014-12-10Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The biggest change in this cycle is better support for UCNA (UnCorrected No Action) events: "Handle all uncorrected error reports in the same way (soft offline the page). We used to only do that for SRAO (software recoverable action optional) machine checks, but it makes sense to also do it for UCNA (UnCorrected No Action) logs found by CMCI or polling." plus various x86 MCE handling updates and fixes" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Spell "panicked" correctly x86, mce: Support memory error recovery for both UCNA and Deferred error in machine_check_poll x86, mce, severity: Extend the the mce_severity mechanism to handle UCNA/DEFERRED error x86, MCE, AMD: Assign interrupt handler only when bank supports it x86, MCE, AMD: Drop software-defined bank in error thresholding x86, MCE, AMD: Move invariant code out from loop body x86, MCE, AMD: Correct thresholding error logging x86, MCE, AMD: Use macros to compute bank MSRs RAS, HWPOISON: Fix wrong error recovery status GHES: Make ghes_estatus_caches static APEI, GHES: Cleanup unnecessary function for lockless list
2014-12-09Merge tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bpLinus Torvalds
Pull EDAC updates from Borislav Petkov: "EDAC updates all over the place: - Enablement for AMD F15h models 0x60 CPUs. Most notably DDR4 RAM support. Out of tree stuff is adding the required PCI IDs. From Aravind Gopalakrishnan. - Enable amd64_edac for 32-bit due to popular demand. From Tomasz Pala. - Convert the AMD MCE injection module to debugfs, where it belongs. - Misc EDAC cleanups" * tag 'edac_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, MCE, AMD: Correct formatting of decoded text EDAC, mce_amd_inj: Add an injector function EDAC, mce_amd_inj: Add hw-injection attributes EDAC, mce_amd_inj: Enable direct writes to MCE MSRs EDAC, mce_amd_inj: Convert mce_amd_inj module to debugfs EDAC: Delete unnecessary check before calling pci_dev_put() EDAC, pci_sysfs: remove unneccessary ifdef around entire file ghes_edac: Use snprintf() to silence a static checker warning amd64_edac: Build module on x86-32 EDAC, MCE, AMD: Add decoding table for MC6 xec amd64_edac: Add F15h M60h support {mv64x60,ppc4xx}_edac,: Remove deprecated IRQF_DISABLED EDAC: Sync memory types and names EDAC: Add DDR3 LRDIMM entries to edac_mem_types x86, amd_nb: Add device IDs to NB tables for F15h M60h pci_ids: Add PCI device IDs for F15h M60h
2014-12-02sb_edac: Fix typo computing number of banksTony Luck
Code will always think there are 16 banks because of a typo Reported-by: Misha Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02sb_edac: Add support for Broadwell-DE processorTony Luck
Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
2014-12-02sb_edac: Fix discovery of top-of-low-memory for HaswellTony Luck
Haswell moved the TOLM/TOHM registers to a different device and offset. The sb_edac driver accounted for the change of device, but not for the new offset. There was also a typo in the constant to fill in the low 26 bits (was 0x1ffffff, should be 0x3ffffff). This resulted in a bogus value for the top of low memory: EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff) which would result in EDAC refusing to translate addresses for errors above the bogus value and below 4GB: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523eac86 sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0 MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0) With the fix we see the correct TOLM value: DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff) and we decode address 2000000 correctly: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523e1086 sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0 DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0 DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01 DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1 DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0 DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0 MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>