diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2015-06-24 15:46:32 (GMT) |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2015-06-24 15:46:32 (GMT) |
commit | 08d183e3c1f650b4db1d07d764502116861542fa (patch) | |
tree | f868a813f36744597bc7a8260c63cd37a3a94338 /Documentation | |
parent | 4b1f2af6752a4cc9acc1c22ddf3842478965f113 (diff) | |
parent | 6096f884515466f400864ad23d16f20b731a7ce7 (diff) | |
download | linux-08d183e3c1f650b4db1d07d764502116861542fa.tar.xz |
Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
- disable the 32-bit vdso when building LE, so we can build with a
64-bit only toolchain.
- EEH fixes from Gavin & Richard.
- enable the sys_kcmp syscall from Laurent.
- sysfs control for fastsleep workaround from Shreyas.
- expose OPAL events as an irq chip by Alistair.
- MSI ops moved to pci_controller_ops by Daniel.
- fix for kernel to userspace backtraces for perf from Anton.
- merge pseries and pseries_le defconfigs from Cyril.
- CXL in-kernel API from Mikey.
- OPAL prd driver from Jeremy.
- fix for DSCR handling & tests from Anshuman.
- Powernv flash mtd driver from Cyril.
- dynamic DMA Window support on powernv from Alexey.
- LLVM clang fixes & workarounds from Anton.
- reworked version of the patch to abort syscalls when transactional.
- fix the swap encoding to support 4TB, from Aneesh.
- various fixes as usual.
- Freescale updates from Scott: Highlights include more 8xx
optimizations, an e6500 hugetlb optimization, QMan device tree nodes,
t1024/t1023 support, and various fixes and cleanup.
* tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits)
cxl: Fix typo in debug print
cxl: Add CXL_KERNEL_API config option
powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma()
powerpc/mm: Change the swap encoding in pte.
powerpc/mm: PTE_RPN_MAX is not used, remove the same
powerpc/tm: Abort syscalls in active transactions
powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off
powerpc/include: Add opal-prd to installed uapi headers
powerpc/powernv: fix construction of opal PRD messages
powerpc/powernv: Increase opal-irqchip initcall priority
powerpc: Make doorbell check preemption safe
powerpc/powernv: pnv_init_idle_states() should only run on powernv
macintosh/nvram: Remove as unused
powerpc: Don't use gcc specific options on clang
powerpc: Don't use -mno-strict-align on clang
powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it
powerpc: Only use -mabi=altivec if toolchain supports it
powerpc: Fix duplicate const clang warning in user access code
vfio: powerpc/spapr: Support Dynamic DMA windows
vfio: powerpc/spapr: Register memory and define IOMMU v2
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-class-cxl | 33 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/powerpc/fsl/fman.txt | 13 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/powerpc/fsl/guts.txt | 5 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/soc/fsl/qman-portals.txt | 4 | ||||
-rw-r--r-- | Documentation/powerpc/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/powerpc/cxl.txt | 4 | ||||
-rw-r--r-- | Documentation/powerpc/dscr.txt | 83 | ||||
-rw-r--r-- | Documentation/powerpc/transactional_memory.txt | 32 | ||||
-rw-r--r-- | Documentation/vfio.txt | 62 |
9 files changed, 217 insertions, 21 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl index d46bba8..acfe9df 100644 --- a/Documentation/ABI/testing/sysfs-class-cxl +++ b/Documentation/ABI/testing/sysfs-class-cxl @@ -6,6 +6,17 @@ Example: The real path of the attribute /sys/class/cxl/afu0.0s/irqs_max is Slave contexts (eg. /sys/class/cxl/afu0.0s): +What: /sys/class/cxl/<afu>/afu_err_buf +Date: September 2014 +Contact: linuxppc-dev@lists.ozlabs.org +Description: read only + AFU Error Buffer contents. The contents of this file are + application specific and depends on the AFU being used. + Applications interacting with the AFU can use this attribute + to know about the current error condition and take appropriate + action like logging the event etc. + + What: /sys/class/cxl/<afu>/irqs_max Date: September 2014 Contact: linuxppc-dev@lists.ozlabs.org @@ -15,6 +26,7 @@ Description: read/write that hardware can support (eg. 2037). Write values will limit userspace applications to that many userspace interrupts. Must be >= irqs_min. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/irqs_min Date: September 2014 @@ -24,6 +36,7 @@ Description: read only userspace must request on a CXL_START_WORK ioctl. Userspace may omit the num_interrupts field in the START_WORK IOCTL to get this minimum automatically. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/mmio_size Date: September 2014 @@ -31,6 +44,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Decimal value of the size of the MMIO space that may be mmaped by userspace. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/modes_supported Date: September 2014 @@ -38,6 +52,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only List of the modes this AFU supports. One per line. Valid entries are: "dedicated_process" and "afu_directed" +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/mode Date: September 2014 @@ -46,6 +61,7 @@ Description: read/write The current mode the AFU is using. Will be one of the modes given in modes_supported. Writing will change the mode provided that no user contexts are attached. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/prefault_mode @@ -59,6 +75,7 @@ Description: read/write descriptor as an effective address and prefault what it points to. all: all segments process calling START_WORK maps. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/reset Date: September 2014 @@ -66,12 +83,14 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: write only Writing 1 here will reset the AFU provided there are not contexts active on the AFU. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/api_version Date: September 2014 Contact: linuxppc-dev@lists.ozlabs.org Description: read only Decimal value of the current version of the kernel/user API. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/api_version_compatible Date: September 2014 @@ -79,6 +98,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Decimal value of the the lowest version of the userspace API this this kernel supports. +Users: https://github.com/ibm-capi/libcxl AFU configuration records (eg. /sys/class/cxl/afu0.0/cr0): @@ -92,6 +112,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Hexadecimal value of the vendor ID found in this AFU configuration record. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/cr<config num>/device Date: February 2015 @@ -99,6 +120,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Hexadecimal value of the device ID found in this AFU configuration record. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/cr<config num>/class Date: February 2015 @@ -106,6 +128,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Hexadecimal value of the class code found in this AFU configuration record. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>/cr<config num>/config Date: February 2015 @@ -115,6 +138,7 @@ Description: read only record. The format is expected to match the either the standard or extended configuration space defined by the PCIe specification. +Users: https://github.com/ibm-capi/libcxl @@ -126,18 +150,21 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Decimal value of the size of the MMIO space that may be mmaped by userspace. This includes all slave contexts space also. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>m/pp_mmio_len Date: September 2014 Contact: linuxppc-dev@lists.ozlabs.org Description: read only Decimal value of the Per Process MMIO space length. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<afu>m/pp_mmio_off Date: September 2014 Contact: linuxppc-dev@lists.ozlabs.org Description: read only Decimal value of the Per Process MMIO space offset. +Users: https://github.com/ibm-capi/libcxl Card info (eg. /sys/class/cxl/card0) @@ -147,12 +174,14 @@ Date: September 2014 Contact: linuxppc-dev@lists.ozlabs.org Description: read only Identifies the CAIA Version the card implements. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<card>/psl_revision Date: September 2014 Contact: linuxppc-dev@lists.ozlabs.org Description: read only Identifies the revision level of the PSL. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<card>/base_image Date: September 2014 @@ -162,6 +191,7 @@ Description: read only that support loadable PSLs. For FPGAs this field identifies the image contained in the on-adapter flash which is loaded during the initial program load. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<card>/image_loaded Date: September 2014 @@ -169,6 +199,7 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: read only Will return "user" or "factory" depending on the image loaded onto the card. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<card>/load_image_on_perst Date: December 2014 @@ -183,6 +214,7 @@ Description: read/write user or factory image to be loaded. Default is to reload on PERST whichever image the card has loaded. +Users: https://github.com/ibm-capi/libcxl What: /sys/class/cxl/<card>/reset Date: October 2014 @@ -190,3 +222,4 @@ Contact: linuxppc-dev@lists.ozlabs.org Description: write only Writing 1 will issue a PERST to card which may cause the card to reload the FPGA depending on load_image_on_perst. +Users: https://github.com/ibm-capi/libcxl diff --git a/Documentation/devicetree/bindings/powerpc/fsl/fman.txt b/Documentation/devicetree/bindings/powerpc/fsl/fman.txt index edda55f..1fc5328 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/fman.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/fman.txt @@ -189,6 +189,19 @@ PROPERTIES Definition: There is one reg region describing the port configuration registers. +- fsl,fman-10g-port + Usage: optional + Value type: boolean + Definition: The default port rate is 1G. + If this property exists, the port is s 10G port. + +- fsl,fman-best-effort-port + Usage: optional + Value type: boolean + Definition: Can be defined only if 10G-support is set. + This property marks a best-effort 10G port (10G port that + may not be capable of line rate). + EXAMPLE port@a8000 { diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt b/Documentation/devicetree/bindings/powerpc/fsl/guts.txt index 7f150b5..b71b203 100644 --- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt +++ b/Documentation/devicetree/bindings/powerpc/fsl/guts.txt @@ -9,6 +9,11 @@ Required properties: - compatible : Should define the compatible device type for global-utilities. + Possible compatibles: + "fsl,qoriq-device-config-1.0" + "fsl,qoriq-device-config-2.0" + "fsl,<chip>-device-config" + "fsl,<chip>-guts" - reg : Offset and length of the register set for the device. Recommended properties: diff --git a/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt b/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt index 48c4dae..47e46cc 100644 --- a/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt +++ b/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt @@ -47,7 +47,7 @@ PROPERTIES For additional details about the PAMU/LIODN binding(s) see pamu.txt -- fsl,qman-channel-id +- cell-index Usage: Required Value type: <u32> Definition: The hardware index of the channel. This can also be @@ -136,7 +136,7 @@ The example below shows a (P4080) QMan portals container/bus node with two porta reg = <0x4000 0x4000>, <0x101000 0x1000>; interrupts = <106 2 0 0>; fsl,liodn = <3 4>; - fsl,qman-channel-id = <1>; + cell-index = <1>; fman0 { fsl,liodn = <0x22>; diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX index 6fd0e8b..9dc845c 100644 --- a/Documentation/powerpc/00-INDEX +++ b/Documentation/powerpc/00-INDEX @@ -30,3 +30,5 @@ ptrace.txt - Information on the ptrace interfaces for hardware debug registers. transactional_memory.txt - Overview of the Power8 transactional memory support. +dscr.txt + - Overview DSCR (Data Stream Control Register) support. diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt index 2c71ecc..2a230d01 100644 --- a/Documentation/powerpc/cxl.txt +++ b/Documentation/powerpc/cxl.txt @@ -133,6 +133,9 @@ User API The following file operations are supported on both slave and master devices. + A userspace library libcxl is avaliable here: + https://github.com/ibm-capi/libcxl + This provides a C interface to this kernel API. open ---- @@ -366,6 +369,7 @@ Sysfs Class enumeration and tuning of the accelerators. Its layout is described in Documentation/ABI/testing/sysfs-class-cxl + Udev rules ========== diff --git a/Documentation/powerpc/dscr.txt b/Documentation/powerpc/dscr.txt new file mode 100644 index 0000000..1ff4400 --- /dev/null +++ b/Documentation/powerpc/dscr.txt @@ -0,0 +1,83 @@ + DSCR (Data Stream Control Register) + ================================================ + +DSCR register in powerpc allows user to have some control of prefetch of data +stream in the processor. Please refer to the ISA documents or related manual +for more detailed information regarding how to use this DSCR to attain this +control of the pefetches . This document here provides an overview of kernel +support for DSCR, related kernel objects, it's functionalities and exported +user interface. + +(A) Data Structures: + + (1) thread_struct: + dscr /* Thread DSCR value */ + dscr_inherit /* Thread has changed default DSCR */ + + (2) PACA: + dscr_default /* per-CPU DSCR default value */ + + (3) sysfs.c: + dscr_default /* System DSCR default value */ + +(B) Scheduler Changes: + + Scheduler will write the per-CPU DSCR default which is stored in the + CPU's PACA value into the register if the thread has dscr_inherit value + cleared which means that it has not changed the default DSCR till now. + If the dscr_inherit value is set which means that it has changed the + default DSCR value, scheduler will write the changed value which will + now be contained in thread struct's dscr into the register instead of + the per-CPU default PACA based DSCR value. + + NOTE: Please note here that the system wide global DSCR value never + gets used directly in the scheduler process context switch at all. + +(C) SYSFS Interface: + + Global DSCR default: /sys/devices/system/cpu/dscr_default + CPU specific DSCR default: /sys/devices/system/cpu/cpuN/dscr + + Changing the global DSCR default in the sysfs will change all the CPU + specific DSCR defaults immediately in their PACA structures. Again if + the current process has the dscr_inherit clear, it also writes the new + value into every CPU's DSCR register right away and updates the current + thread's DSCR value as well. + + Changing the CPU specif DSCR default value in the sysfs does exactly + the same thing as above but unlike the global one above, it just changes + stuff for that particular CPU instead for all the CPUs on the system. + +(D) User Space Instructions: + + The DSCR register can be accessed in the user space using any of these + two SPR numbers available for that purpose. + + (1) Problem state SPR: 0x03 (Un-privileged, POWER8 only) + (2) Privileged state SPR: 0x11 (Privileged) + + Accessing DSCR through privileged SPR number (0x11) from user space + works, as it is emulated following an illegal instruction exception + inside the kernel. Both mfspr and mtspr instructions are emulated. + + Accessing DSCR through user level SPR (0x03) from user space will first + create a facility unavailable exception. Inside this exception handler + all mfspr isntruction based read attempts will get emulated and returned + where as the first mtspr instruction based write attempts will enable + the DSCR facility for the next time around (both for read and write) by + setting DSCR facility in the FSCR register. + +(E) Specifics about 'dscr_inherit': + + The thread struct element 'dscr_inherit' represents whether the thread + in question has attempted and changed the DSCR itself using any of the + following methods. This element signifies whether the thread wants to + use the CPU default DSCR value or its own changed DSCR value in the + kernel. + + (1) mtspr instruction (SPR number 0x03) + (2) mtspr instruction (SPR number 0x11) + (3) ptrace interface (Explicitly set user DSCR value) + + Any child of the process created after this event in the process inherits + this same behaviour as well. diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt index ded6979..ba0a2a4 100644 --- a/Documentation/powerpc/transactional_memory.txt +++ b/Documentation/powerpc/transactional_memory.txt @@ -74,22 +74,23 @@ Causes of transaction aborts Syscalls ======== -Performing syscalls from within transaction is not recommended, and can lead -to unpredictable results. +Syscalls made from within an active transaction will not be performed and the +transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL +| TM_CAUSE_PERSISTENT. -Syscalls do not by design abort transactions, but beware: The kernel code will -not be running in transactional state. The effect of syscalls will always -remain visible, but depending on the call they may abort your transaction as a -side-effect, read soon-to-be-aborted transactional data that should not remain -invisible, etc. If you constantly retry a transaction that constantly aborts -itself by calling a syscall, you'll have a livelock & make no progress. +Syscalls made from within a suspended transaction are performed as normal and +the transaction is not explicitly doomed by the kernel. However, what the +kernel does to perform the syscall may result in the transaction being doomed +by the hardware. The syscall is performed in suspended mode so any side +effects will be persistent, independent of transaction success or failure. No +guarantees are provided by the kernel about which syscalls will affect +transaction success. -Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write() -from, say, printf() should be OK as long as the kernel does not access any -memory that was accessed transactionally. - -Consider any syscalls that happen to work as debug-only -- not recommended for -production use. Best to queue them up till after the transaction is over. +Care must be taken when relying on syscalls to abort during active transactions +if the calls are made via a library. Libraries may cache values (which may +give the appearance of success) or perform operations that cause transaction +failure before entering the kernel (which may produce different failure codes). +Examples are glibc's getpid() and lazy symbol resolution. Signals @@ -176,8 +177,7 @@ kernel aborted a transaction: TM_CAUSE_RESCHED Thread was rescheduled. TM_CAUSE_TLBI Software TLB invalid. TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap. - TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort - transactions for consistency will use this. + TM_CAUSE_SYSCALL Syscall from active transaction. TM_CAUSE_SIGNAL Signal delivered. TM_CAUSE_MISC Currently unused. TM_CAUSE_ALIGNMENT Alignment fault. diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 96978ec..1dd3fdd 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt @@ -289,10 +289,12 @@ PPC64 sPAPR implementation note This implementation has some specifics: -1) Only one IOMMU group per container is supported as an IOMMU group -represents the minimal entity which isolation can be guaranteed for and -groups are allocated statically, one per a Partitionable Endpoint (PE) +1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per +container is supported as an IOMMU table is allocated at the boot time, +one table per a IOMMU group which is a Partitionable Endpoint (PE) (PE is often a PCI domain but not always). +Newer systems (POWER8 with IODA2) have improved hardware design which allows +to remove this limitation and have multiple IOMMU groups per a VFIO container. 2) The hardware supports so called DMA windows - the PCI address range within which DMA transfer is allowed, any attempt to access address space @@ -385,6 +387,18 @@ The code flow from the example above should be slightly changed: .... + /* Inject EEH error, which is expected to be caused by 32-bits + * config load. + */ + pe_op.op = VFIO_EEH_PE_INJECT_ERR; + pe_op.err.type = EEH_ERR_TYPE_32; + pe_op.err.func = EEH_ERR_FUNC_LD_CFG_ADDR; + pe_op.err.addr = 0ul; + pe_op.err.mask = 0ul; + ioctl(container, VFIO_EEH_PE_OP, &pe_op); + + .... + /* When 0xFF's returned from reading PCI config space or IO BARs * of the PCI device. Check the PE's state to see if that has been * frozen. @@ -427,6 +441,48 @@ The code flow from the example above should be slightly changed: .... +5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ +VFIO_IOMMU_DISABLE and implements 2 new ioctls: +VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY +(which are unsupported in v1 IOMMU). + +PPC64 paravirtualized guests generate a lot of map/unmap requests, +and the handling of those includes pinning/unpinning pages and updating +mm::locked_vm counter to make sure we do not exceed the rlimit. +The v2 IOMMU splits accounting and pinning into separate operations: + +- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls +receive a user space address and size of the block to be pinned. +Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to +be called with the exact address and size used for registering +the memory block. The userspace is not expected to call these often. +The ranges are stored in a linked list in a VFIO container. + +- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual +IOMMU table and do not do pinning; instead these check that the userspace +address is from pre-registered range. + +This separation helps in optimizing DMA for guests. + +6) sPAPR specification allows guests to have an additional DMA window(s) on +a PCI bus with a variable page size. Two ioctls have been added to support +this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. +The platform has to support the functionality or error will be returned to +the userspace. The existing hardware supports up to 2 DMA windows, one is +2GB long, uses 4K pages and called "default 32bit window"; the other can +be as big as entire RAM, use different page size, it is optional - guests +create those in run-time if the guest driver supports 64bit DMA. + +VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and +a number of TCE table levels (if a TCE table is going to be big enough and +the kernel may not be able to allocate enough of physically contiguous memory). +It creates a new window in the available slot and returns the bus address where +the new window starts. Due to hardware limitation, the user space cannot choose +the location of DMA windows. + +VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window +and removes it. + ------------------------------------------------------------------------------- [1] VFIO was originally an acronym for "Virtual Function I/O" in its |