summaryrefslogtreecommitdiff
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2015-06-24 15:46:32 (GMT)
committerLinus Torvalds <torvalds@linux-foundation.org>2015-06-24 15:46:32 (GMT)
commit08d183e3c1f650b4db1d07d764502116861542fa (patch)
treef868a813f36744597bc7a8260c63cd37a3a94338 /Documentation
parent4b1f2af6752a4cc9acc1c22ddf3842478965f113 (diff)
parent6096f884515466f400864ad23d16f20b731a7ce7 (diff)
downloadlinux-08d183e3c1f650b4db1d07d764502116861542fa.tar.xz
Merge tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman: - disable the 32-bit vdso when building LE, so we can build with a 64-bit only toolchain. - EEH fixes from Gavin & Richard. - enable the sys_kcmp syscall from Laurent. - sysfs control for fastsleep workaround from Shreyas. - expose OPAL events as an irq chip by Alistair. - MSI ops moved to pci_controller_ops by Daniel. - fix for kernel to userspace backtraces for perf from Anton. - merge pseries and pseries_le defconfigs from Cyril. - CXL in-kernel API from Mikey. - OPAL prd driver from Jeremy. - fix for DSCR handling & tests from Anshuman. - Powernv flash mtd driver from Cyril. - dynamic DMA Window support on powernv from Alexey. - LLVM clang fixes & workarounds from Anton. - reworked version of the patch to abort syscalls when transactional. - fix the swap encoding to support 4TB, from Aneesh. - various fixes as usual. - Freescale updates from Scott: Highlights include more 8xx optimizations, an e6500 hugetlb optimization, QMan device tree nodes, t1024/t1023 support, and various fixes and cleanup. * tag 'powerpc-4.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (180 commits) cxl: Fix typo in debug print cxl: Add CXL_KERNEL_API config option powerpc/powernv: Fix wrong IOMMU table in pnv_ioda_setup_bus_dma() powerpc/mm: Change the swap encoding in pte. powerpc/mm: PTE_RPN_MAX is not used, remove the same powerpc/tm: Abort syscalls in active transactions powerpc/iommu/ioda2: Enable compile with IOV=on and IOMMU_API=off powerpc/include: Add opal-prd to installed uapi headers powerpc/powernv: fix construction of opal PRD messages powerpc/powernv: Increase opal-irqchip initcall priority powerpc: Make doorbell check preemption safe powerpc/powernv: pnv_init_idle_states() should only run on powernv macintosh/nvram: Remove as unused powerpc: Don't use gcc specific options on clang powerpc: Don't use -mno-strict-align on clang powerpc: Only use -mtraceback=no, -mno-string and -msoft-float if toolchain supports it powerpc: Only use -mabi=altivec if toolchain supports it powerpc: Fix duplicate const clang warning in user access code vfio: powerpc/spapr: Support Dynamic DMA windows vfio: powerpc/spapr: Register memory and define IOMMU v2 ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-class-cxl33
-rw-r--r--Documentation/devicetree/bindings/powerpc/fsl/fman.txt13
-rw-r--r--Documentation/devicetree/bindings/powerpc/fsl/guts.txt5
-rw-r--r--Documentation/devicetree/bindings/soc/fsl/qman-portals.txt4
-rw-r--r--Documentation/powerpc/00-INDEX2
-rw-r--r--Documentation/powerpc/cxl.txt4
-rw-r--r--Documentation/powerpc/dscr.txt83
-rw-r--r--Documentation/powerpc/transactional_memory.txt32
-rw-r--r--Documentation/vfio.txt62
9 files changed, 217 insertions, 21 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-cxl b/Documentation/ABI/testing/sysfs-class-cxl
index d46bba8..acfe9df 100644
--- a/Documentation/ABI/testing/sysfs-class-cxl
+++ b/Documentation/ABI/testing/sysfs-class-cxl
@@ -6,6 +6,17 @@ Example: The real path of the attribute /sys/class/cxl/afu0.0s/irqs_max is
Slave contexts (eg. /sys/class/cxl/afu0.0s):
+What: /sys/class/cxl/<afu>/afu_err_buf
+Date: September 2014
+Contact: linuxppc-dev@lists.ozlabs.org
+Description: read only
+ AFU Error Buffer contents. The contents of this file are
+ application specific and depends on the AFU being used.
+ Applications interacting with the AFU can use this attribute
+ to know about the current error condition and take appropriate
+ action like logging the event etc.
+
+
What: /sys/class/cxl/<afu>/irqs_max
Date: September 2014
Contact: linuxppc-dev@lists.ozlabs.org
@@ -15,6 +26,7 @@ Description: read/write
that hardware can support (eg. 2037). Write values will limit
userspace applications to that many userspace interrupts. Must
be >= irqs_min.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/irqs_min
Date: September 2014
@@ -24,6 +36,7 @@ Description: read only
userspace must request on a CXL_START_WORK ioctl. Userspace may
omit the num_interrupts field in the START_WORK IOCTL to get
this minimum automatically.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/mmio_size
Date: September 2014
@@ -31,6 +44,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Decimal value of the size of the MMIO space that may be mmaped
by userspace.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/modes_supported
Date: September 2014
@@ -38,6 +52,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
List of the modes this AFU supports. One per line.
Valid entries are: "dedicated_process" and "afu_directed"
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/mode
Date: September 2014
@@ -46,6 +61,7 @@ Description: read/write
The current mode the AFU is using. Will be one of the modes
given in modes_supported. Writing will change the mode
provided that no user contexts are attached.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/prefault_mode
@@ -59,6 +75,7 @@ Description: read/write
descriptor as an effective address and
prefault what it points to.
all: all segments process calling START_WORK maps.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/reset
Date: September 2014
@@ -66,12 +83,14 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: write only
Writing 1 here will reset the AFU provided there are not
contexts active on the AFU.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/api_version
Date: September 2014
Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Decimal value of the current version of the kernel/user API.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/api_version_compatible
Date: September 2014
@@ -79,6 +98,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Decimal value of the the lowest version of the userspace API
this this kernel supports.
+Users: https://github.com/ibm-capi/libcxl
AFU configuration records (eg. /sys/class/cxl/afu0.0/cr0):
@@ -92,6 +112,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Hexadecimal value of the vendor ID found in this AFU
configuration record.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/cr<config num>/device
Date: February 2015
@@ -99,6 +120,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Hexadecimal value of the device ID found in this AFU
configuration record.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/cr<config num>/class
Date: February 2015
@@ -106,6 +128,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Hexadecimal value of the class code found in this AFU
configuration record.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>/cr<config num>/config
Date: February 2015
@@ -115,6 +138,7 @@ Description: read only
record. The format is expected to match the either the standard
or extended configuration space defined by the PCIe
specification.
+Users: https://github.com/ibm-capi/libcxl
@@ -126,18 +150,21 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Decimal value of the size of the MMIO space that may be mmaped
by userspace. This includes all slave contexts space also.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>m/pp_mmio_len
Date: September 2014
Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Decimal value of the Per Process MMIO space length.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<afu>m/pp_mmio_off
Date: September 2014
Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Decimal value of the Per Process MMIO space offset.
+Users: https://github.com/ibm-capi/libcxl
Card info (eg. /sys/class/cxl/card0)
@@ -147,12 +174,14 @@ Date: September 2014
Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Identifies the CAIA Version the card implements.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<card>/psl_revision
Date: September 2014
Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Identifies the revision level of the PSL.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<card>/base_image
Date: September 2014
@@ -162,6 +191,7 @@ Description: read only
that support loadable PSLs. For FPGAs this field identifies
the image contained in the on-adapter flash which is loaded
during the initial program load.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<card>/image_loaded
Date: September 2014
@@ -169,6 +199,7 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: read only
Will return "user" or "factory" depending on the image loaded
onto the card.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<card>/load_image_on_perst
Date: December 2014
@@ -183,6 +214,7 @@ Description: read/write
user or factory image to be loaded.
Default is to reload on PERST whichever image the card has
loaded.
+Users: https://github.com/ibm-capi/libcxl
What: /sys/class/cxl/<card>/reset
Date: October 2014
@@ -190,3 +222,4 @@ Contact: linuxppc-dev@lists.ozlabs.org
Description: write only
Writing 1 will issue a PERST to card which may cause the card
to reload the FPGA depending on load_image_on_perst.
+Users: https://github.com/ibm-capi/libcxl
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/fman.txt b/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
index edda55f..1fc5328 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/fman.txt
@@ -189,6 +189,19 @@ PROPERTIES
Definition: There is one reg region describing the port
configuration registers.
+- fsl,fman-10g-port
+ Usage: optional
+ Value type: boolean
+ Definition: The default port rate is 1G.
+ If this property exists, the port is s 10G port.
+
+- fsl,fman-best-effort-port
+ Usage: optional
+ Value type: boolean
+ Definition: Can be defined only if 10G-support is set.
+ This property marks a best-effort 10G port (10G port that
+ may not be capable of line rate).
+
EXAMPLE
port@a8000 {
diff --git a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt b/Documentation/devicetree/bindings/powerpc/fsl/guts.txt
index 7f150b5..b71b203 100644
--- a/Documentation/devicetree/bindings/powerpc/fsl/guts.txt
+++ b/Documentation/devicetree/bindings/powerpc/fsl/guts.txt
@@ -9,6 +9,11 @@ Required properties:
- compatible : Should define the compatible device type for
global-utilities.
+ Possible compatibles:
+ "fsl,qoriq-device-config-1.0"
+ "fsl,qoriq-device-config-2.0"
+ "fsl,<chip>-device-config"
+ "fsl,<chip>-guts"
- reg : Offset and length of the register set for the device.
Recommended properties:
diff --git a/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt b/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt
index 48c4dae..47e46cc 100644
--- a/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt
+++ b/Documentation/devicetree/bindings/soc/fsl/qman-portals.txt
@@ -47,7 +47,7 @@ PROPERTIES
For additional details about the PAMU/LIODN binding(s) see pamu.txt
-- fsl,qman-channel-id
+- cell-index
Usage: Required
Value type: <u32>
Definition: The hardware index of the channel. This can also be
@@ -136,7 +136,7 @@ The example below shows a (P4080) QMan portals container/bus node with two porta
reg = <0x4000 0x4000>, <0x101000 0x1000>;
interrupts = <106 2 0 0>;
fsl,liodn = <3 4>;
- fsl,qman-channel-id = <1>;
+ cell-index = <1>;
fman0 {
fsl,liodn = <0x22>;
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index 6fd0e8b..9dc845c 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
@@ -30,3 +30,5 @@ ptrace.txt
- Information on the ptrace interfaces for hardware debug registers.
transactional_memory.txt
- Overview of the Power8 transactional memory support.
+dscr.txt
+ - Overview DSCR (Data Stream Control Register) support.
diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
index 2c71ecc..2a230d01 100644
--- a/Documentation/powerpc/cxl.txt
+++ b/Documentation/powerpc/cxl.txt
@@ -133,6 +133,9 @@ User API
The following file operations are supported on both slave and
master devices.
+ A userspace library libcxl is avaliable here:
+ https://github.com/ibm-capi/libcxl
+ This provides a C interface to this kernel API.
open
----
@@ -366,6 +369,7 @@ Sysfs Class
enumeration and tuning of the accelerators. Its layout is
described in Documentation/ABI/testing/sysfs-class-cxl
+
Udev rules
==========
diff --git a/Documentation/powerpc/dscr.txt b/Documentation/powerpc/dscr.txt
new file mode 100644
index 0000000..1ff4400
--- /dev/null
+++ b/Documentation/powerpc/dscr.txt
@@ -0,0 +1,83 @@
+ DSCR (Data Stream Control Register)
+ ================================================
+
+DSCR register in powerpc allows user to have some control of prefetch of data
+stream in the processor. Please refer to the ISA documents or related manual
+for more detailed information regarding how to use this DSCR to attain this
+control of the pefetches . This document here provides an overview of kernel
+support for DSCR, related kernel objects, it's functionalities and exported
+user interface.
+
+(A) Data Structures:
+
+ (1) thread_struct:
+ dscr /* Thread DSCR value */
+ dscr_inherit /* Thread has changed default DSCR */
+
+ (2) PACA:
+ dscr_default /* per-CPU DSCR default value */
+
+ (3) sysfs.c:
+ dscr_default /* System DSCR default value */
+
+(B) Scheduler Changes:
+
+ Scheduler will write the per-CPU DSCR default which is stored in the
+ CPU's PACA value into the register if the thread has dscr_inherit value
+ cleared which means that it has not changed the default DSCR till now.
+ If the dscr_inherit value is set which means that it has changed the
+ default DSCR value, scheduler will write the changed value which will
+ now be contained in thread struct's dscr into the register instead of
+ the per-CPU default PACA based DSCR value.
+
+ NOTE: Please note here that the system wide global DSCR value never
+ gets used directly in the scheduler process context switch at all.
+
+(C) SYSFS Interface:
+
+ Global DSCR default: /sys/devices/system/cpu/dscr_default
+ CPU specific DSCR default: /sys/devices/system/cpu/cpuN/dscr
+
+ Changing the global DSCR default in the sysfs will change all the CPU
+ specific DSCR defaults immediately in their PACA structures. Again if
+ the current process has the dscr_inherit clear, it also writes the new
+ value into every CPU's DSCR register right away and updates the current
+ thread's DSCR value as well.
+
+ Changing the CPU specif DSCR default value in the sysfs does exactly
+ the same thing as above but unlike the global one above, it just changes
+ stuff for that particular CPU instead for all the CPUs on the system.
+
+(D) User Space Instructions:
+
+ The DSCR register can be accessed in the user space using any of these
+ two SPR numbers available for that purpose.
+
+ (1) Problem state SPR: 0x03 (Un-privileged, POWER8 only)
+ (2) Privileged state SPR: 0x11 (Privileged)
+
+ Accessing DSCR through privileged SPR number (0x11) from user space
+ works, as it is emulated following an illegal instruction exception
+ inside the kernel. Both mfspr and mtspr instructions are emulated.
+
+ Accessing DSCR through user level SPR (0x03) from user space will first
+ create a facility unavailable exception. Inside this exception handler
+ all mfspr isntruction based read attempts will get emulated and returned
+ where as the first mtspr instruction based write attempts will enable
+ the DSCR facility for the next time around (both for read and write) by
+ setting DSCR facility in the FSCR register.
+
+(E) Specifics about 'dscr_inherit':
+
+ The thread struct element 'dscr_inherit' represents whether the thread
+ in question has attempted and changed the DSCR itself using any of the
+ following methods. This element signifies whether the thread wants to
+ use the CPU default DSCR value or its own changed DSCR value in the
+ kernel.
+
+ (1) mtspr instruction (SPR number 0x03)
+ (2) mtspr instruction (SPR number 0x11)
+ (3) ptrace interface (Explicitly set user DSCR value)
+
+ Any child of the process created after this event in the process inherits
+ this same behaviour as well.
diff --git a/Documentation/powerpc/transactional_memory.txt b/Documentation/powerpc/transactional_memory.txt
index ded6979..ba0a2a4 100644
--- a/Documentation/powerpc/transactional_memory.txt
+++ b/Documentation/powerpc/transactional_memory.txt
@@ -74,22 +74,23 @@ Causes of transaction aborts
Syscalls
========
-Performing syscalls from within transaction is not recommended, and can lead
-to unpredictable results.
+Syscalls made from within an active transaction will not be performed and the
+transaction will be doomed by the kernel with the failure code TM_CAUSE_SYSCALL
+| TM_CAUSE_PERSISTENT.
-Syscalls do not by design abort transactions, but beware: The kernel code will
-not be running in transactional state. The effect of syscalls will always
-remain visible, but depending on the call they may abort your transaction as a
-side-effect, read soon-to-be-aborted transactional data that should not remain
-invisible, etc. If you constantly retry a transaction that constantly aborts
-itself by calling a syscall, you'll have a livelock & make no progress.
+Syscalls made from within a suspended transaction are performed as normal and
+the transaction is not explicitly doomed by the kernel. However, what the
+kernel does to perform the syscall may result in the transaction being doomed
+by the hardware. The syscall is performed in suspended mode so any side
+effects will be persistent, independent of transaction success or failure. No
+guarantees are provided by the kernel about which syscalls will affect
+transaction success.
-Simple syscalls (e.g. sigprocmask()) "could" be OK. Even things like write()
-from, say, printf() should be OK as long as the kernel does not access any
-memory that was accessed transactionally.
-
-Consider any syscalls that happen to work as debug-only -- not recommended for
-production use. Best to queue them up till after the transaction is over.
+Care must be taken when relying on syscalls to abort during active transactions
+if the calls are made via a library. Libraries may cache values (which may
+give the appearance of success) or perform operations that cause transaction
+failure before entering the kernel (which may produce different failure codes).
+Examples are glibc's getpid() and lazy symbol resolution.
Signals
@@ -176,8 +177,7 @@ kernel aborted a transaction:
TM_CAUSE_RESCHED Thread was rescheduled.
TM_CAUSE_TLBI Software TLB invalid.
TM_CAUSE_FAC_UNAV FP/VEC/VSX unavailable trap.
- TM_CAUSE_SYSCALL Currently unused; future syscalls that must abort
- transactions for consistency will use this.
+ TM_CAUSE_SYSCALL Syscall from active transaction.
TM_CAUSE_SIGNAL Signal delivered.
TM_CAUSE_MISC Currently unused.
TM_CAUSE_ALIGNMENT Alignment fault.
diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 96978ec..1dd3fdd 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
This implementation has some specifics:
-1) Only one IOMMU group per container is supported as an IOMMU group
-represents the minimal entity which isolation can be guaranteed for and
-groups are allocated statically, one per a Partitionable Endpoint (PE)
+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
+container is supported as an IOMMU table is allocated at the boot time,
+one table per a IOMMU group which is a Partitionable Endpoint (PE)
(PE is often a PCI domain but not always).
+Newer systems (POWER8 with IODA2) have improved hardware design which allows
+to remove this limitation and have multiple IOMMU groups per a VFIO container.
2) The hardware supports so called DMA windows - the PCI address range
within which DMA transfer is allowed, any attempt to access address space
@@ -385,6 +387,18 @@ The code flow from the example above should be slightly changed:
....
+ /* Inject EEH error, which is expected to be caused by 32-bits
+ * config load.
+ */
+ pe_op.op = VFIO_EEH_PE_INJECT_ERR;
+ pe_op.err.type = EEH_ERR_TYPE_32;
+ pe_op.err.func = EEH_ERR_FUNC_LD_CFG_ADDR;
+ pe_op.err.addr = 0ul;
+ pe_op.err.mask = 0ul;
+ ioctl(container, VFIO_EEH_PE_OP, &pe_op);
+
+ ....
+
/* When 0xFF's returned from reading PCI config space or IO BARs
* of the PCI device. Check the PE's state to see if that has been
* frozen.
@@ -427,6 +441,48 @@ The code flow from the example above should be slightly changed:
....
+5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/
+VFIO_IOMMU_DISABLE and implements 2 new ioctls:
+VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY
+(which are unsupported in v1 IOMMU).
+
+PPC64 paravirtualized guests generate a lot of map/unmap requests,
+and the handling of those includes pinning/unpinning pages and updating
+mm::locked_vm counter to make sure we do not exceed the rlimit.
+The v2 IOMMU splits accounting and pinning into separate operations:
+
+- VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls
+receive a user space address and size of the block to be pinned.
+Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to
+be called with the exact address and size used for registering
+the memory block. The userspace is not expected to call these often.
+The ranges are stored in a linked list in a VFIO container.
+
+- VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual
+IOMMU table and do not do pinning; instead these check that the userspace
+address is from pre-registered range.
+
+This separation helps in optimizing DMA for guests.
+
+6) sPAPR specification allows guests to have an additional DMA window(s) on
+a PCI bus with a variable page size. Two ioctls have been added to support
+this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE.
+The platform has to support the functionality or error will be returned to
+the userspace. The existing hardware supports up to 2 DMA windows, one is
+2GB long, uses 4K pages and called "default 32bit window"; the other can
+be as big as entire RAM, use different page size, it is optional - guests
+create those in run-time if the guest driver supports 64bit DMA.
+
+VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and
+a number of TCE table levels (if a TCE table is going to be big enough and
+the kernel may not be able to allocate enough of physically contiguous memory).
+It creates a new window in the available slot and returns the bus address where
+the new window starts. Due to hardware limitation, the user space cannot choose
+the location of DMA windows.
+
+VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window
+and removes it.
+
-------------------------------------------------------------------------------
[1] VFIO was originally an acronym for "Virtual Function I/O" in its