linux - NXP/Freescale LSDK linux tree with Scalys patches

author	Alex Williamson <alex.williamson@redhat.com>	2015-02-06 17:58:56 (GMT)
committer	Alex Williamson <alex.williamson@redhat.com>	2015-02-06 17:58:56 (GMT)
commit	6fe1010d6d9c02cf3556ab076585104551a6ee7e (patch)
tree	a4067ec65d2adef950cd233db2998c725b0a6905 /Documentation/firmware_class
parent	e36f014edff70fc02b3d3d79cead1d58f289332e (diff)
download	linux-6fe1010d6d9c02cf3556ab076585104551a6ee7e.tar.xz

vfio/type1: DMA unmap chunking

When unmapping DMA entries we try to rely on the IOMMU API behavior that allows the IOMMU to unmap a larger area than requested, up to the size of the original mapping. This works great when the IOMMU supports superpages *and* they're in use. Otherwise, each PAGE_SIZE increment is unmapped separately, resulting in poor performance.

Instead we can use the IOVA-to-physical-address translation provided by the IOMMU API and unmap using the largest contiguous physical memory chunk available, which is also how vfio/type1 would have mapped the region. For a synthetic 1TB guest VM mapping and shutdown test on Intel VT-d (2M IOMMU pagesize support), this achieves about a 30% overall improvement mapping standard 4K pages, regardless of IOMMU superpage enabling, and about a 40% improvement mapping 2M hugetlbfs pages when IOMMU superpages are not available. Hugetlbfs with IOMMU superpages enabled is effectively unchanged.

Unfortunately the same algorithm does not work well on IOMMUs with fine-grained superpages, like AMD-Vi, costing about 25% extra since the IOMMU will automatically unmap any power-of-two contiguous mapping we've provided it. We add a routine and a domain flag to detect this feature, leaving AMD-Vi unaffected by this unmap optimization.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
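
The chunking idea described above can be illustrated with a small userspace sketch. This is not the kernel code from this commit: `struct toy_domain`, `toy_iova_to_phys()`, `toy_contig_len()` and `toy_unmap_chunked()` are hypothetical names, the "domain" is just an array of per-page physical addresses, and the page size is assumed to be 4K. It only shows how probing the IOVA-to-physical translation page by page yields the largest physically contiguous chunk, so that one unmap request can cover the whole chunk instead of one request per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumed 4K base page size */

/*
 * Hypothetical stand-in for an IOMMU domain: phys[i] is the physical
 * address backing the i-th page of an IOVA range that starts at 0.
 */
struct toy_domain {
    const uint64_t *phys;
    size_t npages;
};

/* Toy analogue of an IOVA-to-physical-address lookup. */
static uint64_t toy_iova_to_phys(const struct toy_domain *d, uint64_t iova)
{
    size_t idx = (size_t)(iova / TOY_PAGE_SIZE);

    assert(idx < d->npages);
    return d->phys[idx];
}

/*
 * Return the length in bytes of the physically contiguous run that
 * starts at iova, never extending past end.
 */
static uint64_t toy_contig_len(const struct toy_domain *d,
                               uint64_t iova, uint64_t end)
{
    uint64_t phys = toy_iova_to_phys(d, iova);
    uint64_t len = TOY_PAGE_SIZE;

    /* Extend the chunk while the next page is physically adjacent. */
    while (iova + len < end &&
           toy_iova_to_phys(d, iova + len) == phys + len)
        len += TOY_PAGE_SIZE;

    return len;
}

/*
 * Walk the whole range and count how many unmap requests would be
 * issued if each request covered one contiguous chunk.
 */
static size_t toy_unmap_chunked(const struct toy_domain *d)
{
    uint64_t end = (uint64_t)d->npages * TOY_PAGE_SIZE;
    uint64_t iova = 0;
    size_t calls = 0;

    while (iova < end) {
        iova += toy_contig_len(d, iova, end); /* one request per chunk */
        calls++;
    }
    return calls;
}
```

For a six-page range backed by physical addresses 0x10000, 0x11000, 0x12000, 0x40000, 0x41000 and 0x90000, this sketch issues three requests (three pages, two pages, one page) where a page-at-a-time walk would issue six.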

Diffstat (limited to 'Documentation/firmware_class')

0 files changed, 0 insertions, 0 deletions

