path: root/drivers/gpu/drm/i915/i915_gem.c
Age Commit message Author
2016-08-15Merge tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel into drm-nextDave Airlie
- refactor ddi buffer programming a bit (Ville)
- large-scale renaming to untangle naming in the gem code (Chris)
- rework vma/active tracking for accurately reaping idle mappings of shared objects (Chris)
- misc dp sst/mst probing corner case fixes (Ville)
- tons of cleanup&tunings all around in gem
- lockless (rcu-protected) request lookup, plus use it everywhere for non(b)locking waits (Chris)
- pipe crc debugfs fixes (Rodrigo)
- random fixes all over
* tag 'drm-intel-next-2016-08-08' of git://anongit.freedesktop.org/drm-intel: (222 commits)
  drm/i915: Update DRIVER_DATE to 20160808
  drm/i915: fix aliasing_ppgtt leak
  drm/i915: Update comment before i915_spin_request
  drm/i915: Use drm official vblank_no_hw_counter callback.
  drm/i915: Fix copy_to_user usage for pipe_crc
  Revert "drm/i915: Track active streams also for DP SST"
  drm/i915: fix WaInsertDummyPushConstPs
  drm/i915: Assert that the request hasn't been retired
  drm/i915: Repack fence tiling mode and stride into a single integer
  drm/i915: Document and reject invalid tiling modes
  drm/i915: Remove locking for get_tiling
  drm/i915: Remove pinned check from madvise ioctl
  drm/i915: Reduce locking inside swfinish ioctl
  drm/i915: Remove (struct_mutex) locking for busy-ioctl
  drm/i915: Remove (struct_mutex) locking for wait-ioctl
  drm/i915: Do a nonblocking wait first in pread/pwrite
  drm/i915: Remove unused no-shrinker-steal
  drm/i915: Tidy generation of the GTT mmap offset
  drm/i915/shrinker: Wait before acquiring struct_mutex under oom
  drm/i915: Simplify do_idling() (Ironlake vt-d w/a)
  ...
2016-08-14drm/i915: Unbind closed vma for i915_gem_object_unbind()Chris Wilson
Closed vma are removed from the obj->vma_list so that they cannot be found by userspace. However, this means that when forcibly unbinding an object, we have to wait upon all rendering to that object first in order for the closed, but active, vma to be reaped and their bindings removed. Reported-by: Matthew Auld <matthew.auld@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97343 Fixes: aa653a685d81 ("drm/i915: Be more careful when unbinding vma") Fixes: 8a3b3d576c93 ("drm/i915: Convert non-blocking userptr waits...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-2-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com> Tested-by: Matthew Auld <matthew.auld@intel.com>
2016-08-14drm/i915: Initialize return value for empty i915_gem_object_unbind()Chris Wilson
If the obj->vma_list is empty, we immediately return ret. However, we are doing so having never set it to any value, it should be zero! Reported-by: Matthew Auld <matthew.auld@intel.com> References: https://bugs.freedesktop.org/show_bug.cgi?id=97343 Fixes: aa653a685d81 ("drm/i915: Be more careful when unbinding vma") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471196681-30043-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Matthew Auld <matthew.auld@intel.com>
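The bug class described above can be sketched in plain C. This is a minimal userspace illustration, not the driver code: `struct sketch_vma_node`, its `unbind_result` field and `sketch_unbind_all()` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on obj->vma_list. */
struct sketch_vma_node {
	struct sketch_vma_node *next;
	int unbind_result;	/* value that unbinding this entry would return */
};

/* Unbind every entry, stopping at the first error. */
static inline int sketch_unbind_all(struct sketch_vma_node *head)
{
	/*
	 * Must start at zero: with an empty list the loop body never
	 * runs, and an uninitialised ret would be returned as-is.
	 */
	int ret = 0;

	for (struct sketch_vma_node *vma = head; vma; vma = vma->next) {
		ret = vma->unbind_result;
		if (ret)
			break;
	}

	return ret;
}
```

With the initialiser in place, calling the function on an empty list reports success rather than whatever happened to be on the stack.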
2016-08-12drm/i915: Support for creating write combined type vmapsChris Wilson
vmap has a provision for controlling the page protection bits, which we can use to control the mapping type, e.g. WB, WC, UC or even WT. To allow the caller to choose their mapping type, we add a parameter to i915_gem_object_pin_map - but we still only allow one vmap to be cached per object. If the object is currently not pinned, then we recreate the previous vmap with the new access type, but if it was pinned we report an error. This effectively limits the access via i915_gem_object_pin_map to a single mapping type for the lifetime of the object. Not usually a problem, but something to be aware of when setting up the object's vmap. We will want to vary the access type to enable WC mappings of ringbuffer and context objects on !llc platforms, as well as other objects where we need coherent access to the GPU's pages without going through the GTT. v2: Remove the redundant braces around pin count check and fix the marker in documentation (Chris) v3: - Add a new enum for the vmalloc mapping type & pass that as an argument to i915_object_pin_map. (Tvrtko) - Use PAGE_MASK to extract or filter the mapping type info and remove a superfluous BUG_ON.(Tvrtko) v4: - Rename the enums and clean up the pin_map function. (Chris) v5: Drop the VM_NO_GUARD, minor cosmetics. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Akash Goel <akash.goel@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
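The v3 note about using PAGE_MASK to extract the mapping type suggests that the type is carried in the low bits of the page-aligned vmap address. A minimal userspace sketch of that idea follows; the 4096-byte page size, the enum values and every `sketch_`/`SKETCH_` name are assumptions made for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size; the mask keeps the address bits, as PAGE_MASK does. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uintptr_t)(SKETCH_PAGE_SIZE - 1))

/* Hypothetical mapping types; the values only need to fit below the page size. */
enum sketch_map_type { SKETCH_MAP_WB = 0, SKETCH_MAP_WC = 1 };

/* Store the type in the otherwise-zero low bits of a page-aligned address. */
static inline uintptr_t sketch_pack_mapping(void *addr, enum sketch_map_type type)
{
	assert(((uintptr_t)addr & ~SKETCH_PAGE_MASK) == 0);
	return (uintptr_t)addr | (uintptr_t)type;
}

/* Recover the address by masking off the low bits. */
static inline void *sketch_mapping_addr(uintptr_t packed)
{
	return (void *)(packed & SKETCH_PAGE_MASK);
}

/* Recover the type from the low bits. */
static inline enum sketch_map_type sketch_mapping_type(uintptr_t packed)
{
	return (enum sketch_map_type)(packed & ~SKETCH_PAGE_MASK);
}
```

Keeping both values in one word means a single cached field describes the one vmap allowed per object, which matches the single-mapping-type limitation the message describes.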
2016-08-10drm/i915: Add missing rpm wakelock to GGTT preadChris Wilson
Joonas spotted a discrepancy between the pwrite and pread ioctls, in that pwrite takes the rpm wakelock around its GGTT access whereas pread did not. The wakelock is required in order for the GTT to function. In disregard for the current convention, we take the rpm wakelock around the access itself rather than around the struct_mutex as the nesting is not strictly required and such ordering will one day be fixed by explicitly noting the barrier dependencies between the GGTT and rpm. Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ...") Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit 1dd5b6f2020389e75bb3d269c038497f065e68c9) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-10drm/i915: Handle ENOSPC after failing to insert a mappable nodeChris Wilson
Even after adding individual page support for GTT mmapping, we can still fail to find any space within the mappable region, and drm_mm_insert_node() will then report ENOSPC. We have to then handle this error by using the shmem access to the pages. Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ... objects") Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468690956-23480-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (cherry picked from commit d1054ee492a89b134fb0ac527b0714c277ae9c0f) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-10drm/i915: Flush GT idle status upon resetChris Wilson
Upon resetting the GPU, we force the engines to be idle by clearing their request lists. However, I neglected to clear the GT active status and so the next request following the reset was not marking the device as busy again. (We had to wait until any outstanding retire worker finally ran and cleared the active status.) Fixes: 67d97da34917 ("drm/i915: Only start retire worker when idle") Testcase: igt/pm_rps/reset Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1468397438-21226-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> (cherry picked from commit b913b33c43db849778f044d4b9e74b167898a9bc) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2016-08-10drm/i915: Move missed interrupt detection from hangcheck to breadcrumbsChris Wilson
In commit 2529d57050af ("drm/i915: Drop racy markup of missed-irqs from idle-worker") the racy detection of missed interrupts was removed when we went idle. This however opened up the issue that the stuck waiters were not being reported, causing a test case failure. If we move the stuck waiter detection out of hangcheck and into the breadcrumb mechanism (i.e. the waiter) itself, we can avoid this issue entirely. This leaves hangcheck looking for a stuck GPU (inspecting for request advancement and HEAD motion), and breadcrumbs looking for a stuck waiter - hopefully making both easier to understand by their segregation. v2: Reduce the error message as we now run independently of hangcheck, and the hanging batch used by igt also counts as a stuck waiter causing extra warnings in dmesg. v3: Move the breadcrumb's hangcheck kickstart to the first missed wait. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97104 Fixes: 2529d57050af ("drm/i915: Drop racy markup of missed-irqs...") Testcase: igt/drv_missed_irq Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470761272-1245-2-git-send-email-chris@chris-wilson.co.uk
2016-08-10drm/i915: Always mark the writer as also a read for busy ioctlChris Wilson
One of the few guarantees we want the busy ioctl to provide is that the reported busy writer is included in the set of busy read engines. This should be provided by the ordering of setting and retiring the active trackers, but we can do better by explicitly setting the busy read engine flag for the last writer. v2: More comments inside __busy_write_id() to explain why both fields are set. Fixes: 3fdc13c7a3cb ("drm/i915: Remove (struct_mutex) locking for busy-ioctl") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470762505-12799-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
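The guarantee above can be sketched with two small helpers. The word layout used here (writer id in the low 16 bits, one read flag per engine in the high 16 bits) and the `sketch_` names are assumptions for illustration; only the name `__busy_write_id()` comes from the message itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit (16 + id) marks engine 'id' as a busy reader. */
static inline uint32_t sketch_busy_read_flag(unsigned int id)
{
	assert(id < 16);
	return UINT32_C(0x10000) << id;
}

/*
 * Report engine 'id' as the last writer. The read flag for the same
 * engine is set explicitly so the writer always appears among the readers,
 * independent of the order in which the active trackers are retired.
 */
static inline uint32_t sketch_busy_write_id(unsigned int id)
{
	return (uint32_t)id | sketch_busy_read_flag(id);
}
```

A caller that only inspects the read mask therefore still sees the object as busy on the writing engine.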
2016-08-09drm/i915: Add smp_rmb() to busy ioctl's RCU danceChris Wilson
In the debate as to whether the second read of active->request is ordered after the dependent reads of the first read of active->request, just give in and throw a smp_rmb() in there so that ordering of loads is assured. v2: Explain the manual smp_rmb() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470731014-6894-1-git-send-email-chris@chris-wilson.co.uk
2016-08-09drm/i915: Don't check for idleness before retiring after a GPU hangChris Wilson
When we force the cleanup after a GPU hang, we want to retire all requests, or else we may leak them if truly wedged (and the GPU never advances again). Converting to the active request helpers had the issue of doing the check against busyness before reporting the request, so if we claim the GPU had hung but this engine hadn't we could potentially skip the request cleanup - triggering the self-check BUG. Fixes: dcff85c8443e ("drm/i915: Enable i915_gem_wait_for_idle() ...") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470728222-10243-3-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Repack fence tiling mode and stride into a single integerChris Wilson
In the previous commit, we moved the obj->tiling_mode out of a bitfield and into its own integer so that we could safely use READ_ONCE(). Let us now repair some of that damage by sharing the tiling_mode with its companion, the fence stride. v2: New magic Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-18-git-send-email-chris@chris-wilson.co.uk
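One way to share a single integer between the tiling mode and the fence stride is to rely on the stride being a multiple of some minimum, which leaves its low bits free for the mode. The sketch below assumes a minimum stride of 128 bytes; that value and all `sketch_`/`SKETCH_` names are illustrative assumptions rather than the driver's definitions.

```c
#include <assert.h>

/* Assumed minimum fence stride; strides are multiples of this value. */
#define SKETCH_MIN_STRIDE  128u
#define SKETCH_TILING_MASK (SKETCH_MIN_STRIDE - 1)	/* low 7 bits: tiling mode */
#define SKETCH_STRIDE_MASK (~SKETCH_TILING_MASK)	/* remaining bits: stride */

/* Combine the two values into one word that can be read in a single load. */
static inline unsigned int sketch_pack_tiling(unsigned int tiling, unsigned int stride)
{
	assert((stride & SKETCH_TILING_MASK) == 0);
	assert(tiling <= SKETCH_TILING_MASK);
	return stride | tiling;
}

static inline unsigned int sketch_get_tiling(unsigned int packed)
{
	return packed & SKETCH_TILING_MASK;
}

static inline unsigned int sketch_get_stride(unsigned int packed)
{
	return packed & SKETCH_STRIDE_MASK;
}
```

Because both fields live in one integer, a lockless reader sees a mode and stride that were written together.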
2016-08-05drm/i915: Remove pinned check from madvise ioctlChris Wilson
We don't need to incur the overhead of checking whether the object is pinned prior to changing its madvise. If the object is pinned, the madvise will not take effect until it is unpinned and so we cannot free the pages being pointed at by hardware. Marking a pinned object with allocated pages as DONTNEED will not trigger any undue warnings. The check is therefore superfluous, and by removing it we can remove a linear walk over all the vma the object has. Still despite it being an overzealous check, that error code is part of the current ABI and so we must proceed with caution. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-15-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Reduce locking inside swfinish ioctlChris Wilson
We only need to take the struct_mutex if the object is pinned to the display engine and so requires checking for clflush. (The race with userspace pinning the object to a framebuffer is irrelevant.) v2: Use access once for compiler hints (or not as it is a bitfield) v3: READ_ONCE, obj->pin_display is not a bitfield anymore v4: Don't be creative with goto. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-14-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Remove (struct_mutex) locking for busy-ioctlChris Wilson
By applying the same logic as for wait-ioctl, we can query whether a request has completed without holding struct_mutex. The biggest impact system-wide is removing the flush_active and the contention that causes. Testcase: igt/gem_busy Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Akash Goel <akash.goel@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-13-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Remove (struct_mutex) locking for wait-ioctlChris Wilson
With a bit of care (and leniency) we can iterate over the object and wait for previous rendering to complete with judicious use of atomic reference counting. The ABI requires us to ensure that an active object is eventually flushed (like the busy-ioctl) which is guaranteed by our management of requests (i.e. everything that is submitted to hardware is flushed in the same request). All we have to do is ensure that we can detect when the requests are complete for reporting when the object is idle (without triggering ETIME), locklessly - this is handled by i915_gem_active_wait_unlocked(). The impact of this is actually quite small - the return to userspace following the wait was already lockless and so we don't see much gain in latency improvement upon completing the wait. What we do achieve here is completing an already finished wait without hitting the struct_mutex, our hold is quite short and so we are typically just a victim of contention rather than a cause - but it is still one less contention point! v2: Break up a long line. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-12-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Do a nonblocking wait first in pread/pwriteChris Wilson
If we try and read or write to an active request, we first must wait upon the GPU completing that request. Let's do that without holding the mutex (and so allow someone else to access the GPU whilst we wait). Upon completion, we will acquire the mutex and only then start the operation (i.e. we do not rely on state from before the initial wait). v2: Repaint the goto labels v3: Move the tracepoints back to the start of the ioctls Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-11-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Tidy generation of the GTT mmap offsetChris Wilson
If we make the observation that mmap-offsets are only released when we free an object, we can then deduce that the shrinker only creates free space in the mmap arena indirectly by flushing the request list and freeing expired objects. If we combine this with the lockless vma-manager and lockless idling, we can avoid taking our big struct_mutex until we need to actually free the requests. One side-effect is that we defer the madvise checking until we need the pages (i.e. the fault handler). This brings us into line with the other delayed checks (and madvise in general). v2: s/ret/err/ and use if (!err) rather than if (ret == 0) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-9-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Enable i915_gem_wait_for_idle() without holding struct_mutexChris Wilson
The principal motivation for this was to try and eliminate the struct_mutex from i915_gem_suspend - but we still need to hold the mutex currently for the i915_gem_context_lost(). (The issue there is that there may be an indirect lockdep cycle between cpu_hotplug (i.e. suspend) and struct_mutex via the stop_machine().) For the moment, enabling last request tracking for the engine, allows us to do busyness checking and waiting without requiring the struct_mutex - which is useful in its own right. As a side-effect of having a robust means for tracking engine busyness, we can replace our other busyness heuristic, that of comparing against the last submitted seqno. For paranoid reasons, we have a semi-ordered check of that seqno inside the hangchecker, which we can now improve to an ordered check of the engine's busyness (removing a locked xchg in the process). v2: Pass along "bool interruptible" as being unlocked we cannot rely on i915->mm.interruptible being stable or even under our control. v3: Replace check Ironlake i915_gpu_busy() with the common precalculated value Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-6-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Remove forced stop ring on suspend/unloadChris Wilson
Before suspending (or unloading), we would first wait upon all rendering to be completed and then disable the rings. This latter step is a remnant from DRI1 days when we did not use request tracking for all operations upon the ring. Now that we are sure we are waiting upon the very last operation by the engine, we can forgo clobbering the ring registers, though we do keep the assert that the engine is indeed idle before sleeping. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-5-git-send-email-chris@chris-wilson.co.uk
2016-08-05drm/i915: Convert non-blocking waits for requests over to using RCUChris Wilson
We can completely avoid taking the struct_mutex around the non-blocking waits by switching over to the RCU request management (trading the mutex for an RCU read lock and some complex atomic operations). The improvement is that we gain further contention reduction, and overall the code becomes simpler due to the reduced mutex dancing. v2: Move i915_gem_fault tracepoint back to the start of the function, before the unlocked wait. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470388464-28458-2-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Enable lockless lookup of request tracking via RCUChris Wilson
If we enable RCU for the requests (providing a grace period where we can inspect a "dead" request before it is freed), we can allow callers to carefully perform lockless lookup of an active request. However, by enabling deferred freeing of requests, we can potentially hog a lot of memory when dealing with tens of thousands of requests per second - with a quick insertion of a synchronize_rcu() inside our shrinker callback, that issue disappears. v2: Currently, it is our responsibility to handle reclaim i.e. to avoid hogging memory with the delayed slab frees. At the moment, we wait for a grace period in the shrinker, and block for all RCU callbacks on oom. Suggested alternatives focus on flushing our RCU callback when we have a certain number of outstanding request frees, and blocking on that flush after a second high watermark. (So rather than wait for the system to run out of memory, we stop issuing requests - both are nondeterministic.) Paul E. McKenney wrote: Another approach is synchronize_rcu() after some largish number of requests. The advantage of this approach is that it throttles the production of callbacks at the source. The corresponding disadvantage is that it slows things up. Another approach is to use call_rcu(), but if the previous call_rcu() is still in flight, block waiting for it. Yet another approach is the get_state_synchronize_rcu() / cond_synchronize_rcu() pair. The idea is to do something like this: cond_synchronize_rcu(cookie); cookie = get_state_synchronize_rcu(); You would of course do an initial get_state_synchronize_rcu() to get things going. This would not block unless there was less than one grace period's worth of time between invocations. But this assumes a busy system, where there is almost always a grace period in flight. 
But you can make that happen as follows: cond_synchronize_rcu(cookie); cookie = get_state_synchronize_rcu(); call_rcu(&my_rcu_head, noop_function); Note that you need additional code to make sure that the old callback has completed before doing a new one. Setting and clearing a flag with appropriate memory ordering control suffices (e.g., smp_load_acquire() and smp_store_release()). v3: More comments on compiler and processor order of operations within the RCU lookup and discover we can use rcu_access_pointer() here instead. v4: Wrap i915_gem_active_get_rcu() to take the rcu_read_lock itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: "Goel, Akash" <akash.goel@intel.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Move i915_gem_object_wait_rendering()Chris Wilson
Just move it earlier so that we can use the companion nonblocking version in a couple of more callsites without having to add a forward declaration. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-24-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Move obj->active:5 to obj->flagsChris Wilson
We are motivated to avoid using a bitfield for obj->active for a couple of reasons. Firstly, we wish to document our lockless read of obj->active using READ_ONCE inside i915_gem_busy_ioctl() and that requires an integral type (i.e. not a bitfield). Secondly, gcc produces abysmal code when presented with a bitfield and that shows up high on the profiles of request tracking (mainly due to excess memory traffic as it converts the bitfield to a register and back and generates frequent AGI in the process). v2: BIT, break up a long line in compute the other engines, new paint for i915_gem_object_is_active (now i915_gem_object_get_active). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-23-git-send-email-chris@chris-wilson.co.uk
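Moving a bitfield into an ordinary flags word typically looks like the following userspace sketch. The shift, the five-bit width (mirroring `obj->active:5`), the volatile-cast stand-in for READ_ONCE and every `sketch_`/`SKETCH_` name are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>

#define SKETCH_ACTIVE_SHIFT 0
#define SKETCH_ACTIVE_BITS  5	/* one bit per engine, as in obj->active:5 */
#define SKETCH_ACTIVE_MASK  (((1ul << SKETCH_ACTIVE_BITS) - 1) << SKETCH_ACTIVE_SHIFT)

/* Userspace stand-in for READ_ONCE(): a single volatile load of the word. */
#define SKETCH_READ_ONCE(x) (*(const volatile unsigned long *)&(x))

struct sketch_object {
	unsigned long flags;	/* integral type, so it can be loaded whole */
};

/* Lockless snapshot of the per-engine active bits. */
static inline unsigned long sketch_get_active(const struct sketch_object *obj)
{
	return (SKETCH_READ_ONCE(obj->flags) & SKETCH_ACTIVE_MASK) >> SKETCH_ACTIVE_SHIFT;
}

static inline void sketch_set_active(struct sketch_object *obj, unsigned int engine)
{
	assert(engine < SKETCH_ACTIVE_BITS);
	obj->flags |= 1ul << (engine + SKETCH_ACTIVE_SHIFT);
}

static inline void sketch_clear_active(struct sketch_object *obj, unsigned int engine)
{
	assert(engine < SKETCH_ACTIVE_BITS);
	obj->flags &= ~(1ul << (engine + SKETCH_ACTIVE_SHIFT));
}
```

The accessors hide the shift and mask, so callers read much as they did with the bitfield while the underlying word remains addressable.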
2016-08-04drm/i915: Use atomics to manipulate obj->frontbuffer_bitsChris Wilson
The individual bits inside obj->frontbuffer_bits are protected by each plane->mutex, but the whole bitfield may be accessed by multiple KMS operations simultaneously and so the RMW need to be under atomics. However, for updating the single field we do not need to mandate that it be under the struct_mutex, one more step towards its removal as the de facto BKL. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-21-git-send-email-chris@chris-wilson.co.uk
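The read-modify-write hazard described above can be illustrated with C11 atomics standing in for the kernel's atomic operations. This is a sketch only: `struct sketch_fb_object` and `sketch_track_fb()` are hypothetical names, and the kernel uses its own atomic API rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical object carrying a word of per-plane frontbuffer bits. */
struct sketch_fb_object {
	atomic_uint frontbuffer_bits;
};

/*
 * Move ownership of 'bits' from the old object to the new one. Each
 * update is a single atomic RMW, so concurrent callers touching
 * different bits of the same word cannot lose each other's changes.
 */
static inline void sketch_track_fb(struct sketch_fb_object *old_obj,
				   struct sketch_fb_object *new_obj,
				   unsigned int bits)
{
	if (old_obj)
		atomic_fetch_and(&old_obj->frontbuffer_bits, ~bits);
	if (new_obj)
		atomic_fetch_or(&new_obj->frontbuffer_bits, bits);
}
```

Each plane still serialises changes to its own bit under its own mutex; the atomics only protect the shared word those bits live in.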
2016-08-04drm/i915: Make fb_tracking.lock a spinlockChris Wilson
We only need a very lightweight mechanism here as the locking is only used for co-ordinating a bitfield. v2: Move the cheap unlikely tests into the caller v3: Move the kerneldoc into the header (now separated out into intel_frontbuffer.h for better kerneldoc and readability) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-20-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Separate intel_frontbuffer into its own headerChris Wilson
In view of adding inline functions into the intel_frontbuffer section, we first split the header into its own file so that we can integrate it more easily with kerneldoc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-19-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Remove highly confusing i915_gem_obj_ggtt_pin()Chris Wilson
Since i915_gem_obj_ggtt_pin() is an idiom breaking curry function for i915_gem_object_ggtt_pin(), spare us the confusion and remove it. Removing it now simplifies later patches to change the i915_vma_pin() (and friends) interface. v2: Add a redundant GEM_BUG_ON(!view) to i915_gem_obj_lookup_or_create_ggtt_vma() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-18-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Make i915_vma_pin() small and inlineChris Wilson
Not only is i915_vma_pin() called for every single object on every single execbuf, it is usually a simple increment as the VMA is already bound for execution by the GPU. Rearrange the tests for unbound and pin_count overflow so that we can do the increment and test very cheaply and compact enough to inline the operation into execbuf. The trick used is to note that we can check for an overflow bit (keeping space available for it inside the flags) at the same time as checking the binding bits. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-17-git-send-email-chris@chris-wilson.co.uk
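The overflow-bit trick can be sketched as follows. The bit positions, the 4-bit pin count, the slow-path behaviour and all `sketch_`/`SKETCH_` names are assumptions chosen for the example; the driver's actual flag layout and slow path may differ.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_PIN_MASK     0x0fu	/* low bits: pin count */
#define SKETCH_PIN_OVERFLOW (1u << 4)	/* set when the count wraps past 15 */
#define SKETCH_GLOBAL_BIND  (1u << 5)
#define SKETCH_LOCAL_BIND   (1u << 6)
#define SKETCH_BIND_MASK    (SKETCH_GLOBAL_BIND | SKETCH_LOCAL_BIND | SKETCH_PIN_OVERFLOW)

struct sketch_vma {
	unsigned int flags;	/* pin count, overflow bit and bind bits together */
};

/* Slow path: undo an overflowing pin, or perform the missing binding. */
static inline int sketch_vma_do_pin(struct sketch_vma *vma, unsigned int want)
{
	if (vma->flags & SKETCH_PIN_OVERFLOW) {
		vma->flags--;	/* back to the maximum count, overflow bit clear */
		return -EBUSY;
	}
	vma->flags |= want & (SKETCH_GLOBAL_BIND | SKETCH_LOCAL_BIND);
	return 0;
}

/*
 * Fast path: one increment and one masked compare. The XOR is zero in
 * the bind bits only if the vma is bound exactly as requested, and zero
 * in the overflow bit only if the increment did not wrap the count.
 */
static inline int sketch_vma_pin(struct sketch_vma *vma, unsigned int want)
{
	if (((++vma->flags ^ want) & SKETCH_BIND_MASK) == 0)
		return 0;

	return sketch_vma_do_pin(vma, want);
}
```

In this sketch the first pin of an unbound vma takes the slow path to bind it, later pins take only the fast path, and a sixteenth pin is rejected with the count left unchanged.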
2016-08-04drm/i915: Combine all i915_vma bitfields into a single set of flagsChris Wilson
In preparation to perform some magic to speed up i915_vma_pin(), which is among the hottest of hot paths in execbuf, refactor all the bitfields accessed by i915_vma_pin() into a single unified set of flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-16-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Start passing around i915_vma from execbufferChris Wilson
During execbuffer we look up the i915_vma in order to reserve them in the VM. However, we then do a double lookup of the vma in order to then pin them, all because we lack the necessary interfaces to operate on i915_vma - so introduce i915_vma_pin()! v2: Tidy parameter lists to remove one level of redirection in the hot path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-15-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Wrap vma->pin_count accessors with small inline helpersChris Wilson
In the next few patches, the VMA pinning API is overhauled and to reduce the churn we pull out the update to the accessors into a prep patch. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-14-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Record allocated vma sizeChris Wilson
Tracking the size of the VMA as allocated allows us to dramatically reduce the complexity of later functions (like inserting the VMA in to the drm_mm range manager). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-13-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Update i915_gem_get_ggtt_size/_alignment to use drm_i915_privateChris Wilson
For consistency, internal functions should take drm_i915_private rather than drm_device. Now that we are subclassing drm_device, there are no more size wins, but being consistent is its own blessing. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-12-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Update the GGTT size/alignment query functionsChris Wilson
In order to be consistent with other address space functions, we want to pass around 64-bit sizes, even though all known global GTT are limited to 4GiB. Similarly, we are trying to be consistent in using the _ggtt_ nomenclature when referring to the special global GTT. v2: Update docs to consistently state "global GTT". Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-11-git-send-email-chris@chris-wilson.co.uk
2016-08-04drm/i915: Convert 4096 alignment request to 0 for drm_mm allocationsChris Wilson
As we always allocate in chunks of 4096 (that being both the PAGE_SIZE and our own GTT_PAGE_SIZE), we know that all results from the drm_mm are aligned to at least 4096. The drm_mm allocator itself is optimised for alignment == 0, and so by converting alignments of 4096 to 0 we can satisfy our own requirements and still hit the faster path. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-10-git-send-email-chris@chris-wilson.co.uk
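The conversion amounts to a small normalisation step before calling the allocator. The helper below is a hypothetical sketch: the name, the constant and the power-of-two precondition are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed GTT page size; every allocation is already a multiple of it. */
#define SKETCH_GTT_PAGE_SIZE 4096u

/*
 * Any alignment up to the page size is satisfied automatically, so it
 * can be passed on as 0 to take the allocator's faster path.
 */
static inline uint64_t sketch_mm_alignment(uint64_t alignment)
{
	/* Alignments are expected to be zero or a power of two. */
	assert((alignment & (alignment - 1)) == 0);
	return alignment <= SKETCH_GTT_PAGE_SIZE ? 0 : alignment;
}
```

Larger alignments pass through unchanged, so callers with stricter requirements are unaffected.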
2016-08-04drm/i915: Split insertion/binding of an object into the VMChris Wilson
Split the insertion into the address space's range manager and binding of that object into the GTT to simplify the code flow when pinning a VMA. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-9-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Reduce WARN(i915_gem_valid_gtt_space) to a debug-only check Chris Wilson
i915_gem_valid_gtt_space() is used after inserting the VMA to double check the list - the location should have been chosen to pass all the restrictions. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-8-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Pad GTT views of exec objects up to user specified size Chris Wilson
Our GPUs impose certain requirements upon buffers that depend upon how exactly they are used. Typically this means that they require a larger surface than would be naively computed by pitch * height. Normally such requirements are hidden away in the userspace driver, but when we accept pointers from strangers and later impose extra conditions on them, the original client allocator has no idea about the monstrosities in the GPU and we require the userspace driver to inform the kernel how many padding pages are required beyond the client allocation. v2: Long time, no see v3: Try an anonymous union for uapi struct compatibility Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-7-git-send-email-chris@chris-wilson.co.uk
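As a rough sketch of the padding idea, the size of the GTT view can be thought of as the larger of the object's own size and the size requested by userspace, both in whole pages. The helper names and the rounding policy below are assumptions for illustration only, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ull /* page size as stated in this series */

/* Hypothetical helper: round a byte count up to a whole number of pages. */
uint64_t sketch_round_up_to_page(uint64_t bytes)
{
	/* Guard against wrap-around in the addition below. */
	assert(bytes <= UINT64_MAX - (SKETCH_PAGE_SIZE - 1));
	return (bytes + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: the GTT view covers at least the object, and is
 * extended up to the user-specified padded size when that is larger.
 * A pad_to_size of 0 means "no padding requested".
 */
uint64_t sketch_padded_view_size(uint64_t object_size, uint64_t pad_to_size)
{
	uint64_t size = sketch_round_up_to_page(object_size);
	uint64_t padded = sketch_round_up_to_page(pad_to_size);

	return padded > size ? padded : size;
}
```

For example, a three-page object that userspace asks to pad to five pages would occupy five pages of address space, while the same object with no padding request keeps its three pages.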
2016-08-04 drm/i915: Fix up vma alignment to be u64 Chris Wilson
This is not the full fix, as we are required to percolate the u64 nature down through the drm_mm stack, but this is required now to prevent explosions due to mismatch between execbuf (eb_vma_misplaced) and vma binding (i915_vma_misplaced) - and reduces the risk of spurious changes as we adjust the vma interface in the next patches. v2: long long casts not required for u64 printk (%llx) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-6-git-send-email-chris@chris-wilson.co.uk
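One way such a mismatch could arise is sketched below: two placement checks that differ only in the width used for the alignment can disagree once the alignment no longer fits in 32 bits. Both functions are hypothetical stand-ins written for this illustration; they are not the bodies of eb_vma_misplaced or i915_vma_misplaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check that keeps the alignment at its full 64-bit width. */
bool sketch_misplaced_u64(uint64_t offset, uint64_t alignment)
{
	/* An alignment is expected to be zero or a power of two. */
	assert(alignment == 0 || (alignment & (alignment - 1)) == 0);
	return alignment != 0 && (offset & (alignment - 1)) != 0;
}

/* Hypothetical check that stores the alignment in 32 bits first. */
bool sketch_misplaced_u32(uint64_t offset, uint64_t alignment)
{
	uint32_t narrow = (uint32_t)alignment; /* high bits are lost here */

	return narrow != 0 && (offset & (narrow - 1)) != 0;
}
```

With an alignment of 4GiB and an offset of 4096, the 64-bit check reports the placement as misaligned while the 32-bit check sees an alignment of 0 and accepts it, so two code paths built on the different widths would never agree on where the vma belongs.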
2016-08-04 drm/i915: Remove surplus drm_device parameter to i915_gem_evict_something() Chris Wilson
Eviction is VM local, so we can ignore the significance of the drm_device in the caller, and leave it to i915_gem_evict_something() to manage itself. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470324762-2545-2-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Add missing rpm wakelock to GGTT pread Chris Wilson
Joonas spotted a discrepancy between the pwrite and pread ioctls, in that pwrite takes the rpm wakelock around its GGTT access but pread does not. The wakelock is required in order for the GTT to function. In disregard for the current convention, we take the rpm wakelock around the access itself rather than around the struct_mutex as the nesting is not strictly required and such ordering will one day be fixed by explicitly noting the barrier dependencies between the GGTT and rpm. Fixes: b50a53715f09 ("drm/i915: Support for pread/pwrite ...") Reported-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ankitprasad Sharma <ankitprasad.r.sharma@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Link: http://patchwork.freedesktop.org/patch/msgid/1470298193-21765-1-git-send-email-chris@chris-wilson.co.uk Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2016-08-04 Revert "drm/i915: Clean up associated VMAs on context destruction" Chris Wilson
This reverts commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae. The patch was only a stop-gap measure that fixed half the problem - the leak of the fbcon when restarting X. A complete solution required releasing the VMA when the object itself was closed rather than relying on file/process exit. The previous patches add the VMA tracking necessary to close them along with the object, context or file, and so the time has come to remove the partial fix. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-28-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Mark the context and address space as closed Chris Wilson
When the user closes the context, mark it and the dependent address space as closed. As we use an asynchronous destruct method, this has two purposes. First, it allows us to flag the closed context and detect internal errors if we try to create any new objects for it (as it is removed from the user's namespace, these should be internal bugs only). And secondly, it allows us to immediately reap stale vma. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-27-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Release vma when the handle is closed Chris Wilson
In order to prevent a leak of the vma on shared objects, we need to hook into the object_close callback to destroy the vma on the object for this file. However, if we destroyed that vma immediately we may cause unexpected application stalls as we try to unbind a busy vma - hence we defer the unbind to when we retire the vma. v2: Keep vma allocated until closed. This is useful for a later optimisation, but it is required now in order to handle potential recursion of i915_vma_unbind() by retiring itself. v3: Comments are important. Testcase: igt/gem_ppggtt/flink-and-close-vma-leak Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-26-git-send-email-chris@chris-wilson.co.uk
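The deferred unbind described above can be sketched with a small state machine. Everything here is hypothetical (the struct, its fields, and the function names); it only illustrates the idea that closing marks the vma, and the actual unbind happens either immediately when the vma is idle or later when its last use retires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a vma. */
struct sketch_vma {
	bool closed;         /* the owning handle has been closed */
	bool bound;          /* still occupies address space */
	unsigned int active; /* number of outstanding uses by the GPU */
};

void sketch_vma_unbind(struct sketch_vma *vma)
{
	vma->bound = false;
}

/* Closing never stalls: a busy vma is only marked, not unbound. */
void sketch_vma_close(struct sketch_vma *vma)
{
	vma->closed = true;
	if (vma->active == 0)
		sketch_vma_unbind(vma);
}

/* Called when one outstanding use completes; reaps a closed, idle vma. */
void sketch_vma_retire(struct sketch_vma *vma)
{
	assert(vma->active > 0);
	vma->active--;
	if (vma->active == 0 && vma->closed)
		sketch_vma_unbind(vma);
}
```

In this sketch the vma structure itself stays allocated across the close, which loosely mirrors the v2 note about keeping the vma allocated until it is closed and retired.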
2016-08-04 drm/i915: Track active vma requests Chris Wilson
Hook the vma itself into the i915_gem_request_retire() so that we can accurately track when a solitary vma is inactive (as opposed to having to wait for the entire object to be idle). This improves the interaction when using multiple contexts (with full-ppgtt) and eliminates some frequent list walking when retiring objects after a completed request. A side-effect is that we get an active vma reference for free. The consequence of this is shown in the next patch... v2: Update inline names to be consistent with i915_gem_object_get_active() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-25-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: i915_vma_move_to_active prep patch Chris Wilson
This patch is broken out of the next just to remove the code motion from that patch and make it more readable. What we do here is move the i915_vma_move_to_active() to i915_gem_execbuffer.c and put the three stages (read, write, fenced) together so that future modifications to active handling are all located in the same spot. The importance of this is so that we can more simply control the order in which the requests are placed in the retirement list (i.e. control the order in which we retire and so control the lifetimes to avoid having to hold onto references). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-24-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Move request list retirement to i915_gem_request.c Chris Wilson
As the list retirement is now clean of implementation details, we can move it closer to the request management. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-23-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: s/__i915_wait_request/i915_wait_request/ Chris Wilson
There is only one wait on request function now, so drop the "expert" indication of leading __. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-21-git-send-email-chris@chris-wilson.co.uk
2016-08-04 drm/i915: Refactor activity tracking for requests Chris Wilson
With the introduction of requests, we amplified the number of atomic refcounted objects we use and update every execbuffer; from none to several references, and a set of references that need to be changed. We also introduced interesting side-effects in the order of retiring requests and objects. Instead of independently tracking the last request for an object, track the active objects for each request. The object will reside in the buffer list of its most recent active request and so we reduce the kref interchange to a list_move. Now retirements are entirely driven by the request, dramatically simplifying activity tracking on the objects themselves, and removing the ambiguity between retiring objects and retiring requests. Furthermore with the consolidation of managing the activity tracking centrally, we can look forward to using RCU to enable lockless lookup of the current active requests for an object. In the future, we will be able to query the status or wait upon rendering to an object without even touching the struct_mutex BKL. All told, less code, simpler and faster, and more extensible. v2: Add a typedef for the function pointer for convenience later. v3: Make the noop retirement callback explicit. Allow passing NULL to the init_request_active() which is expanded to a common noop function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1470293567-10811-16-git-send-email-chris@chris-wilson.co.uk
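The request-driven tracking described above can be illustrated with a small userspace sketch. All `sketch_` names are hypothetical, the list is a hand-rolled stand-in for the kernel's list helpers, and reference counting and locking are left out entirely; the point is only that marking a tracker active is a list move, and that retirement walks the request's own list.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly-linked list, standing in for the kernel's. */
struct sketch_list { struct sketch_list *prev, *next; };

void sketch_list_init(struct sketch_list *l) { l->prev = l->next = l; }

void sketch_list_del(struct sketch_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	sketch_list_init(n);
}

void sketch_list_move_tail(struct sketch_list *n, struct sketch_list *head)
{
	sketch_list_del(n);
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct sketch_request { struct sketch_list active_list; };

struct sketch_active {
	struct sketch_list link;
	struct sketch_request *request; /* most recent request, NULL if idle */
	void (*retire)(struct sketch_active *active);
};

void sketch_noop_retire(struct sketch_active *active) { (void)active; }

/* Passing NULL for retire selects the common no-op callback. */
void sketch_active_init(struct sketch_active *active,
			void (*retire)(struct sketch_active *))
{
	sketch_list_init(&active->link);
	active->request = NULL;
	active->retire = retire ? retire : sketch_noop_retire;
}

void sketch_request_init(struct sketch_request *rq)
{
	sketch_list_init(&rq->active_list);
}

/* Marking a tracker active moves it onto the newest request's list. */
void sketch_active_set(struct sketch_active *active, struct sketch_request *rq)
{
	assert(rq != NULL);
	sketch_list_move_tail(&active->link, &rq->active_list);
	active->request = rq;
}

/* Retirement is driven by the request; returns how many trackers retired. */
int sketch_request_retire(struct sketch_request *rq)
{
	int retired = 0;

	while (rq->active_list.next != &rq->active_list) {
		struct sketch_list *n = rq->active_list.next;
		struct sketch_active *active = (struct sketch_active *)
			((char *)n - offsetof(struct sketch_active, link));

		sketch_list_del(n);
		active->request = NULL;
		active->retire(active);
		retired++;
	}
	return retired;
}
```

Because a tracker only ever sits on the list of its most recent request, retiring an older request finds nothing left to do for it, which is the simplification the commit message describes.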