Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"First off, the ACPICA code in the kernel is updated to upstream
revision 20160831 that brings in a few bug fixes and cleanups. In
particular, it is possible to mask GPEs now (and the sysfs interface
for GPE control is fixed on top of that), problems related to the
table loading mechanism are fixed and all code related to FADT version
2 (which has never been part of the ACPI specification) is dropped.
On the new features front, there is a new watchdog driver based on the
ACPI WDAT (ACPI Watchdog Action Table), needed on some platforms to
replace the iTCO watchdog that doesn't work there, and some UART
devices get new definitions of built-in properties (to be accessed via
the generic device properties API).
Also, included is a fix for an ACPI-related PCI resorces allocation
issue and a few problems in the EC driver and in the button and
battery drivers are fixed.
In addition to that, the ACPI CPPC library is updated to make batching
of requests sent over the PCC channel possible (which reduces the PCC
usage overhead substantially in some cases) and to support functional
fixed hardware (FFH) type of CPPC registers access (which will allow
CPPC to be used on x86 too in the future).
As usual, there are some assorted fixes and cleanups too.
Specifics:
- Update of the ACPICA code in the kernel to upstream revision
20160831 with the following major changes:
* New mechanism for GPE masking.
* Fixes for issues related to the LoadTable operator and table
loading.
* Fixes for issues related to so-called module-level code (MLC),
that is AML that doesn't belong to any methods.
* Change of the return value of the _OSI method to reflect the
Windows behavior.
* GAS (Generic Address Structure) support fix related to 32-bit
FADT addresses.
* Elimination of unnecessary FADT version 2 support.
* ACPI tools fixes and cleanups.
From Bob Moore, Lv Zheng, and Jung-uk Kim.
- ACPI sysfs interface updates to fix GPE handling (on top of the new
GPE masking mechanism in ACPICA) and issues related to table
loading (Lv Zheng).
- New watchdog driver based on the ACPI WDAT (ACPI Watchdog Action
Table), needed on some platforms to replace the iTCO watchdog that
doesn't work there and related updates of the intel_pmc_ipc,
i2c/i801 and MFD/lcp_ich drivers (Mika Westerberg).
- Driver core fix to prevent it from leaking secondary fwnode objects
during device removal (Lukas Wunner).
- New definitions of built-in properties for UART in ACPI-based x86
SoC drivers and a 8250_dw driver quirk for the APM X-Gene SoC
(Heikki Krogerus).
- New device ID for the Vulcan SPI controller and constification of
local strucures in the AMD SoC (APD) ACPI driver (Kamlakant Patel,
Julia Lawall).
- Fix for a bug causing the allocation of PCI resorces to fail if
ACPI-enumerated child platform devices are registered below the PCI
devices in question (Mika Westerberg).
- Change of the default polarity for PCI legacy IRQs to high on
systems booting wth ACPI on platforms with a GIC interrupt
controller model fixing the discrepancy between the specification
and HW behavior (Lorenzo Pieralisi).
- Fixes for the handling of system suspend/resume in the ACPI EC
driver and update of that driver to make it cope with the cases
when the EC device defined in the ECDT has to be used throughout
the entire system life cycle (Lv Zheng).
- Update of the ACPI CPPC library to allow it to batch requests sent
over the PCC channel (to reduce overhead), to support the fixed
functional hardware (FFH) CPPC registers access type, to notify the
mailbox framework about TX completions when the interrupt flag is
set for the PCC mailbox, and to support HW-Reduced Communication
Subspace type 2 (Ashwin Chaugule, Prashanth Prakash, Srinivas
Pandruvada, Hoan Tran).
- ACPI button driver fix and documentation update related to the
handling of laptop lids (Lv Zheng).
- ACPI battery driver initialization fix (Carlos Garnacho).
- ACPI GPIO enumeration documentation update (Mika Westerberg).
- Assorted updates of the core ACPI bus type code (Lukas Wunner, Lv
Zheng).
- Assorted cleanups of the ACPI table parsing code and the
x86-specific ACPI code (Al Stone).
- Fixes for assorted ACPI-related issues found in linux-next (Wei
Yongjun)"
* tag 'acpi-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (98 commits)
ACPI / documentation: Use recommended name in GPIO property names
watchdog: wdat_wdt: Fix warning for using 0 as NULL
watchdog: wdat_wdt: fix return value check in wdat_wdt_probe()
platform/x86: intel_pmc_ipc: Do not create iTCO watchdog when WDAT table exists
i2c: i801: Do not create iTCO watchdog when WDAT table exists
mfd: lpc_ich: Do not create iTCO watchdog when WDAT table exists
ACPI / bus: Adjust ACPI subsystem initialization for new table loading mode
ACPICA: Parser: Fix a regression in LoadTable support
ACPICA: Tables: Fix "UNLOAD" code path lock issues
ACPI / watchdog: Add support for WDAT hardware watchdog
ACPI / platform: Pay attention to parent device's resources
PCI: Add pci_find_resource()
ACPI / CPPC: Support PCC with interrupt flag
ACPI / sysfs: Update sysfs signature handling code
ACPI / sysfs: Fix an issue for LoadTable opcode
ACPICA: Tables: Fix a regression in acpi_tb_find_table()
ACPI / tables: Remove duplicated include from tables.c
ACPI / APD: constify local structures
x86: ACPI: make variable names clearer in acpi_parse_madt_lapic_entries()
x86: ACPI: remove extraneous white space after semicolon
...
|
|
* pm-cpuidle:
ARM: cpuidle: Fix error return code
* pm-opp:
PM / OPP: Don't support OPP if it provides supported-hw but platform does not
PM / OPP: avoid maybe-uninitialized warning
* pm-avs:
PM / AVS: SmartReflex: Neaten logging
|
|
* pm-domains:
PM / Domains: Rename pm_genpd_sync_poweron|poweroff()
PM / Domains: Don't measure latency of ->power_on|off() during system PM
PM / Domains: Remove redundant system PM callbacks
PM / Domains: Simplify detaching a device from its genpd
PM / Domains: Allow holes in genpd_data.domains array
PM / Domains: Add support for removing nested PM domains by provider
PM / Domains: Add support for removing PM domains
PM / Domains: Store the provider in the PM domain structure
PM / Domains: Prepare for adding support to remove PM domains
PM / Domains: Verify the PM domain is present when adding a provider
PM / Domains: Don't expose xlate and provider helper functions
PM / Domains: Don't expose generic_pm_domain structure to clients
staging: board: Remove calls to of_genpd_get_from_provider()
ARM: EXYNOS: Remove calls to of_genpd_get_from_provider()
PM / Domains: Add new helper functions for device-tree
PM / Domains: Always enable debugfs support if available
|
|
* device-properties:
serial: 8250_dw: Add quirk for APM X-Gene SoC
ACPI / LPSS: Provide build-in properties of the UART
ACPI / APD: Provide build-in properties of the UART
driver core: Don't leak secondary fwnode on device removal
|
|
The OPP framework allows each OPP to set a opp-supported-hw property
which provides values that are matched against supported_hw values
provided by the platform to limit support for certain OPPs on specific
hardware. Currently, if the platform does not set supported_hw values,
all OPPs are interpreted as supported, even if they have provided their
own opp-supported-hw values.
If an OPP has provided opp-supported-hw, it is indicating that there is
some specific hardware configuration it is supported by. These constraints
should be honored, and if no supported_hw has been provided by the
platform, there is no way to determine if that OPP is actually supported,
so it should be marked as not supported.
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
These are internal static functions to genpd. Let's conform to the naming
rules, by dropping the "pm_" prefix from these.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Measure latency does by itself contribute to an increased latency, thus we
should avoid it when it isn't needed.
Currently genpd measures latencies in the system PM phase for the
->power_on|off() callbacks, except in the syscore case when it's not
allowed to use ktime_get() as timekeeping may be suspended.
Since there should be plenty of occasions during runtime PM to perform
these measurements, let's rely on that and drop them from system PM. This
will also make it consistent for how measurements are done of the runtime
PM callbacks (as those may be invoked during system PM).
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In cases when the PM domain haven't assigned a system PM callback, the PM
core fall-backs to check for the callback at the driver level instead.
This makes it redundant to assign a pm_generic_* helper function to a
corresponding system PM callback at a PM domain level.
Therefore, let's remove these assignments in pm_genpd_init().
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There's no need to validate the PM domain by using genpd_lookup_dev() when
removing the device via genpd's genpd_dev_pm_detach() function. That's
because this function can't be called, unless there is a valid PM domain
for the device.
To simplify the behaviour, let's move code from pm_genpd_remove_device()
into a new internal function, genpd_remove_device(), which is called from
pm_genpd_remove_device() and genpd_dev_pm_detach().
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fix from Mark Brown:
"A fix for an issue with double locking that was introduced earlier
this release. I'd missed in review that we were already in a locked
region when trying to drop part of the cache"
* tag 'regmap-fix-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: fix deadlock on _regmap_raw_write() error path
|
|
Commit 815806e39bf6 ("regmap: drop cache if the bus transfer error")
added a call to regcache_drop_region() to error path in
_regmap_raw_write(). However that path runs with regmap lock taken,
and regcache_drop_region() tries to re-take it, causing a deadlock.
Fix that by calling map->cache_ops->drop() directly.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When CONFIG_OPTIMIZE_INLINING is set and we are building with -Wmaybe-uninitialized
enabled, we can get a warning for the opp core driver:
drivers/base/power/opp/core.c: In function 'dev_pm_opp_set_rate':
drivers/base/power/opp/core.c:560:8: warning: 'ou_volt_min' may be used uninitialized in this function [-Wmaybe-uninitialized]
This has only now appeared as a result of commit 797da5598f3a ("PM / devfreq:
Add COMPILE_TEST for build coverage"), which makes the driver visible in
some configurations that didn't have it before.
The warning is a false positive that I got with gcc-6.1.1, but there is
a simple workaround in removing the local variables that we get warnings
for (all three are affected depending on the configuration). This also
makes the code easier to read.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In platforms such as Rockchip's, the array of domains isn't always
filled without holes, as which domains are present depend on the
particular SoC revision.
By allowing holes to be in the array, such SoCs can still use a single
set of constants to index the array of power domains.
Fixes: 0159ec670763 (PM / Domains: Verify the PM domain is present when adding a provider)
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Further testing with false negatives suppressed by commit 293e2421fe25
("rcu: Remove superfluous versions of rcu_read_lock_sched_held()")
identified a few more unprotected uses of RCU from the idle loop.
Because RCU actively ignores idle-loop code (for energy-efficiency
reasons, among other things), using RCU from the idle loop can result
in too-short grace periods, in turn resulting in arbitrary misbehavior.
The affected function is rpm_suspend().
The resulting lockdep-RCU splat is as follows:
------------------------------------------------------------------------
Warning from omap3
===============================
[ INFO: suspicious RCU usage. ]
4.6.0-rc5-next-20160426+ #1112 Not tainted
-------------------------------
include/trace/events/rpm.h:63 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
#0: (&(&dev->power.lock)->rlock){-.-...}, at: [<c052ee24>] __pm_runtime_suspend+0x54/0x84
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1112
Hardware name: Generic OMAP36xx (Flattened Device Tree)
[<c0110308>] (unwind_backtrace) from [<c010c3a8>] (show_stack+0x10/0x14)
[<c010c3a8>] (show_stack) from [<c047fec8>] (dump_stack+0xb0/0xe4)
[<c047fec8>] (dump_stack) from [<c052d7b4>] (rpm_suspend+0x604/0x7e4)
[<c052d7b4>] (rpm_suspend) from [<c052ee34>] (__pm_runtime_suspend+0x64/0x84)
[<c052ee34>] (__pm_runtime_suspend) from [<c04bf3bc>] (omap2_gpio_prepare_for_idle+0x5c/0x70)
[<c04bf3bc>] (omap2_gpio_prepare_for_idle) from [<c01255e8>] (omap_sram_idle+0x140/0x244)
[<c01255e8>] (omap_sram_idle) from [<c0126b48>] (omap3_enter_idle_bm+0xfc/0x1ec)
[<c0126b48>] (omap3_enter_idle_bm) from [<c0601db8>] (cpuidle_enter_state+0x80/0x3d4)
[<c0601db8>] (cpuidle_enter_state) from [<c0183c74>] (cpu_startup_entry+0x198/0x3a0)
[<c0183c74>] (cpu_startup_entry) from [<c0b00c0c>] (start_kernel+0x354/0x3c8)
[<c0b00c0c>] (start_kernel) from [<8000807c>] (0x8000807c)
------------------------------------------------------------------------
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If a device supports PM domains that are subdomains of another PM
domain, then the PM domains should be removed in reverse order to
ensure that the subdomains are removed first. Furthermore, if there is
more than one provider, then there needs to be a way to remove the
domains in reverse order for a specific provider.
Add the function of_genpd_remove_last() to remove the last PM domain
added by a given PM domain provider and return the generic_pm_domain
structure for the PM domain that was removed.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The genpd framework allows users to add PM domains via the pm_genpd_init()
function, however, there is no corresponding function to remove a PM
domain. For most devices this may be fine as the PM domains are never
removed, however, for devices that wish to populate the PM domains from
within a driver, having the ability to remove a PM domain if the probing
of the device fails or the driver is unloaded is necessary.
Add the function pm_genpd_remove() to remove a PM domain by referencing
it's generic_pm_domain structure. Note that the bulk of the code that
removes the PM domain is placed in a separate local function
genpd_remove() (which is called by pm_genpd_remove()). The code is
structured in this way to prepare for adding another function to remove
a PM domain by provider that will also call genpd_remove(). Note that
users of genpd_remove() must call this function with the mutex,
gpd_list_lock, held.
PM domains can only be removed if the associated provider has been
removed, they are not a parent domain to another PM domain and have no
devices associated with them.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
It is possible that a device has more than one provider of PM domains
and to support the removal of a PM domain by provider, it is necessary
to store a reference to the provider in the PM domain structure.
Therefore, store a reference to the firmware node handle in the PM
domain structure and populate it when providers (only device-tree based
providers are currently supported by PM domains) are registered.
Please note that when removing PM domains, it is necessary to verify
that the PM domain provider has been removed from the list of providers
before the PM domain can be removed. To do this add another member to
the PM domain structure that indicates if the provider is present and
set this member accordingly when providers are added and removed.
Initialise the 'provider' and 'has_provider' members of the
generic_pm_domain structure when a PM domains is added by calling
pm_genpd_init().
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In order to remove PM domains safely from the list of PM domains,
it is necessary to adding locking for the PM domain list around any
places where devices or subdomains are added to a PM domain.
There are places where a reference to a PM domain is obtained via
calling of_genpd_get_from_provider() before adding the device or the
subdomain. In these cases a lock for the PM domain list needs to be
held around the call to of_genpd_get_from_provider() and the call to
add the device/subdomain. To avoid deadlocks by multiple attempts to
obtain the PM domain list lock, add functions genpd_add_device() and
genpd_add_subdomain() which require the user to hold the PM domain
list lock when calling.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
When a PM domain provider is added, there is currently no way to tell if
any PM domains associated with the provider are present. Naturally, the
PM domain provider should not be registered if the PM domains have not
been added. Nonetheless, verify that the PM domain(s) associated with a
provider are present when registering the PM domain provider.
This change adds a dependency on the function pm_genpd_present() when
CONFIG_PM_GENERIC_DOMAINS_OF is enabled and so ensure this function is
available when CONFIG_PM_GENERIC_DOMAINS_OF selected.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Functions __of_genpd_xlate_simple(), __of_genpd_xlate_onecell() and
__of_genpd_add_provider() are not used outside of the core generic PM
domain code. Therefore, reduce the number of APIs exposed by making
these static. At the same time don't expose the typedef for
genpd_xlate_t either and make this a local definition as well.
The functions are renamed to follow the naming conventions for static
functions in the generic PM domain core.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There should be no need to expose the generic_pm_domain structure to
clients and this eliminates the need to implement reference counting for
any external reference to a PM domain. Therefore, make the functions
pm_genpd_lookup_dev() and of_genpd_get_from_provider() private to the
PM domain core. The functions are renamed in accordance with the naming
conventions for genpd static functions.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Ideally, if we are returning a reference to a PM domain via a call to
of_genpd_get_from_provider(), then we should keep track of such
references via a reference count. The reference count could then be used
to determine if a PM domain can be safely removed. Alternatively, it is
possible to avoid such external references by providing APIs to access
the PM domain and hence, eliminate any calls to
of_genpd_get_from_provider().
Add new helper functions for adding a device and a subdomain to a PM
domain when using device-tree, so that external calls to
of_genpd_get_from_provider() can be removed.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap fixes from Mark Brown:
"Several fixes here, the main one being the change from Lars-Peter
which I'd been letting soak in -next since the merge window in case it
uncovered further issues as it's a minimal fix rather than a change
addressing the root cause of the problems (which would've been too
invasive for -rc):
- The biggest change is a fix from Lars-Peter to ensure that we don't
create overlapping rbtree nodes which in turn avoids returning
corrupt cache values to users, fixing some issues that were exposed
by some recent optimisations with certain access patterns but had
been present for a long time.
- A fix from Elaine Zhang to stop us updating the cache if we get an
I/O error when writing to the hardware.
- A fix fromm Maarten ter Huurne to avoid uninitialized defaults in
cases where we have non-readable registers but are initializing the
cache by reading from the device"
* tag 'regmap-fix-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: drop cache if the bus transfer error
regmap: rbtree: Avoid overlapping nodes
regmap: cache: Fix num_reg_defaults computation from reg_defaults_raw
|
|
|
|
|
|
This commit appends a few _rcuidle suffixes to fix the following
RCU-used-from-idle bug:
> ===============================
> [ INFO: suspicious RCU usage. ]
> 4.6.0-rc5-next-20160426+ #1116 Not tainted
> -------------------------------
> include/trace/events/rpm.h:95 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> 1 lock held by swapper/0/0:
> #0: (&(&dev->power.lock)->rlock){-.-...}, at: [<c052cc2c>] __rpm_callback+0x58/0x60
>
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1116
> Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [<c0110290>] (unwind_backtrace) from [<c010c3a8>] (show_stack+0x10/0x14)
> [<c010c3a8>] (show_stack) from [<c047fd68>] (dump_stack+0xb0/0xe4)
> [<c047fd68>] (dump_stack) from [<c052d5d0>] (rpm_suspend+0x580/0x768)
> [<c052d5d0>] (rpm_suspend) from [<c052ec58>] (__pm_runtime_suspend+0x64/0x84)
> [<c052ec58>] (__pm_runtime_suspend) from [<c04bf25c>] (omap2_gpio_prepare_for_idle+0x5c/0x70)
> [<c04bf25c>] (omap2_gpio_prepare_for_idle) from [<c0125568>] (omap_sram_idle+0x140/0x244)
> [<c0125568>] (omap_sram_idle) from [<c01269dc>] (omap3_enter_idle_bm+0xfc/0x1ec)
> [<c01269dc>] (omap3_enter_idle_bm) from [<c0601bdc>] (cpuidle_enter_state+0x80/0x3d4)
> [<c0601bdc>] (cpuidle_enter_state) from [<c0183b08>] (cpu_startup_entry+0x198/0x3a0)
> [<c0183b08>] (cpu_startup_entry) from [<c0b00c0c>] (start_kernel+0x354/0x3c8)
> [<c0b00c0c>] (start_kernel) from [<8000807c>] (0x8000807c)
In the immortal words of Steven Rostedt, "*Whack* *Whack* *Whack*!!!"
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
WhACKED-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
This commit applies another _rcuidle suffix to fix an RCU use from
idle.
> ===============================
> [ INFO: suspicious RCU usage. ]
> 4.6.0-rc5-next-20160426+ #1122 Not tainted
> -------------------------------
> include/trace/events/rpm.h:69 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> 1 lock held by swapper/0/0:
> #0: (&(&dev->power.lock)->rlock){-.-...}, at: [<c052e3dc>] __pm_runtime_resume+0x3c/0x64
>
> stack backtrace:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1122
> Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [<c0110290>] (unwind_backtrace) from [<c010c3a8>] (show_stack+0x10/0x14)
> [<c010c3a8>] (show_stack) from [<c047fd68>] (dump_stack+0xb0/0xe4)
> [<c047fd68>] (dump_stack) from [<c052e178>] (rpm_resume+0x5cc/0x7f4)
> [<c052e178>] (rpm_resume) from [<c052e3ec>] (__pm_runtime_resume+0x4c/0x64)
> [<c052e3ec>] (__pm_runtime_resume) from [<c04bf2c4>] (omap2_gpio_resume_after_idle+0x54/0x68)
> [<c04bf2c4>] (omap2_gpio_resume_after_idle) from [<c01269dc>] (omap3_enter_idle_bm+0xfc/0x1ec)
> [<c01269dc>] (omap3_enter_idle_bm) from [<c060198c>] (cpuidle_enter_state+0x80/0x3d4)
> [<c060198c>] (cpuidle_enter_state) from [<c0183b08>] (cpu_startup_entry+0x198/0x3a0)
> [<c0183b08>] (cpu_startup_entry) from [<c0b00c0c>] (start_kernel+0x354/0x3c8)
> [<c0b00c0c>] (start_kernel) from [<8000807c>] (0x8000807c)
Reported-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If device_add_property_set() is called for a device, a secondary fwnode
is allocated and assigned to the device but currently not freed once the
device is removed.
This can be triggered on Apple Macs if a Thunderbolt device is plugged
in on boot since Apple's NHI EFI driver sets a number of properties for
that device which are leaked on unplug.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
regmap_write
->_regmap_raw_write
-->regcache_write first and than use map->bus->write to wirte i2c or spi
But if the i2c or spi transfer failed, But the cache is updated, So if I use
regmap_read will get the cache data which is not the real register value.
Signed-off-by: Elaine Zhang <zhangqing@rock-chips.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Debugfs support for PM domains is only enabled if both CONFIG_PM_DEBUG
and CONFIG_PM_ADVANCED_DEBUG are enabled. CONFIG_PM_ADVANCED_DEBUG is
described as "extra PM attributes in sysfs for low-level
debugging/testing" which does not seem related.
Given that the debugfs for PM domains only allows users to view the
state of the PM domains, always enable debugfs support for PM domains
if PM domains and debugfs support is enabled.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"A few more fixes and cleanups in the x86-64 low-level hibernation
code, PM core, cpufreq (Kconfig and intel_pstate), and the operating
points framework.
Specifics:
- Prevent the low-level assembly hibernate code on x86-64 from
referring to __PAGE_OFFSET directly as a symbol which doesn't work
when the kernel identity mapping base is randomized, in which case
__PAGE_OFFSET is a variable (Rafael Wysocki).
- Avoid selecting CPU_FREQ_STAT by default as the statistics are not
required for proper cpufreq operation (Borislav Petkov).
- Add Skylake-X and Broadwell-X IDs to the intel_pstate's list of
processors where out-of-band (OBB) control of P-states is possible
and if that is in use, intel_pstate should not attempt to manage
P-states (Srinivas Pandruvada).
- Drop some unnecessary checks from the wakeup IRQ handling code in
the PM core (Markus Elfring).
- Reduce the number operating performance point (OPP) lookups in one
of the OPP framework's helper functions (Jisheng Zhang)"
* tag 'pm-extra-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/power/64: Do not refer to __PAGE_OFFSET from assembly code
cpufreq: Do not default-yes CPU_FREQ_STAT
cpufreq: intel_pstate: Add more out-of-band IDs
PM / OPP: optimize dev_pm_opp_set_rate() performance a bit
PM-wakeup: Delete unnecessary checks before three function calls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC for 4.8
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after
shutdown for QNAP TS-41x
- s3c: clock fixes"
* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
rtc: rv8803: Clear V1F when setting the time
rtc: rv8803: Stop the clock while setting the time
rtc: rv8803: Always apply the I²C workaround
rtc: rv8803: Fix read day of week
rtc: rv8803: Remove the check for valid time
rtc: rv8803: Kconfig: Indicate rx8900 support
rtc: asm9260: remove .owner field for driver
rtc: at91sam9: Fix missing spin_lock_init()
rtc: m41t80: add suspend handlers for alarm IRQ
rtc: m41t80: make it a real error message
rtc: pcf85063: Add support for the PCF85063A device
rtc: pcf85063: fix year range
rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
rtc: s3c: Remove unnecessary call to disable already disabled clock
rtc: abx80x: use devm_add_action_or_reset()
rtc: m41t80: use devm_add_action_or_reset()
rtc: fix a typo and reduce three empty lines to one
rtc: s35390a: improve two comments in .set_alarm
...
|
|
* pm-sleep:
x86/power/64: Do not refer to __PAGE_OFFSET from assembly code
* pm-cpufreq:
cpufreq: Do not default-yes CPU_FREQ_STAT
cpufreq: intel_pstate: Add more out-of-band IDs
* pm-core:
PM-wakeup: Delete unnecessary checks before three function calls
* pm-opp:
PM / OPP: optimize dev_pm_opp_set_rate() performance a bit
|
|
When searching for a suitable node that should be used for inserting a new
register, which does not fall within the range of any existing node, we not
only looks for nodes which are directly adjacent to the new register, but
for nodes within a certain proximity. This is done to avoid creating lots
of small nodes with just a few registers spacing in between, which would
increase memory usage as well as tree traversal time.
This means there might be multiple node candidates which fall within the
proximity range of the new register. If we choose the first node we
encounter, under certain register insertion patterns it is possible to end
up with overlapping ranges. This will break order in the rbtree and can
cause the cached register value to become corrupted.
E.g. take the simplified example where the proximity range is 2 and the
register insertion sequence is 1, 4, 2, 3, 5.
* Insert of register 1 creates a new node, this is the root of the rbtree
* Insert of register 4 creates a new node, which is inserted to the right
of the root.
* Insert of register 2 gets inserted to the first node
* Insert of register 3 gets inserted to the first node
* Insert of register 5 also gets inserted into the first node since
this is the first node encountered and it is within the proximity range.
Now there are two overlapping nodes.
To avoid this always choose the node that is closest to the new register.
This will ensure that nodes will not overlap. The tree traversal is still
done as a binary search, we just don't stop at the first node found. So the
complexity of the algorithm stays within the same order.
Ideally if a new register is in the range of two adjacent blocks those
blocks should be merged, but that is a much more invasive change and left
for later.
The issue was initially introduced in commit 472fdec7380c ("regmap: rbtree:
Reduce number of nodes, take 2"), but became much more exposed by commit
6399aea629b0 ("regmap: rbtree: When adding a reg do a bsearch for target
node") which changed the order in which nodes are looked-up.
Fixes: 6399aea629b0 ("regmap: rbtree: When adding a reg do a bsearch for target node")
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This creates needless memory pressure and delays loading because we have
to copy from kernel memory to somewhere else. Let's add a
request_firmware_into_buf() API that allows drivers to request firmware
be loaded directly into a pre-allocated buffer. This skips the
intermediate step of allocating a buffer in kernel memory to hold the
firmware image while it's read from the filesystem. It also requires
that drivers know how much memory they'll require before requesting the
firmware and negates any benefits of firmware caching because the
firmware layer doesn't manage the buffer lifetime.
For a 16MB buffer, about half the time is spent performing a memcpy from
the buffer to the final resting place. I see loading times go from
0.081171 seconds to 0.047696 seconds after applying this patch. Plus
the vmalloc pressure is reduced.
This is based on a patch from Vikram Mulukutla on codeaurora.org:
https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=0a328c5f6cd999f5c591f172216835636f39bcb5
Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Some low memory systems with complex peripherals cannot afford to have
the relatively large firmware images taking up valuable memory during
suspend and resume. Change the internal implementation of
firmware_class to disallow caching based on a configurable option. In
the near future, variants of request_firmware will take advantage of
this feature.
Link: http://lkml.kernel.org/r/20160607164741.31849-3-stephen.boyd@linaro.org
[stephen.boyd@linaro.org: Drop firmware_desc design and use flags]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Some systems are memory constrained but they need to load very large
firmwares. The firmware subsystem allows drivers to request this
firmware be loaded from the filesystem, but this requires that the
entire firmware be loaded into kernel memory first before it's provided
to the driver. This can lead to a situation where we map the firmware
twice, once to load the firmware into kernel memory and once to copy the
firmware into the final resting place.
This design creates needless memory pressure and delays loading because
we have to copy from kernel memory to somewhere else. This patch sets
adds support to the request firmware API to load the firmware directly
into a pre-allocated buffer, skipping the intermediate copying step and
alleviating memory pressure during firmware loading. The drawback is
that we can't use the firmware caching feature because the memory for
the firmware cache is not managed by the firmware layer.
This patch (of 3):
We use similar structured code to read and write the kmapped firmware
pages. The only difference is read copies from the kmap region and
write copies to it. Consolidate this into one function to reduce
duplication.
Link: http://lkml.kernel.org/r/20160607164741.31849-2-stephen.boyd@linaro.org
Signed-off-by: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Vikram Mulukutla <markivx@codeaurora.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There was only one use of __initdata_refok and __exit_refok
__init_refok was used 46 times against 82 for __ref.
Those definitions are obsolete since commit 312b1485fb50 ("Introduce new
section reference annotations tags: __ref, __refdata, __refconst")
This patch removes the following compatibility definitions and replaces
them treewide.
/* compatibility defines */
#define __init_refok __ref
#define __initdata_refok __refdata
#define __exit_refok __ref
I can also provide separate patches if necessary.
(One patch per tree and check in 1 month or 2 to remove old definitions)
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"The majority of this update is about ASoC, including a few new
drivers, and the rest are mostly minor changes. The only substantial
change in ALSA core is about the additional error handling in the
compress-offload API. Below are highlights:
- Add the error propagating support in compress-offload API
- HD-audio: a usual Dell headset fixup, an Intel HDMI/DP fix, and the
default mixer setup change ot turn off the loopback
- Lots of updates for ASoC Intel drivers, mostly board support and
bug fixing, and to the NAU8825 driver
- Work on generalizing bits of simple-card to allow more code sharing
with the Renesas rsrc-card (which can't use simple-card due to DPCM)
- Removal of the Odroid X2 driver due to replacement with simple-card
- Support for several new Mediatek platforms and associated boards
- New ASoC drivers for Allwinner A10, Analog Devices ADAU7002,
Broadcom Cygnus, Cirrus Logic CS35L33 and CS53L30, Maxim MAX8960
and MAX98504, Realtek RT5514 and Wolfson WM8758"
* tag 'sound-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (278 commits)
sound: oss: Use kernel_read_file_from_path() for mod_firmware_load()
ASoC: Intel: Skylake: Delete an unnecessary check before the function call "release_firmware"
ASoC: Intel: Skylake: Fix NULL Pointer exception in dynamic_debug.
ASoC: samsung: Specify DMA channels through struct snd_dmaengine_pcm_config
ASoC: samsung: Fix error paths in the I2S driver's probe()
ASoC: cs53l30: Fix bit shift issue of TDM mode
ASoC: cs53l30: Fix a bug for TDM slot location validation
ASoC: rockchip: correct the spdif clk
ALSA: echoaudio: purge contradictions between dimension matrix members and total number of members
ASoC: rsrc-card: use asoc_simple_card_parse_card_name()
ASoC: rsrc-card: use asoc_simple_dai instead of rsrc_card_dai
ASoC: rsrc-card: use asoc_simple_card_parse_dailink_name()
ASoC: simple-card: use asoc_simple_card_parse_card_name()
ASoC: simple-card-utils: add asoc_simple_card_parse_card_name()
ASoC: simple-card: use asoc_simple_card_parse_dailink_name()
ASoC: simple-card-utils: add asoc_simple_card_set_dailink_name()
ASoC: nau8825: drop redundant idiom when converting integer to boolean
ASoC: nau8825: jack connection decision with different insertion logic
ASoC: mediatek: Add HDMI dai-links to the mt8173-rt5650 machine driver
ASoC: mediatek: mt2701: fix non static symbol warning
...
|
|
In 3245d460 (regmap: cache: Fall back to register by register read for
cache defaults) non-readable registers are skipped when initializing
reg_defaults, but are still included in num_reg_defaults. So there can
be uninitialized entries at the end of reg_defaults, which can cause
problems when the register cache initializes from the full array.
Fixed it by excluding non-readable registers from the count as well.
Signed-off-by: Maarten ter Huurne <maarten@treewalker.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone.
This only makes sense if each kernel stack exists entirely in one zone,
and allowing vmapped stacks could break this assumption.
Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack
allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all
architectures. Keep it simple and use KiB.
Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone. This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted. Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.
[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
NR_FILE_PAGES is the number of file pages.
NR_FILE_MAPPED is the number of mapped file pages.
NR_ANON_PAGES is the number of mapped anon pages.
This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and
NR_ANON_PAGES for mapped pages. This patch renames NR_ANON_PAGES so we
have
NR_FILE_PAGES is the number of file pages.
NR_FILE_MAPPED is the number of mapped file pages.
NR_ANON_MAPPED is the number of mapped anon pages.
Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Reclaim makes decisions based on the number of pages that are mapped but
it's mixing node and zone information. Account NR_FILE_MAPPED and
NR_ANON_PAGES pages on the node.
Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.
Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic. Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes. It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.
Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies. For example, the scans are
per-zone but using per-node counters. We also mark a node as congested
when a zone is congested. This causes weird problems that are fixed
later but is easier to review.
In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions
1. Long-term isolation of highmem pages when reclaim is lowmem
When pages are skipped, they are immediately added back onto the LRU
list. If lowmem reclaim persisted for long periods of time, the same
highmem pages get continually scanned. The idea would be that lowmem
keeps those pages on a separate list until a reclaim for highmem pages
arrives that splices the highmem pages back onto the LRU. It potentially
could be implemented similar to the UNEVICTABLE list.
That would reduce the skip rate with the potential corner case is that
highmem pages have to be scanned and reclaimed to free lowmem slab pages.
2. Linear scan lowmem pages if the initial LRU shrink fails
This will break LRU ordering but may be preferable and faster during
memory pressure than skipping LRU pages.
Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patchset: "Move LRU page reclaim from zones to nodes v9"
This series moves LRUs from the zones to the node. While this is a
current rebase, the test results were based on mmotm as of June 23rd.
Conceptually, this series is simple but there are a lot of details.
Some of the broad motivations for this are;
1. The residency of a page partially depends on what zone the page was
allocated from. This is partially combatted by the fair zone allocation
policy but that is a partial solution that introduces overhead in the
page allocator paths.
2. Currently, reclaim on node 0 behaves slightly different to node 1. For
example, direct reclaim scans in zonelist order and reclaims even if
the zone is over the high watermark regardless of the age of pages
in that LRU. Kswapd on the other hand starts reclaim on the highest
unbalanced zone. A difference in distribution of file/anon pages due
to when they were allocated results can result in a difference in
again. While the fair zone allocation policy mitigates some of the
problems here, the page reclaim results on a multi-zone node will
always be different to a single-zone node.
it was scheduled on as a result.
3. kswapd and the page allocator scan zones in the opposite order to
avoid interfering with each other but it's sensitive to timing. This
mitigates the page allocator using pages that were allocated very recently
in the ideal case but it's sensitive to timing. When kswapd is allocating
from lower zones then it's great but during the rebalancing of the highest
zone, the page allocator and kswapd interfere with each other. It's worse
if the highest zone is small and difficult to balance.
4. slab shrinkers are node-based which makes it harder to identify the exact
relationship between slab reclaim and LRU reclaim.
The reason we have zone-based reclaim is that we used to have
large highmem zones in common configurations and it was necessary
to quickly find ZONE_NORMAL pages for reclaim. Today, this is much
less of a concern as machines with lots of memory will (or should) use
64-bit kernels. Combinations of 32-bit hardware and 64-bit hardware are
rare. Machines that do use highmem should have relatively low highmem:lowmem
ratios than we worried about in the past.
Conceptually, moving to node LRUs should be easier to understand. The
page allocator plays fewer tricks to game reclaim and reclaim behaves
similarly on all nodes.
The series has been tested on a 16 core UMA machine and a 2-socket 48
core NUMA machine. The UMA results are presented in most cases as the NUMA
machine behaved similarly.
pagealloc
---------
This is a microbenchmark that shows the benefit of removing the fair zone
allocation policy. It was tested uip to order-4 but only orders 0 and 1 are
shown as the other orders were comparable.
4.7.0-rc4 4.7.0-rc4
mmotm-20160623 nodelru-v9
Min total-odr0-1 490.00 ( 0.00%) 457.00 ( 6.73%)
Min total-odr0-2 347.00 ( 0.00%) 329.00 ( 5.19%)
Min total-odr0-4 288.00 ( 0.00%) 273.00 ( 5.21%)
Min total-odr0-8 251.00 ( 0.00%) 239.00 ( 4.78%)
Min total-odr0-16 234.00 ( 0.00%) 222.00 ( 5.13%)
Min total-odr0-32 223.00 ( 0.00%) 211.00 ( 5.38%)
Min total-odr0-64 217.00 ( 0.00%) 208.00 ( 4.15%)
Min total-odr0-128 214.00 ( 0.00%) 204.00 ( 4.67%)
Min total-odr0-256 250.00 ( 0.00%) 230.00 ( 8.00%)
Min total-odr0-512 271.00 ( 0.00%) 269.00 ( 0.74%)
Min total-odr0-1024 291.00 ( 0.00%) 282.00 ( 3.09%)
Min total-odr0-2048 303.00 ( 0.00%) 296.00 ( 2.31%)
Min total-odr0-4096 311.00 ( 0.00%) 309.00 ( 0.64%)
Min total-odr0-8192 316.00 ( 0.00%) 314.00 ( 0.63%)
Min total-odr0-16384 317.00 ( 0.00%) 315.00 ( 0.63%)
Min total-odr1-1 742.00 ( 0.00%) 712.00 ( 4.04%)
Min total-odr1-2 562.00 ( 0.00%) 530.00 ( 5.69%)
Min total-odr1-4 457.00 ( 0.00%) 433.00 ( 5.25%)
Min total-odr1-8 411.00 ( 0.00%) 381.00 ( 7.30%)
Min total-odr1-16 381.00 ( 0.00%) 356.00 ( 6.56%)
Min total-odr1-32 372.00 ( 0.00%) 346.00 ( 6.99%)
Min total-odr1-64 372.00 ( 0.00%) 343.00 ( 7.80%)
Min total-odr1-128 375.00 ( 0.00%) 351.00 ( 6.40%)
Min total-odr1-256 379.00 ( 0.00%) 351.00 ( 7.39%)
Min total-odr1-512 385.00 ( 0.00%) 355.00 ( 7.79%)
Min total-odr1-1024 386.00 ( 0.00%) 358.00 ( 7.25%)
Min total-odr1-2048 390.00 ( 0.00%) 362.00 ( 7.18%)
Min total-odr1-4096 390.00 ( 0.00%) 362.00 ( 7.18%)
Min total-odr1-8192 388.00 ( 0.00%) 363.00 ( 6.44%)
This shows a steady improvement throughout. The primary benefit is from
reduced system CPU usage which is obvious from the overall times;
4.7.0-rc4 4.7.0-rc4
mmotm-20160623nodelru-v8
User 189.19 191.80
System 2604.45 2533.56
Elapsed 2855.30 2786.39
The vmstats also showed that the fair zone allocation policy was definitely
removed as can be seen here;
4.7.0-rc3 4.7.0-rc3
mmotm-20160623 nodelru-v8
DMA32 allocs 28794729769 0
Normal allocs 48432501431 77227309877
Movable allocs 0 0
tiobench on ext4
----------------
tiobench is a benchmark that artifically benefits if old pages remain resident
while new pages get reclaimed. The fair zone allocation policy mitigates this
problem so pages age fairly. While the benchmark has problems, it is important
that tiobench performance remains constant as it implies that page aging
problems that the fair zone allocation policy fixes are not re-introduced.
4.7.0-rc4 4.7.0-rc4
mmotm-20160623 nodelru-v9
Min PotentialReadSpeed 89.65 ( 0.00%) 90.21 ( 0.62%)
Min SeqRead-MB/sec-1 82.68 ( 0.00%) 82.01 ( -0.81%)
Min SeqRead-MB/sec-2 72.76 ( 0.00%) 72.07 ( -0.95%)
Min SeqRead-MB/sec-4 75.13 ( 0.00%) 74.92 ( -0.28%)
Min SeqRead-MB/sec-8 64.91 ( 0.00%) 65.19 ( 0.43%)
Min SeqRead-MB/sec-16 62.24 ( 0.00%) 62.22 ( -0.03%)
Min RandRead-MB/sec-1 0.88 ( 0.00%) 0.88 ( 0.00%)
Min RandRead-MB/sec-2 0.95 ( 0.00%) 0.92 ( -3.16%)
Min RandRead-MB/sec-4 1.43 ( 0.00%) 1.34 ( -6.29%)
Min RandRead-MB/sec-8 1.61 ( 0.00%) 1.60 ( -0.62%)
Min RandRead-MB/sec-16 1.80 ( 0.00%) 1.90 ( 5.56%)
Min SeqWrite-MB/sec-1 76.41 ( 0.00%) 76.85 ( 0.58%)
Min SeqWrite-MB/sec-2 74.11 ( 0.00%) 73.54 ( -0.77%)
Min SeqWrite-MB/sec-4 80.05 ( 0.00%) 80.13 ( 0.10%)
Min SeqWrite-MB/sec-8 72.88 ( 0.00%) 73.20 ( 0.44%)
Min SeqWrite-MB/sec-16 75.91 ( 0.00%) 76.44 ( 0.70%)
Min RandWrite-MB/sec-1 1.18 ( 0.00%) 1.14 ( -3.39%)
Min RandWrite-MB/sec-2 1.02 ( 0.00%) 1.03 ( 0.98%)
Min RandWrite-MB/sec-4 1.05 ( 0.00%) 0.98 ( -6.67%)
Min RandWrite-MB/sec-8 0.89 ( 0.00%) 0.92 ( 3.37%)
Min RandWrite-MB/sec-16 0.92 ( 0.00%) 0.93 ( 1.09%)
4.7.0-rc4 4.7.0-rc4
mmotm-20160623 approx-v9
User 645.72 525.90
System 403.85 331.75
Elapsed 6795.36 6783.67
This shows that the series has little or not impact on tiobench which is
desirable and a reduction in system CPU usage. It indicates that the fair
zone allocation policy was removed in a manner that didn't reintroduce
one class of page aging bug. There were only minor differences in overall
reclaim activity
4.7.0-rc4 4.7.0-rc4
mmotm-20160623nodelru-v8
Minor Faults 645838 647465
Major Faults 573 640
Swap Ins 0 0
Swap Outs 0 0
DMA allocs 0 0
DMA32 allocs 46041453 44190646
Normal allocs 78053072 79887245
Movable allocs 0 0
Allocation stalls 24 67
Stall zone DMA 0 0
Stall zone DMA32 0 0
Stall zone Normal 0 2
Stall zone HighMem 0 0
Stall zone Movable 0 65
Direct pages scanned 10969 30609
Kswapd pages scanned 93375144 93492094
Kswapd pages reclaimed 93372243 93489370
Direct pages reclaimed 10969 30609
Kswapd efficiency 99% 99%
Kswapd velocity 13741.015 13781.934
Direct efficiency 100% 100%
Direct velocity 1.614 4.512
Percentage direct scans 0% 0%
kswapd activity was roughly comparable. There were differences in direct
reclaim activity but negligible in the context of the overall workload
(velocity of 4 pages per second with the patches applied, 1.6 pages per
second in the baseline kernel).
pgbench read-only large configuration on ext4
---------------------------------------------
pgbench is a database benchmark that can be sensitive to page reclaim
decisions. This also checks if removing the fair zone allocation policy
is safe
pgbench Transactions
4.7.0-rc4 4.7.0-rc4
mmotm-20160623 nodelru-v8
Hmean 1 188.26 ( 0.00%) 189.78 ( 0.81%)
Hmean 5 330.66 ( 0.00%) 328.69 ( -0.59%)
Hmean 12 370.32 ( 0.00%) 380.72 ( 2.81%)
Hmean 21 368.89 ( 0.00%) 369.00 ( 0.03%)
Hmean 30 382.14 ( 0.00%) 360.89 ( -5.56%)
Hmean 32 428.87 ( 0.00%) 432.96 ( 0.95%)
Negligible differences again. As with tiobench, overall reclaim activity
was comparable.
bonnie++ on ext4
----------------
No interesting performance difference, negligible differences on reclaim
stats.
paralleldd on ext4
------------------
This workload uses varying numbers of dd instances to read large amounts of
data from disk.
4.7.0-rc3 4.7.0-rc3
mmotm-20160623 nodelru-v9
Amean Elapsd-1 186.04 ( 0.00%) 189.41 ( -1.82%)
Amean Elapsd-3 192.27 ( 0.00%) 191.38 ( 0.46%)
Amean Elapsd-5 185.21 ( 0.00%) 182.75 ( 1.33%)
Amean Elapsd-7 183.71 ( 0.00%) 182.11 ( 0.87%)
Amean Elapsd-12 180.96 ( 0.00%) 181.58 ( -0.35%)
Amean Elapsd-16 181.36 ( 0.00%) 183.72 ( -1.30%)
4.7.0-rc4 4.7.0-rc4
mmotm-20160623 nodelru-v9
User 1548.01 1552.44
System 8609.71 8515.08
Elapsed 3587.10 3594.54
There is little or no change in performance but some drop in system CPU usage.
4.7.0-rc3 4.7.0-rc3
mmotm-20160623 nodelru-v9
Minor Faults 362662 367360
Major Faults 1204 1143
Swap Ins 22 0
Swap Outs 2855 1029
DMA allocs 0 0
DMA32 allocs 31409797 28837521
Normal allocs 46611853 49231282
Movable allocs 0 0
Direct pages scanned 0 0
Kswapd pages scanned 40845270 40869088
Kswapd pages reclaimed 40830976 40855294
Direct pages reclaimed 0 0
Kswapd efficiency 99% 99%
Kswapd velocity 11386.711 11369.769
Direct efficiency 100% 100%
Direct velocity 0.000 0.000
Percentage direct scans 0% 0%
Page writes by reclaim 2855 1029
Page writes file 0 0
Page writes anon 2855 1029
Page reclaim immediate 771 1628
Sector Reads 293312636 293536360
Sector Writes 18213568 18186480
Page rescued immediate 0 0
Slabs scanned 128257 132747
Direct inode steals 181 56
Kswapd inode steals 59 1131
It basically shows that kswapd was active at roughly the same rate in
both kernels. There was also comparable slab scanning activity and direct
reclaim was avoided in both cases. There appears to be a large difference
in numbers of inodes reclaimed but the workload has few active inodes and
is likely a timing artifact.
stutter
-------
stutter simulates a simple workload. One part uses a lot of anonymous
memory, a second measures mmap latency and a third copies a large file.
The primary metric is checking for mmap latency.
stutter
4.7.0-rc4 4.7.0-rc4
mmotm-20160623 nodelru-v8
Min mmap 16.6283 ( 0.00%) 13.4258 ( 19.26%)
1st-qrtle mmap 54.7570 ( 0.00%) 34.9121 ( 36.24%)
2nd-qrtle mmap 57.3163 ( 0.00%) 46.1147 ( 19.54%)
3rd-qrtle mmap 58.9976 ( 0.00%) 47.1882 ( 20.02%)
Max-90% mmap 59.7433 ( 0.00%) 47.4453 ( 20.58%)
Max-93% mmap 60.1298 ( 0.00%) 47.6037 ( 20.83%)
Max-95% mmap 73.4112 ( 0.00%) 82.8719 (-12.89%)
Max-99% mmap 92.8542 ( 0.00%) 88.8870 ( 4.27%)
Max mmap 1440.6569 ( 0.00%) 121.4201 ( 91.57%)
Mean mmap 59.3493 ( 0.00%) 42.2991 ( 28.73%)
Best99%Mean mmap 57.2121 ( 0.00%) 41.8207 ( 26.90%)
Best95%Mean mmap 55.9113 ( 0.00%) 39.9620 ( 28.53%)
Best90%Mean mmap 55.6199 ( 0.00%) 39.3124 ( 29.32%)
Best50%Mean mmap 53.2183 ( 0.00%) 33.1307 ( 37.75%)
Best10%Mean mmap 45.9842 ( 0.00%) 20.4040 ( 55.63%)
Best5%Mean mmap 43.2256 ( 0.00%) 17.9654 ( 58.44%)
Best1%Mean mmap 32.9388 ( 0.00%) 16.6875 ( 49.34%)
This shows a number of improvements with the worst-case outlier greatly
improved.
Some of the vmstats are interesting
4.7.0-rc4 4.7.0-rc4
mmotm-20160623nodelru-v8
Swap Ins 163 502
Swap Outs 0 0
DMA allocs 0 0
DMA32 allocs 618719206 1381662383
Normal allocs 891235743 564138421
Movable allocs 0 0
Allocation stalls 2603 1
Direct pages scanned 216787 2
Kswapd pages scanned 50719775 41778378
Kswapd pages reclaimed 41541765 41777639
Direct pages reclaimed 209159 0
Kswapd efficiency 81% 99%
Kswapd velocity 16859.554 14329.059
Direct efficiency 96% 0%
Direct velocity 72.061 0.001
Percentage direct scans 0% 0%
Page writes by reclaim 6215049 0
Page writes file 6215049 0
Page writes anon 0 0
Page reclaim immediate 70673 90
Sector Reads 81940800 81680456
Sector Writes 100158984 98816036
Page rescued immediate 0 0
Slabs scanned 1366954 22683
While this is not guaranteed in all cases, this particular test showed
a large reduction in direct reclaim activity. It's also worth noting
that no page writes were issued from reclaim context.
This series is not without its hazards. There are at least three areas
that I'm concerned with even though I could not reproduce any problems in
that area.
1. Reclaim/compaction is going to be affected because the amount of reclaim is
no longer targetted at a specific zone. Compaction works on a per-zone basis
so there is no guarantee that reclaiming a few THP's worth page pages will
have a positive impact on compaction success rates.
2. The Slab/LRU reclaim ratio is affected because the frequency the shrinkers
are called is now different. This may or may not be a problem but if it
is, it'll be because shrinkers are not called enough and some balancing
is required.
3. The anon/file reclaim ratio may be affected. Pages about to be dirtied are
distributed between zones and the fair zone allocation policy used to do
something very similar for anon. The distribution is now different but not
necessarily in any way that matters but it's still worth bearing in mind.
VM statistic counters for reclaim decisions are zone-based. If the kernel
is to reclaim on a per-node basis then we need to track per-node
statistics but there is no infrastructure for that. The most notable
change is that the old node_page_state is renamed to
sum_zone_node_page_state. The new node_page_state takes a pglist_data and
uses per-node stats but none exist yet. There is some renaming such as
vm_stat to vm_zone_stat and the addition of vm_node_stat and the renaming
of mod_state to mod_zone_state. Otherwise, this is mostly a mechanical
patch with no functional change. There is a lot of similarity between the
node and zone helpers which is unfortunate but there was no obvious way of
reusing the code and maintaining type safety.
Link: http://lkml.kernel.org/r/1467970510-21195-2-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In dev_pm_opp_set_rate(), _find_opp_table() is called 4 times: once by
_get_opp_clk(), once by dev_pm_opp_set_rate() itself, and twice by
dev_pm_opp_find_freq_ceil(). If there are several opp_tables in the
system, three times of opp table finding is a big waste. This patch
reduced the call of _find_opp_table() to twice.
Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The following functions test whether their argument is NULL
and then return immediately.
* dev_pm_arm_wake_irq
* dev_pm_disarm_wake_irq
* wakeup_source_unregister
Thus the test around the calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Pavel Machek <pavel@ucw.cz>
[ rjw: Minor whitespace adjustments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Several small updates and API enhancements:
- provide transparent unrolling of bulk writes into individual writes
so they can be used with devices without raw formatting.
- fix compatibility between I2C controllers supporting block commands
and devices with more than 8 bit wide registers.
- add some helpers for iopoll-like functionality and workarounds for
weird interrupt controllers"
* tag 'regmap-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: add iopoll-like polling macro
regmap: Support bulk writes for devices without raw formatting
regmap-i2c: Use i2c block command only if register value width is 8 bit
regmap: irq: Add support to call client specific pre/post interrupt service
regmap: Add file patterns for regmap device tree bindings
|