summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-09NFS: Fix an LOCK/OPEN race when unlinking an open fileChuck Lever
At Connectathon 2016, we found that recent upstream Linux clients would occasionally send a LOCK operation with a zero stateid. This appeared to happen in close proximity to another thread returning a delegation before unlinking the same file while it remained open. Earlier, the client received a write delegation on this file and returned the open stateid. Now, as it is getting ready to unlink the file, it returns the write delegation. But there is still an open file descriptor on that file, so the client must OPEN the file again before it returns the delegation. Since commit 24311f884189 ('NFSv4: Recovery of recalled read delegations is broken'), nfs_open_delegation_recall() clears the NFS_DELEGATED_STATE flag _before_ it sends the OPEN. This allows a racing LOCK on the same inode to be put on the wire before the OPEN operation has returned a valid open stateid. To eliminate this race, serialize delegation return with the acquisition of a file lock on the same file. Adopt the same approach as is used in the unlock path. This patch also eliminates a similar race seen when sending a LOCK operation at the same time as returning a delegation on the same file. Fixes: 24311f884189 ('NFSv4: Recovery of recalled read ... ') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> [Anna: Add sentence about LOCK / delegation race] Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09nfs: have flexfiles mirror keep creds for both ro and rw layoutsJeff Layton
A mirror can be shared between multiple layouts, even with different iomodes. That makes stats gathering simpler, but it causes a problem when we get different creds in READ vs. RW layouts. The current code drops the newer credentials onto the floor when this occurs. That's problematic when you fetch a READ layout first, and then a RW. If the READ layout doesn't have the correct creds to do a write, then writes will fail. We could just overwrite the READ credentials with the RW ones, but that would break the ability for the server to fence the layout for reads if things go awry. We need to be able to revert to the earlier READ creds if the RW layout is returned afterward. The simplest fix is to just keep two sets of creds per mirror. One for READ layouts and one for RW, and then use the appropriate set depending on the iomode of the layout segment. Also fix up some RCU nits that sparse found. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09nfs: get a reference to the credential in ff_layout_alloc_lsegJeff Layton
We're just as likely to have allocation problems here as we would if we delay looking up the credential like we currently do. Fix the code to get a rpc_cred reference early, as soon as the mirror is set up. This allows us to eliminate the mirror early if there is a problem getting an rpc credential. This also allows us to drop the uid/gid from the layout_mirror struct as well. In the event that we find an existing mirror where this one would go, we swap in the new creds unconditionally, and drop the reference to the old one. Note that the old ff_layout_update_mirror_cred function wouldn't set this pointer unless the DS version was 3, but we don't know what the DS version is at this point. I'm a little unclear on why it did that as you still need creds to talk to v4 servers as well. I have the code set it regardless of the DS version here. Also note the change to using generic creds instead of calling lookup_cred directly. With that change, we also need to populate the group_info pointer in the acred as some functions expect that to never be NULL. Instead of allocating one every time however, we can allocate one when the module is loaded and share it since the group_info is refcounted. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09nfs: have ff_layout_get_ds_cred take a reference to the credJeff Layton
In later patches, we're going to want to allow the creds to be updated when we get a new layout with updated creds. Have this function take a reference to the cred that is later put once the call has been dispatched. Also, prepare for this change by ensuring we follow RCU rules when getting a reference to the cred as well. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09nfs: don't call nfs4_ff_layout_prepare_ds from ff_layout_get_ds_credJeff Layton
All the callers already call that function before calling into here, so it ends up being a no-op anyway. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09sunrpc: add a get_rpccred_rcu inlineJeff Layton
Sometimes we might have a RCU managed credential pointer and don't want to use locking to handle it. Add a function that will take a reference to the cred iff the refcount is not already zero. Callers can dereference the pointer under the rcu_read_lock and use that function to take a reference only if the cred is not on its way to destruction. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09sunrpc: add rpc_lookup_generic_credWeston Andros Adamson
Add function rpc_lookup_generic_cred, which allows lookups of a generic credential that's not current_cred(). [jlayton: add gfp_t parm] Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09sunrpc: plumb gfp_t parm into crcreate operationJeff Layton
We need to be able to call the generic_cred creator from different contexts. Add a gfp_t parm to the crcreate operation and to rpcauth_lookup_credcache. For now, we just push the gfp_t parms up one level to the *_lookup_cred functions. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09SUNRPC: init xdr_stream for zero iov_len, page_lenBenjamin Coddington
An xdr_buf with head[0].iov_len = 0 and page_len = 0 will cause xdr_init_decode() to incorrectly setup the xdr_stream. Specifically, xdr->end is never initialized. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09NFS: Save struct inode * inside nfs_commit_info to clarify usage of i_lockDave Wysochanski
Commit ea2cf22 created nfs_commit_info and saved &inode->i_lock inside this NFS specific structure. This obscures the usage of i_lock. Instead, save struct inode * so later it's clear the spinlock taken is i_lock. Should be no functional change. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09nfs: add debug to directio "good_bytes" countingWeston Andros Adamson
This will pop a warning if we count too many good bytes. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-09pnfs: set NFS_IOHDR_REDO in pnfs_read_resend_pnfsWeston Andros Adamson
Like other resend paths, mark the (old) hdr as NFS_IOHDR_REDO. This ensures the hdr completion function will not count the (old) hdr as good bytes. Also, vector the error back through the hdr->task.tk_status like other retry calls. This fixes a bug with the FlexFiles layout where libaio was reporting more bytes read than requested. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-08Linux 4.6-rc7Linus Torvalds
2016-05-07Merge tag 'char-misc-4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull misc driver fixes from Gfreg KH: "Here are three small fixes for some driver problems that were reported. Full details in the shortlog below. All of these have been in linux-next with no reported issues" * tag 'char-misc-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: nvmem: mxs-ocotp: fix buffer overflow in read Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read() misc: mic: Fix for double fetch security bug in VOP driver
2016-05-07Merge tag 'staging-4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull IIO driver fixes from Grek KH: "It's really just IIO drivers here, some small fixes that resolve some 'crash on boot' errors that have shown up in the -rc series, and other bugfixes that are required. All have been in linux-next with no reported problems" * tag 'staging-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: imu: mpu6050: Fix name/chip_id when using ACPI iio: imu: mpu6050: fix possible NULL dereferences iio:adc:at91-sama5d2: Repair crash on module removal iio: ak8975: fix maybe-uninitialized warning iio: ak8975: Fix NULL pointer exception on early interrupt
2016-05-07Merge tag 'usb-4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some last-remaining fixes for USB drivers to resolve issues that have shown up in testing. And two new device ids as well. All of these have been in linux-next with no reported issues" * tag 'usb-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Revert "USB / PM: Allow USB devices to remain runtime-suspended when sleeping" usb: musb: jz4740: fix error check of usb_get_phy() Revert "usb: musb: musb_host: Enable HCD_BH flag to handle urb return in bottom half" usb: musb: gadget: nuke endpoint before setting its descriptor to NULL USB: serial: cp210x: add Straizona Focusers device ids USB: serial: cp210x: add ID for Link ECU
2016-05-07Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: "These are a number of updates to fix a few problems found in the ARM nommu code over the last couple of years, caused mostly by changes on the mmu side" * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8573/1: domain: move {set,get}_domain under config guard ARM: 8572/1: nommu: change memory reserve for the vectors ARM: 8571/1: nommu: fix PMSAv7 setup
2016-05-07Merge tag 'media/v4.6-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - deadlock fixes on driver probe at exynos4-is and s43-camif drivers - a build breakage if media controller is enabled and USB or PCI is built as module. * tag 'media/v4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: [media] media-device: fix builds when USB or PCI is compiled as module [media] media: s3c-camif: fix deadlock on driver probe() [media] media: exynos4-is: fix deadlock on driver probe
2016-05-07Merge branch 'for-4.6-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "An ahci driver addition and updates to ahci port enable handling for some platform devices" * 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ata: add AMD Seattle platform driver ARM: dts: apq8064: add ahci ports-implemented mask ata: ahci-platform: Add ports-implemented DT bindings. libahci: save port map for forced port map
2016-05-07Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fix from Doug Ledford: "Fix for max sector calculation in iSER" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/iser: Fix max_sectors calculation
2016-05-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull writeback fix from Jens Axboe: "Just a single fix for domain aware writeback, fixing a regression that can cause balance_dirty_pages() to keep looping while not getting any work done" * 'for-linus' of git://git.kernel.dk/linux-block: writeback: Fix performance regression in wb_over_bg_thresh()
2016-05-06Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This contains two fixes: a boot fix for older SGI/UV systems, and an APIC calibration fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init
2016-05-06Merge tag 'pm+acpi-4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management and ACPI fixes from Rafael Wysocki: "Fixes for problems introduced or discovered recently (intel_pstate, sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework, generic device properties framework) and one fix for a hotplug-related deadlock in ACPICA that's been there forever, but is nasty enough. Specifics: - Fix for a recent regression in the intel_pstate driver causing it to fail to restore the HWP (HW-managed P-states) configuration of the boot CPU after suspend-to-RAM (Rafael Wysocki). - Fix for two recent regressions in the intel_pstate driver, one that can trigger a divide by zero if the driver is accessed via sysfs before it manages to take the first sample and one causing it to fail to update a structure field used in a trace point, so the information coming from it is less useful (Rafael Wysocki). - Fix for a problem in the sti-cpufreq driver introduced during the 4.5 cycle that causes it to break CPU PM in multi-platform kernels by registering cpufreq-dt (which subsequently doesn't work) unconditionally and preventing the driver that would actually work from registering (Sudeep Holla). - Stable-candidate fix for an ARM64 cpuidle issue causing idle state usage counters to be incorrectly updated for idle states that were not entered due to errors (James Morse). - Fix for a recently introduced issue in the OPP (Operating Performance Points) framework causing it to print bogus error messages for missing optional regulators (Viresh Kumar). - Fix for a recently introduced issue in the generic device properties framework that may cause it to attempt to dereferece and invalid pointer in some cases (Heikki Krogerus). - Fix for a deadlock in the ACPICA core that may be triggered by device (eg Thunderbolt) hotplug (Prarit Bhargava)" * tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / OPP: Remove useless check ACPICA: Dispatcher: Update thread ID for recursive method calls intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value device property: Avoid potential dereferences of invalid pointers
2016-05-06Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "This contains a single fix that fixes a nohz tick stopping bug when mixed-poliocy SCHED_FIFO and SCHED_RR tasks are present on a runqueue" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz/full, sched/rt: Fix missed tick-reenabling bug in sched_can_stop_tick()
2016-05-06Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "This tree contains two fixes: new Intel CPU model numbers and an AMD/iommu uncore PMU driver fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs perf/x86: Add model numbers for Kabylake CPUs
2016-05-06Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "This tree contains three fixes: a console spam fix, a file pattern fix and a sysfb_efi fix for a bug that triggered on older ThinkPads" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sysfb_efi: Fix valid BAR address range check x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT MAINTAINERS: Remove asterisk from EFI directory names
2016-05-06Merge branch 'parisc-4.6-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fix from Helge Deller: "Patch from Dmitry V Levin to fix a kernel crash when a straced process calls the (invalid) syscall which is equal to value of __NR_Linux_syscalls" * 'parisc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls
2016-05-06Merge tag 'arc-4.6-rc7-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: "Late in the cycle, but this has fixes for couple of issues: a PAE40 boot crash and Arnd spotting lack of barriers in BE io-accessors. The 3rd patch for enabling highmem in low physical mem ;-) honestly is more than a "fix" but its been in works for some time, seems to be stable in testing and enables 2 of our customers to go forward with 4.6 kernel. - Fix for PTE truncation in PAE40 builds - Fix for big endian IO accessors lacking IO barrier - Allow HIGHMEM to work with low physical addresses" * tag 'arc-4.6-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: support HIGHMEM even without PAE40 ARC: Fix PAE40 boot failures due to PTE truncation ARC: Add missing io barriers to io{read,write}{16,32}be()
2016-05-06Merge tag 'powerpc-4.6-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Fix bad inline asm constraint in create_zero_mask() from Anton Blanchard" * tag 'powerpc-4.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Fix bad inline asm constraint in create_zero_mask()
2016-05-06Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Fixes for i915, amdgpu/radeon and imx. The IMX fix is for an autoloading regression found in Fedora. The radeon fixes, are the same fix to amdgpu/radeon to avoid a hardware lockup in some circumstances with a bad mode, and a double free bug I took a few hours chasing down the other morning. The i915 fixes are across the board, all stable material, and fixing some hangs and suspend/resume issues, along with a live status regressions" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading drm/amdgpu: make sure vertical front porch is at least 1 drm/radeon: make sure vertical front porch is at least 1 drm/amdgpu: set metadata pointer to NULL after freeing. drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW drm/i915: Fake HDMI live status drm/i915: Fix eDP low vswing for Broadwell drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume drm/i915: Fix system resume if PCI device remained enabled drm/i915: Avoid stalling on pending flips for legacy cursor updates
2016-05-06parisc: fix a bug when syscall number of tracee is __NR_Linux_syscallsDmitry V. Levin
Do not load one entry beyond the end of the syscall table when the syscall number of a traced process equals to __NR_Linux_syscalls. Similar bug with regular processes was fixed by commit 3bb457af4fa8 ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls"). This bug was found by strace test suite. Cc: stable@vger.kernel.org Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Helge Deller <deller@gmx.de> Signed-off-by: Helge Deller <deller@gmx.de>
2016-05-06Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes'Rafael J. Wysocki
* pm-opp-fixes: PM / OPP: Remove useless check * pm-cpufreq-fixes: intel_pstate: Fix intel_pstate_get() cpufreq: intel_pstate: Fix HWP on boot CPU after system resume cpufreq: st: enable selective initialization based on the platform * pm-cpuidle-fixes: ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
2016-05-06Merge branches 'acpica-fixes' and 'device-properties-fixes'Rafael J. Wysocki
* acpica-fixes: ACPICA: Dispatcher: Update thread ID for recursive method calls * device-properties-fixes: device property: Avoid potential dereferences of invalid pointers
2016-05-06x86/tsc: Read all ratio bits from MSR_PLATFORM_INFOChen Yu
Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f; Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM (35.5), the ratio bits are bit 8-15. Ignoring the upper bits can result in an incorrect tsc ratio, which causes the TSC calibration and the Local APIC timer frequency to be incorrect. Fix this problem by masking 0xff instead. [ tglx: Massaged changelog ] Fixes: 7da7c1561366 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs" Signed-off-by: Chen Yu <yu.c.chen@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Bin Gao <bin.gao@intel.com> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-05-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge fixes from Andrew Morton: "14 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: byteswap: try to avoid __builtin_constant_p gcc bug lib/stackdepot: avoid to return 0 handle mm: fix kcompactd hang during memory offlining modpost: fix module autoloading for OF devices with generic compatible property proc: prevent accessing /proc/<PID>/environ until it's ready mm/zswap: provide unique zpool name mm: thp: kvm: fix memory corruption in KVM with THP enabled MAINTAINERS: fix Rajendra Nayak's address mm, cma: prevent nr_isolated_* counters from going negative mm: update min_free_kbytes from khugepaged after core initialization huge pagecache: mmap_sem is unlocked when truncation splits pmd rapidio/mport_cdev: fix uapi type definitions mm: memcontrol: let v2 cgroups follow changes in system swappiness mm: thp: correct split_huge_pages file permission
2016-05-06mailmap: add John Paul Adrian GlaubitzLinus Torvalds
Apparently patchwork ended up truncating the full name. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: - a fix for the persistent memory 'struct page' driver. The implementation overlooked the fact that pages are allocated in 2MB units leading to -ENOMEM when establishing some configurations. It's tagged for -stable as the problem was introduced with the initial implementation in 4.5. - The new "error status translation" routine, introduced with the 4.6 updates to the nfit driver, missed a necessary path in acpi_nfit_ctl(). The end result is that we are falsely assuming commands complete successfully when the embedded status says otherwise. * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix translation of command status results libnvdimm, pfn: fix memmap reservation sizing
2016-05-06byteswap: try to avoid __builtin_constant_p gcc bugArnd Bergmann
This is another attempt to avoid a regression in wwn_to_u64() after that started using get_unaligned_be64(), which in turn ran into a bug on gcc-4.9 through 6.1. The regression got introduced due to the combination of two separate workarounds (commits e3bde9568d99: "include/linux/unaligned: force inlining of byteswap operations" and ef3fb2422ffe: "scsi: fc: use get/put_unaligned64 for wwn access") that each try to sidestep distinct problems with gcc behavior (code growth and increased stack usage). Unfortunately after both have been applied, a more serious gcc bug has been uncovered, leading to incorrect object code that discards part of a function and causes undefined behavior. As part of this problem is how __builtin_constant_p gets evaluated on an argument passed by reference into an inline function, this avoids the use of __builtin_constant_p() for all architectures that set CONFIG_ARCH_USE_BUILTIN_BSWAP. Most architectures do not set ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not suffer from the problem in the qla2xxx driver, but they might still run into it elsewhere. Both of the original workarounds were only merged in the 4.6 kernel, and the bug that is fixed by this patch should only appear if both are there, so we probably don't need to backport the fix. On the other hand, it works by simplifying the code path and should not have any negative effects. [arnd@arndb.de: fix older gcc warnings] (http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel) Link: https://lkml.org/lkml/headers/2016/4/12/1103 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232 Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646 Fixes: e3bde9568d99 ("include/linux/unaligned: force inlining of byteswap operations") Fixes: ef3fb2422ffe ("scsi: fc: use get/put_unaligned64 for wwn access") Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> # on gcc-5.3 Tested-by: Quinn Tran <quinn.tran@qlogic.com> Cc: Martin Jambor <mjambor@suse.cz> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Jan Hubicka <hubicka@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06lib/stackdepot: avoid to return 0 handleJoonsoo Kim
Recently, we allow to save the stacktrace whose hashed value is 0. It causes the problem that stackdepot could return 0 even if in success. User of stackdepot cannot distinguish whether it is success or not so we need to solve this problem. In this patch, 1 bit are added to handle and make valid handle none 0 by setting this bit. After that, valid handle will not be 0 and 0 handle will represent failure correctly. Fixes: 33334e25769c ("lib/stackdepot.c: allow the stack trace hash to be zero") Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.com Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm: fix kcompactd hang during memory offliningVlastimil Babka
Assume memory47 is the last online block left in node1. This will hang: # echo offline > /sys/devices/system/node/node1/memory47/state After a couple of minutes, the following pops up in dmesg: INFO: task bash:957 blocked for more than 120 seconds. Not tainted 4.6.0-rc6+ #6 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. bash D ffff8800b7adbaf8 0 957 951 0x00000000 Call Trace: schedule+0x35/0x80 schedule_timeout+0x1ac/0x270 wait_for_completion+0xe1/0x120 kthread_stop+0x4f/0x110 kcompactd_stop+0x26/0x40 __offline_pages.constprop.28+0x7e6/0x840 offline_pages+0x11/0x20 memory_block_action+0x73/0x1d0 memory_subsys_offline+0x47/0x60 device_offline+0x86/0xb0 store_mem_state+0xda/0xf0 dev_attr_store+0x18/0x30 sysfs_kf_write+0x37/0x40 kernfs_fop_write+0x11d/0x170 __vfs_write+0x37/0x120 vfs_write+0xa9/0x1a0 SyS_write+0x55/0xc0 entry_SYSCALL_64_fastpath+0x1a/0xa4 kcompactd is waiting for kcompactd_max_order > 0 when it's woken up to actually exit. Check kthread_should_stop() to break out of the wait. Fixes: 698b1b306 ("mm, compaction: introduce kcompactd"). Reported-by: Reza Arbab <arbab@linux.vnet.ibm.com> Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06modpost: fix module autoloading for OF devices with generic compatible propertyPhilipp Zabel
Since the wildcard at the end of OF module aliases is gone, autoloading of modules that don't match a device's last (most generic) compatible value fails. For example the CODA960 VPU on i.MX6Q has the SoC specific compatible "fsl,imx6q-vpu" and the generic compatible "cnm,coda960". Since the driver currently only works with knowledge about the SoC specific integration, it doesn't list "cnm,cod960" in the module device table. This results in the device compatible "of:NvpuT<NULL>Cfsl,imx6q-vpuCcnm,coda960" not matching the module alias "of:N*T*Cfsl,imx6q-vpu" anymore, whereas before commit 2f632369ab79 ("modpost: don't add a trailing wildcard for OF module aliases") it matched the module alias "of:N*T*Cfsl,imx6q-vpu*". This patch adds two module aliases for each compatible, one without the wildcard and one with "C*" appended. $ modinfo coda | grep imx6q alias: of:N*T*Cfsl,imx6q-vpuC* alias: of:N*T*Cfsl,imx6q-vpu Fixes: 2f632369ab79 ("modpost: don't add a trailing wildcard for OF module aliases") Link: http://lkml.kernel.org/r/1462203339-15340-1-git-send-email-p.zabel@pengutronix.de Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de> Cc: Javier Martinez Canillas <javier@osg.samsung.com> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06proc: prevent accessing /proc/<PID>/environ until it's readyMathias Krause
If /proc/<PID>/environ gets read before the envp[] array is fully set up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to read more bytes than are actually written, as env_start will already be set but env_end will still be zero, making the range calculation underflow, allowing to read beyond the end of what has been written. Fix this as it is done for /proc/<PID>/cmdline by testing env_end for zero. It is, apparently, intentionally set last in create_*_tables(). This bug was found by the PaX size_overflow plugin that detected the arithmetic underflow of 'this_len = env_end - (env_start + src)' when env_end is still zero. The expected consequence is that userland trying to access /proc/<PID>/environ of a not yet fully set up process may get inconsistent data as we're in the middle of copying in the environment variables. Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461 Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm/zswap: provide unique zpool nameDan Streetman
Instead of using "zswap" as the name for all zpools created, add an atomic counter and use "zswap%x" with the counter number for each zpool created, to provide a unique name for each new zpool. As zsmalloc, one of the zpool implementations, requires/expects a unique name for each pool created, zswap should provide a unique name. The zsmalloc pool creation does not fail if a new pool with a conflicting name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case, zsmalloc pool creation fails with -ENOMEM. Then zswap will be unable to change its compressor parameter if its zpool is zsmalloc; it also will be unable to change its zpool parameter back to zsmalloc, if it has any existing old zpool using zsmalloc with page(s) in it. Attempts to change the parameters will result in failure to create the zpool. This changes zswap to provide a unique name for each zpool creation. Fixes: f1c54846ee45 ("zswap: dynamic pool creation") Signed-off-by: Dan Streetman <ddstreet@ieee.org> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Dan Streetman <dan.streetman@canonical.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm: thp: kvm: fix memory corruption in KVM with THP enabledAndrea Arcangeli
After the THP refcounting change, obtaining a compound pages from get_user_pages() no longer allows us to assume the entire compound page is immediately mappable from a secondary MMU. A secondary MMU doesn't want to call get_user_pages() more than once for each compound page, in order to know if it can map the whole compound page. So a secondary MMU needs to know from a single get_user_pages() invocation when it can map immediately the entire compound page to avoid a flood of unnecessary secondary MMU faults and spurious atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier users). Ideally instead of the page->_mapcount < 1 check, get_user_pages() should return the granularity of the "page" mapping in the "mm" passed to get_user_pages(). However it's non trivial change to pass the "pmd" status belonging to the "mm" walked by get_user_pages up the stack (up to the caller of get_user_pages). So the fix just checks if there is not a single pte mapping on the page returned by get_user_pages, and in turn if the caller can assume that the whole compound page is mapped in the current "mm" (in a pmd_trans_huge()). In such case the entire compound page is safe to map into the secondary MMU without additional get_user_pages() calls on the surrounding tail/head pages. In addition of being faster, not having to run other get_user_pages() calls also reduces the memory footprint of the secondary MMU fault in case the pmd split happened as result of memory pressure. Without this fix after a MADV_DONTNEED (like invoked by QEMU during postcopy live migration or balloning) or after generic swapping (with a failure in split_huge_page() that would only result in pmd splitting and not a physical page split), KVM would map the whole compound page into the shadow pagetables, despite regular faults or userfaults (like UFFDIO_COPY) may map regular pages into the primary MMU as result of the pte faults, leading to the guest mode and userland mode going out of sync and not working on the same memory at all times. Any other secondary MMU notifier manager (KVM is just one of the many MMU notifier users) will need the same information if it doesn't want to run a flood of get_user_pages_fast and it can support multiple granularity in the secondary MMU mappings, so I think it is justified to be exposed not just to KVM. The other option would be to move transparent_hugepage_adjust to mm/huge_memory.c but that currently has all kind of KVM data structures in it, so it's definitely not a cut-and-paste work, so I couldn't do a fix as cleaner as this one for 4.6. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: "Li, Liang Z" <liang.z.li@intel.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06MAINTAINERS: fix Rajendra Nayak's addressEric Engestrom
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com> Cc: Rajendra Nayak <rnayak@codeaurora.org> Cc: Afzal Mohammed <afzal.mohd.ma@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm, cma: prevent nr_isolated_* counters from going negativeHugh Dickins
/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go increasingly negative under compaction: which would add delay when should be none, or no delay when should delay. The bug in compaction was due to a recent mmotm patch, but much older instance of the bug was also noticed in isolate_migratepages_range() which is used for CMA and gigantic hugepage allocations. The bug is caused by putback_movable_pages() in an error path decrementing the isolated counters without them being previously incremented by acct_isolated(). Fix isolate_migratepages_range() by removing the error-path putback, thus reaching acct_isolated() with migratepages still isolated, and leaving putback to caller like most other places do. Fixes: edc2ca612496 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()") [vbabka@suse.cz: expanded the changelog] Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm: update min_free_kbytes from khugepaged after core initializationJason Baron
Khugepaged attempts to raise min_free_kbytes if its set too low. However, on boot khugepaged sets min_free_kbytes first from subsys_initcall(), and then the mm 'core' over-rides min_free_kbytes after from init_per_zone_wmark_min(), via a module_init() call. Khugepaged used to use a late_initcall() to set min_free_kbytes (such that it occurred after the core initialization), however this was removed when the initialization of min_free_kbytes was integrated into the starting of the khugepaged thread. The fix here is simply to invoke the core initialization using a core_initcall() instead of module_init(), such that the previous initialization ordering is restored. I didn't restore the late_initcall() since start_stop_khugepaged() already sets min_free_kbytes via set_recommended_min_free_kbytes(). This was noticed when we had a number of page allocation failures when moving a workload to a kernel with this new initialization ordering. On an 8GB system this restores min_free_kbytes back to 67584 from 11365 when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y. Fixes: 79553da293d3 ("thp: cleanup khugepaged startup") Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06huge pagecache: mmap_sem is unlocked when truncation splits pmdHugh Dickins
zap_pmd_range()'s CONFIG_DEBUG_VM !rwsem_is_locked(&mmap_sem) BUG() will be invalid with huge pagecache, in whatever way it is implemented: truncation of a hugely-mapped file to an unhugely-aligned size would easily hit it. (Although anon THP could in principle apply khugepaged to private file mappings, which are not excluded by the MADV_HUGEPAGE restrictions, in practice there's a vm_ops check which excludes them, so it never hits this BUG() - there's no interface to "truncate" an anonymous mapping.) We could complicate the test, to check i_mmap_rwsem also when there's a vm_file; but my inclination was to make zap_pmd_range() more readable by simply deleting this check. A search has shown no report of the issue in the years since commit e0897d75f0b2 ("mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range") expanded it from VM_BUG_ON() - though I cannot point to what commit I would say then fixed the issue. But there are a couple of other patches now floating around, neither yet in the tree: let's agree to retain the check as a VM_BUG_ON_VMA(), as Matthew Wilcox has done; but subject to a vma_is_anonymous() check, as Kirill Shutemov has done. And let's get this in, without waiting for any particular huge pagecache implementation to reach the tree. Matthew said "We can reproduce this BUG() in the current Linus tree with DAX PMDs". Signed-off-by: Hugh Dickins <hughd@google.com> Tested-by: Matthew Wilcox <willy@linux.intel.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06rapidio/mport_cdev: fix uapi type definitionsAlexandre Bounine
Fix problems in uapi definitions reported by Gabriel Laskar: (see https://lkml.org/lkml/2016/4/5/205 for details) - move public header file rio_mport_cdev.h to include/uapi/linux directory - change types in data structures passed as IOCTL parameters - improve parameter checking in some IOCTL service routines Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Reported-by: Gabriel Laskar <gabriel@lse.epita.fr> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Gabriel Laskar <gabriel@lse.epita.fr> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-06mm: memcontrol: let v2 cgroups follow changes in system swappinessJohannes Weiner
Cgroup2 currently doesn't have a per-cgroup swappiness setting. We might want to add one later - that's a different discussion - but until we do, the cgroups should always follow the system setting. Otherwise it will be unchangeably set to whatever the ancestor inherited from the system setting at the time of cgroup creation. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>