From f48cfddc6729ef133933062320039808bafa6f45 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Fri, 8 Nov 2013 16:31:29 -0800
Subject: vfs: In d_path don't call d_dname on a mount point

Aditya Kali (adityakali@google.com) wrote:
> Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
> "proc: Fix the namespace inode permission checks." converted
> the namespace files into symlinks. The same commit changed
> the way namespace bind mounts appear in /proc/mounts:
>   $ mount --bind /proc/self/ns/ipc /mnt/ipc
> Originally:
>   $ cat /proc/mounts | grep ipc
>   proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
>
> After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
>   $ cat /proc/mounts | grep ipc
>   proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
>
> This breaks userspace which expects the 2nd field in
> /proc/mounts to be a valid path.

The symlink /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
dentries allocated with d_alloc_pseudo that we can mount, and
that have interesting names printed out with d_dname.

When these files are bind mounted /proc/mounts is not currently
displaying the mount point correctly because d_dname is called instead
of just displaying the path where the file is mounted.

Solve this by adding an explicit check to distinguish mounted pseudo
inodes and unmounted pseudo inodes.  Unmounted pseudo inodes always
use mount of their filesstem as the mnt_root  in their path making
these two cases easy to distinguish.

CC: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Reported-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
diff --git a/fs/dcache.c b/fs/dcache.c
index 4bdb300..f7282eb 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3061,8 +3061,13 @@ char *d_path(const struct path *path, char *buf, int buflen)
 	 * thus don't need to be hashed.  They also don't need a name until a
 	 * user wants to identify the object in /proc/pid/fd/.  The little hack
 	 * below allows us to generate a name for these objects on demand:
+	 *
+	 * Some pseudo inodes are mountable.  When they are mounted
+	 * path->dentry == path->mnt->mnt_root.  In that case don't call d_dname
+	 * and instead have d_path return the mounted path.
 	 */
-	if (path->dentry->d_op && path->dentry->d_op->d_dname)
+	if (path->dentry->d_op && path->dentry->d_op->d_dname &&
+	    (!IS_ROOT(path->dentry) || path->dentry != path->mnt->mnt_root))
 		return path->dentry->d_op->d_dname(path->dentry, buf, buflen);
 
 	rcu_read_lock();
-- 
cgit v0.10.2


From 1f7f4dde5c945f41a7abc2285be43d918029ecc5 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Thu, 14 Nov 2013 21:10:16 -0800
Subject: fork:  Allow CLONE_PARENT after setns(CLONE_NEWPID)

Serge Hallyn <serge.hallyn@ubuntu.com> writes:
> Hi Oleg,
>
> commit 40a0d32d1eaffe6aac7324ca92604b6b3977eb0e :
> "fork: unify and tighten up CLONE_NEWUSER/CLONE_NEWPID checks"
> breaks lxc-attach in 3.12.  That code forks a child which does
> setns() and then does a clone(CLONE_PARENT).  That way the
> grandchild can be in the right namespaces (which the child was
> not) and be a child of the original task, which is the monitor.
>
> lxc-attach in 3.11 was working fine with no side effects that I
> could see.  Is there a real danger in allowing CLONE_PARENT
> when current->nsproxy->pidns_for_children is not our pidns,
> or was this done out of an "over-abundance of caution"?  Can we
> safely revert that new extra check?

The two fundamental things I know we can not allow are:
- A shared signal queue aka CLONE_THREAD.  Because we compute the pid
  and uid of the signal when we place it in the queue.

- Changing the pid and by extention pid_namespace of an existing
  process.

From a parents perspective there is nothing special about the pid
namespace, to deny CLONE_PARENT, because the parent simply won't know or
care.

From the childs perspective all that is special really are shared signal
queues.

User mode threading with CLONE_PARENT|CLONE_VM|CLONE_SIGHAND and tasks
in different pid namespaces is almost certainly going to break because
it is complicated.  But shared signal handlers can look at per thread
information to know which pid namespace a process is in, so I don't know
of any reason not to support CLONE_PARENT|CLONE_VM|CLONE_SIGHAND threads
at the kernel level.  It would be absolutely stupid to implement but
that is a different thing.

So hmm.

Because it can do no harm, and because it is a regression let's remove
the CLONE_PARENT check and send it stable.

Cc: stable@vger.kernel.org
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

diff --git a/kernel/fork.c b/kernel/fork.c
index 728d5be..f82fa2e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1171,7 +1171,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	 * do not allow it to share a thread group or signal handlers or
 	 * parent with the forking task.
 	 */
-	if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
+	if (clone_flags & CLONE_SIGHAND) {
 		if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
 		    (task_active_pid_ns(current) !=
 				current->nsproxy->pid_ns_for_children))
-- 
cgit v0.10.2


From 41301ae78a99ead04ea42672a1ab72c6f44cc81d Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Thu, 14 Nov 2013 21:22:25 -0800
Subject: vfs: Fix a regression in mounting proc

Gao feng <gaofeng@cn.fujitsu.com> reported that commit
e51db73532955dc5eaba4235e62b74b460709d5b
userns: Better restrictions on when proc and sysfs can be mounted
caused a regression on mounting a new instance of proc in a mount
namespace created with user namespace privileges, when binfmt_misc
is mounted on /proc/sys/fs/binfmt_misc.

This is an unintended regression caused by the absolutely bogus empty
directory check in fs_fully_visible.  The check fs_fully_visible replaced
didn't even bother to attempt to verify proc was fully visible and
hiding proc files with any kind of mount is rare.  So for now fix
the userspace regression by allowing directory with nlink == 1
as /proc/sys/fs/binfmt_misc has.

I will have a better patch but it is not stable material, or
last minute kernel material.  So it will have to wait.

Cc: stable@vger.kernel.org
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
Tested-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

diff --git a/fs/namespace.c b/fs/namespace.c
index ac2ce8a..be32ebc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2886,7 +2886,7 @@ bool fs_fully_visible(struct file_system_type *type)
 			struct inode *inode = child->mnt_mountpoint->d_inode;
 			if (!S_ISDIR(inode->i_mode))
 				goto next;
-			if (inode->i_nlink != 2)
+			if (inode->i_nlink > 2)
 				goto next;
 		}
 		visible = true;
-- 
cgit v0.10.2


From f9b0e058cbd04ada76b13afffa7e1df830543c24 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Sat, 14 Dec 2013 04:21:26 +0800
Subject: writeback: Fix data corruption on NFS

Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
mode because there we also want to wait for outstanding writeback on
possibly clean inode. This was causing occasional data corruption issues
on NFS because it uses sync_inode() to make sure all outstanding writes
are flushed to the server before truncating the inode and with
sync_inode() returning prematurely file was sometimes extended back
by an outstanding write after it was truncated.

So modify the test to also check for pages under writeback in
WB_SYNC_ALL mode.

CC: stable@vger.kernel.org # >= 3.5
Fixes: 4f8ad655dbc82cf05d2edc11e66b78a42d38bf93
Reported-and-tested-by: Dan Duval <dan.duval@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1f4a10e..e0259a1 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -516,13 +516,16 @@ writeback_single_inode(struct inode *inode, struct bdi_writeback *wb,
 	}
 	WARN_ON(inode->i_state & I_SYNC);
 	/*
-	 * Skip inode if it is clean. We don't want to mess with writeback
-	 * lists in this function since flusher thread may be doing for example
-	 * sync in parallel and if we move the inode, it could get skipped. So
-	 * here we make sure inode is on some writeback list and leave it there
-	 * unless we have completely cleaned the inode.
+	 * Skip inode if it is clean and we have no outstanding writeback in
+	 * WB_SYNC_ALL mode. We don't want to mess with writeback lists in this
+	 * function since flusher thread may be doing for example sync in
+	 * parallel and if we move the inode, it could get skipped. So here we
+	 * make sure inode is on some writeback list and leave it there unless
+	 * we have completely cleaned the inode.
 	 */
-	if (!(inode->i_state & I_DIRTY))
+	if (!(inode->i_state & I_DIRTY) &&
+	    (wbc->sync_mode != WB_SYNC_ALL ||
+	     !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK)))
 		goto out;
 	inode->i_state |= I_SYNC;
 	spin_unlock(&inode->i_lock);
-- 
cgit v0.10.2


From c1dcc927dae01dfd4904ee82ce2c00b50eab6dc3 Mon Sep 17 00:00:00 2001
From: Soren Brinkmann <soren.brinkmann@xilinx.com>
Date: Tue, 26 Nov 2013 17:04:50 -0800
Subject: clocksource: cadence_ttc: Fix mutex taken inside interrupt context

When the kernel is compiled with:
CONFIG_HIGH_RES_TIMERS=no
CONFIG_HZ_PERIODIC=yes
CONFIG_DEBUG_ATOMIC_SLEEP=yes

The following WARN appears:

WARNING: CPU: 1 PID: 0 at linux/kernel/mutex.c:856 mutex_trylock+0x70/0x1fc()
DEBUG_LOCKS_WARN_ON(in_interrupt())
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.12.0-xilinx-dirty #93
[<c0014a78>] (unwind_backtrace+0x0/0x11c) from [<c0011b6c>] (show_stack+0x10/0x14)
[<c0011b6c>] (show_stack+0x10/0x14) from [<c039120c>] (dump_stack+0x7c/0xc0)
[<c039120c>] (dump_stack+0x7c/0xc0) from [<c001fda4>] (warn_slowpath_common+0x60/0x84)
[<c001fda4>] (warn_slowpath_common+0x60/0x84) from [<c001fe48>] (warn_slowpath_fmt+0x2c/0x3c)
[<c001fe48>] (warn_slowpath_fmt+0x2c/0x3c) from [<c0392658>] (mutex_trylock+0x70/0x1fc)
[<c0392658>] (mutex_trylock+0x70/0x1fc) from [<c02dfc08>] (clk_prepare_lock+0xc/0xe4)
[<c02dfc08>] (clk_prepare_lock+0xc/0xe4) from [<c02e099c>] (clk_get_rate+0xc/0x44)
[<c02e099c>] (clk_get_rate+0xc/0x44) from [<c02d0394>] (ttc_set_mode+0x34/0x78)
[<c02d0394>] (ttc_set_mode+0x34/0x78) from [<c005f794>] (clockevents_set_mode+0x28/0x5c)
[<c005f794>] (clockevents_set_mode+0x28/0x5c) from [<c00607fc>] (tick_broadcast_on_off+0x190/0x1c0)
[<c00607fc>] (tick_broadcast_on_off+0x190/0x1c0) from [<c005f168>] (clockevents_notify+0x58/0x1ac)
[<c005f168>] (clockevents_notify+0x58/0x1ac) from [<c02b99dc>] (cpuidle_setup_broadcast_timer+0x20/0x24)
[<c02b99dc>] (cpuidle_setup_broadcast_timer+0x20/0x24) from [<c006cd04>] (generic_smp_call_function_single_interrupt+0)
[<c006cd04>] (generic_smp_call_function_single_interrupt+0xe0/0x130) from [<c00138c8>] (handle_IPI+0x88/0x118)
[<c00138c8>] (handle_IPI+0x88/0x118) from [<c0008504>] (gic_handle_irq+0x58/0x60)
[<c0008504>] (gic_handle_irq+0x58/0x60) from [<c0012644>] (__irq_svc+0x44/0x78)
Exception stack(0xef099fa0 to 0xef099fe8)
9fa0: 00000001 ef092100 00000000 ef092100 ef098000 00000015 c0399f2c c0579d74
9fc0: 0000406a 413fc090 00000000 00000000 00000000 ef099fe8 c00666ec c000f46c
9fe0: 20000113 ffffffff
[<c0012644>] (__irq_svc+0x44/0x78) from [<c000f46c>] (arch_cpu_idle+0x34/0x3c)
[<c000f46c>] (arch_cpu_idle+0x34/0x3c) from [<c0053980>] (cpu_startup_entry+0xa8/0x10c)
[<c0053980>] (cpu_startup_entry+0xa8/0x10c) from [<000085a4>] (0x85a4)

We are in an interrupt context (IPI) and we are calling clk_get_rate in the
set_mode function which in turn ends up by getting a mutex... Even if that
does not hang, it is a potential kernel deadlock.

It is not allowed to call clk_get_rate() from interrupt context. To
avoid such calls the timer input frequency is stored in the driver's
data struct which makes it accessible to the driver in any context.

[dlezcano] completed the changelog with the WARN trace and added a more
detailed description. Tested on zync zc702.

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

diff --git a/drivers/clocksource/cadence_ttc_timer.c b/drivers/clocksource/cadence_ttc_timer.c
index b2bb3a4b..a92350b 100644
--- a/drivers/clocksource/cadence_ttc_timer.c
+++ b/drivers/clocksource/cadence_ttc_timer.c
@@ -67,11 +67,13 @@
  * struct ttc_timer - This definition defines local timer structure
  *
  * @base_addr:	Base address of timer
+ * @freq:	Timer input clock frequency
  * @clk:	Associated clock source
  * @clk_rate_change_nb	Notifier block for clock rate changes
  */
 struct ttc_timer {
 	void __iomem *base_addr;
+	unsigned long freq;
 	struct clk *clk;
 	struct notifier_block clk_rate_change_nb;
 };
@@ -196,9 +198,8 @@ static void ttc_set_mode(enum clock_event_mode mode,
 
 	switch (mode) {
 	case CLOCK_EVT_MODE_PERIODIC:
-		ttc_set_interval(timer,
-				DIV_ROUND_CLOSEST(clk_get_rate(ttce->ttc.clk),
-					PRESCALE * HZ));
+		ttc_set_interval(timer, DIV_ROUND_CLOSEST(ttce->ttc.freq,
+						PRESCALE * HZ));
 		break;
 	case CLOCK_EVT_MODE_ONESHOT:
 	case CLOCK_EVT_MODE_UNUSED:
@@ -273,6 +274,8 @@ static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 		return;
 	}
 
+	ttccs->ttc.freq = clk_get_rate(ttccs->ttc.clk);
+
 	ttccs->ttc.clk_rate_change_nb.notifier_call =
 		ttc_rate_change_clocksource_cb;
 	ttccs->ttc.clk_rate_change_nb.next = NULL;
@@ -298,16 +301,14 @@ static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 	__raw_writel(CNT_CNTRL_RESET,
 		     ttccs->ttc.base_addr + TTC_CNT_CNTRL_OFFSET);
 
-	err = clocksource_register_hz(&ttccs->cs,
-			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
+	err = clocksource_register_hz(&ttccs->cs, ttccs->ttc.freq / PRESCALE);
 	if (WARN_ON(err)) {
 		kfree(ttccs);
 		return;
 	}
 
 	ttc_sched_clock_val_reg = base + TTC_COUNT_VAL_OFFSET;
-	setup_sched_clock(ttc_sched_clock_read, 16,
-			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
+	setup_sched_clock(ttc_sched_clock_read, 16, ttccs->ttc.freq / PRESCALE);
 }
 
 static int ttc_rate_change_clockevent_cb(struct notifier_block *nb,
@@ -334,6 +335,9 @@ static int ttc_rate_change_clockevent_cb(struct notifier_block *nb,
 				ndata->new_rate / PRESCALE);
 		local_irq_restore(flags);
 
+		/* update cached frequency */
+		ttc->freq = ndata->new_rate;
+
 		/* fall through */
 	}
 	case PRE_RATE_CHANGE:
@@ -367,6 +371,7 @@ static void __init ttc_setup_clockevent(struct clk *clk,
 	if (clk_notifier_register(ttcce->ttc.clk,
 				&ttcce->ttc.clk_rate_change_nb))
 		pr_warn("Unable to register clock notifier.\n");
+	ttcce->ttc.freq = clk_get_rate(ttcce->ttc.clk);
 
 	ttcce->ttc.base_addr = base;
 	ttcce->ce.name = "ttc_clockevent";
@@ -396,7 +401,7 @@ static void __init ttc_setup_clockevent(struct clk *clk,
 	}
 
 	clockevents_config_and_register(&ttcce->ce,
-			clk_get_rate(ttcce->ttc.clk) / PRESCALE, 1, 0xfffe);
+			ttcce->ttc.freq / PRESCALE, 1, 0xfffe);
 }
 
 /**
-- 
cgit v0.10.2


From 6bcac805bacb3b637911244d95368512ae389abc Mon Sep 17 00:00:00 2001
From: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Tue, 7 Jan 2014 17:53:54 +0000
Subject: Revert "ARM: 7908/1: mm: Fix the arm_dma_limit calculation"

This reverts commit 787b0d5c1ca7ff24feb6f92e4c7f4410ee7d81a8 since
it is no longer required after 7909/1 was applied, and it causes
build regressions when ARM_PATCH_PHYS_VIRT is disabled and DMA_ZONE
is enabled.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 1f7b19a..3e8f106e 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -229,7 +229,7 @@ void __init setup_dma_zone(const struct machine_desc *mdesc)
 #ifdef CONFIG_ZONE_DMA
 	if (mdesc->dma_zone_size) {
 		arm_dma_zone_size = mdesc->dma_zone_size;
-		arm_dma_limit = __pv_phys_offset + arm_dma_zone_size - 1;
+		arm_dma_limit = PHYS_OFFSET + arm_dma_zone_size - 1;
 	} else
 		arm_dma_limit = 0xffffffff;
 	arm_dma_pfn_limit = arm_dma_limit >> PAGE_SHIFT;
-- 
cgit v0.10.2


From 0882dae983707455e97479e5e904e37673517ebc Mon Sep 17 00:00:00 2001
From: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date: Wed, 8 Jan 2014 11:12:27 -0200
Subject: drm/i915: fix DDI PLLs HW state readout code

Properly zero the refcounts and crtc->ddi_pll_set so the previous HW
state doesn't affect the result of reading the current HW state.

This fixes WARNs about WRPLL refcount if we have an HDMI monitor on
HSW and then suspend/resume.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64379
Tested-by: Qingshuai Tian <qingshuai.tian@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
index 526c8de..b69dc3e 100644
--- a/drivers/gpu/drm/i915/intel_ddi.c
+++ b/drivers/gpu/drm/i915/intel_ddi.c
@@ -1057,12 +1057,18 @@ void intel_ddi_setup_hw_pll_state(struct drm_device *dev)
 	enum pipe pipe;
 	struct intel_crtc *intel_crtc;
 
+	dev_priv->ddi_plls.spll_refcount = 0;
+	dev_priv->ddi_plls.wrpll1_refcount = 0;
+	dev_priv->ddi_plls.wrpll2_refcount = 0;
+
 	for_each_pipe(pipe) {
 		intel_crtc =
 			to_intel_crtc(dev_priv->pipe_to_crtc_mapping[pipe]);
 
-		if (!intel_crtc->active)
+		if (!intel_crtc->active) {
+			intel_crtc->ddi_pll_sel = PORT_CLK_SEL_NONE;
 			continue;
+		}
 
 		intel_crtc->ddi_pll_sel = intel_ddi_get_crtc_pll(dev_priv,
 								 pipe);
-- 
cgit v0.10.2


From 1739f09e33d8f66bf48ddbc3eca615574da6c4f6 Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt@goodmis.org>
Date: Wed, 13 Nov 2013 15:20:04 -0500
Subject: ftrace/x86: Load ftrace_ops in parameter not the variable holding it

Function tracing callbacks expect to have the ftrace_ops that registered it
passed to them, not the address of the variable that holds the ftrace_ops
that registered it.

Use a mov instead of a lea to store the ftrace_ops into the parameter
of the function tracing callback.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20131113152004.459787f9@gandalf.local.home
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.8+

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 51e2988..a2a4f46 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -1082,7 +1082,7 @@ ENTRY(ftrace_caller)
 	pushl $0	/* Pass NULL as regs pointer */
 	movl 4*4(%esp), %eax
 	movl 0x4(%ebp), %edx
-	leal function_trace_op, %ecx
+	movl function_trace_op, %ecx
 	subl $MCOUNT_INSN_SIZE, %eax
 
 .globl ftrace_call
@@ -1140,7 +1140,7 @@ ENTRY(ftrace_regs_caller)
 	movl 12*4(%esp), %eax	/* Load ip (1st parameter) */
 	subl $MCOUNT_INSN_SIZE, %eax	/* Adjust ip */
 	movl 0x4(%ebp), %edx	/* Load parent ip (2nd parameter) */
-	leal function_trace_op, %ecx /* Save ftrace_pos in 3rd parameter */
+	movl function_trace_op, %ecx /* Save ftrace_pos in 3rd parameter */
 	pushl %esp		/* Save pt_regs as 4th parameter */
 
 GLOBAL(ftrace_regs_call)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e21b078..1e96c36 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -88,7 +88,7 @@ END(function_hook)
 	MCOUNT_SAVE_FRAME \skip
 
 	/* Load the ftrace_ops into the 3rd parameter */
-	leaq function_trace_op, %rdx
+	movq function_trace_op(%rip), %rdx
 
 	/* Load ip into the first parameter */
 	movq RIP(%rsp), %rdi
-- 
cgit v0.10.2


From 7ad228b11ec26a820291c9f5a1168d6176580dc1 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala@linux.intel.com>
Date: Tue, 7 Jan 2014 16:15:36 +0200
Subject: drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When the pipe A force quirk is applied the code will attempt to grab
a crtc mutex during intel_modeset_setup_hw_state(). If we're already
holding all crtc mutexes this will obviously deadlock every time.

So instead of using drm_modeset_lock_all() just grab the
mode_config.mutex. This is enough to avoid the unlocked mutex warnings
from certain lower level functions.

The regression was introduced in:

 commit 027476642811f8559cbe00ef6cc54db230e48a20
 Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
 Date:   Mon Dec 2 11:08:06 2013 +0200

    drm/i915: Take modeset locks around intel_modeset_setup_hw_state()

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
[danvet: Add cc: stable since the offending commit has that, too.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 769b864..2bde35d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11053,10 +11053,10 @@ void intel_modeset_gem_init(struct drm_device *dev)
 
 	intel_setup_overlay(dev);
 
-	drm_modeset_lock_all(dev);
+	mutex_lock(&dev->mode_config.mutex);
 	drm_mode_config_reset(dev);
 	intel_modeset_setup_hw_state(dev, false);
-	drm_modeset_unlock_all(dev);
+	mutex_unlock(&dev->mode_config.mutex);
 }
 
 void intel_modeset_cleanup(struct drm_device *dev)
-- 
cgit v0.10.2


From 09f2344d0896eb458a639b922224c7a202c11599 Mon Sep 17 00:00:00 2001
From: Jesse Barnes <jbarnes@virtuousgeek.org>
Date: Fri, 10 Jan 2014 13:13:09 -0800
Subject: drm/i915/bdw: make sure south port interrupts are enabled properly v2

We were apparently relying on the defaults on BDW, which resulted in no
hotplug or AUX interrupts.  So be sure to call the ibx_irq_preinstall to
enable all interrupts.

v2: use preinstall instead of redundant SDIER write

References: https://bugs.freedesktop.org/show_bug.cgi?id=72834
References: https://bugs.freedesktop.org/show_bug.cgi?id=72833
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 5d1dedc..f13d5ed 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2713,6 +2713,8 @@ static void gen8_irq_preinstall(struct drm_device *dev)
 #undef GEN8_IRQ_INIT_NDX
 
 	POSTING_READ(GEN8_PCU_IIR);
+
+	ibx_irq_preinstall(dev);
 }
 
 static void ibx_hpd_irq_setup(struct drm_device *dev)
-- 
cgit v0.10.2


From e44ef891e9e68b6ce7d3fd3bac73b7d5433050ae Mon Sep 17 00:00:00 2001
From: Sudeep Holla <sudeep.holla@arm.com>
Date: Wed, 8 Jan 2014 18:24:21 +0100
Subject: ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous
 match

The MPIDR contains specific bitfields(MPIDR.Aff{2..0}) which uniquely
identify a CPU, in addition to some non-identifying information and
reserved bits. The ARM cpu binding defines the 'reg' property to only
contain the affinity bits, and any cpu nodes with other bits set in
their 'reg' entry are skipped.

As such it is not necessary to mask the phys_id with MPIDR_HWID_BITMASK,
and doing so could lead to matching erroneous CPU nodes in the device
tree. This patch removes the masking of the physical identifier.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
index 739c3df..34d5fd5 100644
--- a/arch/arm/kernel/devtree.c
+++ b/arch/arm/kernel/devtree.c
@@ -171,7 +171,7 @@ void __init arm_dt_init_cpu_maps(void)
 
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
-	return (phys_id & MPIDR_HWID_BITMASK) == cpu_logical_map(cpu);
+	return phys_id == cpu_logical_map(cpu);
 }
 
 static const void * __init arch_get_next_mach(const char *const **match)
-- 
cgit v0.10.2


From 261521f142fb426b58ed460d2476049dc8fa4fb9 Mon Sep 17 00:00:00 2001
From: Stephen Boyd <sboyd@codeaurora.org>
Date: Fri, 10 Jan 2014 00:57:06 +0100
Subject: ARM: 7937/1: perf_event: Silence sparse warning

arch/arm/kernel/perf_event_cpu.c:274:25: warning: incorrect type in assignment (different modifiers)
arch/arm/kernel/perf_event_cpu.c:274:25: expected int ( *init_fn )( ... )
arch/arm/kernel/perf_event_cpu.c:274:25: got void const *const data

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index d85055c..20d553c 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -254,7 +254,7 @@ static int probe_current_pmu(struct arm_pmu *pmu)
 static int cpu_pmu_device_probe(struct platform_device *pdev)
 {
 	const struct of_device_id *of_id;
-	int (*init_fn)(struct arm_pmu *);
+	const int (*init_fn)(struct arm_pmu *);
 	struct device_node *node = pdev->dev.of_node;
 	struct arm_pmu *pmu;
 	int ret = -ENODEV;
-- 
cgit v0.10.2


From d6cd989477e2fee29ccda257614ef7b2621d0601 Mon Sep 17 00:00:00 2001
From: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Date: Fri, 10 Jan 2014 01:36:00 +0100
Subject: ARM: 7939/1: traps: fix opcode endianness when read from user memory

Currently code has an inverted logic: opcode from user memory
is swapped to a proper endianness only in case of read error.
While normally opcode should be swapped only if it was read
correctly from user memory.

Reviewed-by: Victor Kamensky <victor.kamensky@linaro.org>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 6eda3bf..4636d56 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -431,9 +431,10 @@ asmlinkage void __exception do_undefinstr(struct pt_regs *regs)
 			instr2 = __mem_to_opcode_thumb16(instr2);
 			instr = __opcode_thumb32_compose(instr, instr2);
 		}
-	} else if (get_user(instr, (u32 __user *)pc)) {
+	} else {
+		if (get_user(instr, (u32 __user *)pc))
+			goto die_sig;
 		instr = __mem_to_opcode_arm(instr);
-		goto die_sig;
 	}
 
 	if (call_undef_hook(regs, instr) == 0)
-- 
cgit v0.10.2


From 26bef1318adc1b3a530ecc807ef99346db2aa8b0 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sat, 11 Jan 2014 19:15:52 -0800
Subject: x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround

Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog <me@halfdog.net>
Tested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index c49a613..cea1c76 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -293,12 +293,13 @@ static inline int restore_fpu_checking(struct task_struct *tsk)
 	/* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
 	   is pending.  Clear the x87 state here by setting it to fixed
 	   values. "m" is a random variable that should be in L1 */
-	alternative_input(
-		ASM_NOP8 ASM_NOP2,
-		"emms\n\t"		/* clear stack tags */
-		"fildl %P[addr]",	/* set F?P to defined value */
-		X86_FEATURE_FXSAVE_LEAK,
-		[addr] "m" (tsk->thread.fpu.has_fpu));
+	if (unlikely(static_cpu_has(X86_FEATURE_FXSAVE_LEAK))) {
+		asm volatile(
+			"fnclex\n\t"
+			"emms\n\t"
+			"fildl %P[addr]"	/* set F?P to defined value */
+			: : [addr] "m" (tsk->thread.fpu.has_fpu));
+	}
 
 	return fpu_restore_checking(&tsk->thread.fpu);
 }
-- 
cgit v0.10.2


From 9722c2dac708e9468cc0dc30218ef76946ffbc9d Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@redhat.com>
Date: Mon, 6 Jan 2014 11:39:12 +0000
Subject: sched: Calculate effective load even if local weight is 0

Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
node") as the problem which had modified the behaviour of effective_load.

Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
on a preferred NUMA node") changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.

wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.

Reported-and-tested-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
[ Wrote changelog]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c7395d9..e64b079 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3923,7 +3923,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 {
 	struct sched_entity *se = tg->se[cpu];
 
-	if (!tg->parent || !wl)	/* the trivial, non-cgroup case */
+	if (!tg->parent)	/* the trivial, non-cgroup case */
 		return wl;
 
 	for_each_sched_entity(se) {
-- 
cgit v0.10.2


From 0c3351d451ae2fa438d5d1ed719fc43354fbffbb Mon Sep 17 00:00:00 2001
From: John Stultz <john.stultz@linaro.org>
Date: Thu, 2 Jan 2014 15:11:13 -0800
Subject: seqlock: Use raw_ prefix instead of _no_lockdep
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Linus disliked the _no_lockdep() naming, so instead
use the more-consistent raw_* prefix to the non-lockdep
enabled seqcount methods.

This also adds raw_ methods for the write operations
as well, which will be utilized in a following patch.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Krzysztof Hałasa <khalasa@piap.pl>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Link: http://lkml.kernel.org/r/1388704274-5278-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/arch/x86/vdso/vclock_gettime.c b/arch/x86/vdso/vclock_gettime.c
index 2ada505..eb5d7a5 100644
--- a/arch/x86/vdso/vclock_gettime.c
+++ b/arch/x86/vdso/vclock_gettime.c
@@ -178,7 +178,7 @@ notrace static int __always_inline do_realtime(struct timespec *ts)
 
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->wall_time_sec;
 		ns = gtod->wall_time_snsec;
@@ -198,7 +198,7 @@ notrace static int do_monotonic(struct timespec *ts)
 
 	ts->tv_nsec = 0;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		mode = gtod->clock.vclock_mode;
 		ts->tv_sec = gtod->monotonic_time_sec;
 		ns = gtod->monotonic_time_snsec;
@@ -214,7 +214,7 @@ notrace static int do_realtime_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		ts->tv_sec = gtod->wall_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->wall_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
@@ -225,7 +225,7 @@ notrace static int do_monotonic_coarse(struct timespec *ts)
 {
 	unsigned long seq;
 	do {
-		seq = read_seqcount_begin_no_lockdep(&gtod->seq);
+		seq = raw_read_seqcount_begin(&gtod->seq);
 		ts->tv_sec = gtod->monotonic_time_coarse.tv_sec;
 		ts->tv_nsec = gtod->monotonic_time_coarse.tv_nsec;
 	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index cf87a24..535f158 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -117,15 +117,15 @@ repeat:
 }
 
 /**
- * read_seqcount_begin_no_lockdep - start seq-read critical section w/o lockdep
+ * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
  * @s: pointer to seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * read_seqcount_begin_no_lockdep opens a read critical section of the given
+ * raw_read_seqcount_begin opens a read critical section of the given
  * seqcount, but without any lockdep checking. Validity of the critical
  * section is tested by checking read_seqcount_retry function.
  */
-static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
+static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
 	unsigned ret = __read_seqcount_begin(s);
 	smp_rmb();
@@ -144,7 +144,7 @@ static inline unsigned read_seqcount_begin_no_lockdep(const seqcount_t *s)
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
 	seqcount_lockdep_reader_access(s);
-	return read_seqcount_begin_no_lockdep(s);
+	return raw_read_seqcount_begin(s);
 }
 
 /**
@@ -206,14 +206,26 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
 }
 
 
+
+static inline void raw_write_seqcount_begin(seqcount_t *s)
+{
+	s->sequence++;
+	smp_wmb();
+}
+
+static inline void raw_write_seqcount_end(seqcount_t *s)
+{
+	smp_wmb();
+	s->sequence++;
+}
+
 /*
  * Sequence counter only version assumes that callers are using their
  * own mutexing.
  */
 static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	s->sequence++;
-	smp_wmb();
+	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
@@ -225,8 +237,7 @@ static inline void write_seqcount_begin(seqcount_t *s)
 static inline void write_seqcount_end(seqcount_t *s)
 {
 	seqcount_release(&s->dep_map, 1, _RET_IP_);
-	smp_wmb();
-	s->sequence++;
+	raw_write_seqcount_end(s);
 }
 
 /**
-- 
cgit v0.10.2


From 7a06c41cbec33c6dbe7eec575c61986122617408 Mon Sep 17 00:00:00 2001
From: John Stultz <john.stultz@linaro.org>
Date: Thu, 2 Jan 2014 15:11:14 -0800
Subject: sched_clock: Disable seqlock lockdep usage in sched_clock()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Unfortunately the seqlock lockdep enablement can't be used
in sched_clock(), since the lockdep infrastructure eventually
calls into sched_clock(), which causes a deadlock.

Thus, this patch changes all generic sched_clock() usage
to use the raw_* methods.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Reported-by: Krzysztof Hałasa <khalasa@piap.pl>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1388704274-5278-2-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 68b7993..0abb364 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -74,7 +74,7 @@ unsigned long long notrace sched_clock(void)
 		return cd.epoch_ns;
 
 	do {
-		seq = read_seqcount_begin(&cd.seq);
+		seq = raw_read_seqcount_begin(&cd.seq);
 		epoch_cyc = cd.epoch_cyc;
 		epoch_ns = cd.epoch_ns;
 	} while (read_seqcount_retry(&cd.seq, seq));
@@ -99,10 +99,10 @@ static void notrace update_sched_clock(void)
 			  cd.mult, cd.shift);
 
 	raw_local_irq_save(flags);
-	write_seqcount_begin(&cd.seq);
+	raw_write_seqcount_begin(&cd.seq);
 	cd.epoch_ns = ns;
 	cd.epoch_cyc = cyc;
-	write_seqcount_end(&cd.seq);
+	raw_write_seqcount_end(&cd.seq);
 	raw_local_irq_restore(flags);
 }
 
-- 
cgit v0.10.2


From 518d00b7498c5894be94545848d55e5b9c55749e Mon Sep 17 00:00:00 2001
From: Ming Lei <tom.leiming@gmail.com>
Date: Thu, 26 Dec 2013 21:31:37 +0800
Subject: block: null_blk: fix queue leak inside removing device

When queue_mode is NULL_Q_MQ and null_blk is being removed,
blk_cleanup_queue() isn't called to cleanup queue, so the queue
allocated won't be freed.

This patch calls blk_cleanup_queue() for MQ to drain all pending
requests first and release the reference counter of queue kobject, then
blk_mq_free_queue() will be called in queue kobject's release handler
when queue kobject's reference counter drops to zero.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index a2e69d2..83a598e 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -425,10 +425,7 @@ static void null_del_dev(struct nullb *nullb)
 	list_del_init(&nullb->list);
 
 	del_gendisk(nullb->disk);
-	if (queue_mode == NULL_Q_MQ)
-		blk_mq_free_queue(nullb->q);
-	else
-		blk_cleanup_queue(nullb->q);
+	blk_cleanup_queue(nullb->q);
 	put_disk(nullb->disk);
 	kfree(nullb);
 }
@@ -578,10 +575,7 @@ static int null_add_dev(void)
 	disk = nullb->disk = alloc_disk_node(1, home_node);
 	if (!disk) {
 queue_fail:
-		if (queue_mode == NULL_Q_MQ)
-			blk_mq_free_queue(nullb->q);
-		else
-			blk_cleanup_queue(nullb->q);
+		blk_cleanup_queue(nullb->q);
 		cleanup_queues(nullb);
 err:
 		kfree(nullb);
-- 
cgit v0.10.2


From eecc1e426d681351a6026a7d3e7d225f38955b6c Mon Sep 17 00:00:00 2001
From: Hugh Dickins <hughd@google.com>
Date: Sun, 12 Jan 2014 01:25:21 -0800
Subject: thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only

We see General Protection Fault on RSI in copy_page_rep: that RSI is
what you get from a NULL struct page pointer.

  RIP: 0010:[<ffffffff81154955>]  [<ffffffff81154955>] copy_page_rep+0x5/0x10
  RSP: 0000:ffff880136e15c00  EFLAGS: 00010286
  RAX: ffff880000000000 RBX: ffff880136e14000 RCX: 0000000000000200
  RDX: 6db6db6db6db6db7 RSI: db73880000000000 RDI: ffff880dd0c00000
  RBP: ffff880136e15c18 R08: 0000000000000200 R09: 000000000005987c
  R10: 000000000005987c R11: 0000000000000200 R12: 0000000000000001
  R13: ffffea00305aa000 R14: 0000000000000000 R15: 0000000000000000
  FS:  00007f195752f700(0000) GS:ffff880c7fc20000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000093010000 CR3: 00000001458e1000 CR4: 00000000000027e0
  Call Trace:
    copy_user_huge_page+0x93/0xab
    do_huge_pmd_wp_page+0x710/0x815
    handle_mm_fault+0x15d8/0x1d70
    __do_page_fault+0x14d/0x840
    do_page_fault+0x2f/0x90
    page_fault+0x22/0x30

do_huge_pmd_wp_page() tests is_huge_zero_pmd(orig_pmd) four times: but
since shrink_huge_zero_page() can free the huge_zero_page, and we have
no hold of our own on it here (except where the fourth test holds
page_table_lock and has checked pmd_same), it's possible for it to
answer yes the first time, but no to the second or third test.  Change
all those last three to tests for NULL page.

(Note: this is not the same issue as trinity's DEBUG_PAGEALLOC BUG
in copy_page_rep with RSI: ffff88009c422000, reported by Sasha Levin
in https://lkml.org/lkml/2013/3/29/103.  I believe that one is due
to the source page being split, and a tail page freed, while copy
is in progress; and not a problem without DEBUG_PAGEALLOC, since
the pmd_same check will prevent a miscopy from being made visible.)

Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v3.10 v3.11 v3.12
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9c0b172..95d1acb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1154,7 +1154,7 @@ alloc:
 		new_page = NULL;
 
 	if (unlikely(!new_page)) {
-		if (is_huge_zero_pmd(orig_pmd)) {
+		if (!page) {
 			ret = do_huge_pmd_wp_zero_page_fallback(mm, vma,
 					address, pmd, orig_pmd, haddr);
 		} else {
@@ -1181,7 +1181,7 @@ alloc:
 
 	count_vm_event(THP_FAULT_ALLOC);
 
-	if (is_huge_zero_pmd(orig_pmd))
+	if (!page)
 		clear_huge_page(new_page, haddr, HPAGE_PMD_NR);
 	else
 		copy_user_huge_page(new_page, page, haddr, vma, HPAGE_PMD_NR);
@@ -1207,7 +1207,7 @@ alloc:
 		page_add_new_anon_rmap(new_page, vma, haddr);
 		set_pmd_at(mm, haddr, pmd, entry);
 		update_mmu_cache_pmd(vma, address, pmd);
-		if (is_huge_zero_pmd(orig_pmd)) {
+		if (!page) {
 			add_mm_counter(mm, MM_ANONPAGES, HPAGE_PMD_NR);
 			put_huge_zero_page();
 		} else {
-- 
cgit v0.10.2


From 3dc91d4338d698ce77832985f9cb183d8eeaf6be Mon Sep 17 00:00:00 2001
From: Steven Rostedt <rostedt@goodmis.org>
Date: Thu, 9 Jan 2014 21:46:34 -0500
Subject: SELinux: Fix possible NULL pointer dereference in
 selinux_inode_permission()

While running stress tests on adding and deleting ftrace instances I hit
this bug:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
  IP: selinux_inode_permission+0x85/0x160
  PGD 63681067 PUD 7ddbe067 PMD 0
  Oops: 0000 [#1] PREEMPT
  CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
  Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
  task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
  RIP: 0010:[<ffffffff812d8bc5>]  [<ffffffff812d8bc5>] selinux_inode_permission+0x85/0x160
  RSP: 0018:ffff88007ddb1c48  EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
  RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
  RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
  R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
  R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
  FS:  00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M
  CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
  Call Trace:
    security_inode_permission+0x1c/0x30
    __inode_permission+0x41/0xa0
    inode_permission+0x18/0x50
    link_path_walk+0x66/0x920
    path_openat+0xa6/0x6c0
    do_filp_open+0x43/0xa0
    do_sys_open+0x146/0x240
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b
  Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 <0f> b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
  RIP  selinux_inode_permission+0x85/0x160
  CR2: 0000000000000020

Investigating, I found that the inode->i_security was NULL, and the
dereference of it caused the oops.

in selinux_inode_permission():

	isec = inode->i_security;

	rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);

Note, the crash came from stressing the deletion and reading of debugfs
files.  I was not able to recreate this via normal files.  But I'm not
sure they are safe.  It may just be that the race window is much harder
to hit.

What seems to have happened (and what I have traced), is the file is
being opened at the same time the file or directory is being deleted.
As the dentry and inode locks are not held during the path walk, nor is
the inodes ref counts being incremented, there is nothing saving these
structures from being discarded except for an rcu_read_lock().

The rcu_read_lock() protects against freeing of the inode, but it does
not protect freeing of the inode_security_struct.  Now if the freeing of
the i_security happens with a call_rcu(), and the i_security field of
the inode is not changed (it gets freed as the inode gets freed) then
there will be no issue here.  (Linus Torvalds suggested not setting the
field to NULL such that we do not need to check if it is NULL in the
permission check).

Note, this is a hack, but it fixes the problem at hand.  A real fix is
to restructure the destroy_inode() to call all the destructor handlers
from the RCU callback.  But that is a major job to do, and requires a
lot of work.  For now, we just band-aid this bug with this fix (it
works), and work on a more maintainable solution in the future.

Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 6625699..57b0b49 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -234,6 +234,14 @@ static int inode_alloc_security(struct inode *inode)
 	return 0;
 }
 
+static void inode_free_rcu(struct rcu_head *head)
+{
+	struct inode_security_struct *isec;
+
+	isec = container_of(head, struct inode_security_struct, rcu);
+	kmem_cache_free(sel_inode_cache, isec);
+}
+
 static void inode_free_security(struct inode *inode)
 {
 	struct inode_security_struct *isec = inode->i_security;
@@ -244,8 +252,16 @@ static void inode_free_security(struct inode *inode)
 		list_del_init(&isec->list);
 	spin_unlock(&sbsec->isec_lock);
 
-	inode->i_security = NULL;
-	kmem_cache_free(sel_inode_cache, isec);
+	/*
+	 * The inode may still be referenced in a path walk and
+	 * a call to selinux_inode_permission() can be made
+	 * after inode_free_security() is called. Ideally, the VFS
+	 * wouldn't do this, but fixing that is a much harder
+	 * job. For now, simply free the i_security via RCU, and
+	 * leave the current inode->i_security pointer intact.
+	 * The inode will be freed after the RCU grace period too.
+	 */
+	call_rcu(&isec->rcu, inode_free_rcu);
 }
 
 static int file_alloc_security(struct file *file)
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index b1dfe10..078e553 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -38,7 +38,10 @@ struct task_security_struct {
 
 struct inode_security_struct {
 	struct inode *inode;	/* back pointer to inode object */
-	struct list_head list;	/* list of inode_security_struct */
+	union {
+		struct list_head list;	/* list of inode_security_struct */
+		struct rcu_head rcu;	/* for freeing the inode_security_struct */
+	};
 	u32 task_sid;		/* SID of creating task */
 	u32 sid;		/* SID of this object */
 	u16 sclass;		/* security class of this object */
-- 
cgit v0.10.2


From 7e22e91102c6b9df7c4ae2168910e19d2bb14cd6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Sun, 12 Jan 2014 17:04:18 +0700
Subject: Linux 3.13-rc8


diff --git a/Makefile b/Makefile
index ae1a55a..eeec740 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 3
 PATCHLEVEL = 13
 SUBLEVEL = 0
-EXTRAVERSION = -rc7
+EXTRAVERSION = -rc8
 NAME = One Giant Leap for Frogkind
 
 # *DOCUMENTATION*
-- 
cgit v0.10.2


From b25f3e1c358434bf850220e04f28eebfc45eb634 Mon Sep 17 00:00:00 2001
From: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Date: Fri, 10 Jan 2014 01:27:08 +0100
Subject: ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling

Kexec disables outer cache before jumping to reboot code, but it doesn't
flush it explicitly. Flush is done implicitly inside of l2x0_disable().
But some SoC's override default .disable handler and don't flush cache.
This may lead to a corrupted memory during Kexec reboot on these
platforms.

This patch adds cache flush inside of OMAP4 and Highbank outer_cache.disable()
handlers to make it consistent with default l2x0_disable().

Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

diff --git a/arch/arm/mach-highbank/highbank.c b/arch/arm/mach-highbank/highbank.c
index b3d7e56..ae17150 100644
--- a/arch/arm/mach-highbank/highbank.c
+++ b/arch/arm/mach-highbank/highbank.c
@@ -50,6 +50,7 @@ static void __init highbank_scu_map_io(void)
 
 static void highbank_l2x0_disable(void)
 {
+	outer_flush_all();
 	/* Disable PL310 L2 Cache controller */
 	highbank_smc1(0x102, 0x0);
 }
diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index 5791143..3f44b16 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -163,6 +163,7 @@ void __iomem *omap4_get_l2cache_base(void)
 
 static void omap4_l2x0_disable(void)
 {
+	outer_flush_all();
 	/* Disable PL310 L2 Cache controller */
 	omap_smc1(0x102, 0x0);
 }
-- 
cgit v0.10.2


From 10348f5976830e5d8f74e8abb04a9a057a5e8478 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Mon, 13 Jan 2014 09:49:17 +1100
Subject: powerpc: Check return value of instance-to-package OF call

On PA-Semi firmware, the instance-to-package callback doesn't seem
to be implemented. We didn't check for error, however, thus
subsequently passed the -1 value returned into stdout_node to
thins like prom_getprop etc...

Thus caused the firmware to load values around 0 (physical) internally
as node structures. It somewhat "worked" as long as we had a NULL in the
right place (address 8) at the beginning of the kernel, we didn't "see"
the bug. But commit 5c0484e25ec03243d4c2f2d4416d4a13efc77f6a
"powerpc: Endian safe trampoline" changed the kernel entry point causing
that old bug to now cause a crash early during boot.

This fixes booting on PA-Semi board by properly checking the return
value from instance-to-package.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Olof Johansson <olof@lixom.net>
---

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index cb64a6e..078145a 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -1986,19 +1986,23 @@ static void __init prom_init_stdout(void)
 	/* Get the full OF pathname of the stdout device */
 	memset(path, 0, 256);
 	call_prom("instance-to-path", 3, 1, prom.stdout, path, 255);
-	stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
-	val = cpu_to_be32(stdout_node);
-	prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
-		     &val, sizeof(val));
 	prom_printf("OF stdout device is: %s\n", of_stdout_device);
 	prom_setprop(prom.chosen, "/chosen", "linux,stdout-path",
 		     path, strlen(path) + 1);
 
-	/* If it's a display, note it */
-	memset(type, 0, sizeof(type));
-	prom_getprop(stdout_node, "device_type", type, sizeof(type));
-	if (strcmp(type, "display") == 0)
-		prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
+	/* instance-to-package fails on PA-Semi */
+	stdout_node = call_prom("instance-to-package", 1, 1, prom.stdout);
+	if (stdout_node != PROM_ERROR) {
+		val = cpu_to_be32(stdout_node);
+		prom_setprop(prom.chosen, "/chosen", "linux,stdout-package",
+			     &val, sizeof(val));
+
+		/* If it's a display, note it */
+		memset(type, 0, sizeof(type));
+		prom_getprop(stdout_node, "device_type", type, sizeof(type));
+		if (strcmp(type, "display") == 0)
+			prom_setprop(stdout_node, path, "linux,boot-display", NULL, 0);
+	}
 }
 
 static int __init prom_find_machine_type(void)
-- 
cgit v0.10.2


From abce1ec9b08a4f318f431e6b9a12524227ae7109 Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@redhat.com>
Date: Tue, 14 Jan 2014 12:50:49 +1000
Subject: Revert "drm: copy mode type in drm_mode_connector_list_update()"

This reverts commit 3fbd6439e4639ecaeaae6c079e0aa497a1ac3482.

This caused some strange booting lockup issues on an Intel G33
belonging to Daniel Vetter, very unusual, I was hoping Daniel
would track this down, but it looks like instead I'll have to hack
a different fix for -next.

Signed-off-by: Dave Airlie <airlied@redhat.com>

diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
index 85071a1..b073315 100644
--- a/drivers/gpu/drm/drm_modes.c
+++ b/drivers/gpu/drm/drm_modes.c
@@ -1041,7 +1041,7 @@ void drm_mode_connector_list_update(struct drm_connector *connector)
 				/* if equal delete the probed mode */
 				mode->status = pmode->status;
 				/* Merge type bits together */
-				mode->type = pmode->type;
+				mode->type |= pmode->type;
 				list_del(&pmode->head);
 				drm_mode_destroy(connector->dev, pmode);
 				break;
-- 
cgit v0.10.2


From 1cc03eb93245e63b0b7a7832165efdc52e25b4e6 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 6 Jan 2014 13:19:42 +1100
Subject: md/raid5: Fix possible confusion when multiple write errors occur.

commit 5d8c71f9e5fbdd95650be00294d238e27a363b5c
    md: raid5 crash during degradation

Fixed a crash in an overly simplistic way which could leave
R5_WriteError or R5_MadeGood set in the stripe cache for devices
for which it is no longer relevant.
When those devices are removed and spares added the flags are still
set and can cause incorrect behaviour.

commit 14a75d3e07c784c004b4b44b34af996b8e4ac453
    md/raid5: preferentially read from replacement device if possible.

Fixed the same bug if a more effective way, so we can now revert
the original commit.

Reported-and-tested-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: stable@vger.kernel.org (3.2+ - 3.2 will need a different fix though)
Fixes: 5d8c71f9e5fbdd95650be00294d238e27a363b5c
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cc055da..9168173 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3608,7 +3608,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			 */
 			set_bit(R5_Insync, &dev->flags);
 
-		if (rdev && test_bit(R5_WriteError, &dev->flags)) {
+		if (test_bit(R5_WriteError, &dev->flags)) {
 			/* This flag does not apply to '.replacement'
 			 * only to .rdev, so make sure to check that*/
 			struct md_rdev *rdev2 = rcu_dereference(
@@ -3621,7 +3621,7 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			} else
 				clear_bit(R5_WriteError, &dev->flags);
 		}
-		if (rdev && test_bit(R5_MadeGood, &dev->flags)) {
+		if (test_bit(R5_MadeGood, &dev->flags)) {
 			/* This flag does not apply to '.replacement'
 			 * only to .rdev, so make sure to check that*/
 			struct md_rdev *rdev2 = rcu_dereference(
-- 
cgit v0.10.2


From b50c259e25d9260b9108dc0c2964c26e5ecbe1c1 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 14 Jan 2014 10:38:09 +1100
Subject: md/raid10: fix two bugs in handling of known-bad-blocks.

If we discover a bad block when reading we split the request and
potentially read some of it from a different device.

The code path of this has two bugs in RAID10.
1/ we get a spin_lock with _irq, but unlock without _irq!!
2/ The calculation of 'sectors_handled' is wrong, as can be clearly
   seen by comparison with raid1.c

This leads to at least 2 warnings and a probable crash is a RAID10
ever had known bad blocks.

Cc: stable@vger.kernel.org (v3.1+)
Fixes: 856e08e23762dfb92ffc68fd0a8d228f9e152160
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c504e83..6528521 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1319,7 +1319,7 @@ read_again:
 			/* Could not read all from this device, so we will
 			 * need another r10_bio.
 			 */
-			sectors_handled = (r10_bio->sectors + max_sectors
+			sectors_handled = (r10_bio->sector + max_sectors
 					   - bio->bi_sector);
 			r10_bio->sectors = max_sectors;
 			spin_lock_irq(&conf->device_lock);
@@ -1327,7 +1327,7 @@ read_again:
 				bio->bi_phys_segments = 2;
 			else
 				bio->bi_phys_segments++;
-			spin_unlock(&conf->device_lock);
+			spin_unlock_irq(&conf->device_lock);
 			/* Cannot call generic_make_request directly
 			 * as that will be queued in __generic_make_request
 			 * and subsequent mempool_alloc might block
-- 
cgit v0.10.2


From 41a336e011887f73e7c879b60e1e3544045435cb Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 14 Jan 2014 11:56:14 +1100
Subject: md/raid1: fix request counting bug in new 'barrier' code.

The new iobarrier implementation in raid1 (which keeps normal writes
and resync activity separate) counts every request what is not before
the current resync point in either next_window_requests or
current_window_requests.
It flags that the request is counted by setting ->start_next_window.

allow_barrier follows this model exactly and decrements one of the
*_window_requests if and only if ->start_next_window is set.

However wait_barrier(), which increments *_window_requests uses a
slightly different test for setting -.start_next_window (which is set
from the return value of this function).
So there is a possibility of the counts getting out of sync, and this
leads to the resync hanging.

So change wait_barrier() to return a non-zero value in exactly the
same cases that it increments *_window_requests.

But was introduced in 3.13-rc1.

Reported-by: Bruno Wolff III <bruno@wolff.to>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68061
Fixes: 79ef3a8aa1cb1523cc231c9a90a278333c21f761
Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 1e5a540..a49cfcc 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -924,9 +924,8 @@ static sector_t wait_barrier(struct r1conf *conf, struct bio *bio)
 				conf->next_window_requests++;
 			else
 				conf->current_window_requests++;
-		}
-		if (bio->bi_sector >= conf->start_next_window)
 			sector = conf->start_next_window;
+		}
 	}
 
 	conf->nr_pending++;
-- 
cgit v0.10.2


From 5af9bef72c074dbe946da8b74eabd79cd5a9ff19 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Tue, 14 Jan 2014 15:16:10 +1100
Subject: md/raid5: fix a recently broken BUG_ON().

commit 6d183de4077191d1201283a9035ce57a9b05254d
    md/raid5: fix newly-broken locking in get_active_stripe.

simplified a BUG_ON, but removed too much so now it sometimes fires
when it shouldn't.

When the STRIPE_EXPANDING flag is set, the stripe_head might be on a
special list while multiple stripe_heads are collected, or it might
not be on any list, even a 'free' list when the refcount is zero.  As
long as STRIPE_EXPANDING is set, it will be found and added back to a
list eventually.

So both of the BUG_ONs which test for the ->lru being empty or not
need to avoid the case where STRIPE_EXPANDING is set.

The patch which broke this was marked for -stable, so this patch needs
to be applied to any branch that received 6d183de4

Fixes: 6d183de4077191d1201283a9035ce57a9b05254d
Cc: stable@vger.kernel.org (any release to which above was applied)
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9168173..cbb1571 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -687,7 +687,8 @@ get_active_stripe(struct r5conf *conf, sector_t sector,
 			} else {
 				if (!test_bit(STRIPE_HANDLE, &sh->state))
 					atomic_inc(&conf->active_stripes);
-				BUG_ON(list_empty(&sh->lru));
+				BUG_ON(list_empty(&sh->lru) &&
+				       !test_bit(STRIPE_EXPANDING, &sh->state));
 				list_del_init(&sh->lru);
 				if (sh->group) {
 					sh->group->stripes_cnt--;
-- 
cgit v0.10.2


From e8b849158508565e0cd6bc80061124afc5879160 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 6 Jan 2014 10:35:34 +1100
Subject: md/raid10: fix bug when raid10 recovery fails to recover a block.

commit e875ecea266a543e643b19e44cf472f1412708f9
    md/raid10 record bad blocks as needed during recovery.

added code to the "cannot recover this block" path to record a bad
block rather than fail the whole recovery.
Unfortunately this new case was placed *after* r10bio was freed rather
than *before*, yet it still uses r10bio.
This is will crash with a null dereference.

So move the freeing of r10bio down where it is safe.

Cc: stable@vger.kernel.org (v3.1+)
Fixes: e875ecea266a543e643b19e44cf472f1412708f9
Reported-by: Damian Nowak <spam@nowaker.net>
URL: https://bugzilla.kernel.org/show_bug.cgi?id=68181
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6528521..06eeb99 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3218,10 +3218,6 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 			if (j == conf->copies) {
 				/* Cannot recover, so abort the recovery or
 				 * record a bad block */
-				put_buf(r10_bio);
-				if (rb2)
-					atomic_dec(&rb2->remaining);
-				r10_bio = rb2;
 				if (any_working) {
 					/* problem is that there are bad blocks
 					 * on other device(s)
@@ -3253,6 +3249,10 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr,
 					mirror->recovery_disabled
 						= mddev->recovery_disabled;
 				}
+				put_buf(r10_bio);
+				if (rb2)
+					atomic_dec(&rb2->remaining);
+				r10_bio = rb2;
 				break;
 			}
 		}
-- 
cgit v0.10.2


From 8313b8e57f55b15e5b7f7fc5d1630bbf686a9a97 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 12 Dec 2013 10:13:33 +1100
Subject: md: fix problem when adding device to read-only array with bitmap.

If an array is started degraded, and then the missing device
is found it can be re-added and a minimal bitmap-based recovery
will bring it fully up-to-date.

If the array is read-only a recovery would not be allowed.
But also if the array is read-only and the missing device was
present very recently, then there could be no need for any
recovery at all, so we simply include the device in the read-only
array without any recovery.

However... if the missing device was removed a little longer ago
it could be missing some updates, but if a bitmap is present it will
be conditionally accepted pending a bitmap-based update.  We don't
currently detect this case properly and will include that old
device into the read-only array with no recovery even though it really
needs a recovery.

This patch keeps track of whether a bitmap-based-recovery is really
needed or not in the new Bitmap_sync rdev flag.  If that is set,
then the device will not be added to a read-only array.

Cc: Andrei Warkentin <andreiw@vmware.com>
Fixes: d70ed2e4fafdbef0800e73942482bb075c21578b
Cc: stable@vger.kernel.org (3.2+)
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index e60cebf..2a456a5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1087,6 +1087,7 @@ static int super_90_validate(struct mddev *mddev, struct md_rdev *rdev)
 	rdev->raid_disk = -1;
 	clear_bit(Faulty, &rdev->flags);
 	clear_bit(In_sync, &rdev->flags);
+	clear_bit(Bitmap_sync, &rdev->flags);
 	clear_bit(WriteMostly, &rdev->flags);
 
 	if (mddev->raid_disks == 0) {
@@ -1165,6 +1166,8 @@ static int super_90_validate(struct mddev *mddev, struct md_rdev *rdev)
 		 */
 		if (ev1 < mddev->bitmap->events_cleared)
 			return 0;
+		if (ev1 < mddev->events)
+			set_bit(Bitmap_sync, &rdev->flags);
 	} else {
 		if (ev1 < mddev->events)
 			/* just a hot-add of a new device, leave raid_disk at -1 */
@@ -1573,6 +1576,7 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev)
 	rdev->raid_disk = -1;
 	clear_bit(Faulty, &rdev->flags);
 	clear_bit(In_sync, &rdev->flags);
+	clear_bit(Bitmap_sync, &rdev->flags);
 	clear_bit(WriteMostly, &rdev->flags);
 
 	if (mddev->raid_disks == 0) {
@@ -1655,6 +1659,8 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev)
 		 */
 		if (ev1 < mddev->bitmap->events_cleared)
 			return 0;
+		if (ev1 < mddev->events)
+			set_bit(Bitmap_sync, &rdev->flags);
 	} else {
 		if (ev1 < mddev->events)
 			/* just a hot-add of a new device, leave raid_disk at -1 */
@@ -2798,6 +2804,7 @@ slot_store(struct md_rdev *rdev, const char *buf, size_t len)
 		else
 			rdev->saved_raid_disk = -1;
 		clear_bit(In_sync, &rdev->flags);
+		clear_bit(Bitmap_sync, &rdev->flags);
 		err = rdev->mddev->pers->
 			hot_add_disk(rdev->mddev, rdev);
 		if (err) {
@@ -5770,6 +5777,7 @@ static int add_new_disk(struct mddev * mddev, mdu_disk_info_t *info)
 			    info->raid_disk < mddev->raid_disks) {
 				rdev->raid_disk = info->raid_disk;
 				set_bit(In_sync, &rdev->flags);
+				clear_bit(Bitmap_sync, &rdev->flags);
 			} else
 				rdev->raid_disk = -1;
 		} else
@@ -7716,7 +7724,8 @@ static int remove_and_add_spares(struct mddev *mddev,
 		if (test_bit(Faulty, &rdev->flags))
 			continue;
 		if (mddev->ro &&
-		    rdev->saved_raid_disk < 0)
+		    ! (rdev->saved_raid_disk >= 0 &&
+		       !test_bit(Bitmap_sync, &rdev->flags)))
 			continue;
 
 		rdev->recovery_offset = 0;
@@ -7797,9 +7806,12 @@ void md_check_recovery(struct mddev *mddev)
 			 * As we only add devices that are already in-sync,
 			 * we can activate the spares immediately.
 			 */
-			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			remove_and_add_spares(mddev, NULL);
-			mddev->pers->spare_active(mddev);
+			/* There is no thread, but we need to call
+			 * ->spare_active and clear saved_raid_disk
+			 */
+			md_reap_sync_thread(mddev);
+			clear_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 			goto unlock;
 		}
 
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 2f5cc8a..0095ec8 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -129,6 +129,9 @@ struct md_rdev {
 enum flag_bits {
 	Faulty,			/* device is known to have a fault */
 	In_sync,		/* device is in_sync with rest of array */
+	Bitmap_sync,		/* ..actually, not quite In_sync.  Need a
+				 * bitmap-based recovery to get fully in sync
+				 */
 	Unmerged,		/* device is being added to array and should
 				 * be considerred for bvec_merge_fn but not
 				 * yet for actual IO
-- 
cgit v0.10.2


From 2fac2b891f287691c27ee8d2eeecf39571b27fea Mon Sep 17 00:00:00 2001
From: Stephen Warren <swarren@nvidia.com>
Date: Mon, 13 Jan 2014 14:29:04 -0700
Subject: i2c: Re-instate body of i2c_parent_is_i2c_adapter()

The body of i2c_parent_is_i2c_adapter() is currently guarded by
I2C_MUX. It should be CONFIG_I2C_MUX instead.

Among potentially other problems, this resulted in i2c_lock_adapter()
only locking I2C mux child adapters, and not the parent adapter. In
turn, this could allow inter-mingling of mux child selection and I2C
transactions, which could result in I2C transactions being directed to
the wrong I2C bus, and possibly even switching between busses in the
middle of a transaction.

One concrete issue caused by this bug was corrupted HDMI EDID reads
during boot on the NVIDIA Tegra Seaboard system, although this only
became apparent in recent linux-next, when the boot timing was changed
just enough to trigger the race condition.

Fixes: 3923172b3d70 ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case")
Cc: Phil Carmody <phil.carmody@partner.samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index eff50e0..d9c8dbd3 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -445,7 +445,7 @@ static inline void i2c_set_adapdata(struct i2c_adapter *dev, void *data)
 static inline struct i2c_adapter *
 i2c_parent_is_i2c_adapter(const struct i2c_adapter *adapter)
 {
-#if IS_ENABLED(I2C_MUX)
+#if IS_ENABLED(CONFIG_I2C_MUX)
 	struct device *parent = adapter->dev.parent;
 
 	if (parent != NULL && parent->type == &i2c_adapter_type)
-- 
cgit v0.10.2


From 3f9aec7610b39521c7c69d754de7265f6994c194 Mon Sep 17 00:00:00 2001
From: Jean Delvare <khali@linux-fr.org>
Date: Tue, 14 Jan 2014 15:59:55 +0100
Subject: hwmon: (coretemp) Fix truncated name of alarm attributes

When the core number exceeds 9, the size of the buffer storing the
alarm attribute name is insufficient and the attribute name is
truncated. This causes libsensors to skip these attributes as the
truncated name is not recognized.

Reported-by: Andreas Hollmann <hollmann@in.tum.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@vger.kernel.org
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 78be661..9425098 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -52,7 +52,7 @@ MODULE_PARM_DESC(tjmax, "TjMax value in degrees Celsius");
 
 #define BASE_SYSFS_ATTR_NO	2	/* Sysfs Base attr no for coretemp */
 #define NUM_REAL_CORES		32	/* Number of Real cores per cpu */
-#define CORETEMP_NAME_LENGTH	17	/* String Length of attrs */
+#define CORETEMP_NAME_LENGTH	19	/* String Length of attrs */
 #define MAX_CORE_ATTRS		4	/* Maximum no of basic attrs */
 #define TOTAL_ATTRS		(MAX_CORE_ATTRS + 1)
 #define MAX_CORE_DATA		(NUM_REAL_CORES + BASE_SYSFS_ATTR_NO)
-- 
cgit v0.10.2


From 267d29a69c6af39445f36102a832b25ed483f299 Mon Sep 17 00:00:00 2001
From: Christian Engelmayer <cengelma@gmx.at>
Date: Sat, 11 Jan 2014 22:19:30 +0100
Subject: ieee802154: Fix memory leak in ieee802154_add_iface()

Fix a memory leak in the ieee802154_add_iface() error handling path.
Detected by Coverity: CID 710490.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ieee802154/nl-phy.c b/net/ieee802154/nl-phy.c
index d08c7a4..89b265a 100644
--- a/net/ieee802154/nl-phy.c
+++ b/net/ieee802154/nl-phy.c
@@ -221,8 +221,10 @@ int ieee802154_add_iface(struct sk_buff *skb, struct genl_info *info)
 
 	if (info->attrs[IEEE802154_ATTR_DEV_TYPE]) {
 		type = nla_get_u8(info->attrs[IEEE802154_ATTR_DEV_TYPE]);
-		if (type >= __IEEE802154_DEV_MAX)
-			return -EINVAL;
+		if (type >= __IEEE802154_DEV_MAX) {
+			rc = -EINVAL;
+			goto nla_put_failure;
+		}
 	}
 
 	dev = phy->add_iface(phy, devname, type);
-- 
cgit v0.10.2


From a02bbb1ccfe81d3e859ab2ddde6a2eff9c8b39fd Mon Sep 17 00:00:00 2001
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Sun, 12 Jan 2014 13:37:56 +0200
Subject: MAINTAINERS: add virtio-dev ML for virtio

Since virtio is an OASIS standard draft now, virtio implementation
discussions are taking place on the virtio-dev OASIS mailing list.
Update MAINTAINERS.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/MAINTAINERS b/MAINTAINERS
index 31a0462..6a6e4ac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9231,6 +9231,7 @@ F:	include/media/videobuf2-*
 
 VIRTIO CONSOLE DRIVER
 M:	Amit Shah <amit.shah@redhat.com>
+L:	virtio-dev@lists.oasis-open.org
 L:	virtualization@lists.linux-foundation.org
 S:	Maintained
 F:	drivers/char/virtio_console.c
@@ -9240,6 +9241,7 @@ F:	include/uapi/linux/virtio_console.h
 VIRTIO CORE, NET AND BLOCK DRIVERS
 M:	Rusty Russell <rusty@rustcorp.com.au>
 M:	"Michael S. Tsirkin" <mst@redhat.com>
+L:	virtio-dev@lists.oasis-open.org
 L:	virtualization@lists.linux-foundation.org
 S:	Maintained
 F:	drivers/virtio/
@@ -9252,6 +9254,7 @@ F:	include/uapi/linux/virtio_*.h
 VIRTIO HOST (VHOST)
 M:	"Michael S. Tsirkin" <mst@redhat.com>
 L:	kvm@vger.kernel.org
+L:	virtio-dev@lists.oasis-open.org
 L:	virtualization@lists.linux-foundation.org
 L:	netdev@vger.kernel.org
 S:	Maintained
-- 
cgit v0.10.2


From 7c4b5175f65f31e0cd9867a6ddc3171007dfc110 Mon Sep 17 00:00:00 2001
From: Peter Korsgaard <peter@korsgaard.com>
Date: Sun, 12 Jan 2014 23:15:51 +0100
Subject: dm9601: add USB IDs for new dm96xx variants

A number of new dm96xx variants now exist.

Reported-by: Joseph Chang <joseph_chang@davicom.com.tw>
Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c
index 14aa48f..e802198 100644
--- a/drivers/net/usb/dm9601.c
+++ b/drivers/net/usb/dm9601.c
@@ -614,6 +614,18 @@ static const struct usb_device_id products[] = {
 	 USB_DEVICE(0x0a46, 0x9621),	/* DM9621A USB to Fast Ethernet Adapter */
 	 .driver_info = (unsigned long)&dm9601_info,
 	},
+	{
+	 USB_DEVICE(0x0a46, 0x9622),	/* DM9622 USB to Fast Ethernet Adapter */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
+	{
+	 USB_DEVICE(0x0a46, 0x0269),	/* DM962OA USB to Fast Ethernet Adapter */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
+	{
+	 USB_DEVICE(0x0a46, 0x1269),	/* DM9621A USB to Fast Ethernet Adapter */
+	 .driver_info = (unsigned long)&dm9601_info,
+	},
 	{},			// END
 };
 
-- 
cgit v0.10.2


From 95f4a45de1a0f172b35451fc52283290adb21f6e Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 13 Jan 2014 02:45:22 +0100
Subject: net: avoid reference counter overflows on fib_rules in multicast
 forwarding

Bob Falken reported that after 4G packets, multicast forwarding stopped
working. This was because of a rule reference counter overflow which
freed the rule as soon as the overflow happend.

This patch solves this by adding the FIB_LOOKUP_NOREF flag to
fib_rules_lookup calls. This is safe even from non-rcu locked sections
as in this case the flag only implies not taking a reference to the rule,
which we don't need at all.

Rules only hold references to the namespace, which are guaranteed to be
available during the call of the non-rcu protected function reg_vif_xmit
because of the interface reference which itself holds a reference to
the net namespace.

Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables")
Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables")
Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 62212c7..1672409 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -157,9 +157,12 @@ static struct mr_table *ipmr_get_table(struct net *net, u32 id)
 static int ipmr_fib_lookup(struct net *net, struct flowi4 *flp4,
 			   struct mr_table **mrt)
 {
-	struct ipmr_result res;
-	struct fib_lookup_arg arg = { .result = &res, };
 	int err;
+	struct ipmr_result res;
+	struct fib_lookup_arg arg = {
+		.result = &res,
+		.flags = FIB_LOOKUP_NOREF,
+	};
 
 	err = fib_rules_lookup(net->ipv4.mr_rules_ops,
 			       flowi4_to_flowi(flp4), 0, &arg);
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index f365310..0eb4038 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -141,9 +141,12 @@ static struct mr6_table *ip6mr_get_table(struct net *net, u32 id)
 static int ip6mr_fib_lookup(struct net *net, struct flowi6 *flp6,
 			    struct mr6_table **mrt)
 {
-	struct ip6mr_result res;
-	struct fib_lookup_arg arg = { .result = &res, };
 	int err;
+	struct ip6mr_result res;
+	struct fib_lookup_arg arg = {
+		.result = &res,
+		.flags = FIB_LOOKUP_NOREF,
+	};
 
 	err = fib_rules_lookup(net->ipv6.mr6_rules_ops,
 			       flowi6_to_flowi(flp6), 0, &arg);
-- 
cgit v0.10.2


From 51bb352f15595f2dee42b599680809de3d08999d Mon Sep 17 00:00:00 2001
From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Date: Tue, 14 Jan 2014 13:57:25 -0500
Subject: qlge: Fix vlan netdev features.

vlan gets the same netdev features except vlan filter.

Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 449f506..f705aee 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -4765,6 +4765,8 @@ static int qlge_probe(struct pci_dev *pdev,
 			    NETIF_F_RXCSUM;
 	ndev->features = ndev->hw_features;
 	ndev->vlan_features = ndev->hw_features;
+	/* vlan gets same features (except vlan filter) */
+	ndev->vlan_features &= ~NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	if (test_bit(QL_DMA64, &qdev->flags))
 		ndev->features |= NETIF_F_HIGHDMA;
-- 
cgit v0.10.2


From fdd239ac99a0cc298b382c5ab5e7bcd09e8933d7 Mon Sep 17 00:00:00 2001
From: Ben Skeggs <bskeggs@redhat.com>
Date: Tue, 14 Jan 2014 14:56:22 +1000
Subject: drm/nouveau: fix null ptr dereferences on some boards

Regression from "device: populate master subdev pointer only when fully
constructed"

Reported-by: Bob Gleitsmann <rjgleits@bellsouth.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>

diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h b/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h
index 9fa5da7..7f50a85 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/i2c.h
@@ -73,7 +73,7 @@ struct nouveau_i2c {
 	int (*identify)(struct nouveau_i2c *, int index,
 			const char *what, struct nouveau_i2c_board_info *,
 			bool (*match)(struct nouveau_i2c_port *,
-				      struct i2c_board_info *));
+				      struct i2c_board_info *, void *), void *);
 	struct list_head ports;
 };
 
diff --git a/drivers/gpu/drm/nouveau/core/include/subdev/instmem.h b/drivers/gpu/drm/nouveau/core/include/subdev/instmem.h
index ec7a54e..4aca338 100644
--- a/drivers/gpu/drm/nouveau/core/include/subdev/instmem.h
+++ b/drivers/gpu/drm/nouveau/core/include/subdev/instmem.h
@@ -50,6 +50,13 @@ struct nouveau_instmem {
 static inline struct nouveau_instmem *
 nouveau_instmem(void *obj)
 {
+	/* nv04/nv40 impls need to create objects in their constructor,
+	 * which is before the subdev pointer is valid
+	 */
+	if (nv_iclass(obj, NV_SUBDEV_CLASS) &&
+	    nv_subidx(obj) == NVDEV_SUBDEV_INSTMEM)
+		return obj;
+
 	return (void *)nv_device(obj)->subdev[NVDEV_SUBDEV_INSTMEM];
 }
 
diff --git a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
index 041fd5e..c33c03d 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/i2c/base.c
@@ -197,7 +197,7 @@ static int
 nouveau_i2c_identify(struct nouveau_i2c *i2c, int index, const char *what,
 		     struct nouveau_i2c_board_info *info,
 		     bool (*match)(struct nouveau_i2c_port *,
-				   struct i2c_board_info *))
+				   struct i2c_board_info *, void *), void *data)
 {
 	struct nouveau_i2c_port *port = nouveau_i2c_find(i2c, index);
 	int i;
@@ -221,7 +221,7 @@ nouveau_i2c_identify(struct nouveau_i2c *i2c, int index, const char *what,
 		}
 
 		if (nv_probe_i2c(port, info[i].dev.addr) &&
-		    (!match || match(port, &info[i].dev))) {
+		    (!match || match(port, &info[i].dev, data))) {
 			nv_info(i2c, "detected %s: %s\n", what,
 				info[i].dev.type);
 			return i;
diff --git a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
index 13b8500..30c384e 100644
--- a/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
+++ b/drivers/gpu/drm/nouveau/core/subdev/therm/ic.c
@@ -29,9 +29,9 @@
 
 static bool
 probe_monitoring_device(struct nouveau_i2c_port *i2c,
-			struct i2c_board_info *info)
+			struct i2c_board_info *info, void *data)
 {
-	struct nouveau_therm_priv *priv = (void *)nouveau_therm(i2c);
+	struct nouveau_therm_priv *priv = data;
 	struct nvbios_therm_sensor *sensor = &priv->bios_sensor;
 	struct i2c_client *client;
 
@@ -95,7 +95,7 @@ nouveau_therm_ic_ctor(struct nouveau_therm *therm)
 		};
 
 		i2c->identify(i2c, NV_I2C_DEFAULT(0), "monitoring device",
-				  board, probe_monitoring_device);
+			      board, probe_monitoring_device, therm);
 		if (priv->ic)
 			return;
 	}
@@ -107,7 +107,7 @@ nouveau_therm_ic_ctor(struct nouveau_therm *therm)
 		};
 
 		i2c->identify(i2c, NV_I2C_DEFAULT(0), "monitoring device",
-				  board, probe_monitoring_device);
+			      board, probe_monitoring_device, therm);
 		if (priv->ic)
 			return;
 	}
@@ -116,5 +116,5 @@ nouveau_therm_ic_ctor(struct nouveau_therm *therm)
 	   device. Let's try our static list.
 	 */
 	i2c->identify(i2c, NV_I2C_DEFAULT(0), "monitoring device",
-		      nv_board_infos, probe_monitoring_device);
+		      nv_board_infos, probe_monitoring_device, therm);
 }
diff --git a/drivers/gpu/drm/nouveau/dispnv04/dfp.c b/drivers/gpu/drm/nouveau/dispnv04/dfp.c
index 936a71c..7fdc51e 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/dfp.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/dfp.c
@@ -643,7 +643,7 @@ static void nv04_tmds_slave_init(struct drm_encoder *encoder)
 	    get_tmds_slave(encoder))
 		return;
 
-	type = i2c->identify(i2c, 2, "TMDS transmitter", info, NULL);
+	type = i2c->identify(i2c, 2, "TMDS transmitter", info, NULL, NULL);
 	if (type < 0)
 		return;
 
diff --git a/drivers/gpu/drm/nouveau/dispnv04/tvnv04.c b/drivers/gpu/drm/nouveau/dispnv04/tvnv04.c
index cc4b208..244822d 100644
--- a/drivers/gpu/drm/nouveau/dispnv04/tvnv04.c
+++ b/drivers/gpu/drm/nouveau/dispnv04/tvnv04.c
@@ -59,7 +59,7 @@ int nv04_tv_identify(struct drm_device *dev, int i2c_index)
 	struct nouveau_i2c *i2c = nouveau_i2c(drm->device);
 
 	return i2c->identify(i2c, i2c_index, "TV encoder",
-			     nv04_tv_encoder_info, NULL);
+			     nv04_tv_encoder_info, NULL, NULL);
 }
 
 
-- 
cgit v0.10.2


From 70f2fe3a26248724d8a5019681a869abdaf3e89a Mon Sep 17 00:00:00 2001
From: Andreas Rohner <andreas.rohner@gmx.net>
Date: Tue, 14 Jan 2014 17:56:36 -0800
Subject: nilfs2: fix segctor bug that causes file system corruption

There is a bug in the function nilfs_segctor_collect, which results in
active data being written to a segment, that is marked as clean.  It is
possible, that this segment is selected for a later segment
construction, whereby the old data is overwritten.

The problem shows itself with the following kernel log message:

  nilfs_sufile_do_cancel_free: segment 6533 must be clean

Usually a few hours later the file system gets corrupted:

  NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
  NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

The issue can be reproduced with a file system that is nearly full and
with the cleaner running, while some IO intensive task is running.
Although it is quite hard to reproduce.

This is what happens:

 1. The cleaner starts the segment construction
 2. nilfs_segctor_collect is called
 3. sc_stage is on NILFS_ST_SUFILE and segments are freed
 4. sc_stage is on NILFS_ST_DAT current segment is full
 5. nilfs_segctor_extend_segments is called, which
    allocates a new segment
 6. The new segment is one of the segments freed in step 3
 7. nilfs_sufile_cancel_freev is called and produces an error message
 8. Loop around and the collection starts again
 9. sc_stage is on NILFS_ST_SUFILE and segments are freed
    including the newly allocated segment, which will contain active
    data and can be allocated at a later time
10. A few hours later another segment construction allocates the
    segment and causes file system corruption

This can be prevented by simply reordering the statements.  If
nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
the freed segments are marked as dirty and cannot be allocated any more.

Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Reviewed-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 9f6b486..a1a1916 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1440,17 +1440,19 @@ static int nilfs_segctor_collect(struct nilfs_sc_info *sci,
 
 		nilfs_clear_logs(&sci->sc_segbufs);
 
-		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
-		if (unlikely(err))
-			return err;
-
 		if (sci->sc_stage.flags & NILFS_CF_SUFREED) {
 			err = nilfs_sufile_cancel_freev(nilfs->ns_sufile,
 							sci->sc_freesegs,
 							sci->sc_nfreesegs,
 							NULL);
 			WARN_ON(err); /* do not happen */
+			sci->sc_stage.flags &= ~NILFS_CF_SUFREED;
 		}
+
+		err = nilfs_segctor_extend_segments(sci, nilfs, nadd);
+		if (unlikely(err))
+			return err;
+
 		nadd = min_t(int, nadd << 1, SC_MAX_SEGDELTA);
 		sci->sc_stage = prev_stage;
 	}
-- 
cgit v0.10.2


From bad009fe354a00e6b2bf87328995ec76e59ab970 Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhc@lemote.com>
Date: Tue, 14 Jan 2014 17:56:37 -0800
Subject: MIPS: fix case mismatch in local_r4k_flush_icache_range()

Currently, Loongson-2 call protected_blast_icache_range() and others
call protected_loongson23_blast_icache_range(), but I think the correct
behavior should be the opposite.  BTW, Loongson-3's cache-ops is
compatible with MIPS64, but not compatible with Loongson-2.  So, rename
xxx_loongson23_yyy things to xxx_loongson2_yyy.

The patch fixes early boot hang with 3.13-rc1, introduced in commit
14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery all over
arch/mips").

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: John Crispin <blogic@openwrt.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/arch/mips/include/asm/cacheops.h b/arch/mips/include/asm/cacheops.h
index c75025f..06b9bc7 100644
--- a/arch/mips/include/asm/cacheops.h
+++ b/arch/mips/include/asm/cacheops.h
@@ -83,6 +83,6 @@
 /*
  * Loongson2-specific cacheops
  */
-#define Hit_Invalidate_I_Loongson23	0x00
+#define Hit_Invalidate_I_Loongson2	0x00
 
 #endif	/* __ASM_CACHEOPS_H */
diff --git a/arch/mips/include/asm/r4kcache.h b/arch/mips/include/asm/r4kcache.h
index 34d1a19..91d20b0 100644
--- a/arch/mips/include/asm/r4kcache.h
+++ b/arch/mips/include/asm/r4kcache.h
@@ -165,7 +165,7 @@ static inline void flush_icache_line(unsigned long addr)
 	__iflush_prologue
 	switch (boot_cpu_type()) {
 	case CPU_LOONGSON2:
-		cache_op(Hit_Invalidate_I_Loongson23, addr);
+		cache_op(Hit_Invalidate_I_Loongson2, addr);
 		break;
 
 	default:
@@ -219,7 +219,7 @@ static inline void protected_flush_icache_line(unsigned long addr)
 {
 	switch (boot_cpu_type()) {
 	case CPU_LOONGSON2:
-		protected_cache_op(Hit_Invalidate_I_Loongson23, addr);
+		protected_cache_op(Hit_Invalidate_I_Loongson2, addr);
 		break;
 
 	default:
@@ -452,8 +452,8 @@ static inline void prot##extra##blast_##pfx##cache##_range(unsigned long start,
 __BUILD_BLAST_CACHE_RANGE(d, dcache, Hit_Writeback_Inv_D, protected_, )
 __BUILD_BLAST_CACHE_RANGE(s, scache, Hit_Writeback_Inv_SD, protected_, )
 __BUILD_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I, protected_, )
-__BUILD_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I_Loongson23, \
-	protected_, loongson23_)
+__BUILD_BLAST_CACHE_RANGE(i, icache, Hit_Invalidate_I_Loongson2, \
+	protected_, loongson2_)
 __BUILD_BLAST_CACHE_RANGE(d, dcache, Hit_Writeback_Inv_D, , )
 __BUILD_BLAST_CACHE_RANGE(s, scache, Hit_Writeback_Inv_SD, , )
 /* blast_inv_dcache_range */
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 62ffd20..73f02da6 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -580,11 +580,11 @@ static inline void local_r4k_flush_icache_range(unsigned long start, unsigned lo
 	else {
 		switch (boot_cpu_type()) {
 		case CPU_LOONGSON2:
-			protected_blast_icache_range(start, end);
+			protected_loongson2_blast_icache_range(start, end);
 			break;
 
 		default:
-			protected_loongson23_blast_icache_range(start, end);
+			protected_blast_icache_range(start, end);
 			break;
 		}
 	}
-- 
cgit v0.10.2


From 43a06847b9d277e9f2c3bf8052b44b74e17526c7 Mon Sep 17 00:00:00 2001
From: Aaro Koskinen <aaro.koskinen@iki.fi>
Date: Tue, 14 Jan 2014 17:56:38 -0800
Subject: MIPS: fix blast_icache32 on loongson2

Commit 14bd8c082016 ("MIPS: Loongson: Get rid of Loongson 2 #ifdefery
all over arch/mips") failed to add Loongson2 specific blast_icache32
functions.  Fix that.

The patch fixes the following crash seen with 3.13-rc1:

  Reserved instruction in kernel code[#1]:
  [...]
  Call Trace:
    blast_icache32_page+0x8/0xb0
    r4k_flush_cache_page+0x19c/0x200
    do_wp_page.isra.97+0x47c/0xe08
    handle_mm_fault+0x938/0x1118
    __do_page_fault+0x140/0x540
    resume_userspace_check+0x0/0x10
  Code: 00200825  64834000  00200825 <bc900000> bc900020  bc900040  bc900060  bc900080  bc9000a0

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Acked-by: John Crispin <blogic@openwrt.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/arch/mips/include/asm/r4kcache.h b/arch/mips/include/asm/r4kcache.h
index 91d20b0..c84cadd 100644
--- a/arch/mips/include/asm/r4kcache.h
+++ b/arch/mips/include/asm/r4kcache.h
@@ -357,8 +357,8 @@ static inline void invalidate_tcache_page(unsigned long addr)
 		  "i" (op));
 
 /* build blast_xxx, blast_xxx_page, blast_xxx_page_indexed */
-#define __BUILD_BLAST_CACHE(pfx, desc, indexop, hitop, lsize) \
-static inline void blast_##pfx##cache##lsize(void)			\
+#define __BUILD_BLAST_CACHE(pfx, desc, indexop, hitop, lsize, extra)	\
+static inline void extra##blast_##pfx##cache##lsize(void)		\
 {									\
 	unsigned long start = INDEX_BASE;				\
 	unsigned long end = start + current_cpu_data.desc.waysize;	\
@@ -376,7 +376,7 @@ static inline void blast_##pfx##cache##lsize(void)			\
 	__##pfx##flush_epilogue						\
 }									\
 									\
-static inline void blast_##pfx##cache##lsize##_page(unsigned long page) \
+static inline void extra##blast_##pfx##cache##lsize##_page(unsigned long page) \
 {									\
 	unsigned long start = page;					\
 	unsigned long end = page + PAGE_SIZE;				\
@@ -391,7 +391,7 @@ static inline void blast_##pfx##cache##lsize##_page(unsigned long page) \
 	__##pfx##flush_epilogue						\
 }									\
 									\
-static inline void blast_##pfx##cache##lsize##_page_indexed(unsigned long page) \
+static inline void extra##blast_##pfx##cache##lsize##_page_indexed(unsigned long page) \
 {									\
 	unsigned long indexmask = current_cpu_data.desc.waysize - 1;	\
 	unsigned long start = INDEX_BASE + (page & indexmask);		\
@@ -410,23 +410,24 @@ static inline void blast_##pfx##cache##lsize##_page_indexed(unsigned long page)
 	__##pfx##flush_epilogue						\
 }
 
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 16)
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 16)
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 16)
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 32)
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 32)
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 32)
-__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 64)
-__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 64)
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 64)
-__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 128)
-
-__BUILD_BLAST_CACHE(inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 16)
-__BUILD_BLAST_CACHE(inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 32)
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 16)
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 32)
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 64)
-__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 128)
+__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 16, )
+__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 16, )
+__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 16, )
+__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 32, )
+__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 32, )
+__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I_Loongson2, 32, loongson2_)
+__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 32, )
+__BUILD_BLAST_CACHE(d, dcache, Index_Writeback_Inv_D, Hit_Writeback_Inv_D, 64, )
+__BUILD_BLAST_CACHE(i, icache, Index_Invalidate_I, Hit_Invalidate_I, 64, )
+__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 64, )
+__BUILD_BLAST_CACHE(s, scache, Index_Writeback_Inv_SD, Hit_Writeback_Inv_SD, 128, )
+
+__BUILD_BLAST_CACHE(inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 16, )
+__BUILD_BLAST_CACHE(inv_d, dcache, Index_Writeback_Inv_D, Hit_Invalidate_D, 32, )
+__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 16, )
+__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 32, )
+__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 64, )
+__BUILD_BLAST_CACHE(inv_s, scache, Index_Writeback_Inv_SD, Hit_Invalidate_SD, 128, )
 
 /* build blast_xxx_range, protected_blast_xxx_range */
 #define __BUILD_BLAST_CACHE_RANGE(pfx, desc, hitop, prot, extra)	\
diff --git a/arch/mips/mm/c-r4k.c b/arch/mips/mm/c-r4k.c
index 73f02da6..49e572d 100644
--- a/arch/mips/mm/c-r4k.c
+++ b/arch/mips/mm/c-r4k.c
@@ -237,6 +237,8 @@ static void r4k_blast_icache_page_setup(void)
 		r4k_blast_icache_page = (void *)cache_noop;
 	else if (ic_lsize == 16)
 		r4k_blast_icache_page = blast_icache16_page;
+	else if (ic_lsize == 32 && current_cpu_type() == CPU_LOONGSON2)
+		r4k_blast_icache_page = loongson2_blast_icache32_page;
 	else if (ic_lsize == 32)
 		r4k_blast_icache_page = blast_icache32_page;
 	else if (ic_lsize == 64)
@@ -261,6 +263,9 @@ static void r4k_blast_icache_page_indexed_setup(void)
 		else if (TX49XX_ICACHE_INDEX_INV_WAR)
 			r4k_blast_icache_page_indexed =
 				tx49_blast_icache32_page_indexed;
+		else if (current_cpu_type() == CPU_LOONGSON2)
+			r4k_blast_icache_page_indexed =
+				loongson2_blast_icache32_page_indexed;
 		else
 			r4k_blast_icache_page_indexed =
 				blast_icache32_page_indexed;
@@ -284,6 +289,8 @@ static void r4k_blast_icache_setup(void)
 			r4k_blast_icache = blast_r4600_v1_icache32;
 		else if (TX49XX_ICACHE_INDEX_INV_WAR)
 			r4k_blast_icache = tx49_blast_icache32;
+		else if (current_cpu_type() == CPU_LOONGSON2)
+			r4k_blast_icache = loongson2_blast_icache32;
 		else
 			r4k_blast_icache = blast_icache32;
 	} else if (ic_lsize == 64)
-- 
cgit v0.10.2


From 03e5ac2fc3bf6f4140db0371e8bb4243b24e3e02 Mon Sep 17 00:00:00 2001
From: Mikulas Patocka <mpatocka@redhat.com>
Date: Tue, 14 Jan 2014 17:56:40 -0800
Subject: mm: fix crash when using XFS on loopback

Commit 8456a648cf44 ("slab: use struct page for slab management") causes
a crash in the LVM2 testsuite on PA-RISC (the crashing test is
fsadm.sh).  The testsuite doesn't crash on 3.12, crashes on 3.13-rc1 and
later.

 Bad Address (null pointer deref?): Code=15 regs=000000413edd89a0 (Addr=000006202224647d)
 CPU: 3 PID: 24008 Comm: loop0 Not tainted 3.13.0-rc6 #5
 task: 00000001bf3c0048 ti: 000000413edd8000 task.ti: 000000413edd8000

      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
 PSW: 00001000000001101111100100001110 Not tainted
 r00-03  000000ff0806f90e 00000000405c8de0 000000004013e6c0 000000413edd83f0
 r04-07  00000000405a95e0 0000000000000200 00000001414735f0 00000001bf349e40
 r08-11  0000000010fe3d10 0000000000000001 00000040829c7778 000000413efd9000
 r12-15  0000000000000000 000000004060d800 0000000010fe3000 0000000010fe3000
 r16-19  000000413edd82a0 00000041078ddbc0 0000000000000010 0000000000000001
 r20-23  0008f3d0d83a8000 0000000000000000 00000040829c7778 0000000000000080
 r24-27  00000001bf349e40 00000001bf349e40 202d66202224640d 00000000405a95e0
 r28-31  202d662022246465 000000413edd88f0 000000413edd89a0 0000000000000001
 sr00-03  000000000532c000 0000000000000000 0000000000000000 000000000532c000
 sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000

 IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000401fe42c 00000000401fe430
  IIR: 539c0030    ISR: 00000000202d6000  IOR: 000006202224647d
  CPU:        3   CR30: 000000413edd8000 CR31: 0000000000000000
  ORIG_R28: 00000000405a95e0
  IAOQ[0]: vma_interval_tree_iter_first+0x14/0x48
  IAOQ[1]: vma_interval_tree_iter_first+0x18/0x48
  RP(r2): flush_dcache_page+0x128/0x388
 Backtrace:
   flush_dcache_page+0x128/0x388
   lo_splice_actor+0x90/0x148 [loop]
   splice_from_pipe_feed+0xc0/0x1d0
   __splice_from_pipe+0xac/0xc0
   lo_direct_splice_actor+0x1c/0x70 [loop]
   splice_direct_to_actor+0xec/0x228
   lo_receive+0xe4/0x298 [loop]
   loop_thread+0x478/0x640 [loop]
   kthread+0x134/0x168
   end_fault_vector+0x20/0x28
   xfs_setsize_buftarg+0x0/0x90 [xfs]

 Kernel panic - not syncing: Bad Address (null pointer deref?)

Commit 8456a648cf44 changes the page structure so that the slab
subsystem reuses the page->mapping field.

The crash happens in the following way:
 * XFS allocates some memory from slab and issues a bio to read data
   into it.
 * the bio is sent to the loopback device.
 * lo_receive creates an actor and calls splice_direct_to_actor.
 * lo_splice_actor copies data to the target page.
 * lo_splice_actor calls flush_dcache_page because the page may be
   mapped by userspace.  In that case we need to flush the kernel cache.
 * flush_dcache_page asks for the list of userspace mappings, however
   that page->mapping field is reused by the slab subsystem for a
   different purpose.  This causes the crash.

Note that other architectures without coherent caches (sparc, arm, mips)
also call page_mapping from flush_dcache_page, so they may crash in the
same way.

This patch fixes this bug by testing if the page is a slab page in
page_mapping and returning NULL if it is.

The patch also fixes VM_BUG_ON(PageSlab(page)) that could happen in
earlier kernels in the same scenario on architectures without cache
coherence when CONFIG_DEBUG_VM is enabled - so it should be backported
to stable kernels.

In the old kernels, the function page_mapping is placed in
include/linux/mm.h, so you should modify the patch accordingly when
backporting it.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: John David Anglin <dave.anglin@bell.net>]
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/mm/util.c b/mm/util.c
index f7bc209..808f375 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -390,7 +390,10 @@ struct address_space *page_mapping(struct page *page)
 {
 	struct address_space *mapping = page->mapping;
 
-	VM_BUG_ON(PageSlab(page));
+	/* This happens if someone calls flush_dcache_page on slab page */
+	if (unlikely(PageSlab(page)))
+		return NULL;
+
 	if (unlikely(PageSwapCache(page))) {
 		swp_entry_t entry;
 
-- 
cgit v0.10.2


From 5a610fcc7390ee60308deaf09426ada87a1eeec2 Mon Sep 17 00:00:00 2001
From: Qais Yousef <qais.yousef@imgtec.com>
Date: Tue, 14 Jan 2014 17:56:41 -0800
Subject: crash_dump: fix compilation error (on MIPS at least)

  In file included from kernel/crash_dump.c:2:0:
  include/linux/crash_dump.h:22:27: error: unknown type name `pgprot_t'

when CONFIG_CRASH_DUMP=y

The error was traced back to commit 9cb218131de1 ("vmcore: introduce
remap_oldmem_pfn_range()")

include <asm/pgtable.h> to get the missing definition

Signed-off-by: Qais Yousef <qais.yousef@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h
index fe68a5a..7032518 100644
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -6,6 +6,8 @@
 #include <linux/proc_fs.h>
 #include <linux/elf.h>
 
+#include <asm/pgtable.h> /* for pgprot_t */
+
 #define ELFCORE_ADDR_MAX	(-1ULL)
 #define ELFCORE_ADDR_ERR	(-2ULL)
 
-- 
cgit v0.10.2


From 74e72f894d56eb9d2e1218530c658e7d297e002b Mon Sep 17 00:00:00 2001
From: Ming Lei <tom.leiming@gmail.com>
Date: Tue, 14 Jan 2014 17:56:42 -0800
Subject: lib/percpu_counter.c: fix __percpu_counter_add()

__percpu_counter_add() may be called in softirq/hardirq handler (such
as, blk_mq_queue_exit() is typically called in hardirq/softirq handler),
so we need to call this_cpu_add()(irq safe helper) to update percpu
counter, otherwise counts may be lost.

This fixes the problem that 'rmmod null_blk' hangs in blk_cleanup_queue()
because of miscounting of request_queue->mq_usage_counter.

This patch is the v1 of previous one of "lib/percpu_counter.c:
disable local irq when updating percpu couter", and takes Andrew's
approach which may be more efficient for ARCHs(x86, s390) that
have optimized this_cpu_add().

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 7473ee3..1da85bb 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -82,10 +82,10 @@ void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 		unsigned long flags;
 		raw_spin_lock_irqsave(&fbc->lock, flags);
 		fbc->count += count;
+		 __this_cpu_sub(*fbc->counters, count);
 		raw_spin_unlock_irqrestore(&fbc->lock, flags);
-		__this_cpu_write(*fbc->counters, 0);
 	} else {
-		__this_cpu_write(*fbc->counters, count);
+		this_cpu_add(*fbc->counters, amount);
 	}
 	preempt_enable();
 }
-- 
cgit v0.10.2


From 0dce7cd67fd9055c4a2ff278f8af1431e646d346 Mon Sep 17 00:00:00 2001
From: Andrew Jones <drjones@redhat.com>
Date: Wed, 15 Jan 2014 13:39:59 +0100
Subject: kvm: x86: fix apic_base enable check

Commit e66d2ae7c67bd moved the assignment
vcpu->arch.apic_base = value above a condition with
(vcpu->arch.apic_base ^ value), causing that check
to always fail. Use old_value, vcpu->arch.apic_base's
old value, in the condition instead.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1673940..775702f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1355,7 +1355,7 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
 	vcpu->arch.apic_base = value;
 
 	/* update jump label if enable bit changes */
-	if ((vcpu->arch.apic_base ^ value) & MSR_IA32_APICBASE_ENABLE) {
+	if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
 		if (value & MSR_IA32_APICBASE_ENABLE)
 			static_key_slow_dec_deferred(&apic_hw_disabled);
 		else
-- 
cgit v0.10.2


From 1df0cbd509bc21b0c331358c1f9d9a6fc94bada8 Mon Sep 17 00:00:00 2001
From: Marek Lindner <mareklindner@neomailbox.ch>
Date: Wed, 15 Jan 2014 20:31:18 +0800
Subject: batman-adv: fix batman-adv header overhead calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Batman-adv prepends a full ethernet header in addition to its own
header. This has to be reflected in the MTU calculation, especially
since the value is used to set dev->hard_header_len.

Introduced by 411d6ed93a5d0601980d3e5ce75de07c98e3a7de
("batman-adv: consider network coding overhead when calculating required mtu")

Reported-by: cmsv <cmsv@wirelesspt.net>
Reported-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>

diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c
index 1511f64..faba0f6 100644
--- a/net/batman-adv/main.c
+++ b/net/batman-adv/main.c
@@ -277,7 +277,7 @@ int batadv_max_header_len(void)
 			   sizeof(struct batadv_coded_packet));
 #endif
 
-	return header_len;
+	return header_len + ETH_HLEN;
 }
 
 /**
-- 
cgit v0.10.2


From a926592f5e4e900f3fa903298c4619a131e60963 Mon Sep 17 00:00:00 2001
From: Richard Weinberger <richard@nod.at>
Date: Tue, 14 Jan 2014 22:46:36 +0100
Subject: net,via-rhine: Fix tx_timeout handling

rhine_reset_task() misses to disable the tx scheduler upon reset,
this can lead to a crash if work is still scheduled while we're resetting
the tx queue.

Fixes:
[   93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[   93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
index cce6c4b..ef312bc 100644
--- a/drivers/net/ethernet/via/via-rhine.c
+++ b/drivers/net/ethernet/via/via-rhine.c
@@ -1618,6 +1618,7 @@ static void rhine_reset_task(struct work_struct *work)
 		goto out_unlock;
 
 	napi_disable(&rp->napi);
+	netif_tx_disable(dev);
 	spin_lock_bh(&rp->lock);
 
 	/* clear all descriptors */
-- 
cgit v0.10.2


From d9aee591b0f06bd44cd577b757d3f267bc35fe4d Mon Sep 17 00:00:00 2001
From: Yuval Mintz <yuvalmin@broadcom.com>
Date: Wed, 15 Jan 2014 12:05:30 +0200
Subject: bnx2x: Don't release PCI bars on shutdown

The bnx2x driver in its pci shutdown() callback releases its pci bars (in the
same manner it does during its pci remove() callback).
During a system reboot while VFs are enabled, its possible for the VF's remove
to be called (as a result of pci_disable_sriov()) after its shutdown callback
has already finished running; This will cause a paging request fault as the VF
tries to access the pci bar which it has previously released, crashing the
system.

This patch further differentiates the shutdown and remove callbacks, preventing the
pci release procedures from being called during shutdown.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 8b3107b..0067b97 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -12942,25 +12942,26 @@ static void __bnx2x_remove(struct pci_dev *pdev,
 		pci_set_power_state(pdev, PCI_D3hot);
 	}
 
-	if (bp->regview)
-		iounmap(bp->regview);
+	if (remove_netdev) {
+		if (bp->regview)
+			iounmap(bp->regview);
 
-	/* for vf doorbells are part of the regview and were unmapped along with
-	 * it. FW is only loaded by PF.
-	 */
-	if (IS_PF(bp)) {
-		if (bp->doorbells)
-			iounmap(bp->doorbells);
+		/* For vfs, doorbells are part of the regview and were unmapped
+		 * along with it. FW is only loaded by PF.
+		 */
+		if (IS_PF(bp)) {
+			if (bp->doorbells)
+				iounmap(bp->doorbells);
 
-		bnx2x_release_firmware(bp);
-	}
-	bnx2x_free_mem_bp(bp);
+			bnx2x_release_firmware(bp);
+		}
+		bnx2x_free_mem_bp(bp);
 
-	if (remove_netdev)
 		free_netdev(dev);
 
-	if (atomic_read(&pdev->enable_cnt) == 1)
-		pci_release_regions(pdev);
+		if (atomic_read(&pdev->enable_cnt) == 1)
+			pci_release_regions(pdev);
+	}
 
 	pci_disable_device(pdev);
 }
-- 
cgit v0.10.2


From ba42fad0964a41f0830e80c1b6be49c1e6bfcc01 Mon Sep 17 00:00:00 2001
From: Ivan Vecera <ivecera@redhat.com>
Date: Wed, 15 Jan 2014 11:11:34 +0100
Subject: be2net: add dma_mapping_error() check for dma_map_page()

The driver does not check value returned by dma_map_page. The patch
fixes this.

v2: Removed the bugfix for non-bug ;-) (thanks Sathya)

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Sathya Perla <Sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index bf40fda..a37039d 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1776,6 +1776,7 @@ static void be_post_rx_frags(struct be_rx_obj *rxo, gfp_t gfp)
 	struct be_rx_page_info *page_info = NULL, *prev_page_info = NULL;
 	struct be_queue_info *rxq = &rxo->q;
 	struct page *pagep = NULL;
+	struct device *dev = &adapter->pdev->dev;
 	struct be_eth_rx_d *rxd;
 	u64 page_dmaaddr = 0, frag_dmaaddr;
 	u32 posted, page_offset = 0;
@@ -1788,9 +1789,15 @@ static void be_post_rx_frags(struct be_rx_obj *rxo, gfp_t gfp)
 				rx_stats(rxo)->rx_post_fail++;
 				break;
 			}
-			page_dmaaddr = dma_map_page(&adapter->pdev->dev, pagep,
-						    0, adapter->big_page_size,
+			page_dmaaddr = dma_map_page(dev, pagep, 0,
+						    adapter->big_page_size,
 						    DMA_FROM_DEVICE);
+			if (dma_mapping_error(dev, page_dmaaddr)) {
+				put_page(pagep);
+				pagep = NULL;
+				rx_stats(rxo)->rx_post_fail++;
+				break;
+			}
 			page_info->page_offset = 0;
 		} else {
 			get_page(pagep);
-- 
cgit v0.10.2


From aee636c4809fa54848ff07a899b326eb1f9987a2 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Wed, 15 Jan 2014 06:50:07 -0800
Subject: bpf: do not use reciprocal divide

At first Jakub Zawadzki noticed that some divisions by reciprocal_divide
were not correct. (off by one in some cases)
http://www.wireshark.org/~darkjames/reciprocal-buggy.c

He could also show this with BPF:
http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c

The reciprocal divide in linux kernel is not generic enough,
lets remove its use in BPF, as it is not worth the pain with
current cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl>
Cc: Mircea Gherzan <mgherzan@gmail.com>
Cc: Daniel Borkmann <dxchgb@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Matt Evans <matt@ozlabs.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 9ed155a..271b5e9 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -641,10 +641,10 @@ load_ind:
 			emit(ARM_MUL(r_A, r_A, r_X), ctx);
 			break;
 		case BPF_S_ALU_DIV_K:
-			/* current k == reciprocal_value(userspace k) */
+			if (k == 1)
+				break;
 			emit_mov_i(r_scratch, k, ctx);
-			/* A = top 32 bits of the product */
-			emit(ARM_UMULL(r_scratch, r_A, r_A, r_scratch), ctx);
+			emit_udiv(r_A, r_A, r_scratch, ctx);
 			break;
 		case BPF_S_ALU_DIV_X:
 			update_on_xread(ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index ac3c2a1..555034f 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -223,10 +223,11 @@ static int bpf_jit_build_body(struct sk_filter *fp, u32 *image,
 			}
 			PPC_DIVWU(r_A, r_A, r_X);
 			break;
-		case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K); */
+		case BPF_S_ALU_DIV_K: /* A /= K */
+			if (K == 1)
+				break;
 			PPC_LI32(r_scratch1, K);
-			/* Top 32 bits of 64bit result -> A */
-			PPC_MULHWU(r_A, r_A, r_scratch1);
+			PPC_DIVWU(r_A, r_A, r_scratch1);
 			break;
 		case BPF_S_ALU_AND_X:
 			ctx->seen |= SEEN_XREG;
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 16871da..fc0fa77 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -371,11 +371,13 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		/* dr %r4,%r12 */
 		EMIT2(0x1d4c);
 		break;
-	case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K) */
-		/* m %r4,<d(K)>(%r13) */
-		EMIT4_DISP(0x5c40d000, EMIT_CONST(K));
-		/* lr %r5,%r4 */
-		EMIT2(0x1854);
+	case BPF_S_ALU_DIV_K: /* A /= K */
+		if (K == 1)
+			break;
+		/* lhi %r4,0 */
+		EMIT4(0xa7480000);
+		/* d %r4,<d(K)>(%r13) */
+		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
 		break;
 	case BPF_S_ALU_MOD_X: /* A %= X */
 		jit->seen |= SEEN_XREG | SEEN_RET0;
@@ -391,6 +393,11 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		EMIT2(0x1854);
 		break;
 	case BPF_S_ALU_MOD_K: /* A %= K */
+		if (K == 1) {
+			/* lhi %r5,0 */
+			EMIT4(0xa7580000);
+			break;
+		}
 		/* lhi %r4,0 */
 		EMIT4(0xa7480000);
 		/* d %r4,<d(K)>(%r13) */
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 218b6b2..01fe994 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -497,9 +497,20 @@ void bpf_jit_compile(struct sk_filter *fp)
 			case BPF_S_ALU_MUL_K:	/* A *= K */
 				emit_alu_K(MUL, K);
 				break;
-			case BPF_S_ALU_DIV_K:	/* A /= K */
-				emit_alu_K(MUL, K);
-				emit_read_y(r_A);
+			case BPF_S_ALU_DIV_K:	/* A /= K with K != 0*/
+				if (K == 1)
+					break;
+				emit_write_y(G0);
+#ifdef CONFIG_SPARC32
+				/* The Sparc v8 architecture requires
+				 * three instructions between a %y
+				 * register write and the first use.
+				 */
+				emit_nop();
+				emit_nop();
+				emit_nop();
+#endif
+				emit_alu_K(DIV, K);
 				break;
 			case BPF_S_ALU_DIV_X:	/* A /= X; */
 				emit_cmpi(r_X, 0);
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 26328e8..4ed75dd 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -359,15 +359,21 @@ void bpf_jit_compile(struct sk_filter *fp)
 				EMIT2(0x89, 0xd0);	/* mov %edx,%eax */
 				break;
 			case BPF_S_ALU_MOD_K: /* A %= K; */
+				if (K == 1) {
+					CLEAR_A();
+					break;
+				}
 				EMIT2(0x31, 0xd2);	/* xor %edx,%edx */
 				EMIT1(0xb9);EMIT(K, 4);	/* mov imm32,%ecx */
 				EMIT2(0xf7, 0xf1);	/* div %ecx */
 				EMIT2(0x89, 0xd0);	/* mov %edx,%eax */
 				break;
-			case BPF_S_ALU_DIV_K: /* A = reciprocal_divide(A, K); */
-				EMIT3(0x48, 0x69, 0xc0); /* imul imm32,%rax,%rax */
-				EMIT(K, 4);
-				EMIT4(0x48, 0xc1, 0xe8, 0x20); /* shr $0x20,%rax */
+			case BPF_S_ALU_DIV_K: /* A /= K */
+				if (K == 1)
+					break;
+				EMIT2(0x31, 0xd2);	/* xor %edx,%edx */
+				EMIT1(0xb9);EMIT(K, 4);	/* mov imm32,%ecx */
+				EMIT2(0xf7, 0xf1);	/* div %ecx */
 				break;
 			case BPF_S_ALU_AND_X:
 				seen |= SEEN_XREG;
diff --git a/net/core/filter.c b/net/core/filter.c
index 01b7808..ad30d62 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -36,7 +36,6 @@
 #include <asm/uaccess.h>
 #include <asm/unaligned.h>
 #include <linux/filter.h>
-#include <linux/reciprocal_div.h>
 #include <linux/ratelimit.h>
 #include <linux/seccomp.h>
 #include <linux/if_vlan.h>
@@ -166,7 +165,7 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
 			A /= X;
 			continue;
 		case BPF_S_ALU_DIV_K:
-			A = reciprocal_divide(A, K);
+			A /= K;
 			continue;
 		case BPF_S_ALU_MOD_X:
 			if (X == 0)
@@ -553,11 +552,6 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
 		/* Some instructions need special checks */
 		switch (code) {
 		case BPF_S_ALU_DIV_K:
-			/* check for division by zero */
-			if (ftest->k == 0)
-				return -EINVAL;
-			ftest->k = reciprocal_value(ftest->k);
-			break;
 		case BPF_S_ALU_MOD_K:
 			/* check for division by zero */
 			if (ftest->k == 0)
@@ -853,27 +847,7 @@ void sk_decode_filter(struct sock_filter *filt, struct sock_filter *to)
 	to->code = decodes[code];
 	to->jt = filt->jt;
 	to->jf = filt->jf;
-
-	if (code == BPF_S_ALU_DIV_K) {
-		/*
-		 * When loaded this rule user gave us X, which was
-		 * translated into R = r(X). Now we calculate the
-		 * RR = r(R) and report it back. If next time this
-		 * value is loaded and RRR = r(RR) is calculated
-		 * then the R == RRR will be true.
-		 *
-		 * One exception. X == 1 translates into R == 0 and
-		 * we can't calculate RR out of it with r().
-		 */
-
-		if (filt->k == 0)
-			to->k = 1;
-		else
-			to->k = reciprocal_value(filt->k);
-
-		BUG_ON(reciprocal_value(to->k) != filt->k);
-	} else
-		to->k = filt->k;
+	to->k = filt->k;
 }
 
 int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf, unsigned int len)
-- 
cgit v0.10.2


From 4ce00dfcf19c473f3dbf23d5b1372639f0c334f6 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Thu, 16 Jan 2014 18:32:25 +0000
Subject: Revert "arm64: Fix memory shareability attribute for
 ioremap_wc/cache"

This reverts commit 2f7dc6027522499582a520807cb9ffda589de47e.

The above commit breaks the mapping type for Device memory because
pgprot_default already contains a Normal memory type. pgprot_default is
also not initialised early enough for earlyprintk resulting in an
inconsistent memory mapping with 64K PAGE_SIZE configuration.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Will Deacon <will.deacon@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 5727697..4cc813e 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -229,7 +229,7 @@ extern void __iomem *__ioremap(phys_addr_t phys_addr, size_t size, pgprot_t prot
 extern void __iounmap(volatile void __iomem *addr);
 extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size);
 
-#define PROT_DEFAULT		(pgprot_default | PTE_DIRTY)
+#define PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_DIRTY)
 #define PROT_DEVICE_nGnRE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_ATTRINDX(MT_DEVICE_nGnRE))
 #define PROT_NORMAL_NC		(PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL_NC))
 #define PROT_NORMAL		(PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL))
-- 
cgit v0.10.2


From 38a529b5d42e4cfc5ac94844e61335a00eb2d320 Mon Sep 17 00:00:00 2001
From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Thu, 16 Jan 2014 14:39:39 +0200
Subject: e1000e: Fix compilation warning when !CONFIG_PM_SLEEP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 7509963c703b (e1000e: Fix a compile flag mis-match for
suspend/resume) moved suspend and resume hooks to be available when
CONFIG_PM is set. However, it can be set even if CONFIG_PM_SLEEP is not set
causing following warnings to be emitted:

drivers/net/ethernet/intel/e1000e/netdev.c:6178:12: warning:
  	‘e1000_suspend’ defined but not used [-Wunused-function]

drivers/net/ethernet/intel/e1000e/netdev.c:6185:12: warning:
	‘e1000_resume’ defined but not used [-Wunused-function]

To fix this make the hooks to be available only when CONFIG_PM_SLEEP is set
and remove CONFIG_PM wrapping from driver ops because this is already
handled by SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS().

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Dave Ertman <davidx.m.ertman@intel.com>
Cc: Aaron Brown <aaron.f.brown@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index c30d41d..6d14eea 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6174,7 +6174,7 @@ static int __e1000_resume(struct pci_dev *pdev)
 	return 0;
 }
 
-#ifdef CONFIG_PM
+#ifdef CONFIG_PM_SLEEP
 static int e1000_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -6193,7 +6193,7 @@ static int e1000_resume(struct device *dev)
 
 	return __e1000_resume(pdev);
 }
-#endif /* CONFIG_PM */
+#endif /* CONFIG_PM_SLEEP */
 
 #ifdef CONFIG_PM_RUNTIME
 static int e1000_runtime_suspend(struct device *dev)
@@ -7015,13 +7015,11 @@ static DEFINE_PCI_DEVICE_TABLE(e1000_pci_tbl) = {
 };
 MODULE_DEVICE_TABLE(pci, e1000_pci_tbl);
 
-#ifdef CONFIG_PM
 static const struct dev_pm_ops e1000_pm_ops = {
 	SET_SYSTEM_SLEEP_PM_OPS(e1000_suspend, e1000_resume)
 	SET_RUNTIME_PM_OPS(e1000_runtime_suspend, e1000_runtime_resume,
 			   e1000_idle)
 };
-#endif
 
 /* PCI Device API Driver */
 static struct pci_driver e1000_driver = {
@@ -7029,11 +7027,9 @@ static struct pci_driver e1000_driver = {
 	.id_table = e1000_pci_tbl,
 	.probe    = e1000_probe,
 	.remove   = e1000_remove,
-#ifdef CONFIG_PM
 	.driver   = {
 		.pm = &e1000_pm_ops,
 	},
-#endif
 	.shutdown = e1000_shutdown,
 	.err_handler = &e1000_err_handler
 };
-- 
cgit v0.10.2


From d1969a84dd6a44d375aa82bba7d6c38713a429c3 Mon Sep 17 00:00:00 2001
From: Hugh Dickins <hughd@google.com>
Date: Thu, 16 Jan 2014 15:26:48 -0800
Subject: percpu_counter: unbreak __percpu_counter_add()

Commit 74e72f894d56 ("lib/percpu_counter.c: fix __percpu_counter_add()")
looked very plausible, but its arithmetic was badly wrong: obvious once
you see the fix, but maddening to get there from the weird tmpfs ENOSPCs

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Shaohua Li <shli@fusionio.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Fan Du <fan.du@windriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 1da85bb..8280a5d 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -82,7 +82,7 @@ void __percpu_counter_add(struct percpu_counter *fbc, s64 amount, s32 batch)
 		unsigned long flags;
 		raw_spin_lock_irqsave(&fbc->lock, flags);
 		fbc->count += count;
-		 __this_cpu_sub(*fbc->counters, count);
+		__this_cpu_sub(*fbc->counters, count - amount);
 		raw_spin_unlock_irqrestore(&fbc->lock, flags);
 	} else {
 		this_cpu_add(*fbc->counters, amount);
-- 
cgit v0.10.2


From c196403b79aa241c3fefb3ee5bb328aa7c5cc860 Mon Sep 17 00:00:00 2001
From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Date: Thu, 16 Jan 2014 16:54:48 +0100
Subject: net: rds: fix per-cpu helper usage

commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu
handling for rds. chpfirst is the result of __this_cpu_read(), so it is
an absolute pointer and not __percpu. Therefore, __this_cpu_write()
should not operate on chpfirst, but rather on cache->percpu->first, just
like __this_cpu_read() did before.

Cc: <stable@vger.kernel.org> # 3.8+
Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com>

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 8eb9501..b7ebe23 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -421,8 +421,7 @@ static void rds_ib_recv_cache_put(struct list_head *new_item,
 				 struct rds_ib_refill_cache *cache)
 {
 	unsigned long flags;
-	struct list_head *old;
-	struct list_head __percpu *chpfirst;
+	struct list_head *old, *chpfirst;
 
 	local_irq_save(flags);
 
@@ -432,7 +431,7 @@ static void rds_ib_recv_cache_put(struct list_head *new_item,
 	else /* put on front */
 		list_add_tail(new_item, chpfirst);
 
-	__this_cpu_write(chpfirst, new_item);
+	__this_cpu_write(cache->percpu->first, new_item);
 	__this_cpu_inc(cache->percpu->count);
 
 	if (__this_cpu_read(cache->percpu->count) < RDS_IB_RECYCLE_BATCH_COUNT)
@@ -452,7 +451,7 @@ static void rds_ib_recv_cache_put(struct list_head *new_item,
 	} while (old);
 
 
-	__this_cpu_write(chpfirst, NULL);
+	__this_cpu_write(cache->percpu->first, NULL);
 	__this_cpu_write(cache->percpu->count, 0);
 end:
 	local_irq_restore(flags);
-- 
cgit v0.10.2


From 77f99ad16a07aa062c2d30fae57b1fee456f6ef6 Mon Sep 17 00:00:00 2001
From: Christoph Paasch <christoph.paasch@uclouvain.be>
Date: Thu, 16 Jan 2014 20:01:21 +0100
Subject: tcp: metrics: Avoid duplicate entries with the same destination-IP

Because the tcp-metrics is an RCU-list, it may be that two
soft-interrupts are inside __tcp_get_metrics() for the same
destination-IP at the same time. If this destination-IP is not yet part of
the tcp-metrics, both soft-interrupts will end up in tcpm_new and create
a new entry for this IP.
So, we will have two tcp-metrics with the same destination-IP in the list.

This patch checks twice __tcp_get_metrics(). First without holding the
lock, then while holding the lock. The second one is there to confirm
that the entry has not been added by another soft-irq while waiting for
the spin-lock.

Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 0649373..098b3a2 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -22,6 +22,9 @@
 
 int sysctl_tcp_nometrics_save __read_mostly;
 
+static struct tcp_metrics_block *__tcp_get_metrics(const struct inetpeer_addr *addr,
+						   struct net *net, unsigned int hash);
+
 struct tcp_fastopen_metrics {
 	u16	mss;
 	u16	syn_loss:10;		/* Recurring Fast Open SYN losses */
@@ -130,16 +133,41 @@ static void tcpm_suck_dst(struct tcp_metrics_block *tm, struct dst_entry *dst,
 	}
 }
 
+#define TCP_METRICS_TIMEOUT		(60 * 60 * HZ)
+
+static void tcpm_check_stamp(struct tcp_metrics_block *tm, struct dst_entry *dst)
+{
+	if (tm && unlikely(time_after(jiffies, tm->tcpm_stamp + TCP_METRICS_TIMEOUT)))
+		tcpm_suck_dst(tm, dst, false);
+}
+
+#define TCP_METRICS_RECLAIM_DEPTH	5
+#define TCP_METRICS_RECLAIM_PTR		(struct tcp_metrics_block *) 0x1UL
+
 static struct tcp_metrics_block *tcpm_new(struct dst_entry *dst,
 					  struct inetpeer_addr *addr,
-					  unsigned int hash,
-					  bool reclaim)
+					  unsigned int hash)
 {
 	struct tcp_metrics_block *tm;
 	struct net *net;
+	bool reclaim = false;
 
 	spin_lock_bh(&tcp_metrics_lock);
 	net = dev_net(dst->dev);
+
+	/* While waiting for the spin-lock the cache might have been populated
+	 * with this entry and so we have to check again.
+	 */
+	tm = __tcp_get_metrics(addr, net, hash);
+	if (tm == TCP_METRICS_RECLAIM_PTR) {
+		reclaim = true;
+		tm = NULL;
+	}
+	if (tm) {
+		tcpm_check_stamp(tm, dst);
+		goto out_unlock;
+	}
+
 	if (unlikely(reclaim)) {
 		struct tcp_metrics_block *oldest;
 
@@ -169,17 +197,6 @@ out_unlock:
 	return tm;
 }
 
-#define TCP_METRICS_TIMEOUT		(60 * 60 * HZ)
-
-static void tcpm_check_stamp(struct tcp_metrics_block *tm, struct dst_entry *dst)
-{
-	if (tm && unlikely(time_after(jiffies, tm->tcpm_stamp + TCP_METRICS_TIMEOUT)))
-		tcpm_suck_dst(tm, dst, false);
-}
-
-#define TCP_METRICS_RECLAIM_DEPTH	5
-#define TCP_METRICS_RECLAIM_PTR		(struct tcp_metrics_block *) 0x1UL
-
 static struct tcp_metrics_block *tcp_get_encode(struct tcp_metrics_block *tm, int depth)
 {
 	if (tm)
@@ -282,7 +299,6 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
 	struct inetpeer_addr addr;
 	unsigned int hash;
 	struct net *net;
-	bool reclaim;
 
 	addr.family = sk->sk_family;
 	switch (addr.family) {
@@ -304,13 +320,10 @@ static struct tcp_metrics_block *tcp_get_metrics(struct sock *sk,
 	hash = hash_32(hash, net->ipv4.tcp_metrics_hash_log);
 
 	tm = __tcp_get_metrics(&addr, net, hash);
-	reclaim = false;
-	if (tm == TCP_METRICS_RECLAIM_PTR) {
-		reclaim = true;
+	if (tm == TCP_METRICS_RECLAIM_PTR)
 		tm = NULL;
-	}
 	if (!tm && create)
-		tm = tcpm_new(dst, &addr, hash, reclaim);
+		tm = tcpm_new(dst, &addr, hash);
 	else
 		tcpm_check_stamp(tm, dst);
 
-- 
cgit v0.10.2


From 11ffff752c6a5adc86f7dd397b2f75af8f917c51 Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 16 Jan 2014 20:13:04 +0100
Subject: ipv6: simplify detection of first operational link-local address on
 interface

In commit 1ec047eb4751e3 ("ipv6: introduce per-interface counter for
dad-completed ipv6 addresses") I build the detection of the first
operational link-local address much to complex. Additionally this code
now has a race condition.

Replace it with a much simpler variant, which just scans the address
list when duplicate address detection completes, to check if this is
the first valid link local address and send RS and MLD reports then.

Fixes: 1ec047eb4751e3 ("ipv6: introduce per-interface counter for dad-completed ipv6 addresses")
Reported-by: Jiri Pirko <jiri@resnulli.us>
Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 76d5427..65bb130 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -165,7 +165,6 @@ struct inet6_dev {
 	struct net_device	*dev;
 
 	struct list_head	addr_list;
-	int			valid_ll_addr_cnt;
 
 	struct ifmcaddr6	*mc_list;
 	struct ifmcaddr6	*mc_tomb;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index abe46a4..4b6b720 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3189,6 +3189,22 @@ out:
 	in6_ifa_put(ifp);
 }
 
+/* ifp->idev must be at least read locked */
+static bool ipv6_lonely_lladdr(struct inet6_ifaddr *ifp)
+{
+	struct inet6_ifaddr *ifpiter;
+	struct inet6_dev *idev = ifp->idev;
+
+	list_for_each_entry(ifpiter, &idev->addr_list, if_list) {
+		if (ifp != ifpiter && ifpiter->scope == IFA_LINK &&
+		    (ifpiter->flags & (IFA_F_PERMANENT|IFA_F_TENTATIVE|
+				       IFA_F_OPTIMISTIC|IFA_F_DADFAILED)) ==
+		    IFA_F_PERMANENT)
+			return false;
+	}
+	return true;
+}
+
 static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 {
 	struct net_device *dev = ifp->idev->dev;
@@ -3208,14 +3224,11 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 	 */
 
 	read_lock_bh(&ifp->idev->lock);
-	spin_lock(&ifp->lock);
-	send_mld = ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL &&
-		   ifp->idev->valid_ll_addr_cnt == 1;
+	send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp);
 	send_rs = send_mld &&
 		  ipv6_accept_ra(ifp->idev) &&
 		  ifp->idev->cnf.rtr_solicits > 0 &&
 		  (dev->flags&IFF_LOOPBACK) == 0;
-	spin_unlock(&ifp->lock);
 	read_unlock_bh(&ifp->idev->lock);
 
 	/* While dad is in progress mld report's source address is in6_addrany.
@@ -4512,19 +4525,6 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_IPV6_PREFIX, err);
 }
 
-static void update_valid_ll_addr_cnt(struct inet6_ifaddr *ifp, int count)
-{
-	write_lock_bh(&ifp->idev->lock);
-	spin_lock(&ifp->lock);
-	if (((ifp->flags & (IFA_F_PERMANENT|IFA_F_TENTATIVE|IFA_F_OPTIMISTIC|
-			    IFA_F_DADFAILED)) == IFA_F_PERMANENT) &&
-	    (ipv6_addr_type(&ifp->addr) & IPV6_ADDR_LINKLOCAL))
-		ifp->idev->valid_ll_addr_cnt += count;
-	WARN_ON(ifp->idev->valid_ll_addr_cnt < 0);
-	spin_unlock(&ifp->lock);
-	write_unlock_bh(&ifp->idev->lock);
-}
-
 static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 {
 	struct net *net = dev_net(ifp->idev->dev);
@@ -4533,8 +4533,6 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 
 	switch (event) {
 	case RTM_NEWADDR:
-		update_valid_ll_addr_cnt(ifp, 1);
-
 		/*
 		 * If the address was optimistic
 		 * we inserted the route at the start of
@@ -4550,8 +4548,6 @@ static void __ipv6_ifa_notify(int event, struct inet6_ifaddr *ifp)
 					      ifp->idev->dev, 0, 0);
 		break;
 	case RTM_DELADDR:
-		update_valid_ll_addr_cnt(ifp, -1);
-
 		if (ifp->idev->cnf.forwarding)
 			addrconf_leave_anycast(ifp);
 		addrconf_leave_solict(ifp->idev, &ifp->addr);
-- 
cgit v0.10.2


From 75b99dbd634f9caf7b4e48ec5c2dcec8c1837372 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 16 Jan 2014 11:15:12 -0800
Subject: parisc: fix SO_MAX_PACING_RATE typo

SO_MAX_PACING_RATE definition on parisc got a typo.
Its not too late to fix it, before 3.13 is official.

Fixes: 62748f32d501 ("net: introduce SO_MAX_PACING_RATE")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index f33113a..70b3674 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -75,6 +75,6 @@
 
 #define SO_BUSY_POLL		0x4027
 
-#define SO_MAX_PACING_RATE	0x4048
+#define SO_MAX_PACING_RATE	0x4028
 
 #endif /* _UAPI_ASM_SOCKET_H */
-- 
cgit v0.10.2


From 3af57f78c38131b7a66e2b01e06fdacae01992a3 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Date: Fri, 17 Jan 2014 09:37:15 +0100
Subject: s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructions

The s390 bpf jit compiler emits the signed divide instructions "dr" and "d"
for unsigned divisions.
This can cause problems: the dividend will be zero extended to a 64 bit value
and the divisor is the 32 bit signed value as specified A or X accumulator,
even though A and X are supposed to be treated as unsigned values.

The divide instrunctions will generate an exception if the result cannot be
expressed with a 32 bit signed value.
This is the case if e.g. the dividend is 0xffffffff and the divisor either 1
or also 0xffffffff (signed: -1).

To avoid all these issues simply use unsigned divide instructions.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index fc0fa77..708d60e 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -368,16 +368,16 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		EMIT4_PCREL(0xa7840000, (jit->ret0_ip - jit->prg));
 		/* lhi %r4,0 */
 		EMIT4(0xa7480000);
-		/* dr %r4,%r12 */
-		EMIT2(0x1d4c);
+		/* dlr %r4,%r12 */
+		EMIT4(0xb997004c);
 		break;
 	case BPF_S_ALU_DIV_K: /* A /= K */
 		if (K == 1)
 			break;
 		/* lhi %r4,0 */
 		EMIT4(0xa7480000);
-		/* d %r4,<d(K)>(%r13) */
-		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
+		/* dl %r4,<d(K)>(%r13) */
+		EMIT6_DISP(0xe340d000, 0x0097, EMIT_CONST(K));
 		break;
 	case BPF_S_ALU_MOD_X: /* A %= X */
 		jit->seen |= SEEN_XREG | SEEN_RET0;
@@ -387,8 +387,8 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		EMIT4_PCREL(0xa7840000, (jit->ret0_ip - jit->prg));
 		/* lhi %r4,0 */
 		EMIT4(0xa7480000);
-		/* dr %r4,%r12 */
-		EMIT2(0x1d4c);
+		/* dlr %r4,%r12 */
+		EMIT4(0xb997004c);
 		/* lr %r5,%r4 */
 		EMIT2(0x1854);
 		break;
@@ -400,8 +400,8 @@ static int bpf_jit_insn(struct bpf_jit *jit, struct sock_filter *filter,
 		}
 		/* lhi %r4,0 */
 		EMIT4(0xa7480000);
-		/* d %r4,<d(K)>(%r13) */
-		EMIT4_DISP(0x5d40d000, EMIT_CONST(K));
+		/* dl %r4,<d(K)>(%r13) */
+		EMIT6_DISP(0xe340d000, 0x0097, EMIT_CONST(K));
 		/* lr %r5,%r4 */
 		EMIT2(0x1854);
 		break;
-- 
cgit v0.10.2