From caaa4c804fae7bb654f7d00b35b8583280a9c52c Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Sat, 16 Nov 2013 17:46:02 +1100 Subject: KVM: PPC: Book3S HV: Fix physical address calculations This fixes a bug in kvmppc_do_h_enter() where the physical address for a page can be calculated incorrectly if transparent huge pages (THP) are active. Until THP came along, it was true that if we encountered a large (16M) page in kvmppc_do_h_enter(), then the associated memslot must be 16M aligned for both its guest physical address and the userspace address, and the physical address calculations in kvmppc_do_h_enter() assumed that. With THP, that is no longer true. In the case where we are using MMU notifiers and the page size that we get from the Linux page tables is larger than the page being mapped by the guest, we need to fill in some low-order bits of the physical address. Without THP, these bits would be the same in the guest physical address (gpa) and the host virtual address (hva). With THP, they can be different, and we need to use the bits from hva rather than gpa. In the case where we are not using MMU notifiers, the host physical address we get from the memslot->arch.slot_phys[] array already includes the low-order bits down to the PAGE_SIZE level, even if we are using large pages. Thus we can simplify the calculation in this case to just add in the remaining bits in the case where PAGE_SIZE is 64k and the guest is mapping a 4k page. The same bug exists in kvmppc_book3s_hv_page_fault(). The basic fix is to use psize (the page size from the HPTE) rather than pte_size (the page size from the Linux PTE) when updating the HPTE low word in r. That means that pfn needs to be computed to PAGE_SIZE granularity even if the Linux PTE is a huge page PTE. That can be arranged simply by doing the page_to_pfn() before setting page to the head of the compound page. If psize is less than PAGE_SIZE, then we need to make sure we only update the bits from PAGE_SIZE upwards, in order not to lose any sub-page offset bits in r. On the other hand, if psize is greater than PAGE_SIZE, we need to make sure we don't bring in non-zero low order bits in pfn, hence we mask (pfn << PAGE_SHIFT) with ~(psize - 1). Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index f3ff587..47bbeaf 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -665,6 +665,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return -EFAULT; } else { page = pages[0]; + pfn = page_to_pfn(page); if (PageHuge(page)) { page = compound_head(page); pte_size <<= compound_order(page); @@ -689,7 +690,6 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, } rcu_read_unlock_sched(); } - pfn = page_to_pfn(page); } ret = -EFAULT; @@ -707,8 +707,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, r = (r & ~(HPTE_R_W|HPTE_R_I|HPTE_R_G)) | HPTE_R_M; } - /* Set the HPTE to point to pfn */ - r = (r & ~(HPTE_R_PP0 - pte_size)) | (pfn << PAGE_SHIFT); + /* + * Set the HPTE to point to pfn. + * Since the pfn is at PAGE_SIZE granularity, make sure we + * don't mask out lower-order bits if psize < PAGE_SIZE. 
+ */ + if (psize < PAGE_SIZE) + psize = PAGE_SIZE; + r = (r & ~(HPTE_R_PP0 - psize)) | ((pfn << PAGE_SHIFT) & ~(psize - 1)); if (hpte_is_writable(r) && !write_ok) r = hpte_make_readonly(r); ret = RESUME_GUEST; diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 9c51544..fddbf98 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -225,6 +225,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, is_io = pa & (HPTE_R_I | HPTE_R_W); pte_size = PAGE_SIZE << (pa & KVMPPC_PAGE_ORDER_MASK); pa &= PAGE_MASK; + pa |= gpa & ~PAGE_MASK; } else { /* Translate to host virtual address */ hva = __gfn_to_hva_memslot(memslot, gfn); @@ -238,13 +239,12 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, ptel = hpte_make_readonly(ptel); is_io = hpte_cache_bits(pte_val(pte)); pa = pte_pfn(pte) << PAGE_SHIFT; + pa |= hva & (pte_size - 1); } } if (pte_size < psize) return H_PARAMETER; - if (pa && pte_size > psize) - pa |= gpa & (pte_size - 1); ptel &= ~(HPTE_R_PP0 - psize); ptel |= pa; -- cgit v0.10.2 From f019b7ad76e6bdbc8462cbe17ad5b86a25fcdf24 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Sat, 16 Nov 2013 17:46:03 +1100 Subject: KVM: PPC: Book3S HV: Refine barriers in guest entry/exit Some users have reported instances of the host hanging with secondary threads of a core waiting for the primary thread to exit the guest, and the primary thread stuck in nap mode. This prompted a review of the memory barriers in the guest entry/exit code, and this is the result. Most of these changes are the suggestions of Dean Burdick . The barriers between updating napping_threads and reading the entry_exit_count on the one hand, and updating entry_exit_count and reading napping_threads on the other, need to be isync not lwsync, since we need to ensure that either the napping_threads update or the entry_exit_count update get seen. It is not sufficient to order the load vs. lwarx, as lwsync does; we need to order the load vs. the stwcx., so we need isync. In addition, we need a full sync before sending IPIs to wake other threads from nap, to ensure that the write to the entry_exit_count is visible before the IPI occurs. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index bc8de75..bde28da 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -153,7 +153,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206) 13: b machine_check_fwnmi - /* * We come in here when wakened from nap mode on a secondary hw thread. * Relocation is off and most register values are lost. @@ -224,6 +223,11 @@ kvm_start_guest: /* Clear our vcpu pointer so we don't come back in early */ li r0, 0 std r0, HSTATE_KVM_VCPU(r13) + /* + * Make sure we clear HSTATE_KVM_VCPU(r13) before incrementing + * the nap_count, because once the increment to nap_count is + * visible we could be given another vcpu. + */ lwsync /* Clear any pending IPI - we're an offline thread */ ld r5, HSTATE_XICS_PHYS(r13) @@ -241,7 +245,6 @@ kvm_start_guest: /* increment the nap count and then go to nap mode */ ld r4, HSTATE_KVM_VCORE(r13) addi r4, r4, VCORE_NAP_COUNT - lwsync /* make previous updates visible */ 51: lwarx r3, 0, r4 addi r3, r3, 1 stwcx. 
r3, 0, r4 @@ -990,14 +993,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) */ /* Increment the threads-exiting-guest count in the 0xff00 bits of vcore->entry_exit_count */ - lwsync ld r5,HSTATE_KVM_VCORE(r13) addi r6,r5,VCORE_ENTRY_EXIT 41: lwarx r3,0,r6 addi r0,r3,0x100 stwcx. r0,0,r6 bne 41b - lwsync + isync /* order stwcx. vs. reading napping_threads */ /* * At this point we have an interrupt that we have to pass @@ -1030,6 +1032,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201) sld r0,r0,r4 andc. r3,r3,r0 /* no sense IPI'ing ourselves */ beq 43f + /* Order entry/exit update vs. IPIs */ + sync mulli r4,r4,PACA_SIZE /* get paca for thread 0 */ subf r6,r4,r13 42: andi. r0,r3,1 @@ -1638,10 +1642,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206) bge kvm_cede_exit stwcx. r4,0,r6 bne 31b + /* order napping_threads update vs testing entry_exit_count */ + isync li r0,1 stb r0,HSTATE_NAPPING(r13) - /* order napping_threads update vs testing entry_exit_count */ - lwsync mr r4,r3 lwz r7,VCORE_ENTRY_EXIT(r5) cmpwi r7,0x100 -- cgit v0.10.2 From bf3d32e1156c36c88b75960fd2e5457d5d75620b Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Sat, 16 Nov 2013 17:46:04 +1100 Subject: KVM: PPC: Book3S HV: Make tbacct_lock irq-safe Lockdep reported that there is a potential for deadlock because vcpu->arch.tbacct_lock is not irq-safe, and is sometimes taken inside the rq_lock (run-queue lock) in the scheduler, which is taken within interrupts. The lockdep splat looks like: ====================================================== [ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ] 3.12.0-rc5-kvm+ #8 Not tainted ------------------------------------------------------ qemu-system-ppc/4803 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: (&(&vcpu->arch.tbacct_lock)->rlock){+.+...}, at: [] .kvmppc_core_vcpu_put_hv+0x2c/0xa0 and this task is already holding: (&rq->lock){-.-.-.}, at: [] .__schedule+0x180/0xaa0 which would create a new lock dependency: (&rq->lock){-.-.-.} -> (&(&vcpu->arch.tbacct_lock)->rlock){+.+...} but this new dependency connects a HARDIRQ-irq-safe lock: (&rq->lock){-.-.-.} ... which became HARDIRQ-irq-safe at: [] .lock_acquire+0xbc/0x190 [] ._raw_spin_lock+0x34/0x60 [] .scheduler_tick+0x54/0x180 [] .update_process_times+0x70/0xa0 [] .tick_periodic+0x3c/0xe0 [] .tick_handle_periodic+0x28/0xb0 [] .timer_interrupt+0x120/0x2e0 [] decrementer_common+0x168/0x180 [] .get_page_from_freelist+0x924/0xc10 [] .__alloc_pages_nodemask+0x200/0xba0 [] .alloc_pages_exact_nid+0x68/0x110 [] .page_cgroup_init+0x1e0/0x270 [] .start_kernel+0x3e0/0x4e4 [] .start_here_common+0x20/0x70 to a HARDIRQ-irq-unsafe lock: (&(&vcpu->arch.tbacct_lock)->rlock){+.+...} ... which became HARDIRQ-irq-unsafe at: ... [] .lock_acquire+0xbc/0x190 [] ._raw_spin_lock+0x34/0x60 [] .kvmppc_core_vcpu_load_hv+0x2c/0x100 [] .kvmppc_core_vcpu_load+0x2c/0x40 [] .kvm_arch_vcpu_load+0x10/0x30 [] .vcpu_load+0x64/0xd0 [] .kvm_vcpu_ioctl+0x68/0x730 [] .do_vfs_ioctl+0x4dc/0x7a0 [] .SyS_ioctl+0xc4/0xe0 [] syscall_exit+0x0/0x98 Some users have reported this deadlock occurring in practice, though the reports have been primarily on 3.10.x-based kernels. This fixes the problem by making tbacct_lock be irq-safe. 
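The pattern is the usual one for a lock that can also be reached from hard-IRQ context: every other acquisition must mask interrupts. A minimal sketch with placeholder names (example_tbacct_lock and example_account_stolen_time are stand-ins, not the KVM code itself):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_tbacct_lock);	/* stand-in for vcpu->arch.tbacct_lock */

/* Process-context path: IRQs must be masked while holding the lock,
 * because the same lock class is also reachable under rq->lock from
 * the scheduler tick, i.e. from hard-IRQ context. */
static void example_account_stolen_time(void)
{
	unsigned long flags;

	spin_lock_irqsave(&example_tbacct_lock, flags);
	/* ... update busy/stolen timebase accounting ... */
	spin_unlock_irqrestore(&example_tbacct_lock, flags);
}
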
Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 072287f..31d9cfb 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -131,8 +131,9 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu) static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu) { struct kvmppc_vcore *vc = vcpu->arch.vcore; + unsigned long flags; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags); if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE && vc->preempt_tb != TB_NIL) { vc->stolen_tb += mftb() - vc->preempt_tb; @@ -143,19 +144,20 @@ static void kvmppc_core_vcpu_load_hv(struct kvm_vcpu *vcpu, int cpu) vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt; vcpu->arch.busy_preempt = TB_NIL; } - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags); } static void kvmppc_core_vcpu_put_hv(struct kvm_vcpu *vcpu) { struct kvmppc_vcore *vc = vcpu->arch.vcore; + unsigned long flags; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irqsave(&vcpu->arch.tbacct_lock, flags); if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE) vc->preempt_tb = mftb(); if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST) vcpu->arch.busy_preempt = mftb(); - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irqrestore(&vcpu->arch.tbacct_lock, flags); } static void kvmppc_set_msr_hv(struct kvm_vcpu *vcpu, u64 msr) @@ -486,11 +488,11 @@ static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now) */ if (vc->vcore_state != VCORE_INACTIVE && vc->runner->arch.run_task != current) { - spin_lock(&vc->runner->arch.tbacct_lock); + spin_lock_irq(&vc->runner->arch.tbacct_lock); p = vc->stolen_tb; if (vc->preempt_tb != TB_NIL) p += now - vc->preempt_tb; - spin_unlock(&vc->runner->arch.tbacct_lock); + spin_unlock_irq(&vc->runner->arch.tbacct_lock); } else { p = vc->stolen_tb; } @@ -512,10 +514,10 @@ static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu, core_stolen = vcore_stolen_time(vc, now); stolen = core_stolen - vcpu->arch.stolen_logged; vcpu->arch.stolen_logged = core_stolen; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irq(&vcpu->arch.tbacct_lock); stolen += vcpu->arch.busy_stolen; vcpu->arch.busy_stolen = 0; - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irq(&vcpu->arch.tbacct_lock); if (!dt || !vpa) return; memset(dt, 0, sizeof(struct dtl_entry)); @@ -1115,13 +1117,13 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc, if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE) return; - spin_lock(&vcpu->arch.tbacct_lock); + spin_lock_irq(&vcpu->arch.tbacct_lock); now = mftb(); vcpu->arch.busy_stolen += vcore_stolen_time(vc, now) - vcpu->arch.stolen_logged; vcpu->arch.busy_preempt = now; vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST; - spin_unlock(&vcpu->arch.tbacct_lock); + spin_unlock_irq(&vcpu->arch.tbacct_lock); --vc->n_runnable; list_del(&vcpu->arch.run_list); } -- cgit v0.10.2 From c9438092cae4a5bdbd146ca1385e85dcd6e847f8 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Sat, 16 Nov 2013 17:46:05 +1100 Subject: KVM: PPC: Book3S HV: Take SRCU read lock around kvm_read_guest() call Running a kernel with CONFIG_PROVE_RCU=y yields the following diagnostic: =============================== [ INFO: suspicious RCU usage. ] 3.12.0-rc5-kvm+ #9 Not tainted ------------------------------- include/linux/kvm_host.h:473 suspicious rcu_dereference_check() usage! 
other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by qemu-system-ppc/4831: stack backtrace: CPU: 28 PID: 4831 Comm: qemu-system-ppc Not tainted 3.12.0-rc5-kvm+ #9 Call Trace: [c000000be462b2a0] [c00000000001644c] .show_stack+0x7c/0x1f0 (unreliable) [c000000be462b370] [c000000000ad57c0] .dump_stack+0x88/0xb4 [c000000be462b3f0] [c0000000001315e8] .lockdep_rcu_suspicious+0x138/0x180 [c000000be462b480] [c00000000007862c] .gfn_to_memslot+0x13c/0x170 [c000000be462b510] [c00000000007d384] .gfn_to_hva_prot+0x24/0x90 [c000000be462b5a0] [c00000000007d420] .kvm_read_guest_page+0x30/0xd0 [c000000be462b630] [c00000000007d528] .kvm_read_guest+0x68/0x110 [c000000be462b6e0] [c000000000084594] .kvmppc_rtas_hcall+0x34/0x180 [c000000be462b7d0] [c000000000097934] .kvmppc_pseries_do_hcall+0x74/0x830 [c000000be462b880] [c0000000000990e8] .kvmppc_vcpu_run_hv+0xff8/0x15a0 [c000000be462b9e0] [c0000000000839cc] .kvmppc_vcpu_run+0x2c/0x40 [c000000be462ba50] [c0000000000810b4] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c000000be462bae0] [c00000000007b508] .kvm_vcpu_ioctl+0x478/0x730 [c000000be462bca0] [c00000000025532c] .do_vfs_ioctl+0x4dc/0x7a0 [c000000be462bd80] [c0000000002556b4] .SyS_ioctl+0xc4/0xe0 [c000000be462be30] [c000000000009ee4] syscall_exit+0x0/0x98 To fix this, we take the SRCU read lock around the kvmppc_rtas_hcall() call. Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 31d9cfb..b51d5db 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -591,7 +591,9 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu) if (list_empty(&vcpu->kvm->arch.rtas_tokens)) return RESUME_HOST; + idx = srcu_read_lock(&vcpu->kvm->srcu); rc = kvmppc_rtas_hcall(vcpu); + srcu_read_unlock(&vcpu->kvm->srcu, idx); if (rc == -ENOENT) return RESUME_HOST; -- cgit v0.10.2 From 91648ec09c1ef69c4d840ab6dab391bfb452d554 Mon Sep 17 00:00:00 2001 From: pingfan liu Date: Fri, 15 Nov 2013 16:35:00 +0800 Subject: powerpc: kvm: fix rare but potential deadlock scene Since kvmppc_hv_find_lock_hpte() is called from both virtmode and realmode, so it can trigger the deadlock. Suppose the following scene: Two physical cpuM, cpuN, two VM instances A, B, each VM has a group of vcpus. If on cpuM, vcpu_A_1 holds bitlock X (HPTE_V_HVLOCK), then is switched out, and on cpuN, vcpu_A_2 try to lock X in realmode, then cpuN will be caught in realmode for a long time. What makes things even worse if the following happens, On cpuM, bitlockX is hold, on cpuN, Y is hold. vcpu_B_2 try to lock Y on cpuM in realmode vcpu_A_2 try to lock X on cpuN in realmode Oops! 
deadlock happens Signed-off-by: Liu Ping Fan Reviewed-by: Paul Mackerras CC: stable@vger.kernel.org Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 47bbeaf..c5d1484 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -469,11 +469,14 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, slb_v = vcpu->kvm->arch.vrma_slb_v; } + preempt_disable(); /* Find the HPTE in the hash table */ index = kvmppc_hv_find_lock_hpte(kvm, eaddr, slb_v, HPTE_V_VALID | HPTE_V_ABSENT); - if (index < 0) + if (index < 0) { + preempt_enable(); return -ENOENT; + } hptep = (unsigned long *)(kvm->arch.hpt_virt + (index << 4)); v = hptep[0] & ~HPTE_V_HVLOCK; gr = kvm->arch.revmap[index].guest_rpte; @@ -481,6 +484,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr, /* Unlock the HPTE */ asm volatile("lwsync" : : : "memory"); hptep[0] = v; + preempt_enable(); gpte->eaddr = eaddr; gpte->vpage = ((v & HPTE_V_AVPN) << 4) | ((eaddr >> 12) & 0xfff); diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index fddbf98..1931aa3 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -749,6 +749,10 @@ static int slb_base_page_shift[4] = { 20, /* 1M, unsupported */ }; +/* When called from virtmode, this func should be protected by + * preempt_disable(), otherwise, the holding of HPTE_V_HVLOCK + * can trigger deadlock issue. + */ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v, unsigned long valid) { -- cgit v0.10.2 From 6802cdc58d4fe66cffd6cd04ee55e65dd61eeeeb Mon Sep 17 00:00:00 2001 From: Simon Horman Date: Wed, 6 Nov 2013 09:41:09 +0900 Subject: ARM: shmobile: lager: phy fixup needs CONFIG_PHYLIB Do not build the phy fixup unless CONFIG_PHYLIB is enabled. Other than not being useful it is also not possible to link the code under this condition as phy_register_fixup_for_id(), mdiobus_read() and mdiobus_write() are absent. arch/arm/mach-shmobile/built-in.o: In function `lager_ksz8041_fixup': board-lager.c:(.text+0xb8): undefined reference to `mdiobus_read' board-lager.c:(.text+0xd4): undefined reference to `mdiobus_write' arch/arm/mach-shmobile/built-in.o: In function `lager_init': board-lager.c:(.init.text+0xafc): undefined reference to `phy_register_fixup_for_id' This problem was introduced by 48c8b96f21817aad ("ARM: shmobile: Lager: add Micrel KSZ8041 PHY fixup") Cc: Sergei Shtylyov Signed-off-by: Simon Horman diff --git a/arch/arm/mach-shmobile/board-lager.c b/arch/arm/mach-shmobile/board-lager.c index a8d3ce6..e0406fd 100644 --- a/arch/arm/mach-shmobile/board-lager.c +++ b/arch/arm/mach-shmobile/board-lager.c @@ -245,7 +245,9 @@ static void __init lager_init(void) { lager_add_standard_devices(); - phy_register_fixup_for_id("r8a7790-ether-ff:01", lager_ksz8041_fixup); + if (IS_ENABLED(CONFIG_PHYLIB)) + phy_register_fixup_for_id("r8a7790-ether-ff:01", + lager_ksz8041_fixup); } static const char * const lager_boards_compat_dt[] __initconst = { -- cgit v0.10.2 From e55bc55867585e6628359fd5496316576fe58a2f Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Sat, 9 Nov 2013 13:18:01 +0100 Subject: irqchip: renesas-intc-irqpin: Fix register bitfield shift calculation The SENSE register bitfield position is incorrectly computed for SoCs that use 2-bit IRQ sense fields. Fix it. This has been tested on the Marzen (H1) and Bockw (M1) boards. 
This bug has been present since the renesas-intc-irqpin driver was introduced by 443580486e3b9657 ("irqchip: Renesas INTC External IRQ pin driver") in v3.10-rc1. Signed-off-by: Laurent Pinchart Acked-by: Magnus Damm Tested-by: Simon Horman Signed-off-by: Simon Horman diff --git a/drivers/irqchip/irq-renesas-intc-irqpin.c b/drivers/irqchip/irq-renesas-intc-irqpin.c index 82cec63..3ee78f0 100644 --- a/drivers/irqchip/irq-renesas-intc-irqpin.c +++ b/drivers/irqchip/irq-renesas-intc-irqpin.c @@ -149,8 +149,9 @@ static void intc_irqpin_read_modify_write(struct intc_irqpin_priv *p, static void intc_irqpin_mask_unmask_prio(struct intc_irqpin_priv *p, int irq, int do_mask) { - int bitfield_width = 4; /* PRIO assumed to have fixed bitfield width */ - int shift = (7 - irq) * bitfield_width; /* PRIO assumed to be 32-bit */ + /* The PRIO register is assumed to be 32-bit with fixed 4-bit fields. */ + int bitfield_width = 4; + int shift = 32 - (irq + 1) * bitfield_width; intc_irqpin_read_modify_write(p, INTC_IRQPIN_REG_PRIO, shift, bitfield_width, @@ -159,8 +160,9 @@ static void intc_irqpin_mask_unmask_prio(struct intc_irqpin_priv *p, static int intc_irqpin_set_sense(struct intc_irqpin_priv *p, int irq, int value) { + /* The SENSE register is assumed to be 32-bit. */ int bitfield_width = p->config.sense_bitfield_width; - int shift = (7 - irq) * bitfield_width; /* SENSE assumed to be 32-bit */ + int shift = 32 - (irq + 1) * bitfield_width; dev_dbg(&p->pdev->dev, "sense irq = %d, mode = %d\n", irq, value); -- cgit v0.10.2 From 3088f1085d1c08f31f02db26796555a66cdf7ca1 Mon Sep 17 00:00:00 2001 From: Kishon Vijay Abraham I Date: Mon, 25 Nov 2013 15:31:21 +0530 Subject: usb: dwc3: invoke phy_resume after phy_init usb_phy_set_suspend(phy, 0) is called before usb_phy_init. Fix it. Reported-by: Roger Quadros Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Felipe Balbi diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 74f9cf0..4dd1f7e 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -455,9 +455,6 @@ static int dwc3_probe(struct platform_device *pdev) if (IS_ERR(regs)) return PTR_ERR(regs); - usb_phy_set_suspend(dwc->usb2_phy, 0); - usb_phy_set_suspend(dwc->usb3_phy, 0); - spin_lock_init(&dwc->lock); platform_set_drvdata(pdev, dwc); @@ -488,6 +485,9 @@ static int dwc3_probe(struct platform_device *pdev) goto err0; } + usb_phy_set_suspend(dwc->usb2_phy, 0); + usb_phy_set_suspend(dwc->usb3_phy, 0); + ret = dwc3_event_buffers_setup(dwc); if (ret) { dev_err(dwc->dev, "failed to setup event buffers\n"); -- cgit v0.10.2 From 501fae512584e601d34945efccd35c5e5d759132 Mon Sep 17 00:00:00 2001 From: Kishon Vijay Abraham I Date: Mon, 25 Nov 2013 15:31:22 +0530 Subject: usb: dwc3: power off usb phy in error path usb phy was power'ed on but was never power'ed off in the error path. Fix it. Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Felipe Balbi diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 4dd1f7e..a49217a 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -569,6 +569,8 @@ err2: dwc3_event_buffers_cleanup(dwc); err1: + usb_phy_set_suspend(dwc->usb2_phy, 1); + usb_phy_set_suspend(dwc->usb3_phy, 1); dwc3_core_exit(dwc); err0: -- cgit v0.10.2 From 0b55149033403d78f345fb613d8ad52d16c161fc Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 8 Nov 2013 01:43:17 -0800 Subject: usb: phy: twl6030-usb: signedness bug in twl6030_readb() "ret" needs to be signed for the error handling to work. 
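The effect of the unsigned type on the "ret >= 0" success check can be seen in a tiny standalone illustration (hypothetical error value, userspace code, not the driver itself):

#include <stdio.h>

int main(void)
{
	unsigned char bad_ret = -121;	/* negative errno truncated to 135 */
	int good_ret = -121;

	/* The driver treats "ret >= 0" as success. */
	printf("u8:  treated as success? %s\n", bad_ret >= 0 ? "yes (bug)" : "no");
	printf("int: treated as success? %s\n", good_ret >= 0 ? "yes" : "no (correct)");
	return 0;
}
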
Signed-off-by: Dan Carpenter Signed-off-by: Felipe Balbi diff --git a/drivers/usb/phy/phy-twl6030-usb.c b/drivers/usb/phy/phy-twl6030-usb.c index 30e8a61..bad57ce 100644 --- a/drivers/usb/phy/phy-twl6030-usb.c +++ b/drivers/usb/phy/phy-twl6030-usb.c @@ -127,7 +127,8 @@ static inline int twl6030_writeb(struct twl6030_usb *twl, u8 module, static inline u8 twl6030_readb(struct twl6030_usb *twl, u8 module, u8 address) { - u8 data, ret = 0; + u8 data; + int ret; ret = twl_i2c_read_u8(module, &data, address); if (ret >= 0) -- cgit v0.10.2 From 23de2278ebc3a2f971ce45ca5e5e35c9d5a74040 Mon Sep 17 00:00:00 2001 From: Magnus Damm Date: Thu, 21 Nov 2013 14:19:29 +0900 Subject: ARM: shmobile: r8a7790: Fix GPIO resources in DTS The r8a7790 GPIO resources are currently incorrect. Fix that by making them match the English r8a7790 v0.6 data sheet. Tested with GPIO LED using Lager DT reference. This problem has been present since GPIOs were added to the r8a7790 SoC by f98e10c88aa95bf7 ("ARM: shmobile: r8a7790: Add GPIO controller devices to device tree") in v3.12-rc1. Signed-off-by: Magnus Damm Acked-by: Laurent Pinchart Signed-off-by: Simon Horman diff --git a/arch/arm/boot/dts/r8a7790.dtsi b/arch/arm/boot/dts/r8a7790.dtsi index ee845fa..46e1d7e 100644 --- a/arch/arm/boot/dts/r8a7790.dtsi +++ b/arch/arm/boot/dts/r8a7790.dtsi @@ -87,9 +87,9 @@ interrupts = <1 9 0xf04>; }; - gpio0: gpio@ffc40000 { + gpio0: gpio@e6050000 { compatible = "renesas,gpio-r8a7790", "renesas,gpio-rcar"; - reg = <0 0xffc40000 0 0x2c>; + reg = <0 0xe6050000 0 0x50>; interrupt-parent = <&gic>; interrupts = <0 4 0x4>; #gpio-cells = <2>; @@ -99,9 +99,9 @@ interrupt-controller; }; - gpio1: gpio@ffc41000 { + gpio1: gpio@e6051000 { compatible = "renesas,gpio-r8a7790", "renesas,gpio-rcar"; - reg = <0 0xffc41000 0 0x2c>; + reg = <0 0xe6051000 0 0x50>; interrupt-parent = <&gic>; interrupts = <0 5 0x4>; #gpio-cells = <2>; @@ -111,9 +111,9 @@ interrupt-controller; }; - gpio2: gpio@ffc42000 { + gpio2: gpio@e6052000 { compatible = "renesas,gpio-r8a7790", "renesas,gpio-rcar"; - reg = <0 0xffc42000 0 0x2c>; + reg = <0 0xe6052000 0 0x50>; interrupt-parent = <&gic>; interrupts = <0 6 0x4>; #gpio-cells = <2>; @@ -123,9 +123,9 @@ interrupt-controller; }; - gpio3: gpio@ffc43000 { + gpio3: gpio@e6053000 { compatible = "renesas,gpio-r8a7790", "renesas,gpio-rcar"; - reg = <0 0xffc43000 0 0x2c>; + reg = <0 0xe6053000 0 0x50>; interrupt-parent = <&gic>; interrupts = <0 7 0x4>; #gpio-cells = <2>; @@ -135,9 +135,9 @@ interrupt-controller; }; - gpio4: gpio@ffc44000 { + gpio4: gpio@e6054000 { compatible = "renesas,gpio-r8a7790", "renesas,gpio-rcar"; - reg = <0 0xffc44000 0 0x2c>; + reg = <0 0xe6054000 0 0x50>; interrupt-parent = <&gic>; interrupts = <0 8 0x4>; #gpio-cells = <2>; @@ -147,9 +147,9 @@ interrupt-controller; }; - gpio5: gpio@ffc45000 { + gpio5: gpio@e6055000 { compatible = "renesas,gpio-r8a7790", "renesas,gpio-rcar"; - reg = <0 0xffc45000 0 0x2c>; + reg = <0 0xe6055000 0 0x50>; interrupt-parent = <&gic>; interrupts = <0 9 0x4>; #gpio-cells = <2>; -- cgit v0.10.2 From 08239ca2a053dbc3b082916bdfbd88e5a9ad9267 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Thu, 28 Nov 2013 10:31:35 +0800 Subject: bcache: fix sparse non static symbol warning Fixes the following sparse warning: drivers/md/bcache/btree.c:2220:5: warning: symbol 'btree_insert_fn' was not declared. Should it be static? 
Signed-off-by: Wei Yongjun Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 5e2765a..dc6e2be 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -2217,7 +2217,7 @@ struct btree_insert_op { struct bkey *replace_key; }; -int btree_insert_fn(struct btree_op *b_op, struct btree *b) +static int btree_insert_fn(struct btree_op *b_op, struct btree *b) { struct btree_insert_op *op = container_of(b_op, struct btree_insert_op, op); -- cgit v0.10.2 From 87809942d3fa60bafb7a58d0bdb1c79e90a6821d Mon Sep 17 00:00:00 2001 From: Michele Baldessari Date: Mon, 25 Nov 2013 19:00:14 +0000 Subject: libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8 We've received multiple reports in Fedora via (BZ 907193) that the Seagate Momentus SpinPoint M8 errors out when enabling AA: [ 2.555905] ata2.00: failed to enable AA (error_mask=0x1) [ 2.568482] ata2.00: failed to enable AA (error_mask=0x1) Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk. Reported-by: Nicholas Signed-off-by: Michele Baldessari Tested-by: Nicholas Acked-by: Alan Cox Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 75b9367..dae73ef 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4156,6 +4156,9 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "ST3320[68]13AS", "SD1[5-9]", ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN }, + /* Seagate Momentus SpinPoint M8 seem to have FPMDA_AA issues */ + { "ST1000LM024 HN-M101MBB", "2AR10001", ATA_HORKAGE_BROKEN_FPDMA_AA }, + /* Blacklist entries taken from Silicon Image 3124/3132 Windows driver .inf file - also several Linux problem reports */ { "HTS541060G9SA00", "MB3OC60D", ATA_HORKAGE_NONCQ, }, -- cgit v0.10.2 From f6308b36c411dc5afd6a6f73e6454722bfde57b7 Mon Sep 17 00:00:00 2001 From: Paul Drews Date: Mon, 25 Nov 2013 14:15:55 -0800 Subject: ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs This adds the new ACPI ID (INT33FC) for the BayTrail GPIO banks as seen on a BayTrail M System-On-Chip platform. This ACPI ID is used by the BayTrail GPIO (pinctrl) driver to manage the Low Power Subsystem (LPSS). Signed-off-by: Paul Drews Acked-by: Linus Walleij Signed-off-by: Rafael J. 
Wysocki diff --git a/drivers/acpi/acpi_lpss.c b/drivers/acpi/acpi_lpss.c index 6745fe1..e603905 100644 --- a/drivers/acpi/acpi_lpss.c +++ b/drivers/acpi/acpi_lpss.c @@ -162,6 +162,7 @@ static const struct acpi_device_id acpi_lpss_device_ids[] = { { "80860F14", (unsigned long)&byt_sdio_dev_desc }, { "80860F41", (unsigned long)&byt_i2c_dev_desc }, { "INT33B2", }, + { "INT33FC", }, { "INT3430", (unsigned long)&lpt_dev_desc }, { "INT3431", (unsigned long)&lpt_dev_desc }, diff --git a/drivers/pinctrl/pinctrl-baytrail.c b/drivers/pinctrl/pinctrl-baytrail.c index 2832576..114f5ef 100644 --- a/drivers/pinctrl/pinctrl-baytrail.c +++ b/drivers/pinctrl/pinctrl-baytrail.c @@ -512,6 +512,7 @@ static const struct dev_pm_ops byt_gpio_pm_ops = { static const struct acpi_device_id byt_gpio_acpi_match[] = { { "INT33B2", 0 }, + { "INT33FC", 0 }, { } }; MODULE_DEVICE_TABLE(acpi, byt_gpio_acpi_match); -- cgit v0.10.2 From ae1495b12df1897d4f42842a7aa7276d920f6290 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Mon, 2 Dec 2013 09:31:36 -0500 Subject: ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() fails While it's true that errors can only happen if there is a bug in jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt the kernel or remount the file system read-only in order to avoid further data loss. The ext4_journal_abort_handle() function doesn't do any of this, and while it's likely that this call (since it doesn't adjust refcounts) will likely result in the file system eventually deadlocking since the current transaction will never be able to close, it's much cleaner to call let ext4's error handling system deal with this situation. There's a separate bug here which is that if certain jbd2 errors errors occur and file system is mounted errors=continue, the file system will probably eventually end grind to a halt as described above. But things have been this way in a long time, and usually when we have these sorts of errors it's pretty much a disaster --- and that's why the jbd2 layer aggressively retries memory allocations, which is the most likely cause of these jbd2 errors. Signed-off-by: "Theodore Ts'o" Reviewed-by: Jan Kara Cc: stable@vger.kernel.org diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c index 17ac112..3fe29de 100644 --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -259,6 +259,15 @@ int __ext4_handle_dirty_metadata(const char *where, unsigned int line, if (WARN_ON_ONCE(err)) { ext4_journal_abort_handle(where, line, __func__, bh, handle, err); + ext4_error_inode(inode, where, line, + bh->b_blocknr, + "journal_dirty_metadata failed: " + "handle type %u started at line %u, " + "credits %u/%u, errcode %d", + handle->h_type, + handle->h_line_no, + handle->h_requested_credits, + handle->h_buffer_credits, err); } } else { if (inode) -- cgit v0.10.2 From 10becdb402af4fd4808a0491a726b96128c41076 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Mon, 25 Nov 2013 09:47:00 +0100 Subject: ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN We must clear this IMX6Q_GPR13_SATA_MPLL_CLK_EN bit on i.MX6Q, otherwise Linux will fail to find the attached drive on some boards. 
This entire fix was: Reported-by: Eric Nelson Signed-off-by: Marek Vasut Reviewed-by: Shawn Guo Cc: Richard Zhu Cc: Linux-IDE Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org diff --git a/drivers/ata/ahci_imx.c b/drivers/ata/ahci_imx.c index ae2d73f..3e23e99 100644 --- a/drivers/ata/ahci_imx.c +++ b/drivers/ata/ahci_imx.c @@ -113,7 +113,7 @@ static int imx6q_sata_init(struct device *dev, void __iomem *mmio) /* * set PHY Paremeters, two steps to configure the GPR13, * one write for rest of parameters, mask of first write - * is 0x07fffffd, and the other one write for setting + * is 0x07ffffff, and the other one write for setting * the mpll_clk_en. */ regmap_update_bits(imxpriv->gpr, 0x34, IMX6Q_GPR13_SATA_RX_EQ_VAL_MASK @@ -124,6 +124,7 @@ static int imx6q_sata_init(struct device *dev, void __iomem *mmio) | IMX6Q_GPR13_SATA_TX_ATTEN_MASK | IMX6Q_GPR13_SATA_TX_BOOST_MASK | IMX6Q_GPR13_SATA_TX_LVL_MASK + | IMX6Q_GPR13_SATA_MPLL_CLK_EN | IMX6Q_GPR13_SATA_TX_EDGE_RATE , IMX6Q_GPR13_SATA_RX_EQ_VAL_3_0_DB | IMX6Q_GPR13_SATA_RX_LOS_LVL_SATA2M -- cgit v0.10.2 From 4e8d2139802ce4f41936a687f06c560b12115247 Mon Sep 17 00:00:00 2001 From: Junho Ryu Date: Tue, 3 Dec 2013 18:10:28 -0500 Subject: ext4: fix use-after-free in ext4_mb_new_blocks ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count. While ext4_mb_use_preallocated checks pa->pa_deleted first and then increments pa->count later, ext4_mb_put_pa decrements pa->pa_count before holding pa->pa_lock and then sets pa->pa_deleted. * Free sequence ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa * Use sequence ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_use_preallocated (4): increase pa->pa_count ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_release_context: access pa * Use-after-free sequence [initial status] pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count [pa_count decremented] pa_deleted = 0, pa_count = 0> ext4_mb_use_preallocated (4): increase pa->pa_count [pa_count incremented] pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 [race condition!] 
pa_deleted = 1, pa_count = 1> ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa ext4_mb_release_context: access pa AddressSanitizer has detected use-after-free in ext4_mb_new_blocks Bug report: http://goo.gl/rG1On3 Signed-off-by: Junho Ryu Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 4d113ef..04766d9 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -3442,6 +3442,9 @@ static void ext4_mb_pa_callback(struct rcu_head *head) { struct ext4_prealloc_space *pa; pa = container_of(head, struct ext4_prealloc_space, u.pa_rcu); + + BUG_ON(atomic_read(&pa->pa_count)); + BUG_ON(pa->pa_deleted == 0); kmem_cache_free(ext4_pspace_cachep, pa); } @@ -3455,11 +3458,13 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac, ext4_group_t grp; ext4_fsblk_t grp_blk; - if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0) - return; - /* in this short window concurrent discard can set pa_deleted */ spin_lock(&pa->pa_lock); + if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0) { + spin_unlock(&pa->pa_lock); + return; + } + if (pa->pa_deleted == 1) { spin_unlock(&pa->pa_lock); return; -- cgit v0.10.2 From 5946d089379a35dda0e531710b48fca05446a196 Mon Sep 17 00:00:00 2001 From: Eryu Guan Date: Tue, 3 Dec 2013 21:22:21 -0500 Subject: ext4: check for overlapping extents in ext4_valid_extent_entries() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A corrupted ext4 may have out of order leaf extents, i.e. extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT ^^^^ overlap with previous extent Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). BUG_ON(end < lblk); The problem is that __read_extent_tree_block() tries to cache holes as well but assumes 'lblk' is greater than 'prev' and passes underflowed length to ext4_es_cache_extent(). Fix it by checking for overlapping extents in ext4_valid_extent_entries(). I hit this when fuzz testing ext4, and am able to reproduce it by modifying the on-disk extent by hand. Also add the check for (ee_block + len - 1) in ext4_valid_extent() to make sure the value is not overflow. Ran xfstests on patched ext4 and no regression. 
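The validity rule being enforced can be sketched on its own, with simplified structures that stand in for the on-disk extent format (struct leaf_ext and leaf_extents_valid are illustrative names, not ext4 code):

#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for a leaf extent: logical start block + length. */
struct leaf_ext {
	uint32_t lblk;
	uint16_t len;
};

/* In a valid sorted leaf, each extent must begin after the last block
 * covered by the previous one; equality or overlap means corruption. */
static bool leaf_extents_valid(const struct leaf_ext *ext, int count)
{
	uint32_t prev_last = 0;
	bool have_prev = false;

	for (int i = 0; i < count; i++) {
		if (ext[i].len == 0)
			return false;
		if (have_prev && ext[i].lblk <= prev_last)
			return false;		/* out of order or overlapping */
		prev_last = ext[i].lblk + ext[i].len - 1;
		have_prev = true;
	}
	return true;
}
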
Cc: Lukáš Czerner Signed-off-by: Eryu Guan Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 35f65cf..267c9fb 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -360,8 +360,10 @@ static int ext4_valid_extent(struct inode *inode, struct ext4_extent *ext) { ext4_fsblk_t block = ext4_ext_pblock(ext); int len = ext4_ext_get_actual_len(ext); + ext4_lblk_t lblock = le32_to_cpu(ext->ee_block); + ext4_lblk_t last = lblock + len - 1; - if (len == 0) + if (lblock > last) return 0; return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len); } @@ -387,11 +389,26 @@ static int ext4_valid_extent_entries(struct inode *inode, if (depth == 0) { /* leaf entries */ struct ext4_extent *ext = EXT_FIRST_EXTENT(eh); + struct ext4_super_block *es = EXT4_SB(inode->i_sb)->s_es; + ext4_fsblk_t pblock = 0; + ext4_lblk_t lblock = 0; + ext4_lblk_t prev = 0; + int len = 0; while (entries) { if (!ext4_valid_extent(inode, ext)) return 0; + + /* Check for overlapping extents */ + lblock = le32_to_cpu(ext->ee_block); + len = ext4_ext_get_actual_len(ext); + if ((lblock <= prev) && prev) { + pblock = ext4_ext_pblock(ext); + es->s_last_error_block = cpu_to_le64(pblock); + return 0; + } ext++; entries--; + prev = lblock + len - 1; } } else { struct ext4_extent_idx *ext_idx = EXT_FIRST_INDEX(eh); -- cgit v0.10.2 From df4e7ac0bb70abc97fbfd9ef09671fc084b3f9db Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 3 Dec 2013 11:20:06 +0100 Subject: ext2: Fix oops in ext2_get_block() called from ext2_quota_write() ext2_quota_write() doesn't properly setup bh it passes to ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in ext2_get_blocks() (or we could actually ask for mapping arbitrary number of blocks depending on whatever value was on stack). Fix ext2_quota_write() to properly fill in number of blocks to map. CC: stable@vger.kernel.org # >= 2.6.12 Reviewed-by: "Theodore Ts'o" Reviewed-by: Christoph Hellwig Reported-by: Christoph Hellwig Signed-off-by: Jan Kara diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 2885349..20d6697 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -1493,6 +1493,7 @@ static ssize_t ext2_quota_write(struct super_block *sb, int type, sb->s_blocksize - offset : towrite; tmp_bh.b_state = 0; + tmp_bh.b_size = sb->s_blocksize; err = ext2_get_block(inode, blk, &tmp_bh, 1); if (err < 0) goto out; -- cgit v0.10.2 From c94cae53f9e564484f906a79be5639fc66e8cb02 Mon Sep 17 00:00:00 2001 From: Eric Trudeau Date: Wed, 4 Dec 2013 11:39:33 +0000 Subject: XEN: Grant table address, xen_hvm_resume_frames, is a phys_addr not a pfn From: Eric Trudeau xen_hvm_resume_frames stores the physical address of the grant table. englighten.c was incorrectly setting it as if it was a page frame number. This caused the table to be mapped into the guest at an unexpected physical address. Additionally, a warning is improved to include the grant table address which failed in xen_remap. 
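The unit mix-up is easy to see in isolation; a minimal standalone sketch with a hypothetical address (not the Xen code):

#include <stdio.h>

#define PAGE_SHIFT 12	/* 4 KiB pages */

int main(void)
{
	unsigned long phys = 0xb0000000UL;		/* hypothetical res.start */
	unsigned long pfn  = phys >> PAGE_SHIFT;	/* its page frame number */

	/* Storing the pfn where consumers expect a physical address maps
	 * the grant table 4096 times too low in guest physical space. */
	printf("phys = 0x%lx, pfn = 0x%lx\n", phys, pfn);
	return 0;
}
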
Signed-off-by: Eric Trudeau Signed-off-by: Stefano Stabellini diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index 83e4f95..6a288c7 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -224,10 +224,10 @@ static int __init xen_guest_init(void) } if (of_address_to_resource(node, GRANT_TABLE_PHYSADDR, &res)) return 0; - xen_hvm_resume_frames = res.start >> PAGE_SHIFT; + xen_hvm_resume_frames = res.start; xen_events_irq = irq_of_parse_and_map(node, 0); pr_info("Xen %s support found, events_irq=%d gnttab_frame_pfn=%lx\n", - version, xen_events_irq, xen_hvm_resume_frames); + version, xen_events_irq, (xen_hvm_resume_frames >> PAGE_SHIFT)); xen_domain_type = XEN_HVM_DOMAIN; xen_setup_features(); diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c index 02838719..aa846a4 100644 --- a/drivers/xen/grant-table.c +++ b/drivers/xen/grant-table.c @@ -1176,7 +1176,8 @@ static int gnttab_setup(void) gnttab_shared.addr = xen_remap(xen_hvm_resume_frames, PAGE_SIZE * max_nr_gframes); if (gnttab_shared.addr == NULL) { - pr_warn("Failed to ioremap gnttab share frames!\n"); + pr_warn("Failed to ioremap gnttab share frames (addr=0x%08lx)!\n", + xen_hvm_resume_frames); return -ENOMEM; } } -- cgit v0.10.2 From 7f62b6ee767586ee7e5d12787dbaaaf47a91979a Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Wed, 4 Dec 2013 11:18:36 +0800 Subject: ASoC: soc-pcm: Use valid condition for snd_soc_dai_digital_mute() in hw_free() The snd_soc_dai_digital_mute() here will be never executed because we only decrease codec->active in snd_soc_close(). Thus correct it. Signed-off-by: Nicolin Chen Signed-off-by: Mark Brown diff --git a/sound/soc/soc-pcm.c b/sound/soc/soc-pcm.c index 11a90cd..891b9a9 100644 --- a/sound/soc/soc-pcm.c +++ b/sound/soc/soc-pcm.c @@ -600,12 +600,13 @@ static int soc_pcm_hw_free(struct snd_pcm_substream *substream) struct snd_soc_platform *platform = rtd->platform; struct snd_soc_dai *cpu_dai = rtd->cpu_dai; struct snd_soc_dai *codec_dai = rtd->codec_dai; - struct snd_soc_codec *codec = rtd->codec; + bool playback = substream->stream == SNDRV_PCM_STREAM_PLAYBACK; mutex_lock_nested(&rtd->pcm_mutex, rtd->pcm_subclass); /* apply codec digital mute */ - if (!codec->active) + if ((playback && codec_dai->playback_active == 1) || + (!playback && codec_dai->capture_active == 1)) snd_soc_dai_digital_mute(codec_dai, 1, substream->stream); /* free any machine hw params */ -- cgit v0.10.2 From a8f1f100ad994725a8295f6997524c57c72e06f5 Mon Sep 17 00:00:00 2001 From: Bo Shen Date: Wed, 4 Dec 2013 10:37:03 +0800 Subject: ASoC: atmel_ssc_dai: add dai trigger ops According to the SSC specifiation, it should be enabled after DMA is enabled. So, add trigger operation to make sure the right sequence. 
Signed-off-by: Bo Shen Tested-by: Richard Genoud Signed-off-by: Mark Brown diff --git a/sound/soc/atmel/atmel_ssc_dai.c b/sound/soc/atmel/atmel_ssc_dai.c index 8697ced..1ead3c9 100644 --- a/sound/soc/atmel/atmel_ssc_dai.c +++ b/sound/soc/atmel/atmel_ssc_dai.c @@ -648,7 +648,7 @@ static int atmel_ssc_prepare(struct snd_pcm_substream *substream, dma_params = ssc_p->dma_params[dir]; - ssc_writel(ssc_p->ssc->regs, CR, dma_params->mask->ssc_enable); + ssc_writel(ssc_p->ssc->regs, CR, dma_params->mask->ssc_disable); ssc_writel(ssc_p->ssc->regs, IDR, dma_params->mask->ssc_error); pr_debug("%s enabled SSC_SR=0x%08x\n", @@ -657,6 +657,33 @@ static int atmel_ssc_prepare(struct snd_pcm_substream *substream, return 0; } +static int atmel_ssc_trigger(struct snd_pcm_substream *substream, + int cmd, struct snd_soc_dai *dai) +{ + struct atmel_ssc_info *ssc_p = &ssc_info[dai->id]; + struct atmel_pcm_dma_params *dma_params; + int dir; + + if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) + dir = 0; + else + dir = 1; + + dma_params = ssc_p->dma_params[dir]; + + switch (cmd) { + case SNDRV_PCM_TRIGGER_START: + case SNDRV_PCM_TRIGGER_RESUME: + case SNDRV_PCM_TRIGGER_PAUSE_RELEASE: + ssc_writel(ssc_p->ssc->regs, CR, dma_params->mask->ssc_enable); + break; + default: + ssc_writel(ssc_p->ssc->regs, CR, dma_params->mask->ssc_disable); + break; + } + + return 0; +} #ifdef CONFIG_PM static int atmel_ssc_suspend(struct snd_soc_dai *cpu_dai) @@ -731,6 +758,7 @@ static const struct snd_soc_dai_ops atmel_ssc_dai_ops = { .startup = atmel_ssc_startup, .shutdown = atmel_ssc_shutdown, .prepare = atmel_ssc_prepare, + .trigger = atmel_ssc_trigger, .hw_params = atmel_ssc_hw_params, .set_fmt = atmel_ssc_set_dai_fmt, .set_clkdiv = atmel_ssc_set_dai_clkdiv, -- cgit v0.10.2 From bc567a93502275755492141524935269dcf0ea1b Mon Sep 17 00:00:00 2001 From: Bo Shen Date: Wed, 4 Dec 2013 10:37:04 +0800 Subject: ASoC: sam9x5_wm8731: change to work in DSP A mode Change sam9x5 with wm8731 work in DSP A mode, this will fix the left/right channel swap issue. Signed-off-by: Bo Shen Tested-by: Richard Genoud Signed-off-by: Mark Brown diff --git a/sound/soc/atmel/sam9x5_wm8731.c b/sound/soc/atmel/sam9x5_wm8731.c index 1b37228..7d6a905 100644 --- a/sound/soc/atmel/sam9x5_wm8731.c +++ b/sound/soc/atmel/sam9x5_wm8731.c @@ -109,7 +109,7 @@ static int sam9x5_wm8731_driver_probe(struct platform_device *pdev) dai->stream_name = "WM8731 PCM"; dai->codec_dai_name = "wm8731-hifi"; dai->init = sam9x5_wm8731_init; - dai->dai_fmt = SND_SOC_DAIFMT_I2S | SND_SOC_DAIFMT_NB_NF + dai->dai_fmt = SND_SOC_DAIFMT_DSP_A | SND_SOC_DAIFMT_NB_NF | SND_SOC_DAIFMT_CBM_CFM; ret = snd_soc_of_parse_card_name(card, "atmel,model"); -- cgit v0.10.2 From bd0976dd3379e790b031cef7f477c58b82a65fc2 Mon Sep 17 00:00:00 2001 From: Marco Piazza Date: Thu, 28 Nov 2013 00:15:25 +0100 Subject: Bluetooth: Add support for Toshiba Bluetooth device [0930:0220] This patch adds support for new Toshiba Bluetooth device. 
T: Bus=05 Lev=01 Prnt=01 Port=02 Cnt=02 Dev#= 4 Spd=12 MxCh= 0 D: Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0930 ProdID=0220 Rev=00.02 C: #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I: If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb I: If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb Signed-off-by: Marco Piazza Signed-off-by: Gustavo Padovan diff --git a/drivers/bluetooth/ath3k.c b/drivers/bluetooth/ath3k.c index 6bfc1bb..dceb85f 100644 --- a/drivers/bluetooth/ath3k.c +++ b/drivers/bluetooth/ath3k.c @@ -87,6 +87,7 @@ static const struct usb_device_id ath3k_table[] = { { USB_DEVICE(0x0CF3, 0xE004) }, { USB_DEVICE(0x0CF3, 0xE005) }, { USB_DEVICE(0x0930, 0x0219) }, + { USB_DEVICE(0x0930, 0x0220) }, { USB_DEVICE(0x0489, 0xe057) }, { USB_DEVICE(0x13d3, 0x3393) }, { USB_DEVICE(0x0489, 0xe04e) }, @@ -129,6 +130,7 @@ static const struct usb_device_id ath3k_blist_tbl[] = { { USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 }, + { USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe057), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x13d3, 0x3393), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe04e), .driver_info = BTUSB_ATH3012 }, diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index c0ff34f..3980fd1 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -154,6 +154,7 @@ static const struct usb_device_id blacklist_table[] = { { USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 }, + { USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe057), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x13d3, 0x3393), .driver_info = BTUSB_ATH3012 }, { USB_DEVICE(0x0489, 0xe04e), .driver_info = BTUSB_ATH3012 }, -- cgit v0.10.2 From 3459f11a8b16f40f9cde8e4281c2d5dd2ff1a732 Mon Sep 17 00:00:00 2001 From: Luiz Capitulino Date: Thu, 5 Dec 2013 13:04:10 +1030 Subject: virtio_balloon: update_balloon_size(): update correct field According to the virtio spec, the device configuration field that should be updated after an inflation or deflation operation is the 'actual' field, not the 'num_pages' one. Commit 855e0c5288177bcb193f6f6316952d2490478e1c swapped them in update_balloon_size(). Fix it. Signed-off-by: Luiz Capitulino Signed-off-by: Rusty Russell Fixes: 855e0c5288177bcb193f6f6316952d2490478e1c diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index c444654..5c4a95b 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -285,7 +285,7 @@ static void update_balloon_size(struct virtio_balloon *vb) { __le32 actual = cpu_to_le32(vb->num_pages); - virtio_cwrite(vb->vdev, struct virtio_balloon_config, num_pages, + virtio_cwrite(vb->vdev, struct virtio_balloon_config, actual, &actual); } -- cgit v0.10.2 From 4c4d684a55fc01dac6bee696efc56b96d0e6c03a Mon Sep 17 00:00:00 2001 From: Ujjal Roy Date: Wed, 4 Dec 2013 17:27:34 +0530 Subject: cfg80211: fix WARN_ON for re-association to the expired BSS cfg80211 allows re-association in managed mode and if a user wants to re-associate to the same AP network after the time period of IEEE80211_SCAN_RESULT_EXPIRE, cfg80211 warns with the following message on receiving the connect result event. 
------------[ cut here ]------------ WARNING: CPU: 0 PID: 13984 at net/wireless/sme.c:658 __cfg80211_connect_result+0x3a6/0x3e0 [cfg80211]() Call Trace: [] dump_stack+0x46/0x58 [] warn_slowpath_common+0x87/0xb0 [] warn_slowpath_null+0x15/0x20 [] __cfg80211_connect_result+0x3a6/0x3e0 [cfg80211] [] ? update_rq_clock+0x2b/0x50 [] ? update_curr+0x1/0x160 [] cfg80211_process_wdev_events+0xb2/0x1c0 [cfg80211] [] ? pick_next_task_fair+0x63/0x170 [] cfg80211_process_rdev_events+0x38/0x90 [cfg80211] [] cfg80211_event_work+0x1d/0x30 [cfg80211] [] process_one_work+0x17f/0x420 [] worker_thread+0x11a/0x370 [] ? rescuer_thread+0x2f0/0x2f0 [] kthread+0xbb/0xc0 [] ? kthread_create_on_node+0x120/0x120 [] ret_from_fork+0x7c/0xb0 [] ? kthread_create_on_node+0x120/0x120 ---[ end trace 61f3bddc9c4981f7 ]--- The reason is that, in connect result event cfg80211 unholds the BSS to which the device is associated (and was held so far). So, for the event with status successful, when cfg80211 wants to get that BSS from the device's BSS list it gets a NULL BSS because the BSS has been expired and unheld already. Fix it by reshuffling the code. Signed-off-by: Ujjal Roy Signed-off-by: Johannes Berg diff --git a/net/wireless/sme.c b/net/wireless/sme.c index 65f8008..d3c5bd7 100644 --- a/net/wireless/sme.c +++ b/net/wireless/sme.c @@ -632,6 +632,16 @@ void __cfg80211_connect_result(struct net_device *dev, const u8 *bssid, } #endif + if (!bss && (status == WLAN_STATUS_SUCCESS)) { + WARN_ON_ONCE(!wiphy_to_dev(wdev->wiphy)->ops->connect); + bss = cfg80211_get_bss(wdev->wiphy, NULL, bssid, + wdev->ssid, wdev->ssid_len, + WLAN_CAPABILITY_ESS, + WLAN_CAPABILITY_ESS); + if (bss) + cfg80211_hold_bss(bss_from_pub(bss)); + } + if (wdev->current_bss) { cfg80211_unhold_bss(wdev->current_bss); cfg80211_put_bss(wdev->wiphy, &wdev->current_bss->pub); @@ -649,16 +659,8 @@ void __cfg80211_connect_result(struct net_device *dev, const u8 *bssid, return; } - if (!bss) { - WARN_ON_ONCE(!wiphy_to_dev(wdev->wiphy)->ops->connect); - bss = cfg80211_get_bss(wdev->wiphy, NULL, bssid, - wdev->ssid, wdev->ssid_len, - WLAN_CAPABILITY_ESS, - WLAN_CAPABILITY_ESS); - if (WARN_ON(!bss)) - return; - cfg80211_hold_bss(bss_from_pub(bss)); - } + if (WARN_ON(!bss)) + return; wdev->current_bss = bss_from_pub(bss); -- cgit v0.10.2 From 75704ecfbb4124139b78b71dd603f05d61abe689 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Wed, 4 Dec 2013 17:22:16 +0800 Subject: ASoC: wm8962: Enable SYSCLK provisonally before fetching generated DSPCLK_DIV DSPCLK_DIV can be only generated correctly after enabling SYSCLK. But if the current bias_level hasn't reached SND_SOC_BIAS_ON, DAPM won't enable SYSCLK, which would cause the calculation result from DSPCLK_DIV invalid since bit DSPCLK_DIV will be finally turned to its true value after DAPM enables SYSCLK while the driver won't calculate it again for the current instance. In this circumstance, a playback which needs non-zero DSPCLK_DIV would be distorted due to unexpected clock frequency resulted from an invalid DSPCLK_DIV value. So this patch provisionally enables the SYSCLK to get a valid DSPCLK_DIV for calculation and then disables it afterward. 
Signed-off-by: Nicolin Chen Acked-by: Charles Keepax Signed-off-by: Mark Brown diff --git a/sound/soc/codecs/wm8962.c b/sound/soc/codecs/wm8962.c index 543c5c2..0f17ed3 100644 --- a/sound/soc/codecs/wm8962.c +++ b/sound/soc/codecs/wm8962.c @@ -2439,7 +2439,20 @@ static void wm8962_configure_bclk(struct snd_soc_codec *codec) snd_soc_update_bits(codec, WM8962_CLOCKING_4, WM8962_SYSCLK_RATE_MASK, clocking4); + /* DSPCLK_DIV can be only generated correctly after enabling SYSCLK. + * So we here provisionally enable it and then disable it afterward + * if current bias_level hasn't reached SND_SOC_BIAS_ON. + */ + if (codec->dapm.bias_level != SND_SOC_BIAS_ON) + snd_soc_update_bits(codec, WM8962_CLOCKING2, + WM8962_SYSCLK_ENA_MASK, WM8962_SYSCLK_ENA); + dspclk = snd_soc_read(codec, WM8962_CLOCKING1); + + if (codec->dapm.bias_level != SND_SOC_BIAS_ON) + snd_soc_update_bits(codec, WM8962_CLOCKING2, + WM8962_SYSCLK_ENA_MASK, 0); + if (dspclk < 0) { dev_err(codec->dev, "Failed to read DSPCLK: %d\n", dspclk); return; -- cgit v0.10.2 From b1a0fbfdde65dffd83c84c006f84fa12041907c5 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 4 Dec 2013 10:12:40 -0500 Subject: percpu: fix spurious sparse warnings from DEFINE_PER_CPU() When CONFIG_DEBUG_FORCE_WEAK_PER_CPU or CONFIG_ARCH_NEEDS_WEAK_PER_CPU is set, DEFINE_PER_CPU() explodes into cryptic series of definitions to still allow using "static" for percpu variables while keeping all per-cpu symbols unique in the kernel image which is required for weak symbols. This ultimately converts the actual symbol to global whether DEFINE_PER_CPU() is prefixed with static or not. Unfortunately, the macro forgot to add explicit extern declartion of the actual symbol ending up defining global symbol without preceding declaration for static definitions which naturally don't have matching DECLARE_PER_CPU(). The only ill effect is triggering of the following warnings. fs/inode.c:74:8: warning: symbol 'nr_inodes' was not declared. Should it be static? fs/inode.c:75:8: warning: symbol 'nr_unused' was not declared. Should it be static? Fix it by adding extern declaration in the DEFINE_PER_CPU() macro. Signed-off-by: Tejun Heo Reported-by: Wanlong Gao Tested-by: Wanlong Gao diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h index 57e890a..a5fc7d0 100644 --- a/include/linux/percpu-defs.h +++ b/include/linux/percpu-defs.h @@ -69,6 +69,7 @@ __PCPU_DUMMY_ATTRS char __pcpu_scope_##name; \ extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \ __PCPU_DUMMY_ATTRS char __pcpu_unique_##name; \ + extern __PCPU_ATTRS(sec) __typeof__(type) name; \ __PCPU_ATTRS(sec) PER_CPU_DEF_ATTRIBUTES __weak \ __typeof__(type) name #else -- cgit v0.10.2 From 8515736604941334bd9e8fc01edea685a228acd5 Mon Sep 17 00:00:00 2001 From: Andrey Vagin Date: Fri, 6 Dec 2013 09:06:41 +0400 Subject: block: fix memory leaks on unplugging block device All objects, which are allocated in blk_mq_register_disk, must be released in blk_mq_unregister_disk. I use a KVM virtual machine and virtio disk to reproduce this issue. kmemleak: 18 new suspected memory leaks (see /sys/kernel/debug/kmemleak) $ cat /sys/kernel/debug/kmemleak | head -n 30 unreferenced object 0xffff8800b6636150 (size 8): comm "kworker/0:2", pid 65, jiffies 4294809903 (age 86.358s) hex dump (first 8 bytes): 76 69 72 74 69 6f 34 00 virtio4. 
backtrace: [] kmemleak_alloc+0x4e/0xb0 [] __kmalloc_track_caller+0xf5/0x260 [] kstrdup+0x31/0x60 [] sysfs_new_dirent+0x2e/0x140 [] create_dir+0x38/0xe0 [] sysfs_create_dir_ns+0x73/0xc0 [] kobject_add_internal+0xc9/0x340 [] kobject_add+0x65/0xb0 [] device_add+0x128/0x660 [] device_register+0x1a/0x20 [] register_virtio_device+0x98/0xe0 [] virtio_pci_probe+0x12e/0x1c0 [] local_pci_probe+0x45/0xa0 [] pci_device_probe+0x121/0x130 [] driver_probe_device+0x87/0x390 [] __device_attach+0x3b/0x40 unreferenced object 0xffff8800b65aa1d8 (size 144): Fixes: 320ae51feed5 (blk-mq: new multi-queue block IO queueing mechanism) Cc: Jens Axboe Signed-off-by: Andrey Vagin Signed-off-by: Jens Axboe diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c index ba6cf8e..b91ce75 100644 --- a/block/blk-mq-sysfs.c +++ b/block/blk-mq-sysfs.c @@ -335,9 +335,22 @@ static struct kobj_type blk_mq_hw_ktype = { void blk_mq_unregister_disk(struct gendisk *disk) { struct request_queue *q = disk->queue; + struct blk_mq_hw_ctx *hctx; + struct blk_mq_ctx *ctx; + int i, j; + + queue_for_each_hw_ctx(q, hctx, i) { + hctx_for_each_ctx(hctx, ctx, j) { + kobject_del(&ctx->kobj); + kobject_put(&ctx->kobj); + } + kobject_del(&hctx->kobj); + kobject_put(&hctx->kobj); + } kobject_uevent(&q->mq_kobj, KOBJ_REMOVE); kobject_del(&q->mq_kobj); + kobject_put(&q->mq_kobj); kobject_put(&disk_to_dev(disk)->kobj); } -- cgit v0.10.2 From b6497b383f65dd1bc60ea1f5d70e157a268fbd15 Mon Sep 17 00:00:00 2001 From: Ian Campbell Date: Fri, 6 Dec 2013 17:55:56 +0000 Subject: xen: privcmd: do not return pages which we have failed to unmap This failure represents a hypervisor issue, but if it does occur then nothing good can come of returning pages which still refer to a foreign owned page into the general allocation pool. Instead we are forced to leak them. Log that we have done so. The potential for failure only exists for autotranslated guest (e.g. ARM and x86 PVH). Signed-off-by: Ian Campbell Signed-off-by: Stefano Stabellini Reviewed-by: David Vrabel diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index 8e74590..569a13b 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -533,12 +533,17 @@ static void privcmd_close(struct vm_area_struct *vma) { struct page **pages = vma->vm_private_data; int numpgs = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + int rc; if (!xen_feature(XENFEAT_auto_translated_physmap) || !numpgs || !pages) return; - xen_unmap_domain_mfn_range(vma, numpgs, pages); - free_xenballooned_pages(numpgs, pages); + rc = xen_unmap_domain_mfn_range(vma, numpgs, pages); + if (rc == 0) + free_xenballooned_pages(numpgs, pages); + else + pr_crit("unable to unmap MFN range: leaking %d pages. rc=%d\n", + numpgs, rc); kfree(pages); } -- cgit v0.10.2 From 266ccd505e8acb98717819cef9d91d66c7b237cc Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 6 Dec 2013 15:07:32 -0500 Subject: cgroup: fix cgroup_create() error handling path ae7f164a09 ("cgroup: move cgroup->subsys[] assignment to online_css()") moved cgroup->subsys[] assignements later in cgroup_create() but didn't update error handling path accordingly leading to the following oops and leaking later css's after an online_css() failure. The oops is from cgroup destruction path being invoked on the partially constructed cgroup which is not ready to handle empty slots in cgrp->subsys[] array. 
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [] cgroup_destroy_locked+0x118/0x2f0 PGD a780a067 PUD aadbe067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: CPU: 6 PID: 7360 Comm: mkdir Not tainted 3.13.0-rc2+ #69 Hardware name: task: ffff8800b9dbec00 ti: ffff8800a781a000 task.ti: ffff8800a781a000 RIP: 0010:[] [] cgroup_destroy_locked+0x118/0x2f0 RSP: 0018:ffff8800a781bd98 EFLAGS: 00010282 RAX: ffff880586903878 RBX: ffff880586903800 RCX: ffff880586903820 RDX: ffff880586903860 RSI: ffff8800a781bdb0 RDI: ffff880586903820 RBP: ffff8800a781bde8 R08: ffff88060e0b8048 R09: ffffffff811d7bc1 R10: 000000000000008c R11: 0000000000000001 R12: ffff8800a72286c0 R13: 0000000000000000 R14: ffffffff81cf7a40 R15: 0000000000000001 FS: 00007f60ecda57a0(0000) GS:ffff8806272c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 00000000a7a03000 CR4: 00000000000007e0 Stack: ffff880586903860 ffff880586903910 ffff8800a72286c0 ffff880586903820 ffffffff81cf7a40 ffff880586903800 ffff88060e0b8018 ffffffff81cf7a40 ffff8800b9dbec00 ffff8800b9dbf098 ffff8800a781bec8 ffffffff810ef5bf Call Trace: [] cgroup_mkdir+0x55f/0x5f0 [] vfs_mkdir+0xee/0x140 [] SyS_mkdirat+0x6e/0xf0 [] SyS_mkdir+0x19/0x20 [] system_call_fastpath+0x16/0x1b This patch moves reference bumping inside online_css() loop, clears css_ar[] as css's are brought online successfully, and updates err_destroy path so that either a css is fully online and destroyed by cgroup_destroy_locked() or the error path frees it. This creates a duplicate css free logic in the error path but it will be cleaned up soon. v2: Li pointed out that cgroup_destroy_locked() would do NULL-deref if invoked with a cgroup which doesn't have all css's populated. Update cgroup_destroy_locked() so that it skips NULL css's. Signed-off-by: Tejun Heo Acked-by: Li Zefan Reported-by: Vladimir Davydov Cc: stable@vger.kernel.org # v3.12+ diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 8b729c2..bcb1755 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -4426,14 +4426,6 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, list_add_tail_rcu(&cgrp->sibling, &cgrp->parent->children); root->number_of_cgroups++; - /* each css holds a ref to the cgroup's dentry and the parent css */ - for_each_root_subsys(root, ss) { - struct cgroup_subsys_state *css = css_ar[ss->subsys_id]; - - dget(dentry); - css_get(css->parent); - } - /* hold a ref to the parent's dentry */ dget(parent->dentry); @@ -4445,6 +4437,13 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, if (err) goto err_destroy; + /* each css holds a ref to the cgroup's dentry and parent css */ + dget(dentry); + css_get(css->parent); + + /* mark it consumed for error path */ + css_ar[ss->subsys_id] = NULL; + if (ss->broken_hierarchy && !ss->warned_broken_hierarchy && parent->parent) { pr_warning("cgroup: %s (%d) created nested cgroup for controller \"%s\" which has incomplete hierarchy support. 
Nested cgroups may change behavior in the future.\n", @@ -4491,6 +4490,14 @@ err_free_cgrp: return err; err_destroy: + for_each_root_subsys(root, ss) { + struct cgroup_subsys_state *css = css_ar[ss->subsys_id]; + + if (css) { + percpu_ref_cancel_init(&css->refcnt); + ss->css_free(css); + } + } cgroup_destroy_locked(cgrp); mutex_unlock(&cgroup_mutex); mutex_unlock(&dentry->d_inode->i_mutex); @@ -4652,8 +4659,12 @@ static int cgroup_destroy_locked(struct cgroup *cgrp) * will be invoked to perform the rest of destruction once the * percpu refs of all css's are confirmed to be killed. */ - for_each_root_subsys(cgrp->root, ss) - kill_css(cgroup_css(cgrp, ss)); + for_each_root_subsys(cgrp->root, ss) { + struct cgroup_subsys_state *css = cgroup_css(cgrp, ss); + + if (css) + kill_css(css); + } /* * Mark @cgrp dead. This prevents further task migration and child -- cgit v0.10.2 From 851dd02b4f1fdb437586190f87217e3382a736bd Mon Sep 17 00:00:00 2001 From: Chris Ruehl Date: Wed, 4 Dec 2013 10:02:44 +0800 Subject: usb: phy-tegra-usb.c: wrong pointer check for remap UTMI usb: phy-tegra-usb.c: wrong pointer check for remap UTMI A wrong pointer was used to test the result of devm_ioremap() Acked-by: Stephen Warren Reviewed-by: Thierry Reding Signed-off-by: Chris Ruehl Acked-by: Venu Byravarasu Signed-off-by: Felipe Balbi diff --git a/drivers/usb/phy/phy-tegra-usb.c b/drivers/usb/phy/phy-tegra-usb.c index 82232ac..bbe4f8e 100644 --- a/drivers/usb/phy/phy-tegra-usb.c +++ b/drivers/usb/phy/phy-tegra-usb.c @@ -876,7 +876,7 @@ static int utmi_phy_probe(struct tegra_usb_phy *tegra_phy, tegra_phy->pad_regs = devm_ioremap(&pdev->dev, res->start, resource_size(res)); - if (!tegra_phy->regs) { + if (!tegra_phy->pad_regs) { dev_err(&pdev->dev, "Failed to remap UTMI Pad regs\n"); return -ENOMEM; } -- cgit v0.10.2 From a8b14744429f90763f258ad0f69601cdcad610aa Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 4 Dec 2013 09:20:40 -0500 Subject: sysfs: give different locking key to regular and bin files 027a485d12e0 ("sysfs: use a separate locking class for open files depending on mmap") assigned different lockdep key to sysfs_open_file->mutex depending on whether the file implements mmap or not in an attempt to avoid spurious lockdep warning caused by merging of regular and bin file paths. While this restored some of the original behavior of using different locks (at least lockdep is concerned) for the different clases of files. The restoration wasn't full because now the lockdep key assignment depends on whether the file has mmap or not instead of whether it's a regular file or not. This means that bin files which don't implement mmap will get assigned the same lockdep class as regular files. This is problematic because file_operations for bin files still implements the mmap file operation and checking whether the sysfs file actually implements mmap happens in the file operation after grabbing @sysfs_open_file->mutex. We still end up adding locking dependency from mmap locking to sysfs_open_file->mutex to the regular file mutex which triggers spurious circular locking warning. Fix it by restoring the original behavior fully by differentiating lockdep key by whether the file is regular or bin, instead of the existence of mmap. 
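The reason the resulting if/else can keep two textually identical mutex_init(&of->mutex) calls (see the hunk below) is that mutex_init() is a macro which declares a static lock_class_key at every call site, so each branch hands lockdep a different lock class. A rough sketch of the macro, paraphrased from include/linux/mutex.h and to be read as an assumption about its exact shape rather than a verbatim copy:

/*
 * Roughly what mutex_init() expands to: every textual occurrence of the
 * macro declares its own static lock_class_key, and that key is what
 * lockdep uses to identify the lock class.
 */
#define mutex_init(mutex)					\
do {								\
	static struct lock_class_key __key;			\
								\
	__mutex_init((mutex), #mutex, &__key);			\
} while (0)

With that in place, branching on sysfs_is_bin() gives one class for bin files and one for regular files, restoring the behaviour from before 027a485d12e0.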
Signed-off-by: Tejun Heo Reported-by: Dave Jones Link: http://lkml.kernel.org/g/20131203184324.GA11320@redhat.com Signed-off-by: Greg Kroah-Hartman diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c index b94f936..35e7d08 100644 --- a/fs/sysfs/file.c +++ b/fs/sysfs/file.c @@ -609,7 +609,7 @@ static int sysfs_open_file(struct inode *inode, struct file *file) struct sysfs_dirent *attr_sd = file->f_path.dentry->d_fsdata; struct kobject *kobj = attr_sd->s_parent->s_dir.kobj; struct sysfs_open_file *of; - bool has_read, has_write, has_mmap; + bool has_read, has_write; int error = -EACCES; /* need attr_sd for attr and ops, its parent for kobj */ @@ -621,7 +621,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file) has_read = battr->read || battr->mmap; has_write = battr->write || battr->mmap; - has_mmap = battr->mmap; } else { const struct sysfs_ops *ops = sysfs_file_ops(attr_sd); @@ -633,7 +632,6 @@ static int sysfs_open_file(struct inode *inode, struct file *file) has_read = ops->show; has_write = ops->store; - has_mmap = false; } /* check perms and supported operations */ @@ -661,9 +659,9 @@ static int sysfs_open_file(struct inode *inode, struct file *file) * open file has a separate mutex, it's okay as long as those don't * happen on the same file. At this point, we can't easily give * each file a separate locking class. Let's differentiate on - * whether the file has mmap or not for now. + * whether the file is bin or not for now. */ - if (has_mmap) + if (sysfs_is_bin(attr_sd)) mutex_init(&of->mutex); else mutex_init(&of->mutex); -- cgit v0.10.2 From 9105bb149bbbc555d2e11ba5166dfe7a24eae09e Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 8 Dec 2013 20:52:31 -0500 Subject: ext4: fix del_timer() misuse for ->s_err_report That thing should be del_timer_sync(); consider what happens if ext4_put_super() call of del_timer() happens to come just as it's getting run on another CPU. Since that timer reschedules itself to run next day, you are pretty much guaranteed that you'll end up with kfree'd scheduled timer, with usual fun consequences. AFAICS, that's -stable fodder all way back to 2010... [the second del_timer_sync() is almost certainly not needed, but it doesn't hurt either] Signed-off-by: Al Viro Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org diff --git a/fs/ext4/super.c b/fs/ext4/super.c index c977f4e..9d70c0c 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -792,7 +792,7 @@ static void ext4_put_super(struct super_block *sb) } ext4_es_unregister_shrinker(sbi); - del_timer(&sbi->s_err_report); + del_timer_sync(&sbi->s_err_report); ext4_release_system_zone(sb); ext4_mb_release(sb); ext4_ext_release(sb); @@ -4184,7 +4184,7 @@ failed_mount_wq: } failed_mount3: ext4_es_unregister_shrinker(sbi); - del_timer(&sbi->s_err_report); + del_timer_sync(&sbi->s_err_report); if (sbi->s_flex_groups) ext4_kvfree(sbi->s_flex_groups); percpu_counter_destroy(&sbi->s_freeclusters_counter); -- cgit v0.10.2 From 30fac0f75da24dd5bb43c9e911d2039a984ac815 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Sun, 8 Dec 2013 21:11:59 -0500 Subject: ext4: Do not reserve clusters when fs doesn't support extents When the filesystem doesn't support extents (like in ext2/3 compatibility modes), there is no need to reserve any clusters. Space estimates for writing are exact, hole punching doesn't need new metadata, and there are no unwritten extents to convert. 
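For a sense of scale, the reservation being skipped is what ext4_calculate_resv_clusters() computes in the hunk below; restated as a stand-alone sketch (not the kernel code), the heuristic is roughly:

#include <stdio.h>

/* Reserve ~2% of the clusters, capped at 4096, unless extents are disabled. */
static unsigned long long resv_clusters(unsigned long long total_clusters,
					int has_extents)
{
	unsigned long long resv;

	if (!has_extents)
		return 0;	/* ext2/3 compatibility: estimates are exact */

	resv = total_clusters / 50;	/* ~2% */
	if (resv > 4096)
		resv = 4096;
	return resv;
}

int main(void)
{
	printf("%llu\n", resv_clusters(1000000, 1));	/* -> 4096 */
	printf("%llu\n", resv_clusters(100000, 1));	/* -> 2000 */
	printf("%llu\n", resv_clusters(1000000, 0));	/* -> 0 */
	return 0;
}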
This fixes a problem when filesystem still having some free space when accessed with a native ext2/3 driver suddently reports ENOSPC when accessed with ext4 driver. Reported-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Reviewed-by: Lukas Czerner Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9d70c0c..1f7784d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3316,11 +3316,19 @@ int ext4_calculate_overhead(struct super_block *sb) } -static ext4_fsblk_t ext4_calculate_resv_clusters(struct ext4_sb_info *sbi) +static ext4_fsblk_t ext4_calculate_resv_clusters(struct super_block *sb) { ext4_fsblk_t resv_clusters; /* + * There's no need to reserve anything when we aren't using extents. + * The space estimates are exact, there are no unwritten extents, + * hole punching doesn't need new metadata... This is needed especially + * to keep ext2/3 backward compatibility. + */ + if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_EXTENTS)) + return 0; + /* * By default we reserve 2% or 4096 clusters, whichever is smaller. * This should cover the situations where we can not afford to run * out of space like for example punch hole, or converting @@ -3328,7 +3336,8 @@ static ext4_fsblk_t ext4_calculate_resv_clusters(struct ext4_sb_info *sbi) * allocation would require 1, or 2 blocks, higher numbers are * very rare. */ - resv_clusters = ext4_blocks_count(sbi->s_es) >> sbi->s_cluster_bits; + resv_clusters = ext4_blocks_count(EXT4_SB(sb)->s_es) >> + EXT4_SB(sb)->s_cluster_bits; do_div(resv_clusters, 50); resv_clusters = min_t(ext4_fsblk_t, resv_clusters, 4096); @@ -4071,10 +4080,10 @@ no_journal: "available"); } - err = ext4_reserve_clusters(sbi, ext4_calculate_resv_clusters(sbi)); + err = ext4_reserve_clusters(sbi, ext4_calculate_resv_clusters(sb)); if (err) { ext4_msg(sb, KERN_ERR, "failed to reserve %llu clusters for " - "reserved pool", ext4_calculate_resv_clusters(sbi)); + "reserved pool", ext4_calculate_resv_clusters(sb)); goto failed_mount4a; } -- cgit v0.10.2 From f6c07cad081ba222d63623d913aafba5586c1d2c Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 8 Dec 2013 21:12:59 -0500 Subject: jbd2: don't BUG but return ENOSPC if a handle runs out of space If a handle runs out of space, we currently stop the kernel with a BUG in jbd2_journal_dirty_metadata(). This makes it hard to figure out what might be going on. So return an error of ENOSPC, so we can let the file system layer figure out what is going on, to make it more likely we can get useful debugging information). This should make it easier to debug problems such as the one which was reported by: https://bugzilla.kernel.org/show_bug.cgi?id=44731 The only two callers of this function are ext4_handle_dirty_metadata() and ocfs2_journal_dirty(). The ocfs2 function will trigger a BUG_ON(), which means there will be no change in behavior. The ext4 function will call ext4_error_inode() which will print the useful debugging information and then handle the situation using ext4's error handling mechanisms (i.e., which might mean halting the kernel or remounting the file system read-only). Also, since both file systems already call WARN_ON(), drop the WARN_ON from jbd2_journal_dirty_metadata() to avoid two stack traces from being displayed. 
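From a caller's point of view, the visible change is that jbd2_journal_dirty_metadata() may now fail with -ENOSPC instead of halting the kernel. A minimal sketch of how a filesystem-side wrapper might react; the wrapper and its message are placeholders, while the real ext4 path goes through ext4_error_inode() as described above:

#include <linux/jbd2.h>
#include <linux/buffer_head.h>
#include <linux/printk.h>

/* Hypothetical wrapper: only shows the new error contract. */
static int example_dirty_metadata(handle_t *handle, struct buffer_head *bh)
{
	int err = jbd2_journal_dirty_metadata(handle, bh);

	if (err == -ENOSPC)
		/* the handle ran out of buffer credits */
		pr_err("example: out of journal credits for block %llu\n",
		       (unsigned long long)bh->b_blocknr);

	return err;	/* let the caller's error policy decide what happens */
}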
Signed-off-by: "Theodore Ts'o" Cc: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 7aa9a32..b0b74e5 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1290,7 +1290,10 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) * once a transaction -bzzz */ jh->b_modified = 1; - J_ASSERT_JH(jh, handle->h_buffer_credits > 0); + if (handle->h_buffer_credits <= 0) { + ret = -ENOSPC; + goto out_unlock_bh; + } handle->h_buffer_credits--; } @@ -1373,7 +1376,6 @@ out_unlock_bh: jbd2_journal_put_journal_head(jh); out: JBUFFER_TRACE(jh, "exit"); - WARN_ON(ret); /* All errors are bugs, so dump the stack */ return ret; } -- cgit v0.10.2 From 75685071cd5b26d64fd2577330387aeab019ac97 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Sun, 8 Dec 2013 21:13:59 -0500 Subject: jbd2: revise KERN_EMERG error messages Some of KERN_EMERG printk messages do not really deserve this log level and the one in log_wait_commit() is even rather useless (the journal has been previously aborted and *that* is where we should have been complaining). So make some messages just KERN_ERR and remove the useless message. Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index 5203264..f66faed 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -702,7 +702,7 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) read_lock(&journal->j_state_lock); #ifdef CONFIG_JBD2_DEBUG if (!tid_geq(journal->j_commit_request, tid)) { - printk(KERN_EMERG + printk(KERN_ERR "%s: error: j_commit_request=%d, tid=%d\n", __func__, journal->j_commit_request, tid); } @@ -718,10 +718,8 @@ int jbd2_log_wait_commit(journal_t *journal, tid_t tid) } read_unlock(&journal->j_state_lock); - if (unlikely(is_journal_aborted(journal))) { - printk(KERN_EMERG "journal commit I/O error\n"); + if (unlikely(is_journal_aborted(journal))) err = -EIO; - } return err; } @@ -2645,7 +2643,7 @@ static void __exit journal_exit(void) #ifdef CONFIG_JBD2_DEBUG int n = atomic_read(&nr_journal_heads); if (n) - printk(KERN_EMERG "JBD2: leaked %d journal_heads!\n", n); + printk(KERN_ERR "JBD2: leaked %d journal_heads!\n", n); #endif jbd2_remove_jbd_stats_proc_entry(); jbd2_journal_destroy_caches(); diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index b0b74e5..80797cf 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -932,7 +932,7 @@ repeat: jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS); if (!frozen_buffer) { - printk(KERN_EMERG + printk(KERN_ERR "%s: OOM for frozen_buffer\n", __func__); JBUFFER_TRACE(jh, "oom!"); @@ -1166,7 +1166,7 @@ repeat: if (!jh->b_committed_data) { committed_data = jbd2_alloc(jh2bh(jh)->b_size, GFP_NOFS); if (!committed_data) { - printk(KERN_EMERG "%s: No memory for committed data\n", + printk(KERN_ERR "%s: No memory for committed data\n", __func__); err = -ENOMEM; goto out; @@ -1308,7 +1308,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) JBUFFER_TRACE(jh, "fastpath"); if (unlikely(jh->b_transaction != journal->j_running_transaction)) { - printk(KERN_EMERG "JBD: %s: " + printk(KERN_ERR "JBD: %s: " "jh->b_transaction (%llu, %p, %u) != " "journal->j_running_transaction (%p, %u)", journal->j_devname, @@ -1335,7 +1335,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) JBUFFER_TRACE(jh, "already on other transaction"); if (unlikely(jh->b_transaction != journal->j_committing_transaction)) { - printk(KERN_EMERG "JBD: %s: " + 
printk(KERN_ERR "JBD: %s: " "jh->b_transaction (%llu, %p, %u) != " "journal->j_committing_transaction (%p, %u)", journal->j_devname, @@ -1348,7 +1348,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) ret = -EINVAL; } if (unlikely(jh->b_next_transaction != transaction)) { - printk(KERN_EMERG "JBD: %s: " + printk(KERN_ERR "JBD: %s: " "jh->b_next_transaction (%llu, %p, %u) != " "transaction (%p, %u)", journal->j_devname, -- cgit v0.10.2 From a67c848a8b9aa9e471f9eaadd2cb29cc527462cf Mon Sep 17 00:00:00 2001 From: Dmitry Monakhov Date: Sun, 8 Dec 2013 21:14:59 -0500 Subject: jbd2: rename obsoleted msg JBD->JBD2 Rename performed via: perl -pi -e 's/JBD:/JBD2:/g' fs/jbd2/*.c Signed-off-by: Dmitry Monakhov Signed-off-by: "Theodore Ts'o" Reviewed-by: Carlos Maiolino diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c index f66faed..5fa344a 100644 --- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1525,13 +1525,13 @@ static int journal_get_superblock(journal_t *journal) if (JBD2_HAS_COMPAT_FEATURE(journal, JBD2_FEATURE_COMPAT_CHECKSUM) && JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2)) { /* Can't have checksum v1 and v2 on at the same time! */ - printk(KERN_ERR "JBD: Can't enable checksumming v1 and v2 " + printk(KERN_ERR "JBD2: Can't enable checksumming v1 and v2 " "at the same time!\n"); goto out; } if (!jbd2_verify_csum_type(journal, sb)) { - printk(KERN_ERR "JBD: Unknown checksum type\n"); + printk(KERN_ERR "JBD2: Unknown checksum type\n"); goto out; } @@ -1539,7 +1539,7 @@ static int journal_get_superblock(journal_t *journal) if (JBD2_HAS_INCOMPAT_FEATURE(journal, JBD2_FEATURE_INCOMPAT_CSUM_V2)) { journal->j_chksum_driver = crypto_alloc_shash("crc32c", 0, 0); if (IS_ERR(journal->j_chksum_driver)) { - printk(KERN_ERR "JBD: Cannot load crc32c driver.\n"); + printk(KERN_ERR "JBD2: Cannot load crc32c driver.\n"); err = PTR_ERR(journal->j_chksum_driver); journal->j_chksum_driver = NULL; goto out; @@ -1548,7 +1548,7 @@ static int journal_get_superblock(journal_t *journal) /* Check superblock checksum */ if (!jbd2_superblock_csum_verify(journal, sb)) { - printk(KERN_ERR "JBD: journal checksum error\n"); + printk(KERN_ERR "JBD2: journal checksum error\n"); goto out; } @@ -1834,7 +1834,7 @@ int jbd2_journal_set_features (journal_t *journal, unsigned long compat, journal->j_chksum_driver = crypto_alloc_shash("crc32c", 0, 0); if (IS_ERR(journal->j_chksum_driver)) { - printk(KERN_ERR "JBD: Cannot load crc32c " + printk(KERN_ERR "JBD2: Cannot load crc32c " "driver.\n"); journal->j_chksum_driver = NULL; return 0; diff --git a/fs/jbd2/recovery.c b/fs/jbd2/recovery.c index 3929c50..3b6bb19 100644 --- a/fs/jbd2/recovery.c +++ b/fs/jbd2/recovery.c @@ -594,7 +594,7 @@ static int do_one_pass(journal_t *journal, be32_to_cpu(tmp->h_sequence))) { brelse(obh); success = -EIO; - printk(KERN_ERR "JBD: Invalid " + printk(KERN_ERR "JBD2: Invalid " "checksum recovering " "block %llu in log\n", blocknr); diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index 80797cf..8360674 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -1308,7 +1308,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) JBUFFER_TRACE(jh, "fastpath"); if (unlikely(jh->b_transaction != journal->j_running_transaction)) { - printk(KERN_ERR "JBD: %s: " + printk(KERN_ERR "JBD2: %s: " "jh->b_transaction (%llu, %p, %u) != " "journal->j_running_transaction (%p, %u)", journal->j_devname, @@ -1335,7 +1335,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct 
buffer_head *bh) JBUFFER_TRACE(jh, "already on other transaction"); if (unlikely(jh->b_transaction != journal->j_committing_transaction)) { - printk(KERN_ERR "JBD: %s: " + printk(KERN_ERR "JBD2: %s: " "jh->b_transaction (%llu, %p, %u) != " "journal->j_committing_transaction (%p, %u)", journal->j_devname, @@ -1348,7 +1348,7 @@ int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh) ret = -EINVAL; } if (unlikely(jh->b_next_transaction != transaction)) { - printk(KERN_ERR "JBD: %s: " + printk(KERN_ERR "JBD2: %s: " "jh->b_next_transaction (%llu, %p, %u) != " "transaction (%p, %u)", journal->j_devname, -- cgit v0.10.2 From d825a04387ff4ce66117306f2862c7cedca5c597 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Fri, 29 Nov 2013 02:24:18 +0100 Subject: KVM: PPC: Book3S: PR: Don't clobber our exit handler id We call a C helper to save all svcpu fields into our vcpu. The C ABI states that r12 is considered volatile. However, we keep our exit handler id in r12 currently. So we need to save it away into a non-volatile register instead that definitely does get preserved across the C call. This bug usually didn't hit anyone yet since gcc is smart enough to generate code that doesn't even need r12 which means it stayed identical throughout the call by sheer luck. But we can't rely on that. Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index f4dd041..5e7cb32 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -132,9 +132,17 @@ kvm_start_lightweight: * */ + PPC_LL r3, GPR4(r1) /* vcpu pointer */ + + /* + * kvmppc_copy_from_svcpu can clobber volatile registers, save + * the exit handler id to the vcpu and restore it from there later. + */ + stw r12, VCPU_TRAP(r3) + /* Transfer reg values from shadow vcpu back to vcpu struct */ /* On 64-bit, interrupts are still off at this point */ - PPC_LL r3, GPR4(r1) /* vcpu pointer */ + GET_SHADOW_VCPU(r4) bl FUNC(kvmppc_copy_from_svcpu) nop @@ -151,7 +159,6 @@ kvm_start_lightweight: */ ld r3, PACA_SPRG3(r13) mtspr SPRN_SPRG3, r3 - #endif /* CONFIG_PPC_BOOK3S_64 */ /* R7 = vcpu */ @@ -177,7 +184,7 @@ kvm_start_lightweight: PPC_STL r31, VCPU_GPR(R31)(r7) /* Pass the exit number as 3rd argument to kvmppc_handle_exit */ - mr r5, r12 + lwz r5, VCPU_TRAP(r7) /* Restore r3 (kvm_run) and r4 (vcpu) */ REST_2GPRS(3, r1) -- cgit v0.10.2 From c9dad7f9db4ed42de37d3f0ef2b2c0e10d5b6f92 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Fri, 29 Nov 2013 02:27:23 +0100 Subject: KVM: PPC: Book3S: PR: Export kvmppc_copy_to|from_svcpu The kvmppc_copy_{to,from}_svcpu functions are publically visible, so we should also export them in a header for others C files to consume. So far we didn't need this because we only called it from asm code. The next patch will introduce a C caller. 
Signed-off-by: Alexander Graf diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index 4a594b7..bc23b1b 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -192,6 +192,10 @@ extern void kvmppc_load_up_vsx(void); extern u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst); extern ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst); extern int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd); +extern void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, + struct kvm_vcpu *vcpu); +extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, + struct kvmppc_book3s_shadow_vcpu *svcpu); static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu) { -- cgit v0.10.2 From 40fdd8c88c4a5e9b26bfbed2215ac661f24aef07 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Fri, 29 Nov 2013 02:29:00 +0100 Subject: KVM: PPC: Book3S: PR: Make svcpu -> vcpu store preempt savvy As soon as we get back to our "highmem" handler in virtual address space we may get preempted. Today the reason we can get preempted is that we replay interrupts and all the lazy logic thinks we have interrupts enabled. However, it's not hard to make the code interruptible and that way we can enable and handle interrupts even earlier. This fixes random guest crashes that happened with CONFIG_PREEMPT=y for me. Signed-off-by: Alexander Graf diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 0bd9348..412b2f3 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -106,6 +106,7 @@ struct kvmppc_host_state { }; struct kvmppc_book3s_shadow_vcpu { + bool in_use; ulong gpr[14]; u32 cr; u32 xer; diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index fe14ca3..5b9e906 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -66,6 +66,7 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, int cpu) struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu); memcpy(svcpu->slb, to_book3s(vcpu)->slb_shadow, sizeof(svcpu->slb)); svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max; + svcpu->in_use = 0; svcpu_put(svcpu); #endif vcpu->cpu = smp_processor_id(); @@ -78,6 +79,9 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu) { #ifdef CONFIG_PPC_BOOK3S_64 struct kvmppc_book3s_shadow_vcpu *svcpu = svcpu_get(vcpu); + if (svcpu->in_use) { + kvmppc_copy_from_svcpu(vcpu, svcpu); + } memcpy(to_book3s(vcpu)->slb_shadow, svcpu->slb, sizeof(svcpu->slb)); to_book3s(vcpu)->slb_shadow_max = svcpu->slb_max; svcpu_put(svcpu); @@ -110,12 +114,26 @@ void kvmppc_copy_to_svcpu(struct kvmppc_book3s_shadow_vcpu *svcpu, svcpu->ctr = vcpu->arch.ctr; svcpu->lr = vcpu->arch.lr; svcpu->pc = vcpu->arch.pc; + svcpu->in_use = true; } /* Copy data touched by real-mode code from shadow vcpu back to vcpu */ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, struct kvmppc_book3s_shadow_vcpu *svcpu) { + /* + * vcpu_put would just call us again because in_use hasn't + * been updated yet. + */ + preempt_disable(); + + /* + * Maybe we were already preempted and synced the svcpu from + * our preempt notifiers. Don't bother touching this svcpu then. 
+ */ + if (!svcpu->in_use) + goto out; + vcpu->arch.gpr[0] = svcpu->gpr[0]; vcpu->arch.gpr[1] = svcpu->gpr[1]; vcpu->arch.gpr[2] = svcpu->gpr[2]; @@ -139,6 +157,10 @@ void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu, vcpu->arch.fault_dar = svcpu->fault_dar; vcpu->arch.fault_dsisr = svcpu->fault_dsisr; vcpu->arch.last_inst = svcpu->last_inst; + svcpu->in_use = false; + +out: + preempt_enable(); } static int kvmppc_core_check_requests_pr(struct kvm_vcpu *vcpu) -- cgit v0.10.2 From 3d3319b45eea26df56c53aae1a65adf74c8ab12a Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Fri, 29 Nov 2013 02:32:31 +0100 Subject: KVM: PPC: Book3S: PR: Enable interrupts earlier Now that the svcpu sync is interrupt aware we can enable interrupts earlier in the exit code path again, moving 32bit and 64bit closer together. While at it, document the fact that we're always executing the exit path with interrupts enabled so that the next person doesn't trap over this. Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_interrupts.S b/arch/powerpc/kvm/book3s_interrupts.S index 5e7cb32..f779450 100644 --- a/arch/powerpc/kvm/book3s_interrupts.S +++ b/arch/powerpc/kvm/book3s_interrupts.S @@ -129,6 +129,7 @@ kvm_start_lightweight: * R12 = exit handler id * R13 = PACA * SVCPU.* = guest * + * MSR.EE = 1 * */ @@ -148,11 +149,6 @@ kvm_start_lightweight: nop #ifdef CONFIG_PPC_BOOK3S_64 - /* Re-enable interrupts */ - ld r3, HSTATE_HOST_MSR(r13) - ori r3, r3, MSR_EE - MTMSR_EERI(r3) - /* * Reload kernel SPRG3 value. * No need to save guest value as usermode can't modify SPRG3. diff --git a/arch/powerpc/kvm/book3s_rmhandlers.S b/arch/powerpc/kvm/book3s_rmhandlers.S index a38c4c9..c3c5231 100644 --- a/arch/powerpc/kvm/book3s_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_rmhandlers.S @@ -153,15 +153,11 @@ _GLOBAL(kvmppc_entry_trampoline) li r6, MSR_IR | MSR_DR andc r6, r5, r6 /* Clear DR and IR in MSR value */ -#ifdef CONFIG_PPC_BOOK3S_32 /* * Set EE in HOST_MSR so that it's enabled when we get into our - * C exit handler function. On 64-bit we delay enabling - * interrupts until we have finished transferring stuff - * to or from the PACA. + * C exit handler function. */ ori r5, r5, MSR_EE -#endif mtsrr0 r7 mtsrr1 r6 RFI -- cgit v0.10.2 From 241bf43321a10815225f477bba96a42285a2da73 Mon Sep 17 00:00:00 2001 From: Stephen Warren Date: Fri, 6 Dec 2013 13:34:50 -0700 Subject: ASoC: tegra: fix uninitialized variables in set_fmt In tegra*_i2s_set_fmt(), in the (fmt == SND_SOC_DAIFMT_CBM_CFM) case, "val" is never assigned to, but left uninitialized. The other case does initialized it. Fix this by initializing val at the start of the function, and only ever ORing into it. Update the handling of "mask" so it works the same way for consistency. Update tegra20_spdif.c to use the same code-style for consistency, even though it doesn't happen to suffer from the same problem at present. 
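The hazard being fixed is the classic partially-assigned local: in the SND_SOC_DAIFMT_CBM_CFM branch nothing ever writes val, so stack garbage would reach the register. A stand-alone sketch of the pattern and the fix, with a made-up register bit:

#include <stdio.h>

#define CTRL_MASTER_ENABLE	(1u << 0)	/* hypothetical register bit */

/* Mirrors the set_fmt() shape: only one branch ever assigns the bit. */
static unsigned int build_ctrl(int codec_is_master)
{
	unsigned int mask = 0, val = 0;	/* without "= 0", val is uninitialized */

	mask |= CTRL_MASTER_ENABLE;
	if (!codec_is_master)
		val |= CTRL_MASTER_ENABLE;	/* CBS_CFS: the SoC is master */
	/* CBM_CFM: nothing is ORed in, so val must already be 0 */

	return val & mask;
}

int main(void)
{
	printf("%#x %#x\n", build_ctrl(0), build_ctrl(1));
	return 0;
}

Initializing both variables to 0 and only ever ORing into them makes every path well-defined, which is exactly what the hunks below do for the Tegra drivers.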
Signed-off-by: Stephen Warren Reviewed-by: Thierry Reding Signed-off-by: Mark Brown Fixes: 0f163546a772 ("ASoC: tegra: use regmap more directly") Cc: diff --git a/sound/soc/tegra/tegra20_i2s.c b/sound/soc/tegra/tegra20_i2s.c index 364bf6a9..8c819f8 100644 --- a/sound/soc/tegra/tegra20_i2s.c +++ b/sound/soc/tegra/tegra20_i2s.c @@ -74,7 +74,7 @@ static int tegra20_i2s_set_fmt(struct snd_soc_dai *dai, unsigned int fmt) { struct tegra20_i2s *i2s = snd_soc_dai_get_drvdata(dai); - unsigned int mask, val; + unsigned int mask = 0, val = 0; switch (fmt & SND_SOC_DAIFMT_INV_MASK) { case SND_SOC_DAIFMT_NB_NF: @@ -83,10 +83,10 @@ static int tegra20_i2s_set_fmt(struct snd_soc_dai *dai, return -EINVAL; } - mask = TEGRA20_I2S_CTRL_MASTER_ENABLE; + mask |= TEGRA20_I2S_CTRL_MASTER_ENABLE; switch (fmt & SND_SOC_DAIFMT_MASTER_MASK) { case SND_SOC_DAIFMT_CBS_CFS: - val = TEGRA20_I2S_CTRL_MASTER_ENABLE; + val |= TEGRA20_I2S_CTRL_MASTER_ENABLE; break; case SND_SOC_DAIFMT_CBM_CFM: break; diff --git a/sound/soc/tegra/tegra20_spdif.c b/sound/soc/tegra/tegra20_spdif.c index 08bc693..8c7c102 100644 --- a/sound/soc/tegra/tegra20_spdif.c +++ b/sound/soc/tegra/tegra20_spdif.c @@ -67,15 +67,15 @@ static int tegra20_spdif_hw_params(struct snd_pcm_substream *substream, { struct device *dev = dai->dev; struct tegra20_spdif *spdif = snd_soc_dai_get_drvdata(dai); - unsigned int mask, val; + unsigned int mask = 0, val = 0; int ret, spdifclock; - mask = TEGRA20_SPDIF_CTRL_PACK | - TEGRA20_SPDIF_CTRL_BIT_MODE_MASK; + mask |= TEGRA20_SPDIF_CTRL_PACK | + TEGRA20_SPDIF_CTRL_BIT_MODE_MASK; switch (params_format(params)) { case SNDRV_PCM_FORMAT_S16_LE: - val = TEGRA20_SPDIF_CTRL_PACK | - TEGRA20_SPDIF_CTRL_BIT_MODE_16BIT; + val |= TEGRA20_SPDIF_CTRL_PACK | + TEGRA20_SPDIF_CTRL_BIT_MODE_16BIT; break; default: return -EINVAL; diff --git a/sound/soc/tegra/tegra30_i2s.c b/sound/soc/tegra/tegra30_i2s.c index 231a785..02247fe 100644 --- a/sound/soc/tegra/tegra30_i2s.c +++ b/sound/soc/tegra/tegra30_i2s.c @@ -118,7 +118,7 @@ static int tegra30_i2s_set_fmt(struct snd_soc_dai *dai, unsigned int fmt) { struct tegra30_i2s *i2s = snd_soc_dai_get_drvdata(dai); - unsigned int mask, val; + unsigned int mask = 0, val = 0; switch (fmt & SND_SOC_DAIFMT_INV_MASK) { case SND_SOC_DAIFMT_NB_NF: @@ -127,10 +127,10 @@ static int tegra30_i2s_set_fmt(struct snd_soc_dai *dai, return -EINVAL; } - mask = TEGRA30_I2S_CTRL_MASTER_ENABLE; + mask |= TEGRA30_I2S_CTRL_MASTER_ENABLE; switch (fmt & SND_SOC_DAIFMT_MASTER_MASK) { case SND_SOC_DAIFMT_CBS_CFS: - val = TEGRA30_I2S_CTRL_MASTER_ENABLE; + val |= TEGRA30_I2S_CTRL_MASTER_ENABLE; break; case SND_SOC_DAIFMT_CBM_CFM: break; -- cgit v0.10.2 From 5dfc03f141993c101c626c84d639019e98a4f39c Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Fri, 6 Dec 2013 23:38:28 +0800 Subject: ASoC: fsl: imx-wm8962: Don't update bias_level in machine driver If we update it here, the set_bias_level() of Codec driver won't be normally called and we will then miss some essential procedures in set_bias_level() of the Codec driver. Thus drop it. 
Signed-off-by: Nicolin Chen Signed-off-by: Mark Brown diff --git a/sound/soc/fsl/imx-wm8962.c b/sound/soc/fsl/imx-wm8962.c index 72064e9..72718e1 100644 --- a/sound/soc/fsl/imx-wm8962.c +++ b/sound/soc/fsl/imx-wm8962.c @@ -130,8 +130,6 @@ static int imx_wm8962_set_bias_level(struct snd_soc_card *card, break; } - dapm->bias_level = level; - return 0; } -- cgit v0.10.2 From 6b9f3e65282b3bd7ed77e7b2b1edfe7cfed48115 Mon Sep 17 00:00:00 2001 From: Stephen Warren Date: Tue, 3 Dec 2013 14:26:33 -0700 Subject: ASoC: don't leak on error in snd_dmaengine_pcm_register If snd_dmaengine_pcm_register()'s call to snd_soc_add_platform() fails, all objects allocated during registration are leaked. Fix this by adding error-handling code. Signed-off-by: Stephen Warren Acked-by: Lars-Peter Clausen Signed-off-by: Mark Brown diff --git a/sound/soc/soc-generic-dmaengine-pcm.c b/sound/soc/soc-generic-dmaengine-pcm.c index cbc9c96..41949af 100644 --- a/sound/soc/soc-generic-dmaengine-pcm.c +++ b/sound/soc/soc-generic-dmaengine-pcm.c @@ -305,6 +305,20 @@ static void dmaengine_pcm_request_chan_of(struct dmaengine_pcm *pcm, } } +static void dmaengine_pcm_release_chan(struct dmaengine_pcm *pcm) +{ + unsigned int i; + + for (i = SNDRV_PCM_STREAM_PLAYBACK; i <= SNDRV_PCM_STREAM_CAPTURE; + i++) { + if (!pcm->chan[i]) + continue; + dma_release_channel(pcm->chan[i]); + if (pcm->flags & SND_DMAENGINE_PCM_FLAG_HALF_DUPLEX) + break; + } +} + /** * snd_dmaengine_pcm_register - Register a dmaengine based PCM device * @dev: The parent device for the PCM device @@ -315,6 +329,7 @@ int snd_dmaengine_pcm_register(struct device *dev, const struct snd_dmaengine_pcm_config *config, unsigned int flags) { struct dmaengine_pcm *pcm; + int ret; pcm = kzalloc(sizeof(*pcm), GFP_KERNEL); if (!pcm) @@ -326,11 +341,20 @@ int snd_dmaengine_pcm_register(struct device *dev, dmaengine_pcm_request_chan_of(pcm, dev); if (flags & SND_DMAENGINE_PCM_FLAG_NO_RESIDUE) - return snd_soc_add_platform(dev, &pcm->platform, + ret = snd_soc_add_platform(dev, &pcm->platform, &dmaengine_no_residue_pcm_platform); else - return snd_soc_add_platform(dev, &pcm->platform, + ret = snd_soc_add_platform(dev, &pcm->platform, &dmaengine_pcm_platform); + if (ret) + goto err_free_dma; + + return 0; + +err_free_dma: + dmaengine_pcm_release_chan(pcm); + kfree(pcm); + return ret; } EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_register); @@ -345,7 +369,6 @@ void snd_dmaengine_pcm_unregister(struct device *dev) { struct snd_soc_platform *platform; struct dmaengine_pcm *pcm; - unsigned int i; platform = snd_soc_lookup_platform(dev); if (!platform) @@ -353,15 +376,8 @@ void snd_dmaengine_pcm_unregister(struct device *dev) pcm = soc_platform_to_pcm(platform); - for (i = SNDRV_PCM_STREAM_PLAYBACK; i <= SNDRV_PCM_STREAM_CAPTURE; i++) { - if (pcm->chan[i]) { - dma_release_channel(pcm->chan[i]); - if (pcm->flags & SND_DMAENGINE_PCM_FLAG_HALF_DUPLEX) - break; - } - } - snd_soc_remove_platform(platform); + dmaengine_pcm_release_chan(pcm); kfree(pcm); } EXPORT_SYMBOL_GPL(snd_dmaengine_pcm_unregister); -- cgit v0.10.2 From 4144bc861ed7934d56f16d2acd808d44af0fcc90 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Fri, 29 Nov 2013 20:17:45 +0100 Subject: usb: cdc-wdm: manage_power should always set needs_remote_wakeup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: stable Reported-by: Oliver Neukum Signed-off-by: Bjørn Mork Acked-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/usb/class/cdc-wdm.c 
b/drivers/usb/class/cdc-wdm.c index 4d38759..0b23a86 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -854,13 +854,11 @@ static int wdm_manage_power(struct usb_interface *intf, int on) { /* need autopm_get/put here to ensure the usbcore sees the new value */ int rv = usb_autopm_get_interface(intf); - if (rv < 0) - goto err; intf->needs_remote_wakeup = on; - usb_autopm_put_interface(intf); -err: - return rv; + if (!rv) + usb_autopm_put_interface(intf); + return 0; } static int wdm_probe(struct usb_interface *intf, const struct usb_device_id *id) -- cgit v0.10.2 From 52d0dc7597c89b2ab779f3dcb9b9bf0800dd9218 Mon Sep 17 00:00:00 2001 From: Dmitry Kunilov Date: Tue, 3 Dec 2013 12:11:30 -0800 Subject: usb: serial: zte_ev: move support for ZTE AC2726 from zte_ev back to option ZTE AC2726 EVDO modem drops ppp connection every minute when driven by zte_ev but works fine when driven by option. Move the support for AC2726 back to option driver. Signed-off-by: Dmitry Kunilov Cc: stable Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 496b7e39..cc7a241 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -251,6 +251,7 @@ static void option_instat_callback(struct urb *urb); #define ZTE_PRODUCT_MF628 0x0015 #define ZTE_PRODUCT_MF626 0x0031 #define ZTE_PRODUCT_MC2718 0xffe8 +#define ZTE_PRODUCT_AC2726 0xfff1 #define BENQ_VENDOR_ID 0x04a5 #define BENQ_PRODUCT_H10 0x4068 @@ -1453,6 +1454,7 @@ static const struct usb_device_id option_ids[] = { { USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x02, 0x01) }, { USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x02, 0x05) }, { USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x86, 0x10) }, + { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, ZTE_PRODUCT_AC2726, 0xff, 0xff, 0xff) }, { USB_DEVICE(BENQ_VENDOR_ID, BENQ_PRODUCT_H10) }, { USB_DEVICE(DLINK_VENDOR_ID, DLINK_PRODUCT_DWM_652) }, diff --git a/drivers/usb/serial/zte_ev.c b/drivers/usb/serial/zte_ev.c index fca4c75..eae2c87 100644 --- a/drivers/usb/serial/zte_ev.c +++ b/drivers/usb/serial/zte_ev.c @@ -281,8 +281,7 @@ static const struct usb_device_id id_table[] = { { USB_DEVICE(0x19d2, 0xfffd) }, { USB_DEVICE(0x19d2, 0xfffc) }, { USB_DEVICE(0x19d2, 0xfffb) }, - /* AC2726, AC8710_V3 */ - { USB_DEVICE_AND_INTERFACE_INFO(0x19d2, 0xfff1, 0xff, 0xff, 0xff) }, + /* AC8710_V3 */ { USB_DEVICE(0x19d2, 0xfff6) }, { USB_DEVICE(0x19d2, 0xfff7) }, { USB_DEVICE(0x19d2, 0xfff8) }, -- cgit v0.10.2 From cc5c9eb67f912cb2c349b04063ff9f444affbc59 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Thu, 5 Dec 2013 15:20:49 +0800 Subject: usb: chipidea: host: Only disable the vbus regulator if it is not NULL Commit 40ed51a4b (usb: chipidea: host: add vbus regulator control) introduced a smatch complaint because regulator_disable() is called without checking whether ci->platdata->reg_vbus is not NULL. Fix this by adding the check. 
This patch is needed for 3.12 stable Cc: stable Reported-by: Dan Carpenter Signed-off-by: Fabio Estevam Signed-off-by: Peter Chen Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/usb/chipidea/host.c b/drivers/usb/chipidea/host.c index 59e6020..526cd77 100644 --- a/drivers/usb/chipidea/host.c +++ b/drivers/usb/chipidea/host.c @@ -88,7 +88,8 @@ static int host_start(struct ci_hdrc *ci) return ret; disable_reg: - regulator_disable(ci->platdata->reg_vbus); + if (ci->platdata->reg_vbus) + regulator_disable(ci->platdata->reg_vbus); put_hcd: usb_put_hcd(hcd); -- cgit v0.10.2 From 5a1e1456fc633da5291285b1fff75d2a7507375b Mon Sep 17 00:00:00 2001 From: Peter Chen Date: Thu, 5 Dec 2013 15:20:50 +0800 Subject: usb: chipidea: fix nobody cared IRQ when booting with host role If we connect a Male-A-to-Male-A cable between the otg-host and a host PC, ci->vbus_active is set wrongly, which causes the controller to run in peripheral mode when we load the gadget module (ci_udc_start will be run), while the software runs in host mode due to id = 0. ehci_irq can't handle the suspend (USBi_SLI) interrupt, which is enabled for peripheral mode, so nobody handles the interrupt and the kernel reports a "nobody cared" IRQ error. This patch is needed for 3.12 stable Cc: stable Acked-by: Michael Grzeschik Reported-by: Marc Kleine-Budde Tested-by: Marc Kleine-Budde Signed-off-by: Marc Kleine-Budde Signed-off-by: Peter Chen Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c index 5d8981c..6e73f8c 100644 --- a/drivers/usb/chipidea/core.c +++ b/drivers/usb/chipidea/core.c @@ -642,6 +642,10 @@ static int ci_hdrc_probe(struct platform_device *pdev) : CI_ROLE_GADGET; } + /* only update vbus status for peripheral */ + if (ci->role == CI_ROLE_GADGET) + ci_handle_vbus_change(ci); + ret = ci_role_start(ci, ci->role); if (ret) { dev_err(dev, "can't start %s role\n", ci_role(ci)->name); diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c index b34c819..69d20fb 100644 --- a/drivers/usb/chipidea/udc.c +++ b/drivers/usb/chipidea/udc.c @@ -1795,9 +1795,6 @@ static int udc_start(struct ci_hdrc *ci) pm_runtime_no_callbacks(&ci->gadget.dev); pm_runtime_enable(&ci->gadget.dev); - /* Update ci->vbus_active */ - ci_handle_vbus_change(ci); - return retval; destroy_eps: -- cgit v0.10.2 From 7122c3e9154b5d9a7422f68f02d8acf050fad2b0 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Tue, 10 Dec 2013 16:46:29 +1030 Subject: scripts/link-vmlinux.sh: only filter kernel symbols for arm Actually, CONFIG_PAGE_OFFSET isn't the same as PAGE_OFFSET, so it isn't easy to figure out the PAGE_OFFSET defined in the header file from scripts. Because CONFIG_PAGE_OFFSET may not be defined on some ARCHs (e.g. 64-bit ARCHs), or may be defined as a bogus value in the !MMU case, this patch only applies the filter on ARM when CONFIG_PAGE_OFFSET is defined, as the original problem is only on ARM.
Cc: Cc: Rusty Russell Fixes: f6537f2f0eba4eba3354e48dbe3047db6d8b6254 Singed-off-by: Ming Lei Signed-off-by: Rusty Russell diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 32b10f5..2dcb377 100644 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -82,7 +82,9 @@ kallsyms() kallsymopt="${kallsymopt} --all-symbols" fi - kallsymopt="${kallsymopt} --page-offset=$CONFIG_PAGE_OFFSET" + if [ -n "${CONFIG_ARM}" ] && [ -n "${CONFIG_PAGE_OFFSET}" ]; then + kallsymopt="${kallsymopt} --page-offset=$CONFIG_PAGE_OFFSET" + fi local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \ ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}" -- cgit v0.10.2 From 6962d914f317b119e0db7189199b21ec77a4b3e0 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 9 Dec 2013 14:53:36 +0100 Subject: xhci: Limit the spurious wakeup fix only to HP machines We've got regression reports that my previous fix for spurious wakeups after S5 on HP Haswell machines leads to the automatic reboot at shutdown on some machines. It turned out that the fix for one side triggers another BIOS bug in other side. So, it's exclusive. Since the original S5 wakeups have been confirmed only on HP machines, it'd be safer to apply it only to limited machines. As a wild guess, limiting to machines with HP PCI SSID should suffice. This patch should be backported to kernels as old as 3.12, that contain the commit 638298dc66ea36623dbc2757a24fc2c4ab41b016 "xhci: Fix spurious wakeups after S5 on Haswell". Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=66171 Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai Signed-off-by: Sarah Sharp Tested-by: Reported-by: Niklas Schnelle Reported-by: Giorgos Reported-by: diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index b8dffd5..73f5208 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -128,7 +128,12 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) * any other sleep) on Haswell machines with LPT and LPT-LP * with the new Intel BIOS */ - xhci->quirks |= XHCI_SPURIOUS_WAKEUP; + /* Limit the quirk to only known vendors, as this triggers + * yet another BIOS bug on some other machines + * https://bugzilla.kernel.org/show_bug.cgi?id=66171 + */ + if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP) + xhci->quirks |= XHCI_SPURIOUS_WAKEUP; } if (pdev->vendor == PCI_VENDOR_ID_ETRON && pdev->device == PCI_DEVICE_ID_ASROCK_P67) { -- cgit v0.10.2 From c1b1731d2002b93857810aa5da856b43f46bb470 Mon Sep 17 00:00:00 2001 From: Sachin Kamat Date: Fri, 6 Dec 2013 17:51:18 +0530 Subject: drivers: phy: Fix memory leak 'phy' was not being freed upon error in one of the cases. Adjust the 'goto's to fix this. 
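The discipline behind both this fix and the follow-up cleanup below is that each error label should undo exactly what was acquired before the failing step, in reverse order. A stand-alone sketch of that unwind pattern with made-up resources:

#include <stdlib.h>

struct thing {
	int *a;
	int *b;
};

static struct thing *thing_create(void)
{
	struct thing *t;

	t = malloc(sizeof(*t));
	if (!t)
		return NULL;		/* nothing to unwind yet */

	t->a = malloc(sizeof(*t->a));
	if (!t->a)
		goto free_thing;	/* only 't' has been acquired */

	t->b = malloc(sizeof(*t->b));
	if (!t->b)
		goto free_a;		/* unwind in reverse order of acquisition */

	return t;

free_a:
	free(t->a);
free_thing:
	free(t);
	return NULL;
}

int main(void)
{
	struct thing *t = thing_create();

	if (t) {
		free(t->b);
		free(t->a);
		free(t);
	}
	return 0;
}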
Signed-off-by: Sachin Kamat Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/phy/phy-core.c b/drivers/phy/phy-core.c index 03cf8fb..712b358 100644 --- a/drivers/phy/phy-core.c +++ b/drivers/phy/phy-core.c @@ -453,7 +453,7 @@ struct phy *phy_create(struct device *dev, const struct phy_ops *ops, if (id < 0) { dev_err(dev, "unable to get id\n"); ret = id; - goto err0; + goto err1; } device_initialize(&phy->dev); @@ -468,11 +468,11 @@ struct phy *phy_create(struct device *dev, const struct phy_ops *ops, ret = dev_set_name(&phy->dev, "phy-%s.%d", dev_name(dev), id); if (ret) - goto err1; + goto err2; ret = device_add(&phy->dev); if (ret) - goto err1; + goto err2; if (pm_runtime_enabled(dev)) { pm_runtime_enable(&phy->dev); @@ -481,11 +481,11 @@ struct phy *phy_create(struct device *dev, const struct phy_ops *ops, return phy; -err1: +err2: ida_remove(&phy_ida, phy->id); put_device(&phy->dev); +err1: kfree(phy); - err0: return ERR_PTR(ret); } -- cgit v0.10.2 From 52797d293287f5a98bd686616527b102b077ac39 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 6 Dec 2013 17:51:19 +0530 Subject: drivers: phy: tweaks to phy_create() If this was called with a NULL "dev" then it lead to a NULL dereference when we called dev_WARN(). I have changed it to WARN_ON() so that we get a stack dump and can fix the caller. The rest of this patch is just cleanup like returning directly instead of having do-nothing gotos. Using descriptive labels instead of GW-BASIC style "err0" and "err1". I also flipped the order of put_device() and ida_remove() so they are a mirror reflection of the order they were allocated. Signed-off-by: Dan Carpenter Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/phy/phy-core.c b/drivers/phy/phy-core.c index 712b358..58e0e97 100644 --- a/drivers/phy/phy-core.c +++ b/drivers/phy/phy-core.c @@ -437,23 +437,18 @@ struct phy *phy_create(struct device *dev, const struct phy_ops *ops, int id; struct phy *phy; - if (!dev) { - dev_WARN(dev, "no device provided for PHY\n"); - ret = -EINVAL; - goto err0; - } + if (WARN_ON(!dev)) + return ERR_PTR(-EINVAL); phy = kzalloc(sizeof(*phy), GFP_KERNEL); - if (!phy) { - ret = -ENOMEM; - goto err0; - } + if (!phy) + return ERR_PTR(-ENOMEM); id = ida_simple_get(&phy_ida, 0, 0, GFP_KERNEL); if (id < 0) { dev_err(dev, "unable to get id\n"); ret = id; - goto err1; + goto free_phy; } device_initialize(&phy->dev); @@ -468,11 +463,11 @@ struct phy *phy_create(struct device *dev, const struct phy_ops *ops, ret = dev_set_name(&phy->dev, "phy-%s.%d", dev_name(dev), id); if (ret) - goto err2; + goto put_dev; ret = device_add(&phy->dev); if (ret) - goto err2; + goto put_dev; if (pm_runtime_enabled(dev)) { pm_runtime_enable(&phy->dev); @@ -481,12 +476,11 @@ struct phy *phy_create(struct device *dev, const struct phy_ops *ops, return phy; -err2: - ida_remove(&phy_ida, phy->id); +put_dev: put_device(&phy->dev); -err1: + ida_remove(&phy_ida, phy->id); +free_phy: kfree(phy); -err0: return ERR_PTR(ret); } EXPORT_SYMBOL_GPL(phy_create); -- cgit v0.10.2 From 8820784203ac24279be23a6e3d85d216c46d65ee Mon Sep 17 00:00:00 2001 From: Kishon Vijay Abraham I Date: Fri, 6 Dec 2013 17:51:20 +0530 Subject: phy: kconfig: add depends on "USB_PHY" to OMAP_USB2 and TWL4030_USB Fixes warning: (OMAP_USB2 && TWL4030_USB) selects USB_PHY which has unmet direct dependencies (USB_SUPPORT) that shows up while disabling USB_SUPPORT from menuconfig. 
Reported-by: Russell King Signed-off-by: Kishon Vijay Abraham I Acked-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig index a344f3d..330ef2d 100644 --- a/drivers/phy/Kconfig +++ b/drivers/phy/Kconfig @@ -24,8 +24,8 @@ config PHY_EXYNOS_MIPI_VIDEO config OMAP_USB2 tristate "OMAP USB2 PHY Driver" depends on ARCH_OMAP2PLUS + depends on USB_PHY select GENERIC_PHY - select USB_PHY select OMAP_CONTROL_USB help Enable this to support the transceiver that is part of SOC. This @@ -36,8 +36,8 @@ config OMAP_USB2 config TWL4030_USB tristate "TWL4030 USB Transceiver Driver" depends on TWL4030_CORE && REGULATOR_TWL4030 && USB_MUSB_OMAP2PLUS + depends on USB_PHY select GENERIC_PHY - select USB_PHY help Enable this to support the USB OTG transceiver on TWL4030 family chips (including the TWL5030 and TPS659x0 devices). -- cgit v0.10.2 From f5f972102d5c12729f0a35fce266b580aaa03f66 Mon Sep 17 00:00:00 2001 From: Scott Wood Date: Fri, 22 Nov 2013 15:52:29 -0600 Subject: powerpc/kvm/booke: Fix build break due to stack frame size warning Commit ce11e48b7fdd256ec68b932a89b397a790566031 ("KVM: PPC: E500: Add userspace debug stub support") added "struct thread_struct" to the stack of kvmppc_vcpu_run(). thread_struct is 1152 bytes on my build, compared to 48 bytes for the recently-introduced "struct debug_reg". Use the latter instead. This fixes the following error: cc1: warnings being treated as errors arch/powerpc/kvm/booke.c: In function 'kvmppc_vcpu_run': arch/powerpc/kvm/booke.c:760:1: error: the frame size of 1424 bytes is larger than 1024 bytes make[2]: *** [arch/powerpc/kvm/booke.o] Error 1 make[1]: *** [arch/powerpc/kvm] Error 2 make[1]: *** Waiting for unfinished jobs.... Signed-off-by: Scott Wood Signed-off-by: Alexander Graf diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h index 9ee1261..aace905 100644 --- a/arch/powerpc/include/asm/switch_to.h +++ b/arch/powerpc/include/asm/switch_to.h @@ -35,7 +35,7 @@ extern void giveup_vsx(struct task_struct *); extern void enable_kernel_spe(void); extern void giveup_spe(struct task_struct *); extern void load_up_spe(struct task_struct *); -extern void switch_booke_debug_regs(struct thread_struct *new_thread); +extern void switch_booke_debug_regs(struct debug_reg *new_debug); #ifndef CONFIG_SMP extern void discard_lazy_cpu_state(void); diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index 75c2d10..83530af 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -339,7 +339,7 @@ static void set_debug_reg_defaults(struct thread_struct *thread) #endif } -static void prime_debug_regs(struct thread_struct *thread) +static void prime_debug_regs(struct debug_reg *debug) { /* * We could have inherited MSR_DE from userspace, since @@ -348,22 +348,22 @@ static void prime_debug_regs(struct thread_struct *thread) */ mtmsr(mfmsr() & ~MSR_DE); - mtspr(SPRN_IAC1, thread->debug.iac1); - mtspr(SPRN_IAC2, thread->debug.iac2); + mtspr(SPRN_IAC1, debug->iac1); + mtspr(SPRN_IAC2, debug->iac2); #if CONFIG_PPC_ADV_DEBUG_IACS > 2 - mtspr(SPRN_IAC3, thread->debug.iac3); - mtspr(SPRN_IAC4, thread->debug.iac4); + mtspr(SPRN_IAC3, debug->iac3); + mtspr(SPRN_IAC4, debug->iac4); #endif - mtspr(SPRN_DAC1, thread->debug.dac1); - mtspr(SPRN_DAC2, thread->debug.dac2); + mtspr(SPRN_DAC1, debug->dac1); + mtspr(SPRN_DAC2, debug->dac2); #if CONFIG_PPC_ADV_DEBUG_DVCS > 0 - mtspr(SPRN_DVC1, thread->debug.dvc1); - mtspr(SPRN_DVC2, thread->debug.dvc2); + 
mtspr(SPRN_DVC1, debug->dvc1); + mtspr(SPRN_DVC2, debug->dvc2); #endif - mtspr(SPRN_DBCR0, thread->debug.dbcr0); - mtspr(SPRN_DBCR1, thread->debug.dbcr1); + mtspr(SPRN_DBCR0, debug->dbcr0); + mtspr(SPRN_DBCR1, debug->dbcr1); #ifdef CONFIG_BOOKE - mtspr(SPRN_DBCR2, thread->debug.dbcr2); + mtspr(SPRN_DBCR2, debug->dbcr2); #endif } /* @@ -371,11 +371,11 @@ static void prime_debug_regs(struct thread_struct *thread) * debug registers, set the debug registers from the values * stored in the new thread. */ -void switch_booke_debug_regs(struct thread_struct *new_thread) +void switch_booke_debug_regs(struct debug_reg *new_debug) { if ((current->thread.debug.dbcr0 & DBCR0_IDM) - || (new_thread->debug.dbcr0 & DBCR0_IDM)) - prime_debug_regs(new_thread); + || (new_debug->dbcr0 & DBCR0_IDM)) + prime_debug_regs(new_debug); } EXPORT_SYMBOL_GPL(switch_booke_debug_regs); #else /* !CONFIG_PPC_ADV_DEBUG_REGS */ @@ -683,7 +683,7 @@ struct task_struct *__switch_to(struct task_struct *prev, #endif /* CONFIG_SMP */ #ifdef CONFIG_PPC_ADV_DEBUG_REGS - switch_booke_debug_regs(&new->thread); + switch_booke_debug_regs(&new->thread.debug); #else /* * For PPC_BOOK3S_64, we use the hw-breakpoint interfaces that would diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 53e65a2..0591e05 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -681,7 +681,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu) int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) { int ret, s; - struct thread_struct thread; + struct debug_reg debug; #ifdef CONFIG_PPC_FPU struct thread_fp_state fp; int fpexc_mode; @@ -723,9 +723,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) #endif /* Switch to guest debug context */ - thread.debug = vcpu->arch.shadow_dbg_reg; - switch_booke_debug_regs(&thread); - thread.debug = current->thread.debug; + debug = vcpu->arch.shadow_dbg_reg; + switch_booke_debug_regs(&debug); + debug = current->thread.debug; current->thread.debug = vcpu->arch.shadow_dbg_reg; kvmppc_fix_ee_before_entry(); @@ -736,8 +736,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) We also get here with interrupts enabled. */ /* Switch back to user space debug context */ - switch_booke_debug_regs(&thread); - current->thread.debug = thread.debug; + switch_booke_debug_regs(&debug); + current->thread.debug = debug; #ifdef CONFIG_PPC_FPU kvmppc_save_guest_fp(vcpu); -- cgit v0.10.2 From 6f041e99fc7ba00e7e65a431527f9235d6b16463 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Thu, 21 Nov 2013 13:44:14 +0000 Subject: of: Fix NULL dereference in unflatten_and_copy() Check whether initial_boot_params is NULL before dereferencing it in unflatten_and_copy_device_tree() for the case where no device tree is available but the arch can still boot to a minimal usable system without it. In this case also log a warning for when the kernel log buffer is obtainable. 
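The NULL dereference came from computing the size in the variable's initializer, before any check could run; the fix declares first, checks, and only then touches the blob. A simplified stand-alone illustration of that ordering (userspace types, with ntohl() standing in for __be32_to_cpu(); the real header has more fields):

#include <stdio.h>
#include <arpa/inet.h>		/* ntohl(), standing in for __be32_to_cpu() */

struct boot_param_header {
	unsigned int totalsize;	/* big-endian, as in the real header */
};

static void unflatten_and_copy(const struct boot_param_header *blob)
{
	unsigned int size;

	if (!blob) {
		/* no device tree: warn and keep booting */
		fprintf(stderr, "No valid device tree found, continuing without\n");
		return;
	}

	size = ntohl(blob->totalsize);	/* only dereference after the check */
	printf("would copy %u bytes\n", size);
}

int main(void)
{
	unflatten_and_copy(NULL);	/* must not crash */
	return 0;
}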
Signed-off-by: James Hogan Cc: Rob Herring Signed-off-by: Grant Likely diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 2fa024b..758b4f8 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -922,8 +922,16 @@ void __init unflatten_device_tree(void) */ void __init unflatten_and_copy_device_tree(void) { - int size = __be32_to_cpu(initial_boot_params->totalsize); - void *dt = early_init_dt_alloc_memory_arch(size, + int size; + void *dt; + + if (!initial_boot_params) { + pr_warn("No valid device tree found, continuing without\n"); + return; + } + + size = __be32_to_cpu(initial_boot_params->totalsize); + dt = early_init_dt_alloc_memory_arch(size, __alignof__(struct boot_param_header)); if (dt) { -- cgit v0.10.2 From 02ab71cdae248533620abefa1d46097581457110 Mon Sep 17 00:00:00 2001 From: Stefano Stabellini Date: Mon, 9 Dec 2013 15:55:11 +0000 Subject: xen/arm64: do not call the swiotlb functions twice On arm64 the dma_map_ops implementation is based on the swiotlb. swiotlb-xen, used by default in dom0 on Xen, is also based on the swiotlb. Avoid calling into the default arm64 dma_map_ops functions from xen_dma_map_page, xen_dma_unmap_page, xen_dma_sync_single_for_cpu, and xen_dma_sync_single_for_device otherwise we end up calling into the swiotlb twice. When arm64 gets a non-swiotlb based implementation of dma_map_ops, we'll probably have to reintroduce dma_map_ops calls in page-coherent.h. Signed-off-by: Stefano Stabellini CC: catalin.marinas@arm.com CC: Will.Deacon@arm.com CC: Ian.Campbell@citrix.com diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h index 2820f1a..dde3fc9 100644 --- a/arch/arm64/include/asm/xen/page-coherent.h +++ b/arch/arm64/include/asm/xen/page-coherent.h @@ -23,25 +23,21 @@ static inline void xen_dma_map_page(struct device *hwdev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { - __generic_dma_ops(hwdev)->map_page(hwdev, page, offset, size, dir, attrs); } static inline void xen_dma_unmap_page(struct device *hwdev, dma_addr_t handle, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { - __generic_dma_ops(hwdev)->unmap_page(hwdev, handle, size, dir, attrs); } static inline void xen_dma_sync_single_for_cpu(struct device *hwdev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { - __generic_dma_ops(hwdev)->sync_single_for_cpu(hwdev, handle, size, dir); } static inline void xen_dma_sync_single_for_device(struct device *hwdev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { - __generic_dma_ops(hwdev)->sync_single_for_device(hwdev, handle, size, dir); } #endif /* _ASM_ARM64_XEN_PAGE_COHERENT_H */ -- cgit v0.10.2 From a7892f32cc3534d4cc0e64b245fbf47a8e364652 Mon Sep 17 00:00:00 2001 From: Ian Campbell Date: Wed, 11 Dec 2013 17:02:27 +0000 Subject: arm: xen: foreign mapping PTEs are special. These mappings are in fact special and require special handling in privcmd, which already exists. Failure to mark the PTE as special on arm64 causes all sorts of bad PTE fun. e.g. 
e.g.: BUG: Bad page map in process xl pte:e0004077b33f53 pmd:4079575003 page:ffffffbce1a2f328 count:1 mapcount:-1 mapping: (null) index:0x0 page flags: 0x4000000000000014(referenced|dirty) addr:0000007fb5259000 vm_flags:040644fa anon_vma: (null) mapping:ffffffc03a6fda58 index:0 vma->vm_ops->fault: privcmd_fault+0x0/0x38 vma->vm_file->f_op->mmap: privcmd_mmap+0x0/0x2c CPU: 0 PID: 2657 Comm: xl Not tainted 3.12.0+ #102 Call trace: [] dump_backtrace+0x0/0x12c [] show_stack+0x14/0x1c [] dump_stack+0x70/0x90 [] print_bad_pte+0x12c/0x1bc [] unmap_single_vma+0x4cc/0x700 [] unmap_vmas+0x68/0xb4 [] unmap_region+0xcc/0x1d4 [] do_munmap+0x218/0x314 [] vm_munmap+0x44/0x64 [] SyS_munmap+0x24/0x34 Where unmap_single_vma contains inlined -> unmap_page_range -> zap_pud_range -> zap_pmd_range -> zap_pte_range -> print_bad_pte. Or: BUG: Bad page state in process xl pfn:4077b4d page:ffffffbce1a2f8d8 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x4000000000000014(referenced|dirty) Modules linked in: CPU: 0 PID: 2657 Comm: xl Tainted: G B 3.12.0+ #102 Call trace: [] dump_backtrace+0x0/0x12c [] show_stack+0x14/0x1c [] dump_stack+0x70/0x90 [] bad_page+0xc4/0x110 [] free_pages_prepare+0xd0/0xd8 [] free_hot_cold_page+0x28/0x178 [] free_hot_cold_page_list+0x38/0x60 [] release_pages+0x190/0x1dc [] unmap_region+0x15c/0x1d4 [] do_munmap+0x218/0x314 [] vm_munmap+0x44/0x64 [] SyS_munmap+0x24/0x34 x86 already gets this correct. 32-bit arm gets away with this because there is not PTE_SPECIAL bit in the PTE there and the vm_normal_page fallback path does the right thing. Signed-off-by: Ian Campbell Signed-off-by: Stefano Stabellini diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c index 6a288c7..8550123 100644 --- a/arch/arm/xen/enlighten.c +++ b/arch/arm/xen/enlighten.c @@ -96,7 +96,7 @@ static int remap_pte_fn(pte_t *ptep, pgtable_t token, unsigned long addr, struct remap_data *info = data; struct page *page = info->pages[info->index++]; unsigned long pfn = page_to_pfn(page); - pte_t pte = pfn_pte(pfn, info->prot); + pte_t pte = pte_mkspecial(pfn_pte(pfn, info->prot)); if (map_foreign_page(pfn, info->fgmfn, info->domid)) return -EFAULT; -- cgit v0.10.2 From 2306bfb208b9c403592a0513b08f2e6ad53056d5 Mon Sep 17 00:00:00 2001 From: Eric Seppanen Date: Thu, 21 Nov 2013 14:49:56 -0800 Subject: iscsi-target: return -EINVAL on oversized configfs parameter The iSCSI CHAP auth parameters are already copied with respect for the destination buffer size. Return -EINVAL instead of silently truncating the input. 
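The check amounts to refusing any write that would not fit the fixed-size field, before snprintf() gets a chance to truncate it. A stand-alone model of that store path (the field name and AUTH_FIELD_LEN below are illustrative, not the driver's actual layout):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define AUTH_FIELD_LEN 256	/* illustrative size only */

struct auth_fields {
	char userid[AUTH_FIELD_LEN];
};

/* Reject oversized input with -EINVAL instead of silently truncating it. */
static ssize_t store_userid(struct auth_fields *auth, const char *page, size_t count)
{
	if (count >= sizeof(auth->userid))
		return -EINVAL;
	snprintf(auth->userid, sizeof(auth->userid), "%s", page);
	return (ssize_t)count;
}

int main(void)
{
	struct auth_fields auth;
	char big[AUTH_FIELD_LEN + 8];

	memset(big, 'A', sizeof(big) - 1);
	big[sizeof(big) - 1] = '\0';

	printf("short input: %zd\n", store_userid(&auth, "user1", 5));		/* 5 */
	printf("long input:  %zd\n", store_userid(&auth, big, strlen(big)));	/* -22 (-EINVAL) */
	return 0;
}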
Signed-off-by: Eric Seppanen Signed-off-by: Nicholas Bellinger diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c index e3318ed..1c0088f 100644 --- a/drivers/target/iscsi/iscsi_target_configfs.c +++ b/drivers/target/iscsi/iscsi_target_configfs.c @@ -474,7 +474,8 @@ static ssize_t __iscsi_##prefix##_store_##name( \ \ if (!capable(CAP_SYS_ADMIN)) \ return -EPERM; \ - \ + if (count >= sizeof(auth->name)) \ + return -EINVAL; \ snprintf(auth->name, sizeof(auth->name), "%s", page); \ if (!strncmp("NULL", auth->name, 4)) \ auth->naf_flags &= ~flags; \ -- cgit v0.10.2 From a51d5229d10dd3a337b674ce8603437d2996c5c3 Mon Sep 17 00:00:00 2001 From: Roland Dreier Date: Sat, 23 Nov 2013 10:35:58 -0800 Subject: target: Remove write-only stats fields and lock from struct se_node_acl Commit 04f3b31bff72 ("iscsi-target: Convert iscsi_session statistics to atomic_long_t") removed the updating of these fields in iscsi (the only fabric driver that ever touched these counters), and the core has no way to report or otherwise use the values. Remove the last remnants of these counters. Signed-off-by: Roland Dreier Signed-off-by: Nicholas Bellinger diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index f697f8b..f755712 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -278,7 +278,6 @@ struct se_node_acl *core_tpg_check_initiator_node_acl( snprintf(acl->initiatorname, TRANSPORT_IQN_LEN, "%s", initiatorname); acl->se_tpg = tpg; acl->acl_index = scsi_get_new_index(SCSI_AUTH_INTR_INDEX); - spin_lock_init(&acl->stats_lock); acl->dynamic_node_acl = 1; tpg->se_tpg_tfo->set_default_node_attributes(acl); @@ -406,7 +405,6 @@ struct se_node_acl *core_tpg_add_initiator_node_acl( snprintf(acl->initiatorname, TRANSPORT_IQN_LEN, "%s", initiatorname); acl->se_tpg = tpg; acl->acl_index = scsi_get_new_index(SCSI_AUTH_INTR_INDEX); - spin_lock_init(&acl->stats_lock); tpg->se_tpg_tfo->set_default_node_attributes(acl); diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index 45412a6..9f1dda6 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -517,10 +517,6 @@ struct se_node_acl { u32 acl_index; #define MAX_ACL_TAG_SIZE 64 char acl_tag[MAX_ACL_TAG_SIZE]; - u64 num_cmds; - u64 read_bytes; - u64 write_bytes; - spinlock_t stats_lock; /* Used for PR SPEC_I_PT=1 and REGISTER_AND_MOVE */ atomic_t acl_pr_ref_count; struct se_dev_entry **device_list; -- cgit v0.10.2 From 4454b66cb67f14c33cd70ddcf0ff4985b26324b7 Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Mon, 25 Nov 2013 14:53:57 -0800 Subject: iscsi-target: Fix-up all zero data-length CDBs with R/W_BIT set This patch changes special case handling for ISCSI_OP_SCSI_CMD where an initiator sends a zero length Expected Data Transfer Length (EDTL), but still sets the WRITE and/or READ flag bits when no payload transfer is requested. Many, many moons ago two special cases where added for an ancient version of ESX that has long since been fixed, so instead of adding a new special case for the reported bug with a Broadcom 57800 NIC, go ahead and always strip off the incorrect WRITE + READ flag bits. Also, avoid sending a reject here, as RFC-3720 does mandate this case be handled without protocol error. 
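In other words, when EDTL is zero the R/W bits carry no information, so they are cleared rather than rejected. A stand-alone sketch of that fixup (the flag values are copied here only for illustration; the authoritative definitions live in the iSCSI protocol headers):

#include <stdint.h>
#include <stdio.h>

#define ISCSI_FLAG_CMD_READ	0x40	/* illustrative copies of the protocol flags */
#define ISCSI_FLAG_CMD_WRITE	0x20

struct scsi_cmd_hdr {
	uint8_t  flags;
	uint32_t data_length;	/* Expected Data Transfer Length (EDTL) */
};

/*
 * RFC 3720 allows R and/or W to be set with a zero EDTL, so strip the
 * meaningless bits instead of rejecting the PDU, and only then derive
 * the data direction from what remains.
 */
static void fixup_zero_length_flags(struct scsi_cmd_hdr *hdr)
{
	if ((hdr->flags & (ISCSI_FLAG_CMD_READ | ISCSI_FLAG_CMD_WRITE)) &&
	    !hdr->data_length) {
		hdr->flags &= ~(ISCSI_FLAG_CMD_READ | ISCSI_FLAG_CMD_WRITE);
		fprintf(stderr, "R/W flag set with zero EDTL, fixing up flags\n");
	}
}

int main(void)
{
	struct scsi_cmd_hdr hdr = { .flags = ISCSI_FLAG_CMD_WRITE, .data_length = 0 };

	fixup_zero_length_flags(&hdr);
	printf("flags after fixup: 0x%02x\n", hdr.flags);	/* 0x00 */
	return 0;
}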
Reported-by: Witold Bazakbal <865perl@wp.pl> Tested-by: Witold Bazakbal <865perl@wp.pl> Cc: #3.1+ Signed-off-by: Nicholas Bellinger diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index d70e911..02182ab 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -823,24 +823,22 @@ int iscsit_setup_scsi_cmd(struct iscsi_conn *conn, struct iscsi_cmd *cmd, if (((hdr->flags & ISCSI_FLAG_CMD_READ) || (hdr->flags & ISCSI_FLAG_CMD_WRITE)) && !hdr->data_length) { /* - * Vmware ESX v3.0 uses a modified Cisco Initiator (v3.4.2) - * that adds support for RESERVE/RELEASE. There is a bug - * add with this new functionality that sets R/W bits when - * neither CDB carries any READ or WRITE datapayloads. + * From RFC-3720 Section 10.3.1: + * + * "Either or both of R and W MAY be 1 when either the + * Expected Data Transfer Length and/or Bidirectional Read + * Expected Data Transfer Length are 0" + * + * For this case, go ahead and clear the unnecssary bits + * to avoid any confusion with ->data_direction. */ - if ((hdr->cdb[0] == 0x16) || (hdr->cdb[0] == 0x17)) { - hdr->flags &= ~ISCSI_FLAG_CMD_READ; - hdr->flags &= ~ISCSI_FLAG_CMD_WRITE; - goto done; - } + hdr->flags &= ~ISCSI_FLAG_CMD_READ; + hdr->flags &= ~ISCSI_FLAG_CMD_WRITE; - pr_err("ISCSI_FLAG_CMD_READ or ISCSI_FLAG_CMD_WRITE" + pr_warn("ISCSI_FLAG_CMD_READ or ISCSI_FLAG_CMD_WRITE" " set when Expected Data Transfer Length is 0 for" - " CDB: 0x%02x. Bad iSCSI Initiator.\n", hdr->cdb[0]); - return iscsit_add_reject_cmd(cmd, - ISCSI_REASON_BOOKMARK_INVALID, buf); + " CDB: 0x%02x, Fixing up flags\n", hdr->cdb[0]); } -done: if (!(hdr->flags & ISCSI_FLAG_CMD_READ) && !(hdr->flags & ISCSI_FLAG_CMD_WRITE) && (hdr->data_length != 0)) { -- cgit v0.10.2 From 94a7111043d99819cd0a72d9b3174c7054adb2a0 Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Tue, 29 Oct 2013 09:56:34 +0800 Subject: iser-target: fix error return code in isert_create_device_ib_res() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. 
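The pattern is the usual IS_ERR()/PTR_ERR() one: capture the encoded errno before overwriting the pointer, and store NULL so later cleanup never frees an error cookie. A stand-alone model with minimal re-implementations of the kernel helpers (create_cq() is a made-up stand-in for ib_create_cq()):

#include <errno.h>
#include <stdio.h>

/* Minimal stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. */
#define MAX_ERRNO	4095
#define ERR_PTR(e)	((void *)(long)(e))
#define IS_ERR(p)	((unsigned long)(p) >= (unsigned long)-MAX_ERRNO)
#define PTR_ERR(p)	((long)(p))

struct cq { int unused; };

/* Pretend allocation that fails the way ib_create_cq() does. */
static struct cq *create_cq(int fail)
{
	static struct cq real;
	return fail ? (struct cq *)ERR_PTR(-ENOMEM) : &real;
}

static long create_resources(int fail)
{
	struct cq *rx = create_cq(fail);
	long ret = 0;

	if (IS_ERR(rx)) {
		ret = PTR_ERR(rx);	/* propagate the real errno, not 0 */
		rx = NULL;		/* cleanup paths (elided) check for NULL, never an ERR_PTR */
		goto out;
	}
	/* ... further setup would continue here ... */
out:
	return ret;
}

int main(void)
{
	printf("ok path:   %ld\n", create_resources(0));	/* 0 */
	printf("fail path: %ld\n", create_resources(1));	/* -12, i.e. -ENOMEM */
	return 0;
}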
Signed-off-by: Wei Yongjun Cc: #3.10+ Signed-off-by: Nicholas Bellinger diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 6be57c3..78f6e92 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -264,21 +264,29 @@ isert_create_device_ib_res(struct isert_device *device) isert_cq_event_callback, (void *)&cq_desc[i], ISER_MAX_RX_CQ_LEN, i); - if (IS_ERR(device->dev_rx_cq[i])) + if (IS_ERR(device->dev_rx_cq[i])) { + ret = PTR_ERR(device->dev_rx_cq[i]); + device->dev_rx_cq[i] = NULL; goto out_cq; + } device->dev_tx_cq[i] = ib_create_cq(device->ib_device, isert_cq_tx_callback, isert_cq_event_callback, (void *)&cq_desc[i], ISER_MAX_TX_CQ_LEN, i); - if (IS_ERR(device->dev_tx_cq[i])) + if (IS_ERR(device->dev_tx_cq[i])) { + ret = PTR_ERR(device->dev_tx_cq[i]); + device->dev_tx_cq[i] = NULL; goto out_cq; + } - if (ib_req_notify_cq(device->dev_rx_cq[i], IB_CQ_NEXT_COMP)) + ret = ib_req_notify_cq(device->dev_rx_cq[i], IB_CQ_NEXT_COMP); + if (ret) goto out_cq; - if (ib_req_notify_cq(device->dev_tx_cq[i], IB_CQ_NEXT_COMP)) + ret = ib_req_notify_cq(device->dev_tx_cq[i], IB_CQ_NEXT_COMP); + if (ret) goto out_cq; } -- cgit v0.10.2 From 63832aabec12a28a41a221773ab3819d30ba0a67 Mon Sep 17 00:00:00 2001 From: Shivaram Upadhyayula Date: Tue, 10 Dec 2013 16:06:40 +0530 Subject: qla2xxx: Fix schedule_delayed_work() for target timeout calculations This patch fixes two cases in qla_target.c code where the schedule_delayed_work() value was being incorrectly calculated from sess->expires - jiffies. Signed-off-by: Shivaram U Cc: #3.6+ Signed-off-by: Nicholas Bellinger diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 5964800..3bb0a1d 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -471,7 +471,7 @@ static void qlt_schedule_sess_for_deletion(struct qla_tgt_sess *sess, schedule_delayed_work(&tgt->sess_del_work, 0); else schedule_delayed_work(&tgt->sess_del_work, - jiffies - sess->expires); + sess->expires - jiffies); } /* ha->hardware_lock supposed to be held on entry */ @@ -550,13 +550,14 @@ static void qlt_del_sess_work_fn(struct delayed_work *work) struct scsi_qla_host *vha = tgt->vha; struct qla_hw_data *ha = vha->hw; struct qla_tgt_sess *sess; - unsigned long flags; + unsigned long flags, elapsed; spin_lock_irqsave(&ha->hardware_lock, flags); while (!list_empty(&tgt->del_sess_list)) { sess = list_entry(tgt->del_sess_list.next, typeof(*sess), del_list_entry); - if (time_after_eq(jiffies, sess->expires)) { + elapsed = jiffies; + if (time_after_eq(elapsed, sess->expires)) { qlt_undelete_sess(sess); ql_dbg(ql_dbg_tgt_mgt, vha, 0xf004, @@ -566,7 +567,7 @@ static void qlt_del_sess_work_fn(struct delayed_work *work) ha->tgt.tgt_ops->put_sess(sess); } else { schedule_delayed_work(&tgt->sess_del_work, - jiffies - sess->expires); + sess->expires - elapsed); break; } } -- cgit v0.10.2 From 9ae9ab522094406915c27dc729d3ecef7b44af72 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Wed, 4 Dec 2013 09:52:58 +0000 Subject: drm/i915: Prevent double unref following alloc failure during execbuffer Whilst looking up the objects required for an execbuffer, an untimely allocation failure in creating the vma results in the object being unreferenced from two lists. The ownership during the lookup is meant to be moved from the list of objects being looked to the vma, and this double unreference upon error results in a use-after-free. 
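The fix (shown in the diff below) makes the hand-over explicit: an object leaves the temporary lookup list the moment its vma takes over the reference, so the error path only drops references still on the lookup list. A toy refcount model of that idea, not the i915 code itself:

#include <stdio.h>

/* Toy object standing in for a GEM object with a reference count. */
struct obj {
	int refs;
	int on_lookup_list;	/* still owned by the temporary lookup list? */
};

static void obj_unref(struct obj *o)
{
	if (--o->refs == 0)
		printf("object freed\n");
}

/*
 * Each reference is owned by exactly one list at a time.  On error only
 * the references still held by the lookup list are dropped, so nothing
 * is unreferenced twice; references already transferred are released
 * later by the vma teardown (not modelled here).
 */
static int build_vmas(struct obj *objs, int n, int fail_at)
{
	int i;

	for (i = 0; i < n; i++) {
		if (i == fail_at)
			goto err;
		objs[i].on_lookup_list = 0;	/* the vma list owns it now */
	}
	return 0;

err:
	for (i = 0; i < n; i++) {
		if (objs[i].on_lookup_list) {
			objs[i].on_lookup_list = 0;
			obj_unref(&objs[i]);
		}
	}
	return -1;
}

int main(void)
{
	struct obj objs[3] = { { 1, 1 }, { 1, 1 }, { 1, 1 } };

	build_vmas(objs, 3, 1);		/* fail while creating the second vma */
	printf("refs: %d %d %d\n", objs[0].refs, objs[1].refs, objs[2].refs);	/* 1 0 0 */
	return 0;
}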
Fixes regression from commit 27173f1f95db5e74ceb35fe9a2f2f348ea11bac9 Author: Ben Widawsky Date: Wed Aug 14 11:38:36 2013 +0200 drm/i915: Convert execbuf code to use vmas Based on the fix by Ben Widawsky. Signed-off-by: Chris Wilson Cc: Ben Widawsky Cc: stable@vger.kernel.org [danvet: Bikeshed the crucial comment above the ownership transfer as discussed on irc.] Signed-off-by: Daniel Vetter diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index b7e787f..a3ba9a8 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -93,7 +93,7 @@ eb_lookup_vmas(struct eb_vmas *eb, { struct drm_i915_gem_object *obj; struct list_head objects; - int i, ret = 0; + int i, ret; INIT_LIST_HEAD(&objects); spin_lock(&file->table_lock); @@ -106,7 +106,7 @@ eb_lookup_vmas(struct eb_vmas *eb, DRM_DEBUG("Invalid object handle %d at index %d\n", exec[i].handle, i); ret = -ENOENT; - goto out; + goto err; } if (!list_empty(&obj->obj_exec_link)) { @@ -114,7 +114,7 @@ eb_lookup_vmas(struct eb_vmas *eb, DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n", obj, exec[i].handle, i); ret = -EINVAL; - goto out; + goto err; } drm_gem_object_reference(&obj->base); @@ -123,9 +123,13 @@ eb_lookup_vmas(struct eb_vmas *eb, spin_unlock(&file->table_lock); i = 0; - list_for_each_entry(obj, &objects, obj_exec_link) { + while (!list_empty(&objects)) { struct i915_vma *vma; + obj = list_first_entry(&objects, + struct drm_i915_gem_object, + obj_exec_link); + /* * NOTE: We can leak any vmas created here when something fails * later on. But that's no issue since vma_unbind can deal with @@ -138,10 +142,12 @@ eb_lookup_vmas(struct eb_vmas *eb, if (IS_ERR(vma)) { DRM_DEBUG("Failed to lookup VMA\n"); ret = PTR_ERR(vma); - goto out; + goto err; } + /* Transfer ownership from the objects list to the vmas list. */ list_add_tail(&vma->exec_list, &eb->vmas); + list_del_init(&obj->obj_exec_link); vma->exec_entry = &exec[i]; if (eb->and < 0) { @@ -155,16 +161,22 @@ eb_lookup_vmas(struct eb_vmas *eb, ++i; } + return 0; + -out: +err: while (!list_empty(&objects)) { obj = list_first_entry(&objects, struct drm_i915_gem_object, obj_exec_link); list_del_init(&obj->obj_exec_link); - if (ret) - drm_gem_object_unreference(&obj->base); + drm_gem_object_unreference(&obj->base); } + /* + * Objects already transfered to the vmas list will be unreferenced by + * eb_destroy. + */ + return ret; } -- cgit v0.10.2 From 4db080f9e93411c3c41ec402244da28e2bbde835 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Wed, 4 Dec 2013 11:37:09 +0000 Subject: drm/i915: Fix erroneous dereference of batch_obj inside reset_status As the rings may be processed and their requests deallocated in a different order to the natural retirement during a reset, /* Whilst this request exists, batch_obj will be on the * active_list, and so will hold the active reference. Only when this * request is retired will the the batch_obj be moved onto the * inactive_list and lose its active reference. Hence we do not need * to explicitly hold another reference here. */ is violated, and the batch_obj may be dereferenced after it had been freed on another ring. This can be simply avoided by processing the status update prior to deallocating any requests. 
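Schematically the fix splits one walk into two: a read-only pass that records the reset status while every request and its batch object are still alive, followed by the pass that frees them. A trivial sketch of that ordering (the structures are placeholders, not the driver's):

#include <stdio.h>

struct request { int seqno; int guilty; };

/* Pass 1: inspect only; nothing is freed, so batch objects are still valid. */
static void reset_ring_status(struct request *reqs, int n, int completed_seqno)
{
	for (int i = 0; i < n; i++)
		if (reqs[i].seqno > completed_seqno)
			reqs[i].guilty = 1;
}

/* Pass 2: tear everything down, after all rings have been inspected. */
static void reset_ring_cleanup(struct request *reqs, int n)
{
	for (int i = 0; i < n; i++)
		printf("freeing request %d\n", reqs[i].seqno);
}

int main(void)
{
	struct request reqs[2] = { { 10, 0 }, { 20, 0 } };

	reset_ring_status(reqs, 2, 15);		/* request 20 had not completed */
	reset_ring_cleanup(reqs, 2);
	printf("guilty flags: %d %d\n", reqs[0].guilty, reqs[1].guilty);	/* 0 1 */
	return 0;
}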
Fixes regression (a possible OOPS following a GPU hang) from commit aa60c664e6df502578454621c3a9b1f087ff8d25 Author: Mika Kuoppala Date: Wed Jun 12 15:13:20 2013 +0300 drm/i915: find guilty batch buffer on ring resets Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: stable@vger.kernel.org Reviewed-by: Mika Kuoppala [danvet: Add the code comment Chris supplied.] Signed-off-by: Daniel Vetter diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 621c7c6..76d3d1a 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2343,15 +2343,24 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request) kfree(request); } -static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv, - struct intel_ring_buffer *ring) +static void i915_gem_reset_ring_status(struct drm_i915_private *dev_priv, + struct intel_ring_buffer *ring) { - u32 completed_seqno; - u32 acthd; + u32 completed_seqno = ring->get_seqno(ring, false); + u32 acthd = intel_ring_get_active_head(ring); + struct drm_i915_gem_request *request; + + list_for_each_entry(request, &ring->request_list, list) { + if (i915_seqno_passed(completed_seqno, request->seqno)) + continue; - acthd = intel_ring_get_active_head(ring); - completed_seqno = ring->get_seqno(ring, false); + i915_set_reset_status(ring, request, acthd); + } +} +static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, + struct intel_ring_buffer *ring) +{ while (!list_empty(&ring->request_list)) { struct drm_i915_gem_request *request; @@ -2359,9 +2368,6 @@ static void i915_gem_reset_ring_lists(struct drm_i915_private *dev_priv, struct drm_i915_gem_request, list); - if (request->seqno > completed_seqno) - i915_set_reset_status(ring, request, acthd); - i915_gem_free_request(request); } @@ -2403,8 +2409,16 @@ void i915_gem_reset(struct drm_device *dev) struct intel_ring_buffer *ring; int i; + /* + * Before we free the objects from the requests, we need to inspect + * them for finding the guilty party. As the requests only borrow + * their reference to the objects, the inspection must be done first. + */ + for_each_ring(ring, dev_priv, i) + i915_gem_reset_ring_status(dev_priv, ring); + for_each_ring(ring, dev_priv, i) - i915_gem_reset_ring_lists(dev_priv, ring); + i915_gem_reset_ring_cleanup(dev_priv, ring); i915_gem_cleanup_ringbuffer(dev); -- cgit v0.10.2 From 693e0cb052c607e2d41edf9e9f1fa99ff8c266c1 Mon Sep 17 00:00:00 2001 From: David Henningsson Date: Thu, 12 Dec 2013 09:52:03 +0100 Subject: ALSA: hda - Add enable_msi=0 workaround for four HP machines While enabling these machines, we found we would sometimes lose an interrupt if we change hardware volume during playback, and that disabling msi fixed this issue. (Losing the interrupt caused underruns and crackling audio, as the one second timeout is usually bigger than the period size.) The machines were all machines from HP, running AMD Hudson controller, and Realtek ALC282 codec. 
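The workaround itself is a table lookup keyed on the PCI subsystem IDs; the diff below adds four HP entries to the existing MSI blacklist. A stand-alone sketch of that kind of quirk matching (struct and function names here are simplified, only the IDs come from the patch):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified model of an snd_pci_quirk-style entry matched on subsystem IDs. */
struct msi_quirk {
	uint16_t subvendor, subdevice;
	int value;			/* 0 means: force MSI off on this machine */
};

static const struct msi_quirk msi_black_list[] = {
	{ 0x103c, 0x2191, 0 },		/* HP, AMD Hudson */
	{ 0x103c, 0x2192, 0 },		/* HP, AMD Hudson */
	{ 0x103c, 0x21f7, 0 },		/* HP, AMD Hudson */
	{ 0x103c, 0x21fa, 0 },		/* HP, AMD Hudson */
};

static int msi_enabled(uint16_t subvendor, uint16_t subdevice, int dflt)
{
	for (size_t i = 0; i < sizeof(msi_black_list) / sizeof(msi_black_list[0]); i++)
		if (msi_black_list[i].subvendor == subvendor &&
		    msi_black_list[i].subdevice == subdevice)
			return msi_black_list[i].value;
	return dflt;			/* no quirk: keep the default behaviour */
}

int main(void)
{
	printf("0x103c:0x2191 -> MSI %d\n", msi_enabled(0x103c, 0x2191, 1));	/* 0 */
	printf("0x8086:0x1234 -> MSI %d\n", msi_enabled(0x8086, 0x1234, 1));	/* 1 */
	return 0;
}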
Cc: stable@vger.kernel.org BugLink: https://bugs.launchpad.net/bugs/1260225 Signed-off-by: David Henningsson Signed-off-by: Takashi Iwai diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c index 27aa140..956871d 100644 --- a/sound/pci/hda/hda_intel.c +++ b/sound/pci/hda/hda_intel.c @@ -3433,6 +3433,10 @@ static void check_probe_mask(struct azx *chip, int dev) * white/black-list for enable_msi */ static struct snd_pci_quirk msi_black_list[] = { + SND_PCI_QUIRK(0x103c, 0x2191, "HP", 0), /* AMD Hudson */ + SND_PCI_QUIRK(0x103c, 0x2192, "HP", 0), /* AMD Hudson */ + SND_PCI_QUIRK(0x103c, 0x21f7, "HP", 0), /* AMD Hudson */ + SND_PCI_QUIRK(0x103c, 0x21fa, "HP", 0), /* AMD Hudson */ SND_PCI_QUIRK(0x1043, 0x81f2, "ASUS", 0), /* Athlon64 X2 + nvidia */ SND_PCI_QUIRK(0x1043, 0x81f6, "ASUS", 0), /* nvidia */ SND_PCI_QUIRK(0x1043, 0x822d, "ASUS", 0), /* Athlon64 X2 + nvidia MCP55 */ -- cgit v0.10.2 From f984841bc0c4d596c4e95d1798f318291c4f78f5 Mon Sep 17 00:00:00 2001 From: Jason Cooper Date: Mon, 9 Dec 2013 11:15:56 -0800 Subject: dma: mv_xor: remove mv_desc_get_dest_addr() The following commit: 54f8d501e842 dmaengine: remove DMA unmap from drivers removed the last caller to mv_desc_get_dest_addr(), creating the warning: drivers/dma/mv_xor.c:57:12: warning: mv_desc_get_dest_addr defined but not used [-Wunused-function] Remove it. Signed-off-by: Jason Cooper Acked-by: Vinod Koul Signed-off-by: Dan Williams diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c index 7807f0e..23bcc91 100644 --- a/drivers/dma/mv_xor.c +++ b/drivers/dma/mv_xor.c @@ -54,12 +54,6 @@ static void mv_desc_init(struct mv_xor_desc_slot *desc, unsigned long flags) hw_desc->desc_command = (1 << 31); } -static u32 mv_desc_get_dest_addr(struct mv_xor_desc_slot *desc) -{ - struct mv_xor_desc *hw_desc = desc->hw_desc; - return hw_desc->phy_dest_addr; -} - static void mv_desc_set_byte_count(struct mv_xor_desc_slot *desc, u32 byte_count) { -- cgit v0.10.2 From d7fb0300fe663a5e338e3adeb7f811a2f9727e6c Mon Sep 17 00:00:00 2001 From: Olof Johansson Date: Mon, 9 Dec 2013 11:15:57 -0800 Subject: dmaengine: at_hdmac: remove unused function commit 54f8d501e8428 ('dmaengine: remove DMA unmap from drivers') refactored some code which resulted in an unused function in the at_hdmac driver: drivers/dma/at_hdmac_regs.h:350:23: warning: 'chan2parent' defined but not used [-Wunused-function] Fixes: 54f8d501e8428 ('dmaengine: remove DMA unmap from drivers') Signed-off-by: Olof Johansson Cc: Bartlomiej Zolnierkiewicz Acked-by: Nicolas Ferre Acked-by: Bartlomiej Zolnierkiewicz Acked-by: Vinod Koul Signed-off-by: Dan Williams diff --git a/drivers/dma/at_hdmac_regs.h b/drivers/dma/at_hdmac_regs.h index f31d647..2787aba 100644 --- a/drivers/dma/at_hdmac_regs.h +++ b/drivers/dma/at_hdmac_regs.h @@ -347,10 +347,6 @@ static struct device *chan2dev(struct dma_chan *chan) { return &chan->dev->device; } -static struct device *chan2parent(struct dma_chan *chan) -{ - return chan->dev->device.parent; -} #if defined(VERBOSE_DEBUG) static void vdbg_dump_regs(struct at_dma_chan *atchan) -- cgit v0.10.2 From 6aa2731ce2c7bd1305b553b5fc14ae4856d36569 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 9 Dec 2013 11:15:59 -0800 Subject: dma: fix build warnings in ppc4xx drivers/dma/ppc4xx/adma.c:1507:6: warning: unused variable 'i' [-Wunused-variable] - due to unmap reworks drivers/dma/ppc4xx/adma.c:3900:2: warning: format '%s' expects a matching 'char *' argument [-Wformat] - due to memset removal drivers/dma/ppc4xx/adma.c:538:13: warning: 
'ppc440spe_desc_init_memset' defined but not used [-Wunused-function] - due to memset removal Cc: Bartlomiej Zolnierkiewicz Signed-off-by: Dan Williams diff --git a/drivers/dma/ppc4xx/adma.c b/drivers/dma/ppc4xx/adma.c index 8da48c6..8bba298 100644 --- a/drivers/dma/ppc4xx/adma.c +++ b/drivers/dma/ppc4xx/adma.c @@ -533,29 +533,6 @@ static void ppc440spe_desc_init_memcpy(struct ppc440spe_adma_desc_slot *desc, } /** - * ppc440spe_desc_init_memset - initialize the descriptor for MEMSET operation - */ -static void ppc440spe_desc_init_memset(struct ppc440spe_adma_desc_slot *desc, - int value, unsigned long flags) -{ - struct dma_cdb *hw_desc = desc->hw_desc; - - memset(desc->hw_desc, 0, sizeof(struct dma_cdb)); - desc->hw_next = NULL; - desc->src_cnt = 1; - desc->dst_cnt = 1; - - if (flags & DMA_PREP_INTERRUPT) - set_bit(PPC440SPE_DESC_INT, &desc->flags); - else - clear_bit(PPC440SPE_DESC_INT, &desc->flags); - - hw_desc->sg1u = hw_desc->sg1l = cpu_to_le32((u32)value); - hw_desc->sg3u = hw_desc->sg3l = cpu_to_le32((u32)value); - hw_desc->opc = DMA_CDB_OPC_DFILL128; -} - -/** * ppc440spe_desc_set_src_addr - set source address into the descriptor */ static void ppc440spe_desc_set_src_addr(struct ppc440spe_adma_desc_slot *desc, @@ -1504,8 +1481,6 @@ static dma_cookie_t ppc440spe_adma_run_tx_complete_actions( struct ppc440spe_adma_chan *chan, dma_cookie_t cookie) { - int i; - BUG_ON(desc->async_tx.cookie < 0); if (desc->async_tx.cookie > 0) { cookie = desc->async_tx.cookie; @@ -3898,7 +3873,7 @@ static void ppc440spe_adma_init_capabilities(struct ppc440spe_adma_device *adev) ppc440spe_adma_prep_dma_interrupt; } pr_info("%s: AMCC(R) PPC440SP(E) ADMA Engine: " - "( %s%s%s%s%s%s%s)\n", + "( %s%s%s%s%s%s)\n", dev_name(adev->dev), dma_has_cap(DMA_PQ, adev->common.cap_mask) ? "pq " : "", dma_has_cap(DMA_PQ_VAL, adev->common.cap_mask) ? "pq_val " : "", -- cgit v0.10.2 From bbc76560d488c437dbddff72242b0a07e42a0fd0 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 9 Dec 2013 11:16:00 -0800 Subject: dma: fix fsldma build warnings drivers/dma/fsldma.c: In function 'fsldma_cleanup_descriptor': drivers/dma/fsldma.c:860:6: warning: unused variable 'len' [-Wunused-variable] drivers/dma/fsldma.c:859:13: warning: unused variable 'dst' [-Wunused-variable] drivers/dma/fsldma.c:858:13: warning: unused variable 'src' [-Wunused-variable] drivers/dma/fsldma.c:857:17: warning: unused variable 'dev' [-Wunused-variable] - due to unmap changes drivers/dma/fsldma.c: In function 'fsl_dma_tx_submit': drivers/dma/fsldma.c:428:2: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized] - long standing warning Cc: Bartlomiej Zolnierkiewicz Cc: Li Yang Cc: Zhang Wei Signed-off-by: Dan Williams diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 7086a16..f157c6f 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -86,11 +86,6 @@ static void set_desc_cnt(struct fsldma_chan *chan, hw->count = CPU_TO_DMA(chan, count, 32); } -static u32 get_desc_cnt(struct fsldma_chan *chan, struct fsl_desc_sw *desc) -{ - return DMA_TO_CPU(chan, desc->hw.count, 32); -} - static void set_desc_src(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, dma_addr_t src) { @@ -101,16 +96,6 @@ static void set_desc_src(struct fsldma_chan *chan, hw->src_addr = CPU_TO_DMA(chan, snoop_bits | src, 64); } -static dma_addr_t get_desc_src(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) -{ - u64 snoop_bits; - - snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) - ? 
((u64)FSL_DMA_SATR_SREADTYPE_SNOOP_READ << 32) : 0; - return DMA_TO_CPU(chan, desc->hw.src_addr, 64) & ~snoop_bits; -} - static void set_desc_dst(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, dma_addr_t dst) { @@ -121,16 +106,6 @@ static void set_desc_dst(struct fsldma_chan *chan, hw->dst_addr = CPU_TO_DMA(chan, snoop_bits | dst, 64); } -static dma_addr_t get_desc_dst(struct fsldma_chan *chan, - struct fsl_desc_sw *desc) -{ - u64 snoop_bits; - - snoop_bits = ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) - ? ((u64)FSL_DMA_DATR_DWRITETYPE_SNOOP_WRITE << 32) : 0; - return DMA_TO_CPU(chan, desc->hw.dst_addr, 64) & ~snoop_bits; -} - static void set_desc_next(struct fsldma_chan *chan, struct fsl_dma_ld_hw *hw, dma_addr_t next) { @@ -408,7 +383,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx) struct fsl_desc_sw *desc = tx_to_fsl_desc(tx); struct fsl_desc_sw *child; unsigned long flags; - dma_cookie_t cookie; + dma_cookie_t cookie = -EINVAL; spin_lock_irqsave(&chan->desc_lock, flags); @@ -854,10 +829,6 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan, struct fsl_desc_sw *desc) { struct dma_async_tx_descriptor *txd = &desc->async_tx; - struct device *dev = chan->common.device->dev; - dma_addr_t src = get_desc_src(chan, desc); - dma_addr_t dst = get_desc_dst(chan, desc); - u32 len = get_desc_cnt(chan, desc); /* Run the link descriptor callback function */ if (txd->callback) { -- cgit v0.10.2 From 745c00daf9a75bacb53d0fe8635a132673ab0b46 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 9 Dec 2013 11:16:01 -0800 Subject: dmatest: fix build warning on mips drivers/dma/dmatest.c:543:11: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast [enabled by default] mips expects virt_to_phys() to take a pointer. Fix up the types accordingly. 
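The type fix boils down to keeping the buffer as a pointer, which is what virt_to_page()/virt_to_phys() expect, and casting to an integer only for the page-offset arithmetic. A user-space illustration of that split (PAGE_SIZE hard-coded to 4 KiB for the example):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE	4096UL		/* hard-coded for the example */
#define PAGE_MASK	(~(PAGE_SIZE - 1))

int main(void)
{
	static char buffer[2 * PAGE_SIZE];
	void *buf = buffer + 100;	/* stays a pointer, as virt_to_page() wants */

	uintptr_t page_base = (uintptr_t)buf & PAGE_MASK;	/* page containing the byte */
	unsigned long pg_off = (uintptr_t)buf & ~PAGE_MASK;	/* offset within that page */

	printf("page base 0x%lx, offset %lu\n",
	       (unsigned long)page_base, pg_off);
	return 0;
}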
Signed-off-by: Dan Williams diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c index 20f9a3a..9dfcaf5 100644 --- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -539,9 +539,9 @@ static int dmatest_func(void *data) um->len = params->buf_size; for (i = 0; i < src_cnt; i++) { - unsigned long buf = (unsigned long) thread->srcs[i]; + void *buf = thread->srcs[i]; struct page *pg = virt_to_page(buf); - unsigned pg_off = buf & ~PAGE_MASK; + unsigned pg_off = (unsigned long) buf & ~PAGE_MASK; um->addr[i] = dma_map_page(dev->dev, pg, pg_off, um->len, DMA_TO_DEVICE); @@ -559,9 +559,9 @@ static int dmatest_func(void *data) /* map with DMA_BIDIRECTIONAL to force writeback/invalidate */ dsts = &um->addr[src_cnt]; for (i = 0; i < dst_cnt; i++) { - unsigned long buf = (unsigned long) thread->dsts[i]; + void *buf = thread->dsts[i]; struct page *pg = virt_to_page(buf); - unsigned pg_off = buf & ~PAGE_MASK; + unsigned pg_off = (unsigned long) buf & ~PAGE_MASK; dsts[i] = dma_map_page(dev->dev, pg, pg_off, um->len, DMA_BIDIRECTIONAL); -- cgit v0.10.2 From 8e5ee258d98a6643227d958361aec2a62559b804 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 9 Dec 2013 11:16:03 -0800 Subject: dma: fix build warnings in txx9 The unmap rework missed this: drivers/dma/txx9dmac.c:409:25: warning: unused variable 'ds' [-Wunused-variable] Cc: Bartlomiej Zolnierkiewicz Signed-off-by: Dan Williams diff --git a/drivers/dma/txx9dmac.c b/drivers/dma/txx9dmac.c index bae6c29..17686ca 100644 --- a/drivers/dma/txx9dmac.c +++ b/drivers/dma/txx9dmac.c @@ -406,7 +406,6 @@ txx9dmac_descriptor_complete(struct txx9dmac_chan *dc, dma_async_tx_callback callback; void *param; struct dma_async_tx_descriptor *txd = &desc->txd; - struct txx9dmac_slave *ds = dc->chan.private; dev_vdbg(chan2dev(&dc->chan), "descriptor %u %p complete\n", txd->cookie, desc); -- cgit v0.10.2 From 3cc377b9ae4bd3133bf8ba388d2b2b66b2b973c1 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 9 Dec 2013 10:33:16 -0800 Subject: dmaengine: fix enable for high order unmap pools The higher order mempools support raid operations, and we want to disable them when raid support is not enabled. Making them conditional on ASYNC_TX_DMA is not sufficient as other users (specifically dmatest) will also issue raid operations. Make raid drivers explicitly request that the core carry the higher order pools. Reported-by: Ezequiel Garcia Tested-by: Ezequiel Garcia Signed-off-by: Dan Williams diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 446687c..132a4fd 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -62,6 +62,7 @@ config INTEL_IOATDMA tristate "Intel I/OAT DMA support" depends on PCI && X86 select DMA_ENGINE + select DMA_ENGINE_RAID select DCA help Enable support for the Intel(R) I/OAT DMA engine present @@ -112,6 +113,7 @@ config MV_XOR bool "Marvell XOR engine support" depends on PLAT_ORION select DMA_ENGINE + select DMA_ENGINE_RAID select ASYNC_TX_ENABLE_CHANNEL_SWITCH ---help--- Enable support for the Marvell XOR engine. @@ -187,6 +189,7 @@ config AMCC_PPC440SPE_ADMA tristate "AMCC PPC440SPe ADMA support" depends on 440SPe || 440SP select DMA_ENGINE + select DMA_ENGINE_RAID select ARCH_HAS_ASYNC_TX_FIND_CHANNEL select ASYNC_TX_ENABLE_CHANNEL_SWITCH help @@ -377,4 +380,7 @@ config DMATEST Simple DMA test client. Say N unless you're debugging a DMA Device driver. 
+config DMA_ENGINE_RAID + bool + endif diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index ea806bd..b601024 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -912,7 +912,7 @@ struct dmaengine_unmap_pool { #define __UNMAP_POOL(x) { .size = x, .name = "dmaengine-unmap-" __stringify(x) } static struct dmaengine_unmap_pool unmap_pool[] = { __UNMAP_POOL(2), - #if IS_ENABLED(CONFIG_ASYNC_TX_DMA) + #if IS_ENABLED(CONFIG_DMA_ENGINE_RAID) __UNMAP_POOL(16), __UNMAP_POOL(128), __UNMAP_POOL(256), -- cgit v0.10.2 From d16695a75019ac4baad7a117dc86d1d292e09115 Mon Sep 17 00:00:00 2001 From: Ezequiel Garcia Date: Tue, 10 Dec 2013 09:32:36 -0300 Subject: dma: mv_xor: Use dmaengine_unmap_data for the self-tests The driver-specific unmap code was removed in: commit 54f8d501e842879143e867e70996574a54d1e130 Author: Bartlomiej Zolnierkiewicz Date: Fri Oct 18 19:35:32 2013 +0200 dmaengine: remove DMA unmap from drivers which had the side-effect of not unmapping the self-test mappings. Fix this by using dmaengine_unmap_data in the self-test routines. In addition, since dmaengine_unmap() assumes that all mappings were created with dma_map_page, this commit changes the single mapping to a page mapping to avoid an incorrect unmapping of the memcpy self-test. The allocation could be changed to be alloc_page(), but sticking to kmalloc results in a less intrusive patch. The size of the test buffer is increased, since dma_map_page() seem to fail when the source and destination pages are the same page. Signed-off-by: Ezequiel Garcia Signed-off-by: Dan Williams diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c index 23bcc91..7962088 100644 --- a/drivers/dma/mv_xor.c +++ b/drivers/dma/mv_xor.c @@ -781,7 +781,6 @@ static void mv_xor_issue_pending(struct dma_chan *chan) /* * Perform a transaction to verify the HW works. 
*/ -#define MV_XOR_TEST_SIZE 2000 static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan) { @@ -791,20 +790,21 @@ static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan) struct dma_chan *dma_chan; dma_cookie_t cookie; struct dma_async_tx_descriptor *tx; + struct dmaengine_unmap_data *unmap; int err = 0; - src = kmalloc(sizeof(u8) * MV_XOR_TEST_SIZE, GFP_KERNEL); + src = kmalloc(sizeof(u8) * PAGE_SIZE, GFP_KERNEL); if (!src) return -ENOMEM; - dest = kzalloc(sizeof(u8) * MV_XOR_TEST_SIZE, GFP_KERNEL); + dest = kzalloc(sizeof(u8) * PAGE_SIZE, GFP_KERNEL); if (!dest) { kfree(src); return -ENOMEM; } /* Fill in src buffer */ - for (i = 0; i < MV_XOR_TEST_SIZE; i++) + for (i = 0; i < PAGE_SIZE; i++) ((u8 *) src)[i] = (u8)i; dma_chan = &mv_chan->dmachan; @@ -813,14 +813,26 @@ static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan) goto out; } - dest_dma = dma_map_single(dma_chan->device->dev, dest, - MV_XOR_TEST_SIZE, DMA_FROM_DEVICE); + unmap = dmaengine_get_unmap_data(dma_chan->device->dev, 2, GFP_KERNEL); + if (!unmap) { + err = -ENOMEM; + goto free_resources; + } + + src_dma = dma_map_page(dma_chan->device->dev, virt_to_page(src), 0, + PAGE_SIZE, DMA_TO_DEVICE); + unmap->to_cnt = 1; + unmap->addr[0] = src_dma; - src_dma = dma_map_single(dma_chan->device->dev, src, - MV_XOR_TEST_SIZE, DMA_TO_DEVICE); + dest_dma = dma_map_page(dma_chan->device->dev, virt_to_page(dest), 0, + PAGE_SIZE, DMA_FROM_DEVICE); + unmap->from_cnt = 1; + unmap->addr[1] = dest_dma; + + unmap->len = PAGE_SIZE; tx = mv_xor_prep_dma_memcpy(dma_chan, dest_dma, src_dma, - MV_XOR_TEST_SIZE, 0); + PAGE_SIZE, 0); cookie = mv_xor_tx_submit(tx); mv_xor_issue_pending(dma_chan); async_tx_ack(tx); @@ -835,8 +847,8 @@ static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan) } dma_sync_single_for_cpu(dma_chan->device->dev, dest_dma, - MV_XOR_TEST_SIZE, DMA_FROM_DEVICE); - if (memcmp(src, dest, MV_XOR_TEST_SIZE)) { + PAGE_SIZE, DMA_FROM_DEVICE); + if (memcmp(src, dest, PAGE_SIZE)) { dev_err(dma_chan->device->dev, "Self-test copy failed compare, disabling\n"); err = -ENODEV; @@ -844,6 +856,7 @@ static int mv_xor_memcpy_self_test(struct mv_xor_chan *mv_chan) } free_resources: + dmaengine_unmap_put(unmap); mv_xor_free_chan_resources(dma_chan); out: kfree(src); @@ -861,13 +874,15 @@ mv_xor_xor_self_test(struct mv_xor_chan *mv_chan) dma_addr_t dma_srcs[MV_XOR_NUM_SRC_TEST]; dma_addr_t dest_dma; struct dma_async_tx_descriptor *tx; + struct dmaengine_unmap_data *unmap; struct dma_chan *dma_chan; dma_cookie_t cookie; u8 cmp_byte = 0; u32 cmp_word; int err = 0; + int src_count = MV_XOR_NUM_SRC_TEST; - for (src_idx = 0; src_idx < MV_XOR_NUM_SRC_TEST; src_idx++) { + for (src_idx = 0; src_idx < src_count; src_idx++) { xor_srcs[src_idx] = alloc_page(GFP_KERNEL); if (!xor_srcs[src_idx]) { while (src_idx--) @@ -884,13 +899,13 @@ mv_xor_xor_self_test(struct mv_xor_chan *mv_chan) } /* Fill in src buffers */ - for (src_idx = 0; src_idx < MV_XOR_NUM_SRC_TEST; src_idx++) { + for (src_idx = 0; src_idx < src_count; src_idx++) { u8 *ptr = page_address(xor_srcs[src_idx]); for (i = 0; i < PAGE_SIZE; i++) ptr[i] = (1 << src_idx); } - for (src_idx = 0; src_idx < MV_XOR_NUM_SRC_TEST; src_idx++) + for (src_idx = 0; src_idx < src_count; src_idx++) cmp_byte ^= (u8) (1 << src_idx); cmp_word = (cmp_byte << 24) | (cmp_byte << 16) | @@ -904,16 +919,29 @@ mv_xor_xor_self_test(struct mv_xor_chan *mv_chan) goto out; } + unmap = dmaengine_get_unmap_data(dma_chan->device->dev, src_count + 1, + GFP_KERNEL); + if (!unmap) { + err = -ENOMEM; + goto 
free_resources; + } + /* test xor */ - dest_dma = dma_map_page(dma_chan->device->dev, dest, 0, PAGE_SIZE, - DMA_FROM_DEVICE); + for (i = 0; i < src_count; i++) { + unmap->addr[i] = dma_map_page(dma_chan->device->dev, xor_srcs[i], + 0, PAGE_SIZE, DMA_TO_DEVICE); + dma_srcs[i] = unmap->addr[i]; + unmap->to_cnt++; + } - for (i = 0; i < MV_XOR_NUM_SRC_TEST; i++) - dma_srcs[i] = dma_map_page(dma_chan->device->dev, xor_srcs[i], - 0, PAGE_SIZE, DMA_TO_DEVICE); + unmap->addr[src_count] = dma_map_page(dma_chan->device->dev, dest, 0, PAGE_SIZE, + DMA_FROM_DEVICE); + dest_dma = unmap->addr[src_count]; + unmap->from_cnt = 1; + unmap->len = PAGE_SIZE; tx = mv_xor_prep_dma_xor(dma_chan, dest_dma, dma_srcs, - MV_XOR_NUM_SRC_TEST, PAGE_SIZE, 0); + src_count, PAGE_SIZE, 0); cookie = mv_xor_tx_submit(tx); mv_xor_issue_pending(dma_chan); @@ -942,9 +970,10 @@ mv_xor_xor_self_test(struct mv_xor_chan *mv_chan) } free_resources: + dmaengine_unmap_put(unmap); mv_xor_free_chan_resources(dma_chan); out: - src_idx = MV_XOR_NUM_SRC_TEST; + src_idx = src_count; while (src_idx--) __free_page(xor_srcs[src_idx]); __free_page(dest); -- cgit v0.10.2 From 0be8253fa2b4385e6246387db1d6067366e987ba Mon Sep 17 00:00:00 2001 From: Russell King Date: Thu, 12 Dec 2013 23:59:08 +0000 Subject: dmaengine: mv_xor: fix oops when channels fail to initialise When a channel fails to initialise, we error out and clean up any previously unregistered channels by walking the entire xordev->channels array. Unfortunately, there are paths which end up storing an error pointer in this array, which we then try and dereference in the cleanup code, which causes an oops. Fix this by avoiding writing invalid pointers to this array in the first place. Tested-by: Aaro Koskinen Signed-off-by: Russell King Signed-off-by: Dan Williams diff --git a/drivers/dma/mv_xor.c b/drivers/dma/mv_xor.c index 7962088..53fb0c8 100644 --- a/drivers/dma/mv_xor.c +++ b/drivers/dma/mv_xor.c @@ -1199,6 +1199,7 @@ static int mv_xor_probe(struct platform_device *pdev) int i = 0; for_each_child_of_node(pdev->dev.of_node, np) { + struct mv_xor_chan *chan; dma_cap_mask_t cap_mask; int irq; @@ -1216,21 +1217,21 @@ static int mv_xor_probe(struct platform_device *pdev) goto err_channel_add; } - xordev->channels[i] = - mv_xor_channel_add(xordev, pdev, i, - cap_mask, irq); - if (IS_ERR(xordev->channels[i])) { - ret = PTR_ERR(xordev->channels[i]); - xordev->channels[i] = NULL; + chan = mv_xor_channel_add(xordev, pdev, i, + cap_mask, irq); + if (IS_ERR(chan)) { + ret = PTR_ERR(chan); irq_dispose_mapping(irq); goto err_channel_add; } + xordev->channels[i] = chan; i++; } } else if (pdata && pdata->channels) { for (i = 0; i < MV_XOR_MAX_CHANNELS; i++) { struct mv_xor_channel_data *cd; + struct mv_xor_chan *chan; int irq; cd = &pdata->channels[i]; @@ -1245,13 +1246,14 @@ static int mv_xor_probe(struct platform_device *pdev) goto err_channel_add; } - xordev->channels[i] = - mv_xor_channel_add(xordev, pdev, i, - cd->cap_mask, irq); - if (IS_ERR(xordev->channels[i])) { - ret = PTR_ERR(xordev->channels[i]); + chan = mv_xor_channel_add(xordev, pdev, i, + cd->cap_mask, irq); + if (IS_ERR(chan)) { + ret = PTR_ERR(chan); goto err_channel_add; } + + xordev->channels[i] = chan; } } -- cgit v0.10.2 From c29cb5eb8157a0049c882672a7f941261f23ea34 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Fri, 13 Dec 2013 11:57:05 +0800 Subject: ALSA: hda - Add Dell headset detection quirk for three laptop models On the Dell machines with codec whose Subsystem Id is 0x10280610, 0x10280629 or 0x1028063e, no external 
microphone can be detected when plugging a 3-ring headset. If we add "model=dell-headset-multi" for the snd-hda-intel.ko, the problem will disappear. The codecs on these machines belong to alc_269 family. BugLink: https://bugs.launchpad.net/bugs/1260303 Cc: David Henningsson Cc: stable@vger.kernel.org Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 34de5dc..5ab8e16 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -4247,11 +4247,14 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x0606, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0608, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0609, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1028, 0x0610, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0613, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0614, "Dell Inspiron 3135", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0616, "Dell Vostro 5470", ALC290_FIXUP_MONO_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x061f, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1028, 0x0629, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0638, "Dell Inspiron 5439", ALC290_FIXUP_MONO_SPEAKERS), + SND_PCI_QUIRK(0x1028, 0x063e, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x063f, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x15cc, "Dell X5 Precision", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x15cd, "Dell X5 Precision", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE), -- cgit v0.10.2 From 8194ee27764b1a86fa7a6b0d411f0a225a6abd5f Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 13 Dec 2013 00:57:03 -0800 Subject: dmaengine: fix sleep in atomic BUG: sleeping function called from invalid context at mm/mempool.c:203 in_atomic(): 1, irqs_disabled(): 0, pid: 43502, name: linbug no locks held by linbug/43502. CPU: 7 PID: 43502 Comm: linbug Not tainted 3.13.0-rc1+ #15 Hardware name: 0000000000000010 ffff88005ebd1878 ffffffff8172d512 ffff8801752bc1c0 ffff8801752bc1c0 ffff88005ebd1898 ffffffff8109d1f6 ffff88005f9a3c58 ffff880177f0f080 ffff88005ebd1918 ffffffff81161f43 ffff88005ebd18f8 Call Trace: [] dump_stack+0x4e/0x68 [] __might_sleep+0xe6/0x120 [] mempool_alloc+0x93/0x170 [] ? mark_held_locks+0x74/0x140 [] ? follow_page_mask+0x556/0x600 [] dmaengine_get_unmap_data+0x2e/0x60 [] dma_async_memcpy_pg_to_pg+0x41/0x1c0 [] dma_async_memcpy_buf_to_pg+0x50/0x60 [] dma_memcpy_to_iovec+0xfc/0x190 [] dma_skb_copy_datagram_iovec+0x6f/0x2b0 Signed-off-by: Dan Williams diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index b601024..ef63b90 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -1054,7 +1054,7 @@ dma_async_memcpy_pg_to_pg(struct dma_chan *chan, struct page *dest_pg, dma_cookie_t cookie; unsigned long flags; - unmap = dmaengine_get_unmap_data(dev->dev, 2, GFP_NOIO); + unmap = dmaengine_get_unmap_data(dev->dev, 2, GFP_NOWAIT); if (!unmap) return -ENOMEM; -- cgit v0.10.2 From 380108d891acf8db5cf0d477176c7ed2b62b7928 Mon Sep 17 00:00:00 2001 From: Julien Grall Date: Tue, 3 Dec 2013 15:40:37 +0000 Subject: xen/block: Correctly define structures in public headers on ARM32 and ARM64 On ARM (32 bits and 64 bits), the double-word is 8-bytes aligned. This will result on different structure from Xen and Linux repositories. 
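To make the alignment difference concrete before the padding details that follow: on a 64-bit or ARM ABI the natural offset of the 64-bit "id" field is 8, and the explicit pad keeps the packed Linux struct at that same offset. A compile-time sketch of the layout, with MODEL_X86_32 standing in for the kernel's CONFIG_X86_32 test and the field types taken from the diff below:

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

typedef uint16_t blkif_vdev_t;

struct blkif_request_rw {
	uint8_t       nr_segments;
	blkif_vdev_t  handle;
#ifndef MODEL_X86_32			/* stand-in for the kernel's CONFIG_X86_32 check */
	uint32_t      _pad1;		/* keeps "id" at offset 8 despite __packed__ */
#endif
	uint64_t      id;
} __attribute__((__packed__));

struct blkif_request {
	uint8_t operation;
	union {
		struct blkif_request_rw rw;
	} u;
} __attribute__((__packed__));

int main(void)
{
	/* 1 (operation) + 1 + 2 + 4 (pad) = 8, matching Xen's natural layout. */
	printf("offsetof(u.rw.id) = %zu\n",
	       offsetof(struct blkif_request, u.rw.id));
	return 0;
}

Compiling with -DMODEL_X86_32 drops the pad and the offset becomes 4, which is the 32-bit x86 ABI the commit message singles out as the only one without a 64-bit block interface layout.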
As Linux is using __packed__ attribute, it must have a 4-bytes padding before each "id" field. This change breaks guest block support with older kernel. IMHO, it's acceptable because Xen on ARM is still on Tech Preview and the hypercall ABI is not yet freezed. Only one architecture (x86_32) doesn't have 64-bit ABI for the block interface. Don't add padding if Linux is compiled for this architecture. Signed-off-by: Julien Grall Acked-by: Roger Pau Monne Acked-by: David Vrabel Cc: Boris Ostrovsky Cc: Ian Campbell Acked-by: Stefano Stabellini [I had asked for confirmation that it did not break x86 and Ian went beyound the call of duty to confirm it. Also a internal regression bucket with 32/64 dom0 with 32/64 domU (PV and HVM) confirmed no regressions. ABI changes are a drag..] Signed-off-by: Konrad Rzeszutek Wilk diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h index 65e1209..ae665ac 100644 --- a/include/xen/interface/io/blkif.h +++ b/include/xen/interface/io/blkif.h @@ -146,7 +146,7 @@ struct blkif_request_segment_aligned { struct blkif_request_rw { uint8_t nr_segments; /* number of segments */ blkif_vdev_t handle; /* only for read/write requests */ -#ifdef CONFIG_X86_64 +#ifndef CONFIG_X86_32 uint32_t _pad1; /* offsetof(blkif_request,u.rw.id) == 8 */ #endif uint64_t id; /* private guest value, echoed in resp */ @@ -163,7 +163,7 @@ struct blkif_request_discard { uint8_t flag; /* BLKIF_DISCARD_SECURE or zero. */ #define BLKIF_DISCARD_SECURE (1<<0) /* ignored if discard-secure=0 */ blkif_vdev_t _pad1; /* only for read/write requests */ -#ifdef CONFIG_X86_64 +#ifndef CONFIG_X86_32 uint32_t _pad2; /* offsetof(blkif_req..,u.discard.id)==8*/ #endif uint64_t id; /* private guest value, echoed in resp */ @@ -175,7 +175,7 @@ struct blkif_request_discard { struct blkif_request_other { uint8_t _pad1; blkif_vdev_t _pad2; /* only for read/write requests */ -#ifdef CONFIG_X86_64 +#ifndef CONFIG_X86_32 uint32_t _pad3; /* offsetof(blkif_req..,u.other.id)==8*/ #endif uint64_t id; /* private guest value, echoed in resp */ @@ -184,7 +184,7 @@ struct blkif_request_other { struct blkif_request_indirect { uint8_t indirect_op; uint16_t nr_segments; -#ifdef CONFIG_X86_64 +#ifndef CONFIG_X86_32 uint32_t _pad1; /* offsetof(blkif_...,u.indirect.id) == 8 */ #endif uint64_t id; @@ -192,7 +192,7 @@ struct blkif_request_indirect { blkif_vdev_t handle; uint16_t _pad2; grant_ref_t indirect_grefs[BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST]; -#ifdef CONFIG_X86_64 +#ifndef CONFIG_X86_32 uint32_t _pad3; /* make it 64 byte aligned */ #else uint64_t _pad3; /* make it 64 byte aligned */ -- cgit v0.10.2 From d7ec435fdd03cfee70dba934ee384acc87bd6d00 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 13 Dec 2013 15:20:19 +0000 Subject: X.509: Fix certificate gathering Fix the gathering of certificates from both the source tree and the build tree to correctly calculate the pathnames of all the certificates. The problem was that if the default generated cert, signing_key.x509, didn't exist then it would not have a path attached and if it did, it would have a path attached. This means that the contents of kernel/.x509.list would change between the first compilation in a directory and the second. After the second it would remain stable because the signing_key.x509 file exists. The consequence was that the kernel would get relinked unconditionally on the second recompilation. 
The second recompilation would also show something like this: X.509 certificate list changed CERTS kernel/x509_certificate_list - Including cert /home/torvalds/v2.6/linux/signing_key.x509 AS kernel/system_certificates.o LD kernel/built-in.o which is why the relink would happen. Unfortunately, it isn't a simple matter of just sticking a path on the front of the filename of the certificate in the build directory as make can't then work out how to build it. So the path has to be prepended to the name for sorting and duplicate elimination and then removed for the make rule if it is in the build tree. Reported-by: Linus Torvalds Signed-off-by: David Howells diff --git a/kernel/Makefile b/kernel/Makefile index bbaf7d5..c23bb0b 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -137,9 +137,10 @@ $(obj)/timeconst.h: $(obj)/hz.bc $(src)/timeconst.bc FORCE ############################################################################### ifeq ($(CONFIG_SYSTEM_TRUSTED_KEYRING),y) X509_CERTIFICATES-y := $(wildcard *.x509) $(wildcard $(srctree)/*.x509) -X509_CERTIFICATES-$(CONFIG_MODULE_SIG) += signing_key.x509 -X509_CERTIFICATES := $(sort $(foreach CERT,$(X509_CERTIFICATES-y), \ +X509_CERTIFICATES-$(CONFIG_MODULE_SIG) += $(objtree)/signing_key.x509 +X509_CERTIFICATES-raw := $(sort $(foreach CERT,$(X509_CERTIFICATES-y), \ $(or $(realpath $(CERT)),$(CERT)))) +X509_CERTIFICATES := $(subst $(realpath $(objtree))/,,$(X509_CERTIFICATES-raw)) ifeq ($(X509_CERTIFICATES),) $(warning *** No X.509 certificates found ***) -- cgit v0.10.2 From f46a3cbbebdaa5ca7b3ab23d7b81925dbe152bcb Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Tue, 10 Dec 2013 22:39:57 +0400 Subject: KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y Always remove generated SYSTEM_TRUSTED_KEYRING files while doing make mrproper. 
Signed-off-by: Kirill Tkhai Signed-off-by: David Howells diff --git a/kernel/Makefile b/kernel/Makefile index c23bb0b..bc010ee 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -165,9 +165,9 @@ $(obj)/x509_certificate_list: $(X509_CERTIFICATES) $(obj)/.x509.list targets += $(obj)/.x509.list $(obj)/.x509.list: @echo $(X509_CERTIFICATES) >$@ +endif clean-files := x509_certificate_list .x509.list -endif ifeq ($(CONFIG_MODULE_SIG),y) ############################################################################### -- cgit v0.10.2 From 6bd364d82920be726c2d678e7ba9e27112686e11 Mon Sep 17 00:00:00 2001 From: Xiao Guangrong Date: Fri, 13 Dec 2013 15:00:32 +0800 Subject: KEYS: fix uninitialized persistent_keyring_register_sem We run into this bug: [ 2736.063245] Unable to handle kernel paging request for data at address 0x00000000 [ 2736.063293] Faulting instruction address: 0xc00000000037efb0 [ 2736.063300] Oops: Kernel access of bad area, sig: 11 [#1] [ 2736.063303] SMP NR_CPUS=2048 NUMA pSeries [ 2736.063310] Modules linked in: sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6table_security ip6table_raw ip6t_REJECT iptable_nat nf_nat_ipv4 iptable_mangle iptable_security iptable_raw ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack ebtable_filter ebtables ip6table_filter iptable_filter ip_tables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_nat nf_conntrack ip6_tables ibmveth pseries_rng nx_crypto nfsd auth_rpcgss nfs_acl lockd sunrpc binfmt_misc xfs libcrc32c dm_service_time sd_mod crc_t10dif crct10dif_common ibmvfc scsi_transport_fc scsi_tgt dm_mirror dm_region_hash dm_log dm_multipath dm_mod [ 2736.063383] CPU: 1 PID: 7128 Comm: ssh Not tainted 3.10.0-48.el7.ppc64 #1 [ 2736.063389] task: c000000131930120 ti: c0000001319a0000 task.ti: c0000001319a0000 [ 2736.063394] NIP: c00000000037efb0 LR: c0000000006c40f8 CTR: 0000000000000000 [ 2736.063399] REGS: c0000001319a3870 TRAP: 0300 Not tainted (3.10.0-48.el7.ppc64) [ 2736.063403] MSR: 8000000000009032 CR: 28824242 XER: 20000000 [ 2736.063415] SOFTE: 0 [ 2736.063418] CFAR: c00000000000908c [ 2736.063421] DAR: 0000000000000000, DSISR: 40000000 [ 2736.063425] GPR00: c0000000006c40f8 c0000001319a3af0 c000000001074788 c0000001319a3bf0 GPR04: 0000000000000000 0000000000000000 0000000000000020 000000000000000a GPR08: fffffffe00000002 00000000ffff0000 0000000080000001 c000000000924888 GPR12: 0000000028824248 c000000007e00400 00001fffffa0f998 0000000000000000 GPR16: 0000000000000022 00001fffffa0f998 0000010022e92470 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000f4a828 00003ffffe527108 0000000000000000 GPR28: c000000000f4a730 c000000000f4a828 0000000000000000 c0000001319a3bf0 [ 2736.063498] NIP [c00000000037efb0] .__list_add+0x30/0x110 [ 2736.063504] LR [c0000000006c40f8] .rwsem_down_write_failed+0x78/0x264 [ 2736.063508] PACATMSCRATCH [800000000280f032] [ 2736.063511] Call Trace: [ 2736.063516] [c0000001319a3af0] [c0000001319a3b80] 0xc0000001319a3b80 (unreliable) [ 2736.063523] [c0000001319a3b80] [c0000000006c40f8] .rwsem_down_write_failed+0x78/0x264 [ 2736.063530] [c0000001319a3c50] [c0000000006c1bb0] .down_write+0x70/0x78 [ 2736.063536] [c0000001319a3cd0] [c0000000002e5ffc] .keyctl_get_persistent+0x20c/0x320 [ 2736.063542] [c0000001319a3dc0] [c0000000002e2388] .SyS_keyctl+0x238/0x260 [ 2736.063548] [c0000001319a3e30] [c000000000009e7c] syscall_exit+0x0/0x7c [ 
2736.063553] Instruction dump: [ 2736.063556] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7cbd2b78 7c9e2378 7c7f1b78 f8010010 [ 2736.063566] f821ff71 e8a50008 7fa52040 40de00c0 7fbd2840 40de0094 7fbff040 [ 2736.063579] ---[ end trace 2708241785538296 ]--- It's caused by uninitialized persistent_keyring_register_sem. The bug was introduced by commit f36f8c75, two typos are in that commit: CONFIG_KEYS_KERBEROS_CACHE should be CONFIG_PERSISTENT_KEYRINGS and krb_cache_register_sem should be persistent_keyring_register_sem. Signed-off-by: Xiao Guangrong Signed-off-by: David Howells diff --git a/kernel/user.c b/kernel/user.c index a3a0dbf..c006131 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -51,9 +51,9 @@ struct user_namespace init_user_ns = { .owner = GLOBAL_ROOT_UID, .group = GLOBAL_ROOT_GID, .proc_inum = PROC_USER_INIT_INO, -#ifdef CONFIG_KEYS_KERBEROS_CACHE - .krb_cache_register_sem = - __RWSEM_INITIALIZER(init_user_ns.krb_cache_register_sem), +#ifdef CONFIG_PERSISTENT_KEYRINGS + .persistent_keyring_register_sem = + __RWSEM_INITIALIZER(init_user_ns.persistent_keyring_register_sem), #endif }; EXPORT_SYMBOL_GPL(init_user_ns); -- cgit v0.10.2 From 3cafea3076423987726023235e548af1d534ff1a Mon Sep 17 00:00:00 2001 From: James Solner Date: Wed, 6 Nov 2013 12:53:36 -0600 Subject: Add Documentation/module-signing.txt file This patch adds the Documentation/module-signing.txt file that is currently missing from the Documentation directory. The init/Kconfig file references the Documentation/module-signing.txt file to explain how kernel module signing works. This patch supplies this documentation. Signed-off-by: James Solner Signed-off-by: David Howells diff --git a/Documentation/module-signing.txt b/Documentation/module-signing.txt new file mode 100644 index 0000000..2b40e04 --- /dev/null +++ b/Documentation/module-signing.txt @@ -0,0 +1,240 @@ + ============================== + KERNEL MODULE SIGNING FACILITY + ============================== + +CONTENTS + + - Overview. + - Configuring module signing. + - Generating signing keys. + - Public keys in the kernel. + - Manually signing modules. + - Signed modules and stripping. + - Loading signed modules. + - Non-valid signatures and unsigned modules. + - Administering/protecting the private key. + + +======== +OVERVIEW +======== + +The kernel module signing facility cryptographically signs modules during +installation and then checks the signature upon loading the module. This +allows increased kernel security by disallowing the loading of unsigned modules +or modules signed with an invalid key. Module signing increases security by +making it harder to load a malicious module into the kernel. The module +signature checking is done by the kernel so that it is not necessary to have +trusted userspace bits. + +This facility uses X.509 ITU-T standard certificates to encode the public keys +involved. The signatures are not themselves encoded in any industrial standard +type. The facility currently only supports the RSA public key encryption +standard (though it is pluggable and permits others to be used). The possible +hash algorithms that can be used are SHA-1, SHA-224, SHA-256, SHA-384, and +SHA-512 (the algorithm is selected by data in the signature). 
+ + +========================== +CONFIGURING MODULE SIGNING +========================== + +The module signing facility is enabled by going to the "Enable Loadable Module +Support" section of the kernel configuration and turning on + + CONFIG_MODULE_SIG "Module signature verification" + +This has a number of options available: + + (1) "Require modules to be validly signed" (CONFIG_MODULE_SIG_FORCE) + + This specifies how the kernel should deal with a module that has a + signature for which the key is not known or a module that is unsigned. + + If this is off (ie. "permissive"), then modules for which the key is not + available and modules that are unsigned are permitted, but the kernel will + be marked as being tainted. + + If this is on (ie. "restrictive"), only modules that have a valid + signature that can be verified by a public key in the kernel's possession + will be loaded. All other modules will generate an error. + + Irrespective of the setting here, if the module has a signature block that + cannot be parsed, it will be rejected out of hand. + + + (2) "Automatically sign all modules" (CONFIG_MODULE_SIG_ALL) + + If this is on then modules will be automatically signed during the + modules_install phase of a build. If this is off, then the modules must + be signed manually using: + + scripts/sign-file + + + (3) "Which hash algorithm should modules be signed with?" + + This presents a choice of which hash algorithm the installation phase will + sign the modules with: + + CONFIG_SIG_SHA1 "Sign modules with SHA-1" + CONFIG_SIG_SHA224 "Sign modules with SHA-224" + CONFIG_SIG_SHA256 "Sign modules with SHA-256" + CONFIG_SIG_SHA384 "Sign modules with SHA-384" + CONFIG_SIG_SHA512 "Sign modules with SHA-512" + + The algorithm selected here will also be built into the kernel (rather + than being a module) so that modules signed with that algorithm can have + their signatures checked without causing a dependency loop. + + +======================= +GENERATING SIGNING KEYS +======================= + +Cryptographic keypairs are required to generate and check signatures. A +private key is used to generate a signature and the corresponding public key is +used to check it. The private key is only needed during the build, after which +it can be deleted or stored securely. The public key gets built into the +kernel so that it can be used to check the signatures as the modules are +loaded. + +Under normal conditions, the kernel build will automatically generate a new +keypair using openssl if one does not exist in the files: + + signing_key.priv + signing_key.x509 + +during the building of vmlinux (the public part of the key needs to be built +into vmlinux) using parameters in the: + + x509.genkey + +file (which is also generated if it does not already exist). + +It is strongly recommended that you provide your own x509.genkey file. + +Most notably, in the x509.genkey file, the req_distinguished_name section +should be altered from the default: + + [ req_distinguished_name ] + O = Magrathea + CN = Glacier signing key + emailAddress = slartibartfast@magrathea.h2g2 + +The generated RSA key size can also be set with: + + [ req ] + default_bits = 4096 + + +It is also possible to manually generate the key private/public files using the +x509.genkey key generation configuration file in the root node of the Linux +kernel sources tree and the openssl command. 
The following is an example to +generate the public/private key files: + + openssl req -new -nodes -utf8 -sha256 -days 36500 -batch -x509 \ + -config x509.genkey -outform DER -out signing_key.x509 \ + -keyout signing_key.priv + + +========================= +PUBLIC KEYS IN THE KERNEL +========================= + +The kernel contains a ring of public keys that can be viewed by root. They're +in a keyring called ".system_keyring" that can be seen by: + + [root@deneb ~]# cat /proc/keys + ... + 223c7853 I------ 1 perm 1f030000 0 0 keyring .system_keyring: 1 + 302d2d52 I------ 1 perm 1f010000 0 0 asymmetri Fedora kernel signing key: d69a84e6bce3d216b979e9505b3e3ef9a7118079: X509.RSA a7118079 [] + ... + +Beyond the public key generated specifically for module signing, any file +placed in the kernel source root directory or the kernel build root directory +whose name is suffixed with ".x509" will be assumed to be an X.509 public key +and will be added to the keyring. + +Further, the architecture code may take public keys from a hardware store and +add those in also (e.g. from the UEFI key database). + +Finally, it is possible to add additional public keys by doing: + + keyctl padd asymmetric "" [.system_keyring-ID] <[key-file] + +e.g.: + + keyctl padd asymmetric "" 0x223c7853 Date: Wed, 11 Dec 2013 16:58:42 +0000 Subject: xen/balloon: Seperate the auto-translate logic properly (v2) The auto-xlat logic vs the non-xlat means that we don't need to for auto-xlat guests (like PVH, HVM or ARM): - use P2M - use scratch page. However the code in increase_reservation does modify the p2m for auto_translate guests, but not in decrease_reservation. Fix that by avoiding any p2m modifications in both increase_reservation and decrease_reservation for auto_translated guests. And also avoid allocating or using scratch pages for auto_translated guests. Lastly, since !auto-xlat is really another way of saying 'xen_pv' remove the redundant 'xen_pv_domain' check. Signed-off-by: Stefano Stabellini [v2: Updated the description] Signed-off-by: Konrad Rzeszutek Wilk diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 55ea73f..4c02e2b 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -350,17 +350,19 @@ static enum bp_state increase_reservation(unsigned long nr_pages) pfn = page_to_pfn(page); - set_phys_to_machine(pfn, frame_list[i]); - #ifdef CONFIG_XEN_HAVE_PVMMU - /* Link back into the page tables if not highmem. */ - if (xen_pv_domain() && !PageHighMem(page)) { - int ret; - ret = HYPERVISOR_update_va_mapping( - (unsigned long)__va(pfn << PAGE_SHIFT), - mfn_pte(frame_list[i], PAGE_KERNEL), - 0); - BUG_ON(ret); + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + set_phys_to_machine(pfn, frame_list[i]); + + /* Link back into the page tables if not highmem. */ + if (!PageHighMem(page)) { + int ret; + ret = HYPERVISOR_update_va_mapping( + (unsigned long)__va(pfn << PAGE_SHIFT), + mfn_pte(frame_list[i], PAGE_KERNEL), + 0); + BUG_ON(ret); + } } #endif @@ -378,7 +380,6 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) enum bp_state state = BP_DONE; unsigned long pfn, i; struct page *page; - struct page *scratch_page; int ret; struct xen_memory_reservation reservation = { .address_bits = 0, @@ -411,27 +412,29 @@ static enum bp_state decrease_reservation(unsigned long nr_pages, gfp_t gfp) scrub_page(page); +#ifdef CONFIG_XEN_HAVE_PVMMU /* * Ballooned out frames are effectively replaced with * a scratch frame. Ensure direct mappings and the * p2m are consistent. 
*/ - scratch_page = get_balloon_scratch_page(); -#ifdef CONFIG_XEN_HAVE_PVMMU - if (xen_pv_domain() && !PageHighMem(page)) { - ret = HYPERVISOR_update_va_mapping( - (unsigned long)__va(pfn << PAGE_SHIFT), - pfn_pte(page_to_pfn(scratch_page), - PAGE_KERNEL_RO), 0); - BUG_ON(ret); - } -#endif if (!xen_feature(XENFEAT_auto_translated_physmap)) { unsigned long p; + struct page *scratch_page = get_balloon_scratch_page(); + + if (!PageHighMem(page)) { + ret = HYPERVISOR_update_va_mapping( + (unsigned long)__va(pfn << PAGE_SHIFT), + pfn_pte(page_to_pfn(scratch_page), + PAGE_KERNEL_RO), 0); + BUG_ON(ret); + } p = page_to_pfn(scratch_page); __set_phys_to_machine(pfn, pfn_to_mfn(p)); + + put_balloon_scratch_page(); } - put_balloon_scratch_page(); +#endif balloon_append(pfn_to_page(pfn)); } @@ -627,15 +630,17 @@ static int __init balloon_init(void) if (!xen_domain()) return -ENODEV; - for_each_online_cpu(cpu) - { - per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL); - if (per_cpu(balloon_scratch_page, cpu) == NULL) { - pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu); - return -ENOMEM; + if (!xen_feature(XENFEAT_auto_translated_physmap)) { + for_each_online_cpu(cpu) + { + per_cpu(balloon_scratch_page, cpu) = alloc_page(GFP_KERNEL); + if (per_cpu(balloon_scratch_page, cpu) == NULL) { + pr_warn("Failed to allocate balloon_scratch_page for cpu %d\n", cpu); + return -ENOMEM; + } } + register_cpu_notifier(&balloon_cpu_notifier); } - register_cpu_notifier(&balloon_cpu_notifier); pr_info("Initialising balloon driver\n"); -- cgit v0.10.2 From 96b4026878d9dac71bd4c3d6e05c7fbb16a3e0aa Mon Sep 17 00:00:00 2001 From: Paulo Zanoni Date: Fri, 6 Dec 2013 20:29:01 -0200 Subject: drm/i915: change CRTC assertion on LCPLL disable Currently, PC8 is enabled at modeset_global_resources, which is called after intel_modeset_update_state. Due to this, there's a small race condition on the case where we start enabling PC8, then do a modeset while PC8 is still being enabled. The racing condition triggers a WARN because intel_modeset_update_state will mark the CRTC as enabled, then the thread that's still enabling PC8 might look at the data structure and think that PC8 is being enabled while a pipe is enabled. Despite the WARN, this is not really a bug since we'll wait for the PC8-enabling thread to finish when we call modeset_global_resources. The spec says the CRTC cannot be enabled when we disable LCPLL, so we had a check for crtc->base.enabled. If we change to crtc->active we will still prevent disabling LCPLL while the CRTC is enabled, and we will also prevent the WARN above. This is a replacement for the previous patch named "drm/i915: get/put PC8 when we get/put a CRTC" Testcase: igt/pm_pc8/modeset-lpsp-stress-no-wait Signed-off-by: Paulo Zanoni (cherry picked from commit 798183c54799fbe1e5a5bfabb3a8c0505ffd2149 from -next due to Dave's report.) 
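For reference, the distinction this relies on: crtc->base.enabled reflects the software modeset state, which is flipped before the global-resources work has finished, while crtc->active only becomes true once the pipe is actually running on the hardware. A minimal sketch of the assertion idea with invented names (struct sketch_crtc and sketch_assert_no_active_pipes() are illustrative, not i915 code):

    #include <linux/bug.h>
    #include <linux/types.h>

    /* Sketch only: "enabled" is software intent, "active" is hardware state. */
    struct sketch_crtc {
            bool enabled;           /* set as soon as the modeset is committed */
            bool active;            /* set only once the pipe is really running */
    };

    static void sketch_assert_no_active_pipes(struct sketch_crtc *crtcs, int n)
    {
            int i;

            for (i = 0; i < n; i++)
                    WARN(crtcs[i].active, "pipe %d still active\n", i);
    }

Checking active rather than enabled still forbids pulling the PLL under a live pipe, but no longer fires in the window where the modeset exists only in software.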
Reported-by: Dave Jones Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 8b8bde7..4049a65 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -6303,7 +6303,7 @@ static void assert_can_disable_lcpll(struct drm_i915_private *dev_priv) uint32_t val; list_for_each_entry(crtc, &dev->mode_config.crtc_list, base.head) - WARN(crtc->base.enabled, "CRTC for pipe %c enabled\n", + WARN(crtc->active, "CRTC for pipe %c enabled\n", pipe_name(crtc->pipe)); WARN(I915_READ(HSW_PWR_WELL_DRIVER), "Power well on\n"); -- cgit v0.10.2 From be3d26b0588ca5f4db6b50d13e92afb0ac57d6ab Mon Sep 17 00:00:00 2001 From: Paulo Zanoni Date: Fri, 13 Dec 2013 17:46:38 -0200 Subject: drm/i915: get a PC8 reference when enabling the power well In the current code, at haswell_modeset_global_resources, first we decide if we want to enable/disable the power well, then we decide if we want to enable/disable PC8. On the case where we're enabling PC8 this works fine, but on the case where we disable PC8 due to a non-eDP monitor being enabled, we first enable the power well and then disable PC8. Although wrong, this doesn't seem to be causing any problems now, and we don't even see anything in dmesg. But the patches for runtime D3 turn this problem into a real bug, so we need to fix it. This fixes the "modeset-non-lpsp" subtest from the "pm_pc8" test from intel-gpu-tools. v2: - Rebase (i915_disable_power_well). v3: - More reabase. v4: - Rebase on top of -fixes instead of -nightly. This is commit d62292c8f778772d1b6ec125d461c8c16fdc0417 in -next, but we need it in -fixes to address Dave's report. Signed-off-by: Paulo Zanoni Reviewed-by: Rodrigo Vivi Reported-by: Dave Jones Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 3657ab4..26c29c1 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5688,6 +5688,8 @@ static void __intel_set_power_well(struct drm_device *dev, bool enable) unsigned long irqflags; uint32_t tmp; + WARN_ON(dev_priv->pc8.enabled); + tmp = I915_READ(HSW_PWR_WELL_DRIVER); is_enabled = tmp & HSW_PWR_WELL_STATE_ENABLED; enable_requested = tmp & HSW_PWR_WELL_ENABLE_REQUEST; @@ -5747,16 +5749,24 @@ static void __intel_set_power_well(struct drm_device *dev, bool enable) static void __intel_power_well_get(struct drm_device *dev, struct i915_power_well *power_well) { - if (!power_well->count++) + struct drm_i915_private *dev_priv = dev->dev_private; + + if (!power_well->count++) { + hsw_disable_package_c8(dev_priv); __intel_set_power_well(dev, true); + } } static void __intel_power_well_put(struct drm_device *dev, struct i915_power_well *power_well) { + struct drm_i915_private *dev_priv = dev->dev_private; + WARN_ON(!power_well->count); - if (!--power_well->count && i915_disable_power_well) + if (!--power_well->count && i915_disable_power_well) { __intel_set_power_well(dev, false); + hsw_enable_package_c8(dev_priv); + } } void intel_display_power_get(struct drm_device *dev, -- cgit v0.10.2 From cb12057256ec58b6f798855b27e120579d6c9aee Mon Sep 17 00:00:00 2001 From: Tomasz Figa Date: Fri, 13 Dec 2013 20:59:39 +0100 Subject: ARM: s3c64xx: dt: Fix boot failure due to double clock initialization Commit 4178bac ARM: call of_clk_init from default time_init handler added implicit call to of_clk_init() from default time_init callback, but it did not change platforms 
calling it from other callbacks, despite of not having custom time_init callbacks. This caused double clock initialization on such platforms, leading to boot failures. An example of such platform is mach-s3c64xx. This patch fixes boot failure on s3c64xx by dropping custom init_irq callback, which had a call to of_clk_init() and moving system reset initialization to init_machine callback. This allows us to have clocks initialized properly without a need to have custom init_time or init_irq callbacks. Signed-off-by: Tomasz Figa Signed-off-by: Olof Johansson diff --git a/arch/arm/mach-s3c64xx/mach-s3c64xx-dt.c b/arch/arm/mach-s3c64xx/mach-s3c64xx-dt.c index 7eb9a10..2fddf38 100644 --- a/arch/arm/mach-s3c64xx/mach-s3c64xx-dt.c +++ b/arch/arm/mach-s3c64xx/mach-s3c64xx-dt.c @@ -8,8 +8,6 @@ * published by the Free Software Foundation. */ -#include -#include #include #include @@ -48,15 +46,9 @@ static void __init s3c64xx_dt_map_io(void) panic("SoC is not S3C64xx!"); } -static void __init s3c64xx_dt_init_irq(void) -{ - of_clk_init(NULL); - samsung_wdt_reset_of_init(); - irqchip_init(); -}; - static void __init s3c64xx_dt_init_machine(void) { + samsung_wdt_reset_of_init(); of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); } @@ -79,7 +71,6 @@ DT_MACHINE_START(S3C6400_DT, "Samsung S3C64xx (Flattened Device Tree)") /* Maintainer: Tomasz Figa */ .dt_compat = s3c64xx_dt_compat, .map_io = s3c64xx_dt_map_io, - .init_irq = s3c64xx_dt_init_irq, .init_machine = s3c64xx_dt_init_machine, .restart = s3c64xx_dt_restart, MACHINE_END -- cgit v0.10.2 From ce027ed98fd176710fb14be9d6015697b62436f0 Mon Sep 17 00:00:00 2001 From: Stefan Richter Date: Sun, 15 Dec 2013 16:18:01 +0100 Subject: firewire: sbp2: bring back WRITE SAME support Commit 54b2b50c20a6 "[SCSI] Disable WRITE SAME for RAID and virtual host adapter drivers" disabled WRITE SAME support for all SBP-2 attached targets. But as described in the changelog of commit b0ea5f19d3d8 "firewire: sbp2: allow WRITE SAME and REPORT SUPPORTED OPERATION CODES", it is not required to blacklist WRITE SAME. Bring the feature back by reverting the sbp2.c hunk of commit 54b2b50c20a6. 
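For context, the knob being reverted is the no_write_same field of struct scsi_host_template: setting it makes the SCSI midlayer report WRITE SAME as unsupported for every device behind that host. A hedged sketch of a template that leaves it clear (fields trimmed and names invented; this is not the sbp2 template):

    #include <linux/module.h>
    #include <scsi/scsi_host.h>

    static struct scsi_host_template sketch_template = {
            .module         = THIS_MODULE,
            .name           = "sketch",
            .can_queue      = 1,
            /* .no_write_same = 1,  set this only when the transport genuinely
             *                      cannot pass WRITE SAME through; sbp2 can,
             *                      hence the revert. */
    };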
Signed-off-by: Stefan Richter Cc: stable@kernel.org diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c index b0bb056..281029d 100644 --- a/drivers/firewire/sbp2.c +++ b/drivers/firewire/sbp2.c @@ -1623,7 +1623,6 @@ static struct scsi_host_template scsi_driver_template = { .cmd_per_lun = 1, .can_queue = 1, .sdev_attrs = sbp2_scsi_sysfs_attrs, - .no_write_same = 1, }; MODULE_AUTHOR("Kristian Hoegsberg "); -- cgit v0.10.2 From c00850dd6c517169734ec60eed99502c0912a7d3 Mon Sep 17 00:00:00 2001 From: Rashika Date: Sat, 14 Dec 2013 18:42:14 +0530 Subject: RDMA/cxgb4: Make _c4iw_write_mem_dma() static MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch marks the function _c4iw_write_mem_dma() as static because it is not used outside this file, which fixes the warning: drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria Acked-by: Steve Wise Reviewed-by: Josh Triplett Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 4cb8eb2..84e4500 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -173,7 +173,7 @@ static int _c4iw_write_mem_inline(struct c4iw_rdev *rdev, u32 addr, u32 len, return ret; } -int _c4iw_write_mem_dma(struct c4iw_rdev *rdev, u32 addr, u32 len, void *data) +static int _c4iw_write_mem_dma(struct c4iw_rdev *rdev, u32 addr, u32 len, void *data) { u32 remain = len; u32 dmalen; -- cgit v0.10.2 From 128d6637cce0747e21b8936e449b1c78a497b189 Mon Sep 17 00:00:00 2001 From: Beomho Seo Date: Mon, 9 Dec 2013 02:17:00 +0000 Subject: iio: cm36651: Changed return value of read function A return value of callback have been changed to IIO_VAL_INT. If not IIO_VAL_INT, driver will print wrong value(*_read_int_time). A follow up patch will deal with a related bug in the new event handling code. Signed-off-by: Beomho Seo Signed-off-by: Jonathan Cameron diff --git a/drivers/iio/light/cm36651.c b/drivers/iio/light/cm36651.c index 21df571..0922e39 100644 --- a/drivers/iio/light/cm36651.c +++ b/drivers/iio/light/cm36651.c @@ -387,7 +387,7 @@ static int cm36651_read_int_time(struct cm36651_data *cm36651, return -EINVAL; } - return IIO_VAL_INT_PLUS_MICRO; + return IIO_VAL_INT; } static int cm36651_write_int_time(struct cm36651_data *cm36651, -- cgit v0.10.2 From 6b59ba609bb61e4fa2ecca7827f170ac07842d64 Mon Sep 17 00:00:00 2001 From: Steve Wise Date: Thu, 21 Nov 2013 15:40:14 -0600 Subject: RDMA/iwcm: Don't touch cm_id after deref in rem_ref rem_ref() calls iwcm_deref_id(), which will wake up any blockers on cm_id_priv->destroy_comp if the refcnt hits 0. That will unblock someone in iw_destroy_cm_id() which will free the cmid. If that happens before rem_ref() calls test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags), then the test_bit() will touch freed memory. The fix is to read the bit first, then deref. We should never be in iw_destroy_cm_id() with IWCM_F_CALLBACK_DESTROY set, and there is a BUG_ON() to make sure of that. 
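The pattern behind the fix is general: any flag that still has to be consulted after the final reference drop must be copied into a local first, because the drop may free the object. A minimal sketch with invented names (sketch_obj, sketch_put() and SKETCH_DESTROY are illustrative, not the iwcm symbols):

    #include <linux/atomic.h>
    #include <linux/bitops.h>
    #include <linux/slab.h>

    #define SKETCH_DESTROY 0

    struct sketch_obj {
            unsigned long flags;
            atomic_t refcount;
    };

    static void sketch_put(struct sketch_obj *obj)
    {
            /* Snapshot the bit first; the atomic_dec below may free obj. */
            int destroy = test_bit(SKETCH_DESTROY, &obj->flags);

            if (atomic_dec_and_test(&obj->refcount) && destroy)
                    kfree(obj);
            /*
             * Otherwise whoever is blocked waiting for the count to reach
             * zero does the freeing; either way, obj must not be touched
             * after the atomic_dec_and_test() above.
             */
    }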
Signed-off-by: Steve Wise Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/iwcm.c b/drivers/infiniband/core/iwcm.c index c47c203..0717940 100644 --- a/drivers/infiniband/core/iwcm.c +++ b/drivers/infiniband/core/iwcm.c @@ -181,9 +181,16 @@ static void add_ref(struct iw_cm_id *cm_id) static void rem_ref(struct iw_cm_id *cm_id) { struct iwcm_id_private *cm_id_priv; + int cb_destroy; + cm_id_priv = container_of(cm_id, struct iwcm_id_private, id); - if (iwcm_deref_id(cm_id_priv) && - test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags)) { + + /* + * Test bit before deref in case the cm_id gets freed on another + * thread. + */ + cb_destroy = test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags); + if (iwcm_deref_id(cm_id_priv) && cb_destroy) { BUG_ON(!list_empty(&cm_id_priv->work_list)); free_cm_id(cm_id_priv); } -- cgit v0.10.2 From bd02cd2549cfcdfc57cb5ce57ffc3feb94f70575 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 16 Dec 2013 12:04:36 +0100 Subject: radiotap: fix bitmap-end-finding buffer overrun Evan Huus found (by fuzzing in wireshark) that the radiotap iterator code can access beyond the length of the buffer if the first bitmap claims an extension but then there's no data at all. Fix this. Cc: stable@vger.kernel.org Reported-by: Evan Huus Signed-off-by: Johannes Berg diff --git a/net/wireless/radiotap.c b/net/wireless/radiotap.c index a271c27..722da61 100644 --- a/net/wireless/radiotap.c +++ b/net/wireless/radiotap.c @@ -124,6 +124,10 @@ int ieee80211_radiotap_iterator_init( /* find payload start allowing for extended bitmap(s) */ if (iterator->_bitmap_shifter & (1<_arg - + (unsigned long)iterator->_rtheader + sizeof(uint32_t) > + (unsigned long)iterator->_max_length) + return -EINVAL; while (get_unaligned_le32(iterator->_arg) & (1 << IEEE80211_RADIOTAP_EXT)) { iterator->_arg += sizeof(uint32_t); -- cgit v0.10.2 From 6fec88712cea016b1fc929fee53f67e3993194a6 Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Mon, 16 Dec 2013 11:34:21 +0100 Subject: ahci: bail out on ICH6 before using AHCI BAR The check for "combined mode" (which disables ahci support) on ICH6 is done after the first use of AHCI BAR. But if ahci is not enabled AHCI BAR is initialized to 0x00000000. (At least it is on the ICH6-M I tested this on. If I understand the datasheet correctly it should also be on ICH6R.) This apparently makes the call of pcim_iomap_regions_request_all() return -EINVAL. And we end up with ahci: probe of 0000:00:1f.2 failed with error -22 (at warning level) in the logs. So check for "combined mode" before calling pcim_iomap_regions_request_all(). Signed-off-by: Paul Bolle Signed-off-by: Tejun Heo diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 14f1e95..c0ed4f27 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -1238,15 +1238,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (rc) return rc; - /* AHCI controllers often implement SFF compatible interface. - * Grab all PCI BARs just in case. - */ - rc = pcim_iomap_regions_request_all(pdev, 1 << ahci_pci_bar, DRV_NAME); - if (rc == -EBUSY) - pcim_pin_device(pdev); - if (rc) - return rc; - if (pdev->vendor == PCI_VENDOR_ID_INTEL && (pdev->device == 0x2652 || pdev->device == 0x2653)) { u8 map; @@ -1263,6 +1254,15 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) } } + /* AHCI controllers often implement SFF compatible interface. + * Grab all PCI BARs just in case. 
+ */ + rc = pcim_iomap_regions_request_all(pdev, 1 << ahci_pci_bar, DRV_NAME); + if (rc == -EBUSY) + pcim_pin_device(pdev); + if (rc) + return rc; + hpriv = devm_kzalloc(dev, sizeof(*hpriv), GFP_KERNEL); if (!hpriv) return -ENOMEM; -- cgit v0.10.2 From c4602c1c818bd6626178d6d3fcc152d9f2f48ac0 Mon Sep 17 00:00:00 2001 From: Miao Xie Date: Mon, 16 Dec 2013 15:20:01 +0800 Subject: ftrace: Initialize the ftrace profiler for each possible cpu Ftrace currently initializes only the online CPUs. This implementation has two problems: - If we online a CPU after we enable the function profile, and then run the test, we will lose the trace information on that CPU. Steps to reproduce: # echo 0 > /sys/devices/system/cpu/cpu1/online # cd /tracing/ # echo >> set_ftrace_filter # echo 1 > function_profile_enabled # echo 1 > /sys/devices/system/cpu/cpu1/online # run test - If we offline a CPU before we enable the function profile, we will not clear the trace information when we enable the function profile. It will trouble the users. Steps to reproduce: # cd /tracing/ # echo >> set_ftrace_filter # echo 1 > function_profile_enabled # run test # cat trace_stat/function* # echo 0 > /sys/devices/system/cpu/cpu1/online # echo 0 > function_profile_enabled # echo 1 > function_profile_enabled # cat trace_stat/function* # run test # cat trace_stat/function* So it is better that we initialize the ftrace profiler for each possible cpu every time we enable the function profile instead of just the online ones. Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com Cc: stable@vger.kernel.org # 2.6.31+ Signed-off-by: Miao Xie Signed-off-by: Steven Rostedt diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 0e9f9ea..72a0f81 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -775,7 +775,7 @@ static int ftrace_profile_init(void) int cpu; int ret = 0; - for_each_online_cpu(cpu) { + for_each_possible_cpu(cpu) { ret = ftrace_profile_init_cpu(cpu); if (ret) break; -- cgit v0.10.2 From b8bd6dc36186fe99afa7b73e9e2d9a98ad5c4865 Mon Sep 17 00:00:00 2001 From: "Robin H. Johnson" Date: Mon, 16 Dec 2013 09:31:19 -0800 Subject: libata: disable a disk via libata.force params A user on StackExchange had a failing SSD that's soldered directly onto the motherboard of his system. The BIOS does not give any option to disable it at all, so he can't just hide it from the OS via the BIOS. The old IDE layer had hdX=noprobe override for situations like this, but that was never ported to the libata layer. This patch implements a disable flag for libata.force. Example use: libata.force=2.0:disable [v2 of the patch, removed the nodisable flag per Tejun Heo] Signed-off-by: Robin H. Johnson Signed-off-by: Tejun Heo Cc: stable@vger.kernel.org Link: http://unix.stackexchange.com/questions/102648/how-to-tell-linux-kernel-3-0-to-completely-ignore-a-failing-disk Link: http://askubuntu.com/questions/352836/how-can-i-tell-linux-kernel-to-completely-ignore-a-disk-as-if-it-was-not-even-co Link: http://superuser.com/questions/599333/how-to-disable-kernel-probing-for-drive diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 50680a5..b9e9bd8 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1529,6 +1529,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted. * atapi_dmadir: Enable ATAPI DMADIR bridge support + * disable: Disable this device. 
+ If there are multiple matching configurations changing the same attribute, the last one is used. diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index dae73ef..ff01584 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -6522,6 +6522,7 @@ static int __init ata_parse_force_one(char **cur, { "norst", .lflags = ATA_LFLAG_NO_HRST | ATA_LFLAG_NO_SRST }, { "rstonce", .lflags = ATA_LFLAG_RST_ONCE }, { "atapi_dmadir", .horkage_on = ATA_HORKAGE_ATAPI_DMADIR }, + { "disable", .horkage_on = ATA_HORKAGE_DISABLE }, }; char *start = *cur, *p = *cur; char *id, *val, *endp; -- cgit v0.10.2 From 309243ec14fde1149e1c66f19746e239e86caf39 Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:44 +0100 Subject: IB/core: const'ify inbuf in struct ib_udata Userspace input buffer is not modified by kernel, so it can be 'const'. This is also a prerequisite to remove the implicit cast from INIT_UDATA(). Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index bdc842e..9879568 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -49,7 +49,7 @@ #define INIT_UDATA(udata, ibuf, obuf, ilen, olen) \ do { \ - (udata)->inbuf = (void __user *) (ibuf); \ + (udata)->inbuf = (const void __user *) (ibuf); \ (udata)->outbuf = (void __user *) (obuf); \ (udata)->inlen = (ilen); \ (udata)->outlen = (olen); \ diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 979874c..61e1935 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -978,7 +978,7 @@ struct ib_uobject { }; struct ib_udata { - void __user *inbuf; + const void __user *inbuf; void __user *outbuf; size_t inlen; size_t outlen; -- cgit v0.10.2 From 317929cd8ef32121fb6574f12d02621f78f24732 Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Sat, 23 Nov 2013 17:31:27 -0500 Subject: MAINTAINERS: Add keystone git tree information Update the Keystone entry to add git tree information. 
Signed-off-by: Santosh Shilimkar diff --git a/MAINTAINERS b/MAINTAINERS index 8285ed4..25b945b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1013,6 +1013,7 @@ M: Santosh Shilimkar L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained F: arch/arm/mach-keystone/ +T: git git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone.git ARM/LOGICPD PXA270 MACHINE SUPPORT M: Lennert Buytenhek -- cgit v0.10.2 From cffa8e3b5c7dcf7617268561397eb236651446f8 Mon Sep 17 00:00:00 2001 From: Santosh Shilimkar Date: Sat, 23 Nov 2013 17:35:23 -0500 Subject: MAINTAINERS: Add keystone clock drivers Cc: Mike Turquette Signed-off-by: Santosh Shilimkar diff --git a/MAINTAINERS b/MAINTAINERS index 25b945b..249651f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1013,6 +1013,7 @@ M: Santosh Shilimkar L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained F: arch/arm/mach-keystone/ +F: drivers/clk/keystone/ T: git git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux-keystone.git ARM/LOGICPD PXA270 MACHINE SUPPORT -- cgit v0.10.2 From f665c0f852316ff99e9eb7f71f34d43003f8e139 Mon Sep 17 00:00:00 2001 From: Stefan Priebe Date: Sat, 16 Nov 2013 21:26:52 +0100 Subject: bcache: kthread don't set writeback task to INTERUPTIBLE at the beginning (schedule_timout_interuptible) and others do his on their own This prevents wrong load average calculation (load of 1 per thread) Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 99053b1..484e57d 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -500,8 +500,6 @@ int bch_cached_dev_writeback_init(struct cached_dev *dc) if (IS_ERR(dc->writeback_thread)) return PTR_ERR(dc->writeback_thread); - set_task_state(dc->writeback_thread, TASK_INTERRUPTIBLE); - INIT_DELAYED_WORK(&dc->writeback_rate_update, update_writeback_rate); schedule_delayed_work(&dc->writeback_rate_update, dc->writeback_rate_update_seconds * HZ); -- cgit v0.10.2 From ce2b3f595e1c56639085645e0130426e443008c0 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Thu, 28 Nov 2013 17:28:37 -0800 Subject: bcache: Use uninterruptible sleep in writeback We're just waiting on kthread_should_stop(), nothing else, so interruptible sleep was wrong here. 
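As a general pattern, a kthread whose sleep should end only on a timeout or on kthread_stop() has no reason to sleep interruptibly; an interruptible sleep can also return early for reasons the loop never checks. A minimal sketch of the loop shape (sketch_thread() is illustrative, not the bcache writeback thread):

    #include <linux/jiffies.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>

    static int sketch_thread(void *arg)
    {
            while (!kthread_should_stop()) {
                    unsigned long delay = msecs_to_jiffies(100);

                    /*
                     * Only the timeout or kthread_stop() should end this
                     * sleep; kthread_stop() wakes uninterruptible sleepers
                     * via wake_up_process(), so nothing is lost.
                     */
                    while (delay && !kthread_should_stop())
                            delay = schedule_timeout_uninterruptible(delay);

                    /* ...one round of work would go here... */
            }
            return 0;
    }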
Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 484e57d..3cd931d 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -241,7 +241,7 @@ static void read_dirty(struct cached_dev *dc) if (KEY_START(&w->key) != dc->last_read || jiffies_to_msecs(delay) > 50) while (!kthread_should_stop() && delay) - delay = schedule_timeout_interruptible(delay); + delay = schedule_timeout_uninterruptible(delay); dc->last_read = KEY_OFFSET(&w->key); @@ -438,7 +438,7 @@ static int bch_writeback_thread(void *arg) while (delay && !kthread_should_stop() && !test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags)) - delay = schedule_timeout_interruptible(delay); + delay = schedule_timeout_uninterruptible(delay); } } -- cgit v0.10.2 From d24a6e1087030b6da286df9433add5fa2f21b83b Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Sun, 10 Nov 2013 21:55:27 -0800 Subject: bcache: Fix dirty_data accounting Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe. NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes necessary that got squashed in with other patches; the full patch against 3.10 is 408cc2f47eeac93a, available at: git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes Signed-off-by: Kent Overstreet Cc: linux-stable # >= v3.10 diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 2a46036..4a12b2f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert, if (KEY_START(k) > KEY_START(insert) + sectors_found) goto check_failed; - if (KEY_PTRS(replace_key) != KEY_PTRS(k)) + if (KEY_PTRS(k) != KEY_PTRS(replace_key) || + KEY_DIRTY(k) != KEY_DIRTY(replace_key)) goto check_failed; /* skip past gen */ diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index dc6e2be..c36745e 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree *b, struct bkey *insert, if (KEY_START(k) > KEY_START(insert) + sectors_found) goto check_failed; - if (KEY_PTRS(replace_key) != KEY_PTRS(k)) + if (KEY_PTRS(k) != KEY_PTRS(replace_key) || + KEY_DIRTY(k) != KEY_DIRTY(replace_key)) goto check_failed; /* skip past gen */ -- cgit v0.10.2 From 9eb8ebeb2471b8cbaa078066aeb85fe5d46f5304 Mon Sep 17 00:00:00 2001 From: Nicholas Swenson Date: Tue, 22 Oct 2013 13:19:23 -0700 Subject: bcache: Fix for can_attach_cache() Signed-off-by: Nicholas Swenson Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index dec15cd..c57bfa0 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1676,7 +1676,7 @@ err: static bool can_attach_cache(struct cache *ca, struct cache_set *c) { return ca->sb.block_size == c->sb.block_size && - ca->sb.bucket_size == c->sb.block_size && + ca->sb.bucket_size == c->sb.bucket_size && ca->sb.nr_in_set == c->sb.nr_in_set; } -- cgit v0.10.2 From 97d11a660fd906dbea3dccd2638495d8497c3c81 Mon Sep 17 00:00:00 2001 From: Nicholas Swenson Date: Wed, 23 Oct 2013 17:35:26 -0700 Subject: bcache: Fix heap_peek() macro Signed-off-by: Nicholas Swenson 
Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/util.h b/drivers/md/bcache/util.h index 362c4b3..1030c60 100644 --- a/drivers/md/bcache/util.h +++ b/drivers/md/bcache/util.h @@ -110,7 +110,7 @@ do { \ _r; \ }) -#define heap_peek(h) ((h)->size ? (h)->data[0] : NULL) +#define heap_peek(h) ((h)->used ? (h)->data[0] : NULL) #define heap_full(h) ((h)->used == (h)->size) -- cgit v0.10.2 From bee63f40cb5f5e8ab2abfbc85acde99cc0acd4b5 Mon Sep 17 00:00:00 2001 From: Nicholas Swenson Date: Thu, 31 Oct 2013 19:25:18 -0700 Subject: bcache: fix for gc crashing when no sectors are used Signed-off-by: Nicholas Swenson Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index 7c1275e..46c9523 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -184,7 +184,8 @@ static bool bucket_cmp(struct bucket *l, struct bucket *r) static unsigned bucket_heap_top(struct cache *ca) { - return GC_SECTORS_USED(heap_peek(&ca->heap)); + struct bucket *b; + return (b = heap_peek(&ca->heap)) ? GC_SECTORS_USED(b) : 0; } void bch_moving_gc(struct cache_set *c) -- cgit v0.10.2 From 981aa8c091e164ea51dd1e81b71a1f3852bbcceb Mon Sep 17 00:00:00 2001 From: Nicholas Swenson Date: Thu, 7 Nov 2013 17:53:19 -0800 Subject: bcache: bugfix - moving_gc now moves only correct buckets Removed gc_move_threshold because picking buckets only by threshold could lead moving extra buckets (ei. if there are buckets at the threshold that aren't supposed to be moved do to space considerations). This is replaced by a GC_MOVE bit in the gc_mark bitmask. Now only marked buckets get moved. Signed-off-by: Nicholas Swenson Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c index 2b46bf1..4c9852d 100644 --- a/drivers/md/bcache/alloc.c +++ b/drivers/md/bcache/alloc.c @@ -421,9 +421,11 @@ out: if (watermark <= WATERMARK_METADATA) { SET_GC_MARK(b, GC_MARK_METADATA); + SET_GC_MOVE(b, 0); b->prio = BTREE_PRIO; } else { SET_GC_MARK(b, GC_MARK_RECLAIMABLE); + SET_GC_MOVE(b, 0); b->prio = INITIAL_PRIO; } diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index 4beb55a..a7b1a76 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -197,7 +197,7 @@ struct bucket { uint8_t disk_gen; uint8_t last_gc; /* Most out of date gen in the btree */ uint8_t gc_gen; - uint16_t gc_mark; + uint16_t gc_mark; /* Bitfield used by GC. See below for field */ }; /* @@ -209,7 +209,8 @@ BITMASK(GC_MARK, struct bucket, gc_mark, 0, 2); #define GC_MARK_RECLAIMABLE 0 #define GC_MARK_DIRTY 1 #define GC_MARK_METADATA 2 -BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 14); +BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13); +BITMASK(GC_MOVE, struct bucket, gc_mark, 15, 1); #include "journal.h" #include "stats.h" @@ -445,7 +446,6 @@ struct cache { * call prio_write() to keep gens from wrapping. 
*/ uint8_t need_save_prio; - unsigned gc_move_threshold; /* * If nonzero, we know we aren't going to find any buckets to invalidate diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index 46c9523..30f347d 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -25,10 +25,9 @@ static bool moving_pred(struct keybuf *buf, struct bkey *k) unsigned i; for (i = 0; i < KEY_PTRS(k); i++) { - struct cache *ca = PTR_CACHE(c, k, i); struct bucket *g = PTR_BUCKET(c, k, i); - if (GC_SECTORS_USED(g) < ca->gc_move_threshold) + if (GC_MOVE(g)) return true; } @@ -227,9 +226,8 @@ void bch_moving_gc(struct cache_set *c) sectors_to_move -= GC_SECTORS_USED(b); } - ca->gc_move_threshold = bucket_heap_top(ca); - - pr_debug("threshold %u", ca->gc_move_threshold); + while (heap_pop(&ca->heap, b, bucket_cmp)) + SET_GC_MOVE(b, 1); } mutex_unlock(&c->bucket_lock); -- cgit v0.10.2 From bf0a628a95dba7f983b6047cea695fb066fb2512 Mon Sep 17 00:00:00 2001 From: Nicholas Swenson Date: Tue, 26 Nov 2013 19:14:23 -0800 Subject: bcache: fix for gc and writeback race Garbage collector needs to check keys in the writeback keybuf to make sure it's not invalidating buckets to which the writeback keys point to. Signed-off-by: Nicholas Swenson Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index c36745e..31bb53f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1561,6 +1561,28 @@ size_t bch_btree_gc_finish(struct cache_set *c) SET_GC_MARK(PTR_BUCKET(c, &c->uuid_bucket, i), GC_MARK_METADATA); + /* don't reclaim buckets to which writeback keys point */ + rcu_read_lock(); + for (i = 0; i < c->nr_uuids; i++) { + struct bcache_device *d = c->devices[i]; + struct cached_dev *dc; + struct keybuf_key *w, *n; + unsigned j; + + if (!d || UUID_FLASH_ONLY(&c->uuids[i])) + continue; + dc = container_of(d, struct cached_dev, disk); + + spin_lock(&dc->writeback_keys.lock); + rbtree_postorder_for_each_entry_safe(w, n, + &dc->writeback_keys.keys, node) + for (j = 0; j < KEY_PTRS(&w->key); j++) + SET_GC_MARK(PTR_BUCKET(c, &w->key, j), + GC_MARK_DIRTY); + spin_unlock(&dc->writeback_keys.lock); + } + rcu_read_unlock(); + for_each_cache(ca, c, i) { uint64_t *i; -- cgit v0.10.2 From 6d3d1a9c542b19dff1c7d7c8354d0869e4655287 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 16 Dec 2013 14:12:09 -0800 Subject: bcache: bugfix for race between moving_gc and bucket_invalidate There is a possibility for a bucket to be invalidated by the allocator while moving_gc was copying it's contents to another bucket, if the bucket only held cached data. To prevent this moving checks for a stale ptr (to an invalidated bucket), before and after reads. It it finds one, it simply ignores moving that data. This only affects bcache if the moving_gc was turned on, note that it's off by default. 
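The shape of the fix is an optimistic-read check: note the generation the key refers to, confirm it is still current before issuing the read, and confirm it again when the read completes; a mismatch means the bucket was invalidated underneath the copy, and the move is silently skipped. A rough sketch with placeholder types and helpers (every sketch_* name is invented, not a bcache symbol):

    #include <linux/errno.h>
    #include <linux/types.h>

    struct sketch_cache;                    /* placeholder declarations */
    struct sketch_key { u8 gen; };

    u8   sketch_bucket_gen(struct sketch_cache *ca, struct sketch_key *k);
    void sketch_read(struct sketch_cache *ca, struct sketch_key *k);
    int  sketch_write_copy(struct sketch_cache *ca, struct sketch_key *k);

    static int sketch_move_one(struct sketch_cache *ca, struct sketch_key *k)
    {
            u8 gen = k->gen;                /* generation the key expects */

            if (gen != sketch_bucket_gen(ca, k))
                    return 0;               /* already invalidated: nothing to move */

            sketch_read(ca, k);             /* may sleep; the bucket can be reused meanwhile */

            if (gen != sketch_bucket_gen(ca, k))
                    return -EINTR;          /* raced with invalidate: drop the stale copy */

            return sketch_write_copy(ca, k);
    }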
Signed-off-by: Nicholas Swenson Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c index 30f347d..f2f0998 100644 --- a/drivers/md/bcache/movinggc.c +++ b/drivers/md/bcache/movinggc.c @@ -64,11 +64,16 @@ static void write_moving_finish(struct closure *cl) static void read_moving_endio(struct bio *bio, int error) { + struct bbio *b = container_of(bio, struct bbio, bio); struct moving_io *io = container_of(bio->bi_private, struct moving_io, cl); if (error) io->op.error = error; + else if (!KEY_DIRTY(&b->key) && + ptr_stale(io->op.c, &b->key, 0)) { + io->op.error = -EINTR; + } bch_bbio_endio(io->op.c, bio, error, "reading data to move"); } @@ -140,6 +145,11 @@ static void read_moving(struct cache_set *c) if (!w) break; + if (ptr_stale(c, &w->key, 0)) { + bch_keybuf_del(&c->moving_gc_keys, w); + continue; + } + io = kzalloc(sizeof(struct moving_io) + sizeof(struct bio_vec) * DIV_ROUND_UP(KEY_SIZE(&w->key), PAGE_SECTORS), GFP_KERNEL); -- cgit v0.10.2 From 16749c23c00c686ed168471963e3ddb0f3fcd855 Mon Sep 17 00:00:00 2001 From: Kent Overstreet Date: Mon, 11 Nov 2013 13:58:34 -0800 Subject: bcache: New writeback PD controller The old writeback PD controller could get into states where it had throttled all the way down and take way too long to recover - it was too complicated to really understand what it was doing. This rewrites a good chunk of it to hopefully be simpler and make more sense, and it also pays more attention to units which should make the behaviour a bit easier to understand. Signed-off-by: Kent Overstreet diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h index a7b1a76..754f4317 100644 --- a/drivers/md/bcache/bcache.h +++ b/drivers/md/bcache/bcache.h @@ -373,14 +373,14 @@ struct cached_dev { unsigned char writeback_percent; unsigned writeback_delay; - int writeback_rate_change; - int64_t writeback_rate_derivative; uint64_t writeback_rate_target; + int64_t writeback_rate_proportional; + int64_t writeback_rate_derivative; + int64_t writeback_rate_change; unsigned writeback_rate_update_seconds; unsigned writeback_rate_d_term; unsigned writeback_rate_p_term_inverse; - unsigned writeback_rate_d_smooth; }; enum alloc_watermarks { diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c index 80d4c2b..a1f8561 100644 --- a/drivers/md/bcache/sysfs.c +++ b/drivers/md/bcache/sysfs.c @@ -83,7 +83,6 @@ rw_attribute(writeback_rate); rw_attribute(writeback_rate_update_seconds); rw_attribute(writeback_rate_d_term); rw_attribute(writeback_rate_p_term_inverse); -rw_attribute(writeback_rate_d_smooth); read_attribute(writeback_rate_debug); read_attribute(stripe_size); @@ -129,31 +128,41 @@ SHOW(__bch_cached_dev) var_printf(writeback_running, "%i"); var_print(writeback_delay); var_print(writeback_percent); - sysfs_print(writeback_rate, dc->writeback_rate.rate); + sysfs_hprint(writeback_rate, dc->writeback_rate.rate << 9); var_print(writeback_rate_update_seconds); var_print(writeback_rate_d_term); var_print(writeback_rate_p_term_inverse); - var_print(writeback_rate_d_smooth); if (attr == &sysfs_writeback_rate_debug) { + char rate[20]; char dirty[20]; - char derivative[20]; char target[20]; - bch_hprint(dirty, - bcache_dev_sectors_dirty(&dc->disk) << 9); - bch_hprint(derivative, dc->writeback_rate_derivative << 9); + char proportional[20]; + char derivative[20]; + char change[20]; + s64 next_io; + + bch_hprint(rate, dc->writeback_rate.rate << 9); + bch_hprint(dirty, bcache_dev_sectors_dirty(&dc->disk) << 9); bch_hprint(target, 
dc->writeback_rate_target << 9); + bch_hprint(proportional,dc->writeback_rate_proportional << 9); + bch_hprint(derivative, dc->writeback_rate_derivative << 9); + bch_hprint(change, dc->writeback_rate_change << 9); + + next_io = div64_s64(dc->writeback_rate.next - local_clock(), + NSEC_PER_MSEC); return sprintf(buf, - "rate:\t\t%u\n" - "change:\t\t%i\n" + "rate:\t\t%s/sec\n" "dirty:\t\t%s\n" + "target:\t\t%s\n" + "proportional:\t%s\n" "derivative:\t%s\n" - "target:\t\t%s\n", - dc->writeback_rate.rate, - dc->writeback_rate_change, - dirty, derivative, target); + "change:\t\t%s/sec\n" + "next io:\t%llims\n", + rate, dirty, target, proportional, + derivative, change, next_io); } sysfs_hprint(dirty_data, @@ -189,6 +198,7 @@ STORE(__cached_dev) struct kobj_uevent_env *env; #define d_strtoul(var) sysfs_strtoul(var, dc->var) +#define d_strtoul_nonzero(var) sysfs_strtoul_clamp(var, dc->var, 1, INT_MAX) #define d_strtoi_h(var) sysfs_hatoi(var, dc->var) sysfs_strtoul(data_csum, dc->disk.data_csum); @@ -197,16 +207,15 @@ STORE(__cached_dev) d_strtoul(writeback_metadata); d_strtoul(writeback_running); d_strtoul(writeback_delay); - sysfs_strtoul_clamp(writeback_rate, - dc->writeback_rate.rate, 1, 1000000); + sysfs_strtoul_clamp(writeback_percent, dc->writeback_percent, 0, 40); - d_strtoul(writeback_rate_update_seconds); + sysfs_strtoul_clamp(writeback_rate, + dc->writeback_rate.rate, 1, INT_MAX); + + d_strtoul_nonzero(writeback_rate_update_seconds); d_strtoul(writeback_rate_d_term); - d_strtoul(writeback_rate_p_term_inverse); - sysfs_strtoul_clamp(writeback_rate_p_term_inverse, - dc->writeback_rate_p_term_inverse, 1, INT_MAX); - d_strtoul(writeback_rate_d_smooth); + d_strtoul_nonzero(writeback_rate_p_term_inverse); d_strtoi_h(sequential_cutoff); d_strtoi_h(readahead); @@ -313,7 +322,6 @@ static struct attribute *bch_cached_dev_files[] = { &sysfs_writeback_rate_update_seconds, &sysfs_writeback_rate_d_term, &sysfs_writeback_rate_p_term_inverse, - &sysfs_writeback_rate_d_smooth, &sysfs_writeback_rate_debug, &sysfs_dirty_data, &sysfs_stripe_size, diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c index 462214e..bb37618 100644 --- a/drivers/md/bcache/util.c +++ b/drivers/md/bcache/util.c @@ -209,7 +209,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done) { uint64_t now = local_clock(); - d->next += div_u64(done, d->rate); + d->next += div_u64(done * NSEC_PER_SEC, d->rate); + + if (time_before64(now + NSEC_PER_SEC, d->next)) + d->next = now + NSEC_PER_SEC; + + if (time_after64(now - NSEC_PER_SEC * 2, d->next)) + d->next = now - NSEC_PER_SEC * 2; return time_after64(d->next, now) ? 
div_u64(d->next - now, NSEC_PER_SEC / HZ) diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c index 3cd931d..6c44fe0 100644 --- a/drivers/md/bcache/writeback.c +++ b/drivers/md/bcache/writeback.c @@ -30,38 +30,40 @@ static void __update_writeback_rate(struct cached_dev *dc) /* PD controller */ - int change = 0; - int64_t error; int64_t dirty = bcache_dev_sectors_dirty(&dc->disk); int64_t derivative = dirty - dc->disk.sectors_dirty_last; + int64_t proportional = dirty - target; + int64_t change; dc->disk.sectors_dirty_last = dirty; - derivative *= dc->writeback_rate_d_term; - derivative = clamp(derivative, -dirty, dirty); + /* Scale to sectors per second */ - derivative = ewma_add(dc->disk.sectors_dirty_derivative, derivative, - dc->writeback_rate_d_smooth, 0); + proportional *= dc->writeback_rate_update_seconds; + proportional = div_s64(proportional, dc->writeback_rate_p_term_inverse); - /* Avoid divide by zero */ - if (!target) - goto out; + derivative = div_s64(derivative, dc->writeback_rate_update_seconds); - error = div64_s64((dirty + derivative - target) << 8, target); + derivative = ewma_add(dc->disk.sectors_dirty_derivative, derivative, + (dc->writeback_rate_d_term / + dc->writeback_rate_update_seconds) ?: 1, 0); + + derivative *= dc->writeback_rate_d_term; + derivative = div_s64(derivative, dc->writeback_rate_p_term_inverse); - change = div_s64((dc->writeback_rate.rate * error) >> 8, - dc->writeback_rate_p_term_inverse); + change = proportional + derivative; /* Don't increase writeback rate if the device isn't keeping up */ if (change > 0 && time_after64(local_clock(), - dc->writeback_rate.next + 10 * NSEC_PER_MSEC)) + dc->writeback_rate.next + NSEC_PER_MSEC)) change = 0; dc->writeback_rate.rate = - clamp_t(int64_t, dc->writeback_rate.rate + change, + clamp_t(int64_t, (int64_t) dc->writeback_rate.rate + change, 1, NSEC_PER_MSEC); -out: + + dc->writeback_rate_proportional = proportional; dc->writeback_rate_derivative = derivative; dc->writeback_rate_change = change; dc->writeback_rate_target = target; @@ -87,15 +89,11 @@ static void update_writeback_rate(struct work_struct *work) static unsigned writeback_delay(struct cached_dev *dc, unsigned sectors) { - uint64_t ret; - if (test_bit(BCACHE_DEV_DETACHING, &dc->disk.flags) || !dc->writeback_percent) return 0; - ret = bch_next_delay(&dc->writeback_rate, sectors * 10000000ULL); - - return min_t(uint64_t, ret, HZ); + return bch_next_delay(&dc->writeback_rate, sectors); } struct dirty_io { @@ -476,6 +474,8 @@ void bch_sectors_dirty_init(struct cached_dev *dc) bch_btree_map_keys(&op.op, dc->disk.c, &KEY(op.inode, 0, 0), sectors_dirty_init_fn, 0); + + dc->disk.sectors_dirty_last = bcache_dev_sectors_dirty(&dc->disk); } int bch_cached_dev_writeback_init(struct cached_dev *dc) @@ -490,10 +490,9 @@ int bch_cached_dev_writeback_init(struct cached_dev *dc) dc->writeback_delay = 30; dc->writeback_rate.rate = 1024; - dc->writeback_rate_update_seconds = 30; - dc->writeback_rate_d_term = 16; - dc->writeback_rate_p_term_inverse = 64; - dc->writeback_rate_d_smooth = 8; + dc->writeback_rate_update_seconds = 5; + dc->writeback_rate_d_term = 30; + dc->writeback_rate_p_term_inverse = 6000; dc->writeback_thread = kthread_create(bch_writeback_thread, dc, "bcache_writeback"); -- cgit v0.10.2 From cf872776fc84128bb779ce2b83a37c884c3203ae Mon Sep 17 00:00:00 2001 From: Peter Hurley Date: Wed, 11 Dec 2013 21:11:58 -0500 Subject: tty: Fix hang at ldsem_down_read() When a controlling tty is being hung up and the hang up is waiting for a 
just-signalled tty reader or writer to exit, and a new tty reader/writer tries to acquire an ldisc reference concurrently with the ldisc reference release from the signalled reader/writer, the hangup can hang. The new reader/writer is sleeping in ldsem_down_read() and the hangup is sleeping in ldsem_down_write() [1]. The new reader/writer fails to wakeup the waiting hangup because the wrong lock count value is checked (the old lock count rather than the new lock count) to see if the lock is unowned. Change helper function to return the new lock count if the cmpxchg was successful; document this behavior. [1] edited dmesg log from reporter SysRq : Show Blocked State task PC stack pid father systemd D ffff88040c4f0000 0 1 0 0x00000000 ffff88040c49fbe0 0000000000000046 ffff88040c4a0000 ffff88040c49ffd8 00000000001d3980 00000000001d3980 ffff88040c4a0000 ffff88040593d840 ffff88040c49fb40 ffffffff810a4cc0 0000000000000006 0000000000000023 Call Trace: [] ? sched_clock_cpu+0x9f/0xe4 [] ? sched_clock_cpu+0x9f/0xe4 [] ? sched_clock_cpu+0x9f/0xe4 [] ? sched_clock_cpu+0x9f/0xe4 [] schedule+0x24/0x5e [] schedule_timeout+0x15b/0x1ec [] ? sched_clock_cpu+0x9f/0xe4 [] ? _raw_spin_unlock_irq+0x24/0x26 [] down_read_failed+0xe3/0x1b9 [] ldsem_down_read+0x8b/0xa5 [] ? tty_ldisc_ref_wait+0x1b/0x44 [] tty_ldisc_ref_wait+0x1b/0x44 [] tty_write+0x7d/0x28a [] redirected_tty_write+0x8d/0x98 [] ? tty_write+0x28a/0x28a [] do_loop_readv_writev+0x56/0x79 [] do_readv_writev+0x1b0/0x1ff [] ? do_vfs_ioctl+0x32a/0x489 [] ? final_putname+0x1d/0x3a [] vfs_writev+0x2e/0x49 [] SyS_writev+0x47/0xaa [] system_call_fastpath+0x16/0x1b bash D ffffffff81c104c0 0 5469 5302 0x00000082 ffff8800cf817ac0 0000000000000046 ffff8804086b22a0 ffff8800cf817fd8 00000000001d3980 00000000001d3980 ffff8804086b22a0 ffff8800cf817a48 000000000000b9a0 ffff8800cf817a78 ffffffff81004675 ffff8800cf817a44 Call Trace: [] ? dump_trace+0x165/0x29c [] ? sched_clock_cpu+0x9f/0xe4 [] ? save_stack_trace+0x26/0x41 [] schedule+0x24/0x5e [] schedule_timeout+0x15b/0x1ec [] ? sched_clock_cpu+0x9f/0xe4 [] ? down_write_failed+0xa3/0x1c9 [] ? _raw_spin_unlock_irq+0x24/0x26 [] down_write_failed+0xab/0x1c9 [] ldsem_down_write+0x79/0xb1 [] ? tty_ldisc_lock_pair_timeout+0xa5/0xd9 [] tty_ldisc_lock_pair_timeout+0xa5/0xd9 [] tty_ldisc_hangup+0xc4/0x218 [] __tty_hangup+0x2e2/0x3ed [] disassociate_ctty+0x63/0x226 [] do_exit+0x79f/0xa11 [] ? get_signal_to_deliver+0x206/0x62f [] ? lock_release_holdtime.part.8+0xf/0x16e [] do_group_exit+0x47/0xb5 [] get_signal_to_deliver+0x241/0x62f [] do_signal+0x43/0x59d [] ? __audit_syscall_exit+0x21a/0x2a8 [] ? lock_release_holdtime.part.8+0xf/0x16e [] do_notify_resume+0x54/0x6c [] int_signal+0x12/0x17 Reported-by: Sami Farin Cc: # 3.12.x Signed-off-by: Peter Hurley Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c index 22fad8a..d8a55e8 100644 --- a/drivers/tty/tty_ldsem.c +++ b/drivers/tty/tty_ldsem.c @@ -86,11 +86,21 @@ static inline long ldsem_atomic_update(long delta, struct ld_semaphore *sem) return atomic_long_add_return(delta, (atomic_long_t *)&sem->count); } +/* + * ldsem_cmpxchg() updates @*old with the last-known sem->count value. + * Returns 1 if count was successfully changed; @*old will have @new value. 
+ * Returns 0 if count was not changed; @*old will have most recent sem->count + */ static inline int ldsem_cmpxchg(long *old, long new, struct ld_semaphore *sem) { - long tmp = *old; - *old = atomic_long_cmpxchg(&sem->count, *old, new); - return *old == tmp; + long tmp = atomic_long_cmpxchg(&sem->count, *old, new); + if (tmp == *old) { + *old = new; + return 1; + } else { + *old = tmp; + return 0; + } } /* -- cgit v0.10.2 From 6979f8d28049879e6147767d93ba6732c8bd94f4 Mon Sep 17 00:00:00 2001 From: James Hogan Date: Tue, 10 Dec 2013 22:28:04 +0000 Subject: serial: 8250_dw: Fix LCR workaround regression Commit c49436b657d0 (serial: 8250_dw: Improve unwritable LCR workaround) caused a regression. It added a check that the LCR was written properly to detect and workaround the busy quirk, but the behaviour of bit 5 (UART_LCR_SPAR) differs between IP versions 3.00a and 3.14c per the docs. On older versions this caused the check to fail and it would repeatedly force idle and rewrite the LCR register, causing delays and preventing any input from serial being received. This is fixed by masking out UART_LCR_SPAR before making the comparison. Signed-off-by: James Hogan Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Tim Kryger Cc: Ezequiel Garcia Cc: Matt Porter Cc: Markus Mayer Tested-by: Tim Kryger Tested-by: Ezequiel Garcia Tested-by: Heikki Krogerus Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index 4658e3e..5f096c7 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -96,7 +96,8 @@ static void dw8250_serial_out(struct uart_port *p, int offset, int value) if (offset == UART_LCR) { int tries = 1000; while (tries--) { - if (value == p->serial_in(p, UART_LCR)) + unsigned int lcr = p->serial_in(p, UART_LCR); + if ((value & ~UART_LCR_SPAR) == (lcr & ~UART_LCR_SPAR)) return; dw8250_force_idle(p); writeb(value, p->membase + (UART_LCR << p->regshift)); @@ -132,7 +133,8 @@ static void dw8250_serial_out32(struct uart_port *p, int offset, int value) if (offset == UART_LCR) { int tries = 1000; while (tries--) { - if (value == p->serial_in(p, UART_LCR)) + unsigned int lcr = p->serial_in(p, UART_LCR); + if ((value & ~UART_LCR_SPAR) == (lcr & ~UART_LCR_SPAR)) return; dw8250_force_idle(p); writel(value, p->membase + (UART_LCR << p->regshift)); -- cgit v0.10.2 From 7cd0c298f6e09476f474b7ae6c1ca876c5d7f881 Mon Sep 17 00:00:00 2001 From: Felipe Balbi Date: Tue, 10 Dec 2013 17:00:06 -0600 Subject: usb: phy: fix driver dependencies both isp1301-omap and fsl_usb2_otg drivers depend on usb_bus_start_enum() which is only defined if CONFIG_USB != n. There is a problem, however, where both those drivers could be statically linked, while CONFIG_USB=m. Fix the problem by fixing driver dependency. Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/usb/phy/Kconfig b/drivers/usb/phy/Kconfig index 08e2f39..2b41c63 100644 --- a/drivers/usb/phy/Kconfig +++ b/drivers/usb/phy/Kconfig @@ -19,8 +19,9 @@ config AB8500_USB in host mode, low speed. 
config FSL_USB2_OTG - bool "Freescale USB OTG Transceiver Driver" + tristate "Freescale USB OTG Transceiver Driver" depends on USB_EHCI_FSL && USB_FSL_USB2 && PM_RUNTIME + depends on USB select USB_OTG select USB_PHY help @@ -29,6 +30,7 @@ config FSL_USB2_OTG config ISP1301_OMAP tristate "Philips ISP1301 with OMAP OTG" depends on I2C && ARCH_OMAP_OTG + depends on USB select USB_PHY help If you say yes here you get support for the Philips ISP1301 -- cgit v0.10.2 From 1bc5ad168f441f6f8bfd944288a5f7b4963ac1f6 Mon Sep 17 00:00:00 2001 From: Marcel Holtmann Date: Tue, 17 Dec 2013 03:21:25 -0800 Subject: Bluetooth: Fix HCI User Channel permission check in hci_sock_sendmsg The HCI User Channel is an admin operation which enforces CAP_NET_ADMIN when binding the socket. Problem now is that it then requires also CAP_NET_RAW when calling into hci_sock_sendmsg. This is not intended and just an oversight since general HCI sockets (which do not require special permission to bind) and HCI User Channel share the same code path here. Remove the extra CAP_NET_RAW check for HCI User Channel write operation since the permission check has already been enforced when binding the socket. This also makes it possible to open HCI User Channel from a privileged process and then hand the file descriptor to an unprivilged process. Signed-off-by: Marcel Holtmann Tested-by: Samuel Ortiz Signed-off-by: Johan Hedberg diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 71f0be1..73bf644 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -942,8 +942,22 @@ static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock, bt_cb(skb)->pkt_type = *((unsigned char *) skb->data); skb_pull(skb, 1); - if (hci_pi(sk)->channel == HCI_CHANNEL_RAW && - bt_cb(skb)->pkt_type == HCI_COMMAND_PKT) { + if (hci_pi(sk)->channel == HCI_CHANNEL_USER) { + /* No permission check is needed for user channel + * since that gets enforced when binding the socket. + * + * However check that the packet type is valid. + */ + if (bt_cb(skb)->pkt_type != HCI_COMMAND_PKT && + bt_cb(skb)->pkt_type != HCI_ACLDATA_PKT && + bt_cb(skb)->pkt_type != HCI_SCODATA_PKT) { + err = -EINVAL; + goto drop; + } + + skb_queue_tail(&hdev->raw_q, skb); + queue_work(hdev->workqueue, &hdev->tx_work); + } else if (bt_cb(skb)->pkt_type == HCI_COMMAND_PKT) { u16 opcode = get_unaligned_le16(skb->data); u16 ogf = hci_opcode_ogf(opcode); u16 ocf = hci_opcode_ocf(opcode); @@ -974,14 +988,6 @@ static int hci_sock_sendmsg(struct kiocb *iocb, struct socket *sock, goto drop; } - if (hci_pi(sk)->channel == HCI_CHANNEL_USER && - bt_cb(skb)->pkt_type != HCI_COMMAND_PKT && - bt_cb(skb)->pkt_type != HCI_ACLDATA_PKT && - bt_cb(skb)->pkt_type != HCI_SCODATA_PKT) { - err = -EINVAL; - goto drop; - } - skb_queue_tail(&hdev->raw_q, skb); queue_work(hdev->workqueue, &hdev->tx_work); } -- cgit v0.10.2 From f78dea064c5f7de07de4912a6e5136dbc443d614 Mon Sep 17 00:00:00 2001 From: Marc Carino Date: Mon, 16 Dec 2013 18:15:53 -0800 Subject: libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs Certain drives cannot handle queued TRIM commands properly, even though support is indicated in the IDENTIFY DEVICE buffer. This patch allows for disabling the commands for the affected drives and apply it to the Micron/Crucial M500 SSDs which exhibit incorrect protocol behavior when issued queued TRIM commands, which could lead to silent data corruption. tj: Merged two unnecessarily split patches and made minor edits including shortening horkage name. 
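For reference, entries in this blacklist are { model, firmware, horkage } triples, and both strings are glob patterns, as the '*' and '?' in the two new entries show, so covering another affected model or firmware revision is a one-line addition. A hypothetical example (the model and firmware strings are invented, not devices known to need the flag):

    /* hypothetical entry; strings invented purely for illustration */
    { "ExampleVendor SSD*",  "FW1.*",  ATA_HORKAGE_NO_NCQ_TRIM, },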
Signed-off-by: Marc Carino Signed-off-by: Tejun Heo Link: http://lkml.kernel.org/g/1387246554-7311-1-git-send-email-marc.ceeeee@gmail.com Cc: stable@vger.kernel.org # 3.12+ diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index ff01584..1393a58 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -2149,9 +2149,16 @@ static int ata_dev_config_ncq(struct ata_device *dev, "failed to get NCQ Send/Recv Log Emask 0x%x\n", err_mask); } else { + u8 *cmds = dev->ncq_send_recv_cmds; + dev->flags |= ATA_DFLAG_NCQ_SEND_RECV; - memcpy(dev->ncq_send_recv_cmds, ap->sector_buf, - ATA_LOG_NCQ_SEND_RECV_SIZE); + memcpy(cmds, ap->sector_buf, ATA_LOG_NCQ_SEND_RECV_SIZE); + + if (dev->horkage & ATA_HORKAGE_NO_NCQ_TRIM) { + ata_dev_dbg(dev, "disabling queued TRIM support\n"); + cmds[ATA_LOG_NCQ_SEND_RECV_DSM_OFFSET] &= + ~ATA_LOG_NCQ_SEND_RECV_DSM_TRIM; + } } } @@ -4205,6 +4212,10 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = { { "PIONEER DVD-RW DVR-212D", NULL, ATA_HORKAGE_NOSETXFER }, { "PIONEER DVD-RW DVR-216D", NULL, ATA_HORKAGE_NOSETXFER }, + /* devices that don't properly handle queued TRIM commands */ + { "Micron_M500*", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, + { "Crucial_CT???M500SSD1", NULL, ATA_HORKAGE_NO_NCQ_TRIM, }, + /* End Marker */ { } }; diff --git a/include/linux/libata.h b/include/linux/libata.h index 0e23c26..9b50337 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -418,6 +418,7 @@ enum { ATA_HORKAGE_DUMP_ID = (1 << 16), /* dump IDENTIFY data */ ATA_HORKAGE_MAX_SEC_LBA48 = (1 << 17), /* Set max sects to 65535 */ ATA_HORKAGE_ATAPI_DMADIR = (1 << 18), /* device requires dmadir */ + ATA_HORKAGE_NO_NCQ_TRIM = (1 << 19), /* don't use queued TRIM */ /* DMA mask for user DMA control: User visible values; DO NOT renumber */ -- cgit v0.10.2 From c1a71504e9715812a2d15e7c03b5aa147ae70ded Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Tue, 17 Dec 2013 11:13:39 +0800 Subject: cgroup: don't recycle cgroup id until all csses' have been destroyed Hugh reported this bug: > CONFIG_MEMCG_SWAP is broken in 3.13-rc. Try something like this: > > mkdir -p /tmp/tmpfs /tmp/memcg > mount -t tmpfs -o size=1G tmpfs /tmp/tmpfs > mount -t cgroup -o memory memcg /tmp/memcg > mkdir /tmp/memcg/old > echo 512M >/tmp/memcg/old/memory.limit_in_bytes > echo $$ >/tmp/memcg/old/tasks > cp /dev/zero /tmp/tmpfs/zero 2>/dev/null > echo $$ >/tmp/memcg/tasks > rmdir /tmp/memcg/old > sleep 1 # let rmdir work complete > mkdir /tmp/memcg/new > umount /tmp/tmpfs > dmesg | grep WARNING > rmdir /tmp/memcg/new > umount /tmp/memcg > > Shows lots of WARNING: CPU: 1 PID: 1006 at kernel/res_counter.c:91 > res_counter_uncharge_locked+0x1f/0x2f() > > Breakage comes from 34c00c319ce7 ("memcg: convert to use cgroup id"). > > The lifetime of a cgroup id is different from the lifetime of the > css id it replaced: memsw's css_get()s do nothing to hold on to the > old cgroup id, it soon gets recycled to a new cgroup, which then > mysteriously inherits the old's swap, without any charge for it. Instead of removing cgroup id right after all the csses have been offlined, we should do that after csses have been destroyed. To make sure an invalid css pointer won't be returned after the css is destroyed, make sure css_from_id() returns NULL in this case. tj: Updated comment to note planned changes for cgrp->id. 
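The lifetime rule generalizes beyond cgroups: when objects are looked up by an idr-allocated id and freed under RCU, the id must not be recycled until the object is actually released, so that a racing lookup sees either the dying object or NULL, never an unrelated newcomer that inherited the id. A minimal sketch of that rule with a plain idr (sketch_obj, sketch_release() and sketch_lookup() are invented names; idr updates are assumed to be serialized by a caller-held lock, omitted here):

    #include <linux/idr.h>
    #include <linux/kref.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    static DEFINE_IDR(sketch_idr);

    struct sketch_obj {
            int             id;
            struct kref     ref;
            struct rcu_head rcu_head;
    };

    static void sketch_release(struct kref *ref)
    {
            struct sketch_obj *obj = container_of(ref, struct sketch_obj, ref);

            idr_remove(&sketch_idr, obj->id);       /* id reusable only from here on */
            kfree_rcu(obj, rcu_head);
    }

    static struct sketch_obj *sketch_lookup(int id)
    {
            return idr_find(&sketch_idr, id);       /* call under rcu_read_lock() */
    }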
Reported-by: Hugh Dickins Signed-off-by: Li Zefan Reviewed-by: Michal Hocko Signed-off-by: Tejun Heo diff --git a/kernel/cgroup.c b/kernel/cgroup.c index bcb1755..bc1dcab 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -890,6 +890,16 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode) struct cgroup *cgrp = dentry->d_fsdata; BUG_ON(!(cgroup_is_dead(cgrp))); + + /* + * XXX: cgrp->id is only used to look up css's. As cgroup + * and css's lifetimes will be decoupled, it should be made + * per-subsystem and moved to css->id so that lookups are + * successful until the target css is released. + */ + idr_remove(&cgrp->root->cgroup_idr, cgrp->id); + cgrp->id = -1; + call_rcu(&cgrp->rcu_head, cgroup_free_rcu); } else { struct cfent *cfe = __d_cfe(dentry); @@ -4268,6 +4278,7 @@ static void css_release(struct percpu_ref *ref) struct cgroup_subsys_state *css = container_of(ref, struct cgroup_subsys_state, refcnt); + rcu_assign_pointer(css->cgroup->subsys[css->ss->subsys_id], NULL); call_rcu(&css->rcu_head, css_free_rcu_fn); } @@ -4733,14 +4744,6 @@ static void cgroup_destroy_css_killed(struct cgroup *cgrp) /* delete this cgroup from parent->children */ list_del_rcu(&cgrp->sibling); - /* - * We should remove the cgroup object from idr before its grace - * period starts, so we won't be looking up a cgroup while the - * cgroup is being freed. - */ - idr_remove(&cgrp->root->cgroup_idr, cgrp->id); - cgrp->id = -1; - dput(d); set_bit(CGRP_RELEASABLE, &parent->flags); -- cgit v0.10.2 From 280484e708a3cc38fe6807718caa460e744c0b20 Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Tue, 17 Dec 2013 13:16:16 +0000 Subject: ASoC: wm5110: Correct HPOUT3 DAPM route typo Reported-by: Kyung-Kwee Ryu Signed-off-by: Charles Keepax Signed-off-by: Mark Brown Cc: stable@vger.kernel.org diff --git a/sound/soc/codecs/wm5110.c b/sound/soc/codecs/wm5110.c index c3c7396..b3d0b28 100644 --- a/sound/soc/codecs/wm5110.c +++ b/sound/soc/codecs/wm5110.c @@ -1037,7 +1037,7 @@ static const struct snd_soc_dapm_route wm5110_dapm_routes[] = { { "AEC Loopback", "HPOUT3L", "OUT3L" }, { "AEC Loopback", "HPOUT3R", "OUT3R" }, { "HPOUT3L", NULL, "OUT3L" }, - { "HPOUT3R", NULL, "OUT3L" }, + { "HPOUT3R", NULL, "OUT3R" }, { "AEC Loopback", "SPKOUTL", "OUT4L" }, { "SPKOUTLN", NULL, "OUT4L" }, -- cgit v0.10.2 From 443772776c69ac9293d66b4d69fd9af16299cc2a Mon Sep 17 00:00:00 2001 From: Alexander Shishkin Date: Mon, 16 Dec 2013 14:17:36 +0200 Subject: perf: Disable all pmus on unthrottling and rescheduling Currently, only one PMU in a context gets disabled during unthrottling and event_sched_{out,in}(), however, events in one context may belong to different pmus, which results in PMUs being reprogrammed while they are still enabled. This means that mixed PMU use [which is rare in itself] resulted in potentially completely unreliable results: corrupted events, bogus results, etc. This patch temporarily disables PMUs that correspond to each event in the context while these events are being modified. 
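The fix has the same shape at every touched call site: bracket the per-event reprogramming with a disable/enable of that event's own PMU rather than relying on ctx->pmu, since events in one context may sit on different PMUs. A condensed sketch of the pattern (not a copy of the kernel functions, error paths trimmed):

    static int event_sched_in_sketch(struct perf_event *event)
    {
            int ret = 0;

            perf_pmu_disable(event->pmu);           /* quiesce this event's PMU */

            if (event->pmu->add(event, PERF_EF_START))
                    ret = -EAGAIN;                  /* could not schedule it in */

            perf_pmu_enable(event->pmu);            /* reprogramming is complete */
            return ret;
    }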
Signed-off-by: Alexander Shishkin Reviewed-by: Andi Kleen Signed-off-by: Peter Zijlstra Cc: Frederic Weisbecker Cc: Mike Galbraith Cc: Paul Mackerras Cc: Stephane Eranian Cc: Alexander Shishkin Link: http://lkml.kernel.org/r/1387196256-8030-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar diff --git a/kernel/events/core.c b/kernel/events/core.c index 72348dc..f574401 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1396,6 +1396,8 @@ event_sched_out(struct perf_event *event, if (event->state != PERF_EVENT_STATE_ACTIVE) return; + perf_pmu_disable(event->pmu); + event->state = PERF_EVENT_STATE_INACTIVE; if (event->pending_disable) { event->pending_disable = 0; @@ -1412,6 +1414,8 @@ event_sched_out(struct perf_event *event, ctx->nr_freq--; if (event->attr.exclusive || !cpuctx->active_oncpu) cpuctx->exclusive = 0; + + perf_pmu_enable(event->pmu); } static void @@ -1652,6 +1656,7 @@ event_sched_in(struct perf_event *event, struct perf_event_context *ctx) { u64 tstamp = perf_event_time(event); + int ret = 0; if (event->state <= PERF_EVENT_STATE_OFF) return 0; @@ -1674,10 +1679,13 @@ event_sched_in(struct perf_event *event, */ smp_wmb(); + perf_pmu_disable(event->pmu); + if (event->pmu->add(event, PERF_EF_START)) { event->state = PERF_EVENT_STATE_INACTIVE; event->oncpu = -1; - return -EAGAIN; + ret = -EAGAIN; + goto out; } event->tstamp_running += tstamp - event->tstamp_stopped; @@ -1693,7 +1701,10 @@ event_sched_in(struct perf_event *event, if (event->attr.exclusive) cpuctx->exclusive = 1; - return 0; +out: + perf_pmu_enable(event->pmu); + + return ret; } static int @@ -2743,6 +2754,8 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx, if (!event_filter_match(event)) continue; + perf_pmu_disable(event->pmu); + hwc = &event->hw; if (hwc->interrupts == MAX_INTERRUPTS) { @@ -2752,7 +2765,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx, } if (!event->attr.freq || !event->attr.sample_freq) - continue; + goto next; /* * stop the event and update event->count @@ -2774,6 +2787,8 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx, perf_adjust_period(event, period, delta, false); event->pmu->start(event, delta > 0 ? 
PERF_EF_RELOAD : 0); + next: + perf_pmu_enable(event->pmu); } perf_pmu_enable(ctx->pmu); -- cgit v0.10.2 From 189b84fb54490ae24111124346a8e63f8e019385 Mon Sep 17 00:00:00 2001 From: Vince Weaver Date: Fri, 13 Dec 2013 15:52:25 -0500 Subject: perf: Document the new transaction sample type Commit fdfbbd07e91f8fe3871 ("perf: Add generic transaction flags") added support for PERF_SAMPLE_TRANSACTION but forgot to add documentation for the sample type to include/uapi/linux/perf_event.h Signed-off-by: Vince Weaver Signed-off-by: Peter Zijlstra Cc: Andi Kleen Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1312131548450.10372@pianoman.cluster.toy Signed-off-by: Ingo Molnar diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index e1802d6..959d454 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -679,6 +679,7 @@ enum perf_event_type { * * { u64 weight; } && PERF_SAMPLE_WEIGHT * { u64 data_src; } && PERF_SAMPLE_DATA_SRC + * { u64 transaction; } && PERF_SAMPLE_TRANSACTION * }; */ PERF_RECORD_SAMPLE = 9, -- cgit v0.10.2 From 5d4cf996cf134e8ddb4f906b8197feb9267c2b77 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Tue, 17 Dec 2013 09:21:25 +0000 Subject: sched: Assign correct scheduling domain to 'sd_llc' Commit 42eb088e (sched: Avoid NULL dereference on sd_busy) corrected a NULL dereference on sd_busy but the fix also altered what scheduling domain it used for the 'sd_llc' percpu variable. One impact of this is that a task selecting a runqueue may consider idle CPUs that are not cache siblings as candidates for running. Tasks are then running on CPUs that are not cache hot. This was found through bisection where ebizzy threads were not seeing equal performance and it looked like a scheduling fairness issue. This patch mitigates but does not completely fix the problem on all machines tested implying there may be an additional bug or a common root cause. Here are the average range of performance seen by individual ebizzy threads. It was tested on top of candidate patches related to x86 TLB range flushing. 4-core machine 3.13.0-rc3 3.13.0-rc3 vanilla fixsd-v3r3 Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%) Mean 2 0.34 ( 0.00%) 0.10 ( 70.59%) Mean 3 1.29 ( 0.00%) 0.93 ( 27.91%) Mean 4 7.08 ( 0.00%) 0.77 ( 89.12%) Mean 5 193.54 ( 0.00%) 2.14 ( 98.89%) Mean 6 151.12 ( 0.00%) 2.06 ( 98.64%) Mean 7 115.38 ( 0.00%) 2.04 ( 98.23%) Mean 8 108.65 ( 0.00%) 1.92 ( 98.23%) 8-core machine Mean 1 0.00 ( 0.00%) 0.00 ( 0.00%) Mean 2 0.40 ( 0.00%) 0.21 ( 47.50%) Mean 3 23.73 ( 0.00%) 0.89 ( 96.25%) Mean 4 12.79 ( 0.00%) 1.04 ( 91.87%) Mean 5 13.08 ( 0.00%) 2.42 ( 81.50%) Mean 6 23.21 ( 0.00%) 69.46 (-199.27%) Mean 7 15.85 ( 0.00%) 101.72 (-541.77%) Mean 8 109.37 ( 0.00%) 19.13 ( 82.51%) Mean 12 124.84 ( 0.00%) 28.62 ( 77.07%) Mean 16 113.50 ( 0.00%) 24.16 ( 78.71%) It's eliminated for one machine and reduced for another. 
Signed-off-by: Mel Gorman Signed-off-by: Peter Zijlstra Cc: Alex Shi Cc: Andrew Morton Cc: Fengguang Wu Cc: H Peter Anvin Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20131217092124.GV11295@suse.de Signed-off-by: Ingo Molnar diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 19af58f..a88f4a4 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4902,6 +4902,7 @@ DEFINE_PER_CPU(struct sched_domain *, sd_asym); static void update_top_cache_domain(int cpu) { struct sched_domain *sd; + struct sched_domain *busy_sd = NULL; int id = cpu; int size = 1; @@ -4909,9 +4910,9 @@ static void update_top_cache_domain(int cpu) if (sd) { id = cpumask_first(sched_domain_span(sd)); size = cpumask_weight(sched_domain_span(sd)); - sd = sd->parent; /* sd_busy */ + busy_sd = sd->parent; /* sd_busy */ } - rcu_assign_pointer(per_cpu(sd_busy, cpu), sd); + rcu_assign_pointer(per_cpu(sd_busy, cpu), busy_sd); rcu_assign_pointer(per_cpu(sd_llc, cpu), sd); per_cpu(sd_llc_size, cpu) = size; -- cgit v0.10.2 From 757dfcaa41844595964f1220f1d33182dae49976 Mon Sep 17 00:00:00 2001 From: Kirill Tkhai Date: Wed, 27 Nov 2013 19:59:13 +0400 Subject: sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities This patch touches the RT group scheduling case. Functions inc_rt_prio_smp() and dec_rt_prio_smp() change (global) rq's priority, while rt_rq passed to them may be not the top-level rt_rq. This is wrong, because changing of priority on a child level does not guarantee that the priority is the highest all over the rq. So, this leak makes RT balancing unusable. The short example: the task having the highest priority among all rq's RT tasks (no one other task has the same priority) are waking on a throttle rt_rq. The rq's cpupri is set to the task's priority equivalent, but real rq->rt.highest_prio.curr is less. The patch below fixes the problem. Signed-off-by: Kirill Tkhai Signed-off-by: Peter Zijlstra CC: Steven Rostedt CC: stable@vger.kernel.org Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru Signed-off-by: Ingo Molnar diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 7d57275..1c40655 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -901,6 +901,13 @@ inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) { struct rq *rq = rq_of_rt_rq(rt_rq); +#ifdef CONFIG_RT_GROUP_SCHED + /* + * Change rq's cpupri only if rt_rq is the top queue. + */ + if (&rq->rt != rt_rq) + return; +#endif if (rq->online && prio < prev_prio) cpupri_set(&rq->rd->cpupri, rq->cpu, prio); } @@ -910,6 +917,13 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) { struct rq *rq = rq_of_rt_rq(rt_rq); +#ifdef CONFIG_RT_GROUP_SCHED + /* + * Change rq's cpupri only if rt_rq is the top queue. + */ + if (&rq->rt != rt_rq) + return; +#endif if (rq->online && rt_rq->highest_prio.curr != prev_prio) cpupri_set(&rq->rd->cpupri, rq->cpu, rt_rq->highest_prio.curr); } -- cgit v0.10.2 From 02fc17c10258ad70c1b9a93f8884bdaf0ac3f766 Mon Sep 17 00:00:00 2001 From: Jean-Francois Moine Date: Wed, 27 Nov 2013 21:10:24 +0100 Subject: ASoC: kirkwood: Fix the CPU DAI rates This patch fixes the rates declared in the CPU DAI parameters: - SNDRV_PCM_RATE_KNOT and the discrete rates SNDRV_PCM_RATE_xxx should not be used with SNDRV_PCM_RATE_CONTINUOUS, - SNDRV_PCM_RATE_CONTINUOUS asks for rate_min and rate_max, - the device may do streaming down to 5512Hz. 
Signed-off-by: Jean-Francois Moine Signed-off-by: Mark Brown diff --git a/sound/soc/kirkwood/kirkwood-i2s.c b/sound/soc/kirkwood/kirkwood-i2s.c index 0b18f65..3920a5e 100644 --- a/sound/soc/kirkwood/kirkwood-i2s.c +++ b/sound/soc/kirkwood/kirkwood-i2s.c @@ -473,17 +473,17 @@ static struct snd_soc_dai_driver kirkwood_i2s_dai_extclk[2] = { .playback = { .channels_min = 1, .channels_max = 2, - .rates = SNDRV_PCM_RATE_8000_192000 | - SNDRV_PCM_RATE_CONTINUOUS | - SNDRV_PCM_RATE_KNOT, + .rates = SNDRV_PCM_RATE_CONTINUOUS, + .rate_min = 5512, + .rate_max = 192000, .formats = KIRKWOOD_I2S_FORMATS, }, .capture = { .channels_min = 1, .channels_max = 2, - .rates = SNDRV_PCM_RATE_8000_192000 | - SNDRV_PCM_RATE_CONTINUOUS | - SNDRV_PCM_RATE_KNOT, + .rates = SNDRV_PCM_RATE_CONTINUOUS, + .rate_min = 5512, + .rate_max = 192000, .formats = KIRKWOOD_I2S_FORMATS, }, .ops = &kirkwood_i2s_dai_ops, @@ -494,17 +494,17 @@ static struct snd_soc_dai_driver kirkwood_i2s_dai_extclk[2] = { .playback = { .channels_min = 1, .channels_max = 2, - .rates = SNDRV_PCM_RATE_8000_192000 | - SNDRV_PCM_RATE_CONTINUOUS | - SNDRV_PCM_RATE_KNOT, + .rates = SNDRV_PCM_RATE_CONTINUOUS, + .rate_min = 5512, + .rate_max = 192000, .formats = KIRKWOOD_SPDIF_FORMATS, }, .capture = { .channels_min = 1, .channels_max = 2, - .rates = SNDRV_PCM_RATE_8000_192000 | - SNDRV_PCM_RATE_CONTINUOUS | - SNDRV_PCM_RATE_KNOT, + .rates = SNDRV_PCM_RATE_CONTINUOUS, + .rate_min = 5512, + .rate_max = 192000, .formats = KIRKWOOD_SPDIF_FORMATS, }, .ops = &kirkwood_i2s_dai_ops, -- cgit v0.10.2 From 533518a43ab9d662c864a9b63a6723050bfea488 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 12 Dec 2013 12:13:17 -0500 Subject: drm/radeon/dce6: set correct number of audio pins DCE6.0, 8.x has 6 DCE6.1 has 4 Signed-off-by: Alex Deucher diff --git a/drivers/gpu/drm/radeon/dce6_afmt.c b/drivers/gpu/drm/radeon/dce6_afmt.c index de86493..ab59fd7 100644 --- a/drivers/gpu/drm/radeon/dce6_afmt.c +++ b/drivers/gpu/drm/radeon/dce6_afmt.c @@ -308,7 +308,9 @@ int dce6_audio_init(struct radeon_device *rdev) rdev->audio.enabled = true; if (ASIC_IS_DCE8(rdev)) - rdev->audio.num_pins = 7; + rdev->audio.num_pins = 6; + else if (ASIC_IS_DCE61(rdev)) + rdev->audio.num_pins = 4; else rdev->audio.num_pins = 6; -- cgit v0.10.2 From c745fe611ca42295c9d91d8e305d27983e9132ef Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Mon, 9 Dec 2013 17:46:59 -0500 Subject: drm/radeon/dpm: disable ss on Cayman Spread spectrum seems to cause hangs when dynamic clock switching is enabled. Disable it for now. This does not affect performance or the amount of power saved. Tracked down by Martin Andersson. 
bug: https://bugs.freedesktop.org/show_bug.cgi?id=69723 Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org diff --git a/drivers/gpu/drm/radeon/rv770_dpm.c b/drivers/gpu/drm/radeon/rv770_dpm.c index 913b025..374499d 100644 --- a/drivers/gpu/drm/radeon/rv770_dpm.c +++ b/drivers/gpu/drm/radeon/rv770_dpm.c @@ -2328,6 +2328,12 @@ void rv770_get_engine_memory_ss(struct radeon_device *rdev) pi->mclk_ss = radeon_atombios_get_asic_ss_info(rdev, &ss, ASIC_INTERNAL_MEMORY_SS, 0); + /* disable ss, causes hangs on some cayman boards */ + if (rdev->family == CHIP_CAYMAN) { + pi->sclk_ss = false; + pi->mclk_ss = false; + } + if (pi->sclk_ss || pi->mclk_ss) pi->dynamic_ss = true; else -- cgit v0.10.2 From b67ce39a30976171e7b96b30a94a0216ab89df97 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Fri, 13 Dec 2013 09:05:49 -0500 Subject: drm/radeon: check for 0 count in speaker allocation and SAD code If there is no speaker allocation block or SAD block, bail early. bug: https://bugs.freedesktop.org/show_bug.cgi?id=72283 Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org diff --git a/drivers/gpu/drm/radeon/dce6_afmt.c b/drivers/gpu/drm/radeon/dce6_afmt.c index ab59fd7..713a5d3 100644 --- a/drivers/gpu/drm/radeon/dce6_afmt.c +++ b/drivers/gpu/drm/radeon/dce6_afmt.c @@ -174,7 +174,7 @@ void dce6_afmt_write_speaker_allocation(struct drm_encoder *encoder) } sad_count = drm_edid_to_speaker_allocation(radeon_connector->edid, &sadb); - if (sad_count < 0) { + if (sad_count <= 0) { DRM_ERROR("Couldn't read Speaker Allocation Data Block: %d\n", sad_count); return; } @@ -235,7 +235,7 @@ void dce6_afmt_write_sad_regs(struct drm_encoder *encoder) } sad_count = drm_edid_to_sad(radeon_connector->edid, &sads); - if (sad_count < 0) { + if (sad_count <= 0) { DRM_ERROR("Couldn't read SADs: %d\n", sad_count); return; } diff --git a/drivers/gpu/drm/radeon/evergreen_hdmi.c b/drivers/gpu/drm/radeon/evergreen_hdmi.c index aa695c4..0c6d5ce 100644 --- a/drivers/gpu/drm/radeon/evergreen_hdmi.c +++ b/drivers/gpu/drm/radeon/evergreen_hdmi.c @@ -118,7 +118,7 @@ static void dce4_afmt_write_speaker_allocation(struct drm_encoder *encoder) } sad_count = drm_edid_to_speaker_allocation(radeon_connector->edid, &sadb); - if (sad_count < 0) { + if (sad_count <= 0) { DRM_ERROR("Couldn't read Speaker Allocation Data Block: %d\n", sad_count); return; } @@ -173,7 +173,7 @@ static void evergreen_hdmi_write_sad_regs(struct drm_encoder *encoder) } sad_count = drm_edid_to_sad(radeon_connector->edid, &sads); - if (sad_count < 0) { + if (sad_count <= 0) { DRM_ERROR("Couldn't read SADs: %d\n", sad_count); return; } -- cgit v0.10.2 From 3a8c92086d1c05ae1cf6a52ffd8dde2851e6b1d3 Mon Sep 17 00:00:00 2001 From: Mark Tinguely Date: Sat, 5 Oct 2013 21:48:25 -0500 Subject: xfs: fix memory leak in xfs_dir2_node_removename Fix the leak of kernel memory in xfs_dir2_node_removename() when xfs_dir2_leafn_remove() returns an error code. 
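The leak is the familiar early-return-without-free shape; the fix funnels every failure after the state allocation through one exit label. A rough sketch of the function's structure after the patch (struct and helper names here are placeholders, not the real XFS symbols):

    int node_removename_sketch(struct args *args)
    {
            struct state *state = state_alloc(args);        /* must be freed */
            int error, rval;

            error = lookup_int(state, &rval);
            if (error)
                    goto out_free;          /* previously leaked state here */

            if (rval != EEXIST) {           /* entry not found: caller bug */
                    error = rval;
                    goto out_free;
            }

            error = leafn_remove(state, &rval);
            if (error)
                    goto out_free;          /* previously returned, leaking state */

            error = node_to_leaf(state);
    out_free:
            state_free(state);
            return error;
    }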
Signed-off-by: Mark Tinguely Reviewed-by: Ben Myers Signed-off-by: Ben Myers (cherry picked from commit ef701600fd26cace9d513ee174688a2b83832126) diff --git a/fs/xfs/xfs_dir2_node.c b/fs/xfs/xfs_dir2_node.c index 56369d4..48c7d18 100644 --- a/fs/xfs/xfs_dir2_node.c +++ b/fs/xfs/xfs_dir2_node.c @@ -2067,12 +2067,12 @@ xfs_dir2_node_lookup( */ int /* error */ xfs_dir2_node_removename( - xfs_da_args_t *args) /* operation arguments */ + struct xfs_da_args *args) /* operation arguments */ { - xfs_da_state_blk_t *blk; /* leaf block */ + struct xfs_da_state_blk *blk; /* leaf block */ int error; /* error return value */ int rval; /* operation return value */ - xfs_da_state_t *state; /* btree cursor */ + struct xfs_da_state *state; /* btree cursor */ trace_xfs_dir2_node_removename(args); @@ -2084,19 +2084,18 @@ xfs_dir2_node_removename( state->mp = args->dp->i_mount; state->blocksize = state->mp->m_dirblksize; state->node_ents = state->mp->m_dir_node_ents; - /* - * Look up the entry we're deleting, set up the cursor. - */ + + /* Look up the entry we're deleting, set up the cursor. */ error = xfs_da3_node_lookup_int(state, &rval); if (error) - rval = error; - /* - * Didn't find it, upper layer screwed up. - */ + goto out_free; + + /* Didn't find it, upper layer screwed up. */ if (rval != EEXIST) { - xfs_da_state_free(state); - return rval; + error = rval; + goto out_free; } + blk = &state->path.blk[state->path.active - 1]; ASSERT(blk->magic == XFS_DIR2_LEAFN_MAGIC); ASSERT(state->extravalid); @@ -2107,7 +2106,7 @@ xfs_dir2_node_removename( error = xfs_dir2_leafn_remove(args, blk->bp, blk->index, &state->extrablk, &rval); if (error) - return error; + goto out_free; /* * Fix the hash values up the btree. */ @@ -2122,6 +2121,7 @@ xfs_dir2_node_removename( */ if (!error) error = xfs_dir2_node_to_leaf(state); +out_free: xfs_da_state_free(state); return error; } -- cgit v0.10.2 From 30d161c9aacb4b719789c30623b9eec7d1aa1d08 Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Tue, 26 Nov 2013 21:38:54 +0800 Subject: xfs: fix false assertion at xfs_qm_vop_create_dqattach After the previous fix, there still has another ASSERT failure if turning off any type of quota while fsstress is running at the same time. Backtrace in this case: [ 50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118 [ 50.867924] ------------[ cut here ]------------ ... [ 50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable] [ 50.867999] invalid opcode: 0000 [#1] SMP [ 50.869407] Call Trace: [ 50.869446] [] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs] [ 50.869512] [] xfs_create+0x5c5/0x6a0 [xfs] [ 50.869564] [] xfs_vn_mknod+0xac/0x1d0 [xfs] [ 50.869615] [] xfs_vn_mkdir+0x16/0x20 [xfs] [ 50.869655] [] vfs_mkdir+0x95/0x130 [ 50.869689] [] SyS_mkdirat+0xaa/0xe0 [ 50.869723] [] SyS_mkdir+0x19/0x20 [ 50.869757] [] system_call_fastpath+0x1a/0x1f [ 50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 [ 50.870003] RIP [] assfail+0x22/0x30 [xfs] [ 50.870050] RSP [ 50.879251] ---[ end trace c93a2b342341c65b ]--- We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(), however the assertion itself is not right IMHO. While performing quota off, we firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking any special locks, see xfs_qm_scall_quotaoff(). Hence there is no guarantee that the desired quota is still active. 
Signed-off-by: Jie Liu Reviewed-by: Christoph Hellwig Signed-off-by: Ben Myers (cherry picked from commit 37eb9706ebf5b99d14c6086cdeef2c2f73f9c9fb) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 14a4996..588e490 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -2082,24 +2082,21 @@ xfs_qm_vop_create_dqattach( ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); ASSERT(XFS_IS_QUOTA_RUNNING(mp)); - if (udqp) { + if (udqp && XFS_IS_UQUOTA_ON(mp)) { ASSERT(ip->i_udquot == NULL); - ASSERT(XFS_IS_UQUOTA_ON(mp)); ASSERT(ip->i_d.di_uid == be32_to_cpu(udqp->q_core.d_id)); ip->i_udquot = xfs_qm_dqhold(udqp); xfs_trans_mod_dquot(tp, udqp, XFS_TRANS_DQ_ICOUNT, 1); } - if (gdqp) { + if (gdqp && XFS_IS_GQUOTA_ON(mp)) { ASSERT(ip->i_gdquot == NULL); - ASSERT(XFS_IS_GQUOTA_ON(mp)); ASSERT(ip->i_d.di_gid == be32_to_cpu(gdqp->q_core.d_id)); ip->i_gdquot = xfs_qm_dqhold(gdqp); xfs_trans_mod_dquot(tp, gdqp, XFS_TRANS_DQ_ICOUNT, 1); } - if (pdqp) { + if (pdqp && XFS_IS_PQUOTA_ON(mp)) { ASSERT(ip->i_pdquot == NULL); - ASSERT(XFS_IS_PQUOTA_ON(mp)); ASSERT(xfs_get_projid(ip) == be32_to_cpu(pdqp->q_core.d_id)); ip->i_pdquot = xfs_qm_dqhold(pdqp); -- cgit v0.10.2 From 5c22727895bf61cb851835be0d30260fb36de648 Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Tue, 26 Nov 2013 21:38:34 +0800 Subject: xfs: fix assertion failure at xfs_setattr_nonsize For CRC enabled v5 super block, change a file's ownership can simply trigger an ASSERT failure at xfs_setattr_nonsize() if both group and project quota are enabled, i.e, [ 305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621 [ 305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable] [ 305.383939] Call Trace: [ 305.385536] [] xfs_setattr_nonsize+0x69a/0x720 [xfs] [ 305.387142] [] xfs_vn_setattr+0x29/0x70 [xfs] [ 305.388727] [] notify_change+0x1a8/0x350 [ 305.390298] [] chown_common+0xfd/0x110 [ 305.391868] [] SyS_fchownat+0xaf/0x110 [ 305.393440] [] SyS_lchown+0x20/0x30 [ 305.394995] [] system_call_fastpath+0x1a/0x1f [ 305.399870] RIP [] assfail+0x22/0x30 [xfs] This fix adjust the assertion to check if the super block support both quota inodes or not. Signed-off-by: Jie Liu Reviewed-by: Christoph Hellwig Signed-off-by: Ben Myers (cherry picked from commit 5a01dd54f4a7fb513062070c5acef20d13cad980) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 27e0e54..104455b 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -618,7 +618,8 @@ xfs_setattr_nonsize( } if (!gid_eq(igid, gid)) { if (XFS_IS_QUOTA_RUNNING(mp) && XFS_IS_GQUOTA_ON(mp)) { - ASSERT(!XFS_IS_PQUOTA_ON(mp)); + ASSERT(xfs_sb_version_has_pquotino(&mp->m_sb) || + !XFS_IS_PQUOTA_ON(mp)); ASSERT(mask & ATTR_GID); ASSERT(gdqp); olddquot2 = xfs_qm_vop_chown(tp, ip, -- cgit v0.10.2 From 718cc6f88cbfc4fbd39609f28c4c86883945f90d Mon Sep 17 00:00:00 2001 From: Jie Liu Date: Tue, 26 Nov 2013 21:38:49 +0800 Subject: xfs: fix infinite loop by detaching the group/project hints from user dquot xfs_quota(8) will hang up if trying to turn group/project quota off before the user quota is off, this could be 100% reproduced by: # mount -ouquota,gquota /dev/sda7 /xfs # mkdir /xfs/test # xfs_quota -xc 'off -g' /xfs <-- hangs up # echo w > /proc/sysrq-trigger # dmesg SysRq : Show Blocked State task PC stack pid father xfs_quota D 0000000000000000 0 27574 2551 0x00000000 [snip] Call Trace: [] schedule+0xad/0xc0 [] schedule_timeout+0x35e/0x3c0 [] ? mark_held_locks+0x176/0x1c0 [] ? call_timer_fn+0x2c0/0x2c0 [] ? 
xfs_qm_shrink_count+0x30/0x30 [xfs] [] schedule_timeout_uninterruptible+0x26/0x30 [] xfs_qm_dquot_walk+0x235/0x260 [xfs] [] ? xfs_perag_get+0x1d8/0x2d0 [xfs] [] ? xfs_perag_get+0x5/0x2d0 [xfs] [] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs] [] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs] [] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs] [] xfs_qm_dqpurge_all+0x66/0xb0 [xfs] [] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs] [] xfs_fs_set_xstate+0x136/0x180 [xfs] [] do_quotactl+0x53a/0x6b0 [] ? iput+0x5b/0x90 [] SyS_quotactl+0x167/0x1d0 [] ? trace_hardirqs_on_thunk+0x3a/0x3f [] system_call_fastpath+0x16/0x1b It's fine if we turn user quota off at first, then turn off other kind of quotas if they are enabled since the group/project dquot refcount is decreased to zero once the user quota if off. Otherwise, those dquots refcount is non-zero due to the user dquot might refer to them as hint(s). Hence, above operation cause an infinite loop at xfs_qm_dquot_walk() while trying to purge dquot cache. This problem has been around since Linux 3.4, it was introduced by: [ b84a3a9675 xfs: remove the per-filesystem list of dquots ] Originally we will release the group dquot pointers because the user dquots maybe carrying around as a hint via xfs_qm_detach_gdquots(). However, with above change, there is no such work to be done before purging group/project dquot cache. In order to solve this problem, this patch introduces a special routine xfs_qm_dqpurge_hints(), and it would release the group/project dquot pointers the user dquots maybe carrying around as a hint, and then it will proceed to purge the user dquot cache if requested. Cc: stable@vger.kernel.org Signed-off-by: Jie Liu Reviewed-by: Dave Chinner Signed-off-by: Ben Myers (cherry picked from commit df8052e7dae00bde6f21b40b6e3e1099770f3afc) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 588e490..dd88f0e 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -134,8 +134,6 @@ xfs_qm_dqpurge( { struct xfs_mount *mp = dqp->q_mount; struct xfs_quotainfo *qi = mp->m_quotainfo; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; xfs_dqlock(dqp); if ((dqp->dq_flags & XFS_DQ_FREEING) || dqp->q_nrefs != 0) { @@ -143,21 +141,6 @@ xfs_qm_dqpurge( return EAGAIN; } - /* - * If this quota has a hint attached, prepare for releasing it now. - */ - gdqp = dqp->q_gdquot; - if (gdqp) { - xfs_dqlock(gdqp); - dqp->q_gdquot = NULL; - } - - pdqp = dqp->q_pdquot; - if (pdqp) { - xfs_dqlock(pdqp); - dqp->q_pdquot = NULL; - } - dqp->dq_flags |= XFS_DQ_FREEING; xfs_dqflock(dqp); @@ -206,11 +189,47 @@ xfs_qm_dqpurge( XFS_STATS_DEC(xs_qm_dquot_unused); xfs_qm_dqdestroy(dqp); + return 0; +} + +/* + * Release the group or project dquot pointers the user dquots maybe carrying + * around as a hint, and proceed to purge the user dquot cache if requested. 
+*/ +STATIC int +xfs_qm_dqpurge_hints( + struct xfs_dquot *dqp, + void *data) +{ + struct xfs_dquot *gdqp = NULL; + struct xfs_dquot *pdqp = NULL; + uint flags = *((uint *)data); + + xfs_dqlock(dqp); + if (dqp->dq_flags & XFS_DQ_FREEING) { + xfs_dqunlock(dqp); + return EAGAIN; + } + + /* If this quota has a hint attached, prepare for releasing it now */ + gdqp = dqp->q_gdquot; + if (gdqp) + dqp->q_gdquot = NULL; + + pdqp = dqp->q_pdquot; + if (pdqp) + dqp->q_pdquot = NULL; + + xfs_dqunlock(dqp); if (gdqp) - xfs_qm_dqput(gdqp); + xfs_qm_dqrele(gdqp); if (pdqp) - xfs_qm_dqput(pdqp); + xfs_qm_dqrele(pdqp); + + if (flags & XFS_QMOPT_UQUOTA) + return xfs_qm_dqpurge(dqp, NULL); + return 0; } @@ -222,8 +241,18 @@ xfs_qm_dqpurge_all( struct xfs_mount *mp, uint flags) { - if (flags & XFS_QMOPT_UQUOTA) - xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge, NULL); + /* + * We have to release group/project dquot hint(s) from the user dquot + * at first if they are there, otherwise we would run into an infinite + * loop while walking through radix tree to purge other type of dquots + * since their refcount is not zero if the user dquot refers to them + * as hint. + * + * Call the special xfs_qm_dqpurge_hints() will end up go through the + * general xfs_qm_dqpurge() against user dquot cache if requested. + */ + xfs_qm_dquot_walk(mp, XFS_DQ_USER, xfs_qm_dqpurge_hints, &flags); + if (flags & XFS_QMOPT_GQUOTA) xfs_qm_dquot_walk(mp, XFS_DQ_GROUP, xfs_qm_dqpurge, NULL); if (flags & XFS_QMOPT_PQUOTA) -- cgit v0.10.2 From 809625ca544bdc9e02a5728c1939a76dc29decb1 Mon Sep 17 00:00:00 2001 From: Namjae Jeon Date: Sun, 8 Dec 2013 23:33:50 +0900 Subject: MAINTAINERS: fix incorrect mail address of XFS maintainer When I tried to send the patches to XFS Maintainers, I got returned mail included delivery fail message for Dave's mail. Maybe, Dave Chinner mail address is incorrect. I try to fix it correctly. Signed-off-by: Namjae Jeon Reviewed-by: Ben Myers Signed-off-by: Ben Myers (cherry picked from commit db10bddc7d4f412bcd8630fc479fa1eb009e325b) diff --git a/MAINTAINERS b/MAINTAINERS index f216db8..f5d0bdd 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -9530,7 +9530,7 @@ F: drivers/xen/*swiotlb* XFS FILESYSTEM P: Silicon Graphics Inc -M: Dave Chinner +M: Dave Chinner M: Ben Myers M: xfs@oss.sgi.com L: xfs@oss.sgi.com -- cgit v0.10.2 From 6e708bcf6583f663da9fe7bc5292cc62f0a8410d Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Fri, 22 Nov 2013 10:41:16 +1100 Subject: xfs: align initial file allocations correctly The function xfs_bmap_isaeof() is used to indicate that an allocation is occurring at or past the end of file, and as such should be aligned to the underlying storage geometry if possible. Commit 27a3f8f ("xfs: introduce xfs_bmap_last_extent") changed the behaviour of this function for empty files - it turned off allocation alignment for this case accidentally. Hence large initial allocations from direct IO are not getting correctly aligned to the underlying geometry, and that is cause write performance to drop in alignment sensitive configurations. Fix it by considering allocation into empty files as requiring aligned allocation again. 
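The change itself is tiny: an empty fork is now treated as an at-EOF allocation so the stripe alignment logic still applies, roughly as in the hunk below:

    error = xfs_bmap_last_extent(NULL, bma->ip, whichfork, &rec, &is_empty);
    if (error)
            return error;

    if (is_empty) {
            bma->aeof = 1;          /* empty file: first write is "at EOF",
                                       so align it to the stripe geometry */
            return 0;
    }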
Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Signed-off-by: Ben Myers (cherry picked from commit f9b395a8ef8f34d19cae2cde361e19c96e097fad) diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 3ef11b2..8401f11 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -1635,7 +1635,7 @@ xfs_bmap_last_extent( * blocks at the end of the file which do not start at the previous data block, * we will try to align the new blocks at stripe unit boundaries. * - * Returns 0 in bma->aeof if the file (fork) is empty as any new write will be + * Returns 1 in bma->aeof if the file (fork) is empty as any new write will be * at, or past the EOF. */ STATIC int @@ -1650,9 +1650,14 @@ xfs_bmap_isaeof( bma->aeof = 0; error = xfs_bmap_last_extent(NULL, bma->ip, whichfork, &rec, &is_empty); - if (error || is_empty) + if (error) return error; + if (is_empty) { + bma->aeof = 1; + return 0; + } + /* * Check if we are allocation or past the last extent, or at least into * the last delayed allocated extent. -- cgit v0.10.2 From 83a0adc3f93aae4ab9c59113e3145c7bdb2b4a8c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 17 Dec 2013 00:03:52 -0800 Subject: xfs: remove xfsbdstrat error The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that handles the case of a shut down filesystem. Most of the users have private, uncached buffers that can just be freed in this case, but the complex error handling in xfs_bioerror_relse messes up the case when it's called without a locked buffer. Remove xfsbdstrat and opencode the error handling in the callers. All but one can simply return an error and don't need to deal with buffer state, and the one caller that cares about the buffer state could do with a major cleanup as well, but we'll defer that to later. Signed-off-by: Christoph Hellwig Reviewed-by: Ben Myers Signed-off-by: Ben Myers diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 5887e41..1394106 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1187,7 +1187,12 @@ xfs_zero_remaining_bytes( XFS_BUF_UNWRITE(bp); XFS_BUF_READ(bp); XFS_BUF_SET_ADDR(bp, xfs_fsb_to_db(ip, imap.br_startblock)); - xfsbdstrat(mp, bp); + + if (XFS_FORCED_SHUTDOWN(mp)) { + error = XFS_ERROR(EIO); + break; + } + xfs_buf_iorequest(bp); error = xfs_buf_iowait(bp); if (error) { xfs_buf_ioerror_alert(bp, @@ -1200,7 +1205,12 @@ xfs_zero_remaining_bytes( XFS_BUF_UNDONE(bp); XFS_BUF_UNREAD(bp); XFS_BUF_WRITE(bp); - xfsbdstrat(mp, bp); + + if (XFS_FORCED_SHUTDOWN(mp)) { + error = XFS_ERROR(EIO); + break; + } + xfs_buf_iorequest(bp); error = xfs_buf_iowait(bp); if (error) { xfs_buf_ioerror_alert(bp, diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index c7f0b77..9fa9c43 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -698,7 +698,11 @@ xfs_buf_read_uncached( bp->b_flags |= XBF_READ; bp->b_ops = ops; - xfsbdstrat(target->bt_mount, bp); + if (XFS_FORCED_SHUTDOWN(target->bt_mount)) { + xfs_buf_relse(bp); + return NULL; + } + xfs_buf_iorequest(bp); xfs_buf_iowait(bp); return bp; } @@ -1089,7 +1093,7 @@ xfs_bioerror( * This is meant for userdata errors; metadata bufs come with * iodone functions attached, so that we can track down errors. */ -STATIC int +int xfs_bioerror_relse( struct xfs_buf *bp) { @@ -1164,25 +1168,6 @@ xfs_bwrite( return error; } -/* - * Wrapper around bdstrat so that we can stop data from going to disk in case - * we are shutting down the filesystem. Typically user data goes thru this - * path; one of the exceptions is the superblock. 
- */ -void -xfsbdstrat( - struct xfs_mount *mp, - struct xfs_buf *bp) -{ - if (XFS_FORCED_SHUTDOWN(mp)) { - trace_xfs_bdstrat_shut(bp, _RET_IP_); - xfs_bioerror_relse(bp); - return; - } - - xfs_buf_iorequest(bp); -} - STATIC void _xfs_buf_ioend( xfs_buf_t *bp, diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index e656833..7e41b08 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -269,9 +269,6 @@ extern void xfs_buf_unlock(xfs_buf_t *); /* Buffer Read and Write Routines */ extern int xfs_bwrite(struct xfs_buf *bp); - -extern void xfsbdstrat(struct xfs_mount *, struct xfs_buf *); - extern void xfs_buf_ioend(xfs_buf_t *, int); extern void xfs_buf_ioerror(xfs_buf_t *, int); extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func); @@ -282,6 +279,8 @@ extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *, #define xfs_buf_zero(bp, off, len) \ xfs_buf_iomove((bp), (off), (len), NULL, XBRW_ZERO) +extern int xfs_bioerror_relse(struct xfs_buf *); + static inline int xfs_buf_geterror(xfs_buf_t *bp) { return bp ? bp->b_error : ENOMEM; diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index b6b669d..eae1692 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -193,7 +193,10 @@ xlog_bread_noalign( bp->b_io_length = nbblks; bp->b_error = 0; - xfsbdstrat(log->l_mp, bp); + if (XFS_FORCED_SHUTDOWN(log->l_mp)) + return XFS_ERROR(EIO); + + xfs_buf_iorequest(bp); error = xfs_buf_iowait(bp); if (error) xfs_buf_ioerror_alert(bp, __func__); @@ -4397,7 +4400,13 @@ xlog_do_recover( XFS_BUF_READ(bp); XFS_BUF_UNASYNC(bp); bp->b_ops = &xfs_sb_buf_ops; - xfsbdstrat(log->l_mp, bp); + + if (XFS_FORCED_SHUTDOWN(log->l_mp)) { + xfs_buf_relse(bp); + return XFS_ERROR(EIO); + } + + xfs_buf_iorequest(bp); error = xfs_buf_iowait(bp); if (error) { xfs_buf_ioerror_alert(bp, __func__); diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c index c035d11..647b6f1 100644 --- a/fs/xfs/xfs_trans_buf.c +++ b/fs/xfs/xfs_trans_buf.c @@ -314,7 +314,18 @@ xfs_trans_read_buf_map( ASSERT(bp->b_iodone == NULL); XFS_BUF_READ(bp); bp->b_ops = ops; - xfsbdstrat(tp->t_mountp, bp); + + /* + * XXX(hch): clean up the error handling here to be less + * of a mess.. + */ + if (XFS_FORCED_SHUTDOWN(mp)) { + trace_xfs_bdstrat_shut(bp, _RET_IP_); + xfs_bioerror_relse(bp); + } else { + xfs_buf_iorequest(bp); + } + error = xfs_buf_iowait(bp); if (error) { xfs_buf_ioerror_alert(bp, __func__); -- cgit v0.10.2 From 33177f05364c6cd13b06d0f3500dad07cf4647c2 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 12 Dec 2013 16:34:36 +1100 Subject: xfs: swalloc doesn't align allocations properly When swalloc is specified as a mount option, allocations are supposed to be aligned to the stripe width rather than the stripe unit of the underlying filesystem. However, it does not do this. What the implementation does is round up the allocation size to a stripe width, hence ensuring that all allocations span a full stripe width. It does not, however, ensure that that allocation is aligned to a stripe width, and hence the allocations can span multiple underlying stripes and so still see RMW cycles for things like direct IO on MD RAID. So, if the swalloc mount option is set, change the allocation alignment in xfs_bmap_btalloc() to use the stripe width rather than the stripe unit. 
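The patch reduces to picking the alignment target once, up front, and then using it wherever the old code hard-wired m_dalign; in outline (see the xfs_bmap_btalloc() hunk below):

    stripe_align = 0;
    if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC))
            stripe_align = mp->m_swidth;    /* swalloc: align to stripe width */
    else if (mp->m_dalign)
            stripe_align = mp->m_dalign;    /* default: align to stripe unit */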
Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Reviewed-by: Ben Myers Signed-off-by: Ben Myers diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 8401f11..3b2c14b 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -3648,10 +3648,19 @@ xfs_bmap_btalloc( int isaligned; int tryagain; int error; + int stripe_align; ASSERT(ap->length); mp = ap->ip->i_mount; + + /* stripe alignment for allocation is determined by mount parameters */ + stripe_align = 0; + if (mp->m_swidth && (mp->m_flags & XFS_MOUNT_SWALLOC)) + stripe_align = mp->m_swidth; + else if (mp->m_dalign) + stripe_align = mp->m_dalign; + align = ap->userdata ? xfs_get_extsz_hint(ap->ip) : 0; if (unlikely(align)) { error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, @@ -3660,6 +3669,8 @@ xfs_bmap_btalloc( ASSERT(!error); ASSERT(ap->length); } + + nullfb = *ap->firstblock == NULLFSBLOCK; fb_agno = nullfb ? NULLAGNUMBER : XFS_FSB_TO_AGNO(mp, *ap->firstblock); if (nullfb) { @@ -3735,7 +3746,7 @@ xfs_bmap_btalloc( */ if (!ap->flist->xbf_low && ap->aeof) { if (!ap->offset) { - args.alignment = mp->m_dalign; + args.alignment = stripe_align; atype = args.type; isaligned = 1; /* @@ -3760,13 +3771,13 @@ xfs_bmap_btalloc( * of minlen+alignment+slop doesn't go up * between the calls. */ - if (blen > mp->m_dalign && blen <= args.maxlen) - nextminlen = blen - mp->m_dalign; + if (blen > stripe_align && blen <= args.maxlen) + nextminlen = blen - stripe_align; else nextminlen = args.minlen; - if (nextminlen + mp->m_dalign > args.minlen + 1) + if (nextminlen + stripe_align > args.minlen + 1) args.minalignslop = - nextminlen + mp->m_dalign - + nextminlen + stripe_align - args.minlen - 1; else args.minalignslop = 0; @@ -3788,7 +3799,7 @@ xfs_bmap_btalloc( */ args.type = atype; args.fsbno = ap->blkno; - args.alignment = mp->m_dalign; + args.alignment = stripe_align; args.minlen = nextminlen; args.minalignslop = 0; isaligned = 1; -- cgit v0.10.2 From ac8809f9ab01a73de1a47b5a37bd8dcca8712fb3 Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Thu, 12 Dec 2013 16:34:38 +1100 Subject: xfs: abort metadata writeback on permanent errors If we are doing aysnc writeback of metadata, we can get write errors but have nobody to report them to. At the moment, we simply attempt to reissue the write from io completion in the hope that it's a transient error. When it's not a transient error, the buffer is stuck forever in this loop, and we cannot break out of it. Eventually, unmount will hang because the AIL cannot be emptied and everything goes downhill from them. To solve this problem, only retry the write IO once before aborting it. We don't throw the buffer away because some transient errors can last minutes (e.g. FC path failover) or even hours (thin provisioned devices that have run out of backing space) before they go away. Hence we really want to keep trying until we can't try any more. Because the buffer was not cleaned, however, it does not get removed from the AIL and hence the next pass across the AIL will start IO on it again. As such, we still get the "retry forever" semantics that we currently have, but we allow other access to the buffer in the mean time. Meanwhile the filesystem can continue to modify the buffer and relog it, so the IO errors won't hang the log or the filesystem. Now when we are pushing the AIL, we can see all these "permanent IO error" buffers and we can issue a warning about failures before we retry the IO. We can also catch these buffers when unmounting an issue a corruption warning, too. 
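The resulting completion-time policy is compact: resubmit a failed async write exactly once, and mark the buffer so that later failures leave it dirty in the AIL, where the push code warns (rate-limited) and keeps retrying. Condensed from the xfs_buf_iodone_callbacks() hunk further below:

    if (!(bp->b_flags & (XBF_STALE | XBF_WRITE_FAIL))) {
            /* first failure: retry once and remember that we did */
            bp->b_flags |= XBF_WRITE | XBF_ASYNC | XBF_DONE | XBF_WRITE_FAIL;
            xfs_buf_iorequest(bp);
    } else {
            /* repeat failure: release the buffer; it stays dirty in the
             * AIL and the next AIL push will warn and try again */
            xfs_buf_relse(bp);
    }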
Signed-off-by: Dave Chinner Reviewed-by: Christoph Hellwig Signed-off-by: Ben Myers diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 9fa9c43..afe7645 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -1156,7 +1156,7 @@ xfs_bwrite( ASSERT(xfs_buf_islocked(bp)); bp->b_flags |= XBF_WRITE; - bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q); + bp->b_flags &= ~(XBF_ASYNC | XBF_READ | _XBF_DELWRI_Q | XBF_WRITE_FAIL); xfs_bdstrat_cb(bp); @@ -1501,6 +1501,12 @@ xfs_wait_buftarg( struct xfs_buf *bp; bp = list_first_entry(&dispose, struct xfs_buf, b_lru); list_del_init(&bp->b_lru); + if (bp->b_flags & XBF_WRITE_FAIL) { + xfs_alert(btp->bt_mount, +"Corruption Alert: Buffer at block 0x%llx had permanent write failures!\n" +"Please run xfs_repair to determine the extent of the problem.", + (long long)bp->b_bn); + } xfs_buf_rele(bp); } if (loop++ != 0) @@ -1784,7 +1790,7 @@ __xfs_buf_delwri_submit( blk_start_plug(&plug); list_for_each_entry_safe(bp, n, io_list, b_list) { - bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC); + bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_ASYNC | XBF_WRITE_FAIL); bp->b_flags |= XBF_WRITE; if (!wait) { diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 7e41b08..1cf21a4 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -45,6 +45,7 @@ typedef enum { #define XBF_ASYNC (1 << 4) /* initiator will not wait for completion */ #define XBF_DONE (1 << 5) /* all pages in the buffer uptodate */ #define XBF_STALE (1 << 6) /* buffer has been staled, do not find it */ +#define XBF_WRITE_FAIL (1 << 24)/* async writes have failed on this buffer */ /* I/O hints for the BIO layer */ #define XBF_SYNCIO (1 << 10)/* treat this buffer as synchronous I/O */ @@ -70,6 +71,7 @@ typedef unsigned int xfs_buf_flags_t; { XBF_ASYNC, "ASYNC" }, \ { XBF_DONE, "DONE" }, \ { XBF_STALE, "STALE" }, \ + { XBF_WRITE_FAIL, "WRITE_FAIL" }, \ { XBF_SYNCIO, "SYNCIO" }, \ { XBF_FUA, "FUA" }, \ { XBF_FLUSH, "FLUSH" }, \ @@ -80,6 +82,7 @@ typedef unsigned int xfs_buf_flags_t; { _XBF_DELWRI_Q, "DELWRI_Q" }, \ { _XBF_COMPOUND, "COMPOUND" } + /* * Internal state flags. */ @@ -300,7 +303,8 @@ extern void xfs_buf_terminate(void); #define XFS_BUF_ZEROFLAGS(bp) \ ((bp)->b_flags &= ~(XBF_READ|XBF_WRITE|XBF_ASYNC| \ - XBF_SYNCIO|XBF_FUA|XBF_FLUSH)) + XBF_SYNCIO|XBF_FUA|XBF_FLUSH| \ + XBF_WRITE_FAIL)) void xfs_buf_stale(struct xfs_buf *bp); #define XFS_BUF_UNSTALE(bp) ((bp)->b_flags &= ~XBF_STALE) diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index a64f67b..2227b9b 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -496,6 +496,14 @@ xfs_buf_item_unpin( } } +/* + * Buffer IO error rate limiting. Limit it to no more than 10 messages per 30 + * seconds so as to not spam logs too much on repeated detection of the same + * buffer being bad.. + */ + +DEFINE_RATELIMIT_STATE(xfs_buf_write_fail_rl_state, 30 * HZ, 10); + STATIC uint xfs_buf_item_push( struct xfs_log_item *lip, @@ -524,6 +532,14 @@ xfs_buf_item_push( trace_xfs_buf_item_push(bip); + /* has a previous flush failed due to IO errors? */ + if ((bp->b_flags & XBF_WRITE_FAIL) && + ___ratelimit(&xfs_buf_write_fail_rl_state, "XFS:")) { + xfs_warn(bp->b_target->bt_mount, +"Detected failing async write on buffer block 0x%llx. 
Retrying async write.\n", + (long long)bp->b_bn); + } + if (!xfs_buf_delwri_queue(bp, buffer_list)) rval = XFS_ITEM_FLUSHING; xfs_buf_unlock(bp); @@ -1096,8 +1112,9 @@ xfs_buf_iodone_callbacks( xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the flag */ - if (!XFS_BUF_ISSTALE(bp)) { - bp->b_flags |= XBF_WRITE | XBF_ASYNC | XBF_DONE; + if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) { + bp->b_flags |= XBF_WRITE | XBF_ASYNC | + XBF_DONE | XBF_WRITE_FAIL; xfs_buf_iorequest(bp); } else { xfs_buf_relse(bp); -- cgit v0.10.2 From ed697e1aaf7237b1a62af39f64463b05c262808d Mon Sep 17 00:00:00 2001 From: JongHo Kim Date: Tue, 17 Dec 2013 23:02:24 +0900 Subject: ALSA: Add SNDRV_PCM_STATE_PAUSED case in wait_for_avail function When the process is sleeping at the SNDRV_PCM_STATE_PAUSED state from the wait_for_avail function, the sleep process will be woken by timeout(10 seconds). Even if the sleep process wake up by timeout, by this patch, the process will continue with sleep and wait for the other state. Signed-off-by: JongHo Kim Cc: Signed-off-by: Takashi Iwai diff --git a/sound/core/pcm_lib.c b/sound/core/pcm_lib.c index 6e03b46..a210467 100644 --- a/sound/core/pcm_lib.c +++ b/sound/core/pcm_lib.c @@ -1937,6 +1937,8 @@ static int wait_for_avail(struct snd_pcm_substream *substream, case SNDRV_PCM_STATE_DISCONNECTED: err = -EBADFD; goto _endloop; + case SNDRV_PCM_STATE_PAUSED: + continue; } if (!tout) { snd_printd("%s write error (DMA or IRQ trouble?)\n", -- cgit v0.10.2 From d24c195f90cb1adb178d26d84c722d4b9e551e05 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Tue, 10 Dec 2013 12:56:59 +0200 Subject: serial: 8250_dw: add new ACPI IDs Newer Intel PCHs with LPSS have the same Designware controllers than Haswell but ACPI IDs are different. Add these IDs to the driver list. Signed-off-by: Mika Westerberg Cc: stable Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index 5f096c7..06525f1 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -457,6 +457,8 @@ MODULE_DEVICE_TABLE(of, dw8250_of_match); static const struct acpi_device_id dw8250_acpi_match[] = { { "INT33C4", 0 }, { "INT33C5", 0 }, + { "INT3434", 0 }, + { "INT3435", 0 }, { "80860F0A", 0 }, { }, }; -- cgit v0.10.2 From 1075a6e2dc7e2a96efc417b98dd98f57fdae985d Mon Sep 17 00:00:00 2001 From: Peter Hurley Date: Mon, 9 Dec 2013 18:06:07 -0500 Subject: n_tty: Fix apparent order of echoed output With block processing of echoed output, observed output order is still required. Push completed echoes and echo commands prior to output. Introduce echo_mark echo buffer index, which tracks completed echo commands; ie., those submitted via commit_echoes but which may not have been committed. Ensure that completed echoes are output prior to subsequent terminal writes in process_echoes(). Fixes newline/prompt output order in cooked mode shell. 
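In short, commit_echoes() now records how far completed echo operations reach (echo_mark), and process_echoes() publishes up to that mark before any new output goes out. A sketch of the ordering step (locking and termios checks omitted):

    static void process_echoes_sketch(struct tty_struct *tty)
    {
            struct n_tty_data *ldata = tty->disc_data;

            if (ldata->echo_mark == ldata->echo_tail)
                    return;                         /* nothing completed yet */

            ldata->echo_commit = ldata->echo_mark;  /* publish completed echoes */
            __process_echoes(tty);                  /* flush them before output */
    }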
Cc: # 3.12.x : 39434ab n_tty: Fix missing newline echo Reported-by: Karl Dahlke Reported-by: Mikulas Patocka Signed-off-by: Peter Hurley Tested-by: Karl Dahlke Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 268b627..34aacaa 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -93,6 +93,7 @@ struct n_tty_data { size_t canon_head; size_t echo_head; size_t echo_commit; + size_t echo_mark; DECLARE_BITMAP(char_map, 256); /* private to n_tty_receive_overrun (single-threaded) */ @@ -336,6 +337,7 @@ static void reset_buffer_flags(struct n_tty_data *ldata) { ldata->read_head = ldata->canon_head = ldata->read_tail = 0; ldata->echo_head = ldata->echo_tail = ldata->echo_commit = 0; + ldata->echo_mark = 0; ldata->line_start = 0; ldata->erasing = 0; @@ -787,6 +789,7 @@ static void commit_echoes(struct tty_struct *tty) size_t head; head = ldata->echo_head; + ldata->echo_mark = head; old = ldata->echo_commit - ldata->echo_tail; /* Process committed echoes if the accumulated # of bytes @@ -811,10 +814,11 @@ static void process_echoes(struct tty_struct *tty) size_t echoed; if ((!L_ECHO(tty) && !L_ECHONL(tty)) || - ldata->echo_commit == ldata->echo_tail) + ldata->echo_mark == ldata->echo_tail) return; mutex_lock(&ldata->output_lock); + ldata->echo_commit = ldata->echo_mark; echoed = __process_echoes(tty); mutex_unlock(&ldata->output_lock); @@ -822,6 +826,7 @@ static void process_echoes(struct tty_struct *tty) tty->ops->flush_chars(tty); } +/* NB: echo_mark and echo_head should be equivalent here */ static void flush_echoes(struct tty_struct *tty) { struct n_tty_data *ldata = tty->disc_data; -- cgit v0.10.2 From a885b3ccc74d8e38074e1c43a47c354c5ea0b01e Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Tue, 17 Dec 2013 14:34:50 +0000 Subject: drm/i915: Use the correct GMCH_CTRL register for Sandybridge+ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The GMCH_CTRL register (or MGCC in the spec) is at a different address on Sandybridge, and the address to which we currently write to is undefined. These stray writes appear to upset (hard hang) my Ivybridge machine whilst it is in UEFI mode. Note that the register is still marked as locked RO on Sandybridge, so vgaarb is still dysfunctional. Signed-off-by: Chris Wilson Cc: Jani Nikula Cc: Ville Syrjälä Cc: stable@vger.kernel.org Reviewed-by: Jani Nikula Signed-off-by: Daniel Vetter diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 4049a65..54e82a8 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11126,14 +11126,15 @@ void intel_connector_attach_encoder(struct intel_connector *connector, int intel_modeset_vga_set_state(struct drm_device *dev, bool state) { struct drm_i915_private *dev_priv = dev->dev_private; + unsigned reg = INTEL_INFO(dev)->gen >= 6 ? 
SNB_GMCH_CTRL : INTEL_GMCH_CTRL; u16 gmch_ctrl; - pci_read_config_word(dev_priv->bridge_dev, INTEL_GMCH_CTRL, &gmch_ctrl); + pci_read_config_word(dev_priv->bridge_dev, reg, &gmch_ctrl); if (state) gmch_ctrl &= ~INTEL_GMCH_VGA_DISABLE; else gmch_ctrl |= INTEL_GMCH_VGA_DISABLE; - pci_write_config_word(dev_priv->bridge_dev, INTEL_GMCH_CTRL, gmch_ctrl); + pci_write_config_word(dev_priv->bridge_dev, reg, gmch_ctrl); return 0; } -- cgit v0.10.2 From 657eb17d87852c42b55c4b06d5425baa08b2ddb3 Mon Sep 17 00:00:00 2001 From: Mathy Vanhoef Date: Thu, 28 Nov 2013 12:21:45 +0100 Subject: ath9k_htc: properly set MAC address and BSSID mask Pick the MAC address of the first virtual interface as the new hardware MAC address. Set BSSID mask according to this MAC address. This fixes CVE-2013-4579. Signed-off-by: Mathy Vanhoef Signed-off-by: John W. Linville diff --git a/drivers/net/wireless/ath/ath9k/htc_drv_main.c b/drivers/net/wireless/ath/ath9k/htc_drv_main.c index 9a2657f..608d739 100644 --- a/drivers/net/wireless/ath/ath9k/htc_drv_main.c +++ b/drivers/net/wireless/ath/ath9k/htc_drv_main.c @@ -127,21 +127,26 @@ static void ath9k_htc_bssid_iter(void *data, u8 *mac, struct ieee80211_vif *vif) struct ath9k_vif_iter_data *iter_data = data; int i; - for (i = 0; i < ETH_ALEN; i++) - iter_data->mask[i] &= ~(iter_data->hw_macaddr[i] ^ mac[i]); + if (iter_data->hw_macaddr != NULL) { + for (i = 0; i < ETH_ALEN; i++) + iter_data->mask[i] &= ~(iter_data->hw_macaddr[i] ^ mac[i]); + } else { + iter_data->hw_macaddr = mac; + } } -static void ath9k_htc_set_bssid_mask(struct ath9k_htc_priv *priv, +static void ath9k_htc_set_mac_bssid_mask(struct ath9k_htc_priv *priv, struct ieee80211_vif *vif) { struct ath_common *common = ath9k_hw_common(priv->ah); struct ath9k_vif_iter_data iter_data; /* - * Use the hardware MAC address as reference, the hardware uses it - * together with the BSSID mask when matching addresses. + * Pick the MAC address of the first interface as the new hardware + * MAC address. The hardware will use it together with the BSSID mask + * when matching addresses. */ - iter_data.hw_macaddr = common->macaddr; + iter_data.hw_macaddr = NULL; memset(&iter_data.mask, 0xff, ETH_ALEN); if (vif) @@ -153,6 +158,10 @@ static void ath9k_htc_set_bssid_mask(struct ath9k_htc_priv *priv, ath9k_htc_bssid_iter, &iter_data); memcpy(common->bssidmask, iter_data.mask, ETH_ALEN); + + if (iter_data.hw_macaddr) + memcpy(common->macaddr, iter_data.hw_macaddr, ETH_ALEN); + ath_hw_setbssidmask(common); } @@ -1063,7 +1072,7 @@ static int ath9k_htc_add_interface(struct ieee80211_hw *hw, goto out; } - ath9k_htc_set_bssid_mask(priv, vif); + ath9k_htc_set_mac_bssid_mask(priv, vif); priv->vif_slot |= (1 << avp->index); priv->nvifs++; @@ -1128,7 +1137,7 @@ static void ath9k_htc_remove_interface(struct ieee80211_hw *hw, ath9k_htc_set_opmode(priv); - ath9k_htc_set_bssid_mask(priv, vif); + ath9k_htc_set_mac_bssid_mask(priv, vif); /* * Stop ANI only if there are no associated station interfaces. diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c index 74f452c..21aa09e 100644 --- a/drivers/net/wireless/ath/ath9k/main.c +++ b/drivers/net/wireless/ath/ath9k/main.c @@ -965,8 +965,9 @@ void ath9k_calculate_iter_data(struct ieee80211_hw *hw, struct ath_common *common = ath9k_hw_common(ah); /* - * Use the hardware MAC address as reference, the hardware uses it - * together with the BSSID mask when matching addresses. + * Pick the MAC address of the first interface as the new hardware + * MAC address. 
The hardware will use it together with the BSSID mask + * when matching addresses. */ memset(iter_data, 0, sizeof(*iter_data)); memset(&iter_data->mask, 0xff, ETH_ALEN); -- cgit v0.10.2 From 9278db6279e28d4d433bc8a848e10b4ece8793ed Mon Sep 17 00:00:00 2001 From: Larry Finger Date: Wed, 11 Dec 2013 17:13:10 -0600 Subject: rtlwifi: pci: Fix oops on driver unload On Fedora systems, unloading rtl8192ce causes an oops. This patch fixes the problem reported at https://bugzilla.redhat.com/show_bug.cgi?id=852761. Signed-off-by: Larry Finger Cc: Stable Signed-off-by: John W. Linville diff --git a/drivers/net/wireless/rtlwifi/pci.c b/drivers/net/wireless/rtlwifi/pci.c index 0f49444..5a53195 100644 --- a/drivers/net/wireless/rtlwifi/pci.c +++ b/drivers/net/wireless/rtlwifi/pci.c @@ -740,6 +740,8 @@ static void _rtl_pci_rx_interrupt(struct ieee80211_hw *hw) }; int index = rtlpci->rx_ring[rx_queue_idx].idx; + if (rtlpci->driver_is_goingto_unload) + return; /*RX NORMAL PKT */ while (count--) { /*rx descriptor */ @@ -1636,6 +1638,7 @@ static void rtl_pci_stop(struct ieee80211_hw *hw) */ set_hal_stop(rtlhal); + rtlpci->driver_is_goingto_unload = true; rtlpriv->cfg->ops->disable_interrupt(hw); cancel_work_sync(&rtlpriv->works.lps_change_work); @@ -1653,7 +1656,6 @@ static void rtl_pci_stop(struct ieee80211_hw *hw) ppsc->rfchange_inprogress = true; spin_unlock_irqrestore(&rtlpriv->locks.rf_ps_lock, flags); - rtlpci->driver_is_goingto_unload = true; rtlpriv->cfg->ops->hw_disable(hw); /* some things are not needed if firmware not available */ if (!rtlpriv->max_fw_size) -- cgit v0.10.2 From 73f0b56a1ff64e7fb6c3a62088804bab93bcedc2 Mon Sep 17 00:00:00 2001 From: Sujith Manoharan Date: Mon, 16 Dec 2013 07:04:59 +0530 Subject: ath9k: Fix interrupt handling for the AR9002 family This patch adds a driver workaround for a HW issue. A race condition in the HW results in missing interrupts, which can be avoided by a read/write with the ISR register. All chips in the AR9002 series are affected by this bug - AR9003 and above do not have this problem. Cc: stable@vger.kernel.org Cc: Felix Fietkau Signed-off-by: Sujith Manoharan Signed-off-by: John W. 
Linville diff --git a/drivers/net/wireless/ath/ath9k/ar9002_mac.c b/drivers/net/wireless/ath/ath9k/ar9002_mac.c index 8d78253..a366d6b 100644 --- a/drivers/net/wireless/ath/ath9k/ar9002_mac.c +++ b/drivers/net/wireless/ath/ath9k/ar9002_mac.c @@ -76,9 +76,16 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked) mask2 |= ATH9K_INT_CST; if (isr2 & AR_ISR_S2_TSFOOR) mask2 |= ATH9K_INT_TSFOOR; + + if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) { + REG_WRITE(ah, AR_ISR_S2, isr2); + isr &= ~AR_ISR_BCNMISC; + } } - isr = REG_READ(ah, AR_ISR_RAC); + if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) + isr = REG_READ(ah, AR_ISR_RAC); + if (isr == 0xffffffff) { *masked = 0; return false; @@ -97,11 +104,23 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked) *masked |= ATH9K_INT_TX; - s0_s = REG_READ(ah, AR_ISR_S0_S); + if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) { + s0_s = REG_READ(ah, AR_ISR_S0_S); + s1_s = REG_READ(ah, AR_ISR_S1_S); + } else { + s0_s = REG_READ(ah, AR_ISR_S0); + REG_WRITE(ah, AR_ISR_S0, s0_s); + s1_s = REG_READ(ah, AR_ISR_S1); + REG_WRITE(ah, AR_ISR_S1, s1_s); + + isr &= ~(AR_ISR_TXOK | + AR_ISR_TXDESC | + AR_ISR_TXERR | + AR_ISR_TXEOL); + } + ah->intr_txqs |= MS(s0_s, AR_ISR_S0_QCU_TXOK); ah->intr_txqs |= MS(s0_s, AR_ISR_S0_QCU_TXDESC); - - s1_s = REG_READ(ah, AR_ISR_S1_S); ah->intr_txqs |= MS(s1_s, AR_ISR_S1_QCU_TXERR); ah->intr_txqs |= MS(s1_s, AR_ISR_S1_QCU_TXEOL); } @@ -114,13 +133,15 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked) *masked |= mask2; } - if (AR_SREV_9100(ah)) - return true; - - if (isr & AR_ISR_GENTMR) { + if (!AR_SREV_9100(ah) && (isr & AR_ISR_GENTMR)) { u32 s5_s; - s5_s = REG_READ(ah, AR_ISR_S5_S); + if (pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED) { + s5_s = REG_READ(ah, AR_ISR_S5_S); + } else { + s5_s = REG_READ(ah, AR_ISR_S5); + } + ah->intr_gen_timer_trigger = MS(s5_s, AR_ISR_S5_GENTIMER_TRIG); @@ -133,8 +154,21 @@ static bool ar9002_hw_get_isr(struct ath_hw *ah, enum ath9k_int *masked) if ((s5_s & AR_ISR_S5_TIM_TIMER) && !(pCap->hw_caps & ATH9K_HW_CAP_AUTOSLEEP)) *masked |= ATH9K_INT_TIM_TIMER; + + if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) { + REG_WRITE(ah, AR_ISR_S5, s5_s); + isr &= ~AR_ISR_GENTMR; + } } + if (!(pCap->hw_caps & ATH9K_HW_CAP_RAC_SUPPORTED)) { + REG_WRITE(ah, AR_ISR, isr); + REG_READ(ah, AR_ISR); + } + + if (AR_SREV_9100(ah)) + return true; + if (sync_cause) { ath9k_debug_sync_cause(common, sync_cause); fatal_int = -- cgit v0.10.2 From c5fe7a41ad51e0c2df8381e9c85d7343b36b2c1e Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Wed, 11 Dec 2013 18:45:00 +0000 Subject: staging:iio:mag:hmc5843 fix incorrect endianness of channel as a result of missuse of the IIO_ST macro. This driver sets the shift value equal to IIO_BE (or 1) rather than setting that to 0 and specificying the endianness. This means the channel type is missreported as [be|le]:u16/16>>1 where the be|le is dependent on the cpu native endianness, rather than be:u16/16>>0 resulting in any userspace code using this information, miss converting the channel and generating thoroughly trashed data. 
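The macro misuse is visible from its argument list alone: IIO_ST() takes sign, realbits, storagebits and shift, with no endianness slot, so the IIO_BE passed as the fourth argument became a bogus shift while endianness stayed at the CPU default. The fix spells the scan type out explicitly, as in the hunk below:

    .scan_type = {
            .sign = 's',
            .realbits = 16,
            .storagebits = 16,
            .shift = 0,             /* what the macro's 4th argument really sets */
            .endianness = IIO_BE,   /* what the driver actually meant */
    },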
Signed-off-by: Jonathan Cameron Acked-by: Lars-Peter Clausen Cc: stable@vger.kernel.org diff --git a/drivers/staging/iio/magnetometer/hmc5843.c b/drivers/staging/iio/magnetometer/hmc5843.c index 99421f9..0485d7f 100644 --- a/drivers/staging/iio/magnetometer/hmc5843.c +++ b/drivers/staging/iio/magnetometer/hmc5843.c @@ -451,7 +451,12 @@ done: .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_SCALE) | \ BIT(IIO_CHAN_INFO_SAMP_FREQ), \ .scan_index = idx, \ - .scan_type = IIO_ST('s', 16, 16, IIO_BE), \ + .scan_type = { \ + .sign = 's', \ + .realbits = 16, \ + .storagebits = 16, \ + .endianness = IIO_BE, \ + }, \ } static const struct iio_chan_spec hmc5843_channels[] = { -- cgit v0.10.2 From 3425c0f7ac61f2fcfb7f2728e9b7ba7e27aec429 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Wed, 11 Dec 2013 18:45:00 +0000 Subject: iio:imu:adis16400 fix pressure channel scan type A single channel in this driver was using the IIO_ST macro. This does not provide a parameter for setting the endianness of the channel. Thus this channel will have been reported as whatever is the native endianness of the cpu rather than big endian. This means it would be incorrect on little endian platforms. Signed-off-by: Jonathan Cameron Acked-by: Lars-Peter Clausen Cc: stable@vger.kernel.org diff --git a/drivers/iio/imu/adis16400_core.c b/drivers/iio/imu/adis16400_core.c index 3fb7757..368660d 100644 --- a/drivers/iio/imu/adis16400_core.c +++ b/drivers/iio/imu/adis16400_core.c @@ -651,7 +651,12 @@ static const struct iio_chan_spec adis16448_channels[] = { .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_SCALE), .address = ADIS16448_BARO_OUT, .scan_index = ADIS16400_SCAN_BARO, - .scan_type = IIO_ST('s', 16, 16, 0), + .scan_type = { + .sign = 's', + .realbits = 16, + .storagebits = 16, + .endianness = IIO_BE, + }, }, ADIS16400_TEMP_CHAN(ADIS16448_TEMP_OUT, 12), IIO_CHAN_SOFT_TIMESTAMP(11) -- cgit v0.10.2 From e39d99059ad7f75d7ae2d3c59055d3c476cdb0d9 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Wed, 11 Dec 2013 18:45:00 +0000 Subject: iio:adc:ad7887 Fix channel reported endianness from cpu to big endian Note this also sets the endianness to big endian whereas it would previously have defaulted to the cpu endian. Hence technically this is a bug fix on LE platforms. 
Signed-off-by: Jonathan Cameron Acked-by: Lars-Peter Clausen Cc: stable@vger.kernel.org diff --git a/drivers/iio/adc/ad7887.c b/drivers/iio/adc/ad7887.c index acb7f90..749a6ca 100644 --- a/drivers/iio/adc/ad7887.c +++ b/drivers/iio/adc/ad7887.c @@ -200,7 +200,13 @@ static const struct ad7887_chip_info ad7887_chip_info_tbl[] = { .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_SCALE), .address = 1, .scan_index = 1, - .scan_type = IIO_ST('u', 12, 16, 0), + .scan_type = { + .sign = 'u', + .realbits = 12, + .storagebits = 16, + .shift = 0, + .endianness = IIO_BE, + }, }, .channel[1] = { .type = IIO_VOLTAGE, @@ -210,7 +216,13 @@ static const struct ad7887_chip_info ad7887_chip_info_tbl[] = { .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_SCALE), .address = 0, .scan_index = 0, - .scan_type = IIO_ST('u', 12, 16, 0), + .scan_type = { + .sign = 'u', + .realbits = 12, + .storagebits = 16, + .shift = 0, + .endianness = IIO_BE, + }, }, .channel[2] = IIO_CHAN_SOFT_TIMESTAMP(2), .int_vref_mv = 2500, -- cgit v0.10.2 From 0283f7a100882684ad32b768f9f1ad81658a0b92 Mon Sep 17 00:00:00 2001 From: Ian Abbott Date: Fri, 13 Dec 2013 12:00:30 +0000 Subject: staging: comedi: 8255_pci: fix for newer PCI-DIO48H At some point, Measurement Computing / ComputerBoards redesigned the PCI-DIO48H to use a PLX PCI interface chip instead of an AMCC chip. This meant they had to put their hardware registers in the PCI BAR 2 region instead of PCI BAR 1. Unfortunately, they kept the same PCI device ID for the new design. This means the driver recognizes the newer cards, but doesn't work (and is likely to screw up the local configuration registers of the PLX chip) because it's using the wrong region. Since the PCI subvendor and subdevice IDs were both zero on the old design, but are the same as the vendor and device on the new design, we can tell the old design and new design apart easily enough. Split the existing entry for the PCI-DIO48H in `pci_8255_boards[]` into two new entries, referenced by different entries in the PCI device ID table `pci_8255_pci_table[]`. Use the same board name for both entries. 
Signed-off-by: Ian Abbott Cc: stablle # 3.10.y # 3.11.y # 3.12.y # 3.13.y Acked-by: H Hartley Sweeten Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/comedi/drivers/8255_pci.c b/drivers/staging/comedi/drivers/8255_pci.c index 432e3f9..c55f234 100644 --- a/drivers/staging/comedi/drivers/8255_pci.c +++ b/drivers/staging/comedi/drivers/8255_pci.c @@ -63,7 +63,8 @@ enum pci_8255_boardid { BOARD_ADLINK_PCI7296, BOARD_CB_PCIDIO24, BOARD_CB_PCIDIO24H, - BOARD_CB_PCIDIO48H, + BOARD_CB_PCIDIO48H_OLD, + BOARD_CB_PCIDIO48H_NEW, BOARD_CB_PCIDIO96H, BOARD_NI_PCIDIO96, BOARD_NI_PCIDIO96B, @@ -106,11 +107,16 @@ static const struct pci_8255_boardinfo pci_8255_boards[] = { .dio_badr = 2, .n_8255 = 1, }, - [BOARD_CB_PCIDIO48H] = { + [BOARD_CB_PCIDIO48H_OLD] = { .name = "cb_pci-dio48h", .dio_badr = 1, .n_8255 = 2, }, + [BOARD_CB_PCIDIO48H_NEW] = { + .name = "cb_pci-dio48h", + .dio_badr = 2, + .n_8255 = 2, + }, [BOARD_CB_PCIDIO96H] = { .name = "cb_pci-dio96h", .dio_badr = 2, @@ -263,7 +269,10 @@ static DEFINE_PCI_DEVICE_TABLE(pci_8255_pci_table) = { { PCI_VDEVICE(ADLINK, 0x7296), BOARD_ADLINK_PCI7296 }, { PCI_VDEVICE(CB, 0x0028), BOARD_CB_PCIDIO24 }, { PCI_VDEVICE(CB, 0x0014), BOARD_CB_PCIDIO24H }, - { PCI_VDEVICE(CB, 0x000b), BOARD_CB_PCIDIO48H }, + { PCI_DEVICE_SUB(PCI_VENDOR_ID_CB, 0x000b, 0x0000, 0x0000), + .driver_data = BOARD_CB_PCIDIO48H_OLD }, + { PCI_DEVICE_SUB(PCI_VENDOR_ID_CB, 0x000b, PCI_VENDOR_ID_CB, 0x000b), + .driver_data = BOARD_CB_PCIDIO48H_NEW }, { PCI_VDEVICE(CB, 0x0017), BOARD_CB_PCIDIO96H }, { PCI_VDEVICE(NI, 0x0160), BOARD_NI_PCIDIO96 }, { PCI_VDEVICE(NI, 0x1630), BOARD_NI_PCIDIO96B }, -- cgit v0.10.2 From c6236c0ce39c809c336ca929f68cf8ad02cf94e0 Mon Sep 17 00:00:00 2001 From: H Hartley Sweeten Date: Tue, 10 Dec 2013 16:31:25 -0700 Subject: staging: comedi: drivers: fix return value of comedi_load_firmware() Some of the callback functions that upload the firmware in the comedi drivers return a positive value indicating the number of bytes sent to the device. Detect this condition and just return '0' to indicate a successful upload. Reported-by: Bernd Porr Signed-off-by: H Hartley Sweeten Acked-by: Ian Abbott Acked-by: Dan Carpenter Cc: stable Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/comedi/drivers.c b/drivers/staging/comedi/drivers.c index 8f02bf6..4964d2a 100644 --- a/drivers/staging/comedi/drivers.c +++ b/drivers/staging/comedi/drivers.c @@ -446,7 +446,7 @@ int comedi_load_firmware(struct comedi_device *dev, release_firmware(fw); } - return ret; + return ret < 0 ? ret : 0; } EXPORT_SYMBOL_GPL(comedi_load_firmware); -- cgit v0.10.2 From fb5f1834c3221e459324c6885eaad75429f722a5 Mon Sep 17 00:00:00 2001 From: Boris BREZILLON Date: Sun, 8 Dec 2013 15:59:59 +0100 Subject: usb: ohci-at91: fix irq and iomem resource retrieval When using dt resources retrieval (interrupts and reg properties) there is no predefined order for these resources in the platform dev resources table. Retrieve resources using platform_get_resource and platform_get_irq functions instead of direct resource table entries to avoid resource type mismatch. 
Signed-off-by: Boris BREZILLON Acked-by: Nicolas Ferre Signed-off-by: Alan Stern Reviewed-by: Tomasz Figa Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/usb/host/ohci-at91.c b/drivers/usb/host/ohci-at91.c index 418444e..8c356af 100644 --- a/drivers/usb/host/ohci-at91.c +++ b/drivers/usb/host/ohci-at91.c @@ -136,23 +136,27 @@ static int usb_hcd_at91_probe(const struct hc_driver *driver, struct ohci_hcd *ohci; int retval; struct usb_hcd *hcd = NULL; - - if (pdev->num_resources != 2) { - pr_debug("hcd probe: invalid num_resources"); - return -ENODEV; + struct device *dev = &pdev->dev; + struct resource *res; + int irq; + + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!res) { + dev_dbg(dev, "hcd probe: missing memory resource\n"); + return -ENXIO; } - if ((pdev->resource[0].flags != IORESOURCE_MEM) - || (pdev->resource[1].flags != IORESOURCE_IRQ)) { - pr_debug("hcd probe: invalid resource type\n"); - return -ENODEV; + irq = platform_get_irq(pdev, 0); + if (irq < 0) { + dev_dbg(dev, "hcd probe: missing irq resource\n"); + return irq; } hcd = usb_create_hcd(driver, &pdev->dev, "at91"); if (!hcd) return -ENOMEM; - hcd->rsrc_start = pdev->resource[0].start; - hcd->rsrc_len = resource_size(&pdev->resource[0]); + hcd->rsrc_start = res->start; + hcd->rsrc_len = resource_size(res); if (!request_mem_region(hcd->rsrc_start, hcd->rsrc_len, hcd_name)) { pr_debug("request_mem_region failed\n"); @@ -199,7 +203,7 @@ static int usb_hcd_at91_probe(const struct hc_driver *driver, ohci->num_ports = board->ports; at91_start_hc(pdev); - retval = usb_add_hcd(hcd, pdev->resource[1].start, IRQF_SHARED); + retval = usb_add_hcd(hcd, irq, IRQF_SHARED); if (retval == 0) return retval; -- cgit v0.10.2 From 45c2aff645c82da7b1574dad5062993cf451c699 Mon Sep 17 00:00:00 2001 From: Gao feng Date: Mon, 16 Dec 2013 14:59:22 +0800 Subject: netfilter: nfnetlink_log: unset nf_loggers for netns when unloading module Steven Rostedt and Arnaldo Carvalho de Melo reported a panic when access the files /proc/sys/net/netfilter/nf_log/*. This problem will occur when we do: echo nfnetlink_log > /proc/sys/net/netfilter/nf_log/any_file rmmod nfnetlink_log and then access the files. Since the nf_loggers of netns hasn't been unset, it will point to the memory that has been freed. This bug is introduced by commit 9368a53c ("netfilter: nfnetlink_log: add net namespace support for nfnetlink_log"). [17261.822047] BUG: unable to handle kernel paging request at ffffffffa0d49090 [17261.822056] IP: [] nf_log_proc_dostring+0xf0/0x1d0 [...] [17261.822226] Call Trace: [17261.822235] [] ? security_capable+0x18/0x20 [17261.822240] [] ? ns_capable+0x29/0x50 [17261.822247] [] ? net_ctl_permissions+0x1f/0x90 [17261.822254] [] proc_sys_call_handler+0xb3/0xc0 [17261.822258] [] proc_sys_read+0x11/0x20 [17261.822265] [] vfs_read+0x9e/0x170 [17261.822270] [] SyS_read+0x49/0xa0 [17261.822276] [] ? 
__audit_syscall_exit+0x1f6/0x2a0 [17261.822283] [] system_call_fastpath+0x16/0x1b [17261.822285] Code: cc 81 4d 63 e4 4c 89 45 88 48 89 4d 90 e8 19 03 0d 00 4b 8b 84 e5 28 08 00 00 48 8b 4d 90 4c 8b 45 88 48 85 c0 0f 84 a8 00 00 00 <48> 8b 40 10 48 89 43 08 48 89 df 4c 89 f2 31 f6 e8 4b 35 af ff [17261.822329] RIP [] nf_log_proc_dostring+0xf0/0x1d0 [17261.822334] RSP [17261.822336] CR2: ffffffffa0d49090 [17261.822340] ---[ end trace a14ce54c0897a90d ]--- Reported-by: Arnaldo Carvalho de Melo Reported-by: Steven Rostedt Signed-off-by: Gao feng Signed-off-by: Pablo Neira Ayuso diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c index 3c4b69e..a155d19 100644 --- a/net/netfilter/nfnetlink_log.c +++ b/net/netfilter/nfnetlink_log.c @@ -1053,6 +1053,7 @@ static void __net_exit nfnl_log_net_exit(struct net *net) #ifdef CONFIG_PROC_FS remove_proc_entry("nfnetlink_log", net->nf.proc_netfilter); #endif + nf_log_unset(net, &nfulnl_logger); } static struct pernet_operations nfnl_log_net_ops = { -- cgit v0.10.2 From c2db11eca089b60148ded7467b956a547e8a2ae0 Mon Sep 17 00:00:00 2001 From: Soren Brinkmann Date: Mon, 2 Dec 2013 11:38:38 -0800 Subject: tty: xuartps: Properly guard sysrq specific code Commit 'tty: xuartps: Implement BREAK detection, add SYSRQ support' (0c0c47bc40a2e358d593b2d7fb93b50027fbfc0c) introduced sysrq support without properly guarding sysrq specific code which results in build errors when sysrq is disabled: DNAME=KBUILD_STR(xilinx_uartps)" -c -o drivers/tty/serial/.tmp_xilinx_uartps.o drivers/tty/serial/xilinx_uartps.c drivers/tty/serial/xilinx_uartps.c: In function 'xuartps_isr': drivers/tty/serial/xilinx_uartps.c:247:5: error: 'struct uart_port' has no member named 'sysrq' drivers/tty/serial/xilinx_uartps.c:247:5: error: 'struct uart_port' has no member named 'sysrq' drivers/tty/serial/xilinx_uartps.c:247:5: error: 'struct uart_port' has no member named 'sysrq' make[3]: *** [drivers/tty/serial/xilinx_uartps.o] Error 1 Reported-by: Masanari Iida Cc: Vlad Lungu Signed-off-by: Soren Brinkmann Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c index e46e9f3..f619ad5 100644 --- a/drivers/tty/serial/xilinx_uartps.c +++ b/drivers/tty/serial/xilinx_uartps.c @@ -240,6 +240,7 @@ static irqreturn_t xuartps_isr(int irq, void *dev_id) continue; } +#ifdef SUPPORT_SYSRQ /* * uart_handle_sysrq_char() doesn't work if * spinlocked, for some reason @@ -253,6 +254,7 @@ static irqreturn_t xuartps_isr(int irq, void *dev_id) } spin_lock(&port->lock); } +#endif port->icount.rx++; -- cgit v0.10.2 From 130f769e81fc472beb2211320777e26050e3fa15 Mon Sep 17 00:00:00 2001 From: Tomi Valkeinen Date: Mon, 16 Dec 2013 09:14:48 +0200 Subject: Revert "ARM: OMAP2+: Remove legacy mux code for display.c" Commit e30b06f4d5f000c31a7747a7e7ada78a5fd419a1 (ARM: OMAP2+: Remove legacy mux code for display.c) removed non-DT DSI and HDMI pinmuxing. However, DSI pinmuxing is still needed, and removing that caused DSI displays not to work. This reverts the DSI parts of the commit. 
Signed-off-by: Tomi Valkeinen Signed-off-by: Tony Lindgren diff --git a/arch/arm/mach-omap2/display.c b/arch/arm/mach-omap2/display.c index 58347bb..4cf1655 100644 --- a/arch/arm/mach-omap2/display.c +++ b/arch/arm/mach-omap2/display.c @@ -101,13 +101,51 @@ static const struct omap_dss_hwmod_data omap4_dss_hwmod_data[] __initconst = { { "dss_hdmi", "omapdss_hdmi", -1 }, }; +static int omap4_dsi_mux_pads(int dsi_id, unsigned lanes) +{ + u32 enable_mask, enable_shift; + u32 pipd_mask, pipd_shift; + u32 reg; + + if (dsi_id == 0) { + enable_mask = OMAP4_DSI1_LANEENABLE_MASK; + enable_shift = OMAP4_DSI1_LANEENABLE_SHIFT; + pipd_mask = OMAP4_DSI1_PIPD_MASK; + pipd_shift = OMAP4_DSI1_PIPD_SHIFT; + } else if (dsi_id == 1) { + enable_mask = OMAP4_DSI2_LANEENABLE_MASK; + enable_shift = OMAP4_DSI2_LANEENABLE_SHIFT; + pipd_mask = OMAP4_DSI2_PIPD_MASK; + pipd_shift = OMAP4_DSI2_PIPD_SHIFT; + } else { + return -ENODEV; + } + + reg = omap4_ctrl_pad_readl(OMAP4_CTRL_MODULE_PAD_CORE_CONTROL_DSIPHY); + + reg &= ~enable_mask; + reg &= ~pipd_mask; + + reg |= (lanes << enable_shift) & enable_mask; + reg |= (lanes << pipd_shift) & pipd_mask; + + omap4_ctrl_pad_writel(reg, OMAP4_CTRL_MODULE_PAD_CORE_CONTROL_DSIPHY); + + return 0; +} + static int omap_dsi_enable_pads(int dsi_id, unsigned lane_mask) { + if (cpu_is_omap44xx()) + return omap4_dsi_mux_pads(dsi_id, lane_mask); + return 0; } static void omap_dsi_disable_pads(int dsi_id, unsigned lane_mask) { + if (cpu_is_omap44xx()) + omap4_dsi_mux_pads(dsi_id, 0); } static int omap_dss_set_min_bus_tput(struct device *dev, unsigned long tput) -- cgit v0.10.2 From 82832046e285952f14e707647f9ef82466c883cc Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 11:33:24 +0000 Subject: imx-drm: imx-drm-core: fix error cleanup path for imx_drm_add_crtc() imx_drm_add_crtc() was kfree'ing the imx_drm_crtc structure while leaving it on the list of CRTCs. Delete it from the list first. Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/staging/imx-drm/imx-drm-core.c index 6bd015a..b5ff45f3 100644 --- a/drivers/staging/imx-drm/imx-drm-core.c +++ b/drivers/staging/imx-drm/imx-drm-core.c @@ -528,6 +528,7 @@ int imx_drm_add_crtc(struct drm_crtc *crtc, return 0; err_register: + list_del(&imx_drm_crtc->list); kfree(imx_drm_crtc); err_alloc: err_busy: -- cgit v0.10.2 From 8007875f0619f36d885ba357b587e673d6f20704 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 11:33:44 +0000 Subject: imx-drm: imx-drm-core: fix DRM cleanup paths We must call drm_vblank_cleanup() on the error cleanup and unload paths after we've had a successful call to drm_vblank_init(). Ensure that the calls are in the reverse order to the initialisation order. 
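The underlying convention being restored is the usual unwind ladder: each error label undoes only what has already succeeded, in reverse order of initialisation, and the unload path mirrors that same order. A condensed sketch with hypothetical names (the comments name the real DRM analogues from the hunks below):

static int example_load(void)
{
        int ret;

        ret = setup_kms();              /* mode config / KMS poll setup */
        if (ret)
                return ret;

        ret = setup_vblank();           /* drm_vblank_init() */
        if (ret)
                goto err_kms;

        ret = grab_device();
        if (ret)
                goto err_vblank;

        return 0;

err_vblank:
        teardown_vblank();              /* drm_vblank_cleanup() */
err_kms:
        teardown_kms();                 /* poll_fini + mode_config_cleanup() */
        return ret;
}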
Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/staging/imx-drm/imx-drm-core.c index b5ff45f3..aff4bd4 100644 --- a/drivers/staging/imx-drm/imx-drm-core.c +++ b/drivers/staging/imx-drm/imx-drm-core.c @@ -88,8 +88,9 @@ static int imx_drm_driver_unload(struct drm_device *drm) imx_drm_device_put(); - drm_mode_config_cleanup(imxdrm->drm); + drm_vblank_cleanup(imxdrm->drm); drm_kms_helper_poll_fini(imxdrm->drm); + drm_mode_config_cleanup(imxdrm->drm); return 0; } @@ -428,11 +429,11 @@ static int imx_drm_driver_load(struct drm_device *drm, unsigned long flags) ret = drm_mode_group_init_legacy_group(imxdrm->drm, &imxdrm->drm->primary->mode_group); if (ret) - goto err_init; + goto err_kms; ret = drm_vblank_init(imxdrm->drm, MAX_CRTC); if (ret) - goto err_init; + goto err_kms; /* * with vblank_disable_allowed = true, vblank interrupt will be disabled @@ -441,12 +442,19 @@ static int imx_drm_driver_load(struct drm_device *drm, unsigned long flags) */ imxdrm->drm->vblank_disable_allowed = true; - if (!imx_drm_device_get()) + if (!imx_drm_device_get()) { ret = -EINVAL; + goto err_vblank; + } - ret = 0; + mutex_unlock(&imxdrm->mutex); + return 0; -err_init: +err_vblank: + drm_vblank_cleanup(drm); +err_kms: + drm_kms_helper_poll_fini(drm); + drm_mode_config_cleanup(drm); mutex_unlock(&imxdrm->mutex); return ret; -- cgit v0.10.2 From 4ae078d58ac7fadace7eb30bade8da649ecc4ded Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 11:34:25 +0000 Subject: imx-drm: ipu-v3: fix potential CRTC device registration race Clean up the IPUv3 CRTC device registration; we don't need a separate function just to call platform_device_register_data(), and we don't need the return value converted at all. Update the IPU client id under a mutex, so that parallel probing doesn't race. 
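The ID-allocation part of the change reduces to the following pattern (sketch with hypothetical names): reserve a whole block of IDs under a mutex, then perform the actual child registration outside the shared-counter update.

static DEFINE_MUTEX(example_id_mutex);
static int example_next_id;

static int example_reserve_ids(unsigned int count)
{
        int id;

        mutex_lock(&example_id_mutex);
        id = example_next_id;           /* first ID of the reserved block */
        example_next_id += count;       /* parallel probers can't reuse these */
        mutex_unlock(&example_id_mutex);

        return id;
}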
Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/ipu-v3/ipu-common.c b/drivers/staging/imx-drm/ipu-v3/ipu-common.c index 7a22ce6..97ca692 100644 --- a/drivers/staging/imx-drm/ipu-v3/ipu-common.c +++ b/drivers/staging/imx-drm/ipu-v3/ipu-common.c @@ -996,35 +996,35 @@ static const struct ipu_platform_reg client_reg[] = { }, }; +static DEFINE_MUTEX(ipu_client_id_mutex); static int ipu_client_id; -static int ipu_add_subdevice_pdata(struct device *dev, - const struct ipu_platform_reg *reg) -{ - struct platform_device *pdev; - - pdev = platform_device_register_data(dev, reg->name, ipu_client_id++, - ®->pdata, sizeof(struct ipu_platform_reg)); - - return PTR_ERR_OR_ZERO(pdev); -} - static int ipu_add_client_devices(struct ipu_soc *ipu) { - int ret; - int i; + struct device *dev = ipu->dev; + unsigned i; + int id, ret; + + mutex_lock(&ipu_client_id_mutex); + id = ipu_client_id; + ipu_client_id += ARRAY_SIZE(client_reg); + mutex_unlock(&ipu_client_id_mutex); for (i = 0; i < ARRAY_SIZE(client_reg); i++) { const struct ipu_platform_reg *reg = &client_reg[i]; - ret = ipu_add_subdevice_pdata(ipu->dev, reg); - if (ret) + struct platform_device *pdev; + + pdev = platform_device_register_data(dev, reg->name, + id++, ®->pdata, sizeof(reg->pdata)); + + if (IS_ERR(pdev)) goto err_register; } return 0; err_register: - platform_device_unregister_children(to_platform_device(ipu->dev)); + platform_device_unregister_children(to_platform_device(dev)); return ret; } -- cgit v0.10.2 From bd5121bbb6b9a89d9d13878fa5bb1e833168ae18 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 12:38:30 +0000 Subject: imx-drm: imx-tve: don't call sleeping functions beneath enable_lock spinlock Enable lock claims that it is serializing tve_enable/disable calls. However, DRM already serialises mode sets with a mutex, which prevents encoder/connector functions being called concurrently. Secondly, holding a spinlock while calling clk_prepare_enable() is wrong; it will cause a might_sleep() warning should that debugging be enabled. So, let's just get rid of the enable_lock. 
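The rule at work here, as a minimal sketch with a hypothetical structure: clk_prepare_enable() may sleep, so it must not be called with a spinlock held; only quick, non-sleeping updates belong under the spinlock, and in this driver even those are already serialised by DRM's mode-set mutex.

struct example_priv {
        struct clk *clk;
        spinlock_t lock;        /* never held across anything that sleeps */
        bool enabled;
};

static void example_enable(struct example_priv *priv)
{
        /* Sleeping work first, outside any spinlock. */
        clk_prepare_enable(priv->clk);

        /* Only non-sleeping state updates under the lock. */
        spin_lock_irq(&priv->lock);
        priv->enabled = true;
        spin_unlock_irq(&priv->lock);
}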
Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/imx-tve.c b/drivers/staging/imx-drm/imx-tve.c index 680f4c8..2c44fef 100644 --- a/drivers/staging/imx-drm/imx-tve.c +++ b/drivers/staging/imx-drm/imx-tve.c @@ -114,7 +114,6 @@ struct imx_tve { struct drm_encoder encoder; struct imx_drm_encoder *imx_drm_encoder; struct device *dev; - spinlock_t enable_lock; /* serializes tve_enable/disable */ spinlock_t lock; /* register lock */ bool enabled; int mode; @@ -146,10 +145,8 @@ __releases(&tve->lock) static void tve_enable(struct imx_tve *tve) { - unsigned long flags; int ret; - spin_lock_irqsave(&tve->enable_lock, flags); if (!tve->enabled) { tve->enabled = true; clk_prepare_enable(tve->clk); @@ -169,23 +166,18 @@ static void tve_enable(struct imx_tve *tve) TVE_CD_SM_IEN | TVE_CD_LM_IEN | TVE_CD_MON_END_IEN); - - spin_unlock_irqrestore(&tve->enable_lock, flags); } static void tve_disable(struct imx_tve *tve) { - unsigned long flags; int ret; - spin_lock_irqsave(&tve->enable_lock, flags); if (tve->enabled) { tve->enabled = false; ret = regmap_update_bits(tve->regmap, TVE_COM_CONF_REG, TVE_IPU_CLK_EN | TVE_EN, 0); clk_disable_unprepare(tve->clk); } - spin_unlock_irqrestore(&tve->enable_lock, flags); } static int tve_setup_tvout(struct imx_tve *tve) @@ -601,7 +593,6 @@ static int imx_tve_probe(struct platform_device *pdev) tve->dev = &pdev->dev; spin_lock_init(&tve->lock); - spin_lock_init(&tve->enable_lock); ddc_node = of_parse_phandle(np, "ddc", 0); if (ddc_node) { -- cgit v0.10.2 From 942325c8b234fd67db976bd529d48320a6a73171 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 12:38:50 +0000 Subject: imx-drm: imx-drm-core: use defined constant for number of CRTCs. We have this definition, there's no reason not to use it. Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/staging/imx-drm/imx-drm-core.c index aff4bd4..b52b827 100644 --- a/drivers/staging/imx-drm/imx-drm-core.c +++ b/drivers/staging/imx-drm/imx-drm-core.c @@ -200,8 +200,8 @@ static void imx_drm_driver_preclose(struct drm_device *drm, if (!file->is_master) return; - for (i = 0; i < 4; i++) - imx_drm_disable_vblank(drm , i); + for (i = 0; i < MAX_CRTC; i++) + imx_drm_disable_vblank(drm, i); } static const struct file_operations imx_drm_driver_fops = { -- cgit v0.10.2 From 9fe73d46edd358fc154f7332c8ff312e067255a0 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 12:39:11 +0000 Subject: imx-drm: imx-drm-core: make imx_drm_crtc_register() safer imx_drm_crtc_register() doesn't clean up the CRTC upon failure, which leaves the CRTC attached to the DRM device. Also, it does setup after attaching the CRTC to the DRM device. Fix this by reordering the function such that we do the setup before drm_crtc_init(): this fixes both issues. 
Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/staging/imx-drm/imx-drm-core.c index b52b827..19e431d 100644 --- a/drivers/staging/imx-drm/imx-drm-core.c +++ b/drivers/staging/imx-drm/imx-drm-core.c @@ -377,8 +377,6 @@ static int imx_drm_crtc_register(struct imx_drm_crtc *imx_drm_crtc) struct imx_drm_device *imxdrm = __imx_drm_device(); int ret; - drm_crtc_init(imxdrm->drm, imx_drm_crtc->crtc, - imx_drm_crtc->imx_drm_helper_funcs.crtc_funcs); ret = drm_mode_crtc_set_gamma_size(imx_drm_crtc->crtc, 256); if (ret) return ret; @@ -386,6 +384,9 @@ drm_crtc_helper_add(imx_drm_crtc->crtc, imx_drm_crtc->imx_drm_helper_funcs.crtc_helper_funcs); + drm_crtc_init(imxdrm->drm, imx_drm_crtc->crtc, + imx_drm_crtc->imx_drm_helper_funcs.crtc_funcs); + drm_mode_group_reinit(imxdrm->drm); return 0; -- cgit v0.10.2 From fd6040ed57d8f200ab0cc2abf706c54995a48370 Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 16 Dec 2013 12:39:31 +0000 Subject: imx-drm: imx-drm-core: improve safety of imx_drm_add_crtc() We must not add more CRTCs than we have declared to the vblank helpers, otherwise we overflow their arrays. Force failure if we exceed the number of CRTCs. Signed-off-by: Russell King Acked-by: Sascha Hauer Signed-off-by: Greg Kroah-Hartman diff --git a/drivers/staging/imx-drm/imx-drm-core.c b/drivers/staging/imx-drm/imx-drm-core.c index 19e431d..96e4eee 100644 --- a/drivers/staging/imx-drm/imx-drm-core.c +++ b/drivers/staging/imx-drm/imx-drm-core.c @@ -501,6 +501,15 @@ int imx_drm_add_crtc(struct drm_crtc *crtc, mutex_lock(&imxdrm->mutex); + /* + * The vblank arrays are dimensioned by MAX_CRTC - we can't + * pass IDs greater than this to those functions. + */ + if (imxdrm->pipes >= MAX_CRTC) { + ret = -EINVAL; + goto err_busy; + } + if (imxdrm->drm->open_count) { ret = -EBUSY; goto err_busy; -- cgit v0.10.2 From 85328240c625f322af9f69c7b60e619717101d77 Mon Sep 17 00:00:00 2001 From: John Fastabend Date: Tue, 26 Nov 2013 06:33:52 +0000 Subject: net: allow netdev_all_upper_get_next_dev_rcu with rtnl lock held It is useful to be able to walk all upper devices when bringing a device online where the RTNL lock is held. In this case it is safe to walk the all_adj_list because the RTNL lock is used to protect the write side as well. This patch adds a check to see if the rtnl lock is held before throwing a warning in netdev_all_upper_get_next_dev_rcu(). Also, because we now have a call site for lockdep_rtnl_is_held() outside CONFIG_PROVE_LOCKING, an inline definition returning 1 is needed, similar to the stub for rcu_read_lock_held().
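Condensed, the header change follows the usual "trivially-true stub when the debug option is off" pattern, so the warning site can stay unconditional:

#ifdef CONFIG_PROVE_LOCKING
extern int lockdep_rtnl_is_held(void);
#else
static inline int lockdep_rtnl_is_held(void)
{
        return 1;       /* without lockdep we cannot check, so never warn */
}
#endif

/* Caller must hold either rcu_read_lock() or the RTNL lock. */
WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held());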
Fixes: 2a47fa45d4df ("ixgbe: enable l2 forwarding acceleration for macvlans") CC: Veaceslav Falico Reported-by: Yuanhan Liu Signed-off-by: John Fastabend Tested-by: Phil Schmitt Signed-off-by: Jeff Kirsher diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 939428a..8e3e66a 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -24,6 +24,11 @@ extern int rtnl_trylock(void); extern int rtnl_is_locked(void); #ifdef CONFIG_PROVE_LOCKING extern int lockdep_rtnl_is_held(void); +#else +static inline int lockdep_rtnl_is_held(void) +{ + return 1; +} #endif /* #ifdef CONFIG_PROVE_LOCKING */ /** diff --git a/net/core/dev.c b/net/core/dev.c index ba3b7ea..4fc1722 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4500,7 +4500,7 @@ struct net_device *netdev_all_upper_get_next_dev_rcu(struct net_device *dev, { struct netdev_adjacent *upper; - WARN_ON_ONCE(!rcu_read_lock_held()); + WARN_ON_ONCE(!rcu_read_lock_held() && !lockdep_rtnl_is_held()); upper = list_entry_rcu((*iter)->next, struct netdev_adjacent, list); -- cgit v0.10.2 From 34cf865d54813aab3497838132fb1bbd293f4054 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 18 Dec 2013 00:44:44 -0500 Subject: ext4: fix deadlock when writing in ENOSPC conditions Akira-san has been reporting rare deadlocks of his machine when running xfstests test 269 on ext4 filesystem. The problem turned out to be in ext4_da_reserve_metadata() and ext4_da_reserve_space() which called ext4_should_retry_alloc() while holding i_data_sem. Since ext4_should_retry_alloc() can force a transaction commit, this is a lock ordering violation and leads to deadlocks. Fix the problem by just removing the retry loops. These functions should just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that function must take care of retrying after dropping all necessary locks. 
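What "the caller retries after dropping all necessary locks" looks like in outline. This is a sketch with hypothetical helper names, not the actual ext4 code; ext4_should_retry_alloc() is the real helper that may force a journal commit.

static int example_write_begin(struct super_block *sb)
{
        int ret, retries = 0;

retry:
        /* The reservation path holds i_data_sem and must simply report
         * -ENOSPC instead of waiting for a commit itself.
         */
        ret = example_reserve_blocks(sb);

        /* Only here, with no fs locks held, is it safe to wait for the
         * journal to commit and then try again.
         */
        if (ret == -ENOSPC && ext4_should_retry_alloc(sb, &retries))
                goto retry;

        return ret;
}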
Reported-and-tested-by: Akira Fujita Reviewed-by: Zheng Liu Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 0757634..61d49ff 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1206,7 +1206,6 @@ static int ext4_journalled_write_end(struct file *file, */ static int ext4_da_reserve_metadata(struct inode *inode, ext4_lblk_t lblock) { - int retries = 0; struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); struct ext4_inode_info *ei = EXT4_I(inode); unsigned int md_needed; @@ -1218,7 +1217,6 @@ static int ext4_da_reserve_metadata(struct inode *inode, ext4_lblk_t lblock) * in order to allocate nrblocks * worse case is one extent per block */ -repeat: spin_lock(&ei->i_block_reservation_lock); /* * ext4_calc_metadata_amount() has side effects, which we have @@ -1238,10 +1236,6 @@ repeat: ei->i_da_metadata_calc_len = save_len; ei->i_da_metadata_calc_last_lblock = save_last_lblock; spin_unlock(&ei->i_block_reservation_lock); - if (ext4_should_retry_alloc(inode->i_sb, &retries)) { - cond_resched(); - goto repeat; - } return -ENOSPC; } ei->i_reserved_meta_blocks += md_needed; @@ -1255,7 +1249,6 @@ repeat: */ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock) { - int retries = 0; struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); struct ext4_inode_info *ei = EXT4_I(inode); unsigned int md_needed; @@ -1277,7 +1270,6 @@ static int ext4_da_reserve_space(struct inode *inode, ext4_lblk_t lblock) * in order to allocate nrblocks * worse case is one extent per block */ -repeat: spin_lock(&ei->i_block_reservation_lock); /* * ext4_calc_metadata_amount() has side effects, which we have @@ -1297,10 +1289,6 @@ repeat: ei->i_da_metadata_calc_len = save_len; ei->i_da_metadata_calc_last_lblock = save_last_lblock; spin_unlock(&ei->i_block_reservation_lock); - if (ext4_should_retry_alloc(inode->i_sb, &retries)) { - cond_resched(); - goto repeat; - } dquot_release_reservation_block(inode, EXT4_C2B(sbi, 1)); return -ENOSPC; } -- cgit v0.10.2 From 9e6c3b63399dd424d33a34e08b77f2cab0b84cdc Mon Sep 17 00:00:00 2001 From: David Ertman Date: Sat, 14 Dec 2013 07:18:18 +0000 Subject: e1000e: fix compiler warnings This patch is to fix a compiler warning of __bad_udelay due to a value of >999 being passed as a parameter to udelay() in the function e1000e_phy_has_link_generic(). This affects the gcc compiler when it is given a flag of -O3 and the icc compiler. This patch is also making the change from mdelay() to msleep() in the same function, since it was determined though code inspection that this function is never called in atomic context. Signed-off-by: David Ertman Acked-by: Bruce Allan Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c index da2be59..20e71f4 100644 --- a/drivers/net/ethernet/intel/e1000e/phy.c +++ b/drivers/net/ethernet/intel/e1000e/phy.c @@ -1757,19 +1757,23 @@ s32 e1000e_phy_has_link_generic(struct e1000_hw *hw, u32 iterations, * it across the board. */ ret_val = e1e_rphy(hw, MII_BMSR, &phy_status); - if (ret_val) + if (ret_val) { /* If the first read fails, another entity may have * ownership of the resources, wait and try again to * see if they have relinquished the resources yet. 
*/ - udelay(usec_interval); + if (usec_interval >= 1000) + msleep(usec_interval / 1000); + else + udelay(usec_interval); + } ret_val = e1e_rphy(hw, MII_BMSR, &phy_status); if (ret_val) break; if (phy_status & BMSR_LSTATUS) break; if (usec_interval >= 1000) - mdelay(usec_interval / 1000); + msleep(usec_interval / 1000); else udelay(usec_interval); } -- cgit v0.10.2 From 918a4308bad24a8c40a9e569fb7b3a6295ec7757 Mon Sep 17 00:00:00 2001 From: David Ertman Date: Sat, 14 Dec 2013 07:30:39 +0000 Subject: e1000e: fix compiler warning (maybe-uninitialized variable) This patch fixes a maybe-uninitialized-variable compiler warning generated by gcc when the -O3 flag is used. In the function e1000_reset_hw_80003es2lan(), the variable kum_reg_data is first given a value by being passed to a register read function as a pass-by-reference parameter. However, the return value of that read function was never checked to see whether the read failed and left the variable without an initial value. The compiler was smart enough to spot this. This patch checks the return value of that read function and, if an error occurs, returns it without trying to use the value in kum_reg_data. Signed-off-by: David Ertman Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher diff --git a/drivers/net/ethernet/intel/e1000e/80003es2lan.c b/drivers/net/ethernet/intel/e1000e/80003es2lan.c index 895450e..ff2d806 100644 --- a/drivers/net/ethernet/intel/e1000e/80003es2lan.c +++ b/drivers/net/ethernet/intel/e1000e/80003es2lan.c @@ -718,8 +718,11 @@ static s32 e1000_reset_hw_80003es2lan(struct e1000_hw *hw) e1000_release_phy_80003es2lan(hw); /* Disable IBIST slave mode (far-end loopback) */ - e1000_read_kmrn_reg_80003es2lan(hw, E1000_KMRNCTRLSTA_INBAND_PARAM, - &kum_reg_data); + ret_val = + e1000_read_kmrn_reg_80003es2lan(hw, E1000_KMRNCTRLSTA_INBAND_PARAM, + &kum_reg_data); + if (ret_val) + return ret_val; kum_reg_data |= E1000_KMRNCTRLSTA_IBIST_DISABLE; e1000_write_kmrn_reg_80003es2lan(hw, E1000_KMRNCTRLSTA_INBAND_PARAM, kum_reg_data); -- cgit v0.10.2 From 7509963c703b71eebccc421585e7f48ebbbd3f38 Mon Sep 17 00:00:00 2001 From: David Ertman Date: Tue, 17 Dec 2013 04:42:42 +0000 Subject: e1000e: Fix a compile flag mismatch for suspend/resume This patch addresses a mismatch between the declaration and usage of the e1000_suspend and e1000_resume functions. Previously, these functions were declared in a CONFIG_PM_SLEEP wrapper, and then utilized within a CONFIG_PM wrapper. Both the declaration and usage will now be contained within CONFIG_PM wrappers.
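A minimal illustration of the guard mismatch, using a hypothetical driver; CONFIG_PM can be enabled through runtime PM alone, with CONFIG_PM_SLEEP unset.

#ifdef CONFIG_PM_SLEEP                  /* definitions compiled out ...       */
static int example_suspend(struct device *dev) { return 0; }
static int example_resume(struct device *dev)  { return 0; }
#endif

#ifdef CONFIG_PM                        /* ... but still referenced here      */
static const struct dev_pm_ops example_pm_ops = {
        .suspend = example_suspend,     /* breaks when only CONFIG_PM is set  */
        .resume  = example_resume,
};
#endif

/* Fix: guard both the definitions and the uses with the same symbol
 * (CONFIG_PM), as the patch above does.
 */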
Signed-off-by: Dave Ertman Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 8d3945a..c30d41d 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -6174,7 +6174,7 @@ static int __e1000_resume(struct pci_dev *pdev) return 0; } -#ifdef CONFIG_PM_SLEEP +#ifdef CONFIG_PM static int e1000_suspend(struct device *dev) { struct pci_dev *pdev = to_pci_dev(dev); @@ -6193,7 +6193,7 @@ static int e1000_resume(struct device *dev) return __e1000_resume(pdev); } -#endif /* CONFIG_PM_SLEEP */ +#endif /* CONFIG_PM */ #ifdef CONFIG_PM_RUNTIME static int e1000_runtime_suspend(struct device *dev) -- cgit v0.10.2 From 8f48f5bc759d6f945ff0c3b2bf2a1d5971e561ba Mon Sep 17 00:00:00 2001 From: Don Skidmore Date: Fri, 22 Nov 2013 04:27:23 +0000 Subject: ixgbe: fix for unused variable warning with certain config If CONFIG_PCI_IOV isn't defined we get an "unused variable" warining so now wrap the variable declaration like it's usage already was. Signed-off-by: Don Skidmore Acked-by: John Fastabend Tested-by: Phil Schmitt Signed-off-by: Jeff Kirsher diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c index d6f0c0d..72084f7 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c @@ -291,7 +291,9 @@ static int ixgbe_pci_sriov_disable(struct pci_dev *dev) { struct ixgbe_adapter *adapter = pci_get_drvdata(dev); int err; +#ifdef CONFIG_PCI_IOV u32 current_flags = adapter->flags; +#endif err = ixgbe_disable_sriov(adapter); -- cgit v0.10.2 From 36e7bb38028d3d812aa7749208249d600a30c22c Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Mon, 11 Nov 2013 19:29:47 +0530 Subject: powerpc: book3s: kvm: Don't abuse host r2 in exit path We don't use PACATOC for PR. Avoid updating HOST_R2 with PR KVM mode when both HV and PR are enabled in the kernel. Without this we get the below crash (qemu) Unable to handle kernel paging request for data at address 0xffffffffffff8310 Faulting instruction address: 0xc00000000001d5a4 cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0] pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0 lr: c00000000001d760: .vtime_account_system+0x20/0x60 sp: c0000001dc53b170 msr: 8000000000009032 dar: ffffffffffff8310 dsisr: 40000000 current = 0xc0000001d76c62d0 paca = 0xc00000000fef1100 softe: 0 irq_happened: 0x01 pid = 4472, comm = qemu-system-ppc enter ? 
for help [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60 [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50 [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4 [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0 [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40 [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730 [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770 [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0 [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98 Signed-off-by: Alexander Graf diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h index 412b2f3..192917d 100644 --- a/arch/powerpc/include/asm/kvm_book3s_asm.h +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h @@ -79,6 +79,7 @@ struct kvmppc_host_state { ulong vmhandler; ulong scratch0; ulong scratch1; + ulong scratch2; u8 in_guest; u8 restore_hid5; u8 napping; diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 2ea5cc0..d3de010 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -576,6 +576,7 @@ int main(void) HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler); HSTATE_FIELD(HSTATE_SCRATCH0, scratch0); HSTATE_FIELD(HSTATE_SCRATCH1, scratch1); + HSTATE_FIELD(HSTATE_SCRATCH2, scratch2); HSTATE_FIELD(HSTATE_IN_GUEST, in_guest); HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5); HSTATE_FIELD(HSTATE_NAPPING, napping); diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S index bde28da..be4fa04a 100644 --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S @@ -754,15 +754,14 @@ kvmppc_interrupt_hv: * guest CR, R12 saved in shadow VCPU SCRATCH1/0 * guest R13 saved in SPRN_SCRATCH0 */ - /* abuse host_r2 as third scratch area; we get r2 from PACATOC(r13) */ - std r9, HSTATE_HOST_R2(r13) + std r9, HSTATE_SCRATCH2(r13) lbz r9, HSTATE_IN_GUEST(r13) cmpwi r9, KVM_GUEST_MODE_HOST_HV beq kvmppc_bad_host_intr #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE cmpwi r9, KVM_GUEST_MODE_GUEST - ld r9, HSTATE_HOST_R2(r13) + ld r9, HSTATE_SCRATCH2(r13) beq kvmppc_interrupt_pr #endif /* We're now back in the host but in guest MMU context */ @@ -782,7 +781,7 @@ kvmppc_interrupt_hv: std r6, VCPU_GPR(R6)(r9) std r7, VCPU_GPR(R7)(r9) std r8, VCPU_GPR(R8)(r9) - ld r0, HSTATE_HOST_R2(r13) + ld r0, HSTATE_SCRATCH2(r13) std r0, VCPU_GPR(R9)(r9) std r10, VCPU_GPR(R10)(r9) std r11, VCPU_GPR(R11)(r9) -- cgit v0.10.2 From df9059bb64023da9f27e56a94a3e2b8f4b6336a9 Mon Sep 17 00:00:00 2001 From: Paul Mackerras Date: Mon, 16 Dec 2013 13:31:46 +1100 Subject: KVM: PPC: Book3S HV: Don't drop low-order page address bits Commit caaa4c804fae ("KVM: PPC: Book3S HV: Fix physical address calculations") unfortunately resulted in some low-order address bits getting dropped in the case where the guest is creating a 4k HPTE and the host page size is 64k. By getting the low-order bits from hva rather than gpa we miss out on bits 12 - 15 in this case, since hva is at page granularity. This puts the missing bits back in. 
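A worked example with made-up numbers may help picture the bit-twiddling; assume a 64k host PAGE_SIZE, a normal 64k Linux PTE and a 4k guest HPTE (the addresses are invented).

/*
 *   PAGE_SHIFT = 16, PAGE_MASK = ~0xffffUL, pte_size = 0x10000
 *   gpa                   = 0x12345000   (4k-aligned guest page)
 *   pfn << PAGE_SHIFT     = 0xabcd0000   (supplies bits 16 and up)
 *
 *   pa |= hva & (pte_size - 1);   hva is 64k-granular here, so this
 *                                 contributes 0 - bits 12-15 are lost
 *   pa |= gpa & ~PAGE_MASK;       0x12345000 & 0xffff = 0x5000,
 *                                 restoring the 4k sub-page offset
 *
 *   final pa = 0xabcd5000
 */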
Reported-by: Alexey Kardashevskiy Signed-off-by: Paul Mackerras Signed-off-by: Alexander Graf diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c index 1931aa3..8689e2e 100644 --- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c +++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c @@ -240,6 +240,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags, is_io = hpte_cache_bits(pte_val(pte)); pa = pte_pfn(pte) << PAGE_SHIFT; pa |= hva & (pte_size - 1); + pa |= gpa & ~PAGE_MASK; } } -- cgit v0.10.2 From 939fd1e8d9deff206f12bd9d4e54aa7a4bd0ffd6 Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Wed, 18 Dec 2013 09:25:49 +0000 Subject: ASoC: wm_adsp: Add small delay while polling DSP RAM start Some devices are getting very close to the limit whilst polling the RAM start, this patch adds a small delay to this loop to give a longer startup timeout. Signed-off-by: Charles Keepax Signed-off-by: Mark Brown Cc: stable@vger.kernel.org diff --git a/sound/soc/codecs/wm_adsp.c b/sound/soc/codecs/wm_adsp.c index 46ec0e9..4fbcab6 100644 --- a/sound/soc/codecs/wm_adsp.c +++ b/sound/soc/codecs/wm_adsp.c @@ -1474,13 +1474,17 @@ static int wm_adsp2_ena(struct wm_adsp *dsp) return ret; /* Wait for the RAM to start, should be near instantaneous */ - count = 0; - do { + for (count = 0; count < 10; ++count) { ret = regmap_read(dsp->regmap, dsp->base + ADSP2_STATUS1, &val); if (ret != 0) return ret; - } while (!(val & ADSP2_RAM_RDY) && ++count < 10); + + if (val & ADSP2_RAM_RDY) + break; + + msleep(1); + } if (!(val & ADSP2_RAM_RDY)) { adsp_err(dsp, "Failed to start DSP RAM\n"); -- cgit v0.10.2 From f0199bc5e3a3ec13f9bc938556517ec430b36437 Mon Sep 17 00:00:00 2001 From: Bo Shen Date: Wed, 18 Dec 2013 11:26:23 +0800 Subject: ASoC: wm8904: fix DSP mode B configuration When wm8904 work in DSP mode B, we still need to configure it to work in DSP mode. Or else, it will work in Right Justified mode. Signed-off-by: Bo Shen Acked-by: Charles Keepax Signed-off-by: Mark Brown Cc: stable@vger.kernel.org diff --git a/sound/soc/codecs/wm8904.c b/sound/soc/codecs/wm8904.c index 3938fb1..53bbfac 100644 --- a/sound/soc/codecs/wm8904.c +++ b/sound/soc/codecs/wm8904.c @@ -1444,7 +1444,7 @@ static int wm8904_set_fmt(struct snd_soc_dai *dai, unsigned int fmt) switch (fmt & SND_SOC_DAIFMT_FORMAT_MASK) { case SND_SOC_DAIFMT_DSP_B: - aif1 |= WM8904_AIF_LRCLK_INV; + aif1 |= 0x3 | WM8904_AIF_LRCLK_INV; case SND_SOC_DAIFMT_DSP_A: aif1 |= 0x3; break; -- cgit v0.10.2 From 3a6c5d8ad0a9253aafb76df3577edcb68c09b939 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Wed, 18 Dec 2013 18:09:56 +0800 Subject: ALSA: hda - Add Dell headset detection quirk for one more laptop model On the Dell machines with codec whose Subsystem Id is 0x10280640, no external microphone can be detected when plugging a 3-ring headset. Using ALC255_FIXUP_DELL1_MIC_NO_PRESENCE can fix this problem. The codec (Vendor ID: 0x10ec0255) on the machine belongs to alc_269 family. 
BugLink: https://bugs.launchpad.net/bugs/1260303 Cc: David Henningsson Cc: stable@vger.kernel.org Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 5ab8e16..c564694 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -4256,6 +4256,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x0638, "Dell Inspiron 5439", ALC290_FIXUP_MONO_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x063e, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x063f, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1028, 0x0640, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x15cc, "Dell X5 Precision", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x15cd, "Dell X5 Precision", ALC269_FIXUP_DELL2_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2), -- cgit v0.10.2 From 0baf8f6a2ac86c2c40ed0cacab8ea3d17371a1bb Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Mon, 2 Dec 2013 18:01:30 +0000 Subject: dma: pl330: ensure DMA descriptors are zero-initialised I see the following splat with 3.13-rc1 when attempting to perform DMA: [ 253.004516] Alignment trap: not handling instruction e1902f9f at [] [ 253.004583] Unhandled fault: alignment exception (0x221) at 0xdfdfdfd7 [ 253.004646] Internal error: : 221 [#1] PREEMPT SMP ARM [ 253.004691] Modules linked in: dmatest(+) [last unloaded: dmatest] [ 253.004798] CPU: 0 PID: 671 Comm: kthreadd Not tainted 3.13.0-rc1+ #2 [ 253.004864] task: df9b0900 ti: df03e000 task.ti: df03e000 [ 253.004937] PC is at dmaengine_unmap_put+0x14/0x34 [ 253.005010] LR is at pl330_tasklet+0x3c8/0x550 [ 253.005087] pc : [] lr : [] psr: a00e0193 [ 253.005087] sp : df03fe48 ip : 00000000 fp : df03bf18 [ 253.005178] r10: bf00e108 r9 : 00000001 r8 : 00000000 [ 253.005245] r7 : df837040 r6 : dfb41800 r5 : df837048 r4 : df837000 [ 253.005316] r3 : dfdfdfcf r2 : dfb41f80 r1 : df837048 r0 : dfdfdfd7 [ 253.005384] Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel [ 253.005459] Control: 30c5387d Table: 9fb9ba80 DAC: fffffffd [ 253.005520] Process kthreadd (pid: 671, stack limit = 0xdf03e248) This is due to desc->txd.unmap containing garbage (uninitialised memory). Rather than add another dummy initialisation to _init_desc, instead ensure that the descriptors are zero-initialised during allocation and remove the dummy, per-field initialisation. 
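In isolation, the allocation change looks like this sketch; desc, count, flg and i are assumed declared as in the driver, and the rest of the surrounding function is elided.

/* kmalloc() returns uninitialised memory, so every field - including ones
 * later added to the structure, such as txd.unmap - must be set by hand.
 * kcalloc() returns zeroed memory (and checks count * size for overflow),
 * so only the fields that must be non-zero need explicit initialisation.
 */
desc = kcalloc(count, sizeof(*desc), flg);
if (!desc)
        return 0;

for (i = 0; i < count; i++) {
        desc[i].rqcfg.scctl = SCCTRL0;
        desc[i].rqcfg.dcctl = DCCTRL0;
        desc[i].req.cfg = &desc[i].rqcfg;
        /* pchan, privileged, insnaccess, txd.unmap, ... stay zero/NULL */
}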
Cc: Andriy Shevchenko Acked-by: Jassi Brar Signed-off-by: Will Deacon Acked-by: Vinod Koul Signed-off-by: Dan Williams diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index cdf0483..536632f 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2492,12 +2492,9 @@ static dma_cookie_t pl330_tx_submit(struct dma_async_tx_descriptor *tx) static inline void _init_desc(struct dma_pl330_desc *desc) { - desc->pchan = NULL; desc->req.x = &desc->px; desc->req.token = desc; desc->rqcfg.swap = SWAP_NO; - desc->rqcfg.privileged = 0; - desc->rqcfg.insnaccess = 0; desc->rqcfg.scctl = SCCTRL0; desc->rqcfg.dcctl = DCCTRL0; desc->req.cfg = &desc->rqcfg; @@ -2517,7 +2514,7 @@ static int add_desc(struct dma_pl330_dmac *pdmac, gfp_t flg, int count) if (!pdmac) return 0; - desc = kmalloc(count * sizeof(*desc), flg); + desc = kcalloc(count, sizeof(*desc), flg); if (!desc) return 0; -- cgit v0.10.2 From bbca4d39175a96b882def1628e8b532998750dfa Mon Sep 17 00:00:00 2001 From: Gerhard Sittig Date: Tue, 10 Dec 2013 10:51:08 +0100 Subject: powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125) the 'soc' node in the MPC5125 "tower" board .dts has an '#interrupt-cells' property although this node is not an interrupt controller remove this erroneously placed property because starting with v3.13-rc1 lookup and resolution of 'interrupts' specs for peripherals gets misled (tries to use the 'soc' as the interrupt parent which fails), emits 'no irq domain found' WARN() messages and breaks the boot process [ best viewed with 'git diff -U5' to have DT node names in the context ] Cc: Anatolij Gustschin Cc: linuxppc-dev@lists.ozlabs.org Cc: devicetree@vger.kernel.org Signed-off-by: Gerhard Sittig Signed-off-by: Anatolij Gustschin diff --git a/arch/powerpc/boot/dts/mpc5125twr.dts b/arch/powerpc/boot/dts/mpc5125twr.dts index 4177b62..0a0fe92 100644 --- a/arch/powerpc/boot/dts/mpc5125twr.dts +++ b/arch/powerpc/boot/dts/mpc5125twr.dts @@ -58,7 +58,6 @@ compatible = "fsl,mpc5121-immr"; #address-cells = <1>; #size-cells = <1>; - #interrupt-cells = <2>; ranges = <0x0 0x80000000 0x400000>; reg = <0x80000000 0x400000>; bus-frequency = <66000000>; // 66 MHz ips bus -- cgit v0.10.2 From 77873803363c9e831fc1d1e6895c084279090c22 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 17 Dec 2013 10:09:32 -0800 Subject: net_dma: mark broken net_dma can cause data to be copied to a stale mapping if a copy-on-write fault occurs during dma. The application sees missing data. The following trace is triggered by modifying the kernel to WARN if it ever triggers copy-on-write on a page that is undergoing dma: WARNING: CPU: 24 PID: 2529 at lib/dma-debug.c:485 debug_dma_assert_idle+0xd2/0x120() ioatdma 0000:00:04.0: DMA-API: cpu touching an active dma mapped page [pfn=0x16bcd9] Modules linked in: iTCO_wdt iTCO_vendor_support ioatdma lpc_ich pcspkr dca CPU: 24 PID: 2529 Comm: linbug Tainted: G W 3.13.0-rc1+ #353 00000000000001e5 ffff88016f45f688 ffffffff81751041 ffff88017ab0ef70 ffff88016f45f6d8 ffff88016f45f6c8 ffffffff8104ed9c ffffffff810f3646 ffff8801768f4840 0000000000000282 ffff88016f6cca10 00007fa2bb699349 Call Trace: [] dump_stack+0x46/0x58 [] warn_slowpath_common+0x8c/0xc0 [] ? ftrace_pid_func+0x26/0x30 [] warn_slowpath_fmt+0x46/0x50 [] debug_dma_assert_idle+0xd2/0x120 [] do_wp_page+0xd0/0x790 [] handle_mm_fault+0x51c/0xde0 [] ? copy_user_enhanced_fast_string+0x9/0x20 [] __do_page_fault+0x19c/0x530 [] ? _raw_spin_lock_bh+0x16/0x40 [] ? trace_clock_local+0x9/0x10 [] ? rb_reserve_next_event+0x64/0x310 [] ? 
ioat2_dma_prep_memcpy_lock+0x60/0x130 [ioatdma] [] do_page_fault+0xe/0x10 [] page_fault+0x22/0x30 [] ? __kfree_skb+0x51/0xd0 [] ? copy_user_enhanced_fast_string+0x9/0x20 [] ? memcpy_toiovec+0x52/0xa0 [] skb_copy_datagram_iovec+0x5f/0x2a0 [] tcp_rcv_established+0x674/0x7f0 [] tcp_v4_do_rcv+0x2e5/0x4a0 [..] ---[ end trace e30e3b01191b7617 ]--- Mapped at: [] debug_dma_map_page+0xb9/0x160 [] dma_async_memcpy_pg_to_pg+0x127/0x210 [] dma_memcpy_pg_to_iovec+0x119/0x1f0 [] dma_skb_copy_datagram_iovec+0x11c/0x2b0 [] tcp_rcv_established+0x74a/0x7f0: ...the problem is that the receive path falls back to cpu-copy in several locations and this trace is just one of the areas. A few options were considered to fix this: 1/ sync all dma whenever a cpu copy branch is taken 2/ modify the page fault handler to hold off while dma is in-flight Option 1 adds yet more cpu overhead to an "offload" that struggles to compete with cpu-copy. Option 2 adds checks for behavior that is already documented as broken when using get_user_pages(). At a minimum a debug mode is warranted to catch and flag these violations of the dma-api vs get_user_pages(). Thanks to David for his reproducer. Cc: Cc: Dave Jiang Cc: Vinod Koul Cc: Alexander Duyck Reported-by: David Whipple Acked-by: David S. Miller Signed-off-by: Dan Williams diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 132a4fd..c823daa 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -355,6 +355,7 @@ config NET_DMA bool "Network: TCP receive copy offload" depends on DMA_ENGINE && NET default (INTEL_IOATDMA || FSL_DMA) + depends on BROKEN help This enables the use of DMA engines in the network stack to offload receive copy-to-user operations, freeing CPU cycles. -- cgit v0.10.2 From 71a06c59d19a57c8dd2972ebee88030f5b0c700e Mon Sep 17 00:00:00 2001 From: dingtianhong Date: Fri, 13 Dec 2013 17:29:19 +0800 Subject: bonding: protect port for bond_3ad_adapter_speed_changed() Jay Vosburgh said that the bond_3ad_adapter_speed_changed is called with RTNL only, and the function will modify the port's information with no further locking, it will not mutex against bond state machine and incoming LACPDU which do not hold RTNL, So I add __get_state_machine_lock to protect the port. But it is not a critical bug, it exist since day one, and till now it has never been hit and reported, because changes to speed is very rare, and will not occur critical problem. The comment in the function is very old, cleanup it. Suggested-by: Jay Vosburgh Signed-off-by: Ding Tianhong Signed-off-by: David S. 
Miller diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 187b1b7..6405ce3 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2201,20 +2201,25 @@ void bond_3ad_adapter_speed_changed(struct slave *slave) port = &(SLAVE_AD_INFO(slave).port); - // if slave is null, the whole port is not initialized + /* if slave is null, the whole port is not initialized */ if (!port->slave) { pr_warning("Warning: %s: speed changed for uninitialized port on %s\n", slave->bond->dev->name, slave->dev->name); return; } + __get_state_machine_lock(port); + port->actor_admin_port_key &= ~AD_SPEED_KEY_BITS; port->actor_oper_port_key = port->actor_admin_port_key |= (__get_link_speed(port) << 1); pr_debug("Port %d changed speed\n", port->actor_port_number); - // there is no need to reselect a new aggregator, just signal the - // state machines to reinitialize + /* there is no need to reselect a new aggregator, just signal the + * state machines to reinitialize + */ port->sm_vars |= AD_PORT_BEGIN; + + __release_state_machine_lock(port); } /** -- cgit v0.10.2 From bca44a7341924ec92ee7a1ba085c8f0745aeb74e Mon Sep 17 00:00:00 2001 From: dingtianhong Date: Fri, 13 Dec 2013 17:29:24 +0800 Subject: bonding: protect port for bond_3ad_adapter_duplex_changed() Jay Vosburgh said that the bond_3ad_adapter_duplex_changed is called with RTNL only, and the function will modify the port's information with no further locking, it will not mutex against bond state machine and incoming LACPDU which do not hold RTNL, So I add __get_state_machine_lock to protect the port. But it is not a critical bug, it exist since day one, and till now it has never been hit and reported, because changes to speed is very rare, and will not occur critical problem. The comments in the function is very old, cleanup it. Suggested-by: Jay Vosburgh Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 6405ce3..e851a67 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2234,20 +2234,25 @@ void bond_3ad_adapter_duplex_changed(struct slave *slave) port = &(SLAVE_AD_INFO(slave).port); - // if slave is null, the whole port is not initialized + /* if slave is null, the whole port is not initialized */ if (!port->slave) { pr_warning("%s: Warning: duplex changed for uninitialized port on %s\n", slave->bond->dev->name, slave->dev->name); return; } + __get_state_machine_lock(port); + port->actor_admin_port_key &= ~AD_DUPLEX_KEY_BITS; port->actor_oper_port_key = port->actor_admin_port_key |= __get_duplex(port); pr_debug("Port %d changed duplex\n", port->actor_port_number); - // there is no need to reselect a new aggregator, just signal the - // state machines to reinitialize + /* there is no need to reselect a new aggregator, just signal the + * state machines to reinitialize + */ port->sm_vars |= AD_PORT_BEGIN; + + __release_state_machine_lock(port); } /** -- cgit v0.10.2 From 108db7363736ceafdbc6fc708aa770939eb9906a Mon Sep 17 00:00:00 2001 From: dingtianhong Date: Fri, 13 Dec 2013 17:29:29 +0800 Subject: bonding: protect port for bond_3ad_handle_link_change() The bond_3ad_handle_link_change is called with RTNL only, and the function will modify the port's information with no further locking, it will not mutex against bond state machine and incoming LACPDU which do not hold RTNL, So I add __get_state_machine_lock to protect the port. 
But it is not a critical bug, it exist since day one, and till now it has never been hit and reported, because changes to speed is very rare, and will not occur critical problem. The comments in the function is very old, cleanup it and add a new pr_debug to debug the port message. Signed-off-by: Ding Tianhong Signed-off-by: David S. Miller diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index e851a67..4ced594 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2268,15 +2268,21 @@ void bond_3ad_handle_link_change(struct slave *slave, char link) port = &(SLAVE_AD_INFO(slave).port); - // if slave is null, the whole port is not initialized + /* if slave is null, the whole port is not initialized */ if (!port->slave) { pr_warning("Warning: %s: link status changed for uninitialized port on %s\n", slave->bond->dev->name, slave->dev->name); return; } - // on link down we are zeroing duplex and speed since some of the adaptors(ce1000.lan) report full duplex/speed instead of N/A(duplex) / 0(speed) - // on link up we are forcing recheck on the duplex and speed since some of he adaptors(ce1000.lan) report + __get_state_machine_lock(port); + /* on link down we are zeroing duplex and speed since + * some of the adaptors(ce1000.lan) report full duplex/speed + * instead of N/A(duplex) / 0(speed). + * + * on link up we are forcing recheck on the duplex and speed since + * some of he adaptors(ce1000.lan) report. + */ if (link == BOND_LINK_UP) { port->is_enabled = true; port->actor_admin_port_key &= ~AD_DUPLEX_KEY_BITS; @@ -2292,10 +2298,15 @@ void bond_3ad_handle_link_change(struct slave *slave, char link) port->actor_oper_port_key = (port->actor_admin_port_key &= ~AD_SPEED_KEY_BITS); } - //BOND_PRINT_DBG(("Port %d changed link status to %s", port->actor_port_number, ((link == BOND_LINK_UP)?"UP":"DOWN"))); - // there is no need to reselect a new aggregator, just signal the - // state machines to reinitialize + pr_debug("Port %d changed link status to %s", + port->actor_port_number, + (link == BOND_LINK_UP) ? "UP" : "DOWN"); + /* there is no need to reselect a new aggregator, just signal the + * state machines to reinitialize + */ port->sm_vars |= AD_PORT_BEGIN; + + __release_state_machine_lock(port); } /* -- cgit v0.10.2 From a81d8762d71396f69d27187a909fcc69f1a7be75 Mon Sep 17 00:00:00 2001 From: Mugunthan V N Date: Fri, 13 Dec 2013 18:42:55 +0530 Subject: drivers: net cpsw: Enable In Band mode in cpsw for 10 mbps This patch adds support for enabling In Band mode in 10 mbps speed. RGMII supports 1 Gig and 100 mbps mode for Forced mode of operation. For 10mbps mode it should be configured to in band mode so that link status, duplexity and speed are determined from the RGMII input data stream Signed-off-by: Mugunthan V N Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 5120d9c..614f284 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -740,6 +740,8 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave, /* set speed_in input in case RMII mode is used in 100Mbps */ if (phy->speed == 100) mac_control |= BIT(15); + else if (phy->speed == 10) + mac_control |= BIT(18); /* In Band mode */ *link = true; } else { -- cgit v0.10.2 From 0e3da5bb8da45890b1dc413404e0f978ab71173e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Timo=20Ter=C3=A4s?= Date: Mon, 16 Dec 2013 11:02:09 +0200 Subject: ip_gre: fix msg_name parsing for recvfrom/recvmsg MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ipgre_header_parse() needs to parse the tunnel's ip header and it uses mac_header to locate the iphdr. This got broken when gre tunneling was refactored as mac_header is no longer updated to point to iphdr. Introduce skb_pop_mac_header() helper to do the mac_header assignment and use it in ipgre_rcv() to fix msg_name parsing. Bug introduced in commit c54419321455 (GRE: Refactor GRE tunneling code.) Cc: Pravin B Shelar Signed-off-by: Timo Teräs Signed-off-by: David S. Miller diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 215b5ea..6aae838 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1638,6 +1638,11 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset) skb->mac_header += offset; } +static inline void skb_pop_mac_header(struct sk_buff *skb) +{ + skb->mac_header = skb->network_header; +} + static inline void skb_probe_transport_header(struct sk_buff *skb, const int offset_hint) { diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index d7aea4c..e560ef3 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -217,6 +217,7 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi) iph->saddr, iph->daddr, tpi->key); if (tunnel) { + skb_pop_mac_header(skb); ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error); return PACKET_RCVD; } -- cgit v0.10.2 From 3b3878926c7679765635ed2803dd694a8658042d Mon Sep 17 00:00:00 2001 From: Peter Korsgaard Date: Mon, 16 Dec 2013 11:35:32 +0100 Subject: dm9601: add support for dm9621a based dongle dm9621a is functionally identical to dm9620, so the existing handling can directly be used. Thanks to Davicom for sending me a dongle. Signed-off-by: Peter Korsgaard Signed-off-by: David S. Miller diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index c6867f9..8e88572 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -594,6 +594,10 @@ static const struct usb_device_id products[] = { USB_DEVICE(0x0a46, 0x9620), /* DM9620 USB to Fast Ethernet Adapter */ .driver_info = (unsigned long)&dm9601_info, }, + { + USB_DEVICE(0x0a46, 0x9621), /* DM9621A USB to Fast Ethernet Adapter */ + .driver_info = (unsigned long)&dm9601_info, + }, {}, // END }; -- cgit v0.10.2 From 407900cfb54bdb2cfa228010b6697305f66b2948 Mon Sep 17 00:00:00 2001 From: Peter Korsgaard Date: Mon, 16 Dec 2013 11:35:33 +0100 Subject: dm9601: fix reception of full size ethernet frames on dm9620/dm9621a dm9620/dm9621a require room for 4 byte padding even in dm9601 (3 byte header) mode. Cc: Signed-off-by: Peter Korsgaard Signed-off-by: David S. 
Miller diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index 8e88572..aca0285 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -364,7 +364,12 @@ static int dm9601_bind(struct usbnet *dev, struct usb_interface *intf) dev->net->ethtool_ops = &dm9601_ethtool_ops; dev->net->hard_header_len += DM_TX_OVERHEAD; dev->hard_mtu = dev->net->mtu + dev->net->hard_header_len; - dev->rx_urb_size = dev->net->mtu + ETH_HLEN + DM_RX_OVERHEAD; + + /* dm9620/21a require room for 4 byte padding, even in dm9601 + * mode, so we need +1 to be able to receive full size + * ethernet frames. + */ + dev->rx_urb_size = dev->net->mtu + ETH_HLEN + DM_RX_OVERHEAD + 1; dev->mii.dev = dev->net; dev->mii.mdio_read = dm9601_mdio_read; -- cgit v0.10.2 From cabd0e3a3861d45569b8f91633c98fc48d820cdb Mon Sep 17 00:00:00 2001 From: Peter Korsgaard Date: Mon, 16 Dec 2013 11:35:34 +0100 Subject: dm9601: make it clear that dm9620/dm9621a are also supported The driver nowadays also support dm9620/dm9621a based USB 2.0 ethernet adapters, so adjust module/driver description and Kconfig help text to match. Signed-off-by: Peter Korsgaard Signed-off-by: David S. Miller diff --git a/drivers/net/usb/Kconfig b/drivers/net/usb/Kconfig index 85e4a016..47b0f73 100644 --- a/drivers/net/usb/Kconfig +++ b/drivers/net/usb/Kconfig @@ -276,12 +276,12 @@ config USB_NET_CDC_MBIM module will be called cdc_mbim. config USB_NET_DM9601 - tristate "Davicom DM9601 based USB 1.1 10/100 ethernet devices" + tristate "Davicom DM96xx based USB 10/100 ethernet devices" depends on USB_USBNET select CRC32 help - This option adds support for Davicom DM9601 based USB 1.1 - 10/100 Ethernet adapters. + This option adds support for Davicom DM9601/DM9620/DM9621A + based USB 10/100 Ethernet adapters. config USB_NET_SR9700 tristate "CoreChip-sz SR9700 based USB 1.1 10/100 ethernet devices" diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index aca0285..ca99c35 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -1,5 +1,5 @@ /* - * Davicom DM9601 USB 1.1 10/100Mbps ethernet devices + * Davicom DM96xx USB 10/100Mbps ethernet devices * * Peter Korsgaard * @@ -548,7 +548,7 @@ static int dm9601_link_reset(struct usbnet *dev) } static const struct driver_info dm9601_info = { - .description = "Davicom DM9601 USB Ethernet", + .description = "Davicom DM96xx USB 10/100 Ethernet", .flags = FLAG_ETHER | FLAG_LINK_INTR, .bind = dm9601_bind, .rx_fixup = dm9601_rx_fixup, @@ -621,5 +621,5 @@ static struct usb_driver dm9601_driver = { module_usb_driver(dm9601_driver); MODULE_AUTHOR("Peter Korsgaard "); -MODULE_DESCRIPTION("Davicom DM9601 USB 1.1 ethernet devices"); +MODULE_DESCRIPTION("Davicom DM96xx USB 10/100 ethernet devices"); MODULE_LICENSE("GPL"); -- cgit v0.10.2 From 4263c86dca5198da6bd3ad826d0b2304fbe25776 Mon Sep 17 00:00:00 2001 From: Peter Korsgaard Date: Mon, 16 Dec 2013 11:35:35 +0100 Subject: dm9601: work around tx fifo sync issue on dm962x Certain dm962x revisions contain an bug, where if a USB bulk transfer retry (E.G. if bulk crc mismatch) happens right after a transfer with odd or maxpacket length, the internal tx hardware fifo gets out of sync causing the interface to stop working. Work around it by adding up to 3 bytes of padding to ensure this situation cannot trigger. This workaround also means we never pass multiple-of-maxpacket size skb's to usbnet, so the length adjustment to handle usbnet's padding of those can be removed. 
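To make the new padding rule concrete, here is a worked example (a sketch; it assumes the driver's 2-byte DM_TX_OVERHEAD header and a 64-byte bulk maxpacket):

        len = skb->len + DM_TX_OVERHEAD;        /* e.g. 61 + 2 = 63          */
        while ((len & 1) || !(len % dev->maxpacket))
                len++;                          /* 63 -> 64 -> 65 -> 66      */
        pad = len - DM_TX_OVERHEAD - skb->len;  /* 66 - 2 - 61 = 3 bytes pad */

At most three padding bytes are appended, and the resulting URB length is then neither odd nor an exact multiple of maxpacket, the two lengths that can desynchronise the tx fifo on a retried bulk transfer.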
Cc: Reported-by: Joseph Chang Signed-off-by: Peter Korsgaard Signed-off-by: David S. Miller diff --git a/drivers/net/usb/dm9601.c b/drivers/net/usb/dm9601.c index ca99c35..14aa48f 100644 --- a/drivers/net/usb/dm9601.c +++ b/drivers/net/usb/dm9601.c @@ -473,7 +473,7 @@ static int dm9601_rx_fixup(struct usbnet *dev, struct sk_buff *skb) static struct sk_buff *dm9601_tx_fixup(struct usbnet *dev, struct sk_buff *skb, gfp_t flags) { - int len; + int len, pad; /* format: b1: packet length low @@ -481,12 +481,23 @@ static struct sk_buff *dm9601_tx_fixup(struct usbnet *dev, struct sk_buff *skb, b3..n: packet data */ - len = skb->len; + len = skb->len + DM_TX_OVERHEAD; + + /* workaround for dm962x errata with tx fifo getting out of + * sync if a USB bulk transfer retry happens right after a + * packet with odd / maxpacket length by adding up to 3 bytes + * padding. + */ + while ((len & 1) || !(len % dev->maxpacket)) + len++; - if (skb_headroom(skb) < DM_TX_OVERHEAD) { + len -= DM_TX_OVERHEAD; /* hw header doesn't count as part of length */ + pad = len - skb->len; + + if (skb_headroom(skb) < DM_TX_OVERHEAD || skb_tailroom(skb) < pad) { struct sk_buff *skb2; - skb2 = skb_copy_expand(skb, DM_TX_OVERHEAD, 0, flags); + skb2 = skb_copy_expand(skb, DM_TX_OVERHEAD, pad, flags); dev_kfree_skb_any(skb); skb = skb2; if (!skb) @@ -495,10 +506,10 @@ static struct sk_buff *dm9601_tx_fixup(struct usbnet *dev, struct sk_buff *skb, __skb_push(skb, DM_TX_OVERHEAD); - /* usbnet adds padding if length is a multiple of packet size - if so, adjust length value in header */ - if ((skb->len % dev->maxpacket) == 0) - len++; + if (pad) { + memset(skb->data + skb->len, 0, pad); + __skb_put(skb, pad); + } skb->data[0] = len; skb->data[1] = len >> 8; -- cgit v0.10.2 From 4df98e76cde7c64b5606d82584c65dda4151bd6a Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Mon, 16 Dec 2013 12:36:44 +0100 Subject: ipv6: pmtudisc setting not respected with UFO/CORK Sockets marked with IPV6_PMTUDISC_PROBE (or later IPV6_PMTUDISC_INTERFACE) don't respect this setting when the outgoing interface supports UFO. We had the same problem in IPv4, which was fixed in commit daba287b299ec7a2c61ae3a714920e90e8396ad5 ("ipv4: fix DO and PROBE pmtu mode regarding local fragmentation with UFO/CORK"). Also IPV6_DONTFRAG mode did not care about already corked data, thus it may generate a fragmented frame even if this socket option was specified. It also did not care about the length of the ipv6 header and possible options. In the error path allow the user to receive the pmtu notifications via both, rxpmtu method or error queue. The user may opted in for both, so deliver the notification to both error handlers (the handlers check if the error needs to be enqueued). Also report back consistent pmtu values when sending on an already cork-appended socket. Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 4acdb63..e6f9319 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1193,11 +1193,35 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len + (opt ? 
opt->opt_nflen : 0); - maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - sizeof(struct frag_hdr); + maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen - + sizeof(struct frag_hdr); if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) { - if (cork->length + length > sizeof(struct ipv6hdr) + IPV6_MAXPLEN - fragheaderlen) { - ipv6_local_error(sk, EMSGSIZE, fl6, mtu-exthdrlen); + unsigned int maxnonfragsize, headersize; + + headersize = sizeof(struct ipv6hdr) + + (opt ? opt->tot_len : 0) + + (dst_allfrag(&rt->dst) ? + sizeof(struct frag_hdr) : 0) + + rt->rt6i_nfheader_len; + + maxnonfragsize = (np->pmtudisc >= IPV6_PMTUDISC_DO) ? + mtu : sizeof(struct ipv6hdr) + IPV6_MAXPLEN; + + /* dontfrag active */ + if ((cork->length + length > mtu - headersize) && dontfrag && + (sk->sk_protocol == IPPROTO_UDP || + sk->sk_protocol == IPPROTO_RAW)) { + ipv6_local_rxpmtu(sk, fl6, mtu - headersize + + sizeof(struct ipv6hdr)); + goto emsgsize; + } + + if (cork->length + length > maxnonfragsize - headersize) { +emsgsize: + ipv6_local_error(sk, EMSGSIZE, fl6, + mtu - headersize + + sizeof(struct ipv6hdr)); return -EMSGSIZE; } } @@ -1222,12 +1246,6 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to, * --yoshfuji */ - if ((length > mtu) && dontfrag && (sk->sk_protocol == IPPROTO_UDP || - sk->sk_protocol == IPPROTO_RAW)) { - ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen); - return -EMSGSIZE; - } - skb = skb_peek_tail(&sk->sk_write_queue); cork->length += length; if (((length > mtu) || -- cgit v0.10.2 From 58a4782449c5882f61882396ef18cc34c7dc1269 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 16 Dec 2013 06:31:23 -0800 Subject: ipv6: sit: update mtu check to take care of gso packets While testing my changes for TSO support in SIT devices, I was using sit0 tunnel which appears to include nopmtudisc flag. But using : ip tun add sittun mode sit remote $REMOTE_IPV4 local $LOCAL_IPV4 \ dev $IFACE We get a tunnel which rejects too long packets because of the mtu check which is not yet GSO aware. erd:~# ip tunnel sittun: ipv6/ip remote 10.246.17.84 local 10.246.17.83 ttl inherit 6rd-prefix 2002::/16 sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16 This patch is based on an excellent report from Michal Shmidt. In the future, we probably want to extend the MTU check to do the right thing for GSO packets... Fixes: ("61c1db7fae21 ipv6: sit: add GSO/TSO support") Reported-by: Michal Schmidt Signed-off-by: Eric Dumazet Tested-by: Michal Schmidt Signed-off-by: David S. Miller diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 366fbba..a710fde 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -924,7 +924,7 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, if (tunnel->parms.iph.daddr && skb_dst(skb)) skb_dst(skb)->ops->update_pmtu(skb_dst(skb), NULL, skb, mtu); - if (skb->len > mtu) { + if (skb->len > mtu && !skb_is_gso(skb)) { icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu); ip_rt_put(rt); goto tx_error; -- cgit v0.10.2 From c97102ba96324da330078ad8619ba4dfe840dbe3 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Wed, 18 Dec 2013 17:08:31 -0800 Subject: kexec: migrate to reboot cpu Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic kernel") moved reboot= handling to generic code. In the process it also removed the code in native_machine_shutdown() which are moving reboot process to reboot_cpu/cpu0. I guess that thought must have been that all reboot paths are calling migrate_to_reboot_cpu(), so we don't need this special handling. 
But the kexec reboot path (kernel_kexec()) does not call migrate_to_reboot_cpu(), so the above change broke kexec. Now the reboot can happen on a non-boot cpu, and when INIT is sent in the second kernel to bring up the BP, it brings down the machine. So start calling migrate_to_reboot_cpu() in the kexec reboot path to avoid this problem. Bisected by WANG Chao. Reported-by: Matthew Whitehead Reported-by: Dave Young Signed-off-by: Vivek Goyal Tested-by: Baoquan He Tested-by: WANG Chao Acked-by: H. Peter Anvin Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 8e00f9f..9e7db9e 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -43,6 +43,7 @@ extern int unregister_reboot_notifier(struct notifier_block *); * Architecture-specific implementations of sys_reboot commands. */ +extern void migrate_to_reboot_cpu(void); extern void machine_restart(char *cmd); extern void machine_halt(void); extern void machine_power_off(void); diff --git a/kernel/kexec.c b/kernel/kexec.c index d0d8fca..9c97016 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1680,6 +1680,7 @@ int kernel_kexec(void) { kexec_in_progress = true; kernel_restart_prepare(NULL); + migrate_to_reboot_cpu(); printk(KERN_EMERG "Starting new kernel\n"); machine_shutdown(); } diff --git a/kernel/reboot.c b/kernel/reboot.c index f813b34..662c83f 100644 --- a/kernel/reboot.c +++ b/kernel/reboot.c @@ -104,7 +104,7 @@ int unregister_reboot_notifier(struct notifier_block *nb) } EXPORT_SYMBOL(unregister_reboot_notifier); -static void migrate_to_reboot_cpu(void) +void migrate_to_reboot_cpu(void) { /* The boot cpu is always logical cpu 0 */ int cpu = reboot_cpu; -- cgit v0.10.2 From 2b4847e73004c10ae6666c2e27b5c5430aed8698 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:32 -0800 Subject: mm: numa: serialise parallel get_user_page against THP migration Base pages are unmapped and flushed from cache and TLB during normal page migration and replaced with a migration entry that causes any parallel NUMA hinting fault or gup to block until migration completes. THP does not unmap pages due to a lack of support for migration entries at a PMD level. This allows races with get_user_pages and get_user_pages_fast, which commit 3f926ab945b6 ("mm: Close races between THP migration and PMD numa clearing") made worse by introducing a pmd_clear_flush(). This patch forces get_user_page (fast and normal) on a pmd_numa page to go through the slow get_user_page path where it will serialise against THP migration and properly account for the NUMA hinting fault. On the migration side the page table lock is taken for each PTE update.
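Why returning 0 from the lockless walk is enough can be seen from the shape of the fast-path fallback; the following is a simplified sketch, not the patched code, and the helper name __gup_fast_walk is a stand-in for the local pgd/pud/pmd/pte loop:

        int get_user_pages_fast(unsigned long start, int nr_pages, int write,
                                struct page **pages)
        {
                struct mm_struct *mm = current->mm;
                int nr, ret;

                /* lockless walk: now refuses pte_numa/pmd_numa entries */
                nr = __gup_fast_walk(start, nr_pages, write, pages);
                if (nr == nr_pages)
                        return nr;

                /* slow path: takes mmap_sem, handles the NUMA hinting fault
                 * and blocks until any THP migration has completed */
                down_read(&mm->mmap_sem);
                ret = get_user_pages(current, mm, start + (nr << PAGE_SHIFT),
                                     nr_pages - nr, write, 0, pages + nr, NULL);
                up_read(&mm->mmap_sem);
                if (ret < 0)
                        return nr ? nr : ret;
                return nr + ret;
        }

Anything the fast path declines is retried under mmap_sem, where handle_mm_fault() can account the hinting fault and serialise against the migration.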
Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c index dd74e46..0596e8e 100644 --- a/arch/x86/mm/gup.c +++ b/arch/x86/mm/gup.c @@ -83,6 +83,12 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr, pte_t pte = gup_get_pte(ptep); struct page *page; + /* Similar to the PMD case, NUMA hinting must take slow path */ + if (pte_numa(pte)) { + pte_unmap(ptep); + return 0; + } + if ((pte_flags(pte) & (mask | _PAGE_SPECIAL)) != mask) { pte_unmap(ptep); return 0; @@ -167,6 +173,13 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end, if (pmd_none(pmd) || pmd_trans_splitting(pmd)) return 0; if (unlikely(pmd_large(pmd))) { + /* + * NUMA hinting faults need to be handled in the GUP + * slowpath for accounting purposes and so that they + * can be serialised against THP migration. + */ + if (pmd_numa(pmd)) + return 0; if (!gup_huge_pmd(pmd, addr, next, write, pages, nr)) return 0; } else { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 33a5dc4..51f0693 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1243,6 +1243,10 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd)) return ERR_PTR(-EFAULT); + /* Full NUMA hinting faults to serialise migration in fault paths */ + if ((flags & FOLL_NUMA) && pmd_numa(*pmd)) + goto out; + page = pmd_page(*pmd); VM_BUG_ON(!PageHead(page)); if (flags & FOLL_TOUCH) { @@ -1323,23 +1327,27 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, /* If the page was locked, there are no parallel migrations */ if (page_locked) goto clear_pmdnuma; + } - /* - * Otherwise wait for potential migrations and retry. We do - * relock and check_same as the page may no longer be mapped. - * As the fault is being retried, do not account for it. - */ + /* + * If there are potential migrations, wait for completion and retry. We + * do not relock and check_same as the page may no longer be mapped. + * Furtermore, even if the page is currently misplaced, there is no + * guarantee it is still misplaced after the migration completes. + */ + if (!page_locked) { spin_unlock(ptl); wait_on_page_locked(page); page_nid = -1; goto out; } - /* Page is misplaced, serialise migrations and parallel THP splits */ + /* + * Page is misplaced. Page lock serialises migrations. Acquire anon_vma + * to serialises splits + */ get_page(page); spin_unlock(ptl); - if (!page_locked) - lock_page(page); anon_vma = page_lock_anon_vma_read(page); /* Confirm the PMD did not change while page_table_lock was released */ diff --git a/mm/migrate.c b/mm/migrate.c index bb94004..2cabbd5 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1722,6 +1722,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, struct page *new_page = NULL; struct mem_cgroup *memcg = NULL; int page_lru = page_is_file_cache(page); + pmd_t orig_entry; /* * Rate-limit the amount of data that is being migrated to a node. 
@@ -1756,7 +1757,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, /* Recheck the target PMD */ ptl = pmd_lock(mm, pmd); - if (unlikely(!pmd_same(*pmd, entry))) { + if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) { +fail_putback: spin_unlock(ptl); /* Reverse changes made by migrate_page_copy() */ @@ -1786,16 +1788,34 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, */ mem_cgroup_prepare_migration(page, new_page, &memcg); + orig_entry = *pmd; entry = mk_pmd(new_page, vma->vm_page_prot); - entry = pmd_mknonnuma(entry); - entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); entry = pmd_mkhuge(entry); + entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma); + /* + * Clear the old entry under pagetable lock and establish the new PTE. + * Any parallel GUP will either observe the old page blocking on the + * page lock, block on the page table lock or observe the new page. + * The SetPageUptodate on the new page and page_add_new_anon_rmap + * guarantee the copy is visible before the pagetable update. + */ + flush_cache_range(vma, haddr, haddr + HPAGE_PMD_SIZE); + page_add_new_anon_rmap(new_page, vma, haddr); pmdp_clear_flush(vma, haddr, pmd); set_pmd_at(mm, haddr, pmd, entry); - page_add_new_anon_rmap(new_page, vma, haddr); update_mmu_cache_pmd(vma, address, &entry); + + if (page_count(page) != 2) { + set_pmd_at(mm, haddr, pmd, orig_entry); + flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE); + update_mmu_cache_pmd(vma, address, &entry); + page_remove_rmap(new_page); + goto fail_putback; + } + page_remove_rmap(page); + /* * Finish the charge transaction under the page table lock to * prevent split_huge_page() from dividing up the charge @@ -1820,9 +1840,13 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, out_fail: count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); out_dropref: - entry = pmd_mknonnuma(entry); - set_pmd_at(mm, haddr, pmd, entry); - update_mmu_cache_pmd(vma, address, &entry); + ptl = pmd_lock(mm, pmd); + if (pmd_same(*pmd, entry)) { + entry = pmd_mknonnuma(entry); + set_pmd_at(mm, haddr, pmd, entry); + update_mmu_cache_pmd(vma, address, &entry); + } + spin_unlock(ptl); unlock_page(page); put_page(page); -- cgit v0.10.2 From f714f4f20e59ea6eea264a86b9a51fd51b88fc54 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:33 -0800 Subject: mm: numa: call MMU notifiers on THP migration MMU notifiers must be called on THP page migration or secondary MMUs will get very confused. 
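The rule being applied is the usual notifier bracketing around any mapping change that a secondary MMU (KVM, for instance) may have cached; as a sketch, where the range bounds are the ones the patch uses and the rest is the generic pattern:

        spinlock_t *ptl;
        unsigned long mmun_start = address & HPAGE_PMD_MASK;
        unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;

        mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
        ptl = pmd_lock(mm, pmd);
        /* ... tear down the old PMD and install the one for new_page ... */
        spin_unlock(ptl);
        mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

The start notifier forces the secondary MMU to drop its translations before the primary page table changes; the end notifier lets it fault the new mapping in afterwards.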
Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/migrate.c b/mm/migrate.c index 2cabbd5..be787d5 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -36,6 +36,7 @@ #include #include #include +#include #include @@ -1716,12 +1717,13 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, struct page *page, int node) { spinlock_t *ptl; - unsigned long haddr = address & HPAGE_PMD_MASK; pg_data_t *pgdat = NODE_DATA(node); int isolated = 0; struct page *new_page = NULL; struct mem_cgroup *memcg = NULL; int page_lru = page_is_file_cache(page); + unsigned long mmun_start = address & HPAGE_PMD_MASK; + unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE; pmd_t orig_entry; /* @@ -1756,10 +1758,12 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, WARN_ON(PageLRU(new_page)); /* Recheck the target PMD */ + mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end); ptl = pmd_lock(mm, pmd); if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) { fail_putback: spin_unlock(ptl); + mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); /* Reverse changes made by migrate_page_copy() */ if (TestClearPageActive(new_page)) @@ -1800,15 +1804,16 @@ fail_putback: * The SetPageUptodate on the new page and page_add_new_anon_rmap * guarantee the copy is visible before the pagetable update. */ - flush_cache_range(vma, haddr, haddr + HPAGE_PMD_SIZE); - page_add_new_anon_rmap(new_page, vma, haddr); - pmdp_clear_flush(vma, haddr, pmd); - set_pmd_at(mm, haddr, pmd, entry); + flush_cache_range(vma, mmun_start, mmun_end); + page_add_new_anon_rmap(new_page, vma, mmun_start); + pmdp_clear_flush(vma, mmun_start, pmd); + set_pmd_at(mm, mmun_start, pmd, entry); + flush_tlb_range(vma, mmun_start, mmun_end); update_mmu_cache_pmd(vma, address, &entry); if (page_count(page) != 2) { - set_pmd_at(mm, haddr, pmd, orig_entry); - flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE); + set_pmd_at(mm, mmun_start, pmd, orig_entry); + flush_tlb_range(vma, mmun_start, mmun_end); update_mmu_cache_pmd(vma, address, &entry); page_remove_rmap(new_page); goto fail_putback; @@ -1823,6 +1828,7 @@ fail_putback: */ mem_cgroup_end_migration(memcg, page, new_page, true); spin_unlock(ptl); + mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); unlock_page(new_page); unlock_page(page); @@ -1843,7 +1849,7 @@ out_dropref: ptl = pmd_lock(mm, pmd); if (pmd_same(*pmd, entry)) { entry = pmd_mknonnuma(entry); - set_pmd_at(mm, haddr, pmd, entry); + set_pmd_at(mm, mmun_start, pmd, entry); update_mmu_cache_pmd(vma, address, &entry); } spin_unlock(ptl); -- cgit v0.10.2 From 67f87463d3a3362424efcbe8b40e4772fd34fc61 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:34 -0800 Subject: mm: clear pmd_numa before invalidating On x86, PMD entries are similar to _PAGE_PROTNONE protection and are handled as NUMA hinting faults. The following two page table protection bits are what defines them _PAGE_NUMA:set _PAGE_PRESENT:clear A PMD is considered present if any of the _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PSE or _PAGE_NUMA bits are set. If pmdp_invalidate encounters a pmd_numa, it clears the present bit leaving _PAGE_NUMA which will be considered not present by the CPU but present by pmd_present. The existing caller of pmdp_invalidate should handle it but it's an inconsistent state for a PMD. This patch keeps the state consistent when calling pmdp_invalidate. 
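To make the inconsistency concrete: on x86 in this series, presence of a PMD is judged from several bits, roughly as below (quoted from memory, so treat the exact bit list as an assumption rather than a citation):

        static inline int pmd_present(pmd_t pmd)
        {
                return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE |
                                         _PAGE_PSE | _PAGE_NUMA);
        }

pmd_mknotpresent() clears only _PAGE_PRESENT, so a NUMA PMD pushed through pmdp_invalidate() unchanged keeps _PAGE_NUMA set: the CPU treats the entry as not present while pmd_present() keeps reporting it as present. Clearing the NUMA bit first removes that disagreement.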
Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index cbb3854..e84cad2 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -191,6 +191,9 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp) void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) { + pmd_t entry = *pmdp; + if (pmd_numa(entry)) + entry = pmd_mknonnuma(entry); set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(*pmdp)); flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE); } -- cgit v0.10.2 From 5a6dac3ec5f583cc8ee7bc53b5500a207c4ca433 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:36 -0800 Subject: mm: numa: do not clear PMD during PTE update scan If the PMD is flushed then a parallel fault in handle_mm_fault() will enter the pmd_none and do_huge_pmd_anonymous_page() path where it'll attempt to insert a huge zero page. This is wasteful so the patch avoids clearing the PMD when setting pmd_numa. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 51f0693..420826e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1539,7 +1539,7 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, */ if (!is_huge_zero_page(page) && !pmd_numa(*pmd)) { - entry = pmdp_get_and_clear(mm, addr, pmd); + entry = *pmd; entry = pmd_mknuma(entry); ret = HPAGE_PMD_NR; } -- cgit v0.10.2 From 0c5f83c23ca703d32f930393825487257a5cde6d Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:37 -0800 Subject: mm: numa: do not clear PTE for pte_numa update The TLB must be flushed if the PTE is updated but change_pte_range is clearing the PTE while marking PTEs pte_numa without necessarily flushing the TLB if it reinserts the same entry. Without the flush, it's conceivable that two processors have different TLBs for the same virtual address and at the very least it would generate spurious faults. This patch only unmaps the pages in change_pte_range for a full protection change. 
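The invariant the fix restores can be sketched as follows (illustrative only, a simplified mirror of the prot_numa branch in the diff below):

        pte_t ptent = *pte;                     /* read the PTE, do not clear it */
        if (!pte_numa(ptent)) {
                ptent = pte_mknuma(ptent);      /* mark it for NUMA hinting */
                set_pte_at(mm, addr, pte, ptent);
        }

Updating the entry in place avoids the cleared-then-reinserted window entirely, so other CPUs never see an intermediate state that would require its own TLB flush; the single flush_tlb_range() at the end remains sufficient.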
[riel@redhat.com: write pte_numa pte back to the page tables] Signed-off-by: Mel Gorman Signed-off-by: Rik van Riel Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Chegu Vinod Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mprotect.c b/mm/mprotect.c index 2666797..1291a05 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -52,17 +52,19 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, pte_t ptent; bool updated = false; - ptent = ptep_modify_prot_start(mm, addr, pte); if (!prot_numa) { + ptent = ptep_modify_prot_start(mm, addr, pte); ptent = pte_modify(ptent, newprot); updated = true; } else { struct page *page; + ptent = *pte; page = vm_normal_page(vma, addr, oldpte); if (page) { if (!pte_numa(oldpte)) { ptent = pte_mknuma(ptent); + set_pte_at(mm, addr, pte, ptent); updated = true; } } @@ -79,7 +81,10 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, if (updated) pages++; - ptep_modify_prot_commit(mm, addr, pte, ptent); + + /* Only !prot_numa always clears the pte */ + if (!prot_numa) + ptep_modify_prot_commit(mm, addr, pte, ptent); } else if (IS_ENABLED(CONFIG_MIGRATION) && !pte_file(oldpte)) { swp_entry_t entry = pte_to_swp_entry(oldpte); -- cgit v0.10.2 From c3a489cac38d43ea6dc4ac240473b44b46deecf7 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:38 -0800 Subject: mm: numa: ensure anon_vma is locked to prevent parallel THP splits The anon_vma lock prevents parallel THP splits and any associated complexity that arises when handling splits during THP migration. This patch checks if the lock was successfully acquired and bails from THP migration if it failed for any reason. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 420826e..dbafffa 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1359,6 +1359,13 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, goto out_unlock; } + /* Bail if we fail to protect against THP splits for any reason */ + if (unlikely(!anon_vma)) { + put_page(page); + page_nid = -1; + goto clear_pmdnuma; + } + /* * Migrate the THP to the requested node, returns with page unlocked * and pmd_numa cleared. -- cgit v0.10.2 From eb4489f69f224356193364dc2762aa009738ca7f Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:39 -0800 Subject: mm: numa: avoid unnecessary work on the failure path If a PMD changes during a THP migration then migration aborts but the failure path is doing more work than is necessary. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/migrate.c b/mm/migrate.c index be787d5..a987525 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1780,7 +1780,8 @@ fail_putback: putback_lru_page(page); mod_zone_page_state(page_zone(page), NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); - goto out_fail; + + goto out_unlock; } /* @@ -1854,6 +1855,7 @@ out_dropref: } spin_unlock(ptl); +out_unlock: unlock_page(page); put_page(page); return 0; -- cgit v0.10.2 From 3c67f474558748b604e247d92b55dfe89654c81d Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:40 -0800 Subject: sched: numa: skip inaccessible VMAs Inaccessible VMA should not be trapping NUMA hint faults. Skip them. 
Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 9030da7..c7395d9 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1738,6 +1738,13 @@ void task_numa_work(struct callback_head *work) (vma->vm_file && (vma->vm_flags & (VM_READ|VM_WRITE)) == (VM_READ))) continue; + /* + * Skip inaccessible VMAs to avoid any confusion between + * PROT_NONE and NUMA hinting ptes + */ + if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) + continue; + do { start = max(start, vma->vm_start); end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE); -- cgit v0.10.2 From 1667918b6483b12a6496bf54151b827b8235d7b1 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:41 -0800 Subject: mm: numa: clear numa hinting information on mprotect On a protection change it is no longer clear if the page should be still accessible. This patch clears the NUMA hinting fault bits on a protection change. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/huge_memory.c b/mm/huge_memory.c index dbafffa..70e7429 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1532,6 +1532,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, ret = 1; if (!prot_numa) { entry = pmdp_get_and_clear(mm, addr, pmd); + if (pmd_numa(entry)) + entry = pmd_mknonnuma(entry); entry = pmd_modify(entry, newprot); ret = HPAGE_PMD_NR; BUG_ON(pmd_write(entry)); diff --git a/mm/mprotect.c b/mm/mprotect.c index 1291a05..f842172 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -54,6 +54,8 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, if (!prot_numa) { ptent = ptep_modify_prot_start(mm, addr, pte); + if (pte_numa(ptent)) + ptent = pte_mknonnuma(ptent); ptent = pte_modify(ptent, newprot); updated = true; } else { -- cgit v0.10.2 From de466bd628e8d663fdf3f791bc8db318ee85c714 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:42 -0800 Subject: mm: numa: avoid unnecessary disruption of NUMA hinting during migration do_huge_pmd_numa_page() handles the case where there is parallel THP migration. However, by the time it is checked the NUMA hinting information has already been disrupted. This patch adds an earlier check with some helpers. 
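As a sketch of where the new early check sits in the huge-page NUMA fault path (simplified from the diff that follows):

        if (unlikely(pmd_trans_migrating(*pmdp))) {
                /* THP migration holds the compound page lock for its whole
                 * duration, so waiting on the page lock waits for migration */
                spin_unlock(ptl);
                wait_migrate_huge_page(vma->anon_vma, pmdp);
                goto out;               /* refault once migration completes */
        }

Doing this before anything else means the hinting information in the PMD is left untouched when the page is already on its way to another node.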
Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/migrate.h b/include/linux/migrate.h index f5096b5..b7717d7 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -90,10 +90,19 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #endif /* CONFIG_MIGRATION */ #ifdef CONFIG_NUMA_BALANCING +extern bool pmd_trans_migrating(pmd_t pmd); +extern void wait_migrate_huge_page(struct anon_vma *anon_vma, pmd_t *pmd); extern int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, int node); extern bool migrate_ratelimited(int node); #else +static inline bool pmd_trans_migrating(pmd_t pmd) +{ + return false; +} +static inline void wait_migrate_huge_page(struct anon_vma *anon_vma, pmd_t *pmd) +{ +} static inline int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, int node) { diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 70e7429..7de1bf8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -882,6 +882,10 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, ret = 0; goto out_unlock; } + + /* mmap_sem prevents this happening but warn if that changes */ + WARN_ON(pmd_trans_migrating(pmd)); + if (unlikely(pmd_trans_splitting(pmd))) { /* split huge page running from under us */ spin_unlock(src_ptl); @@ -1299,6 +1303,17 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, if (unlikely(!pmd_same(pmd, *pmdp))) goto out_unlock; + /* + * If there are potential migrations, wait for completion and retry + * without disrupting NUMA hinting information. Do not relock and + * check_same as the page may no longer be mapped. + */ + if (unlikely(pmd_trans_migrating(*pmdp))) { + spin_unlock(ptl); + wait_migrate_huge_page(vma->anon_vma, pmdp); + goto out; + } + page = pmd_page(pmd); BUG_ON(is_huge_zero_page(page)); page_nid = page_to_nid(page); @@ -1329,12 +1344,7 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, goto clear_pmdnuma; } - /* - * If there are potential migrations, wait for completion and retry. We - * do not relock and check_same as the page may no longer be mapped. - * Furtermore, even if the page is currently misplaced, there is no - * guarantee it is still misplaced after the migration completes. - */ + /* Migration could have started since the pmd_trans_migrating check */ if (!page_locked) { spin_unlock(ptl); wait_on_page_locked(page); diff --git a/mm/migrate.c b/mm/migrate.c index a987525..cfb4190 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1655,6 +1655,18 @@ int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) return 1; } +bool pmd_trans_migrating(pmd_t pmd) +{ + struct page *page = pmd_page(pmd); + return PageLocked(page); +} + +void wait_migrate_huge_page(struct anon_vma *anon_vma, pmd_t *pmd) +{ + struct page *page = pmd_page(*pmd); + wait_on_page_locked(page); +} + /* * Attempt to migrate a misplaced page to the specified destination * node. Caller is expected to have an elevated reference count on -- cgit v0.10.2 From 20841405940e7be0617612d521e206e4b6b325db Mon Sep 17 00:00:00 2001 From: Rik van Riel Date: Wed, 18 Dec 2013 17:08:44 -0800 Subject: mm: fix TLB flush race between migration, and change_protection_range There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. 
The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [*] flush TLB reload TLB from new entry read/write new page lose data [*] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel Signed-off-by: Mel Gorman Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h index 8358dc1..0f9e945 100644 --- a/arch/sparc/include/asm/pgtable_64.h +++ b/arch/sparc/include/asm/pgtable_64.h @@ -619,7 +619,7 @@ static inline unsigned long pte_present(pte_t pte) } #define pte_accessible pte_accessible -static inline unsigned long pte_accessible(pte_t a) +static inline unsigned long pte_accessible(struct mm_struct *mm, pte_t a) { return pte_val(a) & _PAGE_VALID; } @@ -847,7 +847,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr, * SUN4V NOTE: _PAGE_VALID is the same value in both the SUN4U * and SUN4V pte layout, so this inline test is fine. 
*/ - if (likely(mm != &init_mm) && pte_accessible(orig)) + if (likely(mm != &init_mm) && pte_accessible(mm, orig)) tlb_batch_add(mm, addr, ptep, orig, fullmm); } diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 3d19994..bbc8b12 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -452,9 +452,16 @@ static inline int pte_present(pte_t a) } #define pte_accessible pte_accessible -static inline int pte_accessible(pte_t a) +static inline bool pte_accessible(struct mm_struct *mm, pte_t a) { - return pte_flags(a) & _PAGE_PRESENT; + if (pte_flags(a) & _PAGE_PRESENT) + return true; + + if ((pte_flags(a) & (_PAGE_PROTNONE | _PAGE_NUMA)) && + mm_tlb_flush_pending(mm)) + return true; + + return false; } static inline int pte_hidden(pte_t pte) diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index f330d28..b12079a 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -217,7 +217,7 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) #endif #ifndef pte_accessible -# define pte_accessible(pte) ((void)(pte),1) +# define pte_accessible(mm, pte) ((void)(pte), 1) #endif #ifndef flush_tlb_fix_spurious_fault diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index bd29941..e5c49c3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -443,6 +443,14 @@ struct mm_struct { /* numa_scan_seq prevents two threads setting pte_numa */ int numa_scan_seq; #endif +#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION) + /* + * An operation with batched TLB flushing is going on. Anything that + * can move process memory needs to flush the TLB when moving a + * PROT_NONE or PROT_NUMA mapped page. + */ + bool tlb_flush_pending; +#endif struct uprobes_state uprobes_state; }; @@ -459,4 +467,40 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm) return mm->cpu_vm_mask_var; } +#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION) +/* + * Memory barriers to keep this state in sync are graciously provided by + * the page table locks, outside of which no page table modifications happen. + * The barriers below prevent the compiler from re-ordering the instructions + * around the memory barriers that are already present in the code. + */ +static inline bool mm_tlb_flush_pending(struct mm_struct *mm) +{ + barrier(); + return mm->tlb_flush_pending; +} +static inline void set_tlb_flush_pending(struct mm_struct *mm) +{ + mm->tlb_flush_pending = true; + barrier(); +} +/* Clearing is done after a TLB flush, which also provides a barrier. 
*/ +static inline void clear_tlb_flush_pending(struct mm_struct *mm) +{ + barrier(); + mm->tlb_flush_pending = false; +} +#else +static inline bool mm_tlb_flush_pending(struct mm_struct *mm) +{ + return false; +} +static inline void set_tlb_flush_pending(struct mm_struct *mm) +{ +} +static inline void clear_tlb_flush_pending(struct mm_struct *mm) +{ +} +#endif + #endif /* _LINUX_MM_TYPES_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 728d5be..5721f0e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -537,6 +537,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p) spin_lock_init(&mm->page_table_lock); mm_init_aio(mm); mm_init_owner(mm, p); + clear_tlb_flush_pending(mm); if (likely(!mm_alloc_pgd(mm))) { mm->def_flags = 0; diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 7de1bf8..3d2783e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1377,6 +1377,13 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, } /* + * The page_table_lock above provides a memory barrier + * with change_protection_range. + */ + if (mm_tlb_flush_pending(mm)) + flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE); + + /* * Migrate the THP to the requested node, returns with page unlocked * and pmd_numa cleared. */ diff --git a/mm/mprotect.c b/mm/mprotect.c index f842172..bb53a65 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -188,6 +188,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma, BUG_ON(addr >= end); pgd = pgd_offset(mm, addr); flush_cache_range(vma, addr, end); + set_tlb_flush_pending(mm); do { next = pgd_addr_end(addr, end); if (pgd_none_or_clear_bad(pgd)) @@ -199,6 +200,7 @@ static unsigned long change_protection_range(struct vm_area_struct *vma, /* Only flush the TLB if we actually modified any entries: */ if (pages) flush_tlb_range(vma, start, end); + clear_tlb_flush_pending(mm); return pages; } diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index e84cad2..a8b9199 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -110,9 +110,10 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma, pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { + struct mm_struct *mm = (vma)->vm_mm; pte_t pte; - pte = ptep_get_and_clear((vma)->vm_mm, address, ptep); - if (pte_accessible(pte)) + pte = ptep_get_and_clear(mm, address, ptep); + if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); return pte; } -- cgit v0.10.2 From af2c1401e6f9177483be4fad876d0073669df9df Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:45 -0800 Subject: mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates According to documentation on barriers, stores issued before a LOCK can complete after the lock implying that it's possible tlb_flush_pending can be visible after a page table update. As per revised documentation, this patch adds a smp_mb__before_spinlock to guarantee the correct ordering. Signed-off-by: Mel Gorman Acked-by: Paul E. 
McKenney Reviewed-by: Rik van Riel Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e5c49c3..ad0616f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -482,7 +482,12 @@ static inline bool mm_tlb_flush_pending(struct mm_struct *mm) static inline void set_tlb_flush_pending(struct mm_struct *mm) { mm->tlb_flush_pending = true; - barrier(); + + /* + * Guarantee that the tlb_flush_pending store does not leak into the + * critical section updating the page tables + */ + smp_mb__before_spinlock(); } /* Clearing is done after a TLB flush, which also provides a barrier. */ static inline void clear_tlb_flush_pending(struct mm_struct *mm) -- cgit v0.10.2 From b0943d61b8fa420180f92f64ef67662b4f6cc493 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 18 Dec 2013 17:08:46 -0800 Subject: mm: numa: defer TLB flush for THP migration as long as possible THP migration can fail for a variety of reasons. Avoid flushing the TLB to deal with THP migration races until the copy is ready to start. Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Cc: Alex Thorlton Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 3d2783e..7de1bf8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1377,13 +1377,6 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma, } /* - * The page_table_lock above provides a memory barrier - * with change_protection_range. - */ - if (mm_tlb_flush_pending(mm)) - flush_tlb_range(vma, haddr, haddr + HPAGE_PMD_SIZE); - - /* * Migrate the THP to the requested node, returns with page unlocked * and pmd_numa cleared. */ diff --git a/mm/migrate.c b/mm/migrate.c index cfb4190..e9b7102 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1759,6 +1759,9 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, goto out_fail; } + if (mm_tlb_flush_pending(mm)) + flush_tlb_range(vma, mmun_start, mmun_end); + /* Prepare a page as a migration target */ __set_page_locked(new_page); SetPageSwapBacked(new_page); -- cgit v0.10.2 From 73f038b863dfe98acabc7c36c17342b84ad52e94 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Wed, 18 Dec 2013 17:08:47 -0800 Subject: mm: page_alloc: exclude unreclaimable allocations from zone fairness policy Dave Hansen noted a regression in a microbenchmark that loops around open() and close() on an 8-node NUMA machine and bisected it down to commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy"). That change forces the slab allocations of the file descriptor to spread out to all 8 nodes, causing remote references in the page allocator and slab. The round-robin policy is only there to provide fairness among memory allocations that are reclaimed involuntarily based on pressure in each zone. It does not make sense to apply it to unreclaimable kernel allocations that are freed manually, in this case instantly after the allocation, and incur the remote reference costs twice for no reason. Only round-robin allocations that are usually freed through page reclaim or slab shrinking. Bisected by Dave Hansen. 
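A sketch of the resulting zonelist check; the batch accounting itself is unchanged, only the gfp_mask test is new, and GFP_MOVABLE_MASK here covers allocations that are movable or reclaimable:

        /* Only allocations that are normally freed through reclaim or
         * shrinking take part in the fairness round-robin; short-lived
         * kernel allocations such as the fd table skip it and stay local. */
        if ((alloc_flags & ALLOC_WMARK_LOW) &&
            (gfp_mask & GFP_MOVABLE_MASK)) {
                if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0)
                        continue;       /* this zone used its fair share */
        }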
Signed-off-by: Johannes Weiner Cc: Dave Hansen Cc: Mel Gorman Reviewed-by: Rik van Riel Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 580a5f0..f861d02 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1920,7 +1920,8 @@ zonelist_scan: * back to remote zones that do not partake in the * fairness round-robin cycle of this zonelist. */ - if (alloc_flags & ALLOC_WMARK_LOW) { + if ((alloc_flags & ALLOC_WMARK_LOW) && + (gfp_mask & GFP_MOVABLE_MASK)) { if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0) continue; if (zone_reclaim_mode && -- cgit v0.10.2 From 84ed8a99058e61567f495cc43118344261641c5f Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Wed, 18 Dec 2013 17:08:48 -0800 Subject: sh: always link in helper functions extracted from libgcc E.g. landisk_defconfig, which has CONFIG_NTFS_FS=m: ERROR: "__ashrdi3" [fs/ntfs/ntfs.ko] undefined! For "lib-y", if no symbols in a compilation unit are referenced by other units, the compilation unit will not be included in vmlinux. This breaks modules that do reference those symbols. Use "obj-y" instead to fix this. http://kisskb.ellerman.id.au/kisskb/buildresult/8838077/ This doesn't fix all cases. There are others, e.g. udivsi3. This is also not limited to sh, many architectures handle this in the same way. A simple solution is to unconditionally include all helper functions. A more complex solution is to make the choice of "lib-y" or "obj-y" depend on CONFIG_MODULES: obj-$(CONFIG_MODULES) += ... lib-y($CONFIG_MODULES) += ... Signed-off-by: Geert Uytterhoeven Cc: Paul Mundt Tested-by: Nobuhiro Iwamatsu Reviewed-by: Nobuhiro Iwamatsu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/arch/sh/lib/Makefile b/arch/sh/lib/Makefile index 7b95f29..3baff31 100644 --- a/arch/sh/lib/Makefile +++ b/arch/sh/lib/Makefile @@ -6,7 +6,7 @@ lib-y = delay.o memmove.o memchr.o \ checksum.o strlen.o div64.o div64-generic.o # Extracted from libgcc -lib-y += movmem.o ashldi3.o ashrdi3.o lshrdi3.o \ +obj-y += movmem.o ashldi3.o ashrdi3.o lshrdi3.o \ ashlsi3.o ashrsi3.o ashiftrt.o lshrsi3.o \ udiv_qrnnd.o -- cgit v0.10.2 From a844f38671d45ee6d981cfe41c5c4c2334578d73 Mon Sep 17 00:00:00 2001 From: Sima Baymani Date: Wed, 18 Dec 2013 17:08:49 -0800 Subject: mm: add missing dependency in Kconfig Eliminate the following (rand)config warning by adding missing PROC_FS dependency: warning: (HWPOISON_INJECT && MEM_SOFT_DIRTY) selects PROC_PAGE_MONITOR which has unmet direct dependencies (PROC_FS && MMU) Signed-off-by: Sima Baymani Suggested-by: David Rientjes Acked-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/Kconfig b/mm/Kconfig index eb69f35..723bbe0 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -543,7 +543,7 @@ config ZSWAP config MEM_SOFT_DIRTY bool "Track memory changes" - depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY + depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS select PROC_PAGE_MONITOR help This option enables memory changes tracking by introducing a -- cgit v0.10.2 From b0e5fd7359f1ce8db4ccb862b3aa80d2f2cbf4d0 Mon Sep 17 00:00:00 2001 From: Joonsoo Kim Date: Wed, 18 Dec 2013 17:08:51 -0800 Subject: mm/mempolicy: correct putback method for isolate pages if failed queue_pages_range() isolates hugetlbfs pages and putback_lru_pages() can't handle these. We should change it to putback_movable_pages(). Naoya said that it is worth going into stable, because it can break in-use hugepage list. 
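The general shape of the path being fixed, as a sketch (not the exact do_mbind() control flow; see the one-line diff below for the actual change):

        /* pagelist was filled by queue_pages_range(), which can isolate
         * hugetlbfs pages as well as ordinary LRU pages */
        if (!list_empty(&pagelist)) {
                nr_failed = migrate_pages(&pagelist, new_vma_page,
                                          (unsigned long)vma,
                                          MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
                if (nr_failed)
                        putback_movable_pages(&pagelist);
        }

putback_movable_pages() knows how to return isolated hugetlb pages to the hugepage freelists, whereas putback_lru_pages() assumes everything on the list is an LRU page, which is what breaks the in-use hugepage list.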
Signed-off-by: Joonsoo Kim Acked-by: Rafael Aquini Reviewed-by: Naoya Horiguchi Reviewed-by: Wanpeng Li Cc: Christoph Lameter Cc: Vlastimil Babka Cc: Wanpeng Li Cc: Mel Gorman Cc: Rik van Riel Cc: Vlastimil Babka Cc: Zhang Yanfei Cc: [3.12.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mempolicy.c b/mm/mempolicy.c index eca4a31..6d04d37 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1318,7 +1318,7 @@ static long do_mbind(unsigned long start, unsigned long len, if (nr_failed && (flags & MPOL_MF_STRICT)) err = -EIO; } else - putback_lru_pages(&pagelist); + putback_movable_pages(&pagelist); up_write(&mm->mmap_sem); mpol_out: -- cgit v0.10.2 From 6815bf3f233e0b10c99a758497d5d236063b010b Mon Sep 17 00:00:00 2001 From: Joonsoo Kim Date: Wed, 18 Dec 2013 17:08:52 -0800 Subject: mm/compaction: respect ignore_skip_hint in update_pageblock_skip update_pageblock_skip() only fits to compaction which tries to isolate by pageblock unit. If isolate_migratepages_range() is called by CMA, it try to isolate regardless of pageblock unit and it don't reference get_pageblock_skip() by ignore_skip_hint. We should also respect it on update_pageblock_skip() to prevent from setting the wrong information. Signed-off-by: Joonsoo Kim Acked-by: Vlastimil Babka Reviewed-by: Naoya Horiguchi Reviewed-by: Wanpeng Li Cc: Christoph Lameter Cc: Rafael Aquini Cc: Vlastimil Babka Cc: Wanpeng Li Cc: Mel Gorman Cc: Rik van Riel Cc: Zhang Yanfei Cc: [3.7+] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/compaction.c b/mm/compaction.c index 805165b..f58bcd0 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -134,6 +134,10 @@ static void update_pageblock_skip(struct compact_control *cc, bool migrate_scanner) { struct zone *zone = cc->zone; + + if (cc->ignore_skip_hint) + return; + if (!page) return; -- cgit v0.10.2 From a49ecbcd7b0d5a1cda7d60e03df402dd0ef76ac8 Mon Sep 17 00:00:00 2001 From: Jianguo Wu Date: Wed, 18 Dec 2013 17:08:54 -0800 Subject: mm/memory-failure.c: recheck PageHuge() after hugetlb page migrate successfully After a successful hugetlb page migration by soft offline, the source page will either be freed into hugepage_freelists or buddy(over-commit page). If page is in buddy, page_hstate(page) will be NULL. It will hit a NULL pointer dereference in dequeue_hwpoisoned_huge_page(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: [] dequeue_hwpoisoned_huge_page+0x131/0x1d0 PGD c23762067 PUD c24be2067 PMD 0 Oops: 0000 [#1] SMP So check PageHuge(page) after call migrate_pages() successfully. 
Signed-off-by: Jianguo Wu Tested-by: Naoya Horiguchi Reviewed-by: Naoya Horiguchi Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/memory-failure.c b/mm/memory-failure.c index b7c1716..db08af9 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1505,10 +1505,16 @@ static int soft_offline_huge_page(struct page *page, int flags) if (ret > 0) ret = -EIO; } else { - set_page_hwpoison_huge_page(hpage); - dequeue_hwpoisoned_huge_page(hpage); - atomic_long_add(1 << compound_order(hpage), - &num_poisoned_pages); + /* overcommit hugetlb page will be freed to buddy */ + if (PageHuge(page)) { + set_page_hwpoison_huge_page(hpage); + dequeue_hwpoisoned_huge_page(hpage); + atomic_long_add(1 << compound_order(hpage), + &num_poisoned_pages); + } else { + SetPageHWPoison(page); + atomic_long_inc(&num_poisoned_pages); + } } return ret; } -- cgit v0.10.2 From 584ec9794293ff69f597f5bec23296e62e94e857 Mon Sep 17 00:00:00 2001 From: Davidlohr Bueso Date: Wed, 18 Dec 2013 17:08:55 -0800 Subject: MAINTAINERS: add Davidlohr as GPT maintainer Add a new entry for the GPT standard. Any future changes will now be routed through linux-efi. Signed-off-by: Davidlohr Bueso Acked-by: Matt Fleming Cc: Jens Axboe Acked-by: Matt Domsch Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/MAINTAINERS b/MAINTAINERS index 75e3db9..3c99c25 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3833,6 +3833,12 @@ T: git git://linuxtv.org/media_tree.git S: Maintained F: drivers/media/usb/gspca/ +GUID PARTITION TABLE (GPT) +M: Davidlohr Bueso +L: linux-efi@vger.kernel.org +S: Maintained +F: block/partitions/efi.* + STK1160 USB VIDEO CAPTURE DRIVER M: Ezequiel Garcia L: linux-media@vger.kernel.org -- cgit v0.10.2 From 11c731e81bb0d8d2e835447a2dd645b34bb74706 Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Wed, 18 Dec 2013 17:08:56 -0800 Subject: mm/mempolicy: fix !vma in new_vma_page() BUG_ON(!vma) assumption is introduced by commit 0bf598d863e3 ("mbind: add BUG_ON(!vma) in new_vma_page()"), however, even if address = __vma_address(page, vma); and vma->start < address < vma->end page_address_in_vma() may still return -EFAULT because of many other conditions in it. As a result the while loop in new_vma_page() may end with vma=NULL. This patch revert the commit and also fix the potential dereference NULL pointer reported by Dan. http://marc.info/?l=linux-mm&m=137689530323257&w=2 kernel BUG at mm/mempolicy.c:1204! 
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2 task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000 RIP: new_vma_page+0x70/0x90 RSP: 0000:ffff88005ab21db0 EFLAGS: 00010246 RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80 RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2 R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000 FS: 00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Stack: ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50 ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8 0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00 Call Trace: migrate_pages+0x12a/0x850 SYSC_mbind+0x513/0x6a0 SyS_mbind+0xe/0x10 ia32_do_call+0x13/0x13 Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48 RIP [] new_vma_page+0x70/0x90 RSP Signed-off-by: Wanpeng Li Reported-by: Dave Jones Reported-by: Sasha Levin Reviewed-by: Naoya Horiguchi Reviewed-by: Bob Liu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 6d04d37..0cd2c4d 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1197,14 +1197,16 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int * break; vma = vma->vm_next; } + + if (PageHuge(page)) { + if (vma) + return alloc_huge_page_noerr(vma, address, 1); + else + return NULL; + } /* - * queue_pages_range() confirms that @page belongs to some vma, - * so vma shouldn't be NULL. + * if !vma, alloc_page_vma() will use task or system default policy */ - BUG_ON(!vma); - - if (PageHuge(page)) - return alloc_huge_page_noerr(vma, address, 1); return alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); } #else -- cgit v0.10.2 From 7ac181568342ec618d4da1ab03c2babcaa7ee84f Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Wed, 18 Dec 2013 17:08:57 -0800 Subject: fix build with make 3.80 According to Documentation/Changes, make 3.80 is still being supported for building the kernel, hence make files must not make (unconditional) use of features introduced only in newer versions. Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression config option") however introduced "else ifeq" constructs which make 3.80 doesn't understand. Replace the logic there with more conventional (in the kernel build infrastructure) list constructs (except that the list here is intentionally limited to exactly one element). Signed-off-by: Jan Beulich Cc: P J P Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/Makefile b/Makefile index 858a147..89cee86 100644 --- a/Makefile +++ b/Makefile @@ -732,19 +732,13 @@ export mod_strip_cmd # Select initial ramdisk compression format, default is gzip(1). # This shall be used by the dracut(8) tool while creating an initramfs image. 
# -INITRD_COMPRESS=gzip -ifeq ($(CONFIG_RD_BZIP2), y) - INITRD_COMPRESS=bzip2 -else ifeq ($(CONFIG_RD_LZMA), y) - INITRD_COMPRESS=lzma -else ifeq ($(CONFIG_RD_XZ), y) - INITRD_COMPRESS=xz -else ifeq ($(CONFIG_RD_LZO), y) - INITRD_COMPRESS=lzo -else ifeq ($(CONFIG_RD_LZ4), y) - INITRD_COMPRESS=lz4 -endif -export INITRD_COMPRESS +INITRD_COMPRESS-y := gzip +INITRD_COMPRESS-$(CONFIG_RD_BZIP2) := bzip2 +INITRD_COMPRESS-$(CONFIG_RD_LZMA) := lzma +INITRD_COMPRESS-$(CONFIG_RD_XZ) := xz +INITRD_COMPRESS-$(CONFIG_RD_LZO) := lzo +INITRD_COMPRESS-$(CONFIG_RD_LZ4) := lz4 +export INITRD_COMPRESS := $(INITRD_COMPRESS-y) ifdef CONFIG_MODULE_SIG_ALL MODSECKEY = ./signing_key.priv -- cgit v0.10.2 From 98398c32f6687ee1e1f3ae084effb4b75adb0747 Mon Sep 17 00:00:00 2001 From: Jianguo Wu Date: Wed, 18 Dec 2013 17:08:59 -0800 Subject: mm/hugetlb: check for pte NULL pointer in __page_check_address() In __page_check_address(), if address's pud is not present, huge_pte_offset() will return NULL, we should check the return value. Signed-off-by: Jianguo Wu Cc: Naoya Horiguchi Cc: Mel Gorman Cc: qiuxishi Cc: Hanjun Guo Acked-by: Kirill A. Shutemov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds diff --git a/mm/rmap.c b/mm/rmap.c index 55c8b8d..068522d 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -600,7 +600,11 @@ pte_t *__page_check_address(struct page *page, struct mm_struct *mm, spinlock_t *ptl; if (unlikely(PageHuge(page))) { + /* when pud is not present, pte will be NULL */ pte = huge_pte_offset(mm, address); + if (!pte) + return NULL; + ptl = huge_pte_lockptr(page_hstate(page), mm, pte); goto check; } -- cgit v0.10.2 From 9fb444f22f09cfad3798e3610f7dc62f8a385ee8 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Wed, 11 Dec 2013 03:48:15 +0100 Subject: ARM: shmobile: armadillo: Add PWM backlight power supply Commit 22ceeee16eb8f0d04de3ef43a5174fb30ec18af9 ("pwm-backlight: Add power supply support") added a mandatory power supply for the PWM backlight. Add a fixed 5V regulator to board code with a consumer supply entry for the backlight device. 
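A note on why the supply must be named exactly like this: the pwm-backlight driver resolves its regulator by requesting a supply called "power" against its own device, which is what the REGULATOR_SUPPLY("power", "pwm-backlight.0") entry below matches. A rough sketch of that consumer side (the declarations and error handling here are illustrative, not a quote of drivers/video/backlight/pwm_bl.c):

	/* consumer side, roughly what the pwm-backlight probe does */
	struct regulator *power;

	power = devm_regulator_get(&pdev->dev, "power");
	if (IS_ERR(power))
		return PTR_ERR(power);	/* probe fails/defers without the supply */

	/* takes a reference on the fixed 5.0V rail registered by the board
	 * code below (which is always-on anyway) */
	return regulator_enable(power);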
Signed-off-by: Laurent Pinchart Signed-off-by: Simon Horman (cherry picked from commit ad11cb9a5cf96346f1240995c672cdbb5501785c) Signed-off-by: Simon Horman diff --git a/arch/arm/mach-shmobile/board-armadillo800eva.c b/arch/arm/mach-shmobile/board-armadillo800eva.c index 958e3cb..c186891 100644 --- a/arch/arm/mach-shmobile/board-armadillo800eva.c +++ b/arch/arm/mach-shmobile/board-armadillo800eva.c @@ -614,6 +614,11 @@ static struct regulator_consumer_supply fixed3v3_power_consumers[] = { REGULATOR_SUPPLY("vqmmc", "sh_mmcif"), }; +/* Fixed 3.3V regulator used by LCD backlight */ +static struct regulator_consumer_supply fixed5v0_power_consumers[] = { + REGULATOR_SUPPLY("power", "pwm-backlight.0"), +}; + /* Fixed 3.3V regulator to be used by SDHI0 */ static struct regulator_consumer_supply vcc_sdhi0_consumers[] = { REGULATOR_SUPPLY("vmmc", "sh_mobile_sdhi.0"), @@ -1196,6 +1201,8 @@ static void __init eva_init(void) regulator_register_always_on(0, "fixed-3.3V", fixed3v3_power_consumers, ARRAY_SIZE(fixed3v3_power_consumers), 3300000); + regulator_register_always_on(3, "fixed-5.0V", fixed5v0_power_consumers, + ARRAY_SIZE(fixed5v0_power_consumers), 5000000); pinctrl_register_mappings(eva_pinctrl_map, ARRAY_SIZE(eva_pinctrl_map)); pwm_add_table(pwm_lookup, ARRAY_SIZE(pwm_lookup)); -- cgit v0.10.2 From db6077fd0b7dd41dc6ff18329cec979379071f87 Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Wed, 11 Dec 2013 15:45:32 -0800 Subject: iscsi-target: Fix incorrect np->np_thread NULL assignment When shutting down a target there is a race condition between iscsit_del_np() and __iscsi_target_login_thread(). The latter sets the thread pointer to NULL, and the former tries to issue kthread_stop() on that pointer without any synchronization. This patch moves the np->np_thread NULL assignment into iscsit_del_np(), after kthread_stop() has completed. It also removes the signal_pending() + np_state check, and only exits when kthread_should_stop() is true. Reported-by: Hannes Reinecke Cc: #3.12+ Signed-off-by: Nicholas Bellinger diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 02182ab..0086719 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -465,6 +465,7 @@ int iscsit_del_np(struct iscsi_np *np) */ send_sig(SIGINT, np->np_thread, 1); kthread_stop(np->np_thread); + np->np_thread = NULL; } np->np_transport->iscsit_free_np(np); diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c index 4eb93b2..e29279e 100644 --- a/drivers/target/iscsi/iscsi_target_login.c +++ b/drivers/target/iscsi/iscsi_target_login.c @@ -1403,11 +1403,6 @@ old_sess_out: out: stop = kthread_should_stop(); - if (!stop && signal_pending(current)) { - spin_lock_bh(&np->np_thread_lock); - stop = (np->np_thread_state == ISCSI_NP_THREAD_SHUTDOWN); - spin_unlock_bh(&np->np_thread_lock); - } /* Wait for another socket.. 
*/ if (!stop) return 1; @@ -1415,7 +1410,6 @@ exit: iscsi_stop_login_thread_timer(np); spin_lock_bh(&np->np_thread_lock); np->np_thread_state = ISCSI_NP_THREAD_EXIT; - np->np_thread = NULL; spin_unlock_bh(&np->np_thread_lock); return 0; -- cgit v0.10.2 From 2853c2b6671509591be09213954d7249ca6ff224 Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Wed, 11 Dec 2013 16:20:13 -0800 Subject: iser-target: Move INIT_WORK setup into isert_create_device_ib_res This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into isert_create_device_ib_res(), instead of being done each callback invocation in isert_cq_[rx,tx]_callback(). This also fixes a 'INFO: trying to register non-static key' warning when cancel_work_sync() is called before INIT_WORK has setup the struct work_struct. Reported-by: Or Gerlitz Cc: #3.12+ Signed-off-by: Nicholas Bellinger diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c index 78f6e92..9804fca 100644 --- a/drivers/infiniband/ulp/isert/ib_isert.c +++ b/drivers/infiniband/ulp/isert/ib_isert.c @@ -207,7 +207,9 @@ isert_free_rx_descriptors(struct isert_conn *isert_conn) isert_conn->conn_rx_descs = NULL; } +static void isert_cq_tx_work(struct work_struct *); static void isert_cq_tx_callback(struct ib_cq *, void *); +static void isert_cq_rx_work(struct work_struct *); static void isert_cq_rx_callback(struct ib_cq *, void *); static int @@ -259,6 +261,7 @@ isert_create_device_ib_res(struct isert_device *device) cq_desc[i].device = device; cq_desc[i].cq_index = i; + INIT_WORK(&cq_desc[i].cq_rx_work, isert_cq_rx_work); device->dev_rx_cq[i] = ib_create_cq(device->ib_device, isert_cq_rx_callback, isert_cq_event_callback, @@ -270,6 +273,7 @@ isert_create_device_ib_res(struct isert_device *device) goto out_cq; } + INIT_WORK(&cq_desc[i].cq_tx_work, isert_cq_tx_work); device->dev_tx_cq[i] = ib_create_cq(device->ib_device, isert_cq_tx_callback, isert_cq_event_callback, @@ -1732,7 +1736,6 @@ isert_cq_tx_callback(struct ib_cq *cq, void *context) { struct isert_cq_desc *cq_desc = (struct isert_cq_desc *)context; - INIT_WORK(&cq_desc->cq_tx_work, isert_cq_tx_work); queue_work(isert_comp_wq, &cq_desc->cq_tx_work); } @@ -1776,7 +1779,6 @@ isert_cq_rx_callback(struct ib_cq *cq, void *context) { struct isert_cq_desc *cq_desc = (struct isert_cq_desc *)context; - INIT_WORK(&cq_desc->cq_rx_work, isert_cq_rx_work); queue_work(isert_rx_wq, &cq_desc->cq_rx_work); } -- cgit v0.10.2 From 95cadace8f3959282e76ebf8b382bd0930807d2c Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Thu, 12 Dec 2013 12:24:11 -0800 Subject: target/file: Update hw_max_sectors based on current block_size This patch allows FILEIO to update hw_max_sectors based on the current max_bytes_per_io. This is required because vfs_[writev,readv]() can accept a maximum of 2048 iovecs per call, so the enforced hw_max_sectors really needs to be calculated based on block_size. This addresses a >= v3.5 bug where block_size=512 was rejecting > 1M sized I/O requests, because FD_MAX_SECTORS was hardcoded to 2048 for the block_size=4096 case. 
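To make the sizing concrete: 2048 iovecs of one 4 KiB page each gives the 8388608-byte (8 MiB) FD_MAX_BYTES used below, and the per-command sector limit now scales with the configured block size instead of being pinned at 2048 sectors. A small sketch of the arithmetic (the helper name is illustrative):

	/* 2048 iovecs * 4096 bytes per iovec = 8388608 bytes per vfs_[writev,readv] call */
	#define FD_MAX_BYTES	8388608

	static inline u32 fd_hw_max_sectors(u32 block_size)
	{
		return FD_MAX_BYTES / block_size;
	}

	/* block_size 4096 -> 2048 sectors (the old fixed cap)
	 * block_size  512 -> 16384 sectors, so ~8 MiB I/Os no longer hit the
	 * 1 MiB limit implied by FD_MAX_SECTORS (2048) * 512 bytes */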
(v2: Use max_bytes_per_io instead of ->update_hw_max_sectors) Reported-by: Henrik Goldman Cc: #3.5+ Signed-off-by: Nicholas Bellinger diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c index 207b340..d06de84 100644 --- a/drivers/target/target_core_device.c +++ b/drivers/target/target_core_device.c @@ -1106,6 +1106,11 @@ int se_dev_set_block_size(struct se_device *dev, u32 block_size) dev->dev_attrib.block_size = block_size; pr_debug("dev[%p]: SE Device block_size changed to %u\n", dev, block_size); + + if (dev->dev_attrib.max_bytes_per_io) + dev->dev_attrib.hw_max_sectors = + dev->dev_attrib.max_bytes_per_io / block_size; + return 0; } diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c index 0e34cda..78241a5 100644 --- a/drivers/target/target_core_file.c +++ b/drivers/target/target_core_file.c @@ -66,9 +66,8 @@ static int fd_attach_hba(struct se_hba *hba, u32 host_id) pr_debug("CORE_HBA[%d] - TCM FILEIO HBA Driver %s on Generic" " Target Core Stack %s\n", hba->hba_id, FD_VERSION, TARGET_CORE_MOD_VERSION); - pr_debug("CORE_HBA[%d] - Attached FILEIO HBA: %u to Generic" - " MaxSectors: %u\n", - hba->hba_id, fd_host->fd_host_id, FD_MAX_SECTORS); + pr_debug("CORE_HBA[%d] - Attached FILEIO HBA: %u to Generic\n", + hba->hba_id, fd_host->fd_host_id); return 0; } @@ -220,7 +219,8 @@ static int fd_configure_device(struct se_device *dev) } dev->dev_attrib.hw_block_size = fd_dev->fd_block_size; - dev->dev_attrib.hw_max_sectors = FD_MAX_SECTORS; + dev->dev_attrib.max_bytes_per_io = FD_MAX_BYTES; + dev->dev_attrib.hw_max_sectors = FD_MAX_BYTES / fd_dev->fd_block_size; dev->dev_attrib.hw_queue_depth = FD_MAX_DEVICE_QUEUE_DEPTH; if (fd_dev->fbd_flags & FDBD_HAS_BUFFERED_IO_WCE) { diff --git a/drivers/target/target_core_file.h b/drivers/target/target_core_file.h index 37ffc5b..d7772c1 100644 --- a/drivers/target/target_core_file.h +++ b/drivers/target/target_core_file.h @@ -7,7 +7,10 @@ #define FD_DEVICE_QUEUE_DEPTH 32 #define FD_MAX_DEVICE_QUEUE_DEPTH 128 #define FD_BLOCKSIZE 512 -#define FD_MAX_SECTORS 2048 +/* + * Limited by the number of iovecs (2048) per vfs_[writev,readv] call + */ +#define FD_MAX_BYTES 8388608 #define RRF_EMULATE_CDB 0x01 #define RRF_GOT_LBA 0x02 diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h index 9f1dda6..321301c 100644 --- a/include/target/target_core_base.h +++ b/include/target/target_core_base.h @@ -620,6 +620,7 @@ struct se_dev_attrib { u32 unmap_granularity; u32 unmap_granularity_alignment; u32 max_write_same_len; + u32 max_bytes_per_io; struct se_device *da_dev; struct config_group da_group; }; -- cgit v0.10.2 From 4799e310caf0fb9078389766d0210d1c6133ad51 Mon Sep 17 00:00:00 2001 From: Kuninori Morimoto Date: Mon, 16 Dec 2013 00:16:52 -0800 Subject: ARM: shmobile: bockw: fixup DMA mask 4dcfa60071b3d23f0181f27d8519f12e37cefbb9 (ARM: DMA-API: better handing of DMA masks for coherent allocations) exchanged DMA mask check method. Below warning will appear without this patch asoc-simple-card asoc-simple-card.0: \ Coherent DMA mask 0xffffffffffffffff is larger than dma_addr_t allows asoc-simple-card asoc-simple-card.0: \ Driver did not use or check the return value from dma_set_coherent_mask()? 
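For reference, the two initializers differ as follows; the DMA_BIT_MASK() definition is quoted from memory of include/linux/dma-mapping.h, so treat it as a sketch:

	#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

	/* .dma_mask = ~0               -> 0xffffffffffffffff: a 64-bit mask, wider
	 *                                 than dma_addr_t allows here, which is
	 *                                 exactly what the warning above reports.
	 * .dma_mask = DMA_BIT_MASK(32) -> 0x00000000ffffffff: the 32-bit mask the
	 *                                 device can actually honour. */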
Signed-off-by: Kuninori Morimoto Acked-by: Laurent Pinchart Signed-off-by: Simon Horman diff --git a/arch/arm/mach-shmobile/board-bockw.c b/arch/arm/mach-shmobile/board-bockw.c index 3861152..3c4995a 100644 --- a/arch/arm/mach-shmobile/board-bockw.c +++ b/arch/arm/mach-shmobile/board-bockw.c @@ -679,7 +679,7 @@ static void __init bockw_init(void) .id = i, .data = &rsnd_card_info[i], .size_data = sizeof(struct asoc_simple_card_info), - .dma_mask = ~0, + .dma_mask = DMA_BIT_MASK(32), }; platform_device_register_full(&cardinfo); -- cgit v0.10.2 From d721a15c300c5f638a11573a6dd492158e737d6a Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Mon, 16 Dec 2013 12:38:48 +0000 Subject: ARM: shmobile: r8a7790: fix shdi resource sizes The r8a7790.dtsi file has four sdhi nodes which the first two have the wrong resource size for their register block. This causes the sh_modbile_sdhi driver to fail to communicate with card at-all. Change sdhi{0,1} node size from 0x100 to 0x200 to correct these nodes as per Kuninori Morimoto's response to the original patch where all four nodes where changed. sdhi{2,3} are the correct size. This bug has been present since sdhi resources were added to the r8a7790 by 8c9b1aa41853272a ("ARM: shmobile: r8a7790: add MMCIF and SDHI DT templates") in v3.11-rc2. Signed-off-by: Ben Dooks Tested-by: William Towle Acked-by: Kuninori Morimoto Signed-off-by: Simon Horman diff --git a/arch/arm/boot/dts/r8a7790.dtsi b/arch/arm/boot/dts/r8a7790.dtsi index 46e1d7e..9987dd0e 100644 --- a/arch/arm/boot/dts/r8a7790.dtsi +++ b/arch/arm/boot/dts/r8a7790.dtsi @@ -241,7 +241,7 @@ sdhi0: sdhi@ee100000 { compatible = "renesas,sdhi-r8a7790"; - reg = <0 0xee100000 0 0x100>; + reg = <0 0xee100000 0 0x200>; interrupt-parent = <&gic>; interrupts = <0 165 4>; cap-sd-highspeed; @@ -250,7 +250,7 @@ sdhi1: sdhi@ee120000 { compatible = "renesas,sdhi-r8a7790"; - reg = <0 0xee120000 0 0x100>; + reg = <0 0xee120000 0 0x200>; interrupt-parent = <&gic>; interrupts = <0 166 4>; cap-sd-highspeed; -- cgit v0.10.2 From 1e01c7eb7c431a74437d73fe54670398b4d2b222 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Thu, 19 Dec 2013 18:55:58 +0530 Subject: ARC: Allow conditional multiple inclusion of uapi/asm/unistd.h Commit 97bc386fc12d "ARC: Add guard macro to uapi/asm/unistd.h" inhibited multiple inclusion of ARCH unistd.h. This however hosed the system since Generic syscall table generator relies on it being included twice, and in lack-of an empty table was emitted by C preprocessor. 
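For context, the generic table generator depends on the header being usable twice: once to pick up the __NR_* numbers, and a second time with __SYSCALL() redefined so each line expands into a table entry. Roughly (a sketch of the usual arch sys.c pattern, not a verbatim copy of arch/arc/kernel/sys.c):

	/* first inclusion: just the __NR_* constants */
	#include <asm/unistd.h>

	#undef __SYSCALL
	#define __SYSCALL(nr, call) [nr] = (call),

	void *sys_call_table[NR_syscalls] = {
		[0 ... NR_syscalls - 1] = sys_ni_syscall,
	#include <asm/unistd.h>		/* second inclusion: emits the entries */
	};

With a conventional include guard the second inclusion expands to nothing and the table stays empty, which is the breakage described above.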
Fix that by allowing one exception to rule for the special case (just like Xtensa) Suggested-by: Chen Gang Signed-off-by: Vineet Gupta diff --git a/arch/arc/include/uapi/asm/unistd.h b/arch/arc/include/uapi/asm/unistd.h index 68125dd..39e58d1 100644 --- a/arch/arc/include/uapi/asm/unistd.h +++ b/arch/arc/include/uapi/asm/unistd.h @@ -8,7 +8,11 @@ /******** no-legacy-syscalls-ABI *******/ -#ifndef _UAPI_ASM_ARC_UNISTD_H +/* + * Non-typical guard macro to enable inclusion twice in ARCH sys.c + * That is how the Generic syscall wrapper generator works + */ +#if !defined(_UAPI_ASM_ARC_UNISTD_H) || defined(__SYSCALL) #define _UAPI_ASM_ARC_UNISTD_H #define __ARCH_WANT_SYS_EXECVE @@ -36,4 +40,6 @@ __SYSCALL(__NR_arc_gettls, sys_arc_gettls) #define __NR_sysfs (__NR_arch_specific_syscall + 3) __SYSCALL(__NR_sysfs, sys_sysfs) +#undef __SYSCALL + #endif -- cgit v0.10.2 From a26ba7faddd51be3dd8957543a9d984a4ddd104a Mon Sep 17 00:00:00 2001 From: Rashika Kheria Date: Thu, 19 Dec 2013 15:02:22 +0530 Subject: drivers: block: Mark the functions as static in skd_main.c MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as static in skd_main.c because they are not used outside this file. This eliminates the following warnings in skd_main.c: drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes] drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria Reviewed-by: Josh Triplett Signed-off-by: Jens Axboe diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c index 9199c93..eb6e1e0 100644 --- a/drivers/block/skd_main.c +++ b/drivers/block/skd_main.c @@ -5269,7 +5269,7 @@ const char *skd_skdev_state_to_str(enum skd_drvr_state state) } } -const char *skd_skmsg_state_to_str(enum skd_fit_msg_state state) +static const char *skd_skmsg_state_to_str(enum skd_fit_msg_state state) { switch (state) { case SKD_MSG_STATE_IDLE: @@ -5281,7 +5281,7 @@ const char *skd_skmsg_state_to_str(enum skd_fit_msg_state state) } } -const char *skd_skreq_state_to_str(enum skd_req_state state) +static const char *skd_skreq_state_to_str(enum skd_req_state state) { switch (state) { case SKD_REQ_STATE_IDLE: -- cgit v0.10.2 From 0c56010c83703e1f33325838eda9a2077827b6f1 Mon Sep 17 00:00:00 2001 From: Matias Bjorling Date: Tue, 10 Dec 2013 16:50:38 +0100 Subject: null_blk: mem garbage on NUMA systems during init For NUMA systems, initializing the blk-mq layer and using per node hctx. We initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it lead to writing garbage into struct request_queue's mq_map. 
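In code terms, the mismatch looks roughly like this (condensed from the driver; the sketch function is renamed, the real callback is null_init_hctx()):

	/* setup_queues():
	 *     nullb->queues = kzalloc(submit_queues * sizeof(*nq), GFP_KERNEL);
	 * sizes the array from submit_queues (default 1), while blk-mq was
	 * registered with nr_hw_queues = nr_online_nodes.  The per-hctx init
	 * hook is then called once per hardware queue: */
	static int null_init_hctx_sketch(struct blk_mq_hw_ctx *hctx, void *data,
					 unsigned int index)
	{
		struct nullb *nullb = data;
		struct nullb_queue *nq = &nullb->queues[index];	/* out of bounds
								 * once index >= submit_queues */

		hctx->driver_data = nq;	/* the writes that follow land in
					 * neighbouring memory, e.g.
					 * request_queue->mq_map */
		return 0;
	}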
Signed-off-by: Matias Bjorling Cc: Jens Axboe Signed-off-by: Linus Torvalds diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index ea192ec..f370fc1 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -495,23 +495,23 @@ static int null_add_dev(void) spin_lock_init(&nullb->lock); + if (queue_mode == NULL_Q_MQ && use_per_node_hctx) + submit_queues = nr_online_nodes; + if (setup_queues(nullb)) goto err; if (queue_mode == NULL_Q_MQ) { null_mq_reg.numa_node = home_node; null_mq_reg.queue_depth = hw_queue_depth; + null_mq_reg.nr_hw_queues = submit_queues; if (use_per_node_hctx) { null_mq_reg.ops->alloc_hctx = null_alloc_hctx; null_mq_reg.ops->free_hctx = null_free_hctx; - - null_mq_reg.nr_hw_queues = nr_online_nodes; } else { null_mq_reg.ops->alloc_hctx = blk_mq_alloc_single_hw_queue; null_mq_reg.ops->free_hctx = blk_mq_free_single_hw_queue; - - null_mq_reg.nr_hw_queues = submit_queues; } nullb->q = blk_mq_init_queue(&null_mq_reg, nullb); -- cgit v0.10.2 From 12f8f4fc0314103d47f9b1cbc812597b8d893ce1 Mon Sep 17 00:00:00 2001 From: Matias Bjorling Date: Wed, 18 Dec 2013 13:41:42 +0100 Subject: null_blk: documentation Add description of module and its parameters. Signed-off-by: Matias Bjorling Signed-off-by: Jens Axboe diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt new file mode 100644 index 0000000..9e1b047 --- /dev/null +++ b/Documentation/block/null_blk.txt @@ -0,0 +1,71 @@ +Null block device driver +================================================================================ + +I. Overview + +The null block device (/dev/nullb*) is used for benchmarking the various +block-layer implementations. It emulates a block device of X gigabytes in size. +The following instances are possible: + + Single-queue block-layer + - Request-based. + - Single submission queue per device. + - Implements IO scheduling algorithms (CFQ, Deadline, noop). + Multi-queue block-layer + - Request-based. + - Configurable submission queues per device. + No block-layer (Known as bio-based) + - Bio-based. IO requests are submitted directly to the device driver. + - Directly accepts bio data structure and returns them. + +All of them has a completion queue for each core in the system. + +II. Module parameters applicable for all instances: + +queue_mode=[0-2]: Default: 2-Multi-queue + Selects which block-layer the module should instantiate with. + + 0: Bio-based. + 1: Single-queue. + 2: Multi-queue. + +home_node=[0--nr_nodes]: Default: NUMA_NO_NODE + Selects what socket the data structures is allocated from. + +gb=[Size in GB]: Default: 250GB + The size of the device reported to the system. + +bs=[Block size (in bytes)]: Default: 512 bytes + The block size reported to the system. + +nr_devices=[Num. devices]: Default: 2 + Number of block devices instantiated. They are instantiated as /dev/nullb0, + etc. + +irq_mode=[0-2]: Default: Soft-irq + The completion mode used for completing IOs to the block-layer. + + 0: None. + 1: Soft-irq. Uses ipi to complete IOs across sockets. Simulates the overhead + when IOs are issued from another socket than the home the device is + connected to. + 2: Timer: Waits a specific period (completion_nsec) for each IO before + completion. + +completion_nsec=[Num. ns]: Default: 10.000ns + Combined with irq_mode=2 (timer). The time each completion event must wait. + +submit_queues=[0..nr_cpus]: + The number of submission queues attached to the device driver. If unset, it + defaults to 1 on single-queue and bio-based instances. 
For multi-queue, + its ignored when use_per_node_hctx module parameter is 1. + +hw_queue_depth=[0..qdepth]: Defaults: 64 + The hardware queue depth of the device. + +III: Multi-queue specific parameters + +use_per_node_hctx=[0/1]: Defaults: 1 + If 1, the multi-queue block layer is instantiated with a hardware dispatch + queue for each CPU node in the system. If 0, it is instantiated with the + number of queues defined in the submit_queues parameter. -- cgit v0.10.2 From 2d263a7856cbaf26dd89b671e2161c4a49f8461b Mon Sep 17 00:00:00 2001 From: Matias Bjorling Date: Wed, 18 Dec 2013 13:41:43 +0100 Subject: null_blk: refactor init and init errors code paths Simplify the initialization logic of the three block-layers. - The queue initialization is split into two parts. This allows reuse of code when initializing the sq-, bio- and mq-based layers. - Set submit_queues default value to 0 and always set it at init time. - Simplify the init error code paths. Signed-off-by: Matias Bjorling Signed-off-by: Jens Axboe diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index f370fc1..f0aeb2a 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -65,7 +65,7 @@ enum { NULL_Q_MQ = 2, }; -static int submit_queues = 1; +static int submit_queues; module_param(submit_queues, int, S_IRUGO); MODULE_PARM_DESC(submit_queues, "Number of submission queues"); @@ -355,16 +355,24 @@ static void null_free_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_index) kfree(hctx); } +static void null_init_queue(struct nullb *nullb, struct nullb_queue *nq) +{ + BUG_ON(!nullb); + BUG_ON(!nq); + + init_waitqueue_head(&nq->wait); + nq->queue_depth = nullb->queue_depth; +} + static int null_init_hctx(struct blk_mq_hw_ctx *hctx, void *data, unsigned int index) { struct nullb *nullb = data; struct nullb_queue *nq = &nullb->queues[index]; - init_waitqueue_head(&nq->wait); - nq->queue_depth = nullb->queue_depth; - nullb->nr_queues++; hctx->driver_data = nq; + null_init_queue(nullb, nq); + nullb->nr_queues++; return 0; } @@ -417,13 +425,13 @@ static int setup_commands(struct nullb_queue *nq) nq->cmds = kzalloc(nq->queue_depth * sizeof(*cmd), GFP_KERNEL); if (!nq->cmds) - return 1; + return -ENOMEM; tag_size = ALIGN(nq->queue_depth, BITS_PER_LONG) / BITS_PER_LONG; nq->tag_map = kzalloc(tag_size * sizeof(unsigned long), GFP_KERNEL); if (!nq->tag_map) { kfree(nq->cmds); - return 1; + return -ENOMEM; } for (i = 0; i < nq->queue_depth; i++) { @@ -454,33 +462,37 @@ static void cleanup_queues(struct nullb *nullb) static int setup_queues(struct nullb *nullb) { - struct nullb_queue *nq; - int i; - - nullb->queues = kzalloc(submit_queues * sizeof(*nq), GFP_KERNEL); + nullb->queues = kzalloc(submit_queues * sizeof(struct nullb_queue), + GFP_KERNEL); if (!nullb->queues) - return 1; + return -ENOMEM; nullb->nr_queues = 0; nullb->queue_depth = hw_queue_depth; - if (queue_mode == NULL_Q_MQ) - return 0; + return 0; +} + +static int init_driver_queues(struct nullb *nullb) +{ + struct nullb_queue *nq; + int i, ret = 0; for (i = 0; i < submit_queues; i++) { nq = &nullb->queues[i]; - init_waitqueue_head(&nq->wait); - nq->queue_depth = hw_queue_depth; - if (setup_commands(nq)) - break; + + null_init_queue(nullb, nq); + + ret = setup_commands(nq); + if (ret) + goto err_queue; nullb->nr_queues++; } - if (i == submit_queues) - return 0; - + return 0; +err_queue: cleanup_queues(nullb); - return 1; + return ret; } static int null_add_dev(void) @@ -495,9 +507,6 @@ static int null_add_dev(void) spin_lock_init(&nullb->lock); - if 
(queue_mode == NULL_Q_MQ && use_per_node_hctx) - submit_queues = nr_online_nodes; - if (setup_queues(nullb)) goto err; @@ -518,11 +527,13 @@ static int null_add_dev(void) } else if (queue_mode == NULL_Q_BIO) { nullb->q = blk_alloc_queue_node(GFP_KERNEL, home_node); blk_queue_make_request(nullb->q, null_queue_bio); + init_driver_queues(nullb); } else { nullb->q = blk_init_queue_node(null_request_fn, &nullb->lock, home_node); blk_queue_prep_rq(nullb->q, null_rq_prep_fn); if (nullb->q) blk_queue_softirq_done(nullb->q, null_softirq_done_fn); + init_driver_queues(nullb); } if (!nullb->q) @@ -579,7 +590,9 @@ static int __init null_init(void) } #endif - if (submit_queues > nr_cpu_ids) + if (queue_mode == NULL_Q_MQ && use_per_node_hctx) + submit_queues = nr_online_nodes; + else if (submit_queues > nr_cpu_ids) submit_queues = nr_cpu_ids; else if (!submit_queues) submit_queues = 1; -- cgit v0.10.2 From d15ee6b1a43afbe1a6cece3bd8d336a9d5cb7060 Mon Sep 17 00:00:00 2001 From: Matias Bjorling Date: Wed, 18 Dec 2013 13:41:44 +0100 Subject: null_blk: warning on ignored submit_queues param Let the user know when the number of submission queues are being ignored. Signed-off-by: Matias Bjorling Signed-off-by: Jens Axboe diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index f0aeb2a..8f2e7c3 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -590,9 +590,12 @@ static int __init null_init(void) } #endif - if (queue_mode == NULL_Q_MQ && use_per_node_hctx) + if (queue_mode == NULL_Q_MQ && use_per_node_hctx) { + if (submit_queues > 0) + pr_warn("null_blk: submit_queues param is set to %u.", + nr_online_nodes); submit_queues = nr_online_nodes; - else if (submit_queues > nr_cpu_ids) + } else if (submit_queues > nr_cpu_ids) submit_queues = nr_cpu_ids; else if (!submit_queues) submit_queues = 1; -- cgit v0.10.2 From cdc27c27843248ae7eb0df5fc261dd004eaa5670 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 17 Dec 2013 17:09:08 +0000 Subject: arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events Commit 8f34a1da35ae ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for disabled breakpoints") fixed an issue with GDB trying to zero breakpoint control registers. The problem there is that the arch hw_breakpoint code will attempt to create a (disabled), execute breakpoint of length 0. This will fail validation and report unexpected failure to GDB. To avoid this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that seems to have broken with recent kernels, causing watchpoints to be treated as TYPE_INST in the core code and returning ENOSPC for any further breakpoints. This patch fixes the problem by prioritising the `enable' field of the breakpoint: if it is cleared, we simply update the perf_event_attr to indicate that the thing is disabled and don't bother changing either the type or the length. This reinforces the behaviour that the breakpoint control register is essentially read-only apart from the enable bit when disabling a breakpoint. 
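The control flow after the change reduces to the following (condensed from the diff below; only the disabled path is new, the enabled path still validates as before):

	/* shape of the fix, not the full function */
	attr->disabled = !ctrl.enabled;
	if (attr->disabled)
		return 0;	/* keep the previous, already-valid bp_type/bp_len
				 * instead of forcing HW_BREAKPOINT_EMPTY / len 0 */
	/* only enabled breakpoints fall through to the usual
	 * arch_bp_generic_fields() validation of type and length */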
Cc: Reported-by: Aaron Liu Signed-off-by: Will Deacon Signed-off-by: Catalin Marinas diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index 6777a21..6a8928b 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -214,31 +214,29 @@ static int ptrace_hbp_fill_attr_ctrl(unsigned int note_type, { int err, len, type, disabled = !ctrl.enabled; - if (disabled) { - len = 0; - type = HW_BREAKPOINT_EMPTY; - } else { - err = arch_bp_generic_fields(ctrl, &len, &type); - if (err) - return err; - - switch (note_type) { - case NT_ARM_HW_BREAK: - if ((type & HW_BREAKPOINT_X) != type) - return -EINVAL; - break; - case NT_ARM_HW_WATCH: - if ((type & HW_BREAKPOINT_RW) != type) - return -EINVAL; - break; - default: + attr->disabled = disabled; + if (disabled) + return 0; + + err = arch_bp_generic_fields(ctrl, &len, &type); + if (err) + return err; + + switch (note_type) { + case NT_ARM_HW_BREAK: + if ((type & HW_BREAKPOINT_X) != type) return -EINVAL; - } + break; + case NT_ARM_HW_WATCH: + if ((type & HW_BREAKPOINT_RW) != type) + return -EINVAL; + break; + default: + return -EINVAL; } attr->bp_len = len; attr->bp_type = type; - attr->disabled = disabled; return 0; } -- cgit v0.10.2 From 85fbd722ad0f5d64d1ad15888cd1eb2188bfb557 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 18 Dec 2013 07:07:32 -0500 Subject: libata, freezer: avoid block device removal while system is frozen MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Freezable kthreads and workqueues are fundamentally problematic in that they effectively introduce a big kernel lock widely used in the kernel and have already been the culprit of several deadlock scenarios. This is the latest occurrence. During resume, libata rescans all the ports and revalidates all pre-existing devices. If it determines that a device has gone missing, the device is removed from the system which involves invalidating block device and flushing bdi while holding driver core layer locks. Unfortunately, this can race with the rest of device resume. Because freezable kthreads and workqueues are thawed after device resume is complete and block device removal depends on freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make progress, this can lead to deadlock - block device removal can't proceed because kthreads are frozen and kthreads can't be thawed because device resume is blocked behind block device removal. 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue") made this particular deadlock scenario more visible but the underlying problem has always been there - the original forker task and jbd2 are freezable too. In fact, this is highly likely just one of many possible deadlock scenarios given that freezer behaves as a big kernel lock and we don't have any debug mechanism around it. I believe the right thing to do is getting rid of freezable kthreads and workqueues. This is something fundamentally broken. For now, implement a funny workaround in libata - just avoid doing block device hot[un]plug while the system is frozen. Kernel engineering at its finest. :( v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built as a module. v3: Comment updated and polling interval changed to 10ms as suggested by Rafael. v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not defined when FREEZER is not configured thus breaking build. Reported by kbuild test robot. 
Signed-off-by: Tejun Heo Reported-by: Tomaž Šolc Reviewed-by: "Rafael J. Wysocki" Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801 Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org Cc: Greg Kroah-Hartman Cc: Len Brown Cc: Oleg Nesterov Cc: stable@vger.kernel.org Cc: kbuild test robot diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c index db6dfcf..176f629 100644 --- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -3871,6 +3871,27 @@ void ata_scsi_hotplug(struct work_struct *work) return; } + /* + * XXX - UGLY HACK + * + * The block layer suspend/resume path is fundamentally broken due + * to freezable kthreads and workqueue and may deadlock if a block + * device gets removed while resume is in progress. I don't know + * what the solution is short of removing freezable kthreads and + * workqueues altogether. + * + * The following is an ugly hack to avoid kicking off device + * removal while freezer is active. This is a joke but does avoid + * this particular deadlock scenario. + * + * https://bugzilla.kernel.org/show_bug.cgi?id=62801 + * http://marc.info/?l=linux-kernel&m=138695698516487 + */ +#ifdef CONFIG_FREEZER + while (pm_freezing) + msleep(10); +#endif + DPRINTK("ENTER\n"); mutex_lock(&ap->scsi_scan_mutex); diff --git a/kernel/freezer.c b/kernel/freezer.c index b462fa1..aa6a8aa 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -19,6 +19,12 @@ EXPORT_SYMBOL(system_freezing_cnt); bool pm_freezing; bool pm_nosig_freezing; +/* + * Temporary export for the deadlock workaround in ata_scsi_hotplug(). + * Remove once the hack becomes unnecessary. + */ +EXPORT_SYMBOL_GPL(pm_freezing); + /* protects freezing and frozen transitions */ static DEFINE_SPINLOCK(freezer_lock); -- cgit v0.10.2 From 40e2d7f9b5dae048789c64672bf3027fbb663ffa Mon Sep 17 00:00:00 2001 From: Len Brown Date: Wed, 18 Dec 2013 16:44:57 -0500 Subject: x86 idle: Repair large-server 50-watt idle-power regression Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote from deep C-states to shallow C-states, which caused these platforms to experience a significant increase in idle power. Note that this issue was already present before the commit above, however, it wasn't seen often enough to be noticed in power measurements. Here we extend an errata workaround from the Core2 EX "Dunnington" to extend to NHM-EX and WSM-EX, to prevent these immediate returns from MWAIT, reducing idle power on these platforms. While only acpi_idle ran on Dunnington, intel_idle may also run on these two newer systems. As of today, there are no other models that are known to need this tweak. Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com Signed-off-by: Len Brown Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com Cc: # 3.12.x, 3.11.x, 3.10.x Signed-off-by: H. 
Peter Anvin diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index dc1ec0d..ea04b34 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -387,7 +387,8 @@ static void init_intel(struct cpuinfo_x86 *c) set_cpu_cap(c, X86_FEATURE_PEBS); } - if (c->x86 == 6 && c->x86_model == 29 && cpu_has_clflush) + if (c->x86 == 6 && cpu_has_clflush && + (c->x86_model == 29 || c->x86_model == 46 || c->x86_model == 47)) set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR); #ifdef CONFIG_X86_64 diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c index 92d1206..f80b700 100644 --- a/drivers/idle/intel_idle.c +++ b/drivers/idle/intel_idle.c @@ -377,6 +377,9 @@ static int intel_idle(struct cpuidle_device *dev, if (!current_set_polling_and_test()) { + if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR)) + clflush((void *)&current_thread_info()->flags); + __monitor((void *)&current_thread_info()->flags, 0, 0); smp_mb(); if (!need_resched()) -- cgit v0.10.2 From b1aac815c0891fe4a55a6b0b715910142227700f Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 17 Dec 2013 00:38:39 +0100 Subject: net: inet_diag: zero out uninitialized idiag_{src,dst} fields Jakub reported while working with nlmon netlink sniffer that parts of the inet_diag_sockid are not initialized when r->idiag_family != AF_INET6. That is, fields of r->id.idiag_src[1 ... 3], r->id.idiag_dst[1 ... 3]. In fact, it seems that we can leak 6 * sizeof(u32) byte of kernel [slab] memory through this. At least, in udp_dump_one(), we allocate a skb in ... rep = nlmsg_new(sizeof(struct inet_diag_msg) + ..., GFP_KERNEL); ... and then pass that to inet_sk_diag_fill() that puts the whole struct inet_diag_msg into the skb, where we only fill out r->id.idiag_src[0], r->id.idiag_dst[0] and leave the rest untouched: r->id.idiag_src[0] = inet->inet_rcv_saddr; r->id.idiag_dst[0] = inet->inet_daddr; struct inet_diag_msg embeds struct inet_diag_sockid that is correctly / fully filled out in IPv6 case, but for IPv4 not. So just zero them out by using plain memset (for this little amount of bytes it's probably not worth the extra check for idiag_family == AF_INET). Similarly, fix also other places where we fill that out. Reported-by: Jakub Zawadzki Signed-off-by: Daniel Borkmann Signed-off-by: David S.
Miller diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c index 56a964a..a0f52da 100644 --- a/net/ipv4/inet_diag.c +++ b/net/ipv4/inet_diag.c @@ -106,6 +106,10 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk, r->id.idiag_sport = inet->inet_sport; r->id.idiag_dport = inet->inet_dport; + + memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); + memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); + r->id.idiag_src[0] = inet->inet_rcv_saddr; r->id.idiag_dst[0] = inet->inet_daddr; @@ -240,12 +244,19 @@ static int inet_twsk_diag_fill(struct inet_timewait_sock *tw, r->idiag_family = tw->tw_family; r->idiag_retrans = 0; + r->id.idiag_if = tw->tw_bound_dev_if; sock_diag_save_cookie(tw, r->id.idiag_cookie); + r->id.idiag_sport = tw->tw_sport; r->id.idiag_dport = tw->tw_dport; + + memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); + memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); + r->id.idiag_src[0] = tw->tw_rcv_saddr; r->id.idiag_dst[0] = tw->tw_daddr; + r->idiag_state = tw->tw_substate; r->idiag_timer = 3; r->idiag_expires = jiffies_to_msecs(tmo); @@ -726,8 +737,13 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk, r->id.idiag_sport = inet->inet_sport; r->id.idiag_dport = ireq->ir_rmt_port; + + memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src)); + memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst)); + r->id.idiag_src[0] = ireq->ir_loc_addr; r->id.idiag_dst[0] = ireq->ir_rmt_addr; + r->idiag_expires = jiffies_to_msecs(tmo); r->idiag_rqueue = 0; r->idiag_wqueue = 0; -- cgit v0.10.2 From 0c8d087c04cdcef501064552149289866e53aa6c Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Tue, 17 Dec 2013 10:42:09 +0800 Subject: xen-netback: fix some error return code 'err' is overwrited to 0 after maybe_pull_tail() call, so the error code was not set if skb_partial_csum_set() call failed. Fix to return error -EPROTO from those error handling case instead of 0. Fixes: d52eb0d46f36 ('xen-netback: make sure skb linear area covers checksum field') Signed-off-by: Wei Yongjun Acked-by: Wei Liu Signed-off-by: David S. 
Miller diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 27bbe58..7b4fd93 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -1209,8 +1209,10 @@ static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb, goto out; if (!skb_partial_csum_set(skb, off, - offsetof(struct tcphdr, check))) + offsetof(struct tcphdr, check))) { + err = -EPROTO; goto out; + } if (recalculate_partial_csum) tcp_hdr(skb)->check = @@ -1227,8 +1229,10 @@ static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb, goto out; if (!skb_partial_csum_set(skb, off, - offsetof(struct udphdr, check))) + offsetof(struct udphdr, check))) { + err = -EPROTO; goto out; + } if (recalculate_partial_csum) udp_hdr(skb)->check = @@ -1350,8 +1354,10 @@ static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb, goto out; if (!skb_partial_csum_set(skb, off, - offsetof(struct tcphdr, check))) + offsetof(struct tcphdr, check))) { + err = -EPROTO; goto out; + } if (recalculate_partial_csum) tcp_hdr(skb)->check = @@ -1368,8 +1374,10 @@ static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb, goto out; if (!skb_partial_csum_set(skb, off, - offsetof(struct udphdr, check))) + offsetof(struct udphdr, check))) { + err = -EPROTO; goto out; + } if (recalculate_partial_csum) udp_hdr(skb)->check = -- cgit v0.10.2 From e9db5c21d3646a6454fcd04938dd215ac3ab620a Mon Sep 17 00:00:00 2001 From: Wenliang Fan Date: Tue, 17 Dec 2013 11:25:28 +0800 Subject: drivers/net/hamradio: Integer overflow in hdlcdrv_ioctl() The local variable 'bi' comes from userspace. If userspace passed a large number to 'bi.data.calibrate', there would be an integer overflow in the following line: s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; Signed-off-by: Wenliang Fan Signed-off-by: David S. Miller diff --git a/drivers/net/hamradio/hdlcdrv.c b/drivers/net/hamradio/hdlcdrv.c index 3169252..5d78c1d 100644 --- a/drivers/net/hamradio/hdlcdrv.c +++ b/drivers/net/hamradio/hdlcdrv.c @@ -571,6 +571,8 @@ static int hdlcdrv_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) case HDLCDRVCTL_CALIBRATE: if(!capable(CAP_SYS_RAWIO)) return -EPERM; + if (bi.data.calibrate > INT_MAX / s->par.bitrate) + return -EINVAL; s->hdlctx.calibrate = bi.data.calibrate * s->par.bitrate / 16; return 0; -- cgit v0.10.2 From 8e3fbf870481eb53b2d3a322d1fc395ad8b367ed Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Salva=20Peir=C3=B3?= Date: Tue, 17 Dec 2013 10:06:30 +0100 Subject: hamradio/yam: fix info leak in ioctl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The yam_ioctl() code fails to initialise the cmd field of the struct yamdrv_ioctl_cfg. Add an explicit memset(0) before filling the structure to avoid the 4-byte info leak. Signed-off-by: Salva Peiró Signed-off-by: David S. Miller diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c index 1971411..61dd244 100644 --- a/drivers/net/hamradio/yam.c +++ b/drivers/net/hamradio/yam.c @@ -1057,6 +1057,7 @@ static int yam_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd) break; case SIOCYAMGCFG: + memset(&yi, 0, sizeof(yi)); yi.cfg.mask = 0xffffffff; yi.cfg.iobase = yp->iobase; yi.cfg.irq = yp->irq; -- cgit v0.10.2 From c047e0707307717818542c939223fc8ca454e2c9 Mon Sep 17 00:00:00 2001 From: Michal Schmidt Date: Tue, 17 Dec 2013 18:51:25 +0100 Subject: bnx2x: downgrade "valid ME register value" message level "valid ME register value" is not an error. 
It should be logged for debugging only. Signed-off-by: Michal Schmidt Acked-by: Yuval Mintz Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c index efa8a15..3dc2537 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c @@ -208,7 +208,7 @@ static int bnx2x_get_vf_id(struct bnx2x *bp, u32 *vf_id) return -EINVAL; } - BNX2X_ERR("valid ME register value: 0x%08x\n", me_reg); + DP(BNX2X_MSG_IOV, "valid ME register value: 0x%08x\n", me_reg); *vf_id = (me_reg & ME_REG_VF_NUM_MASK) >> ME_REG_VF_NUM_SHIFT; -- cgit v0.10.2 From fce7d3bfc0ae70ca2a57868f2a9708b13185fd1f Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Mon, 16 Dec 2013 14:39:40 +0000 Subject: x86/efi: Don't select EFI from certain special ACPI drivers Commit 7ea6c6c1 ("Move cper.c from drivers/acpi/apei to drivers/firmware/efi") results in CONFIG_EFI being enabled even when the user doesn't want this. Since ACPI APEI used to build fine without UEFI (and as far as I know also has no functional depency on it), at least in that case using a reverse dependency is wrong (and a straight one isn't needed). Whether the same is true for ACPI_EXTLOG I don't know - if there is a functional dependency, it should depend on EFI rather than selecting it. It certainly has (currently) no build dependency. Adjust Kconfig and build logic so that the bad dependency gets avoided. Signed-off-by: Jan Beulich Acked-by: Tony Luck Cc: Matt Fleming Link: http://lkml.kernel.org/r/52AF1EBC020000780010DBF9@nat28.tlf.novell.com Signed-off-by: Ingo Molnar diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 5d92485..4770de5 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -348,7 +348,6 @@ source "drivers/acpi/apei/Kconfig" config ACPI_EXTLOG tristate "Extended Error Log support" depends on X86_MCE && X86_LOCAL_APIC - select EFI select UEFI_CPER default n help diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig index 786294b..3650b21 100644 --- a/drivers/acpi/apei/Kconfig +++ b/drivers/acpi/apei/Kconfig @@ -2,7 +2,6 @@ config ACPI_APEI bool "ACPI Platform Error Interface (APEI)" select MISC_FILESYSTEMS select PSTORE - select EFI select UEFI_CPER depends on X86 help diff --git a/drivers/firmware/Makefile b/drivers/firmware/Makefile index 299fad6..5373dc5 100644 --- a/drivers/firmware/Makefile +++ b/drivers/firmware/Makefile @@ -14,3 +14,4 @@ obj-$(CONFIG_FIRMWARE_MEMMAP) += memmap.o obj-$(CONFIG_GOOGLE_FIRMWARE) += google/ obj-$(CONFIG_EFI) += efi/ +obj-$(CONFIG_UEFI_CPER) += efi/ diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 3150aa4..6aecbc8 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -36,7 +36,7 @@ config EFI_VARS_PSTORE_DEFAULT_DISABLE backend for pstore by default. This setting can be overridden using the efivars module's pstore_disable parameter. 
-config UEFI_CPER - def_bool n - endmenu + +config UEFI_CPER + bool diff --git a/drivers/firmware/efi/Makefile b/drivers/firmware/efi/Makefile index 9ba156d..6c2a41e 100644 --- a/drivers/firmware/efi/Makefile +++ b/drivers/firmware/efi/Makefile @@ -1,7 +1,7 @@ # # Makefile for linux kernel # -obj-y += efi.o vars.o +obj-$(CONFIG_EFI) += efi.o vars.o obj-$(CONFIG_EFI_VARS) += efivars.o obj-$(CONFIG_EFI_VARS_PSTORE) += efi-pstore.o obj-$(CONFIG_UEFI_CPER) += cper.o -- cgit v0.10.2 From de06875f089678f4f9f1e8d5e1421fb0ceab12d0 Mon Sep 17 00:00:00 2001 From: Andy Grover Date: Tue, 26 Nov 2013 11:49:24 -0800 Subject: target: Remove extra percpu_ref_init lun->lun_ref is also initialized in core_tpg_post_addlun, so it doesn't need to be done in core_tpg_setup_virtual_lun0. (nab: Drop left-over percpu_ref_cancel_init in failure path) Signed-off-by: Andy Grover Signed-off-by: Nicholas Bellinger diff --git a/drivers/target/target_core_tpg.c b/drivers/target/target_core_tpg.c index f755712..2a573de 100644 --- a/drivers/target/target_core_tpg.c +++ b/drivers/target/target_core_tpg.c @@ -656,15 +656,9 @@ static int core_tpg_setup_virtual_lun0(struct se_portal_group *se_tpg) spin_lock_init(&lun->lun_sep_lock); init_completion(&lun->lun_ref_comp); - ret = percpu_ref_init(&lun->lun_ref, core_tpg_lun_ref_release); - if (ret < 0) - return ret; - ret = core_tpg_post_addlun(se_tpg, lun, lun_access, dev); - if (ret < 0) { - percpu_ref_cancel_init(&lun->lun_ref); + if (ret < 0) return ret; - } return 0; } -- cgit v0.10.2 From dcd211997ddba3229e875f5990ea14f920934729 Mon Sep 17 00:00:00 2001 From: Nicholas Bellinger Date: Tue, 17 Dec 2013 01:51:22 -0800 Subject: qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure This patch fixes a possible scsi_host reference leak in qlt_lport_register(), when a non zero return from the passed (*callback) does not call drop the local reference via scsi_host_put() before returning. This currently does not effect existing tcm_qla2xxx code as the passed callback will never fail, but fix this up regardless for future code. Cc: Chad Dupuis Signed-off-by: Nicholas Bellinger diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c index 3bb0a1d..38a1257 100644 --- a/drivers/scsi/qla2xxx/qla_target.c +++ b/drivers/scsi/qla2xxx/qla_target.c @@ -4291,6 +4291,7 @@ int qlt_lport_register(struct qla_tgt_func_tmpl *qla_tgt_ops, u64 wwpn, if (rc != 0) { ha->tgt.tgt_ops = NULL; ha->tgt.target_lport_ptr = NULL; + scsi_host_put(host); } mutex_unlock(&qla_tgt_mutex); return rc; -- cgit v0.10.2 From 7a2a84518cfb263d2c4171b3d63671f88316adb2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 19 Dec 2013 10:53:02 -0800 Subject: net: fec: fix potential use after free skb_tx_timestamp(skb) should be called _before_ TX completion has a chance to trigger, otherwise it is too late and we access freed memory. Signed-off-by: Eric Dumazet Fixes: de5fb0a05348 ("net: fec: put tx to napi poll function to fix dead lock") Cc: Frank Li Cc: Richard Cochran Acked-by: Richard Cochran Acked-by: Frank Li Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index e7c8b74..50bb71c 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -428,6 +428,8 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev) /* If this was the last BD in the ring, start at the beginning again. 
*/ bdp = fec_enet_get_nextdesc(bdp, fep); + skb_tx_timestamp(skb); + fep->cur_tx = bdp; if (fep->cur_tx == fep->dirty_tx) @@ -436,8 +438,6 @@ fec_enet_start_xmit(struct sk_buff *skb, struct net_device *ndev) /* Trigger transmission start */ writel(0, fep->hwp + FEC_X_DES_ACTIVE); - skb_tx_timestamp(skb); - return NETDEV_TX_OK; } -- cgit v0.10.2 From 24f5b855e17df7e355eacd6c4a12cc4d6a6c9ff0 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Thu, 19 Dec 2013 12:40:26 +0800 Subject: ipv6: always set the new created dst's from in ip6_rt_copy ip6_rt_copy only sets dst.from if ort has flag RTF_ADDRCONF and RTF_DEFAULT. but the prefix routes which did get installed by hand locally can have an expiration, and no any flag combination which can ensure a potential from does never expire, so we should always set the new created dst's from. This also fixes the new created dst is always expired since the ort, which is created by RA, maybe has RTF_EXPIRES and RTF_ADDRCONF, but no RTF_DEFAULT. Suggested-by: Hannes Frederic Sowa CC: Gao feng Signed-off-by: Li RongQing Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller diff --git a/net/ipv6/route.c b/net/ipv6/route.c index a0a48ac..4b4944c 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1905,9 +1905,7 @@ static struct rt6_info *ip6_rt_copy(struct rt6_info *ort, else rt->rt6i_gateway = *dest; rt->rt6i_flags = ort->rt6i_flags; - if ((ort->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF)) == - (RTF_DEFAULT | RTF_ADDRCONF)) - rt6_set_from(rt, ort); + rt6_set_from(rt, ort); rt->rt6i_metric = 0; #ifdef CONFIG_IPV6_SUBTREES -- cgit v0.10.2 From a4f63634760acfcc349a5582c77ca4a004c813f6 Mon Sep 17 00:00:00 2001 From: Betty Dall Date: Thu, 19 Dec 2013 10:59:09 -0700 Subject: atl1c: Check return from pci_find_ext_capability() in atl1c_reset_pcie() The function atl1c_reset_pcie() does not check the return from pci_find_ext_cabability() where it is getting the postion of the PCI_EXT_CAP_ID_ERR. It is possible for the return to be 0. Signed-off-by: Betty Dall Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c index a36a760..2980175 100644 --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c @@ -145,9 +145,11 @@ static void atl1c_reset_pcie(struct atl1c_hw *hw, u32 flag) * Mask some pcie error bits */ pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR); - pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_SEVER, &data); - data &= ~(PCI_ERR_UNC_DLP | PCI_ERR_UNC_FCP); - pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_SEVER, data); + if (pos) { + pci_read_config_dword(pdev, pos + PCI_ERR_UNCOR_SEVER, &data); + data &= ~(PCI_ERR_UNC_DLP | PCI_ERR_UNC_FCP); + pci_write_config_dword(pdev, pos + PCI_ERR_UNCOR_SEVER, data); + } /* clear error status */ pcie_capability_write_word(pdev, PCI_EXP_DEVSTA, PCI_EXP_DEVSTA_NFED | -- cgit v0.10.2 From 1a1f20bc9debd133549d5b289bd5494a4264a73d Mon Sep 17 00:00:00 2001 From: Leigh Brown Date: Thu, 19 Dec 2013 13:09:48 +0000 Subject: net: mvmdio: fix interrupt timeout handling This version corrects the whitespace issue. orion_mdio_wait_ready uses wait_event_timeout to wait for the SMI interrupt to fire. wait_event_timeout waits for between "timeout - 1" and "timeout" jiffies. In this case a 1ms timeout when HZ is 1000 results in a wait of 0 to 1 jiffies, causing premature timeouts. 
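The timing problem, spelled out (assuming HZ == 1000 as in the report above):

	/* msecs_to_jiffies(1) == 1 when HZ == 1000, and
	 * wait_event_timeout(wq, cond, 1) only guarantees between 0 and 1
	 * jiffies of waiting: the very next timer tick can expire it before
	 * the SMI completion interrupt has had any chance to fire.
	 * Clamping as the patch does guarantees at least one full jiffy: */
	if (timeout < 2)
		timeout = 2;
	wait_event_timeout(dev->smi_busy_wait,
			   orion_mdio_smi_is_done(dev), timeout);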
This fix ensures a minimum timeout of 2 jiffies, ensuring wait_event_timeout will always wait at least 1 jiffie. Issue reported by Nicolas Schichan. Tested-by: Nicolas Schichan Signed-off-by: Leigh Brown Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c index 7354960..c4eeb69 100644 --- a/drivers/net/ethernet/marvell/mvmdio.c +++ b/drivers/net/ethernet/marvell/mvmdio.c @@ -92,6 +92,12 @@ static int orion_mdio_wait_ready(struct mii_bus *bus) if (time_is_before_jiffies(end)) ++timedout; } else { + /* wait_event_timeout does not guarantee a delay of at + * least one whole jiffie, so timeout must be no less + * than two. + */ + if (timeout < 2) + timeout = 2; wait_event_timeout(dev->smi_busy_wait, orion_mdio_smi_is_done(dev), timeout); -- cgit v0.10.2 From 965cdea825693c821d200e38fac9402cde6dce6a Mon Sep 17 00:00:00 2001 From: Wang Weidong Date: Tue, 17 Dec 2013 19:24:33 -0700 Subject: dccp: catch failed request_module call in dccp_probe init Check the return value of request_module during dccp_probe initialisation, bail out if that call fails. Signed-off-by: Gerrit Renker Signed-off-by: Wang Weidong Signed-off-by: David S. Miller diff --git a/net/dccp/probe.c b/net/dccp/probe.c index 4c6bdf9..595ddf0 100644 --- a/net/dccp/probe.c +++ b/net/dccp/probe.c @@ -152,17 +152,6 @@ static const struct file_operations dccpprobe_fops = { .llseek = noop_llseek, }; -static __init int setup_jprobe(void) -{ - int ret = register_jprobe(&dccp_send_probe); - - if (ret) { - request_module("dccp"); - ret = register_jprobe(&dccp_send_probe); - } - return ret; -} - static __init int dccpprobe_init(void) { int ret = -ENOMEM; @@ -174,7 +163,13 @@ static __init int dccpprobe_init(void) if (!proc_create(procname, S_IRUSR, init_net.proc_net, &dccpprobe_fops)) goto err0; - ret = setup_jprobe(); + ret = register_jprobe(&dccp_send_probe); + if (ret) { + ret = request_module("dccp"); + if (!ret) + ret = register_jprobe(&dccp_send_probe); + } + if (ret) goto err1; -- cgit v0.10.2 From e2f6c88fb903e123edfd1106b0b8310d5117f774 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Thu, 19 Dec 2013 19:41:46 -0500 Subject: drm/radeon: fix asic gfx values for scrapper asics Fixes gfx corruption on certain TN/RL parts. 
bug: https://bugs.freedesktop.org/show_bug.cgi?id=60389 Signed-off-by: Alex Deucher Cc: stable@vger.kernel.org diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c index 11aab2a..f59a9e9 100644 --- a/drivers/gpu/drm/radeon/ni.c +++ b/drivers/gpu/drm/radeon/ni.c @@ -895,6 +895,10 @@ static void cayman_gpu_init(struct radeon_device *rdev) (rdev->pdev->device == 0x999C)) { rdev->config.cayman.max_simds_per_se = 6; rdev->config.cayman.max_backends_per_se = 2; + rdev->config.cayman.max_hw_contexts = 8; + rdev->config.cayman.sx_max_export_size = 256; + rdev->config.cayman.sx_max_export_pos_size = 64; + rdev->config.cayman.sx_max_export_smx_size = 192; } else if ((rdev->pdev->device == 0x9903) || (rdev->pdev->device == 0x9904) || (rdev->pdev->device == 0x990A) || @@ -905,6 +909,10 @@ static void cayman_gpu_init(struct radeon_device *rdev) (rdev->pdev->device == 0x999D)) { rdev->config.cayman.max_simds_per_se = 4; rdev->config.cayman.max_backends_per_se = 2; + rdev->config.cayman.max_hw_contexts = 8; + rdev->config.cayman.sx_max_export_size = 256; + rdev->config.cayman.sx_max_export_pos_size = 64; + rdev->config.cayman.sx_max_export_smx_size = 192; } else if ((rdev->pdev->device == 0x9919) || (rdev->pdev->device == 0x9990) || (rdev->pdev->device == 0x9991) || @@ -915,9 +923,17 @@ static void cayman_gpu_init(struct radeon_device *rdev) (rdev->pdev->device == 0x99A0)) { rdev->config.cayman.max_simds_per_se = 3; rdev->config.cayman.max_backends_per_se = 1; + rdev->config.cayman.max_hw_contexts = 4; + rdev->config.cayman.sx_max_export_size = 128; + rdev->config.cayman.sx_max_export_pos_size = 32; + rdev->config.cayman.sx_max_export_smx_size = 96; } else { rdev->config.cayman.max_simds_per_se = 2; rdev->config.cayman.max_backends_per_se = 1; + rdev->config.cayman.max_hw_contexts = 4; + rdev->config.cayman.sx_max_export_size = 128; + rdev->config.cayman.sx_max_export_pos_size = 32; + rdev->config.cayman.sx_max_export_smx_size = 96; } rdev->config.cayman.max_texture_channel_caches = 2; rdev->config.cayman.max_gprs = 256; @@ -925,10 +941,6 @@ static void cayman_gpu_init(struct radeon_device *rdev) rdev->config.cayman.max_gs_threads = 32; rdev->config.cayman.max_stack_entries = 512; rdev->config.cayman.sx_num_of_sets = 8; - rdev->config.cayman.sx_max_export_size = 256; - rdev->config.cayman.sx_max_export_pos_size = 64; - rdev->config.cayman.sx_max_export_smx_size = 192; - rdev->config.cayman.max_hw_contexts = 8; rdev->config.cayman.sq_num_cf_insts = 2; rdev->config.cayman.sc_prim_fifo_size = 0x40; -- cgit v0.10.2 From 540436c80e5918dd5ed838449e108b1726fc4d68 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 20 Dec 2013 11:23:15 +0100 Subject: netfilter: nft_exthdr: call ipv6_find_hdr() with explicitly initialized offset In nft's nft_exthdr_eval() routine we process IPv6 extension header through invoking ipv6_find_hdr(), but we call it with an uninitialized offset variable that contains some stack value. In ipv6_find_hdr() we then test if the value of offset != 0 and call skb_header_pointer() on that offset in order to map struct ipv6hdr into it. Fix it up by initializing offset to 0 as it was probably intended to be. 
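The behaviour being tripped over, paraphrasing the description above (a sketch of the relevant branch in ipv6_find_hdr(), not a verbatim quote of net/ipv6/exthdrs_core.c):

	/* int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
	 *                   int target, unsigned short *fragoff, int *flags) */
	if (*offset) {
		struct ipv6hdr _ip6, *ip6;

		/* with an uninitialized caller variable this looks up a
		 * "header" at whatever stack garbage *offset contains */
		ip6 = skb_header_pointer(skb, *offset, sizeof(_ip6), &_ip6);
		if (!ip6 || ip6->version != 6)
			return -EBADMSG;
	}
	/* so callers must pass 0 unless they really want parsing to start
	 * at an inner IPv6 header at that offset */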
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Daniel Borkmann Cc: Hannes Frederic Sowa Signed-off-by: Pablo Neira Ayuso diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c index 8e0bb75..55c939f 100644 --- a/net/netfilter/nft_exthdr.c +++ b/net/netfilter/nft_exthdr.c @@ -31,7 +31,7 @@ static void nft_exthdr_eval(const struct nft_expr *expr, { struct nft_exthdr *priv = nft_expr_priv(expr); struct nft_data *dest = &data[priv->dreg]; - unsigned int offset; + unsigned int offset = 0; int err; err = ipv6_find_hdr(pkt->skb, &offset, priv->type, NULL, NULL); -- cgit v0.10.2 From 443d20fd188208aa4df2118ad49f9168e411016d Mon Sep 17 00:00:00 2001 From: Helmut Schaa Date: Fri, 20 Dec 2013 14:41:54 +0100 Subject: netfilter: nf_ct_timestamp: Fix BUG_ON after netns deletion When having nf_conntrack_timestamp enabled deleting a netns can lead to the following BUG being triggered: [63836.660000] Kernel bug detected[#1]: [63836.660000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.18 #14 [63836.660000] task: 802d9420 ti: 802d2000 task.ti: 802d2000 [63836.660000] $ 0 : 00000000 00000000 00000000 00000000 [63836.660000] $ 4 : 00000001 00000004 00000020 00000020 [63836.660000] $ 8 : 00000000 80064910 00000000 00000000 [63836.660000] $12 : 0bff0002 00000001 00000000 0a0a0abe [63836.660000] $16 : 802e70a0 85f29d80 00000000 00000004 [63836.660000] $20 : 85fb62a0 00000002 802d3bc0 85fb62a0 [63836.660000] $24 : 00000000 87138110 [63836.660000] $28 : 802d2000 802d3b40 00000014 871327cc [63836.660000] Hi : 000005ff [63836.660000] Lo : f2edd000 [63836.660000] epc : 87138794 __nf_ct_ext_add_length+0xe8/0x1ec [nf_conntrack] [63836.660000] Not tainted [63836.660000] ra : 871327cc nf_conntrack_in+0x31c/0x7b8 [nf_conntrack] [63836.660000] Status: 1100d403 KERNEL EXL IE [63836.660000] Cause : 00800034 [63836.660000] PrId : 0001974c (MIPS 74Kc) [63836.660000] Modules linked in: ath9k ath9k_common pppoe ppp_async iptable_nat ath9k_hw ath pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv4 mac80211 ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_quota xt_policy xt_pkttype xt_owner xt_nat xt_multiport xt_mark xh [63836.660000] Process swapper (pid: 0, threadinfo=802d2000, task=802d9420, tls=00000000) [63836.660000] Stack : 802e70a0 871323d4 00000005 87080234 802e70a0 86d2a840 00000000 00000000 [63836.660000] Call Trace: [63836.660000] [<87138794>] __nf_ct_ext_add_length+0xe8/0x1ec [nf_conntrack] [63836.660000] [<871327cc>] nf_conntrack_in+0x31c/0x7b8 [nf_conntrack] [63836.660000] [<801ff63c>] nf_iterate+0x90/0xec [63836.660000] [<801ff730>] nf_hook_slow+0x98/0x164 [63836.660000] [<80205968>] ip_rcv+0x3e8/0x40c [63836.660000] [<801d9754>] __netif_receive_skb_core+0x624/0x6a4 [63836.660000] [<801da124>] process_backlog+0xa4/0x16c [63836.660000] [<801d9bb4>] net_rx_action+0x10c/0x1e0 [63836.660000] [<8007c5a4>] __do_softirq+0xd0/0x1bc [63836.660000] [<8007c730>] do_softirq+0x48/0x68 [63836.660000] [<8007c964>] irq_exit+0x54/0x70 [63836.660000] [<80060830>] ret_from_irq+0x0/0x4 [63836.660000] [<8006a9f8>] r4k_wait_irqoff+0x18/0x1c [63836.660000] [<8009cfb8>] cpu_startup_entry+0xa4/0x104 [63836.660000] [<802eb918>] start_kernel+0x394/0x3ac [63836.660000] [63836.660000] Code: 00821021 8c420000 2c440001 <00040336> 90440011 92350010 90560010 2485ffff 02a5a821 [63837.040000] ---[ end trace ebf660c3ce3b55e7 ]--- [63837.050000] Kernel panic - not syncing: Fatal exception in interrupt [63837.050000] Rebooting in 3 seconds.. 
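The lifetime rule that got broken: the conntrack extension is registered once per module in the global init path, so the per-netns teardown must only undo per-netns state. A sketch of the intended split (function bodies abridged to the relevant calls, names as in the diff below):

	int nf_conntrack_tstamp_init(void)			/* module init, once */
	{
		return nf_ct_extend_register(&tstamp_extend);
	}

	void nf_conntrack_tstamp_pernet_fini(struct net *net)	/* per netns */
	{
		nf_conntrack_tstamp_fini_sysctl(net);
		/* no nf_ct_extend_unregister() here: doing it on the first
		 * netns deletion strips the extension from every other
		 * namespace's conntrack and triggers the BUG above */
	}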
Fix this by not unregistering the conntrack extension in the per-netns cleanup code. This bug was introduced in (73f4001 netfilter: nf_ct_tstamp: move initialization out of pernet_operations). Signed-off-by: Helmut Schaa Signed-off-by: Pablo Neira Ayuso diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c index 902fb0a..7a394df 100644 --- a/net/netfilter/nf_conntrack_timestamp.c +++ b/net/netfilter/nf_conntrack_timestamp.c @@ -97,7 +97,6 @@ int nf_conntrack_tstamp_pernet_init(struct net *net) void nf_conntrack_tstamp_pernet_fini(struct net *net) { nf_conntrack_tstamp_fini_sysctl(net); - nf_ct_extend_unregister(&tstamp_extend); } int nf_conntrack_tstamp_init(void) -- cgit v0.10.2 From f5a44db5d2d677dfbf12deee461f85e9ec633961 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Fri, 20 Dec 2013 09:29:35 -0500 Subject: ext4: add explicit casts when masking cluster sizes The missing casts can cause the high 64-bits of the physical blocks to be lost. Set up new macros which allows us to make sure the right thing happen, even if at some point we end up supporting larger logical block numbers. Thanks to the Emese Revfy and the PaX security team for reporting this issue. Reported-by: PaX Team Reported-by: Emese Revfy Signed-off-by: "Theodore Ts'o" Cc: stable@vger.kernel.org diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index e618503..ece5556 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -268,6 +268,16 @@ struct ext4_io_submit { /* Translate # of blks to # of clusters */ #define EXT4_NUM_B2C(sbi, blks) (((blks) + (sbi)->s_cluster_ratio - 1) >> \ (sbi)->s_cluster_bits) +/* Mask out the low bits to get the starting block of the cluster */ +#define EXT4_PBLK_CMASK(s, pblk) ((pblk) & \ + ~((ext4_fsblk_t) (s)->s_cluster_ratio - 1)) +#define EXT4_LBLK_CMASK(s, lblk) ((lblk) & \ + ~((ext4_lblk_t) (s)->s_cluster_ratio - 1)) +/* Get the cluster offset */ +#define EXT4_PBLK_COFF(s, pblk) ((pblk) & \ + ((ext4_fsblk_t) (s)->s_cluster_ratio - 1)) +#define EXT4_LBLK_COFF(s, lblk) ((lblk) & \ + ((ext4_lblk_t) (s)->s_cluster_ratio - 1)) /* * Structure of a blocks group descriptor diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 267c9fb..4410cc3 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -1851,8 +1851,7 @@ static unsigned int ext4_ext_check_overlap(struct ext4_sb_info *sbi, depth = ext_depth(inode); if (!path[depth].p_ext) goto out; - b2 = le32_to_cpu(path[depth].p_ext->ee_block); - b2 &= ~(sbi->s_cluster_ratio - 1); + b2 = EXT4_LBLK_CMASK(sbi, le32_to_cpu(path[depth].p_ext->ee_block)); /* * get the next allocated block if the extent in the path @@ -1862,7 +1861,7 @@ static unsigned int ext4_ext_check_overlap(struct ext4_sb_info *sbi, b2 = ext4_ext_next_allocated_block(path); if (b2 == EXT_MAX_BLOCKS) goto out; - b2 &= ~(sbi->s_cluster_ratio - 1); + b2 = EXT4_LBLK_CMASK(sbi, b2); } /* check for wrap through zero on extent logical start block*/ @@ -2521,7 +2520,7 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode, * extent, we have to mark the cluster as used (store negative * cluster number in partial_cluster). 
*/ - unaligned = pblk & (sbi->s_cluster_ratio - 1); + unaligned = EXT4_PBLK_COFF(sbi, pblk); if (unaligned && (ee_len == num) && (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk)))) *partial_cluster = EXT4_B2C(sbi, pblk); @@ -2615,7 +2614,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode, * accidentally freeing it later on */ pblk = ext4_ext_pblock(ex); - if (pblk & (sbi->s_cluster_ratio - 1)) + if (EXT4_PBLK_COFF(sbi, pblk)) *partial_cluster = -((long long)EXT4_B2C(sbi, pblk)); ex--; @@ -3770,7 +3769,7 @@ int ext4_find_delalloc_cluster(struct inode *inode, ext4_lblk_t lblk) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); ext4_lblk_t lblk_start, lblk_end; - lblk_start = lblk & (~(sbi->s_cluster_ratio - 1)); + lblk_start = EXT4_LBLK_CMASK(sbi, lblk); lblk_end = lblk_start + sbi->s_cluster_ratio - 1; return ext4_find_delalloc_range(inode, lblk_start, lblk_end); @@ -3829,9 +3828,9 @@ get_reserved_cluster_alloc(struct inode *inode, ext4_lblk_t lblk_start, trace_ext4_get_reserved_cluster_alloc(inode, lblk_start, num_blks); /* Check towards left side */ - c_offset = lblk_start & (sbi->s_cluster_ratio - 1); + c_offset = EXT4_LBLK_COFF(sbi, lblk_start); if (c_offset) { - lblk_from = lblk_start & (~(sbi->s_cluster_ratio - 1)); + lblk_from = EXT4_LBLK_CMASK(sbi, lblk_start); lblk_to = lblk_from + c_offset - 1; if (ext4_find_delalloc_range(inode, lblk_from, lblk_to)) @@ -3839,7 +3838,7 @@ get_reserved_cluster_alloc(struct inode *inode, ext4_lblk_t lblk_start, } /* Now check towards right. */ - c_offset = (lblk_start + num_blks) & (sbi->s_cluster_ratio - 1); + c_offset = EXT4_LBLK_COFF(sbi, lblk_start + num_blks); if (allocated_clusters && c_offset) { lblk_from = lblk_start + num_blks; lblk_to = lblk_from + (sbi->s_cluster_ratio - c_offset) - 1; @@ -4047,7 +4046,7 @@ static int get_implied_cluster_alloc(struct super_block *sb, struct ext4_ext_path *path) { struct ext4_sb_info *sbi = EXT4_SB(sb); - ext4_lblk_t c_offset = map->m_lblk & (sbi->s_cluster_ratio-1); + ext4_lblk_t c_offset = EXT4_LBLK_COFF(sbi, map->m_lblk); ext4_lblk_t ex_cluster_start, ex_cluster_end; ext4_lblk_t rr_cluster_start; ext4_lblk_t ee_block = le32_to_cpu(ex->ee_block); @@ -4065,8 +4064,7 @@ static int get_implied_cluster_alloc(struct super_block *sb, (rr_cluster_start == ex_cluster_start)) { if (rr_cluster_start == ex_cluster_end) ee_start += ee_len - 1; - map->m_pblk = (ee_start & ~(sbi->s_cluster_ratio - 1)) + - c_offset; + map->m_pblk = EXT4_PBLK_CMASK(sbi, ee_start) + c_offset; map->m_len = min(map->m_len, (unsigned) sbi->s_cluster_ratio - c_offset); /* @@ -4220,7 +4218,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode, */ map->m_flags &= ~EXT4_MAP_FROM_CLUSTER; newex.ee_block = cpu_to_le32(map->m_lblk); - cluster_offset = map->m_lblk & (sbi->s_cluster_ratio-1); + cluster_offset = EXT4_LBLK_CMASK(sbi, map->m_lblk); /* * If we are doing bigalloc, check to see if the extent returned @@ -4288,7 +4286,7 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode, * needed so that future calls to get_implied_cluster_alloc() * work correctly. 
*/ - offset = map->m_lblk & (sbi->s_cluster_ratio - 1); + offset = EXT4_LBLK_COFF(sbi, map->m_lblk); ar.len = EXT4_NUM_B2C(sbi, offset+allocated); ar.goal -= offset; ar.logical -= offset; diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 04766d9..04a5c75 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4126,7 +4126,7 @@ ext4_mb_initialize_context(struct ext4_allocation_context *ac, ext4_get_group_no_and_offset(sb, goal, &group, &block); /* set up allocation goals */ - ac->ac_b_ex.fe_logical = ar->logical & ~(sbi->s_cluster_ratio - 1); + ac->ac_b_ex.fe_logical = EXT4_LBLK_CMASK(sbi, ar->logical); ac->ac_status = AC_STATUS_CONTINUE; ac->ac_sb = sb; ac->ac_inode = ar->inode; @@ -4668,7 +4668,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, * blocks at the beginning or the end unless we are explicitly * requested to avoid doing so. */ - overflow = block & (sbi->s_cluster_ratio - 1); + overflow = EXT4_PBLK_COFF(sbi, block); if (overflow) { if (flags & EXT4_FREE_BLOCKS_NOFREE_FIRST_CLUSTER) { overflow = sbi->s_cluster_ratio - overflow; @@ -4682,7 +4682,7 @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, count += overflow; } } - overflow = count & (sbi->s_cluster_ratio - 1); + overflow = EXT4_LBLK_COFF(sbi, count); if (overflow) { if (flags & EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER) { if (count > overflow) -- cgit v0.10.2 From a96e4e2ffe439e45732d820fac6fee486b6412bf Mon Sep 17 00:00:00 2001 From: Roland Dreier Date: Thu, 19 Dec 2013 08:37:03 -0800 Subject: IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA() Trying to have a ternary operator to choose between NULL (or 0) and the real pointer value in invocations leads to an impossible choice between a sparse error about a literal 0 used as a NULL pointer, and a gcc warning about "pointer/integer type mismatch in conditional expression." Rather than clutter the source with more casts, move the ternary operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it easier to use and simplifies its callers. Reported-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h index 9879568..a283274 100644 --- a/drivers/infiniband/core/uverbs.h +++ b/drivers/infiniband/core/uverbs.h @@ -55,6 +55,14 @@ (udata)->outlen = (olen); \ } while (0) +#define INIT_UDATA_BUF_OR_NULL(udata, ibuf, obuf, ilen, olen) \ + do { \ + (udata)->inbuf = (ilen) ? (const void __user *) (ibuf) : NULL; \ + (udata)->outbuf = (olen) ? (void __user *) (obuf) : NULL; \ + (udata)->inlen = (ilen); \ + (udata)->outlen = (olen); \ + } while (0) + /* * Our lifetime rules for these structs are the following: * diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 3438694..699463b 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -676,17 +676,14 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf, return -EINVAL; } - INIT_UDATA(&ucore, - (hdr.in_words) ? buf : 0, - (unsigned long)ex_hdr.response, - hdr.in_words * 8, - hdr.out_words * 8); - - INIT_UDATA(&uhw, - (ex_hdr.provider_in_words) ? buf + ucore.inlen : 0, - (ex_hdr.provider_out_words) ? 
(unsigned long)ex_hdr.response + ucore.outlen : 0, - ex_hdr.provider_in_words * 8, - ex_hdr.provider_out_words * 8); + INIT_UDATA_BUF_OR_NULL(&ucore, buf, (unsigned long) ex_hdr.response, + hdr.in_words * 8, hdr.out_words * 8); + + INIT_UDATA_BUF_OR_NULL(&uhw, + buf + ucore.inlen, + (unsigned long) ex_hdr.response + ucore.outlen, + ex_hdr.provider_in_words * 8, + ex_hdr.provider_out_words * 8); err = uverbs_ex_cmd_table[command](file, &ucore, -- cgit v0.10.2 From 7efb1b19b3414d7dec792f39e1c1a7db57a23961 Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:47 +0100 Subject: IB/uverbs: Check reserved field in extended command header As noted by Daniel Vetter in its article "Botching up ioctls"[1] "Check *all* unused fields and flags and all the padding for whether it's 0, and reject the ioctl if that's not the case. Otherwise your nice plan for future extensions is going right down the gutters since someone *will* submit an ioctl struct with random stack garbage in the yet unused parts. Which then bakes in the ABI that those fields can never be used for anything else but garbage." It's important to ensure that reserved fields are set to known value, so that it will be possible to use them latter to extend the ABI. The same reasonning apply to comp_mask field present in newer uverbs command: per commit 22878dbc9173 ("IB/core: Better checking of userspace values for receive flow steering"), unsupported values in comp_mask are rejected. [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index 699463b..ac2305d 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -668,6 +668,9 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf, if ((hdr.in_words + ex_hdr.provider_in_words) * 8 != count) return -EINVAL; + if (ex_hdr.cmd_hdr_reserved) + return -EINVAL; + if (ex_hdr.response) { if (!hdr.out_words && !ex_hdr.provider_out_words) return -EINVAL; -- cgit v0.10.2 From 2782c2d302557403e314a43856f681a5385e62c6 Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:48 +0100 Subject: IB/uverbs: Check comp_mask in destroy_flow Just like the check added to create_flow in 22878dbc9173 ("IB/core: Better checking of userspace values for receive flow steering"), comp_mask must be checked in destroy_flow too. Since only empty comp_mask is currently supported, any other value must be rejected. This check was silently added in a previous patch[1] to move comp_mask in extended command header, part of previous patchset[2] against create/destroy_flow uverbs. The idea of moving comp_mask to the header was discarded for the final patchset[3]. Unfortunately the check added in destroy_flow uverb was not integrated in the final patchset. 
[1] http://marc.info/?i=40175eda10d670d098204da6aa4c327a0171ae5f.1381510045.git.ydroneaud@opteya.com [2] http://marc.info/?i=cover.1381510045.git.ydroneaud@opteya.com [3] http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Cc: Matan Barak Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 65f6e7d..f0ee6b9 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2795,6 +2795,9 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file, if (ret) return ret; + if (cmd.comp_mask) + return -EINVAL; + uobj = idr_write_uobj(&ib_uverbs_rule_idr, cmd.flow_handle, file->ucontext); if (!uobj) -- cgit v0.10.2 From c780d82a74cdf247a81f877ecae569b3a248f89b Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:49 +0100 Subject: IB/uverbs: Check reserved fields in create_flow As noted by Daniel Vetter in its article "Botching up ioctls"[1] "Check *all* unused fields and flags and all the padding for whether it's 0, and reject the ioctl if that's not the case. Otherwise your nice plan for future extensions is going right down the gutters since someone *will* submit an ioctl struct with random stack garbage in the yet unused parts. Which then bakes in the ABI that those fields can never be used for anything else but garbage." It's important to ensure that reserved fields are set to known value, so that it will be possible to use them latter to extend the ABI. The same reasonning apply to comp_mask field present in newer uverbs command: per commit 22878dbc9173 ("IB/core: Better checking of userspace values for receive flow steering"), unsupported values in comp_mask are rejected. [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index f0ee6b9..4e8b15c 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2593,6 +2593,9 @@ out_put: static int kern_spec_to_ib_spec(struct ib_uverbs_flow_spec *kern_spec, union ib_flow_spec *ib_spec) { + if (kern_spec->reserved) + return -EINVAL; + ib_spec->type = kern_spec->type; switch (ib_spec->type) { @@ -2671,6 +2674,10 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file, (cmd.flow_attr.num_of_specs * sizeof(struct ib_uverbs_flow_spec))) return -EINVAL; + if (cmd.flow_attr.reserved[0] || + cmd.flow_attr.reserved[1]) + return -EINVAL; + if (cmd.flow_attr.num_of_specs) { kern_flow_attr = kmalloc(sizeof(*kern_flow_attr) + cmd.flow_attr.size, GFP_KERNEL); -- cgit v0.10.2 From 98a37510ec1452817600d8ea47cff1d9f8d9bec8 Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:50 +0100 Subject: IB/uverbs: Set error code when fail to consume all flow_spec items If the flow_spec items parsed count does not match the number of items declared in the flow_attr command, or if not all bytes are used for flow_spec items (eg. trailing garbage), a log message is reported and the function leave through the error path. Unfortunately the error code is currently not set. This patch set error code to -EINVAL in such cases, so that the error is reported to userspace instead of silently fail. 
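As a generic illustration of this bug class (hypothetical names, not the uverbs code itself): bailing out through a cleanup label without assigning the error variable first lets the caller observe a stale value, often 0, so the failure is silent.

	/* hypothetical minimal example of the pattern being fixed */
	static int parse_items(struct parser *p)
	{
		int err = 0;

		/* ... flow_spec-style items consumed in a loop ... */

		if (p->bytes_left || p->items_seen != p->items_declared) {
			err = -EINVAL;	/* the assignment that was missing */
			goto err_free;
		}
		return 0;

	err_free:
		free_items(p);	/* cleanup still runs, but the caller now sees -EINVAL */
		return err;
	}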
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 4e8b15c..45fb80b 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2738,6 +2738,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file, if (cmd.flow_attr.size || (i != flow_attr->num_of_specs)) { pr_warn("create flow failed, flow %d: %d bytes left from uverb cmd\n", i, cmd.flow_attr.size); + err = -EINVAL; goto err_free; } flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER); -- cgit v0.10.2 From 6bcca3d4a3bcc9859cf001a0a21c8796edae2dc0 Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:52 +0100 Subject: IB/uverbs: Check input length in flow steering uverbs Since ib_copy_from_udata() doesn't check yet the available input data length before accessing userspace memory, an explicit check of this length is required to prevent: - reading past the user provided buffer, - underflow when subtracting the expected command size from the input length. This will ensure the newly added flow steering uverbs don't try to process truncated commands. Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 45fb80b..f1cc838 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2649,6 +2649,9 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file, void *ib_spec; int i; + if (ucore->inlen < sizeof(cmd)) + return -EINVAL; + if (ucore->outlen < sizeof(resp)) return -ENOSPC; @@ -2799,6 +2802,9 @@ int ib_uverbs_ex_destroy_flow(struct ib_uverbs_file *file, struct ib_uobject *uobj; int ret; + if (ucore->inlen < sizeof(cmd)) + return -EINVAL; + ret = ib_copy_from_udata(&cmd, ucore, sizeof(cmd)); if (ret) return ret; -- cgit v0.10.2 From 6cc3df840a84dc4e8a874e74cd62a138074922ba Mon Sep 17 00:00:00 2001 From: Yann Droneaud Date: Wed, 11 Dec 2013 23:01:51 +0100 Subject: IB/uverbs: Check access to userspace response buffer in extended command This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...) to ensure the whole buffer is in userspace memory before using the pointer in uverbs functions. If the buffer or a subset of it is not valid, returns -EFAULT to the caller. This will also catch invalid buffer before the final call to copy_to_user() which happen late in most uverb functions. Just like the check in read(2) syscall, it's a sanity check to detect invalid parameters provided by userspace. This particular check was added in vfs_read() by Linus Torvalds for v2.6.12 with following commit message: https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502 Make read/write always do the full "access_ok()" tests. The actual user copy will do them too, but only for the range that ends up being actually copied. That hides bugs when the range has been clamped by file size or other issues. Note: there's no need to check input buffer since vfs_write() already does access_ok(VERIFY_READ, ...) as part of write() syscall. 
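A short sketch of the check being added, included only to spell out the size computation (using the API of this era, where access_ok() still takes a VERIFY_READ/VERIFY_WRITE argument; the actual hunk follows below):

	void __user *response = (void __user *)(unsigned long)ex_hdr.response;
	size_t out_len = (hdr.out_words + ex_hdr.provider_out_words) * 8;

	/* reject the extended command early if any part of the
	 * userspace response buffer is not writable */
	if (!access_ok(VERIFY_WRITE, response, out_len))
		return -EFAULT;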
Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud Signed-off-by: Roland Dreier diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c index ac2305d..08219fb 100644 --- a/drivers/infiniband/core/uverbs_main.c +++ b/drivers/infiniband/core/uverbs_main.c @@ -674,6 +674,11 @@ static ssize_t ib_uverbs_write(struct file *filp, const char __user *buf, if (ex_hdr.response) { if (!hdr.out_words && !ex_hdr.provider_out_words) return -EINVAL; + + if (!access_ok(VERIFY_WRITE, + (void __user *) (unsigned long) ex_hdr.response, + (hdr.out_words + ex_hdr.provider_out_words) * 8)) + return -EFAULT; } else { if (hdr.out_words || ex_hdr.provider_out_words) return -EINVAL; -- cgit v0.10.2 From ee53664bda169f519ce3c6a22d378f0b946c8178 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Fri, 20 Dec 2013 15:10:03 +0200 Subject: mm: Fix NULL pointer dereference in madvise(MADV_WILLNEED) support Sasha Levin found a NULL pointer dereference that is due to a missing page table lock, which in turn is due to the pmd entry in question being a transparent huge-table entry. The code - introduced in commit 1998cc048901 ("mm: make madvise(MADV_WILLNEED) support swap file prefetch") - correctly checks for this situation using pmd_none_or_trans_huge_or_clear_bad(), but it turns out that that function doesn't work correctly. pmd_none_or_trans_huge_or_clear_bad() expected that pmd_bad() would trigger if the transparent hugepage bit was set, but it doesn't do that if pmd_numa() is also set. Note that the NUMA bit only gets set on real NUMA machines, so people trying to reproduce this on most normal development systems would never actually trigger this. Fix it by removing the very subtle (and subtly incorrect) expectation, and instead just checking pmd_trans_huge() explicitly. Reported-by: Sasha Levin Acked-by: Andrea Arcangeli [ Additionally remove the now stale test for pmd_trans_huge() inside the pmd_bad() case - Linus ] Signed-off-by: Linus Torvalds diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index b12079a..db09234 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -599,11 +599,10 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) #ifdef CONFIG_TRANSPARENT_HUGEPAGE barrier(); #endif - if (pmd_none(pmdval)) + if (pmd_none(pmdval) || pmd_trans_huge(pmdval)) return 1; if (unlikely(pmd_bad(pmdval))) { - if (!pmd_trans_huge(pmdval)) - pmd_clear_bad(pmd); + pmd_clear_bad(pmd); return 1; } return 0; -- cgit v0.10.2 From 8798cee2f90292ca4ca84baccc757e4d06f8797f Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 20 Dec 2013 14:54:11 +0000 Subject: Revert "mm: page_alloc: exclude unreclaimable allocations from zone fairness policy" This reverts commit 73f038b863df. The NUMA behaviour of this patch is less than ideal. An alternative approch is to interleave allocations only within local zones which is implemented in the next patch. Cc: stable@vger.kernel.org Signed-off-by: Mel Gorman Signed-off-by: Linus Torvalds diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f861d02..580a5f0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1920,8 +1920,7 @@ zonelist_scan: * back to remote zones that do not partake in the * fairness round-robin cycle of this zonelist. 
*/ - if ((alloc_flags & ALLOC_WMARK_LOW) && - (gfp_mask & GFP_MOVABLE_MASK)) { + if (alloc_flags & ALLOC_WMARK_LOW) { if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0) continue; if (zone_reclaim_mode && -- cgit v0.10.2 From fff4068cba484e6b0abe334ed6b15d5a215a3b25 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 20 Dec 2013 14:54:12 +0000 Subject: mm: page_alloc: revert NUMA aspect of fair allocation policy Commit 81c0a2bb515f ("mm: page_alloc: fair zone allocator policy") meant to bring aging fairness among zones in system, but it was overzealous and badly regressed basic workloads on NUMA systems. Due to the way kswapd and page allocator interacts, we still want to make sure that all zones in any given node are used equally for all allocations to maximize memory utilization and prevent thrashing on the highest zone in the node. While the same principle applies to NUMA nodes - memory utilization is obviously improved by spreading allocations throughout all nodes - remote references can be costly and so many workloads prefer locality over memory utilization. The original change assumed that zone_reclaim_mode would be a good enough predictor for that, but it turned out to be as indicative as a coin flip. Revert the NUMA aspect of the fairness until we can find a proper way to make it configurable and agree on a sane default. Signed-off-by: Johannes Weiner Reviewed-by: Michal Hocko Signed-off-by: Mel Gorman Cc: # 3.12 Signed-off-by: Linus Torvalds diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 580a5f0..5248fe0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1816,7 +1816,7 @@ static void zlc_clear_zones_full(struct zonelist *zonelist) static bool zone_local(struct zone *local_zone, struct zone *zone) { - return node_distance(local_zone->node, zone->node) == LOCAL_DISTANCE; + return local_zone->node == zone->node; } static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone) @@ -1913,18 +1913,17 @@ zonelist_scan: * page was allocated in should have no effect on the * time the page has in memory before being reclaimed. * - * When zone_reclaim_mode is enabled, try to stay in - * local zones in the fastpath. If that fails, the - * slowpath is entered, which will do another pass - * starting with the local zones, but ultimately fall - * back to remote zones that do not partake in the - * fairness round-robin cycle of this zonelist. + * Try to stay in local zones in the fastpath. If + * that fails, the slowpath is entered, which will do + * another pass starting with the local zones, but + * ultimately fall back to remote zones that do not + * partake in the fairness round-robin cycle of this + * zonelist. */ if (alloc_flags & ALLOC_WMARK_LOW) { if (zone_page_state(zone, NR_ALLOC_BATCH) <= 0) continue; - if (zone_reclaim_mode && - !zone_local(preferred_zone, zone)) + if (!zone_local(preferred_zone, zone)) continue; } /* @@ -2390,7 +2389,7 @@ static void prepare_slowpath(gfp_t gfp_mask, unsigned int order, * thrash fairness information for zones that are not * actually part of this zonelist's round-robin cycle. */ - if (zone_reclaim_mode && !zone_local(preferred_zone, zone)) + if (!zone_local(preferred_zone, zone)) continue; mod_zone_page_state(zone, NR_ALLOC_BATCH, high_wmark_pages(zone) - -- cgit v0.10.2 From 597d795a2a786d22dd872332428e2b9439ede639 Mon Sep 17 00:00:00 2001 From: "Kirill A. 
Shutemov" Date: Fri, 20 Dec 2013 13:35:58 +0200 Subject: mm: do not allocate page->ptl dynamically, if spinlock_t fits to long In struct page we have enough space to fit long-size page->ptl there, but we use dynamically-allocated page->ptl if size(spinlock_t) is larger than sizeof(int). It hurts 64-bit architectures with CONFIG_GENERIC_LOCKBREAK, where sizeof(spinlock_t) == 8, but it easily fits into struct page. Signed-off-by: Kirill A. Shutemov Acked-by: Hugh Dickins Signed-off-by: Linus Torvalds diff --git a/include/linux/lockref.h b/include/linux/lockref.h index c8929c3..4bfde0e 100644 --- a/include/linux/lockref.h +++ b/include/linux/lockref.h @@ -19,7 +19,7 @@ #define USE_CMPXCHG_LOCKREF \ (IS_ENABLED(CONFIG_ARCH_USE_CMPXCHG_LOCKREF) && \ - IS_ENABLED(CONFIG_SMP) && !BLOATED_SPINLOCKS) + IS_ENABLED(CONFIG_SMP) && SPINLOCK_SIZE <= 4) struct lockref { union { diff --git a/include/linux/mm.h b/include/linux/mm.h index 1cedd00..3552717 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1317,7 +1317,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a #endif /* CONFIG_MMU && !__ARCH_HAS_4LEVEL_HACK */ #if USE_SPLIT_PTE_PTLOCKS -#if BLOATED_SPINLOCKS +#if ALLOC_SPLIT_PTLOCKS extern bool ptlock_alloc(struct page *page); extern void ptlock_free(struct page *page); @@ -1325,7 +1325,7 @@ static inline spinlock_t *ptlock_ptr(struct page *page) { return page->ptl; } -#else /* BLOATED_SPINLOCKS */ +#else /* ALLOC_SPLIT_PTLOCKS */ static inline bool ptlock_alloc(struct page *page) { return true; @@ -1339,7 +1339,7 @@ static inline spinlock_t *ptlock_ptr(struct page *page) { return &page->ptl; } -#endif /* BLOATED_SPINLOCKS */ +#endif /* ALLOC_SPLIT_PTLOCKS */ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd) { diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index ad0616f..290901a 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -26,6 +26,7 @@ struct address_space; #define USE_SPLIT_PTE_PTLOCKS (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS) #define USE_SPLIT_PMD_PTLOCKS (USE_SPLIT_PTE_PTLOCKS && \ IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK)) +#define ALLOC_SPLIT_PTLOCKS (SPINLOCK_SIZE > BITS_PER_LONG/8) /* * Each physical page in the system has a struct page associated with @@ -155,7 +156,7 @@ struct page { * system if PG_buddy is set. */ #if USE_SPLIT_PTE_PTLOCKS -#if BLOATED_SPINLOCKS +#if ALLOC_SPLIT_PTLOCKS spinlock_t *ptl; #else spinlock_t ptl; diff --git a/kernel/bounds.c b/kernel/bounds.c index 5253204..9fd4246 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -22,6 +22,6 @@ void foo(void) #ifdef CONFIG_SMP DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS)); #endif - DEFINE(BLOATED_SPINLOCKS, sizeof(spinlock_t) > sizeof(int)); + DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t)); /* End of constants */ } diff --git a/mm/memory.c b/mm/memory.c index 5d9025f..b6e211b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4271,7 +4271,7 @@ void copy_user_huge_page(struct page *dst, struct page *src, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */ -#if USE_SPLIT_PTE_PTLOCKS && BLOATED_SPINLOCKS +#if ALLOC_SPLIT_PTLOCKS bool ptlock_alloc(struct page *page) { spinlock_t *ptl; -- cgit v0.10.2 From df36ac1bc2a166eef90785d584e4cfed6f52bd32 Mon Sep 17 00:00:00 2001 From: "Luck, Tony" Date: Wed, 18 Dec 2013 15:17:10 -0800 Subject: pstore: Don't allow high traffic options on fragile devices Some pstore backing devices use on board flash as persistent storage. 
These have limited numbers of write cycles so it is a poor idea to use them from high frequency operations. Signed-off-by: Tony Luck Signed-off-by: Linus Torvalds diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index 26311f2..cb1d557 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -942,6 +942,7 @@ static int erst_clearer(enum pstore_type_id type, u64 id, int count, static struct pstore_info erst_info = { .owner = THIS_MODULE, .name = "erst", + .flags = PSTORE_FLAGS_FRAGILE, .open = erst_open_pstore, .close = erst_close_pstore, .read = erst_reader, diff --git a/drivers/firmware/efi/efi-pstore.c b/drivers/firmware/efi/efi-pstore.c index 743fd42..4b9dc83 100644 --- a/drivers/firmware/efi/efi-pstore.c +++ b/drivers/firmware/efi/efi-pstore.c @@ -356,6 +356,7 @@ static int efi_pstore_erase(enum pstore_type_id type, u64 id, int count, static struct pstore_info efi_pstore_info = { .owner = THIS_MODULE, .name = "efi", + .flags = PSTORE_FLAGS_FRAGILE, .open = efi_pstore_open, .close = efi_pstore_close, .read = efi_pstore_read, diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c index b8e93a4..78c3c20 100644 --- a/fs/pstore/platform.c +++ b/fs/pstore/platform.c @@ -443,8 +443,11 @@ int pstore_register(struct pstore_info *psi) pstore_get_records(0); kmsg_dump_register(&pstore_dumper); - pstore_register_console(); - pstore_register_ftrace(); + + if ((psi->flags & PSTORE_FLAGS_FRAGILE) == 0) { + pstore_register_console(); + pstore_register_ftrace(); + } if (pstore_update_ms >= 0) { pstore_timer.expires = jiffies + diff --git a/include/linux/pstore.h b/include/linux/pstore.h index abd437d..ece0c6b 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -51,6 +51,7 @@ struct pstore_info { char *buf; size_t bufsize; struct mutex read_mutex; /* serialize open/read/close */ + int flags; int (*open)(struct pstore_info *psi); int (*close)(struct pstore_info *psi); ssize_t (*read)(u64 *id, enum pstore_type_id *type, @@ -70,6 +71,8 @@ struct pstore_info { void *data; }; +#define PSTORE_FLAGS_FRAGILE 1 + #ifdef CONFIG_PSTORE extern int pstore_register(struct pstore_info *); extern bool pstore_cannot_block_path(enum kmsg_dump_reason reason); -- cgit v0.10.2 From 11daf32be9a6041a2406c2cbf76b75a9c4a6eaaa Mon Sep 17 00:00:00 2001 From: Matteo Facchinetti Date: Fri, 20 Dec 2013 10:16:22 +0100 Subject: powerpc/512x: dts: disable MPC5125 usb module At the moment the USB controller's pin muxing is not setup correctly and causes a kernel panic upon system startup, so disable the USB1 device tree node in the MPC5125 tower board dts file. The USB controller is connected to an USB3320 ULPI transceiver and the device tree should receive an update to reflect correct dependencies and required initialization data before the USB1 node can get re-enabled. 
Signed-off-by: Matteo Facchinetti Signed-off-by: Anatolij Gustschin diff --git a/arch/powerpc/boot/dts/mpc5125twr.dts b/arch/powerpc/boot/dts/mpc5125twr.dts index 0a0fe92..a618dfc 100644 --- a/arch/powerpc/boot/dts/mpc5125twr.dts +++ b/arch/powerpc/boot/dts/mpc5125twr.dts @@ -188,6 +188,10 @@ reg = <0xA000 0x1000>; }; + // disable USB1 port + // TODO: + // correct pinmux config and fix USB3320 ulpi dependency + // before re-enabling it usb@3000 { compatible = "fsl,mpc5121-usb2-dr"; reg = <0x3000 0x400>; @@ -196,6 +200,7 @@ interrupts = <43 0x8>; dr_mode = "host"; phy_type = "ulpi"; + status = "disabled"; }; // 5125 PSCs are not 52xx or 5121 PSC compatible -- cgit v0.10.2 From 40b64acd17a2200579db265048b4a51998a84729 Mon Sep 17 00:00:00 2001 From: Olof Johansson Date: Fri, 20 Dec 2013 14:28:05 -0800 Subject: mm: fix build of split ptlock code Commit 597d795a2a78 ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long') restructures some allocators that are compiled even if USE_SPLIT_PTLOCKS arn't used. It results in compilation failure: mm/memory.c:4282:6: error: 'struct page' has no member named 'ptl' mm/memory.c:4288:12: error: 'struct page' has no member named 'ptl' Add in the missing ifdef. Fixes: 597d795a2a78 ('mm: do not allocate page->ptl dynamically, if spinlock_t fits to long') Signed-off-by: Olof Johansson Cc: Kirill A. Shutemov Cc: Hugh Dickins Signed-off-by: Linus Torvalds diff --git a/mm/memory.c b/mm/memory.c index b6e211b..6768ce9 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4271,7 +4271,7 @@ void copy_user_huge_page(struct page *dst, struct page *src, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */ -#if ALLOC_SPLIT_PTLOCKS +#if USE_SPLIT_PTE_PTLOCKS && ALLOC_SPLIT_PTLOCKS bool ptlock_alloc(struct page *page) { spinlock_t *ptl; -- cgit v0.10.2 From b7000adef17a5cce85636e40fa2c2d9851a89e28 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 20 Dec 2013 16:52:45 -0800 Subject: Don't set the INITRD_COMPRESS environment variable automatically Commit 1bf49dd4be0b ("./Makefile: export initial ramdisk compression config option") started setting the INITRD_COMPRESS environment variable depending on which decompression models the kernel had available. That is completely broken. For example, we by default have CONFIG_RD_LZ4 enabled, and are able to decompress such an initrd, but the user tools to *create* such an initrd may not be availble. So trying to tell dracut to generate an lz4-compressed image just because we can decode such an image is completely inappropriate. Cc: J P Cc: Andrew Morton Cc: Jan Beulich Signed-off-by: Linus Torvalds diff --git a/Makefile b/Makefile index 89cee86..82ea7dd 100644 --- a/Makefile +++ b/Makefile @@ -738,7 +738,9 @@ INITRD_COMPRESS-$(CONFIG_RD_LZMA) := lzma INITRD_COMPRESS-$(CONFIG_RD_XZ) := xz INITRD_COMPRESS-$(CONFIG_RD_LZO) := lzo INITRD_COMPRESS-$(CONFIG_RD_LZ4) := lz4 -export INITRD_COMPRESS := $(INITRD_COMPRESS-y) +# do not export INITRD_COMPRESS, since we didn't actually +# choose a sane default compression above. +# export INITRD_COMPRESS := $(INITRD_COMPRESS-y) ifdef CONFIG_MODULE_SIG_ALL MODSECKEY = ./signing_key.priv -- cgit v0.10.2 From 89ed05eea093d4c18df5d504d104f29b874fea29 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matias=20Bj=C3=B8rling?= Date: Sat, 21 Dec 2013 00:10:59 +0100 Subject: null_blk: corrections to documentation Randy Dunlap reported a couple of grammar errors and unfortunate usages of socket/node/core. 
Signed-off-by: Matias Bjorling Acked-by: Randy Dunlap Signed-off-by: Jens Axboe diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt index 9e1b047..5603dad 100644 --- a/Documentation/block/null_blk.txt +++ b/Documentation/block/null_blk.txt @@ -18,7 +18,7 @@ The following instances are possible: - Bio-based. IO requests are submitted directly to the device driver. - Directly accepts bio data structure and returns them. -All of them has a completion queue for each core in the system. +All of them have a completion queue for each core in the system. II. Module parameters applicable for all instances: @@ -30,7 +30,7 @@ queue_mode=[0-2]: Default: 2-Multi-queue 2: Multi-queue. home_node=[0--nr_nodes]: Default: NUMA_NO_NODE - Selects what socket the data structures is allocated from. + Selects what CPU node the data structures are allocated from. gb=[Size in GB]: Default: 250GB The size of the device reported to the system. @@ -38,34 +38,34 @@ gb=[Size in GB]: Default: 250GB bs=[Block size (in bytes)]: Default: 512 bytes The block size reported to the system. -nr_devices=[Num. devices]: Default: 2 +nr_devices=[Number of devices]: Default: 2 Number of block devices instantiated. They are instantiated as /dev/nullb0, etc. -irq_mode=[0-2]: Default: Soft-irq +irq_mode=[0-2]: Default: 1-Soft-irq The completion mode used for completing IOs to the block-layer. 0: None. - 1: Soft-irq. Uses ipi to complete IOs across sockets. Simulates the overhead - when IOs are issued from another socket than the home the device is + 1: Soft-irq. Uses IPI to complete IOs across CPU nodes. Simulates the overhead + when IOs are issued from another CPU node than the home the device is connected to. 2: Timer: Waits a specific period (completion_nsec) for each IO before completion. -completion_nsec=[Num. ns]: Default: 10.000ns +completion_nsec=[ns]: Default: 10.000ns Combined with irq_mode=2 (timer). The time each completion event must wait. submit_queues=[0..nr_cpus]: The number of submission queues attached to the device driver. If unset, it defaults to 1 on single-queue and bio-based instances. For multi-queue, - its ignored when use_per_node_hctx module parameter is 1. + it is ignored when use_per_node_hctx module parameter is 1. -hw_queue_depth=[0..qdepth]: Defaults: 64 +hw_queue_depth=[0..qdepth]: Default: 64 The hardware queue depth of the device. III: Multi-queue specific parameters -use_per_node_hctx=[0/1]: Defaults: 1 +use_per_node_hctx=[0/1]: Default: 1 If 1, the multi-queue block layer is instantiated with a hardware dispatch queue for each CPU node in the system. If 0, it is instantiated with the number of queues defined in the submit_queues parameter. -- cgit v0.10.2 From 200052440d3b56f593038a35b7c14bdc780184a9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matias=20Bj=C3=B8rling?= Date: Sat, 21 Dec 2013 00:11:00 +0100 Subject: null_blk: set use_per_node_hctx param to false The defaults for the module is to instantiate itself with blk-mq and a submit queue for each CPU node in the system. To save resources, initialize instead with a single submit queue. 
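For reference, a bool module parameter's initializer is its default, so the change amounts to flipping that initial value (sketch below; the actual hunk follows). Users who want the old behaviour can still request it explicitly, e.g. modprobe null_blk queue_mode=2 use_per_node_hctx=1.

	/* sketch of the parameter definition after the change */
	static bool use_per_node_hctx = false;
	module_param(use_per_node_hctx, bool, S_IRUGO);
	MODULE_PARM_DESC(use_per_node_hctx,
		"Use per-node allocation for hardware context queues. Default: false");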
Signed-off-by: Matias Bjorling Signed-off-by: Jens Axboe diff --git a/Documentation/block/null_blk.txt b/Documentation/block/null_blk.txt index 5603dad..b2830b4 100644 --- a/Documentation/block/null_blk.txt +++ b/Documentation/block/null_blk.txt @@ -65,7 +65,8 @@ hw_queue_depth=[0..qdepth]: Default: 64 III: Multi-queue specific parameters -use_per_node_hctx=[0/1]: Default: 1 - If 1, the multi-queue block layer is instantiated with a hardware dispatch - queue for each CPU node in the system. If 0, it is instantiated with the - number of queues defined in the submit_queues parameter. +use_per_node_hctx=[0/1]: Default: 0 + 0: The number of submit queues are set to the value of the submit_queues + parameter. + 1: The multi-queue block layer is instantiated with a hardware dispatch + queue for each CPU node in the system. diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 8f2e7c3..9b0311b 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -101,9 +101,9 @@ static int hw_queue_depth = 64; module_param(hw_queue_depth, int, S_IRUGO); MODULE_PARM_DESC(hw_queue_depth, "Queue depth for each hardware queue. Default: 64"); -static bool use_per_node_hctx = true; +static bool use_per_node_hctx = false; module_param(use_per_node_hctx, bool, S_IRUGO); -MODULE_PARM_DESC(use_per_node_hctx, "Use per-node allocation for hardware context queues. Default: true"); +MODULE_PARM_DESC(use_per_node_hctx, "Use per-node allocation for hardware context queues. Default: false"); static void put_tag(struct nullb_queue *nq, unsigned int tag) { -- cgit v0.10.2 From fc1bc35443741e132dd0118e8dbac53f69a6f76e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Matias=20Bj=C3=B8rling?= Date: Sat, 21 Dec 2013 00:11:01 +0100 Subject: null_blk: support submit_queues on use_per_node_hctx In the case of both the submit_queues param and use_per_node_hctx param are used. We limit the number af submit_queues to the number of online nodes. If the submit_queues is a multiple of nr_online_nodes, its trivial. Simply map them to the nodes. For example: 8 submit queues are mapped as node0[0,1], node1[2,3], ... If uneven, we are left with an uneven number of submit_queues that must be mapped. These are mapped toward the first node and onward. E.g. 5 submit queues mapped onto 4 nodes are mapped as node0[0,1], node1[2], ... Signed-off-by: Matias Bjorling Signed-off-by: Jens Axboe diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 9b0311b..528f4e4 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -1,4 +1,5 @@ #include + #include #include #include @@ -346,8 +347,37 @@ static int null_queue_rq(struct blk_mq_hw_ctx *hctx, struct request *rq) static struct blk_mq_hw_ctx *null_alloc_hctx(struct blk_mq_reg *reg, unsigned int hctx_index) { - return kzalloc_node(sizeof(struct blk_mq_hw_ctx), GFP_KERNEL, - hctx_index); + int b_size = DIV_ROUND_UP(reg->nr_hw_queues, nr_online_nodes); + int tip = (reg->nr_hw_queues % nr_online_nodes); + int node = 0, i, n; + + /* + * Split submit queues evenly wrt to the number of nodes. If uneven, + * fill the first buckets with one extra, until the rest is filled with + * no extra. + */ + for (i = 0, n = 1; i < hctx_index; i++, n++) { + if (n % b_size == 0) { + n = 0; + node++; + + tip--; + if (!tip) + b_size = reg->nr_hw_queues / nr_online_nodes; + } + } + + /* + * A node might not be online, therefore map the relative node id to the + * real node id. 
+ */ + for_each_online_node(n) { + if (!node) + break; + node--; + } + + return kzalloc_node(sizeof(struct blk_mq_hw_ctx), GFP_KERNEL, n); } static void null_free_hctx(struct blk_mq_hw_ctx *hctx, unsigned int hctx_index) @@ -591,10 +621,11 @@ static int __init null_init(void) #endif if (queue_mode == NULL_Q_MQ && use_per_node_hctx) { - if (submit_queues > 0) + if (submit_queues < nr_online_nodes) { pr_warn("null_blk: submit_queues param is set to %u.", nr_online_nodes); - submit_queues = nr_online_nodes; + submit_queues = nr_online_nodes; + } } else if (submit_queues > nr_cpu_ids) submit_queues = nr_cpu_ids; else if (!submit_queues) -- cgit v0.10.2 From 1881686f842065d2f92ec9c6424830ffc17d23b0 Mon Sep 17 00:00:00 2001 From: Benjamin LaHaise Date: Sat, 21 Dec 2013 15:49:28 -0500 Subject: aio: fix kioctx leak introduced by "aio: Fix a trinity splat" e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference counting to correct a bug trinity found. Unfortunately, the change lead to kioctxes being leaked because there was no final reference count to put. Add that reference count back in to fix things. Signed-off-by: Benjamin LaHaise Cc: stable@vger.kernel.org diff --git a/fs/aio.c b/fs/aio.c index 6efb7f6..fd1c0ba 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -652,7 +652,8 @@ static struct kioctx *ioctx_alloc(unsigned nr_events) aio_nr += ctx->max_reqs; spin_unlock(&aio_nr_lock); - percpu_ref_get(&ctx->users); /* io_setup() will drop this ref */ + percpu_ref_get(&ctx->users); /* io_setup() will drop this ref */ + percpu_ref_get(&ctx->reqs); /* free_ioctx_users() will drop this */ err = ioctx_add_table(ctx, mm); if (err) -- cgit v0.10.2 From 8e321fefb0e60bae4e2a28d20fc4fa30758d27c6 Mon Sep 17 00:00:00 2001 From: Benjamin LaHaise Date: Sat, 21 Dec 2013 17:56:08 -0500 Subject: aio/migratepages: make aio migrate pages sane The arbitrary restriction on page counts offered by the core migrate_page_move_mapping() code results in rather suspicious looking fiddling with page reference counts in the aio_migratepage() operation. To fix this, make migrate_page_move_mapping() take an extra_count parameter that allows aio to tell the code about its own reference count on the page being migrated. While cleaning up aio_migratepage(), make it validate that the old page being passed in is actually what aio_migratepage() expects to prevent misbehaviour in the case of races. 
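A hedged sketch of the resulting interface (prototype as introduced by this patch; the full hunks follow below): callers now declare how many extra references they legitimately hold on the page, so aio no longer has to drop and re-take references to satisfy the old expected-count check.

	/* updated prototype with the new extra_count argument */
	int migrate_page_move_mapping(struct address_space *mapping,
				      struct page *newpage, struct page *page,
				      struct buffer_head *head, enum migrate_mode mode,
				      int extra_count);

	/* aio's ring pages carry one extra reference; ordinary callers pass 0 */
	rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1);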
Signed-off-by: Benjamin LaHaise diff --git a/fs/aio.c b/fs/aio.c index fd1c0ba..efa708b 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -244,9 +244,14 @@ static void aio_free_ring(struct kioctx *ctx) int i; for (i = 0; i < ctx->nr_pages; i++) { + struct page *page; pr_debug("pid(%d) [%d] page->count=%d\n", current->pid, i, page_count(ctx->ring_pages[i])); - put_page(ctx->ring_pages[i]); + page = ctx->ring_pages[i]; + if (!page) + continue; + ctx->ring_pages[i] = NULL; + put_page(page); } put_aio_ring_file(ctx); @@ -280,18 +285,38 @@ static int aio_migratepage(struct address_space *mapping, struct page *new, unsigned long flags; int rc; + rc = 0; + + /* Make sure the old page hasn't already been changed */ + spin_lock(&mapping->private_lock); + ctx = mapping->private_data; + if (ctx) { + pgoff_t idx; + spin_lock_irqsave(&ctx->completion_lock, flags); + idx = old->index; + if (idx < (pgoff_t)ctx->nr_pages) { + if (ctx->ring_pages[idx] != old) + rc = -EAGAIN; + } else + rc = -EINVAL; + spin_unlock_irqrestore(&ctx->completion_lock, flags); + } else + rc = -EINVAL; + spin_unlock(&mapping->private_lock); + + if (rc != 0) + return rc; + /* Writeback must be complete */ BUG_ON(PageWriteback(old)); - put_page(old); + get_page(new); - rc = migrate_page_move_mapping(mapping, new, old, NULL, mode); + rc = migrate_page_move_mapping(mapping, new, old, NULL, mode, 1); if (rc != MIGRATEPAGE_SUCCESS) { - get_page(old); + put_page(new); return rc; } - get_page(new); - /* We can potentially race against kioctx teardown here. Use the * address_space's private data lock to protect the mapping's * private_data. @@ -303,13 +328,24 @@ static int aio_migratepage(struct address_space *mapping, struct page *new, spin_lock_irqsave(&ctx->completion_lock, flags); migrate_page_copy(new, old); idx = old->index; - if (idx < (pgoff_t)ctx->nr_pages) - ctx->ring_pages[idx] = new; + if (idx < (pgoff_t)ctx->nr_pages) { + /* And only do the move if things haven't changed */ + if (ctx->ring_pages[idx] == old) + ctx->ring_pages[idx] = new; + else + rc = -EAGAIN; + } else + rc = -EINVAL; spin_unlock_irqrestore(&ctx->completion_lock, flags); } else rc = -EBUSY; spin_unlock(&mapping->private_lock); + if (rc == MIGRATEPAGE_SUCCESS) + put_page(old); + else + put_page(new); + return rc; } #endif diff --git a/include/linux/migrate.h b/include/linux/migrate.h index b7717d7..f015c05 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -55,7 +55,8 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page); extern int migrate_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page, - struct buffer_head *head, enum migrate_mode mode); + struct buffer_head *head, enum migrate_mode mode, + int extra_count); #else static inline void putback_lru_pages(struct list_head *l) {} diff --git a/mm/migrate.c b/mm/migrate.c index e9b7102..9194375 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -317,14 +317,15 @@ static inline bool buffer_migrate_lock_buffers(struct buffer_head *head, */ int migrate_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page, - struct buffer_head *head, enum migrate_mode mode) + struct buffer_head *head, enum migrate_mode mode, + int extra_count) { - int expected_count = 0; + int expected_count = 1 + extra_count; void **pslot; if (!mapping) { /* Anonymous page without mapping */ - if (page_count(page) != 1) + if (page_count(page) != expected_count) return -EAGAIN; return MIGRATEPAGE_SUCCESS; } 
@@ -334,7 +335,7 @@ int migrate_page_move_mapping(struct address_space *mapping, pslot = radix_tree_lookup_slot(&mapping->page_tree, page_index(page)); - expected_count = 2 + page_has_private(page); + expected_count += 1 + page_has_private(page); if (page_count(page) != expected_count || radix_tree_deref_slot_protected(pslot, &mapping->tree_lock) != page) { spin_unlock_irq(&mapping->tree_lock); @@ -584,7 +585,7 @@ int migrate_page(struct address_space *mapping, BUG_ON(PageWriteback(page)); /* Writeback must be complete */ - rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode); + rc = migrate_page_move_mapping(mapping, newpage, page, NULL, mode, 0); if (rc != MIGRATEPAGE_SUCCESS) return rc; @@ -611,7 +612,7 @@ int buffer_migrate_page(struct address_space *mapping, head = page_buffers(page); - rc = migrate_page_move_mapping(mapping, newpage, page, head, mode); + rc = migrate_page_move_mapping(mapping, newpage, page, head, mode, 0); if (rc != MIGRATEPAGE_SUCCESS) return rc; -- cgit v0.10.2 From 42f921a6f10c6c2079b093a115eb7e3c3508357f Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 20 Dec 2013 21:26:02 +0530 Subject: cpufreq: remove sysfs files for CPUs which failed to come back after resume MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are cases where cpufreq_add_dev() may fail for some CPUs during system resume. With the current code we will still have sysfs cpufreq files for those CPUs and struct cpufreq_policy would be already freed for them. Hence any operation on those sysfs files would result in kernel warnings. Example of problems resulting from resume errors (from Bjørn Mork): WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212() missing sysfs attribute operations for kobject: (null) Modules linked in: [stripped as irrelevant] CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006 ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000 ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310 Call Trace: [] dump_stack+0x55/0x76 [] warn_slowpath_common+0x7c/0x96 [] ? sysfs_open_file+0x77/0x212 [] warn_slowpath_fmt+0x41/0x43 [] ? sysfs_get_active+0x6b/0x82 [] ? sysfs_open_file+0x32/0x212 [] sysfs_open_file+0x77/0x212 [] ? sysfs_schedule_callback+0x1ac/0x1ac [] do_dentry_open+0x17c/0x257 [] finish_open+0x41/0x4f [] do_last+0x80c/0x9ba [] ? inode_permission+0x40/0x42 [] path_openat+0x233/0x4a1 [] do_filp_open+0x35/0x85 [] ? __alloc_fd+0x172/0x184 [] do_sys_open+0x6b/0xfa [] SyS_openat+0xf/0x11 [] system_call_fastpath+0x16/0x1b To fix this, remove those sysfs files or put the associated kobject in case of such errors. Also, to make it simple, remove the cpufreq sysfs links from all the CPUs (except for the policy->cpu) during suspend, as that operation won't result in a loss of sysfs file permissions and we can create those links during resume just fine. Fixes: 5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume") Reported-and-tested-by: Bjørn Mork Signed-off-by: Viresh Kumar Cc: 3.12+ # 3.12+ [rjw: Changelog] Signed-off-by: Rafael J. 
Wysocki diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 02d534d..fab042e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -845,8 +845,7 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy) #ifdef CONFIG_HOTPLUG_CPU static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, - unsigned int cpu, struct device *dev, - bool frozen) + unsigned int cpu, struct device *dev) { int ret = 0; unsigned long flags; @@ -877,11 +876,7 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, } } - /* Don't touch sysfs links during light-weight init */ - if (!frozen) - ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); - - return ret; + return sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq"); } #endif @@ -926,6 +921,27 @@ err_free_policy: return NULL; } +static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy) +{ + struct kobject *kobj; + struct completion *cmp; + + down_read(&policy->rwsem); + kobj = &policy->kobj; + cmp = &policy->kobj_unregister; + up_read(&policy->rwsem); + kobject_put(kobj); + + /* + * We need to make sure that the underlying kobj is + * actually not referenced anymore by anybody before we + * proceed with unloading. + */ + pr_debug("waiting for dropping of refcount\n"); + wait_for_completion(cmp); + pr_debug("wait complete\n"); +} + static void cpufreq_policy_free(struct cpufreq_policy *policy) { free_cpumask_var(policy->related_cpus); @@ -986,7 +1002,7 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif, list_for_each_entry(tpolicy, &cpufreq_policy_list, policy_list) { if (cpumask_test_cpu(cpu, tpolicy->related_cpus)) { read_unlock_irqrestore(&cpufreq_driver_lock, flags); - ret = cpufreq_add_policy_cpu(tpolicy, cpu, dev, frozen); + ret = cpufreq_add_policy_cpu(tpolicy, cpu, dev); up_read(&cpufreq_rwsem); return ret; } @@ -1096,7 +1112,10 @@ err_get_freq: if (cpufreq_driver->exit) cpufreq_driver->exit(policy); err_set_policy_cpu: + if (frozen) + cpufreq_policy_put_kobj(policy); cpufreq_policy_free(policy); + nomem_out: up_read(&cpufreq_rwsem); @@ -1118,7 +1137,7 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif) } static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy, - unsigned int old_cpu, bool frozen) + unsigned int old_cpu) { struct device *cpu_dev; int ret; @@ -1126,10 +1145,6 @@ static int cpufreq_nominate_new_policy_cpu(struct cpufreq_policy *policy, /* first sibling now owns the new sysfs dir */ cpu_dev = get_cpu_device(cpumask_any_but(policy->cpus, old_cpu)); - /* Don't touch sysfs files during light-weight tear-down */ - if (frozen) - return cpu_dev->id; - sysfs_remove_link(&cpu_dev->kobj, "cpufreq"); ret = kobject_move(&policy->kobj, &cpu_dev->kobj); if (ret) { @@ -1196,7 +1211,7 @@ static int __cpufreq_remove_dev_prepare(struct device *dev, if (!frozen) sysfs_remove_link(&dev->kobj, "cpufreq"); } else if (cpus > 1) { - new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu, frozen); + new_cpu = cpufreq_nominate_new_policy_cpu(policy, cpu); if (new_cpu >= 0) { update_policy_cpu(policy, new_cpu); @@ -1218,8 +1233,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev, int ret; unsigned long flags; struct cpufreq_policy *policy; - struct kobject *kobj; - struct completion *cmp; read_lock_irqsave(&cpufreq_driver_lock, flags); policy = per_cpu(cpufreq_cpu_data, cpu); @@ -1249,22 +1262,8 @@ static int __cpufreq_remove_dev_finish(struct device *dev, } } - if (!frozen) { - 
down_read(&policy->rwsem); - kobj = &policy->kobj; - cmp = &policy->kobj_unregister; - up_read(&policy->rwsem); - kobject_put(kobj); - - /* - * We need to make sure that the underlying kobj is - * actually not referenced anymore by anybody before we - * proceed with unloading. - */ - pr_debug("waiting for dropping of refcount\n"); - wait_for_completion(cmp); - pr_debug("wait complete\n"); - } + if (!frozen) + cpufreq_policy_put_kobj(policy); /* * Perform the ->exit() even during light-weight tear-down, -- cgit v0.10.2 From a27a9ab706c8f5bb8bbd320d2e9c5d089e380c6a Mon Sep 17 00:00:00 2001 From: Jason Baron Date: Thu, 19 Dec 2013 22:50:50 +0000 Subject: cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the intel_pstate driver, the desired default policy is not properly set. For example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the 'powersave' policy being set. Fix by configuring the correct default policy, if either 'powersave' or 'performance' are requested. Otherwise, fallback to what the driver originally set via its 'init' routine. Signed-off-by: Jason Baron Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index fab042e..16d7b4a 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -828,6 +828,12 @@ static void cpufreq_init_policy(struct cpufreq_policy *policy) int ret = 0; memcpy(&new_policy, policy, sizeof(*policy)); + + /* Use the default policy if its valid. */ + if (cpufreq_driver->setpolicy) + cpufreq_parse_governor(policy->governor->name, + &new_policy.policy, NULL); + /* assure that the starting sequence is run in cpufreq_set_policy */ policy->governor = NULL; -- cgit v0.10.2 From c606850407d9096415e226c75a871d0650404446 Mon Sep 17 00:00:00 2001 From: Masami Ichikawa Date: Thu, 19 Dec 2013 20:00:47 +0900 Subject: PM / sleep: Fix memory leak in pm_vt_switch_unregister(). kmemleak reported a memory leak as below. unreferenced object 0xffff880118f14700 (size 32): comm "swapper/0", pid 1, jiffies 4294877401 (age 123.283s) hex dump (first 32 bytes): 00 01 10 00 00 00 ad de 00 02 20 00 00 00 ad de .......... ..... 00 d4 d2 18 01 88 ff ff 01 00 00 00 00 04 00 00 ................ backtrace: [] kmemleak_alloc+0x4e/0xb0 [] kmem_cache_alloc_trace+0x1ec/0x260 [] pm_vt_switch_required+0x76/0xb0 [] register_framebuffer+0x195/0x320 [] efifb_probe+0x718/0x780 [] platform_drv_probe+0x45/0xb0 [] driver_probe_device+0x87/0x3a0 [] __driver_attach+0x93/0xa0 [] bus_for_each_dev+0x63/0xa0 [] driver_attach+0x1e/0x20 [] bus_add_driver+0x180/0x250 [] driver_register+0x64/0xf0 [] __platform_driver_register+0x4a/0x50 [] efifb_driver_init+0x12/0x14 [] do_one_initcall+0xfa/0x1b0 [] kernel_init_freeable+0x17b/0x201 In pm_vt_switch_required(), "entry" variable is allocated via kmalloc(). So, in pm_vt_switch_unregister(), it needs to call kfree() when object is deleted from list. Signed-off-by: Masami Ichikawa Reviewed-by: Pavel Machek Signed-off-by: Rafael J. 
Wysocki diff --git a/kernel/power/console.c b/kernel/power/console.c index 463aa673..eacb8bd 100644 --- a/kernel/power/console.c +++ b/kernel/power/console.c @@ -81,6 +81,7 @@ void pm_vt_switch_unregister(struct device *dev) list_for_each_entry(tmp, &pm_vt_switch_list, head) { if (tmp->dev == dev) { list_del(&tmp->head); + kfree(tmp); break; } } -- cgit v0.10.2 From ed93b71492da3464b4798613aa8a99bed914251b Mon Sep 17 00:00:00 2001 From: Jacob Pan Date: Wed, 11 Dec 2013 14:39:27 -0800 Subject: powercap / RAPL: add support for ValleyView Soc This patch adds support for RAPL on Intel ValleyView based SoC platforms, such as Baytrail. Besides adding CPU ID, special energy unit encoding is handled for ValleyView. Signed-off-by: Jacob Pan Signed-off-by: Rafael J. Wysocki diff --git a/drivers/powercap/intel_rapl.c b/drivers/powercap/intel_rapl.c index 2a786c5..3c67683 100644 --- a/drivers/powercap/intel_rapl.c +++ b/drivers/powercap/intel_rapl.c @@ -833,6 +833,11 @@ static int rapl_write_data_raw(struct rapl_domain *rd, return 0; } +static const struct x86_cpu_id energy_unit_quirk_ids[] = { + { X86_VENDOR_INTEL, 6, 0x37},/* VLV */ + {} +}; + static int rapl_check_unit(struct rapl_package *rp, int cpu) { u64 msr_val; @@ -853,8 +858,11 @@ static int rapl_check_unit(struct rapl_package *rp, int cpu) * time unit: 1/time_unit_divisor Seconds */ value = (msr_val & ENERGY_UNIT_MASK) >> ENERGY_UNIT_OFFSET; - rp->energy_unit_divisor = 1 << value; - + /* some CPUs have different way to calculate energy unit */ + if (x86_match_cpu(energy_unit_quirk_ids)) + rp->energy_unit_divisor = 1000000 / (1 << value); + else + rp->energy_unit_divisor = 1 << value; value = (msr_val & POWER_UNIT_MASK) >> POWER_UNIT_OFFSET; rp->power_unit_divisor = 1 << value; @@ -941,6 +949,7 @@ static void package_power_limit_irq_restore(int package_id) static const struct x86_cpu_id rapl_ids[] = { { X86_VENDOR_INTEL, 6, 0x2a},/* SNB */ { X86_VENDOR_INTEL, 6, 0x2d},/* SNB EP */ + { X86_VENDOR_INTEL, 6, 0x37},/* VLV */ { X86_VENDOR_INTEL, 6, 0x3a},/* IVB */ { X86_VENDOR_INTEL, 6, 0x45},/* HSW */ /* TODO: Add more CPU IDs after testing */ -- cgit v0.10.2 From a68f9614614749727286f675d15f1e09d13cb54a Mon Sep 17 00:00:00 2001 From: Haiyang Zhang Date: Fri, 20 Dec 2013 16:52:31 -0800 Subject: hyperv: Fix race between probe and open calls Moving the register_netdev to the end of probe to prevent possible open call happens before NetVSP is connected. Signed-off-by: Haiyang Zhang Reviewed-by: K. Y. Srinivasan Signed-off-by: David S. 
Miller diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index f813572..71baeb3 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -261,9 +261,7 @@ int netvsc_recv_callback(struct hv_device *device_obj, struct sk_buff *skb; net = ((struct netvsc_device *)hv_get_drvdata(device_obj))->ndev; - if (!net) { - netdev_err(net, "got receive callback but net device" - " not initialized yet\n"); + if (!net || net->reg_state != NETREG_REGISTERED) { packet->status = NVSP_STAT_FAIL; return 0; } @@ -435,19 +433,11 @@ static int netvsc_probe(struct hv_device *dev, SET_ETHTOOL_OPS(net, ðtool_ops); SET_NETDEV_DEV(net, &dev->device); - ret = register_netdev(net); - if (ret != 0) { - pr_err("Unable to register netdev.\n"); - free_netdev(net); - goto out; - } - /* Notify the netvsc driver of the new device */ device_info.ring_size = ring_size; ret = rndis_filter_device_add(dev, &device_info); if (ret != 0) { netdev_err(net, "unable to add netvsc device (ret %d)\n", ret); - unregister_netdev(net); free_netdev(net); hv_set_drvdata(dev, NULL); return ret; @@ -456,7 +446,13 @@ static int netvsc_probe(struct hv_device *dev, netif_carrier_on(net); -out: + ret = register_netdev(net); + if (ret != 0) { + pr_err("Unable to register netdev.\n"); + rndis_filter_device_remove(dev); + free_netdev(net); + } + return ret; } -- cgit v0.10.2 From 3dc9acb67600393249a795934ccdfc291a200e6b Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 20 Dec 2013 05:11:12 +0900 Subject: aio: clean up and fix aio_setup_ring page mapping Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages migration") the aio ring setup code has used a special per-ring backing inode for the page allocations, rather than just using random anonymous pages. However, rather than remembering the pages as it allocated them, it would allocate the pages, insert them into the file mapping (dirty, so that they couldn't be free'd), and then forget about them. And then to look them up again, it would mmap the mapping, and then use "get_user_pages()" to get back an array of the pages we just created. Now, not only is that incredibly inefficient, it also leaked all the pages if the mmap failed (which could happen due to excessive number of mappings, for example). So clean it all up, making it much more straightforward. Also remove some left-overs of the previous (broken) mm_populate() usage that was removed in commit d6c355c7dabc ("aio: fix race in ring buffer page lookup introduced by page migration support") but left the pointless and now misleading MAP_POPULATE flag around. 
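[Editor's illustration, not part of the commit] The ring that aio_setup_ring() builds in the diff that follows is exactly what userspace receives as its I/O context: the handle written back by io_setup(2) is ctx->user_id, which the patched code sets to ctx->mmap_base, i.e. the address of the mapped ring. A minimal, Linux-only sketch using the raw syscalls (assumes <linux/aio_abi.h> is available and AIO is enabled in the running kernel):

#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

int main(void)
{
	aio_context_t ctx = 0;		/* io_setup() requires a zeroed context */

	if (syscall(__NR_io_setup, 128, &ctx) < 0) {	/* room for 128 events */
		perror("io_setup");
		return 1;
	}
	/* The handle is the userspace address of the aio ring mapped by
	 * aio_setup_ring() (the "ctx->user_id = ctx->mmap_base" assignment
	 * visible in the patch context). */
	printf("aio context handle: 0x%llx\n", (unsigned long long)ctx);

	syscall(__NR_io_destroy, ctx);
	return 0;
}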
Tested-and-acked-by: Benjamin LaHaise Signed-off-by: Linus Torvalds diff --git a/fs/aio.c b/fs/aio.c index 6efb7f6..643db8f 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -326,7 +326,7 @@ static int aio_setup_ring(struct kioctx *ctx) struct aio_ring *ring; unsigned nr_events = ctx->max_reqs; struct mm_struct *mm = current->mm; - unsigned long size, populate; + unsigned long size, unused; int nr_pages; int i; struct file *file; @@ -347,6 +347,20 @@ static int aio_setup_ring(struct kioctx *ctx) return -EAGAIN; } + ctx->aio_ring_file = file; + nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) + / sizeof(struct io_event); + + ctx->ring_pages = ctx->internal_pages; + if (nr_pages > AIO_RING_PAGES) { + ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *), + GFP_KERNEL); + if (!ctx->ring_pages) { + put_aio_ring_file(ctx); + return -ENOMEM; + } + } + for (i = 0; i < nr_pages; i++) { struct page *page; page = find_or_create_page(file->f_inode->i_mapping, @@ -358,19 +372,14 @@ static int aio_setup_ring(struct kioctx *ctx) SetPageUptodate(page); SetPageDirty(page); unlock_page(page); + + ctx->ring_pages[i] = page; } - ctx->aio_ring_file = file; - nr_events = (PAGE_SIZE * nr_pages - sizeof(struct aio_ring)) - / sizeof(struct io_event); + ctx->nr_pages = i; - ctx->ring_pages = ctx->internal_pages; - if (nr_pages > AIO_RING_PAGES) { - ctx->ring_pages = kcalloc(nr_pages, sizeof(struct page *), - GFP_KERNEL); - if (!ctx->ring_pages) { - put_aio_ring_file(ctx); - return -ENOMEM; - } + if (unlikely(i != nr_pages)) { + aio_free_ring(ctx); + return -EAGAIN; } ctx->mmap_size = nr_pages * PAGE_SIZE; @@ -379,9 +388,9 @@ static int aio_setup_ring(struct kioctx *ctx) down_write(&mm->mmap_sem); ctx->mmap_base = do_mmap_pgoff(ctx->aio_ring_file, 0, ctx->mmap_size, PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_POPULATE, 0, &populate); + MAP_SHARED, 0, &unused); + up_write(&mm->mmap_sem); if (IS_ERR((void *)ctx->mmap_base)) { - up_write(&mm->mmap_sem); ctx->mmap_size = 0; aio_free_ring(ctx); return -EAGAIN; @@ -389,27 +398,6 @@ static int aio_setup_ring(struct kioctx *ctx) pr_debug("mmap address: 0x%08lx\n", ctx->mmap_base); - /* We must do this while still holding mmap_sem for write, as we - * need to be protected against userspace attempting to mremap() - * or munmap() the ring buffer. - */ - ctx->nr_pages = get_user_pages(current, mm, ctx->mmap_base, nr_pages, - 1, 0, ctx->ring_pages, NULL); - - /* Dropping the reference here is safe as the page cache will hold - * onto the pages for us. It is also required so that page migration - * can unmap the pages and get the right reference count. 
- */ - for (i = 0; i < ctx->nr_pages; i++) - put_page(ctx->ring_pages[i]); - - up_write(&mm->mmap_sem); - - if (unlikely(ctx->nr_pages != nr_pages)) { - aio_free_ring(ctx); - return -EAGAIN; - } - ctx->user_id = ctx->mmap_base; ctx->nr_events = nr_events; /* trusted copy */ -- cgit v0.10.2 From 413541dd66d51f791a0b169d9b9014e4f56be13c Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 22 Dec 2013 13:08:32 -0800 Subject: Linux 3.13-rc5 diff --git a/Makefile b/Makefile index 82ea7dd..14d592c 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 3 PATCHLEVEL = 13 SUBLEVEL = 0 -EXTRAVERSION = -rc4 +EXTRAVERSION = -rc5 NAME = One Giant Leap for Frogkind # *DOCUMENTATION* -- cgit v0.10.2 From b6f8eaece6d5f0247931b6dac140e6cf876f48de Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:19 +0530 Subject: cxgb4: Reserve stid 0 for T4/T5 adapters When creating offload server entries, an IPv6 passive connection request can trigger a reply with a null STID, whereas the driver would expect the reply 'STID to match the value used for the request. This happens due to h/w limitation on T4 and T5. This patch ensures that STID 0 is never used if the stid range starts from zero. Based on original work by Santosh Rastapur Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index d6b12e0..f36f8e1 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -3134,6 +3134,7 @@ static int tid_init(struct tid_info *t) size_t size; unsigned int stid_bmap_size; unsigned int natids = t->natids; + struct adapter *adap = container_of(t, struct adapter, tids); stid_bmap_size = BITS_TO_LONGS(t->nstids + t->nsftids); size = t->ntids * sizeof(*t->tid_tab) + @@ -3167,6 +3168,11 @@ static int tid_init(struct tid_info *t) t->afree = t->atid_tab; } bitmap_zero(t->stid_bmap, t->nstids + t->nsftids); + /* Reserve stid 0 for T4/T5 adapters */ + if (!t->stid_base && + (is_t4(adap->params.chip) || is_t5(adap->params.chip))) + __set_bit(0, t->stid_bmap); + return 0; } -- cgit v0.10.2 From 7c89e5550ccb2a3118854639d9525847e896c686 Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:20 +0530 Subject: cxgb4: Include TCP as protocol when creating server filters We were creating LE Workaround Server Filters without specifying IPPROTO_TCP (6) in the filters (when F_PROTOCOL is set in TP_VLAN_PRI_MAP). This meant that UDP packets with matching IP Addresses/Ports would get caught up in the filter and be delivered to ULDs like iw_cxgb4. So, include the protocol information in the server filter properly. Based on original work by Casey Leedom Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index f36f8e1..df1d6b8 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -4217,6 +4217,11 @@ int cxgb4_create_server_filter(const struct net_device *dev, unsigned int stid, } } + if (adap->filter_mode & F_PROTOCOL) { + f->fs.val.proto = IPPROTO_TCP; + f->fs.mask.proto = ~0; + } + f->fs.dirsteer = 1; f->fs.iq = queue; /* Mark filter as locked */ diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h index 0a8205d..d3dd218 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h @@ -1171,6 +1171,10 @@ #define A_TP_TX_SCHED_PCMD 0x25 +#define S_PROTOCOL 5 +#define V_PROTOCOL(x) ((x) << S_PROTOCOL) +#define F_PROTOCOL V_PROTOCOL(1U) + #define S_PORT 1 #define V_PORT(x) ((x) << S_PORT) #define F_PORT V_PORT(1U) -- cgit v0.10.2 From 470c60c47a03f74f0ec1a83576eb6000025634a9 Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:21 +0530 Subject: cxgb4: Assign filter server TIDs properly The LE workaround code is incorrectly reusing the TCAM TIDs (meant for allocation by firmware in case of hash collisions) for filter servers. This patch assigns the filter server TIDs properly starting from sftid_base index. Based on original work by Santosh Rastapur Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index df1d6b8..f4f46fc 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -3012,7 +3012,8 @@ int cxgb4_alloc_sftid(struct tid_info *t, int family, void *data) } if (stid >= 0) { t->stid_tab[stid].data = data; - stid += t->stid_base; + stid -= t->nstids; + stid += t->sftid_base; t->stids_in_use++; } spin_unlock_bh(&t->stid_lock); @@ -3024,7 +3025,14 @@ EXPORT_SYMBOL(cxgb4_alloc_sftid); */ void cxgb4_free_stid(struct tid_info *t, unsigned int stid, int family) { - stid -= t->stid_base; + /* Is it a server filter TID? */ + if (t->nsftids && (stid >= t->sftid_base)) { + stid -= t->sftid_base; + stid += t->nstids; + } else { + stid -= t->stid_base; + } + spin_lock_bh(&t->stid_lock); if (family == PF_INET) __clear_bit(stid, t->stid_bmap); @@ -4185,7 +4193,7 @@ int cxgb4_create_server_filter(const struct net_device *dev, unsigned int stid, adap = netdev2adap(dev); /* Adjust stid to correct filter index */ - stid -= adap->tids.nstids; + stid -= adap->tids.sftid_base; stid += adap->tids.nftids; /* Check to make sure the filter requested is writable ... 
@@ -4248,7 +4256,7 @@ int cxgb4_remove_server_filter(const struct net_device *dev, unsigned int stid, adap = netdev2adap(dev); /* Adjust stid to correct filter index */ - stid -= adap->tids.nstids; + stid -= adap->tids.sftid_base; stid += adap->tids.nftids; f = &adap->tids.ftid_tab[stid]; diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h index 6f21f24..4dd0a82 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h @@ -131,7 +131,14 @@ static inline void *lookup_atid(const struct tid_info *t, unsigned int atid) static inline void *lookup_stid(const struct tid_info *t, unsigned int stid) { - stid -= t->stid_base; + /* Is it a server filter TID? */ + if (t->nsftids && (stid >= t->sftid_base)) { + stid -= t->sftid_base; + stid += t->nstids; + } else { + stid -= t->stid_base; + } + return stid < (t->nstids + t->nsftids) ? t->stid_tab[stid].data : NULL; } -- cgit v0.10.2 From 15f63b74c25797f9246b1738f0cfabd5febe226e Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:22 +0530 Subject: cxgb4: Account for stid entries properly in case of IPv6 IPv6 uses 2 TIDs with CLIP enabled and 4 TIDs without CLIP. Currently we are incrementing STIDs in use by 1 for both IPv4 and IPv6 which is wrong. Further, driver currently does not have interface to query if CLIP is programmed for particular IPv6 address. So, in this patch we increment/decrement TIDs in use by 4 for IPv6 assuming absence of CLIP. Such assumption keeps us on safe side and we don't end up allocating more stids for IPv6 than actually supported. Based on original work by Santosh Rastapur Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index f4f46fc..f9ca36c 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -2986,7 +2986,14 @@ int cxgb4_alloc_stid(struct tid_info *t, int family, void *data) if (stid >= 0) { t->stid_tab[stid].data = data; stid += t->stid_base; - t->stids_in_use++; + /* IPv6 requires max of 520 bits or 16 cells in TCAM + * This is equivalent to 4 TIDs. With CLIP enabled it + * needs 2 TIDs. + */ + if (family == PF_INET) + t->stids_in_use++; + else + t->stids_in_use += 4; } spin_unlock_bh(&t->stid_lock); return stid; @@ -3039,7 +3046,10 @@ void cxgb4_free_stid(struct tid_info *t, unsigned int stid, int family) else bitmap_release_region(t->stid_bmap, stid, 2); t->stid_tab[stid].data = NULL; - t->stids_in_use--; + if (family == PF_INET) + t->stids_in_use--; + else + t->stids_in_use -= 4; spin_unlock_bh(&t->stid_lock); } EXPORT_SYMBOL(cxgb4_free_stid); -- cgit v0.10.2 From dcf7b6f5bdeaa13d5e465d8795d2e7d6d1e27b65 Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:23 +0530 Subject: cxgb4: Add API to correctly calculate tuple fields Adds API cxgb4_select_ntuple so as to enable Upper Level Drivers to correctly calculate the tuple fields. Adds constant definitions for TP_VLAN_PRI_MAP for the Compressed Filter Tuple field widths and structures and uses them. Also, the CPL Parameters field for T5 is 40 bits so we need to prototype cxgb4_select_ntuple() to calculate and return u64 values. Based on original work by Casey Leedom Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h index 6c93088..56e0415 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h @@ -228,6 +228,25 @@ struct tp_params { uint32_t dack_re; /* DACK timer resolution */ unsigned short tx_modq[NCHAN]; /* channel to modulation queue map */ + + u32 vlan_pri_map; /* cached TP_VLAN_PRI_MAP */ + u32 ingress_config; /* cached TP_INGRESS_CONFIG */ + + /* TP_VLAN_PRI_MAP Compressed Filter Tuple field offsets. This is a + * subset of the set of fields which may be present in the Compressed + * Filter Tuple portion of filters and TCP TCB connections. The + * fields which are present are controlled by the TP_VLAN_PRI_MAP. + * Since a variable number of fields may or may not be present, their + * shifted field positions within the Compressed Filter Tuple may + * vary, or not even be present if the field isn't selected in + * TP_VLAN_PRI_MAP. Since some of these fields are needed in various + * places we store their offsets here, or a -1 if the field isn't + * present. + */ + int vlan_shift; + int vnic_shift; + int port_shift; + int protocol_shift; }; struct vpd_params { @@ -926,6 +945,8 @@ int t4_prep_fw(struct adapter *adap, struct fw_info *fw_info, const u8 *fw_data, unsigned int fw_size, struct fw_hdr *card_fw, enum dev_state state, int *reset); int t4_prep_adapter(struct adapter *adapter); +int t4_init_tp_params(struct adapter *adap); +int t4_filter_field_shift(const struct adapter *adap, int filter_sel); int t4_port_init(struct adapter *adap, int mbox, int pf, int vf); void t4_fatal_err(struct adapter *adapter); int t4_config_rss_range(struct adapter *adapter, int mbox, unsigned int viid, diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index f9ca36c..fff02ed 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -3755,7 +3755,7 @@ static void uld_attach(struct adapter *adap, unsigned int uld) lli.ucq_density = 1 << QUEUESPERPAGEPF0_GET( t4_read_reg(adap, SGE_INGRESS_QUEUES_PER_PAGE_PF) >> (adap->fn * 4)); - lli.filt_mode = adap->filter_mode; + lli.filt_mode = adap->params.tp.vlan_pri_map; /* MODQ_REQ_MAP sets queues 0-3 to chan 0-3 */ for (i = 0; i < NCHAN; i++) lli.tx_modq[i] = i; @@ -4229,13 +4229,13 @@ int cxgb4_create_server_filter(const struct net_device *dev, unsigned int stid, f->fs.val.lip[i] = val[i]; f->fs.mask.lip[i] = ~0; } - if (adap->filter_mode & F_PORT) { + if (adap->params.tp.vlan_pri_map & F_PORT) { f->fs.val.iport = port; f->fs.mask.iport = mask; } } - if (adap->filter_mode & F_PROTOCOL) { + if (adap->params.tp.vlan_pri_map & F_PROTOCOL) { f->fs.val.proto = IPPROTO_TCP; f->fs.mask.proto = ~0; } @@ -5121,7 +5121,7 @@ static int adap_init0(struct adapter *adap) enum dev_state state; u32 params[7], val[7]; struct fw_caps_config_cmd caps_cmd; - int reset = 1, j; + int reset = 1; /* * Contact FW, advertising Master capability (and potentially forcing @@ -5463,21 +5463,11 @@ static int adap_init0(struct adapter *adap) /* * These are finalized by FW initialization, load their values now. 
*/ - v = t4_read_reg(adap, TP_TIMER_RESOLUTION); - adap->params.tp.tre = TIMERRESOLUTION_GET(v); - adap->params.tp.dack_re = DELAYEDACKRESOLUTION_GET(v); t4_read_mtu_tbl(adap, adap->params.mtus, NULL); t4_load_mtus(adap, adap->params.mtus, adap->params.a_wnd, adap->params.b_wnd); - /* MODQ_REQ_MAP defaults to setting queues 0-3 to chan 0-3 */ - for (j = 0; j < NCHAN; j++) - adap->params.tp.tx_modq[j] = j; - - t4_read_indirect(adap, TP_PIO_ADDR, TP_PIO_DATA, - &adap->filter_mode, 1, - TP_VLAN_PRI_MAP); - + t4_init_tp_params(adap); adap->flags |= FW_OK; return 0; diff --git a/drivers/net/ethernet/chelsio/cxgb4/l2t.c b/drivers/net/ethernet/chelsio/cxgb4/l2t.c index 2987809..cb05be9 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/l2t.c +++ b/drivers/net/ethernet/chelsio/cxgb4/l2t.c @@ -45,6 +45,7 @@ #include "l2t.h" #include "t4_msg.h" #include "t4fw_api.h" +#include "t4_regs.h" #define VLAN_NONE 0xfff @@ -411,6 +412,40 @@ done: } EXPORT_SYMBOL(cxgb4_l2t_get); +u64 cxgb4_select_ntuple(struct net_device *dev, + const struct l2t_entry *l2t) +{ + struct adapter *adap = netdev2adap(dev); + struct tp_params *tp = &adap->params.tp; + u64 ntuple = 0; + + /* Initialize each of the fields which we care about which are present + * in the Compressed Filter Tuple. + */ + if (tp->vlan_shift >= 0 && l2t->vlan != VLAN_NONE) + ntuple |= (F_FT_VLAN_VLD | l2t->vlan) << tp->vlan_shift; + + if (tp->port_shift >= 0) + ntuple |= (u64)l2t->lport << tp->port_shift; + + if (tp->protocol_shift >= 0) + ntuple |= (u64)IPPROTO_TCP << tp->protocol_shift; + + if (tp->vnic_shift >= 0) { + u32 viid = cxgb4_port_viid(dev); + u32 vf = FW_VIID_VIN_GET(viid); + u32 pf = FW_VIID_PFN_GET(viid); + u32 vld = FW_VIID_VIVLD_GET(viid); + + ntuple |= (u64)(V_FT_VNID_ID_VF(vf) | + V_FT_VNID_ID_PF(pf) | + V_FT_VNID_ID_VLD(vld)) << tp->vnic_shift; + } + + return ntuple; +} +EXPORT_SYMBOL(cxgb4_select_ntuple); + /* * Called when address resolution fails for an L2T entry to handle packets * on the arpq head. If a packet specifies a failure handler it is invoked, diff --git a/drivers/net/ethernet/chelsio/cxgb4/l2t.h b/drivers/net/ethernet/chelsio/cxgb4/l2t.h index 108c0f1..85eb5c7 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/l2t.h +++ b/drivers/net/ethernet/chelsio/cxgb4/l2t.h @@ -98,7 +98,8 @@ int cxgb4_l2t_send(struct net_device *dev, struct sk_buff *skb, struct l2t_entry *cxgb4_l2t_get(struct l2t_data *d, struct neighbour *neigh, const struct net_device *physdev, unsigned int priority); - +u64 cxgb4_select_ntuple(struct net_device *dev, + const struct l2t_entry *l2t); void t4_l2t_update(struct adapter *adap, struct neighbour *neigh); struct l2t_entry *t4_l2t_alloc_switching(struct l2t_data *d); int t4_l2t_set_switching(struct adapter *adap, struct l2t_entry *e, u16 vlan, diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c index 74a6fce..e1413ea 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c @@ -3808,6 +3808,109 @@ int t4_prep_adapter(struct adapter *adapter) return 0; } +/** + * t4_init_tp_params - initialize adap->params.tp + * @adap: the adapter + * + * Initialize various fields of the adapter's TP Parameters structure. 
+ */ +int t4_init_tp_params(struct adapter *adap) +{ + int chan; + u32 v; + + v = t4_read_reg(adap, TP_TIMER_RESOLUTION); + adap->params.tp.tre = TIMERRESOLUTION_GET(v); + adap->params.tp.dack_re = DELAYEDACKRESOLUTION_GET(v); + + /* MODQ_REQ_MAP defaults to setting queues 0-3 to chan 0-3 */ + for (chan = 0; chan < NCHAN; chan++) + adap->params.tp.tx_modq[chan] = chan; + + /* Cache the adapter's Compressed Filter Mode and global Incress + * Configuration. + */ + t4_read_indirect(adap, TP_PIO_ADDR, TP_PIO_DATA, + &adap->params.tp.vlan_pri_map, 1, + TP_VLAN_PRI_MAP); + t4_read_indirect(adap, TP_PIO_ADDR, TP_PIO_DATA, + &adap->params.tp.ingress_config, 1, + TP_INGRESS_CONFIG); + + /* Now that we have TP_VLAN_PRI_MAP cached, we can calculate the field + * shift positions of several elements of the Compressed Filter Tuple + * for this adapter which we need frequently ... + */ + adap->params.tp.vlan_shift = t4_filter_field_shift(adap, F_VLAN); + adap->params.tp.vnic_shift = t4_filter_field_shift(adap, F_VNIC_ID); + adap->params.tp.port_shift = t4_filter_field_shift(adap, F_PORT); + adap->params.tp.protocol_shift = t4_filter_field_shift(adap, + F_PROTOCOL); + + /* If TP_INGRESS_CONFIG.VNID == 0, then TP_VLAN_PRI_MAP.VNIC_ID + * represents the presense of an Outer VLAN instead of a VNIC ID. + */ + if ((adap->params.tp.ingress_config & F_VNIC) == 0) + adap->params.tp.vnic_shift = -1; + + return 0; +} + +/** + * t4_filter_field_shift - calculate filter field shift + * @adap: the adapter + * @filter_sel: the desired field (from TP_VLAN_PRI_MAP bits) + * + * Return the shift position of a filter field within the Compressed + * Filter Tuple. The filter field is specified via its selection bit + * within TP_VLAN_PRI_MAL (filter mode). E.g. F_VLAN. + */ +int t4_filter_field_shift(const struct adapter *adap, int filter_sel) +{ + unsigned int filter_mode = adap->params.tp.vlan_pri_map; + unsigned int sel; + int field_shift; + + if ((filter_mode & filter_sel) == 0) + return -1; + + for (sel = 1, field_shift = 0; sel < filter_sel; sel <<= 1) { + switch (filter_mode & sel) { + case F_FCOE: + field_shift += W_FT_FCOE; + break; + case F_PORT: + field_shift += W_FT_PORT; + break; + case F_VNIC_ID: + field_shift += W_FT_VNIC_ID; + break; + case F_VLAN: + field_shift += W_FT_VLAN; + break; + case F_TOS: + field_shift += W_FT_TOS; + break; + case F_PROTOCOL: + field_shift += W_FT_PROTOCOL; + break; + case F_ETHERTYPE: + field_shift += W_FT_ETHERTYPE; + break; + case F_MACMATCH: + field_shift += W_FT_MACMATCH; + break; + case F_MPSHITTYPE: + field_shift += W_FT_MPSHITTYPE; + break; + case F_FRAGMENTATION: + field_shift += W_FT_FRAGMENTATION; + break; + } + } + return field_shift; +} + int t4_port_init(struct adapter *adap, int mbox, int pf, int vf) { u8 addr[6]; diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h index d3dd218..4082522 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h +++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h @@ -1171,14 +1171,50 @@ #define A_TP_TX_SCHED_PCMD 0x25 +#define S_VNIC 11 +#define V_VNIC(x) ((x) << S_VNIC) +#define F_VNIC V_VNIC(1U) + +#define S_FRAGMENTATION 9 +#define V_FRAGMENTATION(x) ((x) << S_FRAGMENTATION) +#define F_FRAGMENTATION V_FRAGMENTATION(1U) + +#define S_MPSHITTYPE 8 +#define V_MPSHITTYPE(x) ((x) << S_MPSHITTYPE) +#define F_MPSHITTYPE V_MPSHITTYPE(1U) + +#define S_MACMATCH 7 +#define V_MACMATCH(x) ((x) << S_MACMATCH) +#define F_MACMATCH V_MACMATCH(1U) + +#define S_ETHERTYPE 6 +#define V_ETHERTYPE(x) ((x) 
<< S_ETHERTYPE) +#define F_ETHERTYPE V_ETHERTYPE(1U) + #define S_PROTOCOL 5 #define V_PROTOCOL(x) ((x) << S_PROTOCOL) #define F_PROTOCOL V_PROTOCOL(1U) +#define S_TOS 4 +#define V_TOS(x) ((x) << S_TOS) +#define F_TOS V_TOS(1U) + +#define S_VLAN 3 +#define V_VLAN(x) ((x) << S_VLAN) +#define F_VLAN V_VLAN(1U) + +#define S_VNIC_ID 2 +#define V_VNIC_ID(x) ((x) << S_VNIC_ID) +#define F_VNIC_ID V_VNIC_ID(1U) + #define S_PORT 1 #define V_PORT(x) ((x) << S_PORT) #define F_PORT V_PORT(1U) +#define S_FCOE 0 +#define V_FCOE(x) ((x) << S_FCOE) +#define F_FCOE V_FCOE(1U) + #define NUM_MPS_CLS_SRAM_L_INSTANCES 336 #define NUM_MPS_T5_CLS_SRAM_L_INSTANCES 512 @@ -1217,4 +1253,37 @@ #define V_CHIPID(x) ((x) << S_CHIPID) #define G_CHIPID(x) (((x) >> S_CHIPID) & M_CHIPID) +/* TP_VLAN_PRI_MAP controls which subset of fields will be present in the + * Compressed Filter Tuple for LE filters. Each bit set in TP_VLAN_PRI_MAP + * selects for a particular field being present. These fields, when present + * in the Compressed Filter Tuple, have the following widths in bits. + */ +#define W_FT_FCOE 1 +#define W_FT_PORT 3 +#define W_FT_VNIC_ID 17 +#define W_FT_VLAN 17 +#define W_FT_TOS 8 +#define W_FT_PROTOCOL 8 +#define W_FT_ETHERTYPE 16 +#define W_FT_MACMATCH 9 +#define W_FT_MPSHITTYPE 3 +#define W_FT_FRAGMENTATION 1 + +/* Some of the Compressed Filter Tuple fields have internal structure. These + * bit shifts/masks describe those structures. All shifts are relative to the + * base position of the fields within the Compressed Filter Tuple + */ +#define S_FT_VLAN_VLD 16 +#define V_FT_VLAN_VLD(x) ((x) << S_FT_VLAN_VLD) +#define F_FT_VLAN_VLD V_FT_VLAN_VLD(1U) + +#define S_FT_VNID_ID_VF 0 +#define V_FT_VNID_ID_VF(x) ((x) << S_FT_VNID_ID_VF) + +#define S_FT_VNID_ID_PF 7 +#define V_FT_VNID_ID_PF(x) ((x) << S_FT_VNID_ID_PF) + +#define S_FT_VNID_ID_VLD 16 +#define V_FT_VNID_ID_VLD(x) ((x) << S_FT_VNID_ID_VLD) + #endif /* __T4_REGS_H */ -- cgit v0.10.2 From a4ea025fc24532bae8a038d038f8e0f15b8a7d98 Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:24 +0530 Subject: RDMA/cxgb4: Calculate the filter server TID properly Based on original work by Santosh Rastapur Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. Miller diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index 12fef76..02c7515 100644 --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -3323,9 +3323,7 @@ static int rx_pkt(struct c4iw_dev *dev, struct sk_buff *skb) /* * Calculate the server tid from filter hit index from cpl_rx_pkt. */ - stid = (__force int) cpu_to_be32((__force u32) rss->hash_val) - - dev->rdev.lldi.tids->sftid_base - + dev->rdev.lldi.tids->nstids; + stid = (__force int) cpu_to_be32((__force u32) rss->hash_val); lep = (struct c4iw_ep *)lookup_stid(dev->rdev.lldi.tids, stid); if (!lep) { -- cgit v0.10.2 From 8c04469057c3307d92d2c7076edcf9eb8f752bca Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:25 +0530 Subject: RDMA/cxgb4: Server filters are supported only for IPv4 Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. Miller diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index 02c7515..bfd4ed7 100644 --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -2938,7 +2938,8 @@ int c4iw_create_listen(struct iw_cm_id *cm_id, int backlog) /* * Allocate a server TID. 
*/ - if (dev->rdev.lldi.enable_fw_ofld_conn) + if (dev->rdev.lldi.enable_fw_ofld_conn && + ep->com.local_addr.ss_family == AF_INET) ep->stid = cxgb4_alloc_sftid(dev->rdev.lldi.tids, cm_id->local_addr.ss_family, ep); else -- cgit v0.10.2 From 41b4f86c1368758a02129e0629a9cc7c2811fa32 Mon Sep 17 00:00:00 2001 From: Kumar Sanghvi Date: Wed, 18 Dec 2013 16:38:26 +0530 Subject: RDMA/cxgb4: Use cxgb4_select_ntuple to correctly calculate ntuple fields Signed-off-by: Kumar Sanghvi Signed-off-by: Hariprasad Shenai Signed-off-by: David S. Miller diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c index bfd4ed7..4512687 100644 --- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -524,50 +524,6 @@ static int send_abort(struct c4iw_ep *ep, struct sk_buff *skb, gfp_t gfp) return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t); } -#define VLAN_NONE 0xfff -#define FILTER_SEL_VLAN_NONE 0xffff -#define FILTER_SEL_WIDTH_P_FC (3+1) /* port uses 3 bits, FCoE one bit */ -#define FILTER_SEL_WIDTH_VIN_P_FC \ - (6 + 7 + FILTER_SEL_WIDTH_P_FC) /* 6 bits are unused, VF uses 7 bits*/ -#define FILTER_SEL_WIDTH_TAG_P_FC \ - (3 + FILTER_SEL_WIDTH_VIN_P_FC) /* PF uses 3 bits */ -#define FILTER_SEL_WIDTH_VLD_TAG_P_FC (1 + FILTER_SEL_WIDTH_TAG_P_FC) - -static unsigned int select_ntuple(struct c4iw_dev *dev, struct dst_entry *dst, - struct l2t_entry *l2t) -{ - unsigned int ntuple = 0; - u32 viid; - - switch (dev->rdev.lldi.filt_mode) { - - /* default filter mode */ - case HW_TPL_FR_MT_PR_IV_P_FC: - if (l2t->vlan == VLAN_NONE) - ntuple |= FILTER_SEL_VLAN_NONE << FILTER_SEL_WIDTH_P_FC; - else { - ntuple |= l2t->vlan << FILTER_SEL_WIDTH_P_FC; - ntuple |= 1 << FILTER_SEL_WIDTH_TAG_P_FC; - } - ntuple |= l2t->lport << S_PORT | IPPROTO_TCP << - FILTER_SEL_WIDTH_VLD_TAG_P_FC; - break; - case HW_TPL_FR_MT_PR_OV_P_FC: { - viid = cxgb4_port_viid(l2t->neigh->dev); - - ntuple |= FW_VIID_VIN_GET(viid) << FILTER_SEL_WIDTH_P_FC; - ntuple |= FW_VIID_PFN_GET(viid) << FILTER_SEL_WIDTH_VIN_P_FC; - ntuple |= FW_VIID_VIVLD_GET(viid) << FILTER_SEL_WIDTH_TAG_P_FC; - ntuple |= l2t->lport << S_PORT | IPPROTO_TCP << - FILTER_SEL_WIDTH_VLD_TAG_P_FC; - break; - } - default: - break; - } - return ntuple; -} - static int send_connect(struct c4iw_ep *ep) { struct cpl_act_open_req *req; @@ -641,8 +597,9 @@ static int send_connect(struct c4iw_ep *ep) req->local_ip = la->sin_addr.s_addr; req->peer_ip = ra->sin_addr.s_addr; req->opt0 = cpu_to_be64(opt0); - req->params = cpu_to_be32(select_ntuple(ep->com.dev, - ep->dst, ep->l2t)); + req->params = cpu_to_be32(cxgb4_select_ntuple( + ep->com.dev->rdev.lldi.ports[0], + ep->l2t)); req->opt2 = cpu_to_be32(opt2); } else { req6 = (struct cpl_act_open_req6 *)skb_put(skb, wrlen); @@ -662,9 +619,9 @@ static int send_connect(struct c4iw_ep *ep) req6->peer_ip_lo = *((__be64 *) (ra6->sin6_addr.s6_addr + 8)); req6->opt0 = cpu_to_be64(opt0); - req6->params = cpu_to_be32( - select_ntuple(ep->com.dev, ep->dst, - ep->l2t)); + req6->params = cpu_to_be32(cxgb4_select_ntuple( + ep->com.dev->rdev.lldi.ports[0], + ep->l2t)); req6->opt2 = cpu_to_be32(opt2); } } else { @@ -681,8 +638,9 @@ static int send_connect(struct c4iw_ep *ep) t5_req->peer_ip = ra->sin_addr.s_addr; t5_req->opt0 = cpu_to_be64(opt0); t5_req->params = cpu_to_be64(V_FILTER_TUPLE( - select_ntuple(ep->com.dev, - ep->dst, ep->l2t))); + cxgb4_select_ntuple( + ep->com.dev->rdev.lldi.ports[0], + ep->l2t))); t5_req->opt2 = cpu_to_be32(opt2); } else { t5_req6 = (struct cpl_t5_act_open_req6 *) @@ -703,7 +661,9 @@ static 
int send_connect(struct c4iw_ep *ep) (ra6->sin6_addr.s6_addr + 8)); t5_req6->opt0 = cpu_to_be64(opt0); t5_req6->params = (__force __be64)cpu_to_be32( - select_ntuple(ep->com.dev, ep->dst, ep->l2t)); + cxgb4_select_ntuple( + ep->com.dev->rdev.lldi.ports[0], + ep->l2t)); t5_req6->opt2 = cpu_to_be32(opt2); } } @@ -1630,7 +1590,8 @@ static void send_fw_act_open_req(struct c4iw_ep *ep, unsigned int atid) memset(req, 0, sizeof(*req)); req->op_compl = htonl(V_WR_OP(FW_OFLD_CONNECTION_WR)); req->len16_pkd = htonl(FW_WR_LEN16(DIV_ROUND_UP(sizeof(*req), 16))); - req->le.filter = cpu_to_be32(select_ntuple(ep->com.dev, ep->dst, + req->le.filter = cpu_to_be32(cxgb4_select_ntuple( + ep->com.dev->rdev.lldi.ports[0], ep->l2t)); sin = (struct sockaddr_in *)&ep->com.local_addr; req->le.lport = sin->sin_port; @@ -3396,7 +3357,9 @@ static int rx_pkt(struct c4iw_dev *dev, struct sk_buff *skb) window = (__force u16) htons((__force u16)tcph->window); /* Calcuate filter portion for LE region. */ - filter = (__force unsigned int) cpu_to_be32(select_ntuple(dev, dst, e)); + filter = (__force unsigned int) cpu_to_be32(cxgb4_select_ntuple( + dev->rdev.lldi.ports[0], + e)); /* * Synthesize the cpl_pass_accept_req. We have everything except the -- cgit v0.10.2 From db850559a303dc9459d7dd7339bd19a66907a15a Mon Sep 17 00:00:00 2001 From: Mugunthan V N Date: Wed, 18 Dec 2013 21:33:50 +0530 Subject: drivers: net : cpsw: pass proper device name while requesting irq During checking the interrupts with "cat /proc/interrupts", it is showing device name as (null), this change was done with commit id aa1a15e2d where request_irq is changed to devm_request_irq also changing the irq name from platform device name to net device name, but the net device is not registered at this point with the network frame work, so devm_request_irq is called with device name as NULL, by which it is showed as "(null)" in "cat /proc/interrupts". So this patch changes back irq name to platform device name itself in devm_request_irq so that the device name shows as below. Previous to this patch root@am335x-evm:~# cat /proc/interrupts CPU0 28: 2265 INTC 12 edma 30: 80 INTC 14 edma_error 56: 0 INTC 40 (null) 57: 1794 INTC 41 (null) 58: 7 INTC 42 (null) 59: 0 INTC 43 (null) With this patch root@am335x-evm:~# cat /proc/interrupts CPU0 28: 213 INTC 12 edma 30: 9 INTC 14 edma_error 56: 0 INTC 40 4a100000.ethernet 57: 16097 INTC 41 4a100000.ethernet 58: 11964 INTC 42 4a100000.ethernet 59: 0 INTC 43 4a100000.ethernet Signed-off-by: Mugunthan V N Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 614f284..5330fd2 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -2108,7 +2108,7 @@ static int cpsw_probe(struct platform_device *pdev) while ((res = platform_get_resource(priv->pdev, IORESOURCE_IRQ, k))) { for (i = res->start; i <= res->end; i++) { if (devm_request_irq(&pdev->dev, i, cpsw_interrupt, 0, - dev_name(priv->dev), priv)) { + dev_name(&pdev->dev), priv)) { dev_err(priv->dev, "error attaching irq\n"); goto clean_ale_ret; } -- cgit v0.10.2 From 61e7f09d0f437c9614029445754099383ec2eec4 Mon Sep 17 00:00:00 2001 From: Hannes Frederic Sowa Date: Thu, 19 Dec 2013 02:13:36 +0100 Subject: ipv4: consistent reporting of pmtu data in case of corking We report different pmtu values back on the first write and on further writes on an corked socket. 
Also don't include the dst.header_len (respectively exthdrlen) as this should already be dealt with by the interface mtu of the outgoing (virtual) interface and policy of that interface should dictate if fragmentation should happen. Instead reduce the pmtu data by IP options as we do for IPv6. Make the same changes for ip_append_data, where we did not care about options or dst.header_len at all. Signed-off-by: Hannes Frederic Sowa Signed-off-by: David S. Miller diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 9124027..df18461 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -828,7 +828,7 @@ static int __ip_append_data(struct sock *sk, if (cork->length + length > maxnonfragsize - fragheaderlen) { ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, - mtu-exthdrlen); + mtu - (opt ? opt->optlen : 0)); return -EMSGSIZE; } @@ -1151,7 +1151,8 @@ ssize_t ip_append_page(struct sock *sk, struct flowi4 *fl4, struct page *page, mtu : 0xFFFF; if (cork->length + size > maxnonfragsize - fragheaderlen) { - ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, mtu); + ip_local_error(sk, EMSGSIZE, fl4->daddr, inet->inet_dport, + mtu - (opt ? opt->optlen : 0)); return -EMSGSIZE; } -- cgit v0.10.2 From 488574dbc47e0f570330c8c2b56ae299c28ade14 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 20 Dec 2013 10:58:15 -0800 Subject: gpu: fix qxl missing crc32_le Fix build error: qxl uses crc32 functions so it needs to select CRC32. Also use angle quotes around a kernel header file name. drivers/built-in.o: In function `qxl_display_read_client_monitors_config': (.text+0x19d754): undefined reference to `crc32_le' Signed-off-by: Randy Dunlap Signed-off-by: Dave Airlie diff --git a/drivers/gpu/drm/qxl/Kconfig b/drivers/gpu/drm/qxl/Kconfig index 037d324..66ac0ff 100644 --- a/drivers/gpu/drm/qxl/Kconfig +++ b/drivers/gpu/drm/qxl/Kconfig @@ -8,5 +8,6 @@ config DRM_QXL select DRM_KMS_HELPER select DRM_KMS_FB_HELPER select DRM_TTM + select CRC32 help QXL virtual GPU for Spice virtualization desktop integration. Do not enable this driver unless your distro ships a corresponding X.org QXL driver that can handle kernel modesetting. diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c index 5e827c2..d70aafb 100644 --- a/drivers/gpu/drm/qxl/qxl_display.c +++ b/drivers/gpu/drm/qxl/qxl_display.c @@ -24,7 +24,7 @@ */ -#include "linux/crc32.h" +#include #include "qxl_drv.h" #include "qxl_object.h" -- cgit v0.10.2 From 2e6d8b469b8000b4aa9b95dfcebd89eafae3e8cf Mon Sep 17 00:00:00 2001 From: Thomas Hellstrom Date: Sat, 21 Dec 2013 22:23:02 +0100 Subject: drm/ttm: Fix swapin regression Commit "drm/ttm: Don't move non-existing data" didn't take the swapped-out corner case into account. This patch corrects that. Fixes blank screen after attempted suspend / hibernate on vmwgfx. Signed-off-by: Thomas Hellstrom Signed-off-by: Dave Airlie diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 15b86a9..4061521 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -353,7 +353,8 @@ int ttm_bo_move_memcpy(struct ttm_buffer_object *bo, * Don't move nonexistent data. Clear destination instead. 
*/ if (old_iomap == NULL && - (ttm == NULL || ttm->state == tt_unpopulated)) { + (ttm == NULL || (ttm->state == tt_unpopulated && + !(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)))) { memset_io(new_iomap, 0, new_mem->num_pages*PAGE_SIZE); goto out2; } -- cgit v0.10.2 From 46d01d63221c3508421dd72ff9c879f61053cffc Mon Sep 17 00:00:00 2001 From: Chad Hanson Date: Mon, 23 Dec 2013 17:45:01 -0500 Subject: selinux: fix broken peer recv check Fix a broken networking check. Return an error if peer recv fails. If secmark is active and the packet recv succeeds the peer recv error is ignored. Signed-off-by: Chad Hanson Cc: stable@vger.kernel.org Signed-off-by: Paul Moore diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 419491d..5db2646 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -4334,8 +4334,10 @@ static int selinux_socket_sock_rcv_skb(struct sock *sk, struct sk_buff *skb) } err = avc_has_perm(sk_sid, peer_sid, SECCLASS_PEER, PEER__RECV, &ad); - if (err) + if (err) { selinux_netlbl_err(skb, err, 0); + return err; + } } if (secmark_active) { -- cgit v0.10.2 From c0c1439541f5305b57a83d599af32b74182933fe Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 23 Dec 2013 17:45:01 -0500 Subject: selinux: selinux_setprocattr()->ptrace_parent() needs rcu_read_lock() selinux_setprocattr() does ptrace_parent(p) under task_lock(p), but task_struct->alloc_lock doesn't pin ->parent or ->ptrace, this looks confusing and triggers the "suspicious RCU usage" warning because ptrace_parent() does rcu_dereference_check(). And in theory this is wrong, spin_lock()->preempt_disable() doesn't necessarily imply rcu_read_lock() we need to access the ->parent. Reported-by: Evan McNabb Signed-off-by: Oleg Nesterov Cc: stable@vger.kernel.org Signed-off-by: Paul Moore diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 5db2646..6625699 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -5588,11 +5588,11 @@ static int selinux_setprocattr(struct task_struct *p, /* Check for ptracing, and update the task SID if ok. Otherwise, leave SID unchanged and fail. */ ptsid = 0; - task_lock(p); + rcu_read_lock(); tracer = ptrace_parent(p); if (tracer) ptsid = task_sid(tracer); - task_unlock(p); + rcu_read_unlock(); if (tracer) { error = avc_has_perm(ptsid, sid, SECCLASS_PROCESS, -- cgit v0.10.2 From f60900f2609e893c7f8d0bccc7ada4947dac4cd5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 23 Dec 2013 18:49:30 +0100 Subject: auxvec.h: account for AT_HWCAP2 in AT_VECTOR_SIZE_BASE Commit 2171364d1a92 ("powerpc: Add HWCAP2 aux entry") introduced a new AT_ auxv entry type AT_HWCAP2 but failed to update AT_VECTOR_SIZE_BASE accordingly. 
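[Editor's illustration, not part of the patch series] The AT_HWCAP2 entry accounted for by this fix is an ordinary auxiliary-vector entry that userspace can query directly. A minimal, hedged sketch follows; it assumes glibc 2.16 or later for getauxval(3) and defines AT_HWCAP2 itself only if the installed headers predate it (the value 26 matches the kernel's uapi <linux/auxvec.h>).

#include <stdio.h>
#include <sys/auxv.h>

#ifndef AT_HWCAP2
#define AT_HWCAP2 26	/* from the kernel uapi <linux/auxvec.h>; only needed with older headers */
#endif

int main(void)
{
	unsigned long hwcap  = getauxval(AT_HWCAP);
	unsigned long hwcap2 = getauxval(AT_HWCAP2);	/* 0 if the kernel did not supply the entry */

	printf("AT_HWCAP  = 0x%lx\n", hwcap);
	printf("AT_HWCAP2 = 0x%lx\n", hwcap2);
	return 0;
}

On kernels or architectures that do not provide the HWCAP2 entry, getauxval(AT_HWCAP2) simply returns 0, so the sketch runs safely anywhere.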
Signed-off-by: Ard Biesheuvel Fixes: 2171364d1a92 (powerpc: Add HWCAP2 aux entry) Cc: stable@vger.kernel.org Acked-by: Michael Neuling Cc: Nishanth Aravamudan Cc: Benjamin Herrenschmidt Signed-off-by: Linus Torvalds diff --git a/include/linux/auxvec.h b/include/linux/auxvec.h index 669fef5..3e0fbe4 100644 --- a/include/linux/auxvec.h +++ b/include/linux/auxvec.h @@ -3,6 +3,6 @@ #include -#define AT_VECTOR_SIZE_BASE 19 /* NEW_AUX_ENT entries in auxiliary table */ +#define AT_VECTOR_SIZE_BASE 20 /* NEW_AUX_ENT entries in auxiliary table */ /* number of "#define AT_.*" above, minus {AT_NULL, AT_IGNORE, AT_NOTELF} */ #endif /* _LINUX_AUXVEC_H */ -- cgit v0.10.2 From 38958c15dc640a9249e4f0cd0dfb0ddc7a23464d Mon Sep 17 00:00:00 2001 From: Rajendra Nayak Date: Thu, 12 Dec 2013 15:22:49 +0530 Subject: ARM: DRA7: hwmod: Fix boot crash with DEBUG_LL With commit '7dedd34: ARM: OMAP2+: hwmod: Fix a crash in _setup_reset() with DEBUG_LL' we moved from parsing cmdline to identify uart used for earlycon to using the requsite hwmod CONFIG_DEBUG_OMAPxUARTy FLAGS. On DRA7 though, we seem to be missing this flag, and atleast on the DRA7 EVM where we use uart1 for console, boot fails with DEBUG_LL enabled. Reported-by: Lokesh Vutla Tested-by: Lokesh Vutla # on a different base Signed-off-by: Rajendra Nayak Fixes: 7dedd346941d ("ARM: OMAP2+: hwmod: Fix a crash in _setup_reset() with DEBUG_LL") Signed-off-by: Paul Walmsley diff --git a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c index db32d53..18f333c 100644 --- a/arch/arm/mach-omap2/omap_hwmod_7xx_data.c +++ b/arch/arm/mach-omap2/omap_hwmod_7xx_data.c @@ -1637,7 +1637,7 @@ static struct omap_hwmod dra7xx_uart1_hwmod = { .class = &dra7xx_uart_hwmod_class, .clkdm_name = "l4per_clkdm", .main_clk = "uart1_gfclk_mux", - .flags = HWMOD_SWSUP_SIDLE_ACT, + .flags = HWMOD_SWSUP_SIDLE_ACT | DEBUG_OMAP2UART1_FLAGS, .prcm = { .omap4 = { .clkctrl_offs = DRA7XX_CM_L4PER_UART1_CLKCTRL_OFFSET, -- cgit v0.10.2 From 6d4c88304794442055eaea1c07f3c7b988b8c924 Mon Sep 17 00:00:00 2001 From: Suman Anna Date: Mon, 23 Dec 2013 16:53:11 -0600 Subject: ARM: OMAP2+: hwmod_data: fix missing OMAP_INTC_START in irq data Commit 7d7e1eb (ARM: OMAP2+: Prepare for irqs.h removal) and commit ec2c082 (ARM: OMAP2+: Remove hardcoded IRQs and enable SPARSE_IRQ) updated the way interrupts for OMAP2/3 devices are defined in the HWMOD data structures to being an index plus a fixed offset (defined by OMAP_INTC_START). Couple of irqs in the OMAP2/3 hwmod data were misconfigured completely as they were missing this OMAP_INTC_START relative offset. 
Add this offset back to fix the incorrect irq data for the following modules: OMAP2 - GPMC, RNG OMAP3 - GPMC, ISP MMU & IVA MMU Signed-off-by: Suman Anna Fixes: 7d7e1eba7e92 ("ARM: OMAP2+: Prepare for irqs.h removal") Fixes: ec2c0825ca31 ("ARM: OMAP2+: Remove hardcoded IRQs and enable SPARSE_IRQ") Cc: Tony Lindgren Signed-off-by: Paul Walmsley diff --git a/arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c b/arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c index 56cebb0..d23c77f 100644 --- a/arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c +++ b/arch/arm/mach-omap2/omap_hwmod_2xxx_ipblock_data.c @@ -796,7 +796,7 @@ struct omap_hwmod omap2xxx_counter_32k_hwmod = { /* gpmc */ static struct omap_hwmod_irq_info omap2xxx_gpmc_irqs[] = { - { .irq = 20 }, + { .irq = 20 + OMAP_INTC_START, }, { .irq = -1 } }; @@ -841,7 +841,7 @@ static struct omap_hwmod_class omap2_rng_hwmod_class = { }; static struct omap_hwmod_irq_info omap2_rng_mpu_irqs[] = { - { .irq = 52 }, + { .irq = 52 + OMAP_INTC_START, }, { .irq = -1 } }; diff --git a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c index 9e56fab..3bfb2db 100644 --- a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c +++ b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c @@ -2172,7 +2172,7 @@ static struct omap_hwmod_class omap3xxx_gpmc_hwmod_class = { }; static struct omap_hwmod_irq_info omap3xxx_gpmc_irqs[] = { - { .irq = 20 }, + { .irq = 20 + OMAP_INTC_START, }, { .irq = -1 } }; @@ -3006,7 +3006,7 @@ static struct omap_mmu_dev_attr mmu_isp_dev_attr = { static struct omap_hwmod omap3xxx_mmu_isp_hwmod; static struct omap_hwmod_irq_info omap3xxx_mmu_isp_irqs[] = { - { .irq = 24 }, + { .irq = 24 + OMAP_INTC_START, }, { .irq = -1 } }; @@ -3048,7 +3048,7 @@ static struct omap_mmu_dev_attr mmu_iva_dev_attr = { static struct omap_hwmod omap3xxx_mmu_iva_hwmod; static struct omap_hwmod_irq_info omap3xxx_mmu_iva_irqs[] = { - { .irq = 28 }, + { .irq = 28 + OMAP_INTC_START, }, { .irq = -1 } }; -- cgit v0.10.2 From 797f87f83b60685ff8a13fa0572d2f10393c50d3 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 26 Dec 2013 12:17:00 +0100 Subject: macvlan: fix netdev feature propagation from lower device There are inconsistencies wrt. feature propagation/inheritance between macvlan and the underlying interface. When a feature is turned off on the real device before a macvlan is created on top, these will remain enabled on the macvlan device, whereas turning off the feature on the lower device after macvlan creation the kernel will propagate the changes to the macvlan. The second issue is that, when propagating changes from underlying device to the macvlan interface, macvlan can erronously lose its NETIF_F_LLTX flag, as features are anded with the underlying device. However, LLTX should be kept since it has no dependencies on physical hardware (LLTX is set on macvlan creation regardless of the lower device properties, see 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless tx path). The LLTX flag is now forced regardless of user settings in absence of layer2 hw acceleration (a6cc0cfa72e0b6d9f2c8fd858aa, net: Add layer 2 hardware acceleration operations for macvlan devices). Use netdev_increment_features to rebuild the feature set on capability changes on either the lower device or on the macvlan interface. As pointed out by Ben Hutchings, use netdev_update_features on NETDEV_FEAT_CHANGE event (it calls macvlan_fix_features/netdev_features_change if needed). Signed-off-by: Florian Westphal Signed-off-by: David S. 
Miller diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index acf9379..60406b0 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -690,8 +690,19 @@ static netdev_features_t macvlan_fix_features(struct net_device *dev, netdev_features_t features) { struct macvlan_dev *vlan = netdev_priv(dev); + netdev_features_t mask; - return features & (vlan->set_features | ~MACVLAN_FEATURES); + features |= NETIF_F_ALL_FOR_ALL; + features &= (vlan->set_features | ~MACVLAN_FEATURES); + mask = features; + + features = netdev_increment_features(vlan->lowerdev->features, + features, + mask); + if (!vlan->fwd_priv) + features |= NETIF_F_LLTX; + + return features; } static const struct ethtool_ops macvlan_ethtool_ops = { @@ -1019,9 +1030,8 @@ static int macvlan_device_event(struct notifier_block *unused, break; case NETDEV_FEAT_CHANGE: list_for_each_entry(vlan, &port->vlans, list) { - vlan->dev->features = dev->features & MACVLAN_FEATURES; vlan->dev->gso_max_size = dev->gso_max_size; - netdev_features_change(vlan->dev); + netdev_update_features(vlan->dev); } break; case NETDEV_UNREGISTER: -- cgit v0.10.2 From db12cf27435356017e7ab375ef5e82a1cc749384 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Mon, 16 Dec 2013 17:09:41 +0100 Subject: netfilter: WARN about wrong usage of sequence number adjustments Since commit 41d73ec053d2 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT), the sequence number extension is dynamically allocated. Instead of dying, give a WARN splash, in case of wrong usage of the seqadj code, e.g. when forgetting to allocate via nfct_seqadj_ext_add(). Wrong usage have been seen in the IPVS code path. Signed-off-by: Jesper Dangaard Brouer Acked-by: Julian Anastasov Signed-off-by: Simon Horman diff --git a/net/netfilter/nf_conntrack_seqadj.c b/net/netfilter/nf_conntrack_seqadj.c index 17c1bcb..b2d38da 100644 --- a/net/netfilter/nf_conntrack_seqadj.c +++ b/net/netfilter/nf_conntrack_seqadj.c @@ -36,6 +36,11 @@ int nf_ct_seqadj_set(struct nf_conn *ct, enum ip_conntrack_info ctinfo, if (off == 0) return 0; + if (unlikely(!seqadj)) { + WARN(1, "Wrong seqadj usage, missing nfct_seqadj_ext_add()\n"); + return 0; + } + set_bit(IPS_SEQ_ADJUST_BIT, &ct->status); spin_lock_bh(&ct->lock); -- cgit v0.10.2 From b25adce1606427fd88da08f5203714cada7f6a98 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Mon, 16 Dec 2013 17:09:47 +0100 Subject: ipvs: correct usage/allocation of seqadj ext in ipvs The IPVS FTP helper ip_vs_ftp could trigger an OOPS in nf_ct_seqadj_set, after commit 41d73ec053d2 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT). This is because, the seqadj ext is now allocated dynamically, and the IPVS code didn't handle this situation. Fix this in the IPVS nfct code by invoking the alloc function nfct_seqadj_ext_add(). 
Fixes: 41d73ec053d2 (netfilter: nf_conntrack: make sequence number adjustments usuable without NAT) Suggested-by: Julian Anastasov Signed-off-by: Jesper Dangaard Brouer Acked-by: Julian Anastasov Signed-off-by: Simon Horman diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c index c8beafd..5a355a4 100644 --- a/net/netfilter/ipvs/ip_vs_nfct.c +++ b/net/netfilter/ipvs/ip_vs_nfct.c @@ -63,6 +63,7 @@ #include #include #include +#include #include #include @@ -97,6 +98,11 @@ ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp, int outin) if (CTINFO2DIR(ctinfo) != IP_CT_DIR_ORIGINAL) return; + /* Applications may adjust TCP seqs */ + if (cp->app && nf_ct_protonum(ct) == IPPROTO_TCP && + !nfct_seqadj(ct) && !nfct_seqadj_ext_add(ct)) + return; + /* * The connection is not yet in the hashtable, so we update it. * CIP->VIP will remain the same, so leave the tuple in -- cgit v0.10.2 From 7e367c18c059c638bf6fb540f1decec18d64cb55 Mon Sep 17 00:00:00 2001 From: Tony Lindgren Date: Fri, 27 Dec 2013 09:33:27 -0800 Subject: ARM: OMAP2+: Fix LCD panel backlight regression for LDP legacy booting Looks like the LCD panel on LDP has been broken quite a while, and recently got fixed by commit 0b2aa8bed3e1 (gpio: twl4030: Fix regression for twl gpio output). However, there's still an issue left where the panel backlight does not come on if the LCD drivers are built into the kernel. Fix the issue by registering the DPI LCD panel only after the twl4030 GPIO has probed. Reported-by: Russell King Acked-by: Tomi Valkeinen [tony@atomide.com: updated per Tomi's comments] Signed-off-by: Tony Lindgren diff --git a/arch/arm/mach-omap2/board-ldp.c b/arch/arm/mach-omap2/board-ldp.c index 4ec8d82..44a59c3 100644 --- a/arch/arm/mach-omap2/board-ldp.c +++ b/arch/arm/mach-omap2/board-ldp.c @@ -242,12 +242,18 @@ static void __init ldp_display_init(void) static int ldp_twl_gpio_setup(struct device *dev, unsigned gpio, unsigned ngpio) { + int res; + /* LCD enable GPIO */ ldp_lcd_pdata.enable_gpio = gpio + 7; /* Backlight enable GPIO */ ldp_lcd_pdata.backlight_gpio = gpio + 15; + res = platform_device_register(&ldp_lcd_device); + if (res) + pr_err("Unable to register LCD: %d\n", res); + return 0; } @@ -346,7 +352,6 @@ static struct omap2_hsmmc_info mmc[] __initdata = { static struct platform_device *ldp_devices[] __initdata = { &ldp_gpio_keys_device, - &ldp_lcd_device, }; #ifdef CONFIG_OMAP_MUX -- cgit v0.10.2 From c2349758acf1874e4c2b93fe41d072336f1a31d0 Mon Sep 17 00:00:00 2001 From: Sasha Levin Date: Wed, 18 Dec 2013 23:49:42 -0500 Subject: rds: prevent dereference of a NULL device Binding might result in a NULL device, which is dereferenced causing this BUG: [ 1317.260548] BUG: unable to handle kernel NULL pointer dereference at 000000000000097 4 [ 1317.261847] IP: [] rds_ib_laddr_check+0x82/0x110 [ 1317.263315] PGD 418bcb067 PUD 3ceb21067 PMD 0 [ 1317.263502] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 1317.264179] Dumping ftrace buffer: [ 1317.264774] (ftrace buffer empty) [ 1317.265220] Modules linked in: [ 1317.265824] CPU: 4 PID: 836 Comm: trinity-child46 Tainted: G W 3.13.0-rc4- next-20131218-sasha-00013-g2cebb9b-dirty #4159 [ 1317.267415] task: ffff8803ddf33000 ti: ffff8803cd31a000 task.ti: ffff8803cd31a000 [ 1317.268399] RIP: 0010:[] [] rds_ib_laddr_check+ 0x82/0x110 [ 1317.269670] RSP: 0000:ffff8803cd31bdf8 EFLAGS: 00010246 [ 1317.270230] RAX: 0000000000000000 RBX: ffff88020b0dd388 RCX: 0000000000000000 [ 1317.270230] RDX: ffffffff8439822e RSI: 00000000000c000a RDI: 
0000000000000286 [ 1317.270230] RBP: ffff8803cd31be38 R08: 0000000000000000 R09: 0000000000000000 [ 1317.270230] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [ 1317.270230] R13: 0000000054086700 R14: 0000000000a25de0 R15: 0000000000000031 [ 1317.270230] FS: 00007ff40251d700(0000) GS:ffff88022e200000(0000) knlGS:000000000000 0000 [ 1317.270230] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1317.270230] CR2: 0000000000000974 CR3: 00000003cd478000 CR4: 00000000000006e0 [ 1317.270230] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1317.270230] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602 [ 1317.270230] Stack: [ 1317.270230] 0000000054086700 5408670000a25de0 5408670000000002 0000000000000000 [ 1317.270230] ffffffff84223542 00000000ea54c767 0000000000000000 ffffffff86d26160 [ 1317.270230] ffff8803cd31be68 ffffffff84223556 ffff8803cd31beb8 ffff8800c6765280 [ 1317.270230] Call Trace: [ 1317.270230] [] ? rds_trans_get_preferred+0x42/0xa0 [ 1317.270230] [] rds_trans_get_preferred+0x56/0xa0 [ 1317.270230] [] rds_bind+0x73/0xf0 [ 1317.270230] [] SYSC_bind+0x92/0xf0 [ 1317.270230] [] ? context_tracking_user_exit+0xb8/0x1d0 [ 1317.270230] [] ? trace_hardirqs_on+0xd/0x10 [ 1317.270230] [] ? syscall_trace_enter+0x32/0x290 [ 1317.270230] [] SyS_bind+0xe/0x10 [ 1317.270230] [] tracesys+0xdd/0xe2 [ 1317.270230] Code: 00 8b 45 cc 48 8d 75 d0 48 c7 45 d8 00 00 00 00 66 c7 45 d0 02 00 89 45 d4 48 89 df e8 78 49 76 ff 41 89 c4 85 c0 75 0c 48 8b 03 <80> b8 74 09 00 00 01 7 4 06 41 bc 9d ff ff ff f6 05 2a b6 c2 02 [ 1317.270230] RIP [] rds_ib_laddr_check+0x82/0x110 [ 1317.270230] RSP [ 1317.270230] CR2: 0000000000000974 Signed-off-by: Sasha Levin Signed-off-by: David S. Miller diff --git a/net/rds/ib.c b/net/rds/ib.c index b4c8b00..ba2dffe 100644 --- a/net/rds/ib.c +++ b/net/rds/ib.c @@ -338,7 +338,8 @@ static int rds_ib_laddr_check(__be32 addr) ret = rdma_bind_addr(cm_id, (struct sockaddr *)&sin); /* due to this, we will claim to support iWARP devices unless we check node_type. */ - if (ret || cm_id->device->node_type != RDMA_NODE_IB_CA) + if (ret || !cm_id->device || + cm_id->device->node_type != RDMA_NODE_IB_CA) ret = -EADDRNOTAVAIL; rdsdebug("addr %pI4 ret %d node type %d\n", -- cgit v0.10.2 From 1a29321ed045e3aad23c5f7b63036e465ee3093a Mon Sep 17 00:00:00 2001 From: Jamal Hadi Salim Date: Mon, 23 Dec 2013 08:02:11 -0500 Subject: net_sched: act: Dont increment refcnt on replace This is a bug fix. The existing code tries to kill many birds with one stone: Handling binding of actions to filters, new actions and replacing of action attributes. A simple test case to illustrate: XXXX moja@fe1:~$ sudo tc actions add action drop index 12 moja@fe1:~$ actions get action gact index 12 action order 1: gact action drop random type none pass val 0 index 12 ref 1 bind 0 moja@fe1:~$ sudo tc actions replace action ok index 12 moja@fe1:~$ actions get action gact index 12 action order 1: gact action drop random type none pass val 0 index 12 ref 2 bind 0 XXXX The above shows the refcounf being wrongly incremented on replace. There are more complex scenarios with binding of actions to filters that i am leaving out that didnt work as well... Signed-off-by: Jamal Hadi Salim Signed-off-by: David S. 
Miller diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c index 5c5edf5..11fe1a4 100644 --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -77,16 +77,16 @@ static int tcf_csum_init(struct net *n, struct nlattr *nla, struct nlattr *est, &csum_idx_gen, &csum_hash_info); if (IS_ERR(pc)) return PTR_ERR(pc); - p = to_tcf_csum(pc); ret = ACT_P_CREATED; } else { - p = to_tcf_csum(pc); - if (!ovr) { - tcf_hash_release(pc, bind, &csum_hash_info); + if (bind)/* dont override defaults */ + return 0; + tcf_hash_release(pc, bind, &csum_hash_info); + if (!ovr) return -EEXIST; - } } + p = to_tcf_csum(pc); spin_lock_bh(&p->tcf_lock); p->tcf_action = parm->action; p->update_flags = parm->update_flags; diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c index 5645a4d..eb9ba60 100644 --- a/net/sched/act_gact.c +++ b/net/sched/act_gact.c @@ -102,10 +102,11 @@ static int tcf_gact_init(struct net *net, struct nlattr *nla, return PTR_ERR(pc); ret = ACT_P_CREATED; } else { - if (!ovr) { - tcf_hash_release(pc, bind, &gact_hash_info); + if (bind)/* dont override defaults */ + return 0; + tcf_hash_release(pc, bind, &gact_hash_info); + if (!ovr) return -EEXIST; - } } gact = to_gact(pc); diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c index 882a897..dcbfe8c 100644 --- a/net/sched/act_ipt.c +++ b/net/sched/act_ipt.c @@ -141,10 +141,12 @@ static int tcf_ipt_init(struct net *net, struct nlattr *nla, struct nlattr *est, return PTR_ERR(pc); ret = ACT_P_CREATED; } else { - if (!ovr) { - tcf_ipt_release(to_ipt(pc), bind); + if (bind)/* dont override defaults */ + return 0; + tcf_ipt_release(to_ipt(pc), bind); + + if (!ovr) return -EEXIST; - } } ipt = to_ipt(pc); diff --git a/net/sched/act_nat.c b/net/sched/act_nat.c index 6a15ace..7686953 100644 --- a/net/sched/act_nat.c +++ b/net/sched/act_nat.c @@ -70,15 +70,15 @@ static int tcf_nat_init(struct net *net, struct nlattr *nla, struct nlattr *est, &nat_idx_gen, &nat_hash_info); if (IS_ERR(pc)) return PTR_ERR(pc); - p = to_tcf_nat(pc); ret = ACT_P_CREATED; } else { - p = to_tcf_nat(pc); - if (!ovr) { - tcf_hash_release(pc, bind, &nat_hash_info); + if (bind) + return 0; + tcf_hash_release(pc, bind, &nat_hash_info); + if (!ovr) return -EEXIST; - } } + p = to_tcf_nat(pc); spin_lock_bh(&p->tcf_lock); p->old_addr = parm->old_addr; diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c index 03b6767..7aa2dcd 100644 --- a/net/sched/act_pedit.c +++ b/net/sched/act_pedit.c @@ -84,10 +84,12 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla, ret = ACT_P_CREATED; } else { p = to_pedit(pc); - if (!ovr) { - tcf_hash_release(pc, bind, &pedit_hash_info); + tcf_hash_release(pc, bind, &pedit_hash_info); + if (bind) + return 0; + if (!ovr) return -EEXIST; - } + if (p->tcfp_nkeys && p->tcfp_nkeys != parm->nkeys) { keys = kmalloc(ksize, GFP_KERNEL); if (keys == NULL) diff --git a/net/sched/act_police.c b/net/sched/act_police.c index 16a62c3..ef246d8 100644 --- a/net/sched/act_police.c +++ b/net/sched/act_police.c @@ -177,10 +177,12 @@ static int tcf_act_police_locate(struct net *net, struct nlattr *nla, if (bind) { police->tcf_bindcnt += 1; police->tcf_refcnt += 1; + return 0; } if (ovr) goto override; - return ret; + /* not replacing */ + return -EEXIST; } } diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c index 31157d3..f7b45ab 100644 --- a/net/sched/act_simple.c +++ b/net/sched/act_simple.c @@ -142,10 +142,13 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla, ret = ACT_P_CREATED; } else { d = to_defact(pc); - if (!ovr) { - 
tcf_simp_release(d, bind); + + if (bind) + return 0; + tcf_simp_release(d, bind); + if (!ovr) return -EEXIST; - } + reset_policy(d, defdata, parm); } diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c index 35ea643..8fe9d25 100644 --- a/net/sched/act_skbedit.c +++ b/net/sched/act_skbedit.c @@ -120,10 +120,11 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla, ret = ACT_P_CREATED; } else { d = to_skbedit(pc); - if (!ovr) { - tcf_hash_release(pc, bind, &skbedit_hash_info); + if (bind) + return 0; + tcf_hash_release(pc, bind, &skbedit_hash_info); + if (!ovr) return -EEXIST; - } } spin_lock_bh(&d->tcf_lock); -- cgit v0.10.2 From 375679104ab3ccfd18dcbd7ba503734fb9a2c63a Mon Sep 17 00:00:00 2001 From: Nithin Sujir Date: Thu, 19 Dec 2013 17:44:11 -0800 Subject: tg3: Expand 4g_overflow_test workaround to skb fragments of any size. The current driver assumes that an skb fragment can only be upto jumbo size. Presumably this was a fast-path optimization. This assumption is no longer true as fragments can be upto 32k. v2: Remove unnecessary parantheses per Eric Dumazet. Cc: stable@vger.kernel.org Signed-off-by: Nithin Nayak Sujir Signed-off-by: Michael Chan Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index f3dd93b..15a66e4 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -7622,7 +7622,7 @@ static inline int tg3_4g_overflow_test(dma_addr_t mapping, int len) { u32 base = (u32) mapping & 0xffffffff; - return (base > 0xffffdcc0) && (base + len + 8 < base); + return base + len + 8 < base; } /* Test for TSO DMA buffers that cross into regions which are within MSS bytes -- cgit v0.10.2 From 37ec274e9713eafc2ba6c4471420f06cb8f68ecf Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 19 Dec 2013 18:10:40 -0800 Subject: arc_emac: fix potential use after free Signed-off-by: Eric Dumazet skb_tx_timestamp(skb) should be called _before_ TX completion has a chance to trigger, otherwise it is too late and we access freed memory. Fixes: e4f2379db6c6 ("ethernet/arc/arc_emac - Add new driver") From: Eric Dumazet Cc: Alexey Brodkin Cc: Richard Cochran Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c index b2ffad1..248baf6 100644 --- a/drivers/net/ethernet/arc/emac_main.c +++ b/drivers/net/ethernet/arc/emac_main.c @@ -565,6 +565,8 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev) /* Make sure pointer to data buffer is set */ wmb(); + skb_tx_timestamp(skb); + *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len); /* Increment index to point to the next BD */ @@ -579,8 +581,6 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev) arc_reg_set(priv, R_STATUS, TXPL_MASK); - skb_tx_timestamp(skb); - return NETDEV_TX_OK; } -- cgit v0.10.2 From 73409f3b0ff0ae7bc1f647936b23e6d5d5dcbe28 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Fri, 27 Dec 2013 13:04:33 -0500 Subject: net: Add some clarification to skb_tx_timestamp() comment. We've seen so many instances of people invoking skb_tx_timestamp() after the device already has been given the packet, that it's worth being a little bit more verbose and explicit in this comment. Signed-off-by: David S. 
Miller diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 6aae838..6f69b3f 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2531,6 +2531,10 @@ static inline void sw_tx_timestamp(struct sk_buff *skb) * Ethernet MAC Drivers should call this function in their hard_xmit() * function immediately before giving the sk_buff to the MAC hardware. * + * Specifically, one should make absolutely sure that this function is + * called before TX completion of this packet can trigger. Otherwise + * the packet could potentially already be freed. + * * @skb: A socket buffer. */ static inline void skb_tx_timestamp(struct sk_buff *skb) -- cgit v0.10.2 From 4710b2ba873692194c636811ceda398f95e02db2 Mon Sep 17 00:00:00 2001 From: David Gibson Date: Fri, 20 Dec 2013 15:10:44 +1100 Subject: netxen: Correct off-by-one errors in bounds checks netxen_process_lro() contains two bounds checks. One for the ring number against the number of rings, and one for the Rx buffer ID against the array of receive buffers. Both of these have off-by-one errors, using > instead of >=. The correct versions are used in netxen_process_rcv(), they're just wrong in netxen_process_lro(). Signed-off-by: David Gibson Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c index 7692dfd..cc68657 100644 --- a/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c +++ b/drivers/net/ethernet/qlogic/netxen/netxen_nic_init.c @@ -1604,13 +1604,13 @@ netxen_process_lro(struct netxen_adapter *adapter, u32 seq_number; u8 vhdr_len = 0; - if (unlikely(ring > adapter->max_rds_rings)) + if (unlikely(ring >= adapter->max_rds_rings)) return NULL; rds_ring = &recv_ctx->rds_rings[ring]; index = netxen_get_lro_sts_refhandle(sts_data0); - if (unlikely(index > rds_ring->num_desc)) + if (unlikely(index >= rds_ring->num_desc)) return NULL; buffer = &rds_ring->rx_buf_arr[index]; -- cgit v0.10.2 From 6a9eadccff2926e392173a989042f14c867cffbf Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Fri, 20 Dec 2013 17:20:12 +0800 Subject: ipv6: release dst properly in ipip6_tunnel_xmit if a dst is not attached to anywhere, it should be released before exit ipip6_tunnel_xmit, otherwise cause dst memory leakage. Fixes: 61c1db7fae21 ("ipv6: sit: add GSO/TSO support") Signed-off-by: Li RongQing Acked-by: Eric Dumazet Signed-off-by: David S. Miller diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index a710fde..c874822 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -966,8 +966,10 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb, tos = INET_ECN_encapsulate(tos, ipv6_get_dsfield(iph6)); skb = iptunnel_handle_offloads(skb, false, SKB_GSO_SIT); - if (IS_ERR(skb)) + if (IS_ERR(skb)) { + ip_rt_put(rt); goto out; + } err = iptunnel_xmit(rt, skb, fl4.saddr, fl4.daddr, IPPROTO_IPV6, tos, ttl, df, !net_eq(tunnel->net, dev_net(dev))); -- cgit v0.10.2 From e38195bf32d7ccb2ae3f56f36b895daf455ffd94 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Tue, 24 Dec 2013 18:32:35 +0100 Subject: netfilter: nf_tables: fix dumping with large number of sets If not table name is specified, the dumping of the existing sets may be incomplete with a sufficiently large number of sets and tables. This patch fixes missing reset of the cursors after finding the location of the last object that has been included in the previous multi-part message. 
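For clarity, a minimal stand-alone C sketch of the multi-part dump pattern this fix restores follows; all names and sizes in it are invented placeholders, not nf_tables symbols, and it only models the cursor handling under an assumed fixed per-message budget.

    /*
     * Illustration only (not the nf_tables code): a dump split across
     * several messages must restart its inner cursor for every new outer
     * object past the saved continuation point, otherwise later objects
     * are silently skipped or truncated.
     */
    #include <stdio.h>

    #define N_TABLES 3
    #define N_SETS   4
    #define BUDGET   5   /* max entries per "message"; forces a multi-part dump */

    struct cursor {
        int table;   /* outer cursor: table to resume from */
        int set;     /* inner cursor: set to resume from   */
    };

    /* Emit up to BUDGET (table, set) pairs, remembering where we stopped. */
    static int dump_some(struct cursor *c)
    {
        int emitted = 0;

        for (int t = 0; t < N_TABLES; t++) {
            if (c->table > t)
                continue;            /* still before the saved table */
            /* the inner index restarts at 0 for every table past the saved one */
            for (int s = (t == c->table) ? c->set : 0; s < N_SETS; s++) {
                if (emitted == BUDGET) {
                    c->table = t;    /* save continuation point */
                    c->set = s;
                    return 1;        /* more to come */
                }
                printf("table %d set %d\n", t, s);
                emitted++;
            }
        }
        return 0;                    /* dump complete */
    }

    int main(void)
    {
        struct cursor c = { 0, 0 };

        while (dump_some(&c))
            ;   /* each iteration models one multi-part message */
        return 0;
    }

The essential detail matches the patch below: the per-table index starts over for every table beyond the saved continuation point instead of carrying a stale value across tables, and the saved table cursor stops filtering once it has been matched.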
Signed-off-by: Pablo Neira Ayuso diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f93b7d0..d9fcd27 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2098,17 +2098,21 @@ static int nf_tables_dump_sets_all(struct nft_ctx *ctx, struct sk_buff *skb, struct netlink_callback *cb) { const struct nft_set *set; - unsigned int idx = 0, s_idx = cb->args[0]; + unsigned int idx, s_idx = cb->args[0]; struct nft_table *table, *cur_table = (struct nft_table *)cb->args[2]; if (cb->args[1]) return skb->len; list_for_each_entry(table, &ctx->afi->tables, list) { - if (cur_table && cur_table != table) - continue; + if (cur_table) { + if (cur_table != table) + continue; + cur_table = NULL; + } ctx->table = table; + idx = 0; list_for_each_entry(set, &ctx->table->sets, list) { if (idx < s_idx) goto cont; -- cgit v0.10.2 From d2012975619251bdfeb7a5159faa7727ea9cddd3 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Fri, 27 Dec 2013 10:44:23 +0100 Subject: netfilter: nf_tables: fix oops when updating table with user chains This patch fixes a crash while trying to deactivate a table that contains user chains. You can reproduce it via: % nft add table table1 % nft add chain table1 chain1 % nft-table-upd ip table1 dormant [ 253.021026] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030 [ 253.021114] IP: [] nf_register_hook+0x35/0x6f [ 253.021167] PGD 30fa5067 PUD 30fa2067 PMD 0 [ 253.021208] Oops: 0000 [#1] SMP [...] [ 253.023305] Call Trace: [ 253.023331] [] nf_tables_newtable+0x11c/0x258 [nf_tables] [ 253.023385] [] nfnetlink_rcv_msg+0x1f4/0x226 [nfnetlink] [ 253.023438] [] ? nfnetlink_rcv_msg+0x7a/0x226 [nfnetlink] [ 253.023491] [] ? nfnetlink_bind+0x45/0x45 [nfnetlink] [ 253.023542] [] netlink_rcv_skb+0x3c/0x88 [ 253.023586] [] nfnetlink_rcv+0x3af/0x3e4 [nfnetlink] [ 253.023638] [] ? _raw_read_unlock+0x22/0x34 [ 253.023683] [] netlink_unicast+0xe2/0x161 [ 253.023727] [] netlink_sendmsg+0x304/0x332 [ 253.023773] [] __sock_sendmsg_nosec+0x25/0x27 [ 253.023820] [] sock_sendmsg+0x5a/0x7b [ 253.023861] [] ? copy_from_user+0x2a/0x2c [ 253.023905] [] ? move_addr_to_kernel+0x35/0x60 [ 253.023952] [] SYSC_sendto+0x119/0x15c [ 253.023995] [] ? sysret_check+0x1b/0x56 [ 253.024039] [] ? trace_hardirqs_on_caller+0x140/0x1db [ 253.024090] [] ? 
trace_hardirqs_on_thunk+0x3a/0x3f [ 253.024141] [] SyS_sendto+0x9/0xb [ 253.026219] [] system_call_fastpath+0x16/0x1b Reported-by: Alex Wei Signed-off-by: Pablo Neira Ayuso diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index d9fcd27..d65c80b 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -312,6 +312,9 @@ static int nf_tables_table_enable(struct nft_table *table) int err, i = 0; list_for_each_entry(chain, &table->chains, list) { + if (!(chain->flags & NFT_BASE_CHAIN)) + continue; + err = nf_register_hook(&nft_base_chain(chain)->ops); if (err < 0) goto err; @@ -321,6 +324,9 @@ static int nf_tables_table_enable(struct nft_table *table) return 0; err: list_for_each_entry(chain, &table->chains, list) { + if (!(chain->flags & NFT_BASE_CHAIN)) + continue; + if (i-- <= 0) break; @@ -333,8 +339,10 @@ static int nf_tables_table_disable(struct nft_table *table) { struct nft_chain *chain; - list_for_each_entry(chain, &table->chains, list) - nf_unregister_hook(&nft_base_chain(chain)->ops); + list_for_each_entry(chain, &table->chains, list) { + if (chain->flags & NFT_BASE_CHAIN) + nf_unregister_hook(&nft_base_chain(chain)->ops); + } return 0; } -- cgit v0.10.2 From 46b76e0b8b5e21fa5a2387d6f72b193514e7f722 Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Mon, 2 Dec 2013 20:38:30 +0100 Subject: batman-adv: fix alignment for batadv_coded_packet The compiler may decide to pad the structure, and then it does not have the expected size of 46 byte. Fix this by moving it in the pragma pack(2) part of the code. Signed-off-by: Simon Wunderlich Signed-off-by: Marek Lindner Signed-off-by: Antonio Quartulli diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h index 207459b..10597a6 100644 --- a/net/batman-adv/packet.h +++ b/net/batman-adv/packet.h @@ -315,8 +315,6 @@ struct batadv_bcast_packet { */ }; -#pragma pack() - /** * struct batadv_coded_packet - network coded packet * @header: common batman packet header and ttl of first included packet @@ -349,6 +347,8 @@ struct batadv_coded_packet { __be16 coded_len; }; +#pragma pack() + /** * struct batadv_unicast_tvlv - generic unicast packet with tvlv payload * @header: common batman packet header -- cgit v0.10.2 From a40d9b075c21f06872de3f05cc2eb3d06665e2ff Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Mon, 2 Dec 2013 20:38:31 +0100 Subject: batman-adv: fix header alignment by unrolling batadv_header The size of the batadv_header of 3 is problematic on some architectures which automatically pad all structures to a 32 bit boundary. To not lose performance by packing this struct, better embed it into the various host structures. 
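The padding effect described above can be reproduced with a small stand-alone C program; the aligned(4) attribute here is only a stand-in for ABIs that round every structure up to a 32-bit boundary, and the member names are simplified placeholders, not the real batman-adv layout.

    /*
     * Illustration only: why embedding a 3-byte sub-struct in a wire-format
     * structure is fragile.  aligned(4) simulates an ABI that pads all
     * structs to 4 bytes; on such targets the nested layout grows and shifts
     * the following members, while the unrolled layout keeps the intended
     * offsets.
     */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    struct hdr {                    /* common 3-byte header as a separate struct */
        uint8_t packet_type;
        uint8_t version;
        uint8_t ttl;
    } __attribute__((aligned(4)));  /* stands in for a pad-to-32-bit ABI */

    struct nested_packet {          /* header embedded as a sub-struct */
        struct hdr header;
        uint8_t   ttvn;
        uint8_t   dest[6];
    };

    struct unrolled_packet {        /* header fields unrolled into the parent */
        uint8_t packet_type;
        uint8_t version;
        uint8_t ttl;
        uint8_t ttvn;
        uint8_t dest[6];
    };

    int main(void)
    {
        printf("nested:   size %zu, offsetof(ttvn) = %zu\n",
               sizeof(struct nested_packet),
               offsetof(struct nested_packet, ttvn));
        printf("unrolled: size %zu, offsetof(ttvn) = %zu\n",
               sizeof(struct unrolled_packet),
               offsetof(struct unrolled_packet, ttvn));
        return 0;
    }

On such an ABI the nested variant reports size 12 with ttvn at offset 4, while the unrolled variant reports size 10 with ttvn at offset 3, which is why the patch below moves the three header bytes directly into each packet structure.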
Reported-by: Russell King Signed-off-by: Simon Wunderlich Signed-off-by: Marek Lindner Signed-off-by: Antonio Quartulli diff --git a/net/batman-adv/bat_iv_ogm.c b/net/batman-adv/bat_iv_ogm.c index a2b480a..b9c8a6e 100644 --- a/net/batman-adv/bat_iv_ogm.c +++ b/net/batman-adv/bat_iv_ogm.c @@ -307,9 +307,9 @@ static int batadv_iv_ogm_iface_enable(struct batadv_hard_iface *hard_iface) hard_iface->bat_iv.ogm_buff = ogm_buff; batadv_ogm_packet = (struct batadv_ogm_packet *)ogm_buff; - batadv_ogm_packet->header.packet_type = BATADV_IV_OGM; - batadv_ogm_packet->header.version = BATADV_COMPAT_VERSION; - batadv_ogm_packet->header.ttl = 2; + batadv_ogm_packet->packet_type = BATADV_IV_OGM; + batadv_ogm_packet->version = BATADV_COMPAT_VERSION; + batadv_ogm_packet->ttl = 2; batadv_ogm_packet->flags = BATADV_NO_FLAGS; batadv_ogm_packet->reserved = 0; batadv_ogm_packet->tq = BATADV_TQ_MAX_VALUE; @@ -346,7 +346,7 @@ batadv_iv_ogm_primary_iface_set(struct batadv_hard_iface *hard_iface) batadv_ogm_packet = (struct batadv_ogm_packet *)ogm_buff; batadv_ogm_packet->flags = BATADV_PRIMARIES_FIRST_HOP; - batadv_ogm_packet->header.ttl = BATADV_TTL; + batadv_ogm_packet->ttl = BATADV_TTL; } /* when do we schedule our own ogm to be sent */ @@ -435,7 +435,7 @@ static void batadv_iv_ogm_send_to_if(struct batadv_forw_packet *forw_packet, fwd_str, (packet_num > 0 ? "aggregated " : ""), batadv_ogm_packet->orig, ntohl(batadv_ogm_packet->seqno), - batadv_ogm_packet->tq, batadv_ogm_packet->header.ttl, + batadv_ogm_packet->tq, batadv_ogm_packet->ttl, (batadv_ogm_packet->flags & BATADV_DIRECTLINK ? "on" : "off"), hard_iface->net_dev->name, @@ -491,7 +491,7 @@ static void batadv_iv_ogm_emit(struct batadv_forw_packet *forw_packet) /* multihomed peer assumed * non-primary OGMs are only broadcasted on their interface */ - if ((directlink && (batadv_ogm_packet->header.ttl == 1)) || + if ((directlink && (batadv_ogm_packet->ttl == 1)) || (forw_packet->own && (forw_packet->if_incoming != primary_if))) { /* FIXME: what about aggregated packets ? */ batadv_dbg(BATADV_DBG_BATMAN, bat_priv, @@ -499,7 +499,7 @@ static void batadv_iv_ogm_emit(struct batadv_forw_packet *forw_packet) (forw_packet->own ? 
"Sending own" : "Forwarding"), batadv_ogm_packet->orig, ntohl(batadv_ogm_packet->seqno), - batadv_ogm_packet->header.ttl, + batadv_ogm_packet->ttl, forw_packet->if_incoming->net_dev->name, forw_packet->if_incoming->net_dev->dev_addr); @@ -572,7 +572,7 @@ batadv_iv_ogm_can_aggregate(const struct batadv_ogm_packet *new_bat_ogm_packet, */ if ((!directlink) && (!(batadv_ogm_packet->flags & BATADV_DIRECTLINK)) && - (batadv_ogm_packet->header.ttl != 1) && + (batadv_ogm_packet->ttl != 1) && /* own packets originating non-primary * interfaces leave only that interface @@ -587,7 +587,7 @@ batadv_iv_ogm_can_aggregate(const struct batadv_ogm_packet *new_bat_ogm_packet, * interface only - we still can aggregate */ if ((directlink) && - (new_bat_ogm_packet->header.ttl == 1) && + (new_bat_ogm_packet->ttl == 1) && (forw_packet->if_incoming == if_incoming) && /* packets from direct neighbors or @@ -778,7 +778,7 @@ static void batadv_iv_ogm_forward(struct batadv_orig_node *orig_node, struct batadv_priv *bat_priv = netdev_priv(if_incoming->soft_iface); uint16_t tvlv_len; - if (batadv_ogm_packet->header.ttl <= 1) { + if (batadv_ogm_packet->ttl <= 1) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "ttl exceeded\n"); return; } @@ -798,7 +798,7 @@ static void batadv_iv_ogm_forward(struct batadv_orig_node *orig_node, tvlv_len = ntohs(batadv_ogm_packet->tvlv_len); - batadv_ogm_packet->header.ttl--; + batadv_ogm_packet->ttl--; memcpy(batadv_ogm_packet->prev_sender, ethhdr->h_source, ETH_ALEN); /* apply hop penalty */ @@ -807,7 +807,7 @@ static void batadv_iv_ogm_forward(struct batadv_orig_node *orig_node, batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Forwarding packet: tq: %i, ttl: %i\n", - batadv_ogm_packet->tq, batadv_ogm_packet->header.ttl); + batadv_ogm_packet->tq, batadv_ogm_packet->ttl); /* switch of primaries first hop flag when forwarding */ batadv_ogm_packet->flags &= ~BATADV_PRIMARIES_FIRST_HOP; @@ -972,8 +972,8 @@ batadv_iv_ogm_orig_update(struct batadv_priv *bat_priv, spin_unlock_bh(&neigh_node->bat_iv.lq_update_lock); if (dup_status == BATADV_NO_DUP) { - orig_node->last_ttl = batadv_ogm_packet->header.ttl; - neigh_node->last_ttl = batadv_ogm_packet->header.ttl; + orig_node->last_ttl = batadv_ogm_packet->ttl; + neigh_node->last_ttl = batadv_ogm_packet->ttl; } batadv_bonding_candidate_add(bat_priv, orig_node, neigh_node); @@ -1247,7 +1247,7 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, * packet in an aggregation. 
Here we expect that the padding * is always zero (or not 0x01) */ - if (batadv_ogm_packet->header.packet_type != BATADV_IV_OGM) + if (batadv_ogm_packet->packet_type != BATADV_IV_OGM) return; /* could be changed by schedule_own_packet() */ @@ -1267,8 +1267,8 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, if_incoming->net_dev->dev_addr, batadv_ogm_packet->orig, batadv_ogm_packet->prev_sender, ntohl(batadv_ogm_packet->seqno), batadv_ogm_packet->tq, - batadv_ogm_packet->header.ttl, - batadv_ogm_packet->header.version, has_directlink_flag); + batadv_ogm_packet->ttl, + batadv_ogm_packet->version, has_directlink_flag); rcu_read_lock(); list_for_each_entry_rcu(hard_iface, &batadv_hardif_list, list) { @@ -1433,7 +1433,7 @@ static void batadv_iv_ogm_process(const struct ethhdr *ethhdr, * seqno and similar ttl as the non-duplicate */ sameseq = orig_node->last_real_seqno == ntohl(batadv_ogm_packet->seqno); - similar_ttl = orig_node->last_ttl - 3 <= batadv_ogm_packet->header.ttl; + similar_ttl = orig_node->last_ttl - 3 <= batadv_ogm_packet->ttl; if (is_bidirect && ((dup_status == BATADV_NO_DUP) || (sameseq && similar_ttl))) batadv_iv_ogm_orig_update(bat_priv, orig_node, ethhdr, diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c index 6c8c393..b316a4c 100644 --- a/net/batman-adv/distributed-arp-table.c +++ b/net/batman-adv/distributed-arp-table.c @@ -349,7 +349,7 @@ static void batadv_dbg_arp(struct batadv_priv *bat_priv, struct sk_buff *skb, unicast_4addr_packet = (struct batadv_unicast_4addr_packet *)skb->data; - switch (unicast_4addr_packet->u.header.packet_type) { + switch (unicast_4addr_packet->u.packet_type) { case BATADV_UNICAST: batadv_dbg(BATADV_DBG_DAT, bat_priv, "* encapsulated within a UNICAST packet\n"); @@ -374,7 +374,7 @@ static void batadv_dbg_arp(struct batadv_priv *bat_priv, struct sk_buff *skb, break; default: batadv_dbg(BATADV_DBG_DAT, bat_priv, "* type: Unknown (%u)!\n", - unicast_4addr_packet->u.header.packet_type); + unicast_4addr_packet->u.packet_type); } break; case BATADV_BCAST: @@ -387,7 +387,7 @@ static void batadv_dbg_arp(struct batadv_priv *bat_priv, struct sk_buff *skb, default: batadv_dbg(BATADV_DBG_DAT, bat_priv, "* encapsulated within an unknown packet type (0x%x)\n", - unicast_4addr_packet->u.header.packet_type); + unicast_4addr_packet->u.packet_type); } } diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c index 271d321..6ddb614 100644 --- a/net/batman-adv/fragmentation.c +++ b/net/batman-adv/fragmentation.c @@ -355,7 +355,7 @@ bool batadv_frag_skb_fwd(struct sk_buff *skb, batadv_add_counter(bat_priv, BATADV_CNT_FRAG_FWD_BYTES, skb->len + ETH_HLEN); - packet->header.ttl--; + packet->ttl--; batadv_send_skb_packet(skb, neigh_node->if_incoming, neigh_node->addr); ret = true; @@ -444,9 +444,9 @@ bool batadv_frag_send_packet(struct sk_buff *skb, goto out_err; /* Create one header to be copied to all fragments */ - frag_header.header.packet_type = BATADV_UNICAST_FRAG; - frag_header.header.version = BATADV_COMPAT_VERSION; - frag_header.header.ttl = BATADV_TTL; + frag_header.packet_type = BATADV_UNICAST_FRAG; + frag_header.version = BATADV_COMPAT_VERSION; + frag_header.ttl = BATADV_TTL; frag_header.seqno = htons(atomic_inc_return(&bat_priv->frag_seqno)); frag_header.reserved = 0; frag_header.no = 0; diff --git a/net/batman-adv/icmp_socket.c b/net/batman-adv/icmp_socket.c index 29ae4ef..130cc32 100644 --- a/net/batman-adv/icmp_socket.c +++ b/net/batman-adv/icmp_socket.c @@ -194,7 +194,7 @@ 
static ssize_t batadv_socket_write(struct file *file, const char __user *buff, goto free_skb; } - if (icmp_header->header.packet_type != BATADV_ICMP) { + if (icmp_header->packet_type != BATADV_ICMP) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Error - can't send packet from char device: got bogus packet type (expected: BAT_ICMP)\n"); len = -EINVAL; @@ -243,9 +243,9 @@ static ssize_t batadv_socket_write(struct file *file, const char __user *buff, icmp_header->uid = socket_client->index; - if (icmp_header->header.version != BATADV_COMPAT_VERSION) { + if (icmp_header->version != BATADV_COMPAT_VERSION) { icmp_header->msg_type = BATADV_PARAMETER_PROBLEM; - icmp_header->header.version = BATADV_COMPAT_VERSION; + icmp_header->version = BATADV_COMPAT_VERSION; batadv_socket_add_packet(socket_client, icmp_header, packet_len); goto free_skb; diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c index c51a5e5..d87778b 100644 --- a/net/batman-adv/main.c +++ b/net/batman-adv/main.c @@ -383,17 +383,17 @@ int batadv_batman_skb_recv(struct sk_buff *skb, struct net_device *dev, batadv_ogm_packet = (struct batadv_ogm_packet *)skb->data; - if (batadv_ogm_packet->header.version != BATADV_COMPAT_VERSION) { + if (batadv_ogm_packet->version != BATADV_COMPAT_VERSION) { batadv_dbg(BATADV_DBG_BATMAN, bat_priv, "Drop packet: incompatible batman version (%i)\n", - batadv_ogm_packet->header.version); + batadv_ogm_packet->version); goto err_free; } /* all receive handlers return whether they received or reused * the supplied skb. if not, we have to free the skb. */ - idx = batadv_ogm_packet->header.packet_type; + idx = batadv_ogm_packet->packet_type; ret = (*batadv_rx_handler[idx])(skb, hard_iface); if (ret == NET_RX_DROP) @@ -1119,9 +1119,9 @@ void batadv_tvlv_unicast_send(struct batadv_priv *bat_priv, uint8_t *src, skb_reserve(skb, ETH_HLEN); tvlv_buff = skb_put(skb, sizeof(*unicast_tvlv_packet) + tvlv_len); unicast_tvlv_packet = (struct batadv_unicast_tvlv_packet *)tvlv_buff; - unicast_tvlv_packet->header.packet_type = BATADV_UNICAST_TVLV; - unicast_tvlv_packet->header.version = BATADV_COMPAT_VERSION; - unicast_tvlv_packet->header.ttl = BATADV_TTL; + unicast_tvlv_packet->packet_type = BATADV_UNICAST_TVLV; + unicast_tvlv_packet->version = BATADV_COMPAT_VERSION; + unicast_tvlv_packet->ttl = BATADV_TTL; unicast_tvlv_packet->reserved = 0; unicast_tvlv_packet->tvlv_len = htons(tvlv_len); unicast_tvlv_packet->align = 0; diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c index 351e199..511d7e1 100644 --- a/net/batman-adv/network-coding.c +++ b/net/batman-adv/network-coding.c @@ -722,7 +722,7 @@ static bool batadv_can_nc_with_orig(struct batadv_priv *bat_priv, { if (orig_node->last_real_seqno != ntohl(ogm_packet->seqno)) return false; - if (orig_node->last_ttl != ogm_packet->header.ttl + 1) + if (orig_node->last_ttl != ogm_packet->ttl + 1) return false; if (!batadv_compare_eth(ogm_packet->orig, ogm_packet->prev_sender)) return false; @@ -1082,9 +1082,9 @@ static bool batadv_nc_code_packets(struct batadv_priv *bat_priv, coded_packet = (struct batadv_coded_packet *)skb_dest->data; skb_reset_mac_header(skb_dest); - coded_packet->header.packet_type = BATADV_CODED; - coded_packet->header.version = BATADV_COMPAT_VERSION; - coded_packet->header.ttl = packet1->header.ttl; + coded_packet->packet_type = BATADV_CODED; + coded_packet->version = BATADV_COMPAT_VERSION; + coded_packet->ttl = packet1->ttl; /* Info about first unicast packet */ memcpy(coded_packet->first_source, first_source, ETH_ALEN); @@ 
-1097,7 +1097,7 @@ static bool batadv_nc_code_packets(struct batadv_priv *bat_priv, memcpy(coded_packet->second_source, second_source, ETH_ALEN); memcpy(coded_packet->second_orig_dest, packet2->dest, ETH_ALEN); coded_packet->second_crc = packet_id2; - coded_packet->second_ttl = packet2->header.ttl; + coded_packet->second_ttl = packet2->ttl; coded_packet->second_ttvn = packet2->ttvn; coded_packet->coded_len = htons(coding_len); @@ -1452,7 +1452,7 @@ bool batadv_nc_skb_forward(struct sk_buff *skb, /* We only handle unicast packets */ payload = skb_network_header(skb); packet = (struct batadv_unicast_packet *)payload; - if (packet->header.packet_type != BATADV_UNICAST) + if (packet->packet_type != BATADV_UNICAST) goto out; /* Try to find a coding opportunity and send the skb if one is found */ @@ -1505,7 +1505,7 @@ void batadv_nc_skb_store_for_decoding(struct batadv_priv *bat_priv, /* Check for supported packet type */ payload = skb_network_header(skb); packet = (struct batadv_unicast_packet *)payload; - if (packet->header.packet_type != BATADV_UNICAST) + if (packet->packet_type != BATADV_UNICAST) goto out; /* Find existing nc_path or create a new */ @@ -1623,7 +1623,7 @@ batadv_nc_skb_decode_packet(struct batadv_priv *bat_priv, struct sk_buff *skb, ttvn = coded_packet_tmp.second_ttvn; } else { orig_dest = coded_packet_tmp.first_orig_dest; - ttl = coded_packet_tmp.header.ttl; + ttl = coded_packet_tmp.ttl; ttvn = coded_packet_tmp.first_ttvn; } @@ -1648,9 +1648,9 @@ batadv_nc_skb_decode_packet(struct batadv_priv *bat_priv, struct sk_buff *skb, /* Create decoded unicast packet */ unicast_packet = (struct batadv_unicast_packet *)skb->data; - unicast_packet->header.packet_type = BATADV_UNICAST; - unicast_packet->header.version = BATADV_COMPAT_VERSION; - unicast_packet->header.ttl = ttl; + unicast_packet->packet_type = BATADV_UNICAST; + unicast_packet->version = BATADV_COMPAT_VERSION; + unicast_packet->ttl = ttl; memcpy(unicast_packet->dest, orig_dest, ETH_ALEN); unicast_packet->ttvn = ttvn; diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h index 10597a6..175ce7d 100644 --- a/net/batman-adv/packet.h +++ b/net/batman-adv/packet.h @@ -164,23 +164,18 @@ struct batadv_bla_claim_dst { __be16 group; /* group id */ }; -struct batadv_header { - uint8_t packet_type; - uint8_t version; /* batman version field */ - uint8_t ttl; - /* the parent struct has to add a byte after the header to make - * everything 4 bytes aligned again - */ -}; - /** * struct batadv_ogm_packet - ogm (routing protocol) packet - * @header: common batman packet header + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header * @flags: contains routing relevant flags - see enum batadv_iv_flags * @tvlv_len: length of tvlv data following the ogm header */ struct batadv_ogm_packet { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; + uint8_t ttl; uint8_t flags; __be32 seqno; uint8_t orig[ETH_ALEN]; @@ -197,14 +192,18 @@ struct batadv_ogm_packet { /** * batadv_icmp_header - common ICMP header - * @header: common batman header + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header * @msg_type: ICMP packet type * @dst: address of the destination node * @orig: address of the source node * @uid: local ICMP 
socket identifier */ struct batadv_icmp_header { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; + uint8_t ttl; uint8_t msg_type; /* see ICMP message types above */ uint8_t dst[ETH_ALEN]; uint8_t orig[ETH_ALEN]; @@ -253,8 +252,18 @@ struct batadv_icmp_packet_rr { */ #pragma pack(2) +/** + * struct batadv_unicast_packet - unicast packet for network payload + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header + * @ttvn: translation table version number + * @dest: originator destination of the unicast packet + */ struct batadv_unicast_packet { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; + uint8_t ttl; uint8_t ttvn; /* destination translation table version number */ uint8_t dest[ETH_ALEN]; /* "4 bytes boundary + 2 bytes" long to make the payload after the @@ -280,7 +289,9 @@ struct batadv_unicast_4addr_packet { /** * struct batadv_frag_packet - fragmented packet - * @header: common batman packet header with type, compatversion, and ttl + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header * @dest: final destination used when routing fragments * @orig: originator of the fragment used when merging the packet * @no: fragment number within this sequence @@ -289,7 +300,9 @@ struct batadv_unicast_4addr_packet { * @total_size: size of the merged packet */ struct batadv_frag_packet { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; /* batman version field */ + uint8_t ttl; #if defined(__BIG_ENDIAN_BITFIELD) uint8_t no:4; uint8_t reserved:4; @@ -305,8 +318,19 @@ struct batadv_frag_packet { __be16 total_size; }; +/** + * struct batadv_bcast_packet - broadcast packet for network payload + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header + * @reserved: reserved byte for alignment + * @seqno: sequence identification + * @orig: originator of the broadcast packet + */ struct batadv_bcast_packet { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; /* batman version field */ + uint8_t ttl; uint8_t reserved; __be32 seqno; uint8_t orig[ETH_ALEN]; @@ -317,7 +341,9 @@ struct batadv_bcast_packet { /** * struct batadv_coded_packet - network coded packet - * @header: common batman packet header and ttl of first included packet + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header * @reserved: Align following fields to 2-byte boundaries * @first_source: original source of first included packet * @first_orig_dest: original destinal of first included packet @@ -332,7 +358,9 @@ struct batadv_bcast_packet { * @coded_len: length of network coded part of the payload */ struct batadv_coded_packet { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; /* batman version field */ + uint8_t ttl; uint8_t first_ttvn; /* uint8_t first_dest[ETH_ALEN]; - saved in mac header destination */ uint8_t first_source[ETH_ALEN]; @@ -351,7 +379,9 @@ struct batadv_coded_packet { /** * struct batadv_unicast_tvlv - 
generic unicast packet with tvlv payload - * @header: common batman packet header + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header * @reserved: reserved field (for packet alignment) * @src: address of the source * @dst: address of the destination @@ -359,7 +389,9 @@ struct batadv_coded_packet { * @align: 2 bytes to align the header to a 4 byte boundry */ struct batadv_unicast_tvlv_packet { - struct batadv_header header; + uint8_t packet_type; + uint8_t version; /* batman version field */ + uint8_t ttl; uint8_t reserved; uint8_t dst[ETH_ALEN]; uint8_t src[ETH_ALEN]; diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index d4114d7..5b52d71 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -308,7 +308,7 @@ static int batadv_recv_my_icmp_packet(struct batadv_priv *bat_priv, memcpy(icmph->dst, icmph->orig, ETH_ALEN); memcpy(icmph->orig, primary_if->net_dev->dev_addr, ETH_ALEN); icmph->msg_type = BATADV_ECHO_REPLY; - icmph->header.ttl = BATADV_TTL; + icmph->ttl = BATADV_TTL; res = batadv_send_skb_to_orig(skb, orig_node, NULL); if (res != NET_XMIT_DROP) @@ -363,7 +363,7 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, memcpy(icmp_packet->icmph.orig, primary_if->net_dev->dev_addr, ETH_ALEN); icmp_packet->icmph.msg_type = BATADV_TTL_EXCEEDED; - icmp_packet->icmph.header.ttl = BATADV_TTL; + icmp_packet->icmph.ttl = BATADV_TTL; if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) ret = NET_RX_SUCCESS; @@ -434,7 +434,7 @@ int batadv_recv_icmp_packet(struct sk_buff *skb, return batadv_recv_my_icmp_packet(bat_priv, skb); /* TTL exceeded */ - if (icmph->header.ttl < 2) + if (icmph->ttl < 2) return batadv_recv_icmp_ttl_exceeded(bat_priv, skb); /* get routing information */ @@ -449,7 +449,7 @@ int batadv_recv_icmp_packet(struct sk_buff *skb, icmph = (struct batadv_icmp_header *)skb->data; /* decrement ttl */ - icmph->header.ttl--; + icmph->ttl--; /* route it */ if (batadv_send_skb_to_orig(skb, orig_node, recv_if) != NET_XMIT_DROP) @@ -709,7 +709,7 @@ static int batadv_route_unicast_packet(struct sk_buff *skb, unicast_packet = (struct batadv_unicast_packet *)skb->data; /* TTL exceeded */ - if (unicast_packet->header.ttl < 2) { + if (unicast_packet->ttl < 2) { pr_debug("Warning - can't forward unicast packet from %pM to %pM: ttl exceeded\n", ethhdr->h_source, unicast_packet->dest); goto out; @@ -727,9 +727,9 @@ static int batadv_route_unicast_packet(struct sk_buff *skb, /* decrement ttl */ unicast_packet = (struct batadv_unicast_packet *)skb->data; - unicast_packet->header.ttl--; + unicast_packet->ttl--; - switch (unicast_packet->header.packet_type) { + switch (unicast_packet->packet_type) { case BATADV_UNICAST_4ADDR: hdr_len = sizeof(struct batadv_unicast_4addr_packet); break; @@ -970,7 +970,7 @@ int batadv_recv_unicast_packet(struct sk_buff *skb, unicast_packet = (struct batadv_unicast_packet *)skb->data; unicast_4addr_packet = (struct batadv_unicast_4addr_packet *)skb->data; - is4addr = unicast_packet->header.packet_type == BATADV_UNICAST_4ADDR; + is4addr = unicast_packet->packet_type == BATADV_UNICAST_4ADDR; /* the caller function should have already pulled 2 bytes */ if (is4addr) hdr_size = sizeof(*unicast_4addr_packet); @@ -1160,7 +1160,7 @@ int batadv_recv_bcast_packet(struct sk_buff *skb, if (batadv_is_my_mac(bat_priv, bcast_packet->orig)) goto out; - if 
(bcast_packet->header.ttl < 2) + if (bcast_packet->ttl < 2) goto out; orig_node = batadv_orig_hash_find(bat_priv, bcast_packet->orig); diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index c83be5e..fba4dcf 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -161,11 +161,11 @@ batadv_send_skb_push_fill_unicast(struct sk_buff *skb, int hdr_size, return false; unicast_packet = (struct batadv_unicast_packet *)skb->data; - unicast_packet->header.version = BATADV_COMPAT_VERSION; + unicast_packet->version = BATADV_COMPAT_VERSION; /* batman packet type: unicast */ - unicast_packet->header.packet_type = BATADV_UNICAST; + unicast_packet->packet_type = BATADV_UNICAST; /* set unicast ttl */ - unicast_packet->header.ttl = BATADV_TTL; + unicast_packet->ttl = BATADV_TTL; /* copy the destination for faster routing */ memcpy(unicast_packet->dest, orig_node->orig, ETH_ALEN); /* set the destination tt version number */ @@ -221,7 +221,7 @@ bool batadv_send_skb_prepare_unicast_4addr(struct batadv_priv *bat_priv, goto out; uc_4addr_packet = (struct batadv_unicast_4addr_packet *)skb->data; - uc_4addr_packet->u.header.packet_type = BATADV_UNICAST_4ADDR; + uc_4addr_packet->u.packet_type = BATADV_UNICAST_4ADDR; memcpy(uc_4addr_packet->src, primary_if->net_dev->dev_addr, ETH_ALEN); uc_4addr_packet->subtype = packet_subtype; uc_4addr_packet->reserved = 0; @@ -436,7 +436,7 @@ int batadv_add_bcast_packet_to_list(struct batadv_priv *bat_priv, /* as we have a copy now, it is safe to decrease the TTL */ bcast_packet = (struct batadv_bcast_packet *)newskb->data; - bcast_packet->header.ttl--; + bcast_packet->ttl--; skb_reset_mac_header(newskb); diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 36f0508..b4881f8 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -264,11 +264,11 @@ static int batadv_interface_tx(struct sk_buff *skb, goto dropped; bcast_packet = (struct batadv_bcast_packet *)skb->data; - bcast_packet->header.version = BATADV_COMPAT_VERSION; - bcast_packet->header.ttl = BATADV_TTL; + bcast_packet->version = BATADV_COMPAT_VERSION; + bcast_packet->ttl = BATADV_TTL; /* batman packet type: broadcast */ - bcast_packet->header.packet_type = BATADV_BCAST; + bcast_packet->packet_type = BATADV_BCAST; bcast_packet->reserved = 0; /* hw address of first interface is the orig mac because only @@ -328,7 +328,7 @@ void batadv_interface_rx(struct net_device *soft_iface, struct sk_buff *skb, struct batadv_hard_iface *recv_if, int hdr_size, struct batadv_orig_node *orig_node) { - struct batadv_header *batadv_header = (struct batadv_header *)skb->data; + struct batadv_bcast_packet *batadv_bcast_packet; struct batadv_priv *bat_priv = netdev_priv(soft_iface); __be16 ethertype = htons(ETH_P_BATMAN); struct vlan_ethhdr *vhdr; @@ -336,7 +336,8 @@ void batadv_interface_rx(struct net_device *soft_iface, unsigned short vid; bool is_bcast; - is_bcast = (batadv_header->packet_type == BATADV_BCAST); + batadv_bcast_packet = (struct batadv_bcast_packet *)skb->data; + is_bcast = (batadv_bcast_packet->packet_type == BATADV_BCAST); /* check if enough space is available for pulling, and pull */ if (!pskb_may_pull(skb, hdr_size)) -- cgit v0.10.2 From 27a417e6badac911487c10990e04f5ccbb11b1b2 Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Thu, 5 Dec 2013 15:33:00 +0100 Subject: batman-adv: fix size of batadv_icmp_header struct batadv_icmp_header currently has a size of 17, which will be padded to 20 on some architectures. 
Fix this by unrolling the header into the parent structures. Moreover keep the ICMP parsing functions as generic as they are now by using a stub icmp_header struct during packet parsing. Signed-off-by: Antonio Quartulli Signed-off-by: Marek Lindner diff --git a/net/batman-adv/main.c b/net/batman-adv/main.c index d87778b..1511f64 100644 --- a/net/batman-adv/main.c +++ b/net/batman-adv/main.c @@ -426,8 +426,8 @@ static void batadv_recv_handler_init(void) BUILD_BUG_ON(offsetof(struct batadv_unicast_packet, dest) != 4); BUILD_BUG_ON(offsetof(struct batadv_unicast_tvlv_packet, dst) != 4); BUILD_BUG_ON(offsetof(struct batadv_frag_packet, dest) != 4); - BUILD_BUG_ON(offsetof(struct batadv_icmp_packet, icmph.dst) != 4); - BUILD_BUG_ON(offsetof(struct batadv_icmp_packet_rr, icmph.dst) != 4); + BUILD_BUG_ON(offsetof(struct batadv_icmp_packet, dst) != 4); + BUILD_BUG_ON(offsetof(struct batadv_icmp_packet_rr, dst) != 4); /* broadcast packet */ batadv_rx_handler[BATADV_BCAST] = batadv_recv_bcast_packet; diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h index 175ce7d..2a857ed 100644 --- a/net/batman-adv/packet.h +++ b/net/batman-adv/packet.h @@ -191,7 +191,7 @@ struct batadv_ogm_packet { #define BATADV_OGM_HLEN sizeof(struct batadv_ogm_packet) /** - * batadv_icmp_header - common ICMP header + * batadv_icmp_header - common members among all the ICMP packets * @packet_type: batman-adv packet type, part of the general header * @version: batman-adv protocol version, part of the genereal header * @ttl: time to live for this packet, part of the genereal header @@ -199,6 +199,11 @@ struct batadv_ogm_packet { * @dst: address of the destination node * @orig: address of the source node * @uid: local ICMP socket identifier + * @align: not used - useful for alignment purposes only + * + * This structure is used for ICMP packets parsing only and it is never sent + * over the wire. The alignment field at the end is there to ensure that + * members are padded the same way as they are in real packets. 
*/ struct batadv_icmp_header { uint8_t packet_type; @@ -208,16 +213,29 @@ struct batadv_icmp_header { uint8_t dst[ETH_ALEN]; uint8_t orig[ETH_ALEN]; uint8_t uid; + uint8_t align[3]; }; /** * batadv_icmp_packet - ICMP packet - * @icmph: common ICMP header + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header + * @msg_type: ICMP packet type + * @dst: address of the destination node + * @orig: address of the source node + * @uid: local ICMP socket identifier * @reserved: not used - useful for alignment * @seqno: ICMP sequence number */ struct batadv_icmp_packet { - struct batadv_icmp_header icmph; + uint8_t packet_type; + uint8_t version; + uint8_t ttl; + uint8_t msg_type; /* see ICMP message types above */ + uint8_t dst[ETH_ALEN]; + uint8_t orig[ETH_ALEN]; + uint8_t uid; uint8_t reserved; __be16 seqno; }; @@ -226,13 +244,25 @@ struct batadv_icmp_packet { /** * batadv_icmp_packet_rr - ICMP RouteRecord packet - * @icmph: common ICMP header + * @packet_type: batman-adv packet type, part of the general header + * @version: batman-adv protocol version, part of the genereal header + * @ttl: time to live for this packet, part of the genereal header + * @msg_type: ICMP packet type + * @dst: address of the destination node + * @orig: address of the source node + * @uid: local ICMP socket identifier * @rr_cur: number of entries the rr array * @seqno: ICMP sequence number * @rr: route record array */ struct batadv_icmp_packet_rr { - struct batadv_icmp_header icmph; + uint8_t packet_type; + uint8_t version; + uint8_t ttl; + uint8_t msg_type; /* see ICMP message types above */ + uint8_t dst[ETH_ALEN]; + uint8_t orig[ETH_ALEN]; + uint8_t uid; uint8_t rr_cur; __be16 seqno; uint8_t rr[BATADV_RR_LEN][ETH_ALEN]; diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c index 5b52d71..46278bf 100644 --- a/net/batman-adv/routing.c +++ b/net/batman-adv/routing.c @@ -338,9 +338,9 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, icmp_packet = (struct batadv_icmp_packet *)skb->data; /* send TTL exceeded if packet is an echo request (traceroute) */ - if (icmp_packet->icmph.msg_type != BATADV_ECHO_REQUEST) { + if (icmp_packet->msg_type != BATADV_ECHO_REQUEST) { pr_debug("Warning - can't forward icmp packet from %pM to %pM: ttl exceeded\n", - icmp_packet->icmph.orig, icmp_packet->icmph.dst); + icmp_packet->orig, icmp_packet->dst); goto out; } @@ -349,7 +349,7 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, goto out; /* get routing information */ - orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->icmph.orig); + orig_node = batadv_orig_hash_find(bat_priv, icmp_packet->orig); if (!orig_node) goto out; @@ -359,11 +359,11 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv, icmp_packet = (struct batadv_icmp_packet *)skb->data; - memcpy(icmp_packet->icmph.dst, icmp_packet->icmph.orig, ETH_ALEN); - memcpy(icmp_packet->icmph.orig, primary_if->net_dev->dev_addr, + memcpy(icmp_packet->dst, icmp_packet->orig, ETH_ALEN); + memcpy(icmp_packet->orig, primary_if->net_dev->dev_addr, ETH_ALEN); - icmp_packet->icmph.msg_type = BATADV_TTL_EXCEEDED; - icmp_packet->icmph.ttl = BATADV_TTL; + icmp_packet->msg_type = BATADV_TTL_EXCEEDED; + icmp_packet->ttl = BATADV_TTL; if (batadv_send_skb_to_orig(skb, orig_node, NULL) != NET_XMIT_DROP) ret = NET_RX_SUCCESS; -- cgit v0.10.2 From 
2f7a318219186485c175339758ac42a03644200b Mon Sep 17 00:00:00 2001 From: Simon Wunderlich Date: Mon, 2 Dec 2013 20:38:33 +0100 Subject: batman-adv: fix size of batadv_bla_claim_dst Since this is a mac address and always 48 bit, and we can assume that it is always aligned to 2-byte boundaries, add a pack(2) pragma. Signed-off-by: Simon Wunderlich Signed-off-by: Marek Lindner Signed-off-by: Antonio Quartulli diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h index 2a857ed..04cf27c 100644 --- a/net/batman-adv/packet.h +++ b/net/batman-adv/packet.h @@ -155,6 +155,7 @@ enum batadv_tvlv_type { BATADV_TVLV_ROAM = 0x05, }; +#pragma pack(2) /* the destination hardware field in the ARP frame is used to * transport the claim type and the group id */ @@ -163,6 +164,7 @@ struct batadv_bla_claim_dst { uint8_t type; /* bla_claimframe */ __be16 group; /* group id */ }; +#pragma pack() /** * struct batadv_ogm_packet - ogm (routing protocol) packet -- cgit v0.10.2 From ca6630464454d74ae1d430e99007a1139f4a2aba Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Sun, 15 Dec 2013 13:26:55 +0100 Subject: batman-adv: fix alignment for batadv_tvlv_tt_change Make struct batadv_tvlv_tt_change a multiple 4 bytes long to avoid padding on any architecture. Signed-off-by: Antonio Quartulli Signed-off-by: Marek Lindner diff --git a/net/batman-adv/packet.h b/net/batman-adv/packet.h index 04cf27c..2dd8f24 100644 --- a/net/batman-adv/packet.h +++ b/net/batman-adv/packet.h @@ -484,13 +484,13 @@ struct batadv_tvlv_tt_vlan_data { * struct batadv_tvlv_tt_change - translation table diff data * @flags: status indicators concerning the non-mesh client (see * batadv_tt_client_flags) - * @reserved: reserved field + * @reserved: reserved field - useful for alignment purposes only * @addr: mac address of non-mesh client that triggered this tt change * @vid: VLAN identifier */ struct batadv_tvlv_tt_change { uint8_t flags; - uint8_t reserved; + uint8_t reserved[3]; uint8_t addr[ETH_ALEN]; __be16 vid; }; diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index 4add57d..ff625fe 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -333,7 +333,8 @@ static void batadv_tt_local_event(struct batadv_priv *bat_priv, return; tt_change_node->change.flags = flags; - tt_change_node->change.reserved = 0; + memset(tt_change_node->change.reserved, 0, + sizeof(tt_change_node->change.reserved)); memcpy(tt_change_node->change.addr, common->addr, ETH_ALEN); tt_change_node->change.vid = htons(common->vid); @@ -2221,7 +2222,8 @@ static void batadv_tt_tvlv_generate(struct batadv_priv *bat_priv, ETH_ALEN); tt_change->flags = tt_common_entry->flags; tt_change->vid = htons(tt_common_entry->vid); - tt_change->reserved = 0; + memset(tt_change->reserved, 0, + sizeof(tt_change->reserved)); tt_num_entries++; tt_change++; -- cgit v0.10.2 From 55883fd1048e09f5b6e1edaf0caf7e4f6f31f971 Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Mon, 23 Dec 2013 01:28:05 +0100 Subject: batman-adv: clean nf state when removing protocol header If an interface enslaved into batman-adv is a bridge (or a virtual interface built on top of a bridge) the nf_bridge member of the skbs reaching the soft-interface is filled with the state about "netfilter bridge" operations. Then, if one of such skbs is locally delivered, the nf_bridge member should be cleaned up to avoid that the old state could mess up with other "netfilter bridge" operations when entering a second bridge. 
This is needed because batman-adv is an encapsulation protocol. However at the moment skb->nf_bridge is not released at all leading to bogus "netfilter bridge" behaviours. Fix this by cleaning the netfilter state of the skb before it gets delivered to the upper layer in interface_rx(). Signed-off-by: Antonio Quartulli Signed-off-by: Marek Lindner diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index b4881f8..875a702 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -346,6 +346,11 @@ void batadv_interface_rx(struct net_device *soft_iface, skb_pull_rcsum(skb, hdr_size); skb_reset_mac_header(skb); + /* clean the netfilter state now that the batman-adv header has been + * removed + */ + nf_reset(skb); + vid = batadv_get_vid(skb, hdr_size); ethhdr = eth_hdr(skb); -- cgit v0.10.2 From 2b1e2cb3594df80446dc33bb8e12230da11f38ff Mon Sep 17 00:00:00 2001 From: Antonio Quartulli Date: Mon, 23 Dec 2013 21:43:39 +0100 Subject: batman-adv: fix vlan header access When batadv_get_vid() is invoked in interface_rx() the batman-adv header has already been removed, therefore the header_len argument has to be 0. Introduced by c018ad3de61a1dc4194879a53e5559e094aa7b1a ("batman-adv: add the VLAN ID attribute to the TT entry") Signed-off-by: Antonio Quartulli Signed-off-by: Marek Lindner diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c index 875a702..a8f99d1 100644 --- a/net/batman-adv/soft-interface.c +++ b/net/batman-adv/soft-interface.c @@ -351,7 +351,7 @@ void batadv_interface_rx(struct net_device *soft_iface, */ nf_reset(skb); - vid = batadv_get_vid(skb, hdr_size); + vid = batadv_get_vid(skb, 0); ethhdr = eth_hdr(skb); switch (ntohs(ethhdr->h_proto)) { -- cgit v0.10.2 From 2ee0d3c80fdb7974cfa1c7e25b5048e9fcaf69b6 Mon Sep 17 00:00:00 2001 From: Pablo Neira Ayuso Date: Sat, 28 Dec 2013 00:59:38 +0100 Subject: netfilter: nf_tables: fix wrong datatype in nft_validate_data_load() This patch fixes dictionary mappings, eg. add rule ip filter input meta dnat set tcp dport map { 22 => 1.1.1.1, 23 => 2.2.2.2 } The kernel was returning -EINVAL in nft_validate_data_load() since the type of the set element data that is passed was the real userspace datatype instead of NFT_DATA_VALUE. Signed-off-by: Pablo Neira Ayuso diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index d65c80b..71a9f49 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -2382,7 +2382,9 @@ static int nf_tables_bind_check_setelem(const struct nft_ctx *ctx, enum nft_registers dreg; dreg = nft_type_to_reg(set->dtype); - return nft_validate_data_load(ctx, dreg, &elem->data, set->dtype); + return nft_validate_data_load(ctx, dreg, &elem->data, + set->dtype == NFT_DATA_VERDICT ? + NFT_DATA_VERDICT : NFT_DATA_VALUE); } int nf_tables_bind_set(const struct nft_ctx *ctx, struct nft_set *set, -- cgit v0.10.2 From 9928422fef6f9aafc09d3c2fed4f803f67f5240b Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 11 Dec 2013 09:48:58 +0100 Subject: ARM: pxa: fix USB gadget driver compilation regression After commit 88f718e3fa4d67f3a8dbe79a2f97d722323e4051 "ARM: pxa: delete the custom GPIO header" a compilation error was introduced in the PXA25x gadget driver. 
An attempt to fix the problem was made in commit b144e4ab1ef130e8bf30bcd3e529b7f35112c503 "usb: gadget: fix pxa25x compilation problems" by explictly stating the driver needs the header, which solved the compilation for a few boards, such as the pxa255-idp and its defconfig. However the Lubbock board has this special clause in drivers/usb/gadget/pxa25x_udc.c: This include file has an implicit dependency on having been included before was included. Before commit 88f718e3fa4d67f3a8dbe79a2f97d722323e4051 "ARM: pxa: delete the custom GPIO header" this implicit dependency for the pxa25x_udc compile on the Lubbock was satisfied by implicitly including which was in turn including , apart from the earlier added . Fix this by having the PXA25x explicitly include . Reported-by: Russell King Cc: Greg Kroah-Hartmann Cc: Felipe Balbi Signed-off-by: Linus Walleij Signed-off-by: Haojian Zhuang Signed-off-by: Olof Johansson diff --git a/arch/arm/mach-pxa/include/mach/lubbock.h b/arch/arm/mach-pxa/include/mach/lubbock.h index 2a086e8..958cd6af 100644 --- a/arch/arm/mach-pxa/include/mach/lubbock.h +++ b/arch/arm/mach-pxa/include/mach/lubbock.h @@ -10,6 +10,8 @@ * published by the Free Software Foundation. */ +#include + #define LUBBOCK_ETH_PHYS PXA_CS3_PHYS #define LUBBOCK_FPGA_PHYS PXA_CS2_PHYS -- cgit v0.10.2 From 802eee95bde72fd0cd0f3a5b2098375a487d1eda Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 29 Dec 2013 16:01:33 -0800 Subject: Linux 3.13-rc6 diff --git a/Makefile b/Makefile index 14d592c..ab80be7 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ VERSION = 3 PATCHLEVEL = 13 SUBLEVEL = 0 -EXTRAVERSION = -rc5 +EXTRAVERSION = -rc6 NAME = One Giant Leap for Frogkind # *DOCUMENTATION* -- cgit v0.10.2 From 90ff5d688e61f49f23545ffab6228bd7e87e6dc7 Mon Sep 17 00:00:00 2001 From: Michael Neuling Date: Mon, 16 Dec 2013 15:12:43 +1100 Subject: powerpc: Fix bad stack check in exception entry In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1) is valid when coming from the kernel. If it's not valid, we die but with a nice oops message. Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we check to see if the stack pointer is negative. Unfortunately, this won't detect a bad stack where r1 is less than INT_FRAME_SIZE. This patch fixes the check to compare the modified r1 with -INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL pointers) are correctly detected again. Kudos to Paulus for finding this. 
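The arithmetic can be sketched in stand-alone C; this is illustrative only, the INT_FRAME_SIZE value below is a placeholder rather than the real frame size, and the helper is not kernel code. cmpdi is a signed compare, and 64-bit kernel addresses are negative when viewed as signed integers, which is what both tests rely on.

    /* Why comparing the already-decremented stack pointer against 0 misses a
     * NULL stack, while comparing against -INT_FRAME_SIZE catches it. */
    #include <stdint.h>
    #include <stdio.h>

    #define INT_FRAME_SIZE 0x70     /* placeholder value for illustration */

    static void check(int64_t r1)
    {
        int64_t after = r1 - INT_FRAME_SIZE;            /* subi r1,r1,INT_FRAME_SIZE */
        int old_ok = after < 0;                         /* old test: cmpdi r1,0      */
        int new_ok = after < -(int64_t)INT_FRAME_SIZE;  /* new test                  */

        printf("r1=%#18llx  old check says %s, new check says %s\n",
               (unsigned long long)r1,
               old_ok ? "valid" : "bad",
               new_ok ? "valid" : "bad");
    }

    int main(void)
    {
        check((int64_t)0xc000000012340000ull);  /* real kernel stack: both accept */
        check(0);                               /* NULL stack: only the new test rejects it */
        check(0x10);                            /* tiny bogus pointer: likewise */
        return 0;
    }

A genuine kernel stack pointer stays well below -INT_FRAME_SIZE after the frame is allocated, so it passes both tests, while a NULL or near-NULL r1 lands exactly at or just above -INT_FRAME_SIZE and is only rejected by the new comparison.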
Signed-off-by: Michael Neuling cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h index 894662a..243ce69 100644 --- a/arch/powerpc/include/asm/exception-64s.h +++ b/arch/powerpc/include/asm/exception-64s.h @@ -284,7 +284,7 @@ do_kvm_##n: \ subi r1,r1,INT_FRAME_SIZE; /* alloc frame on kernel stack */ \ beq- 1f; \ ld r1,PACAKSAVE(r13); /* kernel stack to use */ \ -1: cmpdi cr1,r1,0; /* check if r1 is in userspace */ \ +1: cmpdi cr1,r1,-INT_FRAME_SIZE; /* check if r1 is in userspace */ \ blt+ cr1,3f; /* abort if it is */ \ li r1,(n); /* will be reloaded later */ \ sth r1,PACA_TRAP_SAVE(r13); \ -- cgit v0.10.2 From e8a00ad5e238421ded856ea39f692b94c2d324eb Mon Sep 17 00:00:00 2001 From: Rajesh B Prathipati Date: Mon, 16 Dec 2013 18:58:22 +1100 Subject: powerpc: Make unaligned accesses endian-safe for powerpc The generic put_unaligned/get_unaligned macros were made endian-safe by calling the appropriate endian dependent macros based on the endian type of the powerpc processor. Signed-off-by: Rajesh B Prathipati Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/include/asm/unaligned.h b/arch/powerpc/include/asm/unaligned.h index 5f1b1e3..8296381 100644 --- a/arch/powerpc/include/asm/unaligned.h +++ b/arch/powerpc/include/asm/unaligned.h @@ -4,13 +4,18 @@ #ifdef __KERNEL__ /* - * The PowerPC can do unaligned accesses itself in big endian mode. + * The PowerPC can do unaligned accesses itself based on its endian mode. */ #include #include +#ifdef __LITTLE_ENDIAN__ +#define get_unaligned __get_unaligned_le +#define put_unaligned __put_unaligned_le +#else #define get_unaligned __get_unaligned_be #define put_unaligned __put_unaligned_be +#endif #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_UNALIGNED_H */ -- cgit v0.10.2 From 20151169f1de4b170368fdb574024027620d0d49 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 18 Dec 2013 09:29:57 +1100 Subject: powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian The powerpc 64-bit __copy_tofrom_user() function uses shifts to handle unaligned invocations. However, these shifts were designed for big-endian systems: On little-endian systems, they must shift in the opposite direction. This commit relies on the C preprocessor to insert the correct shifts into the assembly code. [ This is a rare but nasty LE issue. Most of the time we use the POWER7 optimised __copy_tofrom_user_power7 loop, but when it hits an exception we fall back to the base __copy_tofrom_user loop. - Anton ] Signed-off-by: Paul E. McKenney Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/lib/copyuser_64.S b/arch/powerpc/lib/copyuser_64.S index d73a590..596a285 100644 --- a/arch/powerpc/lib/copyuser_64.S +++ b/arch/powerpc/lib/copyuser_64.S @@ -9,6 +9,14 @@ #include #include +#ifdef __BIG_ENDIAN__ +#define sLd sld /* Shift towards low-numbered address. */ +#define sHd srd /* Shift towards high-numbered address. */ +#else +#define sLd srd /* Shift towards low-numbered address. */ +#define sHd sld /* Shift towards high-numbered address. 
*/ +#endif + .align 7 _GLOBAL(__copy_tofrom_user) BEGIN_FTR_SECTION @@ -118,10 +126,10 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_LD_STD) 24: ld r9,0(r4) /* 3+2n loads, 2+2n stores */ 25: ld r0,8(r4) - sld r6,r9,r10 + sLd r6,r9,r10 26: ldu r9,16(r4) - srd r7,r0,r11 - sld r8,r0,r10 + sHd r7,r0,r11 + sLd r8,r0,r10 or r7,r7,r6 blt cr6,79f 27: ld r0,8(r4) @@ -129,35 +137,35 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_LD_STD) 28: ld r0,0(r4) /* 4+2n loads, 3+2n stores */ 29: ldu r9,8(r4) - sld r8,r0,r10 + sLd r8,r0,r10 addi r3,r3,-8 blt cr6,5f 30: ld r0,8(r4) - srd r12,r9,r11 - sld r6,r9,r10 + sHd r12,r9,r11 + sLd r6,r9,r10 31: ldu r9,16(r4) or r12,r8,r12 - srd r7,r0,r11 - sld r8,r0,r10 + sHd r7,r0,r11 + sLd r8,r0,r10 addi r3,r3,16 beq cr6,78f 1: or r7,r7,r6 32: ld r0,8(r4) 76: std r12,8(r3) -2: srd r12,r9,r11 - sld r6,r9,r10 +2: sHd r12,r9,r11 + sLd r6,r9,r10 33: ldu r9,16(r4) or r12,r8,r12 77: stdu r7,16(r3) - srd r7,r0,r11 - sld r8,r0,r10 + sHd r7,r0,r11 + sLd r8,r0,r10 bdnz 1b 78: std r12,8(r3) or r7,r7,r6 79: std r7,16(r3) -5: srd r12,r9,r11 +5: sHd r12,r9,r11 or r12,r8,r12 80: std r12,24(r3) bne 6f @@ -165,23 +173,38 @@ END_FTR_SECTION_IFCLR(CPU_FTR_UNALIGNED_LD_STD) blr 6: cmpwi cr1,r5,8 addi r3,r3,32 - sld r9,r9,r10 + sLd r9,r9,r10 ble cr1,7f 34: ld r0,8(r4) - srd r7,r0,r11 + sHd r7,r0,r11 or r9,r7,r9 7: bf cr7*4+1,1f +#ifdef __BIG_ENDIAN__ rotldi r9,r9,32 +#endif 94: stw r9,0(r3) +#ifdef __LITTLE_ENDIAN__ + rotrdi r9,r9,32 +#endif addi r3,r3,4 1: bf cr7*4+2,2f +#ifdef __BIG_ENDIAN__ rotldi r9,r9,16 +#endif 95: sth r9,0(r3) +#ifdef __LITTLE_ENDIAN__ + rotrdi r9,r9,16 +#endif addi r3,r3,2 2: bf cr7*4+3,3f +#ifdef __BIG_ENDIAN__ rotldi r9,r9,8 +#endif 96: stb r9,0(r3) +#ifdef __LITTLE_ENDIAN__ + rotrdi r9,r9,8 +#endif 3: li r3,0 blr -- cgit v0.10.2 From 20acebdfaec3e15d8a27ab25bf4a2f91b2afa757 Mon Sep 17 00:00:00 2001 From: Brian W Hart Date: Thu, 19 Dec 2013 17:14:07 -0600 Subject: powernv/eeh: Fix possible buffer overrun in ioda_eeh_phb_diag() PHB diagnostic buffer may be smaller than PAGE_SIZE, especially when PAGE_SIZE > 4KB. Signed-off-by: Brian W Hart Acked-by: Gavin Shan Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index 02245ce..8184ef5 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -820,14 +820,15 @@ static void ioda_eeh_phb_diag(struct pci_controller *hose) struct OpalIoPhbErrorCommon *common; long rc; - common = (struct OpalIoPhbErrorCommon *)phb->diag.blob; - rc = opal_pci_get_phb_diag_data2(phb->opal_id, common, PAGE_SIZE); + rc = opal_pci_get_phb_diag_data2(phb->opal_id, phb->diag.blob, + PNV_PCI_DIAG_BUF_SIZE); if (rc != OPAL_SUCCESS) { pr_warning("%s: Failed to get diag-data for PHB#%x (%ld)\n", __func__, hose->global_number, rc); return; } + common = (struct OpalIoPhbErrorCommon *)phb->diag.blob; switch (common->ioType) { case OPAL_PHB_ERROR_DATA_TYPE_P7IOC: ioda_eeh_p7ioc_phb_diag(hose, common); -- cgit v0.10.2 From ca1de5deb782e1636ed5b898e215a8840ae39230 Mon Sep 17 00:00:00 2001 From: Brian W Hart Date: Fri, 20 Dec 2013 13:06:01 -0600 Subject: powernv/eeh: Add buffer for P7IOC hub error data Prevent ioda_eeh_hub_diag() from clobbering itself when called by supplying a per-PHB buffer for P7IOC hub diagnostic data. Take care to inform OPAL of the correct size for the buffer. 
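These diag-data fixes follow one pattern: give each controller its own diagnostic buffer and tell firmware the real size of that buffer rather than PAGE_SIZE. Below is a minimal, self-contained C sketch of that pattern; the names and the firmware stub are hypothetical and this is not the powernv driver.

/*
 * Minimal sketch with made-up names; the firmware call is a stub that
 * only writes the number of bytes it is told the buffer holds.
 */
#include <stddef.h>
#include <string.h>

#define DIAG_BUF_SIZE 4096		/* stand-in for the real buffer size */

struct hub_diag_data {
	unsigned char bytes[DIAG_BUF_SIZE];
};

struct controller {
	unsigned long hub_id;
	struct hub_diag_data hub_diag;	/* one buffer per controller, not shared */
};

static long firmware_get_hub_diag(unsigned long hub_id, void *buf, size_t len)
{
	(void)hub_id;
	memset(buf, 0, len);
	return 0;
}

static long fetch_hub_diag(struct controller *c)
{
	struct hub_diag_data *data = &c->hub_diag;

	/*
	 * sizeof(*data) always matches the destination object, unlike a
	 * hard-coded PAGE_SIZE that may be larger than the buffer.
	 */
	return firmware_get_hub_diag(c->hub_id, data, sizeof(*data));
}

int main(void)
{
	struct controller c = { .hub_id = 1 };

	return (int)fetch_hub_diag(&c);
}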
[Small style change to the use of sizeof -- BenH] Signed-off-by: Brian W Hart Acked-by: Gavin Shan Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/platforms/powernv/eeh-ioda.c b/arch/powerpc/platforms/powernv/eeh-ioda.c index 8184ef5..d7ddcee 100644 --- a/arch/powerpc/platforms/powernv/eeh-ioda.c +++ b/arch/powerpc/platforms/powernv/eeh-ioda.c @@ -36,7 +36,6 @@ #include "powernv.h" #include "pci.h" -static char *hub_diag = NULL; static int ioda_eeh_nb_init = 0; static int ioda_eeh_event(struct notifier_block *nb, @@ -140,15 +139,6 @@ static int ioda_eeh_post_init(struct pci_controller *hose) ioda_eeh_nb_init = 1; } - /* We needn't HUB diag-data on PHB3 */ - if (phb->type == PNV_PHB_IODA1 && !hub_diag) { - hub_diag = (char *)__get_free_page(GFP_KERNEL | __GFP_ZERO); - if (!hub_diag) { - pr_err("%s: Out of memory !\n", __func__); - return -ENOMEM; - } - } - #ifdef CONFIG_DEBUG_FS if (phb->dbgfs) { debugfs_create_file("err_injct_outbound", 0600, @@ -633,11 +623,10 @@ static void ioda_eeh_hub_diag_common(struct OpalIoP7IOCErrorData *data) static void ioda_eeh_hub_diag(struct pci_controller *hose) { struct pnv_phb *phb = hose->private_data; - struct OpalIoP7IOCErrorData *data; + struct OpalIoP7IOCErrorData *data = &phb->diag.hub_diag; long rc; - data = (struct OpalIoP7IOCErrorData *)ioda_eeh_hub_diag; - rc = opal_pci_get_hub_diag_data(phb->hub_id, data, PAGE_SIZE); + rc = opal_pci_get_hub_diag_data(phb->hub_id, data, sizeof(*data)); if (rc != OPAL_SUCCESS) { pr_warning("%s: Failed to get HUB#%llx diag-data (%ld)\n", __func__, phb->hub_id, rc); diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h index 911c24e..1ed8d5f 100644 --- a/arch/powerpc/platforms/powernv/pci.h +++ b/arch/powerpc/platforms/powernv/pci.h @@ -172,11 +172,13 @@ struct pnv_phb { } ioda; }; - /* PHB status structure */ + /* PHB and hub status structure */ union { unsigned char blob[PNV_PCI_DIAG_BUF_SIZE]; struct OpalIoP7IOCPhbErrorData p7ioc; + struct OpalIoP7IOCErrorData hub_diag; } diag; + }; extern struct pci_ops pnv_pci_ops; -- cgit v0.10.2 From 286e4f90a72c0b0621dde0294af6ed4b0baddabb Mon Sep 17 00:00:00 2001 From: Anton Blanchard Date: Mon, 23 Dec 2013 12:19:51 +1100 Subject: powerpc: Align p_end p_end is an 8 byte value embedded in the text section. This means it is only 4 byte aligned when it should be 8 byte aligned. Fix this by adding an explicit alignment. This fixes an issue where POWER7 little endian builds with CONFIG_RELOCATABLE=y fail to boot. Signed-off-by: Anton Blanchard Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index 2ae41ab..fad2abd 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -470,6 +470,7 @@ _STATIC(__after_prom_start) mtctr r8 bctr +.balign 8 p_end: .llong _end - _stext 4: /* Now copy the rest of the kernel up to _end */ -- cgit v0.10.2 From 7d4151b5098fb0bf7f6f8d1156e1ab9d83260580 Mon Sep 17 00:00:00 2001 From: Olof Johansson Date: Sat, 28 Dec 2013 13:01:47 -0800 Subject: powerpc: Fix alignment of secondary cpu spin vars Commit 5c0484e25ec0 ('powerpc: Endian safe trampoline') resulted in losing proper alignment of the spinlock variables used when booting secondary CPUs, causing some quite odd issues with failing to boot on PA Semi-based systems. This showed itself on ppc64_defconfig, but not on pasemi_defconfig, so it had gone unnoticed when I initially tested the LE patch set. Fix is to add explicit alignment instead of relying on good luck. 
:) [ It appears that there is a different issue with PA Semi systems however this fix is definitely correct so applying anyway -- BenH ] Fixes: 5c0484e25ec0 ('powerpc: Endian safe trampoline') Reported-by: Christian Zigotzky Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=67811 Signed-off-by: Olof Johansson Signed-off-by: Benjamin Herrenschmidt diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S index fad2abd..4f0946d 100644 --- a/arch/powerpc/kernel/head_64.S +++ b/arch/powerpc/kernel/head_64.S @@ -80,6 +80,7 @@ END_FTR_SECTION(0, 1) * of the function that the cpu should jump to to continue * initialization. */ + .balign 8 .globl __secondary_hold_spinloop __secondary_hold_spinloop: .llong 0x0 -- cgit v0.10.2 From 84602761ca4495dd409be936dfa93ed20c946684 Mon Sep 17 00:00:00 2001 From: Ying Xue Date: Fri, 27 Dec 2013 10:18:28 +0800 Subject: tipc: fix deadlock during socket release A deadlock might occur if name table is withdrawn in socket release routine, and while packets are still being received from bearer. CPU0 CPU1 T0: recv_msg() release() T1: tipc_recv_msg() tipc_withdraw() T2: [grab node lock] [grab port lock] T3: tipc_link_wakeup_ports() tipc_nametbl_withdraw() T4: [grab port lock]* named_cluster_distribute() T5: wakeupdispatch() tipc_link_send() T6: [grab node lock]* The opposite order of holding port lock and node lock on above two different paths may result in a deadlock. If socket lock instead of port lock is used to protect port instance in tipc_withdraw(), the reverse order of holding port lock and node lock will be eliminated, as a result, the deadlock is killed as well. Reported-by: Lars Everbrand Reviewed-by: Erik Hugne Signed-off-by: Ying Xue Signed-off-by: David S. Miller diff --git a/net/tipc/port.c b/net/tipc/port.c index c081a76..d43f318 100644 --- a/net/tipc/port.c +++ b/net/tipc/port.c @@ -251,18 +251,15 @@ struct tipc_port *tipc_createport(struct sock *sk, return p_ptr; } -int tipc_deleteport(u32 ref) +int tipc_deleteport(struct tipc_port *p_ptr) { - struct tipc_port *p_ptr; struct sk_buff *buf = NULL; - tipc_withdraw(ref, 0, NULL); - p_ptr = tipc_port_lock(ref); - if (!p_ptr) - return -EINVAL; + tipc_withdraw(p_ptr, 0, NULL); - tipc_ref_discard(ref); - tipc_port_unlock(p_ptr); + spin_lock_bh(p_ptr->lock); + tipc_ref_discard(p_ptr->ref); + spin_unlock_bh(p_ptr->lock); k_cancel_timer(&p_ptr->timer); if (p_ptr->connected) { @@ -704,47 +701,36 @@ int tipc_set_portimportance(u32 ref, unsigned int imp) } -int tipc_publish(u32 ref, unsigned int scope, struct tipc_name_seq const *seq) +int tipc_publish(struct tipc_port *p_ptr, unsigned int scope, + struct tipc_name_seq const *seq) { - struct tipc_port *p_ptr; struct publication *publ; u32 key; - int res = -EINVAL; - p_ptr = tipc_port_lock(ref); - if (!p_ptr) + if (p_ptr->connected) return -EINVAL; + key = p_ptr->ref + p_ptr->pub_count + 1; + if (key == p_ptr->ref) + return -EADDRINUSE; - if (p_ptr->connected) - goto exit; - key = ref + p_ptr->pub_count + 1; - if (key == ref) { - res = -EADDRINUSE; - goto exit; - } publ = tipc_nametbl_publish(seq->type, seq->lower, seq->upper, scope, p_ptr->ref, key); if (publ) { list_add(&publ->pport_list, &p_ptr->publications); p_ptr->pub_count++; p_ptr->published = 1; - res = 0; + return 0; } -exit: - tipc_port_unlock(p_ptr); - return res; + return -EINVAL; } -int tipc_withdraw(u32 ref, unsigned int scope, struct tipc_name_seq const *seq) +int tipc_withdraw(struct tipc_port *p_ptr, unsigned int scope, + struct tipc_name_seq const *seq) { - struct tipc_port 
*p_ptr; struct publication *publ; struct publication *tpubl; int res = -EINVAL; - p_ptr = tipc_port_lock(ref); - if (!p_ptr) - return -EINVAL; if (!seq) { list_for_each_entry_safe(publ, tpubl, &p_ptr->publications, pport_list) { @@ -771,7 +757,6 @@ int tipc_withdraw(u32 ref, unsigned int scope, struct tipc_name_seq const *seq) } if (list_empty(&p_ptr->publications)) p_ptr->published = 0; - tipc_port_unlock(p_ptr); return res; } diff --git a/net/tipc/port.h b/net/tipc/port.h index 9122535..34f12bd 100644 --- a/net/tipc/port.h +++ b/net/tipc/port.h @@ -116,7 +116,7 @@ int tipc_reject_msg(struct sk_buff *buf, u32 err); void tipc_acknowledge(u32 port_ref, u32 ack); -int tipc_deleteport(u32 portref); +int tipc_deleteport(struct tipc_port *p_ptr); int tipc_portimportance(u32 portref, unsigned int *importance); int tipc_set_portimportance(u32 portref, unsigned int importance); @@ -127,9 +127,9 @@ int tipc_set_portunreliable(u32 portref, unsigned int isunreliable); int tipc_portunreturnable(u32 portref, unsigned int *isunreturnable); int tipc_set_portunreturnable(u32 portref, unsigned int isunreturnable); -int tipc_publish(u32 portref, unsigned int scope, +int tipc_publish(struct tipc_port *p_ptr, unsigned int scope, struct tipc_name_seq const *name_seq); -int tipc_withdraw(u32 portref, unsigned int scope, +int tipc_withdraw(struct tipc_port *p_ptr, unsigned int scope, struct tipc_name_seq const *name_seq); int tipc_connect(u32 portref, struct tipc_portid const *port); diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 3b61851..e741416 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -354,7 +354,7 @@ static int release(struct socket *sock) * Delete TIPC port; this ensures no more messages are queued * (also disconnects an active connection & sends a 'FIN-' to peer) */ - res = tipc_deleteport(tport->ref); + res = tipc_deleteport(tport); /* Discard any remaining (connection-based) messages in receive queue */ __skb_queue_purge(&sk->sk_receive_queue); @@ -386,30 +386,46 @@ static int release(struct socket *sock) */ static int bind(struct socket *sock, struct sockaddr *uaddr, int uaddr_len) { + struct sock *sk = sock->sk; struct sockaddr_tipc *addr = (struct sockaddr_tipc *)uaddr; - u32 portref = tipc_sk_port(sock->sk)->ref; + struct tipc_port *tport = tipc_sk_port(sock->sk); + int res = -EINVAL; - if (unlikely(!uaddr_len)) - return tipc_withdraw(portref, 0, NULL); + lock_sock(sk); + if (unlikely(!uaddr_len)) { + res = tipc_withdraw(tport, 0, NULL); + goto exit; + } - if (uaddr_len < sizeof(struct sockaddr_tipc)) - return -EINVAL; - if (addr->family != AF_TIPC) - return -EAFNOSUPPORT; + if (uaddr_len < sizeof(struct sockaddr_tipc)) { + res = -EINVAL; + goto exit; + } + if (addr->family != AF_TIPC) { + res = -EAFNOSUPPORT; + goto exit; + } if (addr->addrtype == TIPC_ADDR_NAME) addr->addr.nameseq.upper = addr->addr.nameseq.lower; - else if (addr->addrtype != TIPC_ADDR_NAMESEQ) - return -EAFNOSUPPORT; + else if (addr->addrtype != TIPC_ADDR_NAMESEQ) { + res = -EAFNOSUPPORT; + goto exit; + } if ((addr->addr.nameseq.type < TIPC_RESERVED_TYPES) && (addr->addr.nameseq.type != TIPC_TOP_SRV) && - (addr->addr.nameseq.type != TIPC_CFG_SRV)) - return -EACCES; + (addr->addr.nameseq.type != TIPC_CFG_SRV)) { + res = -EACCES; + goto exit; + } - return (addr->scope > 0) ? - tipc_publish(portref, addr->scope, &addr->addr.nameseq) : - tipc_withdraw(portref, -addr->scope, &addr->addr.nameseq); + res = (addr->scope > 0) ? 
+ tipc_publish(tport, addr->scope, &addr->addr.nameseq) : + tipc_withdraw(tport, -addr->scope, &addr->addr.nameseq); +exit: + release_sock(sk); + return res; } /** -- cgit v0.10.2 From 7a399e3a2e05bc580a78ea72371b3896827f72e1 Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Tue, 24 Dec 2013 12:16:01 -0200 Subject: fec: Do not assume that PHY reset is active low We should not assume that the PHY reset is always active low. Retrieve this information from the device tree instead, so that the PHY reset can work on both cases. Reported-by: Philipp Zabel Signed-off-by: Fabio Estevam Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 50bb71c..45b8b22 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2049,6 +2049,8 @@ static void fec_reset_phy(struct platform_device *pdev) int err, phy_reset; int msec = 1; struct device_node *np = pdev->dev.of_node; + enum of_gpio_flags flags; + bool port; if (!np) return; @@ -2058,18 +2060,22 @@ static void fec_reset_phy(struct platform_device *pdev) if (msec > 1000) msec = 1; - phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0); + phy_reset = of_get_named_gpio_flags(np, "phy-reset-gpios", 0, &flags); if (!gpio_is_valid(phy_reset)) return; - err = devm_gpio_request_one(&pdev->dev, phy_reset, - GPIOF_OUT_INIT_LOW, "phy-reset"); + if (flags & OF_GPIO_ACTIVE_LOW) + port = GPIOF_OUT_INIT_LOW; + else + port = GPIOF_OUT_INIT_HIGH; + + err = devm_gpio_request_one(&pdev->dev, phy_reset, port, "phy-reset"); if (err) { dev_err(&pdev->dev, "failed to get phy-reset-gpios: %d\n", err); return; } msleep(msec); - gpio_set_value(phy_reset, 1); + gpio_set_value(phy_reset, !port); } #else /* CONFIG_OF */ static void fec_reset_phy(struct platform_device *pdev) -- cgit v0.10.2 From ac3d5ac277352fe6e27809286768e9f1f8aa388d Mon Sep 17 00:00:00 2001 From: Paul Durrant Date: Mon, 23 Dec 2013 09:27:17 +0000 Subject: xen-netback: fix guest-receive-side array sizes The sizes chosen for the metadata and grant_copy_op arrays on the guest receive size are wrong; - The meta array is needlessly twice the ring size, when we only ever consume a single array element per RX ring slot - The grant_copy_op array is way too small. It's sized based on a bogus assumption: that at most two copy ops will be used per ring slot. This may have been true at some point in the past but it's clear from looking at start_new_rx_buffer() that a new ring slot is only consumed if a frag would overflow the current slot (plus some other conditions) so the actual limit is MAX_SKB_FRAGS grant_copy_ops per ring slot. This patch fixes those two sizing issues and, because grant_copy_ops grows so much, it pulls it out into a separate chunk of vmalloc()ed memory. Signed-off-by: Paul Durrant Acked-by: Wei Liu Cc: Ian Campbell Cc: David Vrabel Signed-off-by: David S. Miller diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h index 08ae01b..c47794b 100644 --- a/drivers/net/xen-netback/common.h +++ b/drivers/net/xen-netback/common.h @@ -101,6 +101,13 @@ struct xenvif_rx_meta { #define MAX_PENDING_REQS 256 +/* It's possible for an skb to have a maximal number of frags + * but still be less than MAX_BUFFER_OFFSET in size. Thus the + * worst-case number of copy operations is MAX_SKB_FRAGS per + * ring slot. + */ +#define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE) + struct xenvif { /* Unique identifier for this interface. 
*/ domid_t domid; @@ -143,13 +150,13 @@ struct xenvif { */ RING_IDX rx_req_cons_peek; - /* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each - * head/fragment page uses 2 copy operations because it - * straddles two buffers in the frontend. - */ - struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE]; - struct xenvif_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE]; + /* This array is allocated seperately as it is large */ + struct gnttab_copy *grant_copy_op; + /* We create one meta structure per ring request we consume, so + * the maximum number is the same as the ring size. + */ + struct xenvif_rx_meta meta[XEN_NETIF_RX_RING_SIZE]; u8 fe_dev_addr[6]; diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 870f1fa..34ca4e5 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -307,6 +307,15 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid, SET_NETDEV_DEV(dev, parent); vif = netdev_priv(dev); + + vif->grant_copy_op = vmalloc(sizeof(struct gnttab_copy) * + MAX_GRANT_COPY_OPS); + if (vif->grant_copy_op == NULL) { + pr_warn("Could not allocate grant copy space for %s\n", name); + free_netdev(dev); + return ERR_PTR(-ENOMEM); + } + vif->domid = domid; vif->handle = handle; vif->can_sg = 1; @@ -487,6 +496,7 @@ void xenvif_free(struct xenvif *vif) unregister_netdev(vif->dev); + vfree(vif->grant_copy_op); free_netdev(vif->dev); module_put(THIS_MODULE); diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 7b4fd93..7842555 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c @@ -608,7 +608,7 @@ void xenvif_rx_action(struct xenvif *vif) if (!npo.copy_prod) return; - BUG_ON(npo.copy_prod > ARRAY_SIZE(vif->grant_copy_op)); + BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS); gnttab_batch_copy(vif->grant_copy_op, npo.copy_prod); while ((skb = __skb_dequeue(&rxq)) != NULL) { -- cgit v0.10.2 From f81152e35001e91997ec74a7b4e040e6ab0acccf Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 23 Dec 2013 00:32:31 +0100 Subject: net: rose: restore old recvmsg behavior recvmsg handler in net/rose/af_rose.c performs size-check ->msg_namelen. After commit f3d3342602f8bcbf37d7c46641cb9bca7618eb1c (net: rework recvmsg handler msg_name and msg_namelen logic), we now always take the else branch due to namelen being initialized to 0. Digging in netdev-vger-cvs git repo shows that msg_namelen was initialized with a fixed-size since at least 1995, so the else branch was never taken. Compile tested only. Signed-off-by: Florian Westphal Acked-by: Hannes Frederic Sowa Signed-off-by: David S. 
Miller diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c index 33af772..62ced65 100644 --- a/net/rose/af_rose.c +++ b/net/rose/af_rose.c @@ -1253,6 +1253,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock, if (msg->msg_name) { struct sockaddr_rose *srose; + struct full_sockaddr_rose *full_srose = msg->msg_name; memset(msg->msg_name, 0, sizeof(struct full_sockaddr_rose)); srose = msg->msg_name; @@ -1260,18 +1261,9 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock, srose->srose_addr = rose->dest_addr; srose->srose_call = rose->dest_call; srose->srose_ndigis = rose->dest_ndigis; - if (msg->msg_namelen >= sizeof(struct full_sockaddr_rose)) { - struct full_sockaddr_rose *full_srose = (struct full_sockaddr_rose *)msg->msg_name; - for (n = 0 ; n < rose->dest_ndigis ; n++) - full_srose->srose_digis[n] = rose->dest_digis[n]; - msg->msg_namelen = sizeof(struct full_sockaddr_rose); - } else { - if (rose->dest_ndigis >= 1) { - srose->srose_ndigis = 1; - srose->srose_digi = rose->dest_digis[0]; - } - msg->msg_namelen = sizeof(struct sockaddr_rose); - } + for (n = 0 ; n < rose->dest_ndigis ; n++) + full_srose->srose_digis[n] = rose->dest_digis[n]; + msg->msg_namelen = sizeof(struct full_sockaddr_rose); } skb_free_datagram(sk, skb); -- cgit v0.10.2 From 33c133cc7598e60976a069344910d63e56cc4401 Mon Sep 17 00:00:00 2001 From: Sergei Shtylyov Date: Fri, 20 Dec 2013 22:09:04 +0300 Subject: phy: IRQ cannot be shared With the way PHY IRQ handler is implemented (all real handling being pushed to the workqueue and returning IRQ_HANDLED all the time PHY is active), we cannot really claim that PHY IRQ can be shared when calling request_irq(). Signed-off-by: Sergei Shtylyov Acked-by: Florian Fainelli Signed-off-by: David S. Miller diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index 36c6994..98434b8 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -565,10 +565,8 @@ int phy_start_interrupts(struct phy_device *phydev) int err = 0; atomic_set(&phydev->irq_disable, 0); - if (request_irq(phydev->irq, phy_interrupt, - IRQF_SHARED, - "phy_interrupt", - phydev) < 0) { + if (request_irq(phydev->irq, phy_interrupt, 0, "phy_interrupt", + phydev) < 0) { pr_warn("%s: Can't get IRQ %d (PHY)\n", phydev->bus->name, phydev->irq); phydev->irq = PHY_POLL; -- cgit v0.10.2 From 7cd013992335b1c5156059248ee765fb3b14d154 Mon Sep 17 00:00:00 2001 From: Vince Bridgers Date: Fri, 20 Dec 2013 11:19:34 -0600 Subject: stmmac: Fix incorrect spinlock release and PTP cap detection. This patch corrects a problem in stmmac_ptp.c, functions stmmac_adjust_time and stmmac_adjust_freq where the incorrect spinlocks were released. This patch also addresses a problem in stmmac_main, function stmmac_init_ptp where the capability detection for advanced timestamping was masked by message masking. This patch was touch tested using linuxptp, and runs without the previously observed instabilities. More extensive testing is ongoing. Vince Signed-off-by: Vince Bridgers Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 8a7a23a..797b56a 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -622,17 +622,15 @@ static int stmmac_init_ptp(struct stmmac_priv *priv) if (!(priv->dma_cap.time_stamp || priv->dma_cap.atime_stamp)) return -EOPNOTSUPP; - if (netif_msg_hw(priv)) { - if (priv->dma_cap.time_stamp) { - pr_debug("IEEE 1588-2002 Time Stamp supported\n"); - priv->adv_ts = 0; - } - if (priv->dma_cap.atime_stamp && priv->extend_desc) { - pr_debug - ("IEEE 1588-2008 Advanced Time Stamp supported\n"); - priv->adv_ts = 1; - } - } + priv->adv_ts = 0; + if (priv->dma_cap.atime_stamp && priv->extend_desc) + priv->adv_ts = 1; + + if (netif_msg_hw(priv) && priv->dma_cap.time_stamp) + pr_debug("IEEE 1588-2002 Time Stamp supported\n"); + + if (netif_msg_hw(priv) && priv->adv_ts) + pr_debug("IEEE 1588-2008 Advanced Time Stamp supported\n"); priv->hw->ptp = &stmmac_ptp; priv->hwts_tx_en = 0; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c index b8b0eee..7680581 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ptp.c @@ -56,7 +56,7 @@ static int stmmac_adjust_freq(struct ptp_clock_info *ptp, s32 ppb) priv->hw->ptp->config_addend(priv->ioaddr, addend); - spin_unlock_irqrestore(&priv->lock, flags); + spin_unlock_irqrestore(&priv->ptp_lock, flags); return 0; } @@ -91,7 +91,7 @@ static int stmmac_adjust_time(struct ptp_clock_info *ptp, s64 delta) priv->hw->ptp->adjust_systime(priv->ioaddr, sec, nsec, neg_adj); - spin_unlock_irqrestore(&priv->lock, flags); + spin_unlock_irqrestore(&priv->ptp_lock, flags); return 0; } -- cgit v0.10.2 From 13fcca8f25f4e9ce7f55da9cd353bb743236e212 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Sun, 29 Dec 2013 19:37:43 -0600 Subject: Revert "of/address: Handle #address-cells > 2 specially" This reverts commit e38c0a1fbc5803cbacdaac0557c70ac8ca5152e7. Nikita Yushchenko reports: While trying to make freescale p2020ds and mpc8572ds boards working with mainline kernel, I faced that commit e38c0a1f (Handle Both these boards have uli1575 chip. Corresponding part in device tree is something like uli1575@0 { reg = <0x0 0x0 0x0 0x0 0x0>; #size-cells = <2>; #address-cells = <3>; ranges = <0x2000000 0x0 0x80000000 0x2000000 0x0 0x80000000 0x0 0x20000000 0x1000000 0x0 0x0 0x1000000 0x0 0x0 0x0 0x10000>; isa@1e { ... I.e. it has #address-cells = <3> With commit e38c0a1f reverted, devices under uli1575 are registered correctly, e.g. for rtc OF: ** translation for device /pcie@ffe09000/pcie@0/uli1575@0/isa@1e/rtc@70 ** OF: bus is isa (na=2, ns=1) on /pcie@ffe09000/pcie@0/uli1575@0/isa@1e OF: translating address: 00000001 00000070 OF: parent bus is default (na=3, ns=2) on /pcie@ffe09000/pcie@0/uli1575@0 OF: walking ranges... OF: ISA map, cp=0, s=1000, da=70 OF: parent translation for: 01000000 00000000 00000000 OF: with offset: 70 OF: one level translation: 00000000 00000000 00000070 OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000/pcie@0 OF: walking ranges... OF: default map, cp=a0000000, s=20000000, da=70 OF: default map, cp=0, s=10000, da=70 OF: parent translation for: 01000000 00000000 00000000 OF: with offset: 70 OF: one level translation: 01000000 00000000 00000070 OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000 OF: walking ranges... 
OF: PCI map, cp=0, s=10000, da=70 OF: parent translation for: 01000000 00000000 00000000 OF: with offset: 70 OF: one level translation: 01000000 00000000 00000070 OF: parent bus is default (na=2, ns=2) on / OF: walking ranges... OF: PCI map, cp=0, s=10000, da=70 OF: parent translation for: 00000000 ffc10000 OF: with offset: 70 OF: one level translation: 00000000 ffc10070 OF: reached root node With commit e38c0a1f in place, address translation fails: OF: ** translation for device /pcie@ffe09000/pcie@0/uli1575@0/isa@1e/rtc@70 ** OF: bus is isa (na=2, ns=1) on /pcie@ffe09000/pcie@0/uli1575@0/isa@1e OF: translating address: 00000001 00000070 OF: parent bus is default (na=3, ns=2) on /pcie@ffe09000/pcie@0/uli1575@0 OF: walking ranges... OF: ISA map, cp=0, s=1000, da=70 OF: parent translation for: 01000000 00000000 00000000 OF: with offset: 70 OF: one level translation: 00000000 00000000 00000070 OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000/pcie@0 OF: walking ranges... OF: default map, cp=a0000000, s=20000000, da=70 OF: default map, cp=0, s=10000, da=70 OF: not found ! Thierry Reding confirmed this commit was not needed after all: "We ended up merging a different address representation for Tegra PCIe and I've confirmed that reverting this commit doesn't cause any obvious regressions. I think all other drivers in drivers/pci/host ended up copying what we did on Tegra, so I wouldn't expect any other breakage either." There doesn't appear to be a simple way to support both behaviours, so reverting this as nothing should be depending on the new behaviour. Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Rob Herring diff --git a/drivers/of/address.c b/drivers/of/address.c index 4b9317b..d3dd41c 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -69,14 +69,6 @@ static u64 of_bus_default_map(__be32 *addr, const __be32 *range, (unsigned long long)cp, (unsigned long long)s, (unsigned long long)da); - /* - * If the number of address cells is larger than 2 we assume the - * mapping doesn't specify a physical address. Rather, the address - * specifies an identifier that must match exactly. - */ - if (na > 2 && memcmp(range, addr, na * 4) != 0) - return OF_BAD_ADDR; - if (da < cp || da >= (cp + s)) return OF_BAD_ADDR; return da - cp; -- cgit v0.10.2 From 5d9270869b6cd3cb098ad9393c0cb62ab184e35e Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 24 Dec 2013 21:06:01 +0100 Subject: of/Kconfig: Spelling s/one/once/ Signed-off-by: Geert Uytterhoeven Signed-off-by: Rob Herring diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig index de6f899..c6973f1 100644 --- a/drivers/of/Kconfig +++ b/drivers/of/Kconfig @@ -20,7 +20,7 @@ config OF_SELFTEST depends on OF_IRQ help This option builds in test cases for the device tree infrastructure - that are executed one at boot time, and the results dumped to the + that are executed once at boot time, and the results dumped to the console. If unsure, say N here, but this option is safe to enable. 
-- cgit v0.10.2 From 2f53a713c4b6c1d2f32d87c532d47ad17861053a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Tue, 17 Dec 2013 18:32:53 +0100 Subject: of/irq: Fix device_node refcount in of_irq_parse_raw() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit 2361613206e6, "of/irq: Refactor interrupt-map parsing" changed the refcount on the device_node causing an error in of_node_put(): ERROR: Bad of_node_put() on /pci@800000020000000 CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-dirty #2 Call Trace: [c00000003e403500] [c0000000000144fc] .show_stack+0x7c/0x1f0 (unreliable) [c00000003e4035d0] [c00000000070f250] .dump_stack+0x88/0xb4 [c00000003e403650] [c0000000005e8768] .of_node_release+0xd8/0xf0 [c00000003e4036e0] [c0000000005eeafc] .of_irq_parse_one+0x10c/0x280 [c00000003e4037a0] [c0000000005efd4c] .of_irq_parse_pci+0x3c/0x1d0 [c00000003e403840] [c000000000038240] .pcibios_setup_device+0xa0/0x2e0 [c00000003e403910] [c0000000000398f0] .pcibios_setup_bus_devices+0x60/0xd0 [c00000003e403990] [c00000000003b3a4] .__of_scan_bus+0x1a4/0x2b0 [c00000003e403a80] [c00000000003a62c] .pcibios_scan_phb+0x30c/0x410 [c00000003e403b60] [c0000000009fe430] .pcibios_init+0x7c/0xd4 This patch adjusts the refcount in the walk of the interrupt tree. When a match is found, there is no need to increase the refcount on 'out_irq->np' as 'newpar' is already holding a ref. The refcount balance between 'ipar' and 'newpar' is maintained in the skiplevel: goto label. This patch also removes the usage of the device_node variable 'old' which seems useless after the latest changes. Signed-off-by: Cédric Le Goater Signed-off-by: Rob Herring diff --git a/drivers/of/irq.c b/drivers/of/irq.c index 786b0b4..2721240 100644 --- a/drivers/of/irq.c +++ b/drivers/of/irq.c @@ -165,7 +165,6 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq) if (of_get_property(ipar, "interrupt-controller", NULL) != NULL) { pr_debug(" -> got it !\n"); - of_node_put(old); return 0; } @@ -250,8 +249,7 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq) * Successfully parsed an interrrupt-map translation; copy new * interrupt specifier into the out_irq structure */ - of_node_put(out_irq->np); - out_irq->np = of_node_get(newpar); + out_irq->np = newpar; match_array = imap - newaddrsize - newintsize; for (i = 0; i < newintsize; i++) @@ -268,7 +266,6 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq) } fail: of_node_put(ipar); - of_node_put(out_irq->np); of_node_put(newpar); return -EINVAL; -- cgit v0.10.2 From 5d3ad8a63844f496f5c3a1f8bc3d25dc374c8360 Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Tue, 3 Dec 2013 10:20:16 -0600 Subject: MAINTAINERS: Update Rob Herring's email address My Calxeda email address is going away. 
Signed-off-by: Rob Herring Signed-off-by: Rob Herring diff --git a/MAINTAINERS b/MAINTAINERS index d5e4ff3..21c038f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -783,7 +783,7 @@ F: arch/arm/boot/dts/sama*.dts F: arch/arm/boot/dts/sama*.dtsi ARM/CALXEDA HIGHBANK ARCHITECTURE -M: Rob Herring +M: Rob Herring L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers) S: Maintained F: arch/arm/mach-highbank/ @@ -6256,7 +6256,7 @@ F: drivers/i2c/busses/i2c-ocores.c OPEN FIRMWARE AND FLATTENED DEVICE TREE M: Grant Likely -M: Rob Herring +M: Rob Herring L: devicetree@vger.kernel.org W: http://fdt.secretlab.ca T: git git://git.secretlab.ca/git/linux-2.6.git @@ -6268,7 +6268,7 @@ K: of_get_property K: of_match_table OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS -M: Rob Herring +M: Rob Herring M: Pawel Moll M: Mark Rutland M: Ian Campbell -- cgit v0.10.2 From 2205369a314e12fcec4781cc73ac9c08fc2b47de Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Tue, 31 Dec 2013 16:23:35 -0500 Subject: vlan: Fix header ops passthru when doing TX VLAN offload. When the vlan code detects that the real device can do TX VLAN offloads in hardware, it tries to arrange for the real device's header_ops to be invoked directly. But it does so illegally, by simply hooking the real device's header_ops up to the VLAN device. This doesn't work because we will end up invoking a set of header_ops routines which expect a device type which matches the real device, but will see a VLAN device instead. Fix this by providing a pass-thru set of header_ops which will arrange to pass the proper real device instead. To facilitate this add a dev_rebuild_header(). There are implementations which provide a ->cache and ->create but not a ->rebuild (f.e. PLIP). So we need a helper function just like dev_hard_header() to avoid crashes. Use this helper in the one existing place where the header_ops->rebuild was being invoked, the neighbour code. With lots of help from Florian Westphal. Signed-off-by: David S. 
Miller diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d9a550b..7514b9c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1912,6 +1912,15 @@ static inline int dev_parse_header(const struct sk_buff *skb, return dev->header_ops->parse(skb, haddr); } +static inline int dev_rebuild_header(struct sk_buff *skb) +{ + const struct net_device *dev = skb->dev; + + if (!dev->header_ops || !dev->header_ops->rebuild) + return 0; + return dev->header_ops->rebuild(skb); +} + typedef int gifconf_func_t(struct net_device * dev, char __user * bufptr, int len); int register_gifconf(unsigned int family, gifconf_func_t *gifconf); static inline int unregister_gifconf(unsigned int family) diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c index 762896e..47c908f 100644 --- a/net/8021q/vlan_dev.c +++ b/net/8021q/vlan_dev.c @@ -530,6 +530,23 @@ static const struct header_ops vlan_header_ops = { .parse = eth_header_parse, }; +static int vlan_passthru_hard_header(struct sk_buff *skb, struct net_device *dev, + unsigned short type, + const void *daddr, const void *saddr, + unsigned int len) +{ + struct vlan_dev_priv *vlan = vlan_dev_priv(dev); + struct net_device *real_dev = vlan->real_dev; + + return dev_hard_header(skb, real_dev, type, daddr, saddr, len); +} + +static const struct header_ops vlan_passthru_header_ops = { + .create = vlan_passthru_hard_header, + .rebuild = dev_rebuild_header, + .parse = eth_header_parse, +}; + static struct device_type vlan_type = { .name = "vlan", }; @@ -573,7 +590,7 @@ static int vlan_dev_init(struct net_device *dev) dev->needed_headroom = real_dev->needed_headroom; if (real_dev->features & NETIF_F_HW_VLAN_CTAG_TX) { - dev->header_ops = real_dev->header_ops; + dev->header_ops = &vlan_passthru_header_ops; dev->hard_header_len = real_dev->hard_header_len; } else { dev->header_ops = &vlan_header_ops; diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 36b1443..932c6d7 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1275,7 +1275,7 @@ int neigh_compat_output(struct neighbour *neigh, struct sk_buff *skb) if (dev_hard_header(skb, dev, ntohs(skb->protocol), NULL, NULL, skb->len) < 0 && - dev->header_ops->rebuild(skb)) + dev_rebuild_header(skb)) return 0; return dev_queue_xmit(skb); -- cgit v0.10.2 From 8d88bbffcbac2e7ceba04a9cdff97241b6b5f1db Mon Sep 17 00:00:00 2001 From: Octavian Purdila Date: Mon, 23 Dec 2013 19:06:31 +0200 Subject: usbnet: mcs7830: rework link state detection Even with the quirks in commit dabdaf0c (mcs7830: Fix link state detection) there are still spurious link-down events for some chips where the false link-down events count go over a few hundreds. This patch takes a more conservative approach and only looks at link-down events where the link-down state is not combined with other states (e.g. half/full speed, pending frames in SRAM or TX status information valid). In all other cases we assume the link is up. Tested on MCS7830CV-DA (USB ID 9710:7830). Cc: Ondrej Zary Cc: Michael Leun Cc: Ming Lei Signed-off-by: Octavian Purdila Signed-off-by: David S. 
Miller diff --git a/drivers/net/usb/mcs7830.c b/drivers/net/usb/mcs7830.c index 03832d3..f546378 100644 --- a/drivers/net/usb/mcs7830.c +++ b/drivers/net/usb/mcs7830.c @@ -117,7 +117,6 @@ enum { struct mcs7830_data { u8 multi_filter[8]; u8 config; - u8 link_counter; }; static const char driver_name[] = "MOSCHIP usb-ethernet driver"; @@ -561,26 +560,16 @@ static void mcs7830_status(struct usbnet *dev, struct urb *urb) { u8 *buf = urb->transfer_buffer; bool link, link_changed; - struct mcs7830_data *data = mcs7830_get_data(dev); if (urb->actual_length < 16) return; - link = !(buf[1] & 0x20); + link = !(buf[1] == 0x20); link_changed = netif_carrier_ok(dev->net) != link; if (link_changed) { - data->link_counter++; - /* - track link state 20 times to guard against erroneous - link state changes reported sometimes by the chip - */ - if (data->link_counter > 20) { - data->link_counter = 0; - usbnet_link_change(dev, link, 0); - netdev_dbg(dev->net, "Link Status is: %d\n", link); - } - } else - data->link_counter = 0; + usbnet_link_change(dev, link, 0); + netdev_dbg(dev->net, "Link Status is: %d\n", link); + } } static const struct driver_info moschip_info = { -- cgit v0.10.2 From b899e698fca1de16921525e347f6e81539fdedcf Mon Sep 17 00:00:00 2001 From: Yaniv Rosner Date: Wed, 1 Jan 2014 11:06:41 +0200 Subject: bnx2x: Fix 578xx-KR 1G link Fix a problem where 578xx-KR is unable to get link when connected to 1G link partner. Two fixes were required: One was to force CL37 sync_status low to prevent Warpcore from getting stuck in CL73 parallel detect loop while link partner is sending. Second fix was to enable auto-detect mode, thus allowing the Warpcore to select the higher speed protocol between 10G-KR (over CL73), or go down to 1G over CL73 when there's indication for it. Signed-off-by: Yaniv Rosner Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c index 20dcc02..efbf729 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -3865,6 +3865,19 @@ static void bnx2x_warpcore_enable_AN_KR(struct bnx2x_phy *phy, bnx2x_warpcore_enable_AN_KR2(phy, params, vars); } else { + /* Enable Auto-Detect to support 1G over CL37 as well */ + bnx2x_cl45_write(bp, phy, MDIO_WC_DEVAD, + MDIO_WC_REG_SERDESDIGITAL_CONTROL1000X1, 0x10); + + /* Force cl48 sync_status LOW to avoid getting stuck in CL73 + * parallel-detect loop when CL73 and CL37 are enabled. 
+ */ + CL22_WR_OVER_CL45(bp, phy, MDIO_REG_BANK_AER_BLOCK, + MDIO_AER_BLOCK_AER_REG, 0); + bnx2x_cl45_write(bp, phy, MDIO_WC_DEVAD, + MDIO_WC_REG_RXB_ANA_RX_CONTROL_PCI, 0x0800); + bnx2x_set_aer_mmd(params, phy); + bnx2x_disable_kr2(params, vars, phy); } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h index 3efbb35..14ffb6e 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h @@ -7179,6 +7179,7 @@ Theotherbitsarereservedandshouldbezero*/ #define MDIO_WC_REG_RX1_PCI_CTRL 0x80ca #define MDIO_WC_REG_RX2_PCI_CTRL 0x80da #define MDIO_WC_REG_RX3_PCI_CTRL 0x80ea +#define MDIO_WC_REG_RXB_ANA_RX_CONTROL_PCI 0x80fa #define MDIO_WC_REG_XGXSBLK2_UNICORE_MODE_10G 0x8104 #define MDIO_WC_REG_XGXS_STATUS3 0x8129 #define MDIO_WC_REG_PAR_DET_10G_STATUS 0x8130 -- cgit v0.10.2 From e803d33a32213f2c28456faaa62b2a88f91de2ea Mon Sep 17 00:00:00 2001 From: Yaniv Rosner Date: Wed, 1 Jan 2014 11:06:42 +0200 Subject: bnx2x: Fix passive DAC cable detection Fix Passive DAC detection for specific cables, such that even in case SFP_CABLE_TECHNOLOGY option is not set in the EEPROM (offset 8), treat it as a passive DAC cable, since some cables don't have this indication. Signed-off-by: Yaniv Rosner Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c index efbf729..000b6ee 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -8133,17 +8133,20 @@ static int bnx2x_get_edc_mode(struct bnx2x_phy *phy, *edc_mode = EDC_MODE_ACTIVE_DAC; else check_limiting_mode = 1; - } else if (copper_module_type & - SFP_EEPROM_FC_TX_TECH_BITMASK_COPPER_PASSIVE) { + } else { + *edc_mode = EDC_MODE_PASSIVE_DAC; + /* Even in case PASSIVE_DAC indication is not set, + * treat it as a passive DAC cable, since some cables + * don't have this indication. + */ + if (copper_module_type & + SFP_EEPROM_FC_TX_TECH_BITMASK_COPPER_PASSIVE) { DP(NETIF_MSG_LINK, "Passive Copper cable detected\n"); - *edc_mode = - EDC_MODE_PASSIVE_DAC; - } else { - DP(NETIF_MSG_LINK, - "Unknown copper-cable-type 0x%x !!!\n", - copper_module_type); - return -EINVAL; + } else { + DP(NETIF_MSG_LINK, + "Unknown copper-cable-type\n"); + } } break; } -- cgit v0.10.2 From a429ec239cbc42dd3b2bc9933745103ca71d2b9d Mon Sep 17 00:00:00 2001 From: Yaniv Rosner Date: Wed, 1 Jan 2014 11:06:43 +0200 Subject: bnx2x: Fix Duplex setting for 54618se BCM54618SE is used to advertise half-duplex even if HD was not requested by the user. This change makes the legacy speed/duplex advertisement for this PHY exactly according to the requested speed and duplex. Signed-off-by: Yaniv Rosner Signed-off-by: Ariel Elior Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c index 000b6ee..68417e1 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -10841,9 +10841,9 @@ static int bnx2x_54618se_config_init(struct bnx2x_phy *phy, (1<<11)); if (((phy->req_line_speed == SPEED_AUTO_NEG) && - (phy->speed_cap_mask & - PORT_HW_CFG_SPEED_CAPABILITY_D0_1G)) || - (phy->req_line_speed == SPEED_1000)) { + (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_1G)) || + (phy->req_line_speed == SPEED_1000)) { an_1000_val |= (1<<8); autoneg_val |= (1<<9 | 1<<12); if (phy->req_duplex == DUPLEX_FULL) @@ -10859,30 +10859,32 @@ static int bnx2x_54618se_config_init(struct bnx2x_phy *phy, 0x09, &an_1000_val); - /* Set 100 speed advertisement */ - if (((phy->req_line_speed == SPEED_AUTO_NEG) && - (phy->speed_cap_mask & - (PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_FULL | - PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_HALF)))) { - an_10_100_val |= (1<<7); - /* Enable autoneg and restart autoneg for legacy speeds */ - autoneg_val |= (1<<9 | 1<<12); - - if (phy->req_duplex == DUPLEX_FULL) - an_10_100_val |= (1<<8); - DP(NETIF_MSG_LINK, "Advertising 100M\n"); - } - - /* Set 10 speed advertisement */ - if (((phy->req_line_speed == SPEED_AUTO_NEG) && - (phy->speed_cap_mask & - (PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_FULL | - PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_HALF)))) { - an_10_100_val |= (1<<5); - autoneg_val |= (1<<9 | 1<<12); - if (phy->req_duplex == DUPLEX_FULL) + /* Advertise 10/100 link speed */ + if (phy->req_line_speed == SPEED_AUTO_NEG) { + if (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_HALF) { + an_10_100_val |= (1<<5); + autoneg_val |= (1<<9 | 1<<12); + DP(NETIF_MSG_LINK, "Advertising 10M-HD\n"); + } + if (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_10M_FULL) { an_10_100_val |= (1<<6); - DP(NETIF_MSG_LINK, "Advertising 10M\n"); + autoneg_val |= (1<<9 | 1<<12); + DP(NETIF_MSG_LINK, "Advertising 10M-FD\n"); + } + if (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_HALF) { + an_10_100_val |= (1<<7); + autoneg_val |= (1<<9 | 1<<12); + DP(NETIF_MSG_LINK, "Advertising 100M-HD\n"); + } + if (phy->speed_cap_mask & + PORT_HW_CFG_SPEED_CAPABILITY_D0_100M_FULL) { + an_10_100_val |= (1<<8); + autoneg_val |= (1<<9 | 1<<12); + DP(NETIF_MSG_LINK, "Advertising 100M-FD\n"); + } } /* Only 10/100 are allowed to work in FORCE mode */ -- cgit v0.10.2 From ad1d9ef3f736485ba0f49325fa4c73cd1d963853 Mon Sep 17 00:00:00 2001 From: Yaniv Rosner Date: Wed, 1 Jan 2014 11:06:44 +0200 Subject: bnx2x: Fix incorrect link-up report Fix a problem where link is reported to be up when SFP+ module is plugged in without cable. This occurs with specific module types which may generate temporary TX_FAULT indication. Solution is to avoid changing any link parameters when checking TX_FAULT indication while physical link is down. Signed-off-by: Yaniv Rosner Signed-off-by: Ariel Elior Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c index 68417e1..998cce3 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -13360,6 +13360,10 @@ static u8 bnx2x_analyze_link_error(struct link_params *params, DP(NETIF_MSG_LINK, "Link changed:[%x %x]->%x\n", vars->link_up, old_status, status); + /* Do not touch the link in case physical link down */ + if ((vars->phy_flags & PHY_PHYSICAL_LINK_FLAG) == 0) + return 1; + /* a. Update shmem->link_status accordingly * b. Update link_vars->link_up */ -- cgit v0.10.2 From f17e9fa5686ac21bce756638ef71160395880928 Mon Sep 17 00:00:00 2001 From: Yaniv Rosner Date: Wed, 1 Jan 2014 11:06:45 +0200 Subject: bnx2x: Fix KR2 work-around detection of BCM8073 KR2 work-around is based on detecting non-KR2 devices which may not link up in this mode. One such link-partner is the BCM8073 which has specific advertisement characteristics in specific mode, and this condition was not set correctly. Signed-off-by: Yaniv Rosner Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c index 998cce3..11fc795 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c @@ -13572,7 +13572,7 @@ static void bnx2x_check_kr2_wa(struct link_params *params, */ not_kr2_device = (((base_page & 0x8000) == 0) || (((base_page & 0x8000) && - ((next_page & 0xe0) == 0x2)))); + ((next_page & 0xe0) == 0x20)))); /* In case KR2 is already disabled, check if we need to re-enable it */ if (!(vars->link_attr_sync & LINK_ATTR_SYNC_KR2_ENABLE)) { -- cgit v0.10.2 From 7e0309631ecf0cd16edba72ff74747fa1b96ead3 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Wed, 1 Jan 2014 23:04:25 +0100 Subject: net: llc: fix order of evaluation in llc_conn_ac_inc_vr_by_1 Function llc_conn_ac_inc_vr_by_1() evaluates via macro PDU_GET_NEXT_Vr() into ... llc_sk(sk)->vR = ++llc_sk(sk)->vR & 0xffffffffffffff7f ... but the order in which the side effects take place is undefined because there is no intervening sequence point. As llc_sk(sk)->vR is written in llc_sk(sk)->vR (assignment left-hand side) and written in ++llc_sk(sk)->vR & 0xffffffffffffff7f this might possibly yield undefined behavior. The final value of llc_sk(sk)->vR is ambiguous, because, depending on the order of expression evaluation, the increment may occur before, after, or interleaved with the assignment. In C, evaluating such an expression yields undefined behavior. Since we're doing the increment via PDU_GET_NEXT_Vr() macro and the only place it is being used is from llc_conn_ac_inc_vr_by_1(), in order to increment vR by 1 with a follow-up optimized modulo, rewrite the expression into ((vR + 1) & CONST) in order to fix this. Signed-off-by: Daniel Borkmann Cc: Arnaldo Carvalho de Melo Cc: Stephen Hemminger Signed-off-by: David S. Miller diff --git a/include/net/llc_pdu.h b/include/net/llc_pdu.h index 31e2de7..c0f0a13 100644 --- a/include/net/llc_pdu.h +++ b/include/net/llc_pdu.h @@ -142,7 +142,7 @@ #define LLC_S_PF_IS_1(pdu) ((pdu->ctrl_2 & LLC_S_PF_BIT_MASK) ? 
1 : 0) #define PDU_SUPV_GET_Nr(pdu) ((pdu->ctrl_2 & 0xFE) >> 1) -#define PDU_GET_NEXT_Vr(sn) (++sn & ~LLC_2_SEQ_NBR_MODULO) +#define PDU_GET_NEXT_Vr(sn) (((sn) + 1) & ~LLC_2_SEQ_NBR_MODULO) /* FRMR information field macros */ -- cgit v0.10.2 From c3ac17cd6af2687d5881184edd310a5f9c4baa98 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Thu, 2 Jan 2014 08:49:36 +0800 Subject: ipv6: fix the use of pcpu_tstats in sit when read/write the 64bit data, the correct lock should be hold. Signed-off-by: Li RongQing Acked-by: Eric Dumazet Signed-off-by: David S. Miller diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index c874822..d3005b3 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -702,8 +702,10 @@ static int ipip6_rcv(struct sk_buff *skb) } tstats = this_cpu_ptr(tunnel->dev->tstats); + u64_stats_update_begin(&tstats->syncp); tstats->rx_packets++; tstats->rx_bytes += skb->len; + u64_stats_update_end(&tstats->syncp); netif_rx(skb); -- cgit v0.10.2 From d9c602f033b00ba360a324c0ee5aa59a6838fb40 Mon Sep 17 00:00:00 2001 From: Manish Chopra Date: Thu, 2 Jan 2014 13:38:43 -0500 Subject: qlcnic: Fix loopback diagnostic test o Adapter requires that if the port is in loopback mode no traffic should be flowing through that port, so on arrival of Link up AEN, do not advertise Link up to the stack until port is out of loopback mode Signed-off-by: Manish Chopra Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index 631ea0a..4dfef81 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -487,6 +487,7 @@ struct qlcnic_hardware_context { struct qlcnic_mailbox *mailbox; u8 extend_lb_time; u8 phys_port_id[ETH_ALEN]; + u8 lb_mode; }; struct qlcnic_adapter_stats { @@ -808,6 +809,7 @@ struct qlcnic_mac_list_s { #define QLCNIC_ILB_MODE 0x1 #define QLCNIC_ELB_MODE 0x2 +#define QLCNIC_LB_MODE_MASK 0x3 #define QLCNIC_LINKEVENT 0x1 #define QLCNIC_LB_RESPONSE 0x2 diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c index 6055d39..f776f99 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c @@ -1684,12 +1684,6 @@ int qlcnic_83xx_loopback_test(struct net_device *netdev, u8 mode) } } while ((adapter->ahw->linkup && ahw->has_link_events) != 1); - /* Make sure carrier is off and queue is stopped during loopback */ - if (netif_running(netdev)) { - netif_carrier_off(netdev); - netif_tx_stop_all_queues(netdev); - } - ret = qlcnic_do_lb_test(adapter, mode); qlcnic_83xx_clear_lb_mode(adapter, mode); @@ -2121,6 +2115,7 @@ static void qlcnic_83xx_handle_link_aen(struct qlcnic_adapter *adapter, ahw->link_autoneg = MSB(MSW(data[3])); ahw->module_type = MSB(LSW(data[3])); ahw->has_link_events = 1; + ahw->lb_mode = data[4] & QLCNIC_LB_MODE_MASK; qlcnic_advert_link_change(adapter, link_status); } diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c index eda6c69..1362976 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c @@ -689,6 +689,10 @@ void qlcnic_advert_link_change(struct qlcnic_adapter *adapter, int linkup) adapter->ahw->linkup = 0; netif_carrier_off(netdev); } else if (!adapter->ahw->linkup && linkup) { + /* Do not advertise Link up if the port is in loopback mode */ + if (qlcnic_83xx_check(adapter) && adapter->ahw->lb_mode) + return; + 
netdev_info(netdev, "NIC Link is up\n"); adapter->ahw->linkup = 1; netif_carrier_on(netdev); -- cgit v0.10.2 From f3e3ccf83bab261c5b55623bd3e9d1147b1c2e19 Mon Sep 17 00:00:00 2001 From: Manish Chopra Date: Thu, 2 Jan 2014 13:38:44 -0500 Subject: qlcnic: Fix resource allocation for TX queues o TX queues allocation was getting distributed equally among all the functions of the port including VFs and PF. Which was leading to failure in PF's multiple TX queues creation. o Instead of dividing queues equally allocate one TX queue for each VF as VF doesn't support multiple TX queues. Signed-off-by: Manish Chopra Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c index 686f460..024f816 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c @@ -75,7 +75,6 @@ static int qlcnic_sriov_pf_cal_res_limit(struct qlcnic_adapter *adapter, num_vfs = sriov->num_vfs; max = num_vfs + 1; info->bit_offsets = 0xffff; - info->max_tx_ques = res->num_tx_queues / max; info->max_rx_mcast_mac_filters = res->num_rx_mcast_mac_filters; num_vf_macs = QLCNIC_SRIOV_VF_MAX_MAC; @@ -86,6 +85,7 @@ static int qlcnic_sriov_pf_cal_res_limit(struct qlcnic_adapter *adapter, info->max_tx_mac_filters = temp; info->min_tx_bw = 0; info->max_tx_bw = MAX_BW; + info->max_tx_ques = res->num_tx_queues - sriov->num_vfs; } else { id = qlcnic_sriov_func_to_index(adapter, func); if (id < 0) @@ -95,6 +95,7 @@ static int qlcnic_sriov_pf_cal_res_limit(struct qlcnic_adapter *adapter, info->max_tx_bw = vp->max_tx_bw; info->max_rx_ucast_mac_filters = num_vf_macs; info->max_tx_mac_filters = num_vf_macs; + info->max_tx_ques = QLCNIC_SINGLE_RING; } info->max_rx_ip_addr = res->num_destip / max; -- cgit v0.10.2 From 619a60ee04be33238721a15c1f9704a2a515a33e Mon Sep 17 00:00:00 2001 From: Vlad Yasevich Date: Thu, 2 Jan 2014 14:39:44 -0500 Subject: sctp: Remove outqueue empty state The SCTP outqueue structure maintains a data chunks that are pending transmission, the list of chunks that are pending a retransmission and a length of data in flight. It also tries to keep the emtpy state so that it can performe shutdown sequence or notify user. The problem is that the empy state is inconsistently tracked. It is possible to completely drain the queue without sending anything when using PR-SCTP. In this case, the empty state will not be correctly state as report by Jamal Hadi Salim . This can cause an association to be perminantly stuck in the SHUTDOWN_PENDING state. Additionally, SCTP is incredibly inefficient when setting the empty state. Even though all the data is availaible in the outqueue structure, we ignore it and walk a list of trasnports. In the end, we can completely remove the extra empty state and figure out if the queue is empty by looking at 3 things: length of pending data, length of in-flight data, and exisiting of retransmit data. All of these are already in the strucutre. Reported-by: Jamal Hadi Salim Signed-off-by: Vlad Yasevich Acked-by: Neil Horman Tested-by: Jamal Hadi Salim Signed-off-by: David S. Miller diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h index 67b5d00..0a248b3 100644 --- a/include/net/sctp/structs.h +++ b/include/net/sctp/structs.h @@ -1046,9 +1046,6 @@ struct sctp_outq { /* Corked? */ char cork; - - /* Is this structure empty? 
*/ - char empty; }; void sctp_outq_init(struct sctp_association *, struct sctp_outq *); diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c index f51ba98..59268f6 100644 --- a/net/sctp/outqueue.c +++ b/net/sctp/outqueue.c @@ -208,8 +208,6 @@ void sctp_outq_init(struct sctp_association *asoc, struct sctp_outq *q) INIT_LIST_HEAD(&q->retransmit); INIT_LIST_HEAD(&q->sacked); INIT_LIST_HEAD(&q->abandoned); - - q->empty = 1; } /* Free the outqueue structure and any related pending chunks. @@ -332,7 +330,6 @@ int sctp_outq_tail(struct sctp_outq *q, struct sctp_chunk *chunk) SCTP_INC_STATS(net, SCTP_MIB_OUTUNORDERCHUNKS); else SCTP_INC_STATS(net, SCTP_MIB_OUTORDERCHUNKS); - q->empty = 0; break; } } else { @@ -654,7 +651,6 @@ redo: if (chunk->fast_retransmit == SCTP_NEED_FRTX) chunk->fast_retransmit = SCTP_DONT_FRTX; - q->empty = 0; q->asoc->stats.rtxchunks++; break; } @@ -1065,8 +1061,6 @@ static int sctp_outq_flush(struct sctp_outq *q, int rtx_timeout) sctp_transport_reset_timers(transport); - q->empty = 0; - /* Only let one DATA chunk get bundled with a * COOKIE-ECHO chunk. */ @@ -1275,29 +1269,17 @@ int sctp_outq_sack(struct sctp_outq *q, struct sctp_chunk *chunk) "advertised peer ack point:0x%x\n", __func__, asoc, ctsn, asoc->adv_peer_ack_point); - /* See if all chunks are acked. - * Make sure the empty queue handler will get run later. - */ - q->empty = (list_empty(&q->out_chunk_list) && - list_empty(&q->retransmit)); - if (!q->empty) - goto finish; - - list_for_each_entry(transport, transport_list, transports) { - q->empty = q->empty && list_empty(&transport->transmitted); - if (!q->empty) - goto finish; - } - - pr_debug("%s: sack queue is empty\n", __func__); -finish: - return q->empty; + return sctp_outq_is_empty(q); } -/* Is the outqueue empty? */ +/* Is the outqueue empty? + * The queue is empty when we have not pending data, no in-flight data + * and nothing pending retransmissions. + */ int sctp_outq_is_empty(const struct sctp_outq *q) { - return q->empty; + return q->out_qlen == 0 && q->outstanding_bytes == 0 && + list_empty(&q->retransmit); } /******************************************************************** -- cgit v0.10.2 From 7a7ffbabf99445704be01bff5d7e360da908cf8e Mon Sep 17 00:00:00 2001 From: Wei-Chun Chao Date: Thu, 26 Dec 2013 13:10:22 -0800 Subject: ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC VM to VM GSO traffic is broken if it goes through VXLAN or GRE tunnel and the physical NIC on the host supports hardware VXLAN/GRE GSO offload (e.g. bnx2x and next-gen mlx4). Two issues - (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header integrity check fails in udp4_ufo_fragment if inner protocol is TCP. Also gso_segs is calculated incorrectly using skb->len that includes tunnel header. Fix: robust check should only be applied to the inner packet. (VXLAN & GRE) Once GSO header integrity check passes, NULL segs is returned and the original skb is sent to hardware. However the tunnel header is already pulled. Fix: tunnel header needs to be restored so that hardware can perform GSO properly on the original packet. Signed-off-by: Wei-Chun Chao Signed-off-by: David S. 
Miller diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7514b9c..5faaadb 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3017,6 +3017,19 @@ static inline void netif_set_gso_max_size(struct net_device *dev, dev->gso_max_size = size; } +static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol, + int pulled_hlen, u16 mac_offset, + int mac_len) +{ + skb->protocol = protocol; + skb->encapsulation = 1; + skb_push(skb, pulled_hlen); + skb_reset_transport_header(skb); + skb->mac_header = mac_offset; + skb->network_header = skb->mac_header + mac_len; + skb->mac_len = mac_len; +} + static inline bool netif_is_macvlan(struct net_device *dev) { return dev->priv_flags & IFF_MACVLAN; diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c index e5d4361..2cd02f3 100644 --- a/net/ipv4/gre_offload.c +++ b/net/ipv4/gre_offload.c @@ -28,6 +28,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb, netdev_features_t enc_features; int ghl = GRE_HEADER_SECTION; struct gre_base_hdr *greh; + u16 mac_offset = skb->mac_header; int mac_len = skb->mac_len; __be16 protocol = skb->protocol; int tnl_hlen; @@ -58,13 +59,13 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb, } else csum = false; + if (unlikely(!pskb_may_pull(skb, ghl))) + goto out; + /* setup inner skb. */ skb->protocol = greh->protocol; skb->encapsulation = 0; - if (unlikely(!pskb_may_pull(skb, ghl))) - goto out; - __skb_pull(skb, ghl); skb_reset_mac_header(skb); skb_set_network_header(skb, skb_inner_network_offset(skb)); @@ -73,8 +74,10 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb, /* segment inner packet. */ enc_features = skb->dev->hw_enc_features & netif_skb_features(skb); segs = skb_mac_gso_segment(skb, enc_features); - if (!segs || IS_ERR(segs)) + if (!segs || IS_ERR(segs)) { + skb_gso_error_unwind(skb, protocol, ghl, mac_offset, mac_len); goto out; + } skb = segs; tnl_hlen = skb_tnl_header_len(skb); diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index f140048..a7e4729 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -2478,6 +2478,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, netdev_features_t features) { struct sk_buff *segs = ERR_PTR(-EINVAL); + u16 mac_offset = skb->mac_header; int mac_len = skb->mac_len; int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb); __be16 protocol = skb->protocol; @@ -2497,8 +2498,11 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, /* segment inner packet. */ enc_features = skb->dev->hw_enc_features & netif_skb_features(skb); segs = skb_mac_gso_segment(skb, enc_features); - if (!segs || IS_ERR(segs)) + if (!segs || IS_ERR(segs)) { + skb_gso_error_unwind(skb, protocol, tnl_hlen, mac_offset, + mac_len); goto out; + } outer_hlen = skb_tnl_header_len(skb); skb = segs; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 83206de..79c62bd 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -41,6 +41,14 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, { struct sk_buff *segs = ERR_PTR(-EINVAL); unsigned int mss; + int offset; + __wsum csum; + + if (skb->encapsulation && + skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) { + segs = skb_udp_tunnel_segment(skb, features); + goto out; + } mss = skb_shinfo(skb)->gso_size; if (unlikely(skb->len <= mss)) @@ -63,27 +71,20 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, goto out; } + /* Do software UFO. 
Complete and fill in the UDP checksum as + * HW cannot do checksum of UDP packets sent as multiple + * IP fragments. + */ + offset = skb_checksum_start_offset(skb); + csum = skb_checksum(skb, offset, skb->len - offset, 0); + offset += skb->csum_offset; + *(__sum16 *)(skb->data + offset) = csum_fold(csum); + skb->ip_summed = CHECKSUM_NONE; + /* Fragment the skb. IP headers of the fragments are updated in * inet_gso_segment() */ - if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) - segs = skb_udp_tunnel_segment(skb, features); - else { - int offset; - __wsum csum; - - /* Do software UFO. Complete and fill in the UDP checksum as - * HW cannot do checksum of UDP packets sent as multiple - * IP fragments. - */ - offset = skb_checksum_start_offset(skb); - csum = skb_checksum(skb, offset, skb->len - offset, 0); - offset += skb->csum_offset; - *(__sum16 *)(skb->data + offset) = csum_fold(csum); - skb->ip_summed = CHECKSUM_NONE; - - segs = skb_segment(skb, features); - } + segs = skb_segment(skb, features); out: return segs; } -- cgit v0.10.2 From 6cd4ce0099da7702f885b6fa9ebb49e3831d90b4 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Mon, 30 Dec 2013 11:34:40 +0800 Subject: virtio-net: fix refill races during restore During restoring, try_fill_recv() was called with neither napi lock nor napi disabled. This can lead two try_fill_recv() was called in the same time. Fix this by refilling before trying to enable napi. Fixes 0741bcb5584f9e2390ae6261573c4de8314999f2 (virtio: net: Add freeze, restore handlers to support S4). Cc: Amit Shah Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Eric Dumazet Signed-off-by: Jason Wang Signed-off-by: David S. Miller diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d208f86..5d77644 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1797,16 +1797,17 @@ static int virtnet_restore(struct virtio_device *vdev) if (err) return err; - if (netif_running(vi->dev)) + if (netif_running(vi->dev)) { + for (i = 0; i < vi->curr_queue_pairs; i++) + if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) + schedule_delayed_work(&vi->refill, 0); + for (i = 0; i < vi->max_queue_pairs; i++) virtnet_napi_enable(&vi->rq[i]); + } netif_device_attach(vi->dev); - for (i = 0; i < vi->curr_queue_pairs; i++) - if (!try_fill_recv(&vi->rq[i], GFP_KERNEL)) - schedule_delayed_work(&vi->refill, 0); - mutex_lock(&vi->config_lock); vi->config_enable = true; mutex_unlock(&vi->config_lock); -- cgit v0.10.2 From 4d231b76eef6c4a6bd9c96769e191517765942cb Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 30 Dec 2013 23:40:50 +0100 Subject: net: llc: fix use after free in llc_ui_recvmsg While commit 30a584d944fb fixes datagram interface in LLC, a use after free bug has been introduced for SOCK_STREAM sockets that do not make use of MSG_PEEK. The flow is as follow ... if (!(flags & MSG_PEEK)) { ... sk_eat_skb(sk, skb, false); ... } ... if (used + offset < skb->len) continue; ... where sk_eat_skb() calls __kfree_skb(). Therefore, cache original length and work on skb_len to check partial reads. Fixes: 30a584d944fb ("[LLX]: SOCK_DGRAM interface fixes") Signed-off-by: Daniel Borkmann Cc: Stephen Hemminger Cc: Arnaldo Carvalho de Melo Signed-off-by: David S. 
Miller diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index 7b01b9f..c71b699 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -715,7 +715,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock, unsigned long cpu_flags; size_t copied = 0; u32 peek_seq = 0; - u32 *seq; + u32 *seq, skb_len; unsigned long used; int target; /* Read at least this many bytes */ long timeo; @@ -812,6 +812,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock, } continue; found_ok_skb: + skb_len = skb->len; /* Ok so how much can we use? */ used = skb->len - offset; if (len < used) @@ -844,7 +845,7 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock, } /* Partial read */ - if (used + offset < skb->len) + if (used + offset < skb_len) continue; } while (len > 0); -- cgit v0.10.2 From fad8da3e085ddf5e661090033287f1a5d62858fc Mon Sep 17 00:00:00 2001 From: Yasushi Asano Date: Tue, 31 Dec 2013 12:04:19 +0900 Subject: ipv6 addrconf: fix preferred lifetime state-changing behavior while valid_lft is infinity Fixed a problem with setting the lifetime of an IPv6 address. When setting preferred_lft to a value not zero or infinity, while valid_lft is infinity(0xffffffff) preferred lifetime is set to forever and does not update. Therefore preferred lifetime never becomes deprecated. valid lifetime and preferred lifetime should be set independently, even if valid lifetime is infinity, preferred lifetime must expire correctly (meaning it must eventually become deprecated) Signed-off-by: Yasushi Asano Acked-by: Hannes Frederic Sowa Signed-off-by: David S. Miller diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index d5fa5b8..1a341f7 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -3456,7 +3456,12 @@ restart: &inet6_addr_lst[i], addr_lst) { unsigned long age; - if (ifp->flags & IFA_F_PERMANENT) + /* When setting preferred_lft to a value not zero or + * infinity, while valid_lft is infinity + * IFA_F_PERMANENT has a non-infinity life time. + */ + if ((ifp->flags & IFA_F_PERMANENT) && + (ifp->prefered_lft == INFINITY_LIFE_TIME)) continue; spin_lock(&ifp->lock); @@ -3481,7 +3486,8 @@ restart: ifp->flags |= IFA_F_DEPRECATED; } - if (time_before(ifp->tstamp + ifp->valid_lft * HZ, next)) + if ((ifp->valid_lft != INFINITY_LIFE_TIME) && + (time_before(ifp->tstamp + ifp->valid_lft * HZ, next))) next = ifp->tstamp + ifp->valid_lft * HZ; spin_unlock(&ifp->lock); @@ -3761,7 +3767,8 @@ static int inet6_fill_ifaddr(struct sk_buff *skb, struct inet6_ifaddr *ifa, put_ifaddrmsg(nlh, ifa->prefix_len, ifa->flags, rt_scope(ifa->scope), ifa->idev->dev->ifindex); - if (!(ifa->flags&IFA_F_PERMANENT)) { + if (!((ifa->flags&IFA_F_PERMANENT) && + (ifa->prefered_lft == INFINITY_LIFE_TIME))) { preferred = ifa->prefered_lft; valid = ifa->valid_lft; if (preferred != INFINITY_LIFE_TIME) { -- cgit v0.10.2 From abb6013cca147ad940b0e9fee260d2d9e93b7018 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Thu, 2 Jan 2014 13:20:12 +0800 Subject: ipv6: fix the use of pcpu_tstats in ip6_tunnel when read/write the 64bit data, the correct lock should be hold. Fixes: 87b6d218f3adb ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger Cc: Eric Dumazet Signed-off-by: Li RongQing Signed-off-by: David S. 
Miller diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index d606232..7881965 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -103,16 +103,25 @@ struct ip6_tnl_net { static struct net_device_stats *ip6_get_stats(struct net_device *dev) { - struct pcpu_tstats sum = { 0 }; + struct pcpu_tstats tmp, sum = { 0 }; int i; for_each_possible_cpu(i) { + unsigned int start; const struct pcpu_tstats *tstats = per_cpu_ptr(dev->tstats, i); - sum.rx_packets += tstats->rx_packets; - sum.rx_bytes += tstats->rx_bytes; - sum.tx_packets += tstats->tx_packets; - sum.tx_bytes += tstats->tx_bytes; + do { + start = u64_stats_fetch_begin_bh(&tstats->syncp); + tmp.rx_packets = tstats->rx_packets; + tmp.rx_bytes = tstats->rx_bytes; + tmp.tx_packets = tstats->tx_packets; + tmp.tx_bytes = tstats->tx_bytes; + } while (u64_stats_fetch_retry_bh(&tstats->syncp, start)); + + sum.rx_packets += tmp.rx_packets; + sum.rx_bytes += tmp.rx_bytes; + sum.tx_packets += tmp.tx_packets; + sum.tx_bytes += tmp.tx_bytes; } dev->stats.rx_packets = sum.rx_packets; dev->stats.rx_bytes = sum.rx_bytes; @@ -824,8 +833,10 @@ static int ip6_tnl_rcv(struct sk_buff *skb, __u16 protocol, } tstats = this_cpu_ptr(t->dev->tstats); + u64_stats_update_begin(&tstats->syncp); tstats->rx_packets++; tstats->rx_bytes += skb->len; + u64_stats_update_end(&tstats->syncp); netif_rx(skb); -- cgit v0.10.2 From 469bdcefdc47a69028029e792ff1e80680c867b9 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Thu, 2 Jan 2014 14:24:36 +0800 Subject: ipv6: fix the use of pcpu_tstats in ip6_vti.c when read/write the 64bit data, the correct lock should be hold. and we can use the generic vti6_get_stats to return stats, and not define a new one in ip6_vti.c Fixes: 87b6d218f3adb ("tunnel: implement 64 bits statistics") Cc: Stephen Hemminger Cc: Eric Dumazet Signed-off-by: Li RongQing Signed-off-by: David S. Miller diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c index ed94ba6..a4564b0 100644 --- a/net/ipv6/ip6_vti.c +++ b/net/ipv6/ip6_vti.c @@ -75,26 +75,6 @@ struct vti6_net { struct ip6_tnl __rcu **tnls[2]; }; -static struct net_device_stats *vti6_get_stats(struct net_device *dev) -{ - struct pcpu_tstats sum = { 0 }; - int i; - - for_each_possible_cpu(i) { - const struct pcpu_tstats *tstats = per_cpu_ptr(dev->tstats, i); - - sum.rx_packets += tstats->rx_packets; - sum.rx_bytes += tstats->rx_bytes; - sum.tx_packets += tstats->tx_packets; - sum.tx_bytes += tstats->tx_bytes; - } - dev->stats.rx_packets = sum.rx_packets; - dev->stats.rx_bytes = sum.rx_bytes; - dev->stats.tx_packets = sum.tx_packets; - dev->stats.tx_bytes = sum.tx_bytes; - return &dev->stats; -} - #define for_each_vti6_tunnel_rcu(start) \ for (t = rcu_dereference(start); t; t = rcu_dereference(t->next)) @@ -331,8 +311,10 @@ static int vti6_rcv(struct sk_buff *skb) } tstats = this_cpu_ptr(t->dev->tstats); + u64_stats_update_begin(&tstats->syncp); tstats->rx_packets++; tstats->rx_bytes += skb->len; + u64_stats_update_end(&tstats->syncp); skb->mark = 0; secpath_reset(skb); @@ -716,7 +698,7 @@ static const struct net_device_ops vti6_netdev_ops = { .ndo_start_xmit = vti6_tnl_xmit, .ndo_do_ioctl = vti6_ioctl, .ndo_change_mtu = vti6_change_mtu, - .ndo_get_stats = vti6_get_stats, + .ndo_get_stats64 = ip_tunnel_get_stats64, }; /** -- cgit v0.10.2 From aca5f58f9ba803ec8c2e6bcf890db17589e8dfcc Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Thu, 2 Jan 2014 19:50:52 -0500 Subject: netpoll: Fix missing TXQ unlock and and OOPS. 
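As an illustration of the u64_stats_sync pattern that the two pcpu_tstats fixes above (ip6_tunnel and ip6_vti) rely on, a minimal sketch follows: the writer brackets every 64-bit counter update and the reader retries its per-CPU snapshot if it raced with a writer. The names pcpu_counters, counters_add_rx and counters_fold are made up for illustration; only struct u64_stats_sync and the u64_stats_* helpers are the real kernel API, used exactly as in the patches above.

#include <linux/types.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/u64_stats_sync.h>

struct pcpu_counters {
	u64			rx_packets;
	u64			rx_bytes;
	struct u64_stats_sync	syncp;
};

/* writer side: runs in softirq context like the rx paths above; bracket the
 * update so 32-bit readers always see a consistent packet/byte pair
 */
static void counters_add_rx(struct pcpu_counters __percpu *stats, unsigned int len)
{
	struct pcpu_counters *c = this_cpu_ptr(stats);

	u64_stats_update_begin(&c->syncp);
	c->rx_packets++;
	c->rx_bytes += len;
	u64_stats_update_end(&c->syncp);
}

/* reader side: retry the per-CPU snapshot if a writer raced with us, then sum */
static void counters_fold(struct pcpu_counters __percpu *stats,
			  u64 *rx_packets, u64 *rx_bytes)
{
	int cpu;

	*rx_packets = 0;
	*rx_bytes = 0;
	for_each_possible_cpu(cpu) {
		const struct pcpu_counters *c = per_cpu_ptr(stats, cpu);
		unsigned int start;
		u64 packets, bytes;

		do {
			start = u64_stats_fetch_begin_bh(&c->syncp);
			packets = c->rx_packets;
			bytes = c->rx_bytes;
		} while (u64_stats_fetch_retry_bh(&c->syncp, start));

		*rx_packets += packets;
		*rx_bytes += bytes;
	}
}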
The VLAN tag handling code in netpoll_send_skb_on_dev() has two problems. 1) It exits without unlocking the TXQ. 2) It then tries to queue a NULL skb to npinfo->txq. Reported-by: Ahmed Tamrawi Signed-off-by: David S. Miller diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 8f97199..3030978 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -386,8 +386,14 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, !vlan_hw_offload_capable(netif_skb_features(skb), skb->vlan_proto)) { skb = __vlan_put_tag(skb, skb->vlan_proto, vlan_tx_tag_get(skb)); - if (unlikely(!skb)) - break; + if (unlikely(!skb)) { + /* This is actually a packet drop, but we + * don't want the code at the end of this + * function to try and re-queue a NULL skb. + */ + status = NETDEV_TX_OK; + goto unlock_txq; + } skb->vlan_tci = 0; } @@ -395,6 +401,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb, if (status == NETDEV_TX_OK) txq_trans_update(txq); } + unlock_txq: __netif_tx_unlock(txq); if (status == NETDEV_TX_OK) -- cgit v0.10.2 From 940d9d34a5467c2e2574866eb009d4cb61e27299 Mon Sep 17 00:00:00 2001 From: Thadeu Lima de Souza Cascardo Date: Mon, 23 Dec 2013 15:34:29 -0200 Subject: cxgb4: allow large buffer size to have page size Since commit 52367a763d8046190754ab43743e42638564a2d1 ("cxgb4/cxgb4vf: Code cleanup to enable T4 Configuration File support"), we have failures like this during cxgb4 probe: cxgb4 0000:01:00.4: bad SGE FL page buffer sizes [65536, 65536] cxgb4: probe of 0000:01:00.4 failed with error -22 This happens whenever software parameters are used, without a configuration file. That happens when the hardware was already initialized (after kexec, or after csiostor is loaded). It happens that these values are acceptable, rendering fl_pg_order equal to 0, which is the case of a hard init when the page size is equal or larger than 65536. Accepting fl_large_pg equal to fl_small_pg solves the issue, and shouldn't cause any trouble besides a possible performance reduction when smaller pages are used. And that can be fixed by a configuration file. Signed-off-by: Thadeu Lima de Souza Cascardo Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index cc380c3..cc3511a 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -2581,7 +2581,7 @@ static int t4_sge_init_soft(struct adapter *adap) #undef READ_FL_BUF if (fl_small_pg != PAGE_SIZE || - (fl_large_pg != 0 && (fl_large_pg <= fl_small_pg || + (fl_large_pg != 0 && (fl_large_pg < fl_small_pg || (fl_large_pg & (fl_large_pg-1)) != 0))) { dev_err(adap->pdev_dev, "bad SGE FL page buffer sizes [%d, %d]\n", fl_small_pg, fl_large_pg); -- cgit v0.10.2 From 7bda701e012373ca53c9d837b7b25131852e0238 Mon Sep 17 00:00:00 2001 From: "fan.du" Date: Fri, 3 Jan 2014 10:18:58 +0800 Subject: {vxlan, inet6} Mark vxlan_dev flags with VXLAN_F_IPV6 properly Even if user doesn't supply the physical netdev to attach vxlan dev to, and at the same time user want to vxlan sit top of IPv6, mark vxlan_dev flags with VXLAN_F_IPV6 to create IPv6 based socket. 
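As an aside on the cxgb4 change above, the relaxed free-list sanity check can be restated as a standalone helper. This is only a sketch that mirrors the patched condition (small buffer must equal the page size; a non-zero large buffer must be a power of two and no smaller than the small one, equality now allowed); the helper name fl_page_sizes_valid is made up.

#include <stdbool.h>

static bool fl_page_sizes_valid(unsigned int fl_small_pg,
				unsigned int fl_large_pg,
				unsigned int page_size)
{
	if (fl_small_pg != page_size)
		return false;
	/* 0 means "no large buffer configured" and is always accepted */
	if (fl_large_pg == 0)
		return true;
	/* power-of-two test: x & (x - 1) is zero only for powers of two */
	return fl_large_pg >= fl_small_pg &&
	       (fl_large_pg & (fl_large_pg - 1)) == 0;
}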
Otherwise kernel crashes safely every time spitting below messages, Steps to reproduce: ip link add vxlan0 type vxlan id 42 group ff0e::110 ip link set vxlan0 up [ 62.656266] BUG: unable to handle kernel NULL pointer dereference[ 62.656320] ip (3008) used greatest stack depth: 3912 bytes left at 0000000000000046 [ 62.656423] IP: [] ip6_route_output+0xbd/0xe0 [ 62.656525] PGD 2c966067 PUD 2c9a2067 PMD 0 [ 62.656674] Oops: 0000 [#1] SMP [ 62.656781] Modules linked in: vxlan netconsole deflate zlib_deflate af_key [ 62.657083] CPU: 1 PID: 2128 Comm: whoopsie Not tainted 3.12.0+ #182 [ 62.657083] Hardware name: innotek GmbH VirtualBox, BIOS VirtualBox 12/01/2006 [ 62.657083] task: ffff88002e2335d0 ti: ffff88002c94c000 task.ti: ffff88002c94c000 [ 62.657083] RIP: 0010:[] [] ip6_route_output+0xbd/0xe0 [ 62.657083] RSP: 0000:ffff88002fd038f8 EFLAGS: 00210296 [ 62.657083] RAX: 0000000000000000 RBX: ffff88002fd039e0 RCX: 0000000000000000 [ 62.657083] RDX: ffff88002fd0eb68 RSI: ffff88002fd0d278 RDI: ffff88002fd0d278 [ 62.657083] RBP: ffff88002fd03918 R08: 0000000002000000 R09: 0000000000000000 [ 62.657083] R10: 00000000000001ff R11: 0000000000000000 R12: 0000000000000001 [ 62.657083] R13: ffff88002d96b480 R14: ffffffff81c8e2c0 R15: 0000000000000001 [ 62.657083] FS: 0000000000000000(0000) GS:ffff88002fd00000(0063) knlGS:00000000f693b740 [ 62.657083] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 62.657083] CR2: 0000000000000046 CR3: 000000002c9d2000 CR4: 00000000000006e0 [ 62.657083] Stack: [ 62.657083] ffff88002fd03a40 ffffffff81c8e2c0 ffff88002fd039e0 ffff88002d96b480 [ 62.657083] ffff88002fd03958 ffffffff816cac8b ffff880019277cc0 ffff8800192b5d00 [ 62.657083] ffff88002d5bc000 ffff880019277cc0 0000000000001821 0000000000000001 [ 62.657083] Call Trace: [ 62.657083] [ 62.657083] [] ip6_dst_lookup_tail+0xdb/0xf0 [ 62.657083] [] ip6_dst_lookup+0x10/0x20 [ 62.657083] [] vxlan_xmit_one+0x193/0x9c0 [vxlan] [ 62.657083] [] ? account+0xc7/0x1f0 [ 62.657083] [] vxlan_xmit+0xd3/0x400 [vxlan] [ 62.657083] [] dev_hard_start_xmit+0x49d/0x5e0 [ 62.657083] [] dev_queue_xmit+0x2d9/0x480 [ 62.657083] [] ? _raw_write_unlock_bh+0x14/0x20 [ 62.657083] [] ? eth_header+0x35/0xe0 [ 62.657083] [] neigh_resolve_output+0x11e/0x1e0 [ 62.657083] [] ? ip6_fragment+0xad0/0xad0 [ 62.657083] [] ip6_finish_output2+0x2f5/0x470 [ 62.657083] [] ip6_finish_output+0x86/0xc0 [ 62.657083] [] ip6_output+0x78/0xb0 [ 62.657083] [] mld_sendpack+0x256/0x2a0 [ 62.657083] [] mld_ifc_timer_expire+0x17c/0x290 [ 62.657083] [] ? igmp6_timer_handler+0x80/0x80 [ 62.657083] [] ? igmp6_timer_handler+0x80/0x80 [ 62.657083] [] call_timer_fn+0x45/0x150 [ 62.657083] [] ? igmp6_timer_handler+0x80/0x80 [ 62.657083] [] run_timer_softirq+0x1f3/0x2a0 [ 62.657083] [] ? lapic_next_event+0x18/0x20 [ 62.657083] [] ? clockevents_program_event+0x6f/0x110 [ 62.657083] [] __do_softirq+0xd6/0x2b0 [ 62.657083] [] irq_exit+0x7e/0xa0 [ 62.657083] [] smp_apic_timer_interrupt+0x45/0x60 [ 62.657083] [] apic_timer_interrupt+0x6a/0x70 [ 62.657083] [ 62.657083] [] ? 
sysenter_dispatch+0x7/0x1a [ 62.657083] Code: 4d 8b 85 a8 02 00 00 4c 89 e9 ba 03 04 00 00 48 c7 c6 c0 be 8d 81 48 c7 c7 48 35 a3 81 31 c0 e8 db 68 0e 00 49 8b 85 a8 02 00 00 <0f> b6 40 46 c0 e8 05 0f b6 c0 c1 e0 03 41 09 c4 e9 77 ff ff ff [ 62.657083] RIP [] ip6_route_output+0xbd/0xe0 [ 62.657083] RSP [ 62.657083] CR2: 0000000000000046 [ 62.657083] ---[ end trace ba8a9583d7cd1934 ]--- [ 62.657083] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Fan Du Reported-by: Ryan Whelan Acked-by: Cong Wang Signed-off-by: David S. Miller diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index 249e01c..ed384fe 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -2440,7 +2440,8 @@ static int vxlan_newlink(struct net *net, struct net_device *dev, /* update header length based on lower device */ dev->hard_header_len = lowerdev->hard_header_len + (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM); - } + } else if (use_ipv6) + vxlan->flags |= VXLAN_F_IPV6; if (data[IFLA_VXLAN_TOS]) vxlan->tos = nla_get_u8(data[IFLA_VXLAN_TOS]); -- cgit v0.10.2 From 0d68fc4f1210f8caea2bdd68f99dc6da35ee3740 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Fri, 3 Jan 2014 11:33:45 +0800 Subject: infiniband: make sure the src net is infiniband when create new link When we create a new infiniband link with uninfiniband device, e.g. `ip link add link em1 type ipoib pkey 0x8001`. We will get a NULL pointer dereference cause other dev like Ethernet don't have struct ib_device. The code path is: rtnl_newlink |-- ipoib_new_child_link |-- __ipoib_vlan_add |-- ipoib_set_dev_features |-- ib_query_device Fix this bug by make sure the src net is infiniband when create new link. Signed-off-by: Hangbin Liu Signed-off-by: David S. Miller diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c index c29b5c8..cdc7df4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c @@ -31,6 +31,7 @@ */ #include +#include /* For ARPHRD_xxx */ #include #include #include "ipoib.h" @@ -103,7 +104,7 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, return -EINVAL; pdev = __dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK])); - if (!pdev) + if (!pdev || pdev->type != ARPHRD_INFINIBAND) return -ENODEV; ppriv = netdev_priv(pdev); -- cgit v0.10.2 From a02bdd423d844f5beb3196922f07c85c2f7691b8 Mon Sep 17 00:00:00 2001 From: Shahed Shaikh Date: Fri, 3 Jan 2014 01:34:28 -0500 Subject: qlcnic: Fix bug in Tx completion path o Driver is using common tx_clean_lock for all Tx queues. This patch adds per queue tx_clean_lock. o Driver is not updating sw_consumer while processing Tx completion when interface is going down. Fixed in this patch. Signed-off-by: Shahed Shaikh Signed-off-by: Manish Chopra Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h index 4dfef81..ff80cd8 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h @@ -579,6 +579,8 @@ struct qlcnic_host_tx_ring { dma_addr_t phys_addr; dma_addr_t hw_cons_phys_addr; struct netdev_queue *txq; + /* Lock to protect Tx descriptors cleanup */ + spinlock_t tx_clean_lock; } ____cacheline_internodealigned_in_smp; /* @@ -1095,7 +1097,6 @@ struct qlcnic_adapter { struct qlcnic_filter_hash rx_fhash; struct list_head vf_mc_list; - spinlock_t tx_clean_lock; spinlock_t mac_learn_lock; /* spinlock for catching rcv filters for eswitch traffic */ spinlock_t rx_mac_learn_lock; diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c index e9c21e5..c4262c2 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_init.c @@ -134,6 +134,8 @@ void qlcnic_release_tx_buffers(struct qlcnic_adapter *adapter, struct qlcnic_skb_frag *buffrag; int i, j; + spin_lock(&tx_ring->tx_clean_lock); + cmd_buf = tx_ring->cmd_buf_arr; for (i = 0; i < tx_ring->num_desc; i++) { buffrag = cmd_buf->frag_array; @@ -157,6 +159,8 @@ void qlcnic_release_tx_buffers(struct qlcnic_adapter *adapter, } cmd_buf++; } + + spin_unlock(&tx_ring->tx_clean_lock); } void qlcnic_free_sw_resources(struct qlcnic_adapter *adapter) diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c index 1362976..ad1531a 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c @@ -782,7 +782,7 @@ static int qlcnic_process_cmd_ring(struct qlcnic_adapter *adapter, struct net_device *netdev = adapter->netdev; struct qlcnic_skb_frag *frag; - if (!spin_trylock(&adapter->tx_clean_lock)) + if (!spin_trylock(&tx_ring->tx_clean_lock)) return 1; sw_consumer = tx_ring->sw_consumer; @@ -811,8 +811,9 @@ static int qlcnic_process_cmd_ring(struct qlcnic_adapter *adapter, break; } + tx_ring->sw_consumer = sw_consumer; + if (count && netif_running(netdev)) { - tx_ring->sw_consumer = sw_consumer; smp_mb(); if (netif_tx_queue_stopped(tx_ring->txq) && netif_carrier_ok(netdev)) { @@ -838,7 +839,8 @@ static int qlcnic_process_cmd_ring(struct qlcnic_adapter *adapter, */ hw_consumer = le32_to_cpu(*(tx_ring->hw_consumer)); done = (sw_consumer == hw_consumer); - spin_unlock(&adapter->tx_clean_lock); + + spin_unlock(&tx_ring->tx_clean_lock); return done; } diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index 2c8cac0..b8a245a 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -1756,7 +1756,6 @@ void __qlcnic_down(struct qlcnic_adapter *adapter, struct net_device *netdev) if (qlcnic_sriov_vf_check(adapter)) qlcnic_sriov_cleanup_async_list(&adapter->ahw->sriov->bc); smp_mb(); - spin_lock(&adapter->tx_clean_lock); netif_carrier_off(netdev); adapter->ahw->linkup = 0; netif_tx_disable(netdev); @@ -1777,7 +1776,6 @@ void __qlcnic_down(struct qlcnic_adapter *adapter, struct net_device *netdev) for (ring = 0; ring < adapter->drv_tx_rings; ring++) qlcnic_release_tx_buffers(adapter, &adapter->tx_ring[ring]); - spin_unlock(&adapter->tx_clean_lock); } /* Usage: During suspend and firmware recovery module */ @@ -2172,6 +2170,7 @@ int qlcnic_alloc_tx_rings(struct qlcnic_adapter *adapter, } 
memset(cmd_buf_arr, 0, TX_BUFF_RINGSIZE(tx_ring)); tx_ring->cmd_buf_arr = cmd_buf_arr; + spin_lock_init(&tx_ring->tx_clean_lock); } if (qlcnic_83xx_check(adapter) || @@ -2299,7 +2298,6 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) rwlock_init(&adapter->ahw->crb_lock); mutex_init(&adapter->ahw->mem_lock); - spin_lock_init(&adapter->tx_clean_lock); INIT_LIST_HEAD(&adapter->mac_list); qlcnic_register_dcb(adapter); -- cgit v0.10.2 From e848582cee23f7ab540ec49ab1c7d7f8bfefcd84 Mon Sep 17 00:00:00 2001 From: Dmitry Kravkov Date: Sun, 5 Jan 2014 18:33:50 +0200 Subject: bnx2x: limit number of interrupt vectors for 57711 Original straightforward division may lead to zeroing number of SB and null-pointer dereference when device is short of MSIX vectors or lacks MSIX capabilities. Reported-by: Vladislav Zolotarov Signed-off-by: Dmitry Kravkov Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h index a1f66e2..cb30d1a3 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h @@ -2499,4 +2499,6 @@ void bnx2x_set_local_cmng(struct bnx2x *bp); #define MCPR_SCRATCH_BASE(bp) \ (CHIP_IS_E1x(bp) ? MCP_REG_MCPR_SCRATCH : MCP_A_REG_MCPR_SCRATCH) +#define E1H_MAX_MF_SB_COUNT (HC_SB_MAX_SB_E1X/(E1HVN_MAX * PORT_MAX)) + #endif /* bnx2x.h */ diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c index 814d0ec..8b3107b 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c @@ -11447,9 +11447,9 @@ static int bnx2x_get_hwinfo(struct bnx2x *bp) } } - /* adjust igu_sb_cnt to MF for E1x */ - if (CHIP_IS_E1x(bp) && IS_MF(bp)) - bp->igu_sb_cnt /= E1HVN_MAX; + /* adjust igu_sb_cnt to MF for E1H */ + if (CHIP_IS_E1H(bp) && IS_MF(bp)) + bp->igu_sb_cnt = min_t(u8, bp->igu_sb_cnt, E1H_MAX_MF_SB_COUNT); /* port info */ bnx2x_get_port_hwinfo(bp); -- cgit v0.10.2 From 89e18ae6e6288deb1940cd16afe4e6983545defa Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Sun, 5 Jan 2014 18:33:51 +0200 Subject: bnx2x: Correct number of MSI-X vectors for VFs Number of VFs in PCIe configuration space is zero-based. Driver incorrectly sets the number of VFs to be larger by one than what actually is feasible by HW, which might cause later VFs to fail to allocate their MSI-X interrupts. Signed-off-by: Michal Kalderon Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c index 2e46c28..ddd95b9 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c @@ -3202,13 +3202,16 @@ int bnx2x_enable_sriov(struct bnx2x *bp) bnx2x_iov_static_resc(bp, vf); } - /* prepare msix vectors in VF configuration space */ + /* prepare msix vectors in VF configuration space - the value in the + * PCI configuration space should be the index of the last entry, + * namely one less than the actual size of the table + */ for (vf_idx = first_vf; vf_idx < first_vf + req_vfs; vf_idx++) { bnx2x_pretend_func(bp, HW_VF_HANDLE(bp, vf_idx)); REG_WR(bp, PCICFG_OFFSET + GRC_CONFIG_REG_VF_MSIX_CONTROL, - num_vf_queues); + num_vf_queues - 1); DP(BNX2X_MSG_IOV, "set msix vec num in VF %d cfg space to %d\n", - vf_idx, num_vf_queues); + vf_idx, num_vf_queues - 1); } bnx2x_pretend_func(bp, BP_ABS_FUNC(bp)); -- cgit v0.10.2 From 5b622918cd6f4ff5b5200ea6b2e48ae2fa6ad09e Mon Sep 17 00:00:00 2001 From: Michal Kalderon Date: Sun, 5 Jan 2014 18:33:52 +0200 Subject: bnx2x: Clean before update RSS arrives When a PF receives a VF message indicating a change in RSS properties it should clean the flags' bit-fields; Otherwise, it's possible that some random values will be considered as flags by the lower layers configuring the RSS in FW. Signed-off-by: Michal Kalderon Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index 32c92ab..95feada 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c @@ -4382,8 +4382,11 @@ int bnx2x_config_rss(struct bnx2x *bp, struct bnx2x_raw_obj *r = &o->raw; /* Do nothing if only driver cleanup was requested */ - if (test_bit(RAMROD_DRV_CLR_ONLY, &p->ramrod_flags)) + if (test_bit(RAMROD_DRV_CLR_ONLY, &p->ramrod_flags)) { + DP(BNX2X_MSG_SP, "Not configuring RSS ramrod_flags=%lx\n", + p->ramrod_flags); return 0; + } r->set_pending(r); diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c index 3dc2537..26fcba2 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c @@ -1805,6 +1805,9 @@ static void bnx2x_vf_mbx_update_rss(struct bnx2x *bp, struct bnx2x_virtf *vf, vf_op_params->rss_result_mask = rss_tlv->rss_result_mask; /* flags handled individually for backward/forward compatability */ + vf_op_params->rss_flags = 0; + vf_op_params->ramrod_flags = 0; + if (rss_tlv->rss_flags & VFPF_RSS_MODE_DISABLED) __set_bit(BNX2X_RSS_MODE_DISABLED, &vf_op_params->rss_flags); if (rss_tlv->rss_flags & VFPF_RSS_MODE_REGULAR) -- cgit v0.10.2 From 9dfef3adaebb42163d691be73e05b12da440cbe5 Mon Sep 17 00:00:00 2001 From: Yuval Mintz Date: Sun, 5 Jan 2014 18:33:53 +0200 Subject: bnx2x: fix AFEX memory overflow There are 2 different (related) flows in the slowpath configuration that utilize the same pointer and cast it to different structs; This is obviously incorrect as the intended allocated memory is that of the smaller struct, possibly causing the flow utilizing the larger struct to corrupt other slowpath configuration. Since both flows are exclusive, set the allocated memory to be a union of both structs. 
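To illustrate the union approach just described (the patch below puts afex_vif_list_ramrod_data and function_update_data into one union), here is a self-contained sketch with made-up struct names and sizes: because the two flows are mutually exclusive, a union sized for the larger member lets either flow use the same slowpath buffer without writing past the allocation.

#include <stdio.h>

/* stand-ins for the two FW ramrod structures; sizes are illustrative only */
struct small_ramrod_data { unsigned int vif_list[8]; };
struct large_ramrod_data { unsigned int func_update[32]; };

union shared_ramrod_data {
	struct small_ramrod_data viflist_data;
	struct large_ramrod_data func_update;
};

int main(void)
{
	/* the union is as large as its biggest member, so writes through
	 * either view stay inside the single shared allocation
	 */
	printf("small=%zu large=%zu union=%zu\n",
	       sizeof(struct small_ramrod_data),
	       sizeof(struct large_ramrod_data),
	       sizeof(union shared_ramrod_data));
	return 0;
}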
Reported-by: Dan Carpenter Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h index cb30d1a3..2d5fce4 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h @@ -1250,7 +1250,10 @@ struct bnx2x_slowpath { * Therefore, if they would have been defined in the same union, * data can get corrupted. */ - struct afex_vif_list_ramrod_data func_afex_rdata; + union { + struct afex_vif_list_ramrod_data viflist_data; + struct function_update_data func_update; + } func_afex_rdata; /* used by dmae command executer */ struct dmae_command dmae[MAX_DMAE_C]; -- cgit v0.10.2 From e8379c79542c95b25890ed49be652b1634deca17 Mon Sep 17 00:00:00 2001 From: Yuval Mintz Date: Sun, 5 Jan 2014 18:33:54 +0200 Subject: bnx2x: fix VLAN configuration for VFs. If the hypervisor configures a vlan for the VF via the PF, the expected result is that only packets tagged by said vlan will be received by the VF (and that vlan will be silently removed). Due to an incorrect manipulation of vlan filters in the driver, the VF can receive untagged traffic even if the hypervisor configured some vlan for it. This patch corrects the behaviour. Signed-off-by: Yuval Mintz Signed-off-by: Ariel Elior Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c index 95feada..18438a5 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c @@ -2038,6 +2038,7 @@ static int bnx2x_vlan_mac_del_all(struct bnx2x *bp, struct bnx2x_vlan_mac_ramrod_params p; struct bnx2x_exe_queue_obj *exeq = &o->exe_queue; struct bnx2x_exeq_elem *exeq_pos, *exeq_pos_n; + unsigned long flags; int read_lock; int rc = 0; @@ -2046,8 +2047,9 @@ static int bnx2x_vlan_mac_del_all(struct bnx2x *bp, spin_lock_bh(&exeq->lock); list_for_each_entry_safe(exeq_pos, exeq_pos_n, &exeq->exe_queue, link) { - if (exeq_pos->cmd_data.vlan_mac.vlan_mac_flags == - *vlan_mac_flags) { + flags = exeq_pos->cmd_data.vlan_mac.vlan_mac_flags; + if (BNX2X_VLAN_MAC_CMP_FLAGS(flags) == + BNX2X_VLAN_MAC_CMP_FLAGS(*vlan_mac_flags)) { rc = exeq->remove(bp, exeq->owner, exeq_pos); if (rc) { BNX2X_ERR("Failed to remove command\n"); @@ -2080,7 +2082,9 @@ static int bnx2x_vlan_mac_del_all(struct bnx2x *bp, return read_lock; list_for_each_entry(pos, &o->head, link) { - if (pos->vlan_mac_flags == *vlan_mac_flags) { + flags = pos->vlan_mac_flags; + if (BNX2X_VLAN_MAC_CMP_FLAGS(flags) == + BNX2X_VLAN_MAC_CMP_FLAGS(*vlan_mac_flags)) { p.user_req.vlan_mac_flags = pos->vlan_mac_flags; memcpy(&p.user_req.u, &pos->u, sizeof(pos->u)); rc = bnx2x_config_vlan_mac(bp, &p); diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h index 658f4e3..6a53c15 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.h @@ -266,6 +266,13 @@ enum { BNX2X_DONT_CONSUME_CAM_CREDIT, BNX2X_DONT_CONSUME_CAM_CREDIT_DEST, }; +/* When looking for matching filters, some flags are not interesting */ +#define BNX2X_VLAN_MAC_CMP_MASK (1 << BNX2X_UC_LIST_MAC | \ + 1 << BNX2X_ETH_MAC | \ + 1 << BNX2X_ISCSI_ETH_MAC | \ + 1 << BNX2X_NETQ_ETH_MAC) +#define BNX2X_VLAN_MAC_CMP_FLAGS(flags) \ + ((flags) & BNX2X_VLAN_MAC_CMP_MASK) struct bnx2x_vlan_mac_ramrod_params { /* Object to run the command from */ diff --git 
a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c index ddd95b9..e7845e5 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c @@ -1209,6 +1209,11 @@ static void bnx2x_vfop_rxmode(struct bnx2x *bp, struct bnx2x_virtf *vf) /* next state */ vfop->state = BNX2X_VFOP_RXMODE_DONE; + /* record the accept flags in vfdb so hypervisor can modify them + * if necessary + */ + bnx2x_vfq(vf, ramrod->cl_id - vf->igu_base_id, accept_flags) = + ramrod->rx_accept_flags; vfop->rc = bnx2x_config_rx_mode(bp, ramrod); bnx2x_vfop_finalize(vf, vfop->rc, VFOP_DONE); op_err: @@ -1224,39 +1229,43 @@ op_pending: return; } +static void bnx2x_vf_prep_rx_mode(struct bnx2x *bp, u8 qid, + struct bnx2x_rx_mode_ramrod_params *ramrod, + struct bnx2x_virtf *vf, + unsigned long accept_flags) +{ + struct bnx2x_vf_queue *vfq = vfq_get(vf, qid); + + memset(ramrod, 0, sizeof(*ramrod)); + ramrod->cid = vfq->cid; + ramrod->cl_id = vfq_cl_id(vf, vfq); + ramrod->rx_mode_obj = &bp->rx_mode_obj; + ramrod->func_id = FW_VF_HANDLE(vf->abs_vfid); + ramrod->rx_accept_flags = accept_flags; + ramrod->tx_accept_flags = accept_flags; + ramrod->pstate = &vf->filter_state; + ramrod->state = BNX2X_FILTER_RX_MODE_PENDING; + + set_bit(BNX2X_FILTER_RX_MODE_PENDING, &vf->filter_state); + set_bit(RAMROD_RX, &ramrod->ramrod_flags); + set_bit(RAMROD_TX, &ramrod->ramrod_flags); + + ramrod->rdata = bnx2x_vf_sp(bp, vf, rx_mode_rdata.e2); + ramrod->rdata_mapping = bnx2x_vf_sp_map(bp, vf, rx_mode_rdata.e2); +} + int bnx2x_vfop_rxmode_cmd(struct bnx2x *bp, struct bnx2x_virtf *vf, struct bnx2x_vfop_cmd *cmd, int qid, unsigned long accept_flags) { - struct bnx2x_vf_queue *vfq = vfq_get(vf, qid); struct bnx2x_vfop *vfop = bnx2x_vfop_add(bp, vf); if (vfop) { struct bnx2x_rx_mode_ramrod_params *ramrod = &vf->op_params.rx_mode; - memset(ramrod, 0, sizeof(*ramrod)); - - /* Prepare ramrod parameters */ - ramrod->cid = vfq->cid; - ramrod->cl_id = vfq_cl_id(vf, vfq); - ramrod->rx_mode_obj = &bp->rx_mode_obj; - ramrod->func_id = FW_VF_HANDLE(vf->abs_vfid); - - ramrod->rx_accept_flags = accept_flags; - ramrod->tx_accept_flags = accept_flags; - ramrod->pstate = &vf->filter_state; - ramrod->state = BNX2X_FILTER_RX_MODE_PENDING; - - set_bit(BNX2X_FILTER_RX_MODE_PENDING, &vf->filter_state); - set_bit(RAMROD_RX, &ramrod->ramrod_flags); - set_bit(RAMROD_TX, &ramrod->ramrod_flags); - - ramrod->rdata = - bnx2x_vf_sp(bp, vf, rx_mode_rdata.e2); - ramrod->rdata_mapping = - bnx2x_vf_sp_map(bp, vf, rx_mode_rdata.e2); + bnx2x_vf_prep_rx_mode(bp, qid, ramrod, vf, accept_flags); bnx2x_vfop_opset(BNX2X_VFOP_RXMODE_CONFIG, bnx2x_vfop_rxmode, cmd->done); @@ -3439,10 +3448,18 @@ out: int bnx2x_set_vf_vlan(struct net_device *dev, int vfidx, u16 vlan, u8 qos) { + struct bnx2x_queue_state_params q_params = {NULL}; + struct bnx2x_vlan_mac_ramrod_params ramrod_param; + struct bnx2x_queue_update_params *update_params; + struct pf_vf_bulletin_content *bulletin = NULL; + struct bnx2x_rx_mode_ramrod_params rx_ramrod; struct bnx2x *bp = netdev_priv(dev); - int rc, q_logical_state; + struct bnx2x_vlan_mac_obj *vlan_obj; + unsigned long vlan_mac_flags = 0; + unsigned long ramrod_flags = 0; struct bnx2x_virtf *vf = NULL; - struct pf_vf_bulletin_content *bulletin = NULL; + unsigned long accept_flags; + int rc; /* sanity and init */ rc = bnx2x_vf_ndo_prep(bp, vfidx, &vf, &bulletin); @@ -3460,104 +3477,118 @@ int bnx2x_set_vf_vlan(struct net_device *dev, int vfidx, u16 vlan, u8 qos) /* 
update PF's copy of the VF's bulletin. No point in posting the vlan * to the VF since it doesn't have anything to do with it. But it useful * to store it here in case the VF is not up yet and we can only - * configure the vlan later when it does. + * configure the vlan later when it does. Treat vlan id 0 as remove the + * Host tag. */ - bulletin->valid_bitmap |= 1 << VLAN_VALID; + if (vlan > 0) + bulletin->valid_bitmap |= 1 << VLAN_VALID; + else + bulletin->valid_bitmap &= ~(1 << VLAN_VALID); bulletin->vlan = vlan; /* is vf initialized and queue set up? */ - q_logical_state = - bnx2x_get_q_logical_state(bp, &bnx2x_leading_vfq(vf, sp_obj)); - if (vf->state == VF_ENABLED && - q_logical_state == BNX2X_Q_LOGICAL_STATE_ACTIVE) { - /* configure the vlan in device on this vf's queue */ - unsigned long ramrod_flags = 0; - unsigned long vlan_mac_flags = 0; - struct bnx2x_vlan_mac_obj *vlan_obj = - &bnx2x_leading_vfq(vf, vlan_obj); - struct bnx2x_vlan_mac_ramrod_params ramrod_param; - struct bnx2x_queue_state_params q_params = {NULL}; - struct bnx2x_queue_update_params *update_params; + if (vf->state != VF_ENABLED || + bnx2x_get_q_logical_state(bp, &bnx2x_leading_vfq(vf, sp_obj)) != + BNX2X_Q_LOGICAL_STATE_ACTIVE) + return rc; - rc = validate_vlan_mac(bp, &bnx2x_leading_vfq(vf, mac_obj)); - if (rc) - return rc; - memset(&ramrod_param, 0, sizeof(ramrod_param)); + /* configure the vlan in device on this vf's queue */ + vlan_obj = &bnx2x_leading_vfq(vf, vlan_obj); + rc = validate_vlan_mac(bp, &bnx2x_leading_vfq(vf, mac_obj)); + if (rc) + return rc; - /* must lock vfpf channel to protect against vf flows */ - bnx2x_lock_vf_pf_channel(bp, vf, CHANNEL_TLV_PF_SET_VLAN); + /* must lock vfpf channel to protect against vf flows */ + bnx2x_lock_vf_pf_channel(bp, vf, CHANNEL_TLV_PF_SET_VLAN); - /* remove existing vlans */ - __set_bit(RAMROD_COMP_WAIT, &ramrod_flags); - rc = vlan_obj->delete_all(bp, vlan_obj, &vlan_mac_flags, - &ramrod_flags); - if (rc) { - BNX2X_ERR("failed to delete vlans\n"); - rc = -EINVAL; - goto out; - } + /* remove existing vlans */ + __set_bit(RAMROD_COMP_WAIT, &ramrod_flags); + rc = vlan_obj->delete_all(bp, vlan_obj, &vlan_mac_flags, + &ramrod_flags); + if (rc) { + BNX2X_ERR("failed to delete vlans\n"); + rc = -EINVAL; + goto out; + } + + /* need to remove/add the VF's accept_any_vlan bit */ + accept_flags = bnx2x_leading_vfq(vf, accept_flags); + if (vlan) + clear_bit(BNX2X_ACCEPT_ANY_VLAN, &accept_flags); + else + set_bit(BNX2X_ACCEPT_ANY_VLAN, &accept_flags); + + bnx2x_vf_prep_rx_mode(bp, LEADING_IDX, &rx_ramrod, vf, + accept_flags); + bnx2x_leading_vfq(vf, accept_flags) = accept_flags; + bnx2x_config_rx_mode(bp, &rx_ramrod); + + /* configure the new vlan to device */ + memset(&ramrod_param, 0, sizeof(ramrod_param)); + __set_bit(RAMROD_COMP_WAIT, &ramrod_flags); + ramrod_param.vlan_mac_obj = vlan_obj; + ramrod_param.ramrod_flags = ramrod_flags; + set_bit(BNX2X_DONT_CONSUME_CAM_CREDIT, + &ramrod_param.user_req.vlan_mac_flags); + ramrod_param.user_req.u.vlan.vlan = vlan; + ramrod_param.user_req.cmd = BNX2X_VLAN_MAC_ADD; + rc = bnx2x_config_vlan_mac(bp, &ramrod_param); + if (rc) { + BNX2X_ERR("failed to configure vlan\n"); + rc = -EINVAL; + goto out; + } - /* send queue update ramrod to configure default vlan and silent - * vlan removal + /* send queue update ramrod to configure default vlan and silent + * vlan removal + */ + __set_bit(RAMROD_COMP_WAIT, &q_params.ramrod_flags); + q_params.cmd = BNX2X_Q_CMD_UPDATE; + q_params.q_obj = &bnx2x_leading_vfq(vf, sp_obj); + update_params = 
&q_params.params.update; + __set_bit(BNX2X_Q_UPDATE_DEF_VLAN_EN_CHNG, + &update_params->update_flags); + __set_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM_CHNG, + &update_params->update_flags); + if (vlan == 0) { + /* if vlan is 0 then we want to leave the VF traffic + * untagged, and leave the incoming traffic untouched + * (i.e. do not remove any vlan tags). */ - __set_bit(RAMROD_COMP_WAIT, &q_params.ramrod_flags); - q_params.cmd = BNX2X_Q_CMD_UPDATE; - q_params.q_obj = &bnx2x_leading_vfq(vf, sp_obj); - update_params = &q_params.params.update; - __set_bit(BNX2X_Q_UPDATE_DEF_VLAN_EN_CHNG, + __clear_bit(BNX2X_Q_UPDATE_DEF_VLAN_EN, + &update_params->update_flags); + __clear_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM, + &update_params->update_flags); + } else { + /* configure default vlan to vf queue and set silent + * vlan removal (the vf remains unaware of this vlan). + */ + __set_bit(BNX2X_Q_UPDATE_DEF_VLAN_EN, &update_params->update_flags); - __set_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM_CHNG, + __set_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM, &update_params->update_flags); + update_params->def_vlan = vlan; + update_params->silent_removal_value = + vlan & VLAN_VID_MASK; + update_params->silent_removal_mask = VLAN_VID_MASK; + } - if (vlan == 0) { - /* if vlan is 0 then we want to leave the VF traffic - * untagged, and leave the incoming traffic untouched - * (i.e. do not remove any vlan tags). - */ - __clear_bit(BNX2X_Q_UPDATE_DEF_VLAN_EN, - &update_params->update_flags); - __clear_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM, - &update_params->update_flags); - } else { - /* configure the new vlan to device */ - __set_bit(RAMROD_COMP_WAIT, &ramrod_flags); - ramrod_param.vlan_mac_obj = vlan_obj; - ramrod_param.ramrod_flags = ramrod_flags; - ramrod_param.user_req.u.vlan.vlan = vlan; - ramrod_param.user_req.cmd = BNX2X_VLAN_MAC_ADD; - rc = bnx2x_config_vlan_mac(bp, &ramrod_param); - if (rc) { - BNX2X_ERR("failed to configure vlan\n"); - rc = -EINVAL; - goto out; - } - - /* configure default vlan to vf queue and set silent - * vlan removal (the vf remains unaware of this vlan). 
- */ - update_params = &q_params.params.update; - __set_bit(BNX2X_Q_UPDATE_DEF_VLAN_EN, - &update_params->update_flags); - __set_bit(BNX2X_Q_UPDATE_SILENT_VLAN_REM, - &update_params->update_flags); - update_params->def_vlan = vlan; - } + /* Update the Queue state */ + rc = bnx2x_queue_state_change(bp, &q_params); + if (rc) { + BNX2X_ERR("Failed to configure default VLAN\n"); + goto out; + } - /* Update the Queue state */ - rc = bnx2x_queue_state_change(bp, &q_params); - if (rc) { - BNX2X_ERR("Failed to configure default VLAN\n"); - goto out; - } - /* clear the flag indicating that this VF needs its vlan - * (will only be set if the HV configured the Vlan before vf was - * up and we were called because the VF came up later - */ + /* clear the flag indicating that this VF needs its vlan + * (will only be set if the HV configured the Vlan before vf was + * up and we were called because the VF came up later + */ out: - vf->cfg_flags &= ~VF_CFG_VLAN; - bnx2x_unlock_vf_pf_channel(bp, vf, CHANNEL_TLV_PF_SET_VLAN); - } + vf->cfg_flags &= ~VF_CFG_VLAN; + bnx2x_unlock_vf_pf_channel(bp, vf, CHANNEL_TLV_PF_SET_VLAN); + return rc; } diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h index 1ff6a93..8c213fa52 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h @@ -74,6 +74,7 @@ struct bnx2x_vf_queue { /* VLANs object */ struct bnx2x_vlan_mac_obj vlan_obj; atomic_t vlan_count; /* 0 means vlan-0 is set ~ untagged */ + unsigned long accept_flags; /* last accept flags configured */ /* Queue Slow-path State object */ struct bnx2x_queue_sp_obj sp_obj; diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c index 26fcba2..0756d7d 100644 --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c @@ -1598,6 +1598,8 @@ static void bnx2x_vfop_mbx_qfilters(struct bnx2x *bp, struct bnx2x_virtf *vf) if (msg->flags & VFPF_SET_Q_FILTERS_RX_MASK_CHANGED) { unsigned long accept = 0; + struct pf_vf_bulletin_content *bulletin = + BP_VF_BULLETIN(bp, vf->index); /* covert VF-PF if mask to bnx2x accept flags */ if (msg->rx_mask & VFPF_RX_MASK_ACCEPT_MATCHED_UNICAST) @@ -1617,9 +1619,11 @@ static void bnx2x_vfop_mbx_qfilters(struct bnx2x *bp, struct bnx2x_virtf *vf) __set_bit(BNX2X_ACCEPT_BROADCAST, &accept); /* A packet arriving the vf's mac should be accepted - * with any vlan + * with any vlan, unless a vlan has already been + * configured. */ - __set_bit(BNX2X_ACCEPT_ANY_VLAN, &accept); + if (!(bulletin->valid_bitmap & (1 << VLAN_VALID))) + __set_bit(BNX2X_ACCEPT_ANY_VLAN, &accept); /* set rx-mode */ rc = bnx2x_vfop_rxmode_cmd(bp, vf, &cmd, @@ -1710,6 +1714,21 @@ static void bnx2x_vf_mbx_set_q_filters(struct bnx2x *bp, goto response; } } + /* if vlan was set by hypervisor we don't allow guest to config vlan */ + if (bulletin->valid_bitmap & 1 << VLAN_VALID) { + int i; + + /* search for vlan filters */ + for (i = 0; i < filters->n_mac_vlan_filters; i++) { + if (filters->filters[i].flags & + VFPF_Q_FILTER_VLAN_TAG_VALID) { + BNX2X_ERR("VF[%d] attempted to configure vlan but one was already set by Hypervisor. 
Aborting request\n", + vf->abs_vfid); + vf->op_rc = -EPERM; + goto response; + } + } + } /* verify vf_qid */ if (filters->vf_qid > vf_rxq_count(vf)) -- cgit v0.10.2 From 7d30622dbe64a7207af8a98f48d4a4ef00ab658a Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Sun, 5 Jan 2014 22:08:25 -0200 Subject: fec: Revert "fec: Do not assume that PHY reset is active low" In order to keep DT compatibility we need to revert this, otherwise the original dts files will no longer work with this driver change. This reverts commit 7a399e3a2e05bc580a78ea72371b3896827f72e1. Signed-off-by: Fabio Estevam Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 45b8b22..50bb71c 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2049,8 +2049,6 @@ static void fec_reset_phy(struct platform_device *pdev) int err, phy_reset; int msec = 1; struct device_node *np = pdev->dev.of_node; - enum of_gpio_flags flags; - bool port; if (!np) return; @@ -2060,22 +2058,18 @@ static void fec_reset_phy(struct platform_device *pdev) if (msec > 1000) msec = 1; - phy_reset = of_get_named_gpio_flags(np, "phy-reset-gpios", 0, &flags); + phy_reset = of_get_named_gpio(np, "phy-reset-gpios", 0); if (!gpio_is_valid(phy_reset)) return; - if (flags & OF_GPIO_ACTIVE_LOW) - port = GPIOF_OUT_INIT_LOW; - else - port = GPIOF_OUT_INIT_HIGH; - - err = devm_gpio_request_one(&pdev->dev, phy_reset, port, "phy-reset"); + err = devm_gpio_request_one(&pdev->dev, phy_reset, + GPIOF_OUT_INIT_LOW, "phy-reset"); if (err) { dev_err(&pdev->dev, "failed to get phy-reset-gpios: %d\n", err); return; } msleep(msec); - gpio_set_value(phy_reset, !port); + gpio_set_value(phy_reset, 1); } #else /* CONFIG_OF */ static void fec_reset_phy(struct platform_device *pdev) -- cgit v0.10.2 From 965801e1eb624154fe5e9dc5d2ff0b7f1951a11c Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Mon, 6 Jan 2014 01:45:50 +0100 Subject: net: 6lowpan: fix lowpan_header_create non-compression memcpy call In function lowpan_header_create(), we invoke the following code construct: struct ipv6hdr *hdr; ... hdr = ipv6_hdr(skb); ... if (...) memcpy(hc06_ptr + 1, &hdr->flow_lbl[1], 2); else memcpy(hc06_ptr, &hdr, 4); Where the else path of the condition, that is, non-compression path, calls memcpy() with a pointer to struct ipv6hdr *hdr as source, thus two levels of indirection. This cannot be correct, and likely only one level of pointer was intended as source buffer for memcpy() here. Fixes: 44331fe2aa0d ("IEEE802.15.4: 6LoWPAN basic support") Signed-off-by: Daniel Borkmann Cc: Alexander Smirnov Cc: Dmitry Eremin-Solenikov Cc: Werner Almesberger Signed-off-by: David S. Miller diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c index 459e200..a2d2456 100644 --- a/net/ieee802154/6lowpan.c +++ b/net/ieee802154/6lowpan.c @@ -547,7 +547,7 @@ static int lowpan_header_create(struct sk_buff *skb, hc06_ptr += 3; } else { /* compress nothing */ - memcpy(hc06_ptr, &hdr, 4); + memcpy(hc06_ptr, hdr, 4); /* replace the top byte with new ECN | DSCP format */ *hc06_ptr = tmp; hc06_ptr += 4; -- cgit v0.10.2 From f35f76ee76df008131bbe01a2297de0c55ee2297 Mon Sep 17 00:00:00 2001 From: Josh Boyer Date: Sun, 5 Jan 2014 10:24:01 -0500 Subject: xen-netback: Include header for vmalloc Commit ac3d5ac27735 ("xen-netback: fix guest-receive-side array sizes") added calls to vmalloc and vfree in the interface.c file without including . 
This causes build failures if the -Werror=implicit-function-declaration flag is passed. Signed-off-by: Josh Boyer Acked-by: Wei Liu Signed-off-by: David S. Miller diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index 34ca4e5..fff8cdd 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -34,6 +34,7 @@ #include #include #include +#include #include #include -- cgit v0.10.2 From da1388d655292a11b5e9c011532e9ca83f77e1d3 Mon Sep 17 00:00:00 2001 From: Vasundhara Volam Date: Mon, 6 Jan 2014 13:02:23 +0530 Subject: be2net: disable RSS when number of RXQs is reduced to 1 via set-channels When *only* the default RXQ is used, the RSS policy must be disabled so that all IP and no-IP traffic is placed into the default RXQ. If not, IP traffic is dropped. Also, issue the RSS_CONFIG cmd only if FW advertises RSS capability for the interface. Signed-off-by: Vasundhara Volam Signed-off-by: Sathya Perla Signed-off-by: David S. Miller diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c index e0e8bc1..b84902e 100644 --- a/drivers/net/ethernet/emulex/benet/be_cmds.c +++ b/drivers/net/ethernet/emulex/benet/be_cmds.c @@ -2017,6 +2017,9 @@ int be_cmd_rss_config(struct be_adapter *adapter, u8 *rsstable, 0x3ea83c02, 0x4a110304}; int status; + if (!(be_if_cap_flags(adapter) & BE_IF_FLAGS_RSS)) + return 0; + if (mutex_lock_interruptible(&adapter->mbox_lock)) return -1; diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 0fde69d..6b774a5 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -2744,13 +2744,16 @@ static int be_rx_qs_create(struct be_adapter *adapter) if (!BEx_chip(adapter)) adapter->rss_flags |= RSS_ENABLE_UDP_IPV4 | RSS_ENABLE_UDP_IPV6; + } else { + /* Disable RSS, if only default RX Q is created */ + adapter->rss_flags = RSS_ENABLE_NONE; + } - rc = be_cmd_rss_config(adapter, rsstable, adapter->rss_flags, - 128); - if (rc) { - adapter->rss_flags = 0; - return rc; - } + rc = be_cmd_rss_config(adapter, rsstable, adapter->rss_flags, + 128); + if (rc) { + adapter->rss_flags = RSS_ENABLE_NONE; + return rc; } /* First time posting */ -- cgit v0.10.2 From 5eeff6354faffb3f140d690eec1cede78de53b06 Mon Sep 17 00:00:00 2001 From: Suresh Reddy Date: Mon, 6 Jan 2014 13:02:24 +0530 Subject: be2net: increase the timeout value for loopback-test FW cmd The loopback test FW cmd may need upto 15 seconds to complete on certain PHYs. This patch also fixes the name of the completion variable used to synchronize FW cmd completions as it not used by the flashing cmd alone anymore. Signed-off-by: Suresh Reddy Signed-off-by: Sathya Perla Signed-off-by: David S. 
Miller diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h index 5878df6..2e031f2 100644 --- a/drivers/net/ethernet/emulex/benet/be.h +++ b/drivers/net/ethernet/emulex/benet/be.h @@ -480,7 +480,7 @@ struct be_adapter { struct list_head entry; u32 flash_status; - struct completion flash_compl; + struct completion et_cmd_compl; struct be_resources res; /* resources available for the func */ u16 num_vfs; /* Number of VFs provisioned by PF */ diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c index b84902e..94c35c8 100644 --- a/drivers/net/ethernet/emulex/benet/be_cmds.c +++ b/drivers/net/ethernet/emulex/benet/be_cmds.c @@ -141,11 +141,17 @@ static int be_mcc_compl_process(struct be_adapter *adapter, subsystem = resp_hdr->subsystem; } + if (opcode == OPCODE_LOWLEVEL_LOOPBACK_TEST && + subsystem == CMD_SUBSYSTEM_LOWLEVEL) { + complete(&adapter->et_cmd_compl); + return 0; + } + if (((opcode == OPCODE_COMMON_WRITE_FLASHROM) || (opcode == OPCODE_COMMON_WRITE_OBJECT)) && (subsystem == CMD_SUBSYSTEM_COMMON)) { adapter->flash_status = compl_status; - complete(&adapter->flash_compl); + complete(&adapter->et_cmd_compl); } if (compl_status == MCC_STATUS_SUCCESS) { @@ -2163,7 +2169,7 @@ int lancer_cmd_write_object(struct be_adapter *adapter, struct be_dma_mem *cmd, be_mcc_notify(adapter); spin_unlock_bh(&adapter->mcc_lock); - if (!wait_for_completion_timeout(&adapter->flash_compl, + if (!wait_for_completion_timeout(&adapter->et_cmd_compl, msecs_to_jiffies(60000))) status = -1; else @@ -2258,8 +2264,8 @@ int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd, be_mcc_notify(adapter); spin_unlock_bh(&adapter->mcc_lock); - if (!wait_for_completion_timeout(&adapter->flash_compl, - msecs_to_jiffies(40000))) + if (!wait_for_completion_timeout(&adapter->et_cmd_compl, + msecs_to_jiffies(40000))) status = -1; else status = adapter->flash_status; @@ -2370,6 +2376,7 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num, { struct be_mcc_wrb *wrb; struct be_cmd_req_loopback_test *req; + struct be_cmd_resp_loopback_test *resp; int status; spin_lock_bh(&adapter->mcc_lock); @@ -2384,8 +2391,8 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num, be_wrb_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_LOWLEVEL, OPCODE_LOWLEVEL_LOOPBACK_TEST, sizeof(*req), wrb, NULL); - req->hdr.timeout = cpu_to_le32(4); + req->hdr.timeout = cpu_to_le32(15); req->pattern = cpu_to_le64(pattern); req->src_port = cpu_to_le32(port_num); req->dest_port = cpu_to_le32(port_num); @@ -2393,12 +2400,15 @@ int be_cmd_loopback_test(struct be_adapter *adapter, u32 port_num, req->num_pkts = cpu_to_le32(num_pkts); req->loopback_type = cpu_to_le32(loopback_type); - status = be_mcc_notify_wait(adapter); - if (!status) { - struct be_cmd_resp_loopback_test *resp = embedded_payload(wrb); - status = le32_to_cpu(resp->status); - } + be_mcc_notify(adapter); + + spin_unlock_bh(&adapter->mcc_lock); + wait_for_completion(&adapter->et_cmd_compl); + resp = embedded_payload(wrb); + status = le32_to_cpu(resp->status); + + return status; err: spin_unlock_bh(&adapter->mcc_lock); return status; diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 6b774a5..fa44bba 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -4208,7 +4208,7 @@ static int be_ctrl_init(struct be_adapter *adapter) spin_lock_init(&adapter->mcc_lock); 
 	spin_lock_init(&adapter->mcc_cq_lock);
-	init_completion(&adapter->flash_compl);
+	init_completion(&adapter->et_cmd_compl);
 	pci_save_state(adapter->pdev);
 	return 0;
-- cgit v0.10.2

From e3dc867c1758fc4ae6fc6b63ea6bbcd454479f08 Mon Sep 17 00:00:00 2001
From: Suresh Reddy
Date: Mon, 6 Jan 2014 13:02:25 +0530
Subject: be2net: fix max_evt_qs calculation for BE3 in SR-IOV config

The driver wrongly assumes that 16 EQs/vectors are available for each
BE3 PF. When SR-IOV is enabled, a BE3 PF can support only a maximum of
8 EQs.

Signed-off-by: Suresh Reddy
Signed-off-by: Sathya Perla
Signed-off-by: David S. Miller

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index 2e031f2..4ccaf9a 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -104,6 +104,7 @@ static inline char *nic_name(struct pci_dev *pdev)
 #define BE3_MAX_RSS_QS 16
 #define BE3_MAX_TX_QS 16
 #define BE3_MAX_EVT_QS 16
+#define BE3_SRIOV_MAX_EVT_QS 8
 #define MAX_RX_QS 32
 #define MAX_EVT_QS 32
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index fa44bba..bf40fda 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3127,11 +3127,11 @@ static void BEx_get_resources(struct be_adapter *adapter,
 {
 	struct pci_dev *pdev = adapter->pdev;
 	bool use_sriov = false;
+	int max_vfs;
-	if (BE3_chip(adapter) && sriov_want(adapter)) {
-		int max_vfs;
+	max_vfs = pci_sriov_get_totalvfs(pdev);
-		max_vfs = pci_sriov_get_totalvfs(pdev);
+	if (BE3_chip(adapter) && sriov_want(adapter)) {
 		res->max_vfs = max_vfs > 0 ? min(MAX_VFS, max_vfs) : 0;
 		use_sriov = res->max_vfs;
 	}
@@ -3162,7 +3162,11 @@ static void BEx_get_resources(struct be_adapter *adapter,
 			   BE3_MAX_RSS_QS : BE2_MAX_RSS_QS;
 	res->max_rx_qs = res->max_rss_qs + 1;
-	res->max_evt_qs = be_physfn(adapter) ? BE3_MAX_EVT_QS : 1;
+	if (be_physfn(adapter))
+		res->max_evt_qs = (max_vfs > 0) ?
+					BE3_SRIOV_MAX_EVT_QS : BE3_MAX_EVT_QS;
+	else
+		res->max_evt_qs = 1;
 	res->if_cap_flags = BE_IF_CAP_FLAGS_WANT;
 	if (!(adapter->function_caps & BE_FUNCTION_CAPS_RSS))
-- cgit v0.10.2

From 22d3b76ed7d2a02bca513b113f2613b9b6e61d58 Mon Sep 17 00:00:00 2001
From: Guenter Roeck
Date: Sun, 5 Jan 2014 20:31:39 -0800
Subject: isdn: Drop big endian cpp checks from telespci and hfc_pci drivers

With arm:allmodconfig, building the Teles PCI driver fails with

telespci.c:294:2: error: #error "not running on big endian machines now"

Similarly, building the driver for HFC PCI-Bus cards fails with

hfc_pci.c:1647:2: error: #error "not running on big endian machines now"

Remove the big endian cpp check from both drivers to fix the build
errors.

Signed-off-by: Guenter Roeck
Signed-off-by: David S. Miller

diff --git a/drivers/isdn/hisax/hfc_pci.c b/drivers/isdn/hisax/hfc_pci.c
index 497bd02..4a48255 100644
--- a/drivers/isdn/hisax/hfc_pci.c
+++ b/drivers/isdn/hisax/hfc_pci.c
@@ -1643,10 +1643,6 @@ setup_hfcpci(struct IsdnCard *card)
 	int i;
 	struct pci_dev *tmp_hfcpci = NULL;
-#ifdef __BIG_ENDIAN
-#error "not running on big endian machines now"
-#endif
-
 	strcpy(tmp, hfcpci_revision);
 	printk(KERN_INFO "HiSax: HFC-PCI driver Rev. %s\n", HiSax_getrev(tmp));
diff --git a/drivers/isdn/hisax/telespci.c b/drivers/isdn/hisax/telespci.c
index f6ab63a..33eeb46 100644
--- a/drivers/isdn/hisax/telespci.c
+++ b/drivers/isdn/hisax/telespci.c
@@ -290,10 +290,6 @@ int setup_telespci(struct IsdnCard *card)
 	struct IsdnCardState *cs = card->cs;
 	char tmp[64];
-#ifdef __BIG_ENDIAN
-#error "not running on big endian machines now"
-#endif
-
 	strcpy(tmp, telespci_revision);
 	printk(KERN_INFO "HiSax: Teles/PCI driver Rev. %s\n", HiSax_getrev(tmp));
 	if (cs->typ != ISDN_CTYPE_TELESPCI)
-- cgit v0.10.2

From e5e97ee956d8c5ed2fc5877d29dee17a6a59de8e Mon Sep 17 00:00:00 2001
From: Dan Williams
Date: Mon, 6 Jan 2014 10:07:29 -0600
Subject: hso: fix handling of modem port SERIAL_STATE notifications

The existing serial state notification handling expected older Option
devices, having a hardcoded assumption that the Modem port was always
USB interface #2. That isn't true for devices from the past few years.

hso_serial_state_notification is a local cache of a USB Communications
Interface Class SERIAL_STATE notification from the device, and the USB
CDC specification (section 6.3, table 67 "Class-Specific
Notifications") defines wIndex as the USB interface the event applies
to. For hso devices this will always be the Modem port, as the Modem
port is the only port which the driver sets up to receive them.

So instead of always expecting USB interface #2, validate the
notification against the actual USB interface number of the Modem port.

Signed-off-by: Dan Williams
Tested-by: H. Nikolaus Schaller
Signed-off-by: David S. Miller

diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index 86292e6..1a48234 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -185,7 +185,6 @@ enum rx_ctrl_state{
 #define BM_REQUEST_TYPE (0xa1)
 #define B_NOTIFICATION (0x20)
 #define W_VALUE (0x0)
-#define W_INDEX (0x2)
 #define W_LENGTH (0x2)
 #define B_OVERRUN (0x1<<6)
@@ -1487,6 +1486,7 @@ static void tiocmget_intr_callback(struct urb *urb)
 	struct uart_icount *icount;
 	struct hso_serial_state_notification *serial_state_notification;
 	struct usb_device *usb;
+	int if_num;
 	/* Sanity checks */
 	if (!serial)
@@ -1495,15 +1495,24 @@ static void tiocmget_intr_callback(struct urb *urb)
 		handle_usb_error(status, __func__, serial->parent);
 		return;
 	}
+
+	/* tiocmget is only supported on HSO_PORT_MODEM */
 	tiocmget = serial->tiocmget;
 	if (!tiocmget)
 		return;
+	BUG_ON((serial->parent->port_spec & HSO_PORT_MASK) != HSO_PORT_MODEM);
+	usb = serial->parent->usb;
+	if_num = serial->parent->interface->altsetting->desc.bInterfaceNumber;
+
+	/* wIndex should be the USB interface number of the port to which the
+	 * notification applies, which should always be the Modem port.
+	 */
 	serial_state_notification = &tiocmget->serial_state_notification;
 	if (serial_state_notification->bmRequestType != BM_REQUEST_TYPE ||
 	    serial_state_notification->bNotification != B_NOTIFICATION ||
 	    le16_to_cpu(serial_state_notification->wValue) != W_VALUE ||
-	    le16_to_cpu(serial_state_notification->wIndex) != W_INDEX ||
+	    le16_to_cpu(serial_state_notification->wIndex) != if_num ||
 	    le16_to_cpu(serial_state_notification->wLength) != W_LENGTH) {
 		dev_warn(&usb->dev, "hso received invalid serial state notification\n");
-- cgit v0.10.2

From 88ad31491e21f5dec347911d9804c673af414a09 Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa
Date: Mon, 6 Jan 2014 17:53:14 +0100
Subject: ipv6: don't install anycast address for /128 addresses on routers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

It does not make sense to create an anycast address for a /128 prefix.
Suppress it.

As 32019e651c6fce ("ipv6: Do not leave router anycast address for /127
prefixes.") shows, we also must not leave them, because we could
accidentally remove an anycast address the user has allocated or that
got added via another prefix.

Cc: François-Xavier Le Bail
Cc: Thomas Haller
Cc: Jiri Pirko
Signed-off-by: Hannes Frederic Sowa
Signed-off-by: David S. Miller

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1a341f7..f62c72b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1671,7 +1671,7 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr)
 static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
 {
 	struct in6_addr addr;
-	if (ifp->prefix_len == 127) /* RFC 6164 */
+	if (ifp->prefix_len >= 127) /* RFC 6164 */
 		return;
 	ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
 	if (ipv6_addr_any(&addr))
@@ -1682,7 +1682,7 @@ static void addrconf_join_anycast(struct inet6_ifaddr *ifp)
 static void addrconf_leave_anycast(struct inet6_ifaddr *ifp)
 {
 	struct in6_addr addr;
-	if (ifp->prefix_len == 127) /* RFC 6164 */
+	if (ifp->prefix_len >= 127) /* RFC 6164 */
 		return;
 	ipv6_addr_prefix(&addr, &ifp->addr, ifp->prefix_len);
 	if (ipv6_addr_any(&addr))
-- cgit v0.10.2

From fe0d692bbc645786bce1a98439e548ae619269f5 Mon Sep 17 00:00:00 2001
From: Curt Brune
Date: Mon, 6 Jan 2014 11:00:32 -0800
Subject: bridge: use spin_lock_bh() in br_multicast_set_hash_max

br_multicast_set_hash_max() is called from process context in
net/bridge/br_sysfs_br.c by the sysfs store_hash_max() function.

br_multicast_set_hash_max() calls spin_lock(&br->multicast_lock),
which can deadlock the CPU if a softirq that also tries to take the
same lock interrupts br_multicast_set_hash_max() while the lock is
held. This can happen quite easily when any of the bridge multicast
timers expire, as their callbacks try to take the same lock.

The fix here is to use spin_lock_bh(), which prevents softirqs from
executing on this CPU while the lock is held.

Steps to reproduce:

1. Create a bridge with several interfaces (I used 4).
2. Set the "multicast query interval" to a low number, like 2.
3. Enable the bridge as a multicast querier.
4. Repeatedly set the bridge hash_max parameter via sysfs.

  # brctl addbr br0
  # brctl addif br0 eth1 eth2 eth3 eth4
  # brctl setmcqi br0 2
  # brctl setmcquerier br0 1
  # while true ; do echo 4096 > /sys/class/net/br0/bridge/hash_max; done

Signed-off-by: Curt Brune
Signed-off-by: Scott Feldman
Signed-off-by: David S. Miller

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 4c214b2..ef66365 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1998,7 +1998,7 @@ int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
 	u32 old;
 	struct net_bridge_mdb_htable *mdb;
-	spin_lock(&br->multicast_lock);
+	spin_lock_bh(&br->multicast_lock);
 	if (!netif_running(br->dev))
 		goto unlock;
@@ -2030,7 +2030,7 @@ rollback:
 	}
 unlock:
-	spin_unlock(&br->multicast_lock);
+	spin_unlock_bh(&br->multicast_lock);
 	return err;
 }
-- cgit v0.10.2
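
The spin_lock_bh() pattern in the bridge patch above generalizes to any data
shared between process context and softirq context (timers, the receive path,
tasklets). The fragment below is a minimal, hypothetical sketch, not bridge
code: demo_lock, demo_timer_fn and demo_set_counter are invented names. It
only illustrates why the process-context side must use the _bh lock variant
when a timer callback takes the same lock.

#include <linux/spinlock.h>
#include <linux/timer.h>
#include <linux/jiffies.h>

static DEFINE_SPINLOCK(demo_lock);
static struct timer_list demo_timer;
static unsigned long demo_counter;

/* Timer callback: runs in softirq context and takes demo_lock. */
static void demo_timer_fn(unsigned long data)
{
	spin_lock(&demo_lock);	/* already in softirq context, plain lock is enough */
	demo_counter++;
	spin_unlock(&demo_lock);
	mod_timer(&demo_timer, jiffies + HZ);
}

/* Process-context writer, e.g. called from a sysfs store handler. */
static void demo_set_counter(unsigned long val)
{
	/*
	 * spin_lock_bh() disables softirq processing on this CPU, so
	 * demo_timer_fn() cannot interrupt us here and spin forever on
	 * a lock we already hold.  A plain spin_lock() would allow
	 * exactly the deadlock described in the changelog above.
	 */
	spin_lock_bh(&demo_lock);
	demo_counter = val;
	spin_unlock_bh(&demo_lock);
}

static void demo_start(void)
{
	setup_timer(&demo_timer, demo_timer_fn, 0);
	mod_timer(&demo_timer, jiffies + HZ);
}

If a lock is also taken from hard-IRQ context, spin_lock_irqsave() is needed
instead; for multicast_lock the _bh variant is sufficient because its other
users (the multicast timers and the packet-processing path) run in softirq
context.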