linux-fsl-qoriq - Freescale linux tree with Scalys patches

Age	Commit message (Collapse)	Author
2010-08-24	powerpc: Re-enable preemption before cpu_die()	Signed-off-by: Darren Hart
	start_secondary() is called shortly after _start and also via cpu_idle()->cpu_die()->pseries_mach_cpu_die() start_secondary() expects a preempt_count() of 0. pseries_mach_cpu_die() is called via the cpu_idle() routine with preemption disabled, resulting in the following repeating message during rapid cpu offline/online tests with CONFIG_PREEMPT=y: BUG: scheduling while atomic: swapper/0/0x00000002 Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan] Call Trace: [c00000010e7079c0] [c0000000000133ec] .show_stack+0xd8/0x218 (unreliable) [c00000010e707aa0] [c0000000006a47f0] .dump_stack+0x28/0x3c [c00000010e707b20] [c00000000006e7a4] .__schedule_bug+0x7c/0x9c [c00000010e707bb0] [c000000000699d9c] .schedule+0x104/0x800 [c00000010e707cd0] [c000000000015b24] .cpu_idle+0x1c4/0x1d8 [c00000010e707d70] [c0000000006aa1b4] .start_secondary+0x398/0x3d4 [c00000010e707e30] [c000000000008278] .start_secondary_resume+0x10/0x14 Move the cpu_die() call inside the existing preemption enabled block of cpu_idle(). This is safe as the idle task is affined to a single CPU so the debug_smp_processor_id() tests (from cpu_should_die()) won't trigger as we are in a "migration disabled" region. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc/pci: Drop unnecessary null test	Julia Lawall
	list_for_each_entry binds its first argument to a non-null value, and thus any null test on the value of that argument is superfluous. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ iterator I; expression x,E,E1,E2; statement S,S1,S2; @@ I(x,...) { <... - if (x != NULL \|\| ...) S ...> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc/powermac: Drop unnecessary null test	Julia Lawall
	for_each_node_by_name binds its first argument to a non-null value, and thus any null test on the value of that argument is superfluous. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ iterator I; expression x,E; @@ I(x,...) { <... ( - (x != NULL) && E ...> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc/powermac: Drop unnecessary of_node_put	Julia Lawall
	for_each_node_by_name only exits when its first argument is NULL, and a subsequent call to of_node_put on that argument is unnecessary. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ iterator name for_each_node_by_name; expression np,E; identifier l; @@ for_each_node_by_name(np,...) { ... when != break; when != goto l; } ... when != np = E - of_node_put(np); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc/kdump: Stop all other CPUs before running crash handlers	Anton Blanchard
	During kdump we run the crash handlers first then stop all other CPUs. We really want to stop all CPUs as close to the fail as possible and also have a very controlled environment for running the crash handlers, so it makes sense to reverse the order. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc/mm: Fix vsid_scrample typo	Anton Blanchard
	The code is wrapped in an #if 0, but it's wrong so we may as well fix it. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc: Use is_32bit_task() helper to test 32 bit binary	Denis Kirjanov
	Use is_32bit_task() helper to test 32 bit binary. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc: Export memstart_addr and kernstart_addr on ppc64	Sonny Rao
	Some modules (like eHCA) want to map all of kernel memory, for this to work with a relocated kernel, we need to export kernstart_addr so modules can use PHYSICAL_START and memstart_addr so they could use MEMORY_START. Note that the 32bit code already exports these symbols. Signed-off-By: Sonny Rao <sonnyrao@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	powerpc: Make rwsem use "long" type	Benjamin Herrenschmidt
	This makes the 64-bit kernel use 64-bit signed integers for the counter (effectively supporting 32-bit of active count in the semaphore), thus avoiding things like overflow of the mmap_sem if you use a really crazy number of threads Note: Ideally the type in the structure should be atomic_long_t rather than "long". However, there's some nasty issues with that. It needs to be initialized statically -and- lib/rwsem.c does things like sem->count = RWSEM_UNLOCKED_VALUE; Now, if you mix in the fact that atomic_* types are actually structures with one member and note typedefs of a scalar, it makes its really nasty. So I stuck to what we did before using a long and casts for now. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-08-24	Merge remote branch 'jwb/merge' into merge	Benjamin Herrenschmidt

2010-08-24	ARM: imx: fix build failure concerning otg/ulpi	Uwe Kleine-König
	The build failure was introduced by 13dd0c9 (USB: otg/ulpi: extend the generic ulpi driver.) Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Igor Grinberg <grinberg@compulab.co.il> Cc: Mike Rapoport <mike@compulab.co.il> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ftdi_sio: add product ID for Lenz LI-USB	Galen Seitz
	Add ftdi product ID for Lenz LI-USB, a model train interface. This was NOT tested against 2.6.35, but a similar patch was tested with the CentOS 2.6.18-194.11.1.el5 kernel. It wasn't clear to me what ordering is being used in ftdi_sio.c, so I inserted the ID after another model train entry(SPROG_II). Signed-off-by: Galen Seitz <galens@seitzassoc.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: adutux: fix misuse of return value of copy_to_user()	Kulikov Vasiliy
	copy_to_user() returns number of not copied bytes, not error code. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: iowarrior: fix misuse of return value of copy_to_user()	Kulikov Vasiliy
	copy_to_user() returns number of not copied bytes, not error code. Signed-off-by: Kulikov Vasiliy <segooon@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: xHCI: update ring dequeue pointer when process missed tds	Andiry Xu
	This patch fixes a isoc transfer bug reported by Sander Eikelenboom. When ep->skip is set, endpoint ring dequeue pointer should be updated when processed every missed td. Although ring dequeue pointer will also be updated when ep->skip is clear, leave it intact during missed tds processing may cause two issues: 1). If the very next valid transfer following missed tds is a short transfer, its actual_length will be miscalculated; 2). If there are too many missed tds during transfer, new inserted tds may found the transfer ring full and urb enqueue fails. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Andiry Xu <andiry.xu@amd.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: xhci: Remove buggy assignment in next_trb()	John Youn
	The code to increment the TRB pointer has a slight ambiguity that could lead to a bug on different compilers. The ANSI C specification does not specify the precedence of the assignment operator over the postfix operator. gcc 4.4 produced the correct code (increment the pointer and assign the value), but a MIPS compiler that one of John's clients used assigned the old (unincremented) value. Remove the unnecessary assignment to make all compilers produce the correct assembly. Signed-off-by: John Youn <johnyoun@synopsys.com> Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ftdi_sio: Add ID for Ionics PlugComputer	Martin Michlmayr
	Add the ID for the Ionics PlugComputer (<http://ionicsplug.com/>). Signed-off-by: Martin Michlmayr <tbm@cyrius.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: serial: io_ti.c: don't return 0 if writing the download record failed	Roel Kluin
	If the write download record failed we shouldn't return 0. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: otg: twl4030: fix wrong assumption of starting state	Felipe Balbi
	The reset state of twl4030-usb is not sleeping, it starts up awaken and we need to disable it if we have booted with a disconnected cable to avoid over consumption on the default state. To avoid problems later, we read the current state of the transceiver from the PHY_PWR_CTRL register. The bootloader can, anyways, put the device to sleep before us. Tested on a custom OMAP board. Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: gadget: Return -ENOMEM on memory allocation failure	Julia Lawall
	In this code, 0 is returned on memory allocation failure, even though other failures return -ENOMEM or other similar values. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression ret; expression x,e1,e2,e3; @@ ret = 0 ... when != ret = e1 *x = \(kmalloc\\|kcalloc\\|kzalloc\)(...) ... when != ret = e2 if (x == NULL) { ... when != ret = e3 return ret; } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: gadget: fix composite kernel-doc warnings	Randy Dunlap
	Warning(include/linux/usb/composite.h:284): No description found for parameter 'disconnect' Warning(drivers/usb/gadget/composite.c:744): No description found for parameter 'c' Warning(drivers/usb/gadget/composite.c:744): Excess function parameter 'cdev' description in 'usb_string_ids_n' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: set tty_flags in ssu100_process_packet	Bill Pemberton
	flag was never set in ssu100_process_packet. Add logic to set it before calling tty_insert_flip_* Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: add disconnect function for ssu100	Bill Pemberton
	Add a disconnect function to the functions of this device. The disconnect is a call to usb_serial_generic_disconnect() so it requires that symbol to be exported from generic.c. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: serial: export symbol usb_serial_generic_disconnect	Bill Pemberton
	This is needed by the ssu100 driver to use this function. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: rework logic for TIOCMIWAIT	Bill Pemberton
	Rework the logic for TIOCMIWAIT to use wait_event_interruptible. This also adds support for TIOCGICOUNT. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: add register parameter to ssu100_setregister	Bill Pemberton
	The function ssu100_setregister was hard coded to only set the MCR register. Add a register parameter so that other registers can be set. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: remove duplicate #defines in ssu100	Bill Pemberton
	The ssu100 uses a TI16C550C UART so the SERIAL_ defines in this code are duplicates of those found in serial_reg.h. Remove the defines in ssu100.c and use the ones in the header file. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: refine process_packet in ssu100	Bill Pemberton
	The status information does not appear at the start of each incoming packet so the check for len < 4 at the start of ssu100_process_packet is wrong. Remove it. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ssu100: add locking for port private data in ssu100	Bill Pemberton
	Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: r8a66597-udc: return -ENOMEM if kzalloc() fails	Axel Lin
	Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: io_ti: check firmware version before updating	Greg Kroah-Hartman
	If we can't read the firmware for a device from the disk, and yet the device already has a valid firmware image in it, we don't want to replace the firmware with something invalid. So check the version number to be less than the current one to verify this is the correct thing to do. Reported-by: Chris Beauchamp <chris@chillibean.tv> Tested-by: Chris Beauchamp <chris@chillibean.tv> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: ftdi_sio: fix endianess of max packet size	Michael Wileczka
	The USB max packet size (always little-endian) was not being byte swapped on big-endian systems. Applicable since [USB: ftdi_sio: fix hi-speed device packet size calculation] approx 2.6.31 Signed-off-by: Michael Wileczka <mikewileczka@yahoo.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: CP210x Fix Break On/Off	Craig Shelley
	The definitions for BREAK_ON and BREAK_OFF are inverted, causing break requests to fail. This patch sets BREAK_ON and BREAK_OFF to the correct values. Signed-off-by: Craig Shelley <craig@microtron.org.uk> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: pl2303: New vendor and product id	Jef Driesen
	Add support for the Zeagle N2iTiON3 dive computer interface. Since Zeagle devices are actually manufactured by Seiko, this patch will support other Seiko based models as well. Signed-off-by: Jef Driesen <jefdriesen@telenet.be> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: serial: fix leak of usb serial module refrence count	Ming Lei
	The patch with title below makes reference count of usb serial module always more than one after driver is bound. USB-BKL: Remove BKL use for usb serial driver probing In fact, the patch above only replaces lock_kernel() with try_module_get() , and does not use module_put() to do what unlock_kernel() did, so casue leak of reference count of usb serial module and the module can not be unloaded after serial driver is bound with device. This patch fixes the issue, also simplifies such things: -only call try_module_get() once in the entry of usb_serial_probe() -only call module_put() once in the exit of usb_serial_probe Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Johan Hovold <jhovold@gmail.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: add device IDs for igotu to navman	Ross Burton
	I recently bought a i-gotU USB GPS, and whilst hunting around for linux support discovered this post by you back in 2009: http://kerneltrap.org/mailarchive/linux-usb/2009/3/12/5148644 >Try the navman driver instead. You can either add the device id to the > driver and rebuild it, or do this before you plug the device in: > modprobe navman > echo -n "0x0df7 0x0900" > /sys/bus/usb-serial/drivers/navman/new_id > > and then plug your device in and see if that works. I can confirm that the navman driver works with the right device IDs on my i-gotU GT-600, which has the same device IDs. Attached is a patch adding the IDs. From: Ross Burton <ross@linux.intel.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: isp1760: use a write barrier to ensure proper ndelay timing	Michael Hennerich
	The ISP1760 has some timing requirements where it has to delay a short period after a write to a register has started. However, this delay is from the time the write hits the USB chip (the ISP1760), not from the time where the processor started processing the write. So on a quick enough processor, it is sometimes possible for the write to not hit the device before we start delaying, and we then violate the part's timing requirements, so things stop working. To avoid all this, insert a write barrier after the register write and before the timing delay/register read so we can guarantee we only start counting time after the write has hit the device. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: option: add Celot CT-650	Michael Tokarev
	Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	USB: uvc_v4l2: cleanup test for end of loop	Dan Carpenter
	We're trying to test for the the end of the loop here. "format" is never NULL. We don't know what "format->fcc" is because we're past the end of the loop and I think "fmt->fmt.pix.pixelformat" comes from the user so we don't know what that is either. It works, but it's cleaner to just test to see if (i == ARRAY_SIZE(uvc_formats). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-24	xfs: do not discard page cache data on EAGAIN	Christoph Hellwig
	If xfs_map_blocks returns EAGAIN because of lock contention we must redirty the page and not disard the pagecache content and return an error from writepage. We used to do this correctly, but the logic got lost during the recent reshuffle of the writepage code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Mike Gao <ygao.linux@gmail.com> Tested-by: Mike Gao <ygao.linux@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com>
2010-08-24	xfs: don't do memory allocation under the CIL context lock	Dave Chinner
	Formatting items requires memory allocation when using delayed logging. Currently that memory allocation is done while holding the CIL context lock in read mode. This means that if memory allocation takes some time (e.g. enters reclaim), we cannot push on the CIL until the allocation(s) required by formatting complete. This can stall CIL pushes for some time, and once a push is stalled so are all new transaction commits. Fix this splitting the item formatting into two steps. The first step which does the allocation and memcpy() into the allocated buffer is now done outside the CIL context lock, and only the CIL insert is done inside the CIL context lock. This avoids the stall issue. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	xfs: Reduce log force overhead for delayed logging	Dave Chinner
	Delayed logging adds some serialisation to the log force process to ensure that it does not deference a bad commit context structure when determining if a CIL push is necessary or not. It does this by grabing the CIL context lock exclusively, then dropping it before pushing the CIL if necessary. This causes serialisation of all log forces and pushes regardless of whether a force is necessary or not. As a result fsync heavy workloads (like dbench) can be significantly slower with delayed logging than without. To avoid this penalty, copy the current sequence from the context to the CIL structure when they are swapped. This allows us to do unlocked checks on the current sequence without having to worry about dereferencing context structures that may have already been freed. Hence we can remove the CIL context locking in the forcing code and only call into the push code if the current context matches the sequence we need to force. By passing the sequence into the push code, we can check the sequence again once we have the CIL lock held exclusive and abort if the sequence has already been pushed. This avoids a lock round-trip and unnecessary CIL pushes when we have racing push calls. The result is that the regression in dbench performance goes away - this change improves dbench performance on a ramdisk from ~2100MB/s to ~2500MB/s. This compares favourably to not using delayed logging which retuns ~2500MB/s for the same workload. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	xfs: dummy transactions should not dirty VFS state	Dave Chinner
	When we need to cover the log, we issue dummy transactions to ensure the current log tail is on disk. Unfortunately we currently use the root inode in the dummy transaction, and the act of committing the transaction dirties the inode at the VFS level. As a result, the VFS writeback of the dirty inode will prevent the filesystem from idling long enough for the log covering state machine to complete. The state machine gets stuck in a loop issuing new dummy transactions to cover the log and never makes progress. To avoid this problem, the dummy transactions should not cause externally visible state changes. To ensure this occurs, make sure that dummy transactions log an unchanging field in the superblock as it's state is never propagated outside the filesystem. This allows the log covering state machine to complete successfully and the filesystem now correctly enters a fully idle state about 90s after the last modification was made. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	xfs: ensure f_ffree returned by statfs() is non-negative	Stuart Brodsky
	Because of delayed updates to sb_icount field in the super block, it is possible to allocate over maxicount number of inodes. This causes the arithmetic to calculate a negative number of free inodes in user commands like df or stat -f. Since maxicount is a somewhat arbitrary number, a slight over allocation is not critical but user commands should be displayed as 0 or greater and never go negative. To do this the value in the stats buffer f_ffree is capped to never go negative. [ Modified to use max_t as per Christoph's comment. ] Signed-off-by: Stu Brodsky <sbrodsky@sgi.com> Signed-off-by: Dave Chinner <dchinner@redhat.com>
2010-08-24	xfs: handle negative wbc->nr_to_write during sync writeback	Dave Chinner
	During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will go negative on inodes with more than 1024 dirty pages due to implementation details of write_cache_pages(). Currently XFS will abort page clustering in writeback once nr_to_write drops below zero, and so for data integrity writeback we will do very inefficient page at a time allocation and IO submission for inodes with large numbers of dirty pages. Fix this by only aborting the page clustering code when wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE. Cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	writeback: write_cache_pages doesn't terminate at nr_to_write <= 0	Dave Chinner
	I noticed XFS writeback in 2.6.36-rc1 was much slower than it should have been. Enabling writeback tracing showed: flush-253:16-8516 [007] 1342952.351608: wbc_writepage: bdi 253:16: towrt=1024 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0 flush-253:16-8516 [007] 1342952.351654: wbc_writepage: bdi 253:16: towrt=1023 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0 flush-253:16-8516 [000] 1342952.369520: wbc_writepage: bdi 253:16: towrt=0 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0 flush-253:16-8516 [000] 1342952.369542: wbc_writepage: bdi 253:16: towrt=-1 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0 flush-253:16-8516 [000] 1342952.369549: wbc_writepage: bdi 253:16: towrt=-2 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0 Writeback is not terminating in background writeback if ->writepage is returning with wbc->nr_to_write == 0, resulting in sub-optimal single page writeback on XFS. Fix the write_cache_pages loop to terminate correctly when this situation occurs and so prevent this sub-optimal background writeback pattern. This improves sustained sequential buffered write performance from around 250MB/s to 750MB/s for a 100GB file on an XFS filesystem on my 8p test VM. Cc:<stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	xfs: fix untrusted inode number lookup	Dave Chinner
	Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode numbers during lookup") changes the inode lookup code to do btree lookups for untrusted inode numbers. This change made an invalid assumption about the alignment of inodes and hence incorrectly calculated the first inode in the cluster. As a result, some inode numbers were being incorrectly considered invalid when they were actually valid. The issue was not picked up by the xfstests suite because it always runs fsr and dump (the two utilities that utilise the bulkstat interface) on cache hot inodes and hence the lookup code in the cold cache path was not sufficiently exercised to uncover this intermittent problem. Fix the issue by relaxing the btree lookup criteria and then checking if the record returned contains the inode number we are lookup for. If it we get an incorrect record, then the inode number is invalid. Cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE	Dave Chinner
	Under heavy load parallel metadata loads (e.g. dbench), we can fail to mark all the inodes in a cluster being freed as XFS_ISTALE as we skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on. When this happens and the inode cluster buffer has already been marked stale and freed, inode reclaim can try to write the inode out as it is dirty and not marked stale. This can result in writing th metadata to an freed extent, or in the case it has already been overwritten trigger a magic number check failure and return an EUCLEAN error such as: Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117 Fix this by ensuring that we hoover up all in memory inodes in the cluster and mark them XFS_ISTALE when freeing the cluster. Cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	xfs: unlock items before allowing the CIL to commit	Dave Chinner
	When we commit a transaction using delayed logging, we need to unlock the items in the transaciton before we unlock the CIL context and allow it to be checkpointed. If we unlock them after we release the CIl context lock, the CIL can checkpoint and complete before we free the log items. This breaks stale buffer item unlock and unpin processing as there is an implicit assumption that the unlock will occur before the unpin. Also, some log items need to store the LSN of the transaction commit in the item (inodes and EFIs) and so can race with other transaction completions if we don't prevent the CIL from checkpointing before the unlock occurs. Cc: <stable@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2010-08-24	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) netfilter: fix CONFIG_COMPAT support isdn/avm: fix build when PCMCIA is not enabled header: fix broken headers for user space e1000e: don't check for alternate MAC addr on parts that don't support it e1000e: disable ASPM L1 on 82573 ll_temac: Fix poll implementation netxen: fix a race in netxen_nic_get_stats() qlnic: fix a race in qlcnic_get_stats() irda: fix a race in irlan_eth_xmit() net: sh_eth: remove unused variable netxen: update version 4.0.74 netxen: fix inconsistent lock state vlan: Match underlying dev carrier on vlan add ibmveth: Fix opps during MTU change on an active device ehea: Fix synchronization between HW and SW send queue bnx2x: Update bnx2x version to 1.52.53-4 bnx2x: Fix PHY locking problem rds: fix a leak of kernel memory netlink: fix compat recvmsg netfilter: fix userspace header warning ...