Age Commit message Author
2011-10-12xfs: factor extent allocation out of xfs_bmapiDave Chinner
To further improve the readability of xfs_bmapi(), factor the extent allocation out into a separate function. This removes a large block of logic from the xfs_bmapi() code loop and makes it easier to see the operational logic flow for xfs_bmapi(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: do not use xfs_bmap_add_extent for adding delalloc extentsChristoph Hellwig
We can just call xfs_bmap_add_extent_hole_delay directly to add a delayed allocation region to the extent tree, instead of going through all the complexities of xfs_bmap_add_extent that aren't needed for this simple case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: introduce xfs_bmapi_delay()Christoph Hellwig
Delalloc reservations are much simpler than allocations, so give them a separate bmapi-level interface. Using the previously added xfs_bmapi_reserve_delalloc we get a function that is only minimally more complicated than xfs_bmapi_read, which is far from the complexity in xfs_bmapi. Also remove the XFS_BMAPI_DELAY code after switching over the only user to xfs_bmapi_delay. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: factor delalloc reservations out of xfs_bmapiChristoph Hellwig
Move the reservation of delayed allocations, and addition of delalloc regions to the extent trees into a new helper function. For now this adds some twisted goto logic to xfs_bmapi, but that will be cleaned up in the following patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove xfs_bmapi_single()Dave Chinner
Now we have xfs_bmapi_read, there is no need for xfs_bmapi_single(). Change the remaining caller over and kill the function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: introduce xfs_bmapi_read()Dave Chinner
xfs_bmapi() currently handles both extent map reading and allocation. As a result, the code is littered with "if (wr)" branches to conditionally do allocation operations if required. This makes the code much harder to follow and causes significant indent issues with the code. Given that read mapping is much simpler than allocation, we can split out read mapping from xfs_bmapi() and reuse the logic that we have already factored out to do all the hard work of handling the extent map manipulations. This results in a much simpler function for the common extent read operations, and will allow the allocation code to be simplified in another commit. Once xfs_bmapi_read() is implemented, convert all the callers of xfs_bmapi() that are only reading extents to use the new function. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: factor extent map manipulations out of xfs_bmapiDave Chinner
To further improve the readability of xfs_bmapi(), factor the pure extent map manipulations out into separate functions. This removes large blocks of logic from the xfs_bmapi() code loop and makes it easier to see the operational logic flow for xfs_bmapi(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove the nextents variable in xfs_bmapiChristoph Hellwig
Instead of using a local variable that needs to be updated when we modify the extent map, just check ifp->if_bytes directly where we use it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove impossible to read code in xfs_bmap_add_extent_delay_realChristoph Hellwig
We already have the worst case blocks reserved, so xfs_icsb_modify_counters won't fail in xfs_bmap_add_extent_delay_real. In fact we've had an assert to catch this case since day one and it never triggered. So remove the code to try smaller reservations, and just return the error for that case in addition to keeping the assert. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove the first extent special case in xfs_bmap_add_extentChristoph Hellwig
Both xfs_bmap_add_extent_hole_delay and xfs_bmap_add_extent_hole_real already contain code to handle the case where there is no extent to merge with, which is effectively the same as the code duplicated here. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: Return -EIO when xfs_vn_getattr() failedMitsuo Hayasaka
An inode's attributes can be fetched via xfs_vn_getattr() in XFS. Currently it returns EIO, not a negative value, when it fails. As a result, the system call returns a non-negative value even though an error occurred. stat(2) and the ls and mv commands cannot handle this error and do not work correctly. This patch fixes the bug by returning -EIO, not EIO, when an error is detected in xfs_vn_getattr(). Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
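The sign matters because callers that follow the kernel convention treat only negative return values as errors. The following is a minimal user-space sketch of that effect, not the XFS code; `getattr_buggy`, `getattr_fixed`, and `caller_sees_error` are hypothetical names introduced only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pre-fix behaviour: positive errno. */
static int getattr_buggy(bool failed)
{
	if (failed)
		return EIO;	/* positive, so callers read it as success */
	return 0;
}

/* Hypothetical stand-in for the fixed behaviour: negative errno. */
static int getattr_fixed(bool failed)
{
	if (failed)
		return -EIO;	/* negative errno, the kernel convention */
	return 0;
}

/* Callers following the convention only test for a negative value. */
static bool caller_sees_error(int ret)
{
	return ret < 0;
}
```

With these definitions, `caller_sees_error(getattr_buggy(true))` is false while `caller_sees_error(getattr_fixed(true))` is true, which mirrors why the failure was not reported to user space before the fix.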
2011-10-12xfs: Fix the incorrect comment in the header of _xfs_buf_findChandra Seetharaman
Fix the incorrect comment in the header of the function _xfs_buf_find(). Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: Check the return value of xfs_trans_get_buf()Chandra Seetharaman
Check the return value of xfs_trans_get_buf() and fail appropriately. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: Check the return value of xfs_buf_get()Chandra Seetharaman
Check the return value of xfs_buf_get() and fail appropriately. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: improve ioend error handlingChristoph Hellwig
Return unwritten extent conversion errors to aio_complete. Skip both unwritten extent conversion and size updates if we had an I/O error or the filesystem has been shut down. Return -EIO to the aio/buffer completion handlers in case of a forced shutdown. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: avoid direct I/O write vs buffered I/O raceChristoph Hellwig
Currently a buffered reader or writer can add pages to the pagecache while we are waiting for the iolock in xfs_file_dio_aio_write. Prevent this by re-checking mapping->nrpages after we got the iolock, and if necessary upgrade the lock to exclusive mode. To simplify this a bit only take the ilock inside of xfs_file_aio_write_checks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: avoid synchronous transactions when deleting attr blocksChristoph Hellwig
Currently xfs_attr_inactive causes a synchronous transaction if we are removing a file that has any extents allocated to the attribute fork, and thus makes XFS extremely slow at removing files with out of line extended attributes. The code looks like a relic from the days before the busy extent list, but with the busy extent list we avoid reusing data and attr extents that have been freed but not committed yet, so this code is just as superfluous as the synchronous transactions for data blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove i_iocountChristoph Hellwig
We now have an i_dio_count field and surrounding infrastructure to wait for direct I/O completion instead of i_iocount, and we have never needed iocount waits for buffered I/O given that we only set the page uptodate after finishing all required work. Thus remove i_iocount, and replace the actually needed waits with calls to inode_dio_wait. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: wait for I/O completion when writing out pages in xfs_setattr_sizeChristoph Hellwig
The current code relies on the xfs_ioend_wait call later on to make sure all I/O actually has completed. The xfs_ioend_wait call will go away soon, so prepare for that by using the waiting filemap function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: reduce ioend latencyChristoph Hellwig
There is no reason to queue up ioends for processing in user context unless we actually need it. Just complete ioends that do not convert unwritten extents or need a size update from the end_io context. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: defer AIO/DIO completionsChristoph Hellwig
We really shouldn't complete AIO or DIO requests until we have finished the unwritten extent conversion and size update. This means fsync never has to pick up any ioends as all work has been completed when signalling I/O completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove dead ENODEV handling in xfs_destroy_ioendChristoph Hellwig
No driver returns ENODEV from its bio completion handler, nor has this ever been documented. Remove the dead code dealing with it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: use the "delwri" terminology consistentlyChristoph Hellwig
And also remove the strange local lock and delwri list pointers in a few functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: let xfs_bwrite callers handle the xfs_buf_relseChristoph Hellwig
Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to mirror the delwri and read paths. Also remove the mount pointer passed to xfs_bwrite, which is superfluous now that we have a mount pointer in the buftarg. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: call xfs_buf_delwri_queue directlyChristoph Hellwig
Unify the ways we add buffers to the delwri queue by always calling xfs_buf_delwri_queue directly. The xfs_bdwrite function is removed and open-coded in its callers, and the two places setting XBF_DELWRI while a buffer is locked and expecting xfs_buf_unlock to pick it up are converted to call xfs_buf_delwri_queue directly, too. Also replace the XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue to make the explicit queuing/dequeuing more obvious. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: move more delwri setup into xfs_buf_delwri_queueChristoph Hellwig
Do not transfer a reference held by the caller to the buffer on the list, or decrement it in xfs_buf_delwri_queue, but instead grab a new reference if needed, and let the caller drop its own reference. Also move setting of the XBF_DELWRI and XBF_ASYNC flags into xfs_buf_delwri_queue, and only do it if needed. Note that for now xfs_buf_unlock already has XBF_DELWRI, but that will change in the following patches. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
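The reference handling described here can be sketched in user-space C. This is only a model under stated assumptions: `buf_model`, `model_delwri_queue`, and `model_buf_rele` are hypothetical names, and the real xfs_buf code also deals with locking and list management that is left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer: only the fields the sketch needs. */
struct buf_model {
	int  hold;	/* reference count, standing in for b_hold */
	bool delwri;	/* stands in for the XBF_DELWRI flag */
	bool async;	/* stands in for the XBF_ASYNC flag */
};

/*
 * Queue a buffer for delayed write. The queue takes its own reference
 * and sets the flags only if the buffer is not queued yet; the caller's
 * reference is left untouched.
 */
static void model_delwri_queue(struct buf_model *bp)
{
	if (bp->delwri)
		return;		/* already queued, nothing to do */
	bp->delwri = true;
	bp->async = true;
	bp->hold++;		/* the queue's own reference */
}

/* The caller drops its own reference separately. */
static void model_buf_rele(struct buf_model *bp)
{
	bp->hold--;
}
```

A caller holding one reference would call `model_delwri_queue()` and then `model_buf_rele()`, leaving exactly one reference owned by the queue; queuing the same buffer twice does not take a second reference.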
2011-10-12xfs: remove the unlock argument to xfs_buf_delwri_queueChristoph Hellwig
We can just unlock the buffer in the caller, and the decrement of b_hold would also be needed in the !unlock case; we just never hit that case currently given that the caller handles it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: remove delwri buffer handling from xfs_buf_iorequestChristoph Hellwig
We cannot ever reach xfs_buf_iorequest for a buffer with XBF_DELWRI set, given that all write handlers make sure that the buffer is removed from the delwri queue beforehand, and we never do reads with the XBF_DELWRI flag set (which the code would not handle correctly anyway). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-12xfs: don't serialise adjacent concurrent direct IO appending writesDave Chinner
For append write workloads, extending the file requires a certain amount of exclusive locking to be done up front to ensure sanity in things like ensuring that we've zeroed any allocated regions between the old EOF and the start of the new IO.

For single threads, this typically isn't a problem, and for large IOs we don't serialise enough for it to be a problem for two threads on really fast block devices. However for smaller IO and larger thread counts we have a problem.

Take 4 concurrent sequential, single block sized and aligned IOs. After the first IO is submitted but before it completes, we end up with this state:

        IO 1    IO 2    IO 3    IO 4
      +-------+-------+-------+-------+
      ^       ^
      |       |
      |       |
      |       |
      |       \- ip->i_new_size
      \- ip->i_size

And the IO is done without exclusive locking because offset <= ip->i_size. When we submit IO 2, we see offset > ip->i_size, and grab the IO lock exclusive, because there is a chance we need to do EOF zeroing. However, there is already an IO in progress that avoids the need for EOF zeroing because offset <= ip->i_new_size. Hence we could avoid holding the IO lock exclusive for this. After submission of the second IO, we'd end up in this state:

        IO 1    IO 2    IO 3    IO 4
      +-------+-------+-------+-------+
      ^               ^
      |               |
      |               |
      |               |
      |               \- ip->i_new_size
      \- ip->i_size

And so you can see that for the third concurrent IO, we'd avoid exclusive locking for the same reason we avoided the exclusive lock for the second IO.

Fixing this is a bit more complex than that, because we need to hold a write-submission local value of ip->i_new_size so that clearing the value is only done if no other thread has updated it before our IO completes.

Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
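The locking decision described in this message can be sketched as a pair of predicates. This is an illustrative model only, assuming that exclusive locking is needed exactly when the write starts beyond the furthest known EOF; `inode_model`, `needs_excl_old`, and `needs_excl_new` are hypothetical names, and the submission-local copy of ip->i_new_size mentioned above is not modelled.

```c
#include <stdbool.h>

typedef long long fsize_t;	/* hypothetical file-offset type for this sketch */

/* Hypothetical model of the two sizes the message refers to. */
struct inode_model {
	fsize_t i_size;		/* current EOF */
	fsize_t i_new_size;	/* EOF once in-flight appending writes finish */
};

/* Old check: any write starting past i_size takes the IO lock exclusive. */
static bool needs_excl_old(const struct inode_model *ip, fsize_t offset)
{
	return offset > ip->i_size;
}

/*
 * Sketched new check: an in-flight write that already extends the file
 * up to i_new_size covers the gap, so only go exclusive when the offset
 * lies beyond both sizes.
 */
static bool needs_excl_new(const struct inode_model *ip, fsize_t offset)
{
	fsize_t eof = ip->i_size > ip->i_new_size ? ip->i_size : ip->i_new_size;

	return offset > eof;
}
```

For the four-IO example with 4096-byte blocks, IO 2 starts at offset 4096 while i_size is 0 and i_new_size is 4096: the old predicate asks for the exclusive lock, the sketched new one does not.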
2011-10-12xfs: don't serialise direct IO reads on page cache checksDave Chinner
There is no need to grab the i_mutex or the IO lock in exclusive mode if we don't need to invalidate the page cache. Taking these locks on every direct IO effectively serialises them, as taking the IO lock in exclusive mode has to wait for all shared holders to drop the lock. That only happens when IO is complete, so effectively it prevents dispatch of concurrent direct IO reads to the same inode. Fix this by taking the IO lock shared to check the page cache state, and only then drop it and take the IO lock exclusively if there is work to be done. Hence for the normal direct IO case, no exclusive locking will occur. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Joern Engel <joern@logfs.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
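The "check shared, go exclusive only if needed" pattern can be sketched with a toy lock model. Everything here is an assumption made for illustration: `dio_inode`, `dio_read_lock`, and the `iolock_state` enum are hypothetical, the lock is a plain state variable rather than a real rwsem, and the final demotion back to shared mode is a choice made by this sketch rather than something the message states.

```c
/* Toy lock states; a real implementation would use a reader/writer lock. */
enum iolock_state { IOLOCK_NONE, IOLOCK_SHARED, IOLOCK_EXCL };

/* Hypothetical, simplified inode: lock state plus cached page count. */
struct dio_inode {
	enum iolock_state lock;
	unsigned long     nrpages;	/* pages cached for this inode */
	unsigned int      excl_count;	/* exclusive acquisitions, for the demo */
};

/* Take the IO lock for a direct IO read, going exclusive only if needed. */
static void dio_read_lock(struct dio_inode *ip)
{
	ip->lock = IOLOCK_SHARED;		/* cheap check under the shared lock */
	if (ip->nrpages) {
		ip->lock = IOLOCK_NONE;		/* drop the shared lock */
		ip->lock = IOLOCK_EXCL;		/* retake it exclusively */
		ip->excl_count++;
		if (ip->nrpages)		/* recheck: state may have changed */
			ip->nrpages = 0;	/* stands in for flush + invalidate */
		ip->lock = IOLOCK_SHARED;	/* sketch choice: demote before the read */
	}
}
```

In the common case of an empty page cache, `excl_count` stays at zero, which corresponds to the message's point that the normal direct IO path takes no exclusive lock.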
2011-10-05Linux 3.1-rc9Linus Torvalds
2011-10-04Merge git://github.com/davem330/netLinus Torvalds
* git://github.com/davem330/net:
  pch_gbe: Fixed the issue on which a network freezes
  pch_gbe: Fixed the issue on which PC was frozen when link was downed.
  make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
  net: xen-netback: correctly restart Tx after a VM restore/migrate
  bonding: properly stop queuing work when requested
  can bcm: fix incomplete tx_setup fix
  RDSRDMA: Fix cleanup of rds_iw_mr_pool
  net: Documentation: Fix type of variables
  ibmveth: Fix oops on request_irq failure
  ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
  cxgb4: Fix EEH on IBM P7IOC
  can bcm: fix tx_setup off-by-one errors
  MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
  dp83640: reduce driver noise
  ptp: fix L2 event message recognition
2011-10-04Merge branch 'fix/asoc' of git://github.com/tiwai/soundLinus Torvalds
* 'fix/asoc' of git://github.com/tiwai/sound:
  ASoC: omap_mcpdm_remove cannot be __devexit
  ASoC: Fix setting update bits for WM8753_LADC and WM8753_RADC
  ASoC: use a valid device for dev_err() in Zylonite
2011-10-04Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/kms: fix channel_remap setup (v2)
  drm/radeon: Set cursor x/y to 0 when x/yorigin > 0.
  drm/radeon: Update AVIVO cursor coordinate origin before x/yorigin calculation.
  drm/radeon: Simplify cursor x/yorigin calculation.
  drm/radeon/kms: fix cursor image off-by-one error
  drm/radeon/kms: Fix logic error in DP HPD handler
  drm/radeon/kms: add retry limits for native DP aux defer
  drm/radeon/kms: fix regression in DP aux defer handling
2011-10-04Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
  spi-topcliff-pch: Fix overrun issue
  spi-topcliff-pch: Add recovery processing in case FIFO overrun error occurs
  spi-topcliff-pch: Fix CPU read complete condition issue
  spi-topcliff-pch: Fix SSN Control issue
  spi-topcliff-pch: add tx-memory clear after complete transmitting
2011-10-04PCI: Disable MPS configuration by defaultJon Mason
Add the ability to disable PCI-E MPS tuning and use the BIOS-configured MPS defaults. Due to the number of issues recently discovered on some x86 chipsets, make this the default behavior. Also, add the option for peer to peer DMA MPS configuration. Peer to peer DMA is outside the scope of this patch, but MPS configuration could prevent it from working by having the MPS on one root port different from the MPS on another. To work around this, simply make the system wide MPS the smallest possible value (128B). Signed-off-by: Jon Mason <mason@myri.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-04drm/radeon/kms: fix channel_remap setup (v2)Alex Deucher
Most asics just use the hw default value which requires no explicit programming. For those that need a different value, the vbios will program it properly. As such, there's no need to program these registers explicitly in the driver. Changing MC_SHARED_CHREMAP requires a reload of all data in vram, otherwise its contents will be scrambled. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=40103 v2: drop now unused channel_remap functions. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
2011-10-04spi-topcliff-pch: Fix overrun issueTomoya MORINAGA
We found that under load, Rx data is sometimes dropped (with DMA transfer mode). The cause is that Tx-DMA processing starts before Rx-DMA processing has started, which causes a FIFO overrun. This patch fixes the issue by modifying the FIFO tx-threshold and the DMA descriptor sizes as shown below.

               Current                 this patch
Rx-descriptor  4Byte+12Byte*341  -->   12Byte*340-4Byte-12Byte
Rx-threshold   (Not modified)
Tx-descriptor  4Byte+12Byte*341  -->   16Byte-12Byte*340
Tx-threshold   12Byte            -->   2Byte

Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-04spi-topcliff-pch: Add recovery processing in case FIFO overrun error occursTomoya MORINAGA
Add recovery processing in case FIFO overrun error occurs with DMA transfer mode. Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-04spi-topcliff-pch: Fix CPU read complete condition issueTomoya MORINAGA
We found that Rx data is sometimes dropped (with non-DMA transfer mode). The cause is that the read complete condition is not correct. This patch fixes the issue. Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-04spi-topcliff-pch: Fix SSN Control issueTomoya MORINAGA
While processing one command/data series, SSN should stay LOW. However, currently SSN goes HIGH. This patch fixes the issue. Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-04spi-topcliff-pch: add tx-memory clear after complete transmittingTomoya MORINAGA
Currently, when reading data from SPI flash, the command is sent twice. The cause is that tx-memory clear processing is missing. This patch adds the tx-memory clear processing. Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-04lis3: fix regression of HP DriveGuard with 8bit chipTakashi Iwai
Commit 2a7fade7e03 ("hwmon: lis3: Power on corrections") caused a regression on HP laptops with 8bit chip. Writing the CTRL2_BOOT_8B bit seems to clear the BIOS setup, and no proper interrupt for DriveGuard will be triggered any more. Since the init code there is basically only for embedded devices, put a pdata check so that the problematic initialization will be skipped for hp_accel stuff. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: Samu Onkalo <samu.p.onkalo@nokia.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-10-03Merge branch 'hwmon-for-linus' of git://github.com/groeck/linuxLinus Torvalds
* 'hwmon-for-linus' of git://github.com/groeck/linux: hwmon: (coretemp) Avoid leaving around dangling pointer hwmon: (coretemp) Fixup platform device ID change
2011-10-03Merge git://github.com/davem330/ideLinus Torvalds
* git://github.com/davem330/ide: ide-disk: Fix request requeuing
2011-10-03Merge branch 'btrfs-3.0' of git://github.com/chrismason/linuxLinus Torvalds
* 'btrfs-3.0' of git://github.com/chrismason/linux: Btrfs: force a page fault if we have a shorty copy on a page boundary
2011-10-03ide-disk: Fix request requeuingBorislav Petkov
Simon Kirby reported that on his RAID setup with idedisk underneath the box OOMs after a couple of days of runtime. Running with CONFIG_DEBUG_KMEMLEAK pointed to idedisk_prep_fn() which unconditionally allocates an ide_cmd struct. However, ide_requeue_and_plug() can be called more than once per request, either from the request issue or the IRQ handler path, and so blk_peek_request() ends up in idedisk_prep_fn() repeatedly, allocating a struct ide_cmd every time and "forgetting" the previous pointer. Make sure the code reuses the old allocated chunk. Reported-and-tested-by: Simon Kirby <sim@hostway.ca> Cc: <stable@kernel.org> [ 39.x, 3.0.x ] Link: http://marc.info/?l=linux-kernel&m=131667641517919 Link: http://lkml.kernel.org/r/20110922072643.GA27232@hostway.ca Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: David S. Miller <davem@davemloft.net>
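The leak pattern and its fix can be sketched in user-space C. This is a simplified model rather than the ide-disk code: `request_model`, `cmd_model`, `prep_request`, and the `cmd_allocs` counter are hypothetical names used only to show a prepare function that reuses an allocation it made on an earlier call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the command and the request carrying it. */
struct cmd_model {
	int opcode;
};

struct request_model {
	struct cmd_model *special;	/* command attached by the prep function */
};

static unsigned int cmd_allocs;	/* counts allocations, for the demo only */

/*
 * Prepare a request. May run several times for the same request when it
 * is requeued, so allocate only if no command is attached yet and reuse
 * the existing one otherwise. Returns 0 on success, -1 on allocation
 * failure.
 */
static int prep_request(struct request_model *rq)
{
	struct cmd_model *cmd = rq->special;

	if (!cmd) {
		cmd = malloc(sizeof(*cmd));
		if (!cmd)
			return -1;
		cmd_allocs++;
		rq->special = cmd;
	}
	memset(cmd, 0, sizeof(*cmd));	/* start from a clean command each time */
	return 0;
}
```

Calling `prep_request()` twice on the same request leaves `cmd_allocs` at 1 and the `special` pointer unchanged, whereas an unconditional allocation would overwrite the pointer and leak the first command on every requeue.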
2011-10-03pch_gbe: Fixed the issue on which a network freezesToshiharu Okada
The pch_gbe driver has an issue where the network stops when receive traffic is high. In that case, the link must be brought down and up again to restore the network. This patch fixes the issue. Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03pch_gbe: Fixed the issue on which PC was frozen when link was downed.Toshiharu Okada
When a link goes down during network use, the PC can freeze. This patch fixes the issue. Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-03make PACKET_STATISTICS getsockopt report consistently between ring and non-ringWillem de Bruijn
This is a minor change.

Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, ...) would return total and dropped packets since its last invocation. The introduction of socket queue overflow reporting [1] changed drop rate calculation in the normal packet socket path, but not when using a packet ring. As a result, the getsockopt now returns different statistics depending on the reception method used. With a ring, it still returns the count since the last call, as counts are incremented in tpacket_rcv and reset in getsockopt. Without a ring, it returns 0 if no drops occurred since the last getsockopt and the total drops over the lifespan of the socket otherwise. The culprit is this line in packet_rcv, executed on a drop:

    drop_n_acct:
            po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);

As it shows, the new drop number is taken from the socket drop counter, which is not reset at getsockopt. I put together a small example that demonstrates the issue [2]. It runs for 10 seconds and overflows the queue/ring on every odd second. The reported drop rates are:

    ring: 16, 0, 16, 0, 16, ...
    non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0, 74.

Note how the non-zero non-ring counts increase monotonically. Because the getsockopt adds tp_drops to tp_packets, total counts are similarly reported cumulatively.

Long story short, reinstating the original code, as the below patch does, fixes the issue at the cost of additional per-packet cycles. Another solution that does not introduce per-packet overhead is to keep the current data path, record the value of sk_drops at getsockopt() at call N in a new field in struct packetsock and subtract that when reporting at call N+1. I'll be happy to code that instead, it's just more messy.

[1] http://patchwork.ozlabs.org/patch/35665/
[2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c

Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
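The alternative mentioned at the end of this message (remember sk_drops at call N and subtract it at call N+1) can be sketched as follows. This models the idea only and is not the patch that was applied; `sock_model`, `model_drop`, and `model_get_drops` are hypothetical names, and `drops_at_last_read` plays the role of the proposed new field.

```c
/* Hypothetical model of a socket with a cumulative drop counter. */
struct sock_model {
	unsigned int sk_drops;			/* cumulative, never reset */
	unsigned int drops_at_last_read;	/* snapshot taken at the last query */
};

/* Data path: a drop only bumps the cumulative counter. */
static void model_drop(struct sock_model *sk)
{
	sk->sk_drops++;
}

/*
 * Query path: report drops since the previous query by subtracting the
 * saved snapshot, then refresh the snapshot. Unsigned arithmetic keeps
 * the difference correct even if the counter wraps around.
 */
static unsigned int model_get_drops(struct sock_model *sk)
{
	unsigned int delta = sk->sk_drops - sk->drops_at_last_read;

	sk->drops_at_last_read = sk->sk_drops;
	return delta;
}
```

With 16 drops followed by a query, a quiet interval, and another 16 drops, successive queries return 16, 0, 16, matching the per-interval behaviour the ring path reports, without adding work to the per-packet path.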