summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2006-06-23Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Require CAP_NET_ADMIN to create tuntap devices. [NET]: fix net-core kernel-doc [TCP]: Move inclusion of <linux/dmaengine.h> to correct place in <linux/tcp.h> [IPSEC]: Handle GSO packets [NET]: Added GSO toggle [NET]: Add software TSOv4 [NET]: Add generic segmentation offload [NET]: Merge TSO/UFO fields in sk_buff [NET]: Prevent transmission after dev_deactivate [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY [IPV6]: Fix source address selection. [NET]: Avoid allocating skb in skb_pad
2006-06-23[PATCH] list: use list_replace_init() instead of list_splice_init()Oleg Nesterov
list_splice_init(list, head) does unneeded job if it is known that list_empty(head) == 1. We can use list_replace_init() instead. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] clean up default value of IP_DCCP_ACKVECJean-Luc Leger
Default values for boolean and tristate options can only be 'y', 'm' or 'n'. This patch removes wrong default for IP_DCCP_ACKVEC. Signed-off-by: Jean-Luc Leger <jean-luc.leger@dspnet.fr.eu.org> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[PATCH] VFS: Permit filesystem to override root dentry on mountDavid Howells
Extend the get_sb() filesystem operation to take an extra argument that permits the VFS to pass in the target vfsmount that defines the mountpoint. The filesystem is then required to manually set the superblock and root dentry pointers. For most filesystems, this should be done with simple_set_mnt() which will set the superblock pointer and then set the root dentry to the superblock's s_root (as per the old default behaviour). The get_sb() op now returns an integer as there's now no need to return the superblock pointer. This patch permits a superblock to be implicitly shared amongst several mount points, such as can be done with NFS to avoid potential inode aliasing. In such a case, simple_set_mnt() would not be called, and instead the mnt_root and mnt_sb would be set directly. The patch also makes the following changes: (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount pointer argument and return an integer, so most filesystems have to change very little. (*) If one of the convenience function is not used, then get_sb() should normally call simple_set_mnt() to instantiate the vfsmount. This will always return 0, and so can be tail-called from get_sb(). (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the dcache upon superblock destruction rather than shrink_dcache_anon(). This is required because the superblock may now have multiple trees that aren't actually bound to s_root, but that still need to be cleaned up. The currently called functions assume that the whole tree is rooted at s_root, and that anonymous dentries are not the roots of trees which results in dentries being left unculled. However, with the way NFS superblock sharing are currently set to be implemented, these assumptions are violated: the root of the filesystem is simply a dummy dentry and inode (the real inode for '/' may well be inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries with child trees. [*] Anonymous until discovered from another tree. (*) The documentation has been adjusted, including the additional bit of changing ext2_* into foo_* in the documentation. [akpm@osdl.org: convert ipath_fs, do other stuff] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Cc: Nathan Scott <nathans@sgi.com> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-06-23[NET]: fix net-core kernel-docRandy Dunlap
Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie' Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early' Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan' Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[IPSEC]: Handle GSO packetsHerbert Xu
This patch segments GSO packets received by the IPsec stack. This can happen when a NIC driver injects GSO packets into the stack which are then forwarded to another host. The primary application of this is going to be Xen where its backend driver may inject GSO packets into dom0. Of course this also can be used by other virtualisation schemes such as VMWare or UML since the tap device could be modified to inject GSO packets received through splice. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[NET]: Added GSO toggleHerbert Xu
This patch adds a generic segmentation offload toggle that can be turned on/off for each net device. For now it only supports in TCPv4. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[NET]: Add software TSOv4Herbert Xu
This patch adds the GSO implementation for IPv4 TCP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[NET]: Add generic segmentation offloadHerbert Xu
This patch adds the infrastructure for generic segmentation offload. The idea is to tap into the potential savings of TSO without hardware support by postponing the allocation of segmented skb's until just before the entry point into the NIC driver. The same structure can be used to support software IPv6 TSO, as well as UFO and segmentation offload for other relevant protocols, e.g., DCCP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[NET]: Merge TSO/UFO fields in sk_buffHerbert Xu
Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not going to scale if we add any more segmentation methods (e.g., DCCP). So let's merge them. They were used to tell the protocol of a packet. This function has been subsumed by the new gso_type field. This is essentially a set of netdev feature bits (shifted by 16 bits) that are required to process a specific skb. As such it's easy to tell whether a given device can process a GSO skb: you just have to and the gso_type field and the netdev's features field. I've made gso_type a conjunction. The idea is that you have a base type (e.g., SKB_GSO_TCPV4) that can be modified further to support new features. For example, if we add a hardware TSO type that supports ECN, they would declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets with CWR set would have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO packets would be SKB_GSO_TCPV4. This means that only the CWR packets need to be emulated in software. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[NET]: Prevent transmission after dev_deactivateHerbert Xu
The dev_deactivate function has bit-rotted since the introduction of lockless drivers. In particular, the spin_unlock_wait call at the end has no effect on the xmit routine of lockless drivers. With a little bit of work, we can make it much more useful by providing the guarantee that when it returns, no more calls to the xmit routine of the underlying driver will be made. The idea is simple. There are two entry points in to the xmit routine. The first comes from dev_queue_xmit. That one is easily stopped by using synchronize_rcu. This works because we set the qdisc to noop_qdisc before the synchronize_rcu call. That in turn causes all subsequent packets sent to dev_queue_xmit to be dropped. The synchronize_rcu call also ensures all outstanding calls leave their critical section. The other entry point is from qdisc_run. Since we now have a bit that indicates whether it's running, all we have to do is to wait until the bit is off. I've removed the loop to wait for __LINK_STATE_SCHED to clear. This is useless because netif_wake_queue can cause it to be set again. It is also harmless because we've disarmed qdisc_run. I've also removed the spin_unlock_wait on xmit_lock because its only purpose of making sure that all outstanding xmit_lock holders have exited is also given by dev_watchdog_down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[IPV6] ADDRCONF: Fix default source address selection without ↵YOSHIFUJI Hideaki
CONFIG_IPV6_PRIVACY We need to update hiscore.rule even if we don't enable CONFIG_IPV6_PRIVACY, because we have more less significant rule; longest match. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[IPV6]: Fix source address selection.Łukasz Stelmach
Two additional labels (RFC 3484, sec. 10.3) for IPv6 addreses are defined to make a distinction between global unicast addresses and Unique Local Addresses (fc00::/7, RFC 4193) and Teredo (2001::/32, RFC 4380). It is necessary to avoid attempts of connection that would either fail (eg. fec0:: to 2001:feed::) or be sub-optimal (2001:0:: to 2001:feed::). Signed-off-by: Łukasz Stelmach <stlman@poczta.fm> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23[NET]: Avoid allocating skb in skb_padHerbert Xu
First of all it is unnecessary to allocate a new skb in skb_pad since the existing one is not shared. More importantly, our hard_start_xmit interface does not allow a new skb to be allocated since that breaks requeueing. This patch uses pskb_expand_head to expand the existing skb and linearize it if needed. Actually, someone should sift through every instance of skb_pad on a non-linear skb as they do not fit the reasons why this was originally created. Incidentally, this fixes a minor bug when the skb is cloned (tcpdump, TCP, etc.). As it is skb_pad will simply write over a cloned skb. Because of the position of the write it is unlikely to cause problems but still it's best if we don't do it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-23Merge branch 'master' into upstreamJeff Garzik
2006-06-20[ATM]: fix broken uses of NIPQUAD in net/atmAl Viro
NIPQUAD expects an l-value of type __be32, _NOT_ a pointer to __be32. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20[SCTP]: sctp_unpack_cookie() fixAl Viro
sizeof(pointer) != sizeof(array)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20Merge branch 'upstream' of ↵Jeff Garzik
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into upstream
2006-06-20[NET]: Prevent multiple qdisc runsHerbert Xu
Having two or more qdisc_run's contend against each other is bad because it can induce packet reordering if the packets have to be requeued. It appears that this is an unintended consequence of relinquinshing the queue lock while transmitting. That in turn is needed for devices that spend a lot of time in their transmit routine. There are no advantages to be had as devices with queues are inherently single-threaded (the loopback device is not but then it doesn't have a queue). Even if you were to add a queue to a parallel virtual device (e.g., bolt a tbf filter in front of an ipip tunnel device), you would still want to process the queue in sequence to ensure that the packets are ordered correctly. The solution here is to steal a bit from net_device to prevent this. BTW, as qdisc_restart is no longer used by anyone as a module inside the kernel (IIRC it used to with netif_wake_queue), I have not exported the new __qdisc_run function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20[NETFILTER]: xt_sctp: fix endless loop caused by 0 chunk lengthPatrick McHardy
Fix endless loop in the SCTP match similar to those already fixed in the SCTP conntrack helper (was CVE-2006-1527). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-20Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (46 commits) IB/uverbs: Don't serialize with ib_uverbs_idr_mutex IB/mthca: Make all device methods truly reentrant IB/mthca: Fix memory leak on modify_qp error paths IB/uverbs: Factor out common idr code IB/uverbs: Don't decrement usecnt on error paths IB/uverbs: Release lock on error path IB/cm: Use address handle helpers IB/sa: Add ib_init_ah_from_path() IB: Add ib_init_ah_from_wc() IB/ucm: Get rid of duplicate P_Key parameter IB/srp: Factor out common request reset code IB/srp: Support SRP rev. 10 targets [SCSI] srp.h: Add I/O Class values IB/fmr: Use device's max_map_map_per_fmr attribute in FMR pool. IB/mthca: Fill in max_map_per_fmr device attribute IB/ipath: Add client reregister event generation IB/mthca: Add client reregister event generation IB: Move struct port_info from ipath to <rdma/ib_smi.h> IPoIB: Handle client reregister events IB: Add client reregister event type ...
2006-06-20Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (109 commits) [ETHTOOL]: Fix UFO typo [SCTP]: Fix persistent slowdown in sctp when a gap ack consumes rx buffer. [SCTP]: Send only 1 window update SACK per message. [SCTP]: Don't do CRC32C checksum over loopback. [SCTP] Reset rtt_in_progress for the chunk when processing its sack. [SCTP]: Reject sctp packets with broadcast addresses. [SCTP]: Limit association max_retrans setting in setsockopt. [PFKEYV2]: Fix inconsistent typing in struct sadb_x_kmprivate. [IPV6]: Sum real space for RTAs. [IRDA]: Use put_unaligned() in irlmp_do_discovery(). [BRIDGE]: Add support for NETIF_F_HW_CSUM devices [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM [TG3]: Convert to non-LLTX [TG3]: Remove unnecessary tx_lock [TCP]: Add tcp_slow_start_after_idle sysctl. [BNX2]: Update version and reldate [BNX2]: Use CPU native page size [BNX2]: Use compressed firmware [BNX2]: Add firmware decompression [BNX2]: Allow WoL settings on new 5708 chips ... Manual fixup for conflict in drivers/net/tulip/winbond-840.c
2006-06-18[ETHTOOL]: Fix UFO typoHerbert Xu
The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of ETHTOOL_GUFO. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SCTP]: Fix persistent slowdown in sctp when a gap ack consumes rx buffer.Neil Horman
In the event that our entire receive buffer is full with a series of chunks that represent a single gap-ack, and then we accept a chunk (or chunks) that fill in the gap between the ctsn and the first gap, we renege chunks from the end of the buffer, which effectively does nothing but move our gap to the end of our received tsn stream. This does little but move our missing tsns down stream a little, and, if the sender is sending sufficiently large retransmit frames, the result is a perpetual slowdown which can never be recovered from, since the only chunk that can be accepted to allow progress in the tsn stream necessitates that a new gap be created to make room for it. This leads to a constant need for retransmits, and subsequent receiver stalls. The fix I've come up with is to deliver the frame without reneging if we have a full receive buffer and the receiving sockets sk_receive_queue is empty(indicating that the receive buffer is being blocked by a missing tsn). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SCTP]: Send only 1 window update SACK per message.Tsutomu Fujii
Right now, every time we increase our rwnd by more then MTU bytes, we trigger a SACK. When processing large messages, this will generate a SACK for almost every other SCTP fragment. However since we are freeing the entire message at the same time, we might as well collapse the SACK generation to 1. Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SCTP]: Don't do CRC32C checksum over loopback.Sridhar Samudrala
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SCTP] Reset rtt_in_progress for the chunk when processing its sack.Vlad Yasevich
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SCTP]: Reject sctp packets with broadcast addresses.Vlad Yasevich
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SCTP]: Limit association max_retrans setting in setsockopt.Vlad Yasevich
When using ASSOCINFO socket option, we need to limit the number of maximum association retransmissions to be no greater than the sum of all the path retransmissions. This is specified in Section 7.1.2 of the SCTP socket API draft. However, we only do this if the association has multiple paths. If there is only one path, the protocol stack will use the assoc_max_retrans setting when trying to retransmit packets. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[IPV6]: Sum real space for RTAs.YOSHIFUJI Hideaki
This patch fixes RTNLGRP_IPV6_IFINFO netlink notifications. Issue pointed out by Patrick McHardy <kaber@trash.net>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[IRDA]: Use put_unaligned() in irlmp_do_discovery().David S. Miller
irda_device_info->hints[] is byte aligned but is being accessed as a u16 Based upon a patch by Luke Yang <luke.adi@gmail.com>. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[BRIDGE]: Add support for NETIF_F_HW_CSUM devicesHerbert Xu
As it is the bridge will only ever declare NETIF_F_IP_CSUM even if all its constituent devices support NETIF_F_HW_CSUM. This patch fixes this by supporting the first one out of NETIF_F_NO_CSUM, NETIF_F_HW_CSUM, and NETIF_F_IP_CSUM that is supported by all constituent devices. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUMHerbert Xu
The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM identically so we test for them in quite a few places. For the sake of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We also test the disjunct of NETIF_F_IP_CSUM and the other two in various places, for that purpose I've added NETIF_F_ALL_CSUM. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[TCP]: Add tcp_slow_start_after_idle sysctl.David S. Miller
A lot of people have asked for a way to disable tcp_cwnd_restart(), and it seems reasonable to add a sysctl to do that. Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[TCP] Westwood: reset RTT min after FRTOLuca De Cicco
RTT_min is updated each time a timeout event occurs in order to cope with hard handovers in wireless scenarios such as UMTS. Signed-off-by: Luca De Cicco <ldecicco@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@dxpl.pdx.osdl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[TCP] Westwood: bandwidth filter startupLuca De Cicco
The bandwidth estimate filter is now initialized with the first sample in order to have better performances in the case of small file transfers. Signed-off-by: Luca De Cicco <ldecicco@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@dxpl.pdx.osdl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[TCP] Westwood: comment fixesLuca De Cicco
Cleanup some comments and add more references Signed-off-by: Luca De Cicco <ldecicco@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@dxpl.pdx.osdl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[TCP] Westwood: fix first sampleStephen Hemminger
Need to update send sequence number tracking after first ack. Rework of patch from Luca De Cicco. Signed-off-by: Stephen Hemminger <shemminger@dxpl.pdx.osdl.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NET]: net.ipv4.ip_autoconfig sysctl removalStephen Hemminger
The sysctl net.ipv4.ip_autoconfig is a legacy value that is not used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[IPX]: Endian bug in ipxrtr_route_packet()Alexey Dobriyan
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NET]: Warn in __skb_trim if skb is pagedHerbert Xu
It's better to warn and fail rather than rarely triggering BUG on paths that incorrectly call skb_trim/__skb_trim on a non-linear skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NET]: skb_trim auditHerbert Xu
I found a few more spots where pskb_trim_rcsum could be used but were not. This patch changes them to use it. Also, sk_filter can get paged skb data. Therefore we must use pskb_trim instead of skb_trim. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NET]: Clean up skb_linearizeHerbert Xu
The linearisation operation doesn't need to be super-optimised. So we can replace __skb_linearize with __pskb_pull_tail which does the same thing but is more general. Also, most users of skb_linearize end up testing whether the skb is linear or not so it helps to make skb_linearize do just that. Some callers of skb_linearize also use it to copy cloned data, so it's useful to have a new function skb_linearize_cow to copy the data if it's either non-linear or cloned. Last but not least, I've removed the gfp argument since nobody uses it anymore. If it's ever needed we can easily add it back. Misc bugs fixed by this patch: * via-velocity error handling (also, no SG => no frags) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NET]: Add netif_tx_lockHerbert Xu
Various drivers use xmit_lock internally to synchronise with their transmission routines. They do so without setting xmit_lock_owner. This is fine as long as netpoll is not in use. With netpoll it is possible for deadlocks to occur if xmit_lock_owner isn't set. This is because if a printk occurs while xmit_lock is held and xmit_lock_owner is not set can cause netpoll to attempt to take xmit_lock recursively. While it is possible to resolve this by getting netpoll to use trylock, it is suboptimal because netpoll's sole objective is to maximise the chance of getting the printk out on the wire. So delaying or dropping the message is to be avoided as much as possible. So the only alternative is to always set xmit_lock_owner. The following patch does this by introducing the netif_tx_lock family of functions that take care of setting/unsetting xmit_lock_owner. I renamed xmit_lock to _xmit_lock to indicate that it should not be used directly. I didn't provide irq versions of the netif_tx_lock functions since xmit_lock is meant to be a BH-disabling lock. This is pretty much a straight text substitution except for a small bug fix in winbond. It currently uses netif_stop_queue/spin_unlock_wait to stop transmission. This is unsafe as an IRQ can potentially wake up the queue. So it is safer to use netif_tx_disable. The hamradio bits used spin_lock_irq but it is unnecessary as xmit_lock must never be taken in an IRQ handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NETFILTER]: hashlimit match: fix random initializationPatrick McHardy
hashlimit does: if (!ht->rnd) get_random_bytes(&ht->rnd, 4); ignoring that 0 is also a valid random number. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NETFILTER]: recent match: missing refcnt initializationPatrick McHardy
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[NETFILTER]: recent match: fix "sleeping function called from invalid context"Patrick McHardy
create_proc_entry must not be called with locks held. Use a mutex instead to protect data only changed in user context. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SECMARK]: Add CONNSECMARK xtables targetJames Morris
Add a new xtables target, CONNSECMARK, which is used to specify rules for copying security marks from packets to connections, and for copyying security marks back from connections to packets. This is similar to the CONNMARK target, but is more limited in scope in that it only allows copying of security marks to and from packets, as this is all it needs to do. A typical scenario would be to apply a security mark to a 'new' packet with SECMARK, then copy that to its conntrack via CONNMARK, and then restore the security mark from the connection to established and related packets on that connection. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SECMARK]: Add secmark support to conntrackJames Morris
Add a secmark field to IP and NF conntracks, so that security markings on packets can be copied to their associated connections, and also copied back to packets as required. This is similar to the network mark field currently used with conntrack, although it is intended for enforcement of security policy rather than network policy. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-18[SECMARK]: Add xtables SECMARK targetJames Morris
Add a SECMARK target to xtables, allowing the admin to apply security marks to packets via both iptables and ip6tables. The target currently handles SELinux security marking, but can be extended for other purposes as needed. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>