summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2013-01-17netfilter: nf_ct_sip: support Cisco 7941/7945 IP phonesKevin Cernekee
Most SIP devices use a source port of 5060/udp on SIP requests, so the response automatically comes back to port 5060: phone_ip:5060 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:5060 100 Trying The newer Cisco IP phones, however, use a randomly chosen high source port for the SIP request but expect the response on port 5060: phone_ip:49173 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:5060 100 Trying Standard Linux NAT, with or without nf_nat_sip, will send the reply back to port 49173, not 5060: phone_ip:49173 -> proxy_ip:5060 REGISTER proxy_ip:5060 -> phone_ip:49173 100 Trying But the phone is not listening on 49173, so it will never see the reply. This patch modifies nf_*_sip to work around this quirk by extracting the SIP response port from the Via: header, iff the source IP in the packet header matches the source IP in the SIP request. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: Documentation/networking/ip-sysctl.txt drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c Both conflicts were simply overlapping context. A build fix for qlcnic is in here too, simply removing the added devinit annotations which no longer exist. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== The following patchset contains netfilter fixes for 3.8-rc3, they are: * fix possible BUG_ON if several netns are in use and the nf_conntrack module is removed, initial patch from Gao feng, final patch from myself. * fix unset return value if conntrack zone are disabled at compile-time, reported by Borislav Petkov, fix from myself. * fix display error message via dmesg for arp_tables, from Jan Engelhardt. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14tcp: fix a panic on UP machines in reqsk_fastopen_removeEric Dumazet
spin_is_locked() on a non !SMP build is kind of useless. BUG_ON(!spin_is_locked(xx)) is guaranteed to crash. Just remove this check in reqsk_fastopen_remove() as the callers do hold the socket lock. Reported-by: Ketan Kulkarni <ketkulka@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Dave Taht <dave.taht@gmail.com> Acked-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14net: phy: remove flags argument from phy_{attach, connect, connect_direct}Florian Fainelli
The flags argument of the phy_{attach,connect,connect_direct} functions is then used to assign a struct phy_device dev_flags with its value. All callers but the tg3 driver pass the flag 0, which results in the underlying PHY drivers in drivers/net/phy/ not being able to actually use any of the flags they would set in dev_flags. This patch gets rid of the flags argument, and passes phydev->dev_flags to the internal PHY library call phy_attach_direct() such that drivers which actually modify a phy device dev_flags get the value preserved for use by the underlying phy driver. Acked-by: Kosta Zertsekel <konszert@marvell.com> Signed-off-by: Florian Fainelli <florian@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14pkt_sched: namespace aware act_mirredBenjamin LaHaise
Eric Dumazet pointed out that act_mirred needs to find the current net_ns, and struct net pointer is not provided in the call chain. His original patch made use of current->nsproxy->net_ns to find the network namespace, but this fails to work correctly for userspace code that makes use of netlink sockets in different network namespaces. Instead, pass the "struct net *" down along the call chain to where it is needed. This version removes the ifb changes as Eric has submitted that patch separately, but is otherwise identical to the previous version. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6 netevent: Remove old_neigh from netevent_redirect.YOSHIFUJI Hideaki / 吉藤英明
The only user is cxgb3 driver. old_neigh is used to check device change, but it must not happen on redirect. In this sense, we can remove old_neigh argument. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix regression allowing IP_TTL setting of zero, fix from Cong Wang. 2) Fix leak regressions in tunap, from Jason Wang. 3) be2net driver always returns IRQ_HANDLED in INTx handler, fix from Sathya Perla. 4) qlge doesn't really support NETIF_F_TSO6, don't set that flag. Fix from Amerigo Wang. 5) Add 802.11ad Atheros wil6210 driver, from Vladimir Kondratiev. 6) Fix MTU calculations in mac80211 layer, from T Krishna Chaitanya. 7) Station info layer of mac80211 needs to use del_timer_sync(), from Johannes Berg. 8) tcp_read_sock() can loop forever, because we don't immediately stop when recv_actor() returns zero. Fix from Eric Dumazet. 9) Fix WARN_ON() in tcp_cleanup_rbuf(). We have to use sk_eat_skb() in tcp_recv_skb() to handle the case where a large GRO packet is split up while it is use by a splice() operation. Fix also from Eric Dumazet. 10) addrconf_get_prefix_route() in ipv6 tests flags incorrectly, it does: if (X && (p->flags & Y) != 0) when it really meant to go: if (X && (p->flags & X) != 0) fix from Romain Kuntz. 11) Fix lost Kconfig dependency for bfin_mac driver hardware timestamping. From Lars-Peter Clausen. 12) Fix regression in handling of RST without ACK in TCP, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits) be2net: fix unconditionally returning IRQ_HANDLED in INTx tuntap: fix leaking reference count tuntap: forbid calling TUNSETIFF when detached tuntap: switch to use rtnl_dereference() net, wireless: overwrite default_ethtool_ops qlge: remove NETIF_F_TSO6 flag tcp: accept RST without ACK flag net: ethernet: xilinx: Do not use NO_IRQ in axienet net: ethernet: xilinx: Do not use axienet on PPC bnx2x: Allow management traffic after boot from SAN bnx2x: Fix fastpath structures when memory allocation fails bfin_mac: Restore hardware time-stamping dependency on BF518 tun: avoid owner checks on IFF_ATTACH_QUEUE bnx2x: move debugging code before the return tuntap: refuse to re-attach to different tun_struct ipv6: use addrconf_get_prefix_route for prefix route lookup [v2] ipv6: fix the noflags test in addrconf_get_prefix_route tcp: fix splice() and tcp collapsing interaction tcp: splice: fix an infinite loop in tcp_read_sock() net: prevent setting ttl=0 via IP_TTL ...
2013-01-14Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Included changes: - use per_cpu_add when possible - prevent the TT component to add multicast address as "mesh clients" - some debug output improvements - proper lockdeps class initializations - new style fixes (space before/after brackets) - other minor fixes and refactoring Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: Store Router Alert option in IP6CB directly.YOSHIFUJI Hideaki / 吉藤英明
Router Alert option is very small and we can store the value itself in the skb. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6 xfrm: Use ipv6_addr_hash() in xfrm6_tunnel_spi_hash_byaddr().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6 route: Use ipv6_addr_hash() in rt6_info_hash_nhsfn().YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().YOSHIFUJI Hideaki / 吉藤英明
Move generalized version of ipv6_is_mld() to header, and use it from ip6_mc_input(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: Use ipv6_get_dsfield() instead of ipv6_tclass().YOSHIFUJI Hideaki / 吉藤英明
Commit 7a3198a8 ("ipv6: helper function to get tclass") introduced ipv6_tclass(), but similar function is already available as ipv6_get_dsfield(). We might be able to call ipv6_tclass() from ipv6_get_dsfield(), but it is confusing to have two versions. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: Introduce ip6_flowinfo() to extract flowinfo (tclass + flowlabel).YOSHIFUJI Hideaki / 吉藤英明
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-14ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.YOSHIFUJI Hideaki / 吉藤英明
This is not only for readability but also for optimization. What we do here is to build the 32bit word at the beginning of the ipv6 header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and we do not need to read the tclass portion of the target buffer. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-13netfilter: x_tables: print correct hook names for ARPJan Engelhardt
arptables 0.0.4 (released on 10th Jan 2013) supports calling the CLASSIFY target, but on adding a rule to the wrong chain, the diagnostic is as follows: # arptables -A INPUT -j CLASSIFY --set-class 0:0 arptables: Invalid argument # dmesg | tail -n1 x_tables: arp_tables: CLASSIFY target: used from hooks PREROUTING, but only usable from INPUT/FORWARD This is incorrect, since xt_CLASSIFY.c does specify (1 << NF_ARP_OUT) | (1 << NF_ARP_FORWARD). This patch corrects the x_tables diagnostic message to print the proper hook names for the NFPROTO_ARP case. Affects all kernels down to and including v2.6.31. Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12netfilter: nf_conntrack: fix BUG_ON while removing nf_conntrack with netnsPablo Neira Ayuso
canqun zhang reported that we're hitting BUG_ON in the nf_conntrack_destroy path when calling kfree_skb while rmmod'ing the nf_conntrack module. Currently, the nf_ct_destroy hook is being set to NULL in the destroy path of conntrack.init_net. However, this is a problem since init_net may be destroyed before any other existing netns (we cannot assume any specific ordering while releasing existing netns according to what I read in recent emails). Thanks to Gao feng for initial patch to address this issue. Reported-by: canqun zhang <canqunzhang@gmail.com> Acked-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-01-12batman-adv: unbloat batadv_priv if debug is not enabledMarek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12batman-adv: remove unused variable from orig_node structMarek Lindner
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12batman-adv: fix typo in debug messageAntonio Quartulli
in bat_iv_ogm.c a debug message should print "tq" instead of "td" Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: use the const qualifier in hash functionsAntonio Quartulli
The data argument in each hash function should carry the "const" qualifier as it is never modified. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: don't compile the BLA switch if not requestedAntonio Quartulli
When the Bridge Loop Avoidance component is not compiled-in, its boolean switch should be not compiled as well. This patch surrounds the switch with a proper ifdef. This behaviour was introduced by 9fd6b0615b5499b270d39a92b8790e206cf75833 ("batman-adv: add bridge loop avoidance compile option") Signed-off-by: Antonio Quartulli <ordex@autistici.org> Acked-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: remove useless NULL checkAntonio Quartulli
debugfs_remove_recursive() checks whether its argument is not null on its own, therefore it is possible to remove the external check. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: remove useless blank lines before and after bracketsAntonio Quartulli
Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: Initialize lockdep class keys for hashesAntonio Quartulli
Different hashes have the same class key because they get initialised with the same one. For this reason lockdep can create false warning when they are used recursively. Re-initialise the key for each hash after the invocation to hash_new() to avoid this problem. Signed-off-by: Antonio Quartulli <ordex@autistici.org> Tested-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: remove useless assignment in tt_local_add()Antonio Quartulli
The flag field of the tt_local_entry->common structure in tt_local_add() is first assigned NO_FLAGS and then TT_CLIENT_NEW so nullifying the first operation. For this reason it is safe to remove the first assignment. This was introuduced by ("batman-adv: keep local table consistency for further TT_RESPONSE") Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: unify and properly print hex valuesAntonio Quartulli
Values are printed in hexadecimal format in several points in the code, but they are not printed using the same format string. This patches unifies the format used for such numbers so that they look the same everywhere. Given the fact that all the variables printed as hexadecimal are 16 bit long, this is the chosen printing format: %#.4x Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: print the CRC together with the translation tablesAntonio Quartulli
To simplify debugging operations, it is better to print the related CRC together with the translation table (local CRC for the local table and global CRC for each entry in the global table) Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: improve local translation table outputAntonio Quartulli
This patch adds a nice header to the local translation table and the last_seen time for each local entry Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: reduce local TT entry timeout to 10 minutesAntonio Quartulli
The current timeout is set to one hour. However a client connected to the mesh network will always generate traffic. In the worst case it will send ARP requests every 4 or 5 minutes. On the other hand having a long timeout means storing dead entries for one hour and it leads to very big trans-tables containing useless clients. This patch reduces the timeout to 10 minutes Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-01-12batman-adv: Do not add multicast MAC addresses to translation tableLinus Lüssing
The current translation table mechanism is not suitable for multicast addresses and we are currently flooding such frames anyway. Therefore this patch prevents multicast MAC addresses being added to the translation table. Signed-off-by: Linus Lüssing <linus.luessing@web.de> Acked-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12batman-adv: use per_cpu_add helperShan Wei
this_cpu_add is an atomic operation. and be more faster than per_cpu_ptr operation. Signed-off-by: Shan Wei <davidshan@tencent.com> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de> Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-01-12net: splice: fix __splice_segment()Eric Dumazet
commit 9ca1b22d6d2 (net: splice: avoid high order page splitting) forgot that skb->head could need a copy into several page frags. This could be the case for loopback traffic mostly. Also remove now useless skb argument from linear_to_page() and __splice_segment() prototypes. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11ipv4: fib: fix a comment.Rami Rosen
In fib_frontend.c, there is a confusing comment; NETLINK_CB(skb).portid does not refer to a pid of sending process, but rather to a netlink portid. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net, wireless: overwrite default_ethtool_opsStanislaw Gruszka
Since: commit 2c60db037034d27f8c636403355d52872da92f81 Author: Eric Dumazet <edumazet@google.com> Date: Sun Sep 16 09:17:26 2012 +0000 net: provide a default dev->ethtool_ops wireless core does not correctly assign ethtool_ops. After alloc_netdev*() call, some cfg80211 drivers provide they own ethtool_ops, but some do not. For them, wireless core provide generic cfg80211_ethtool_ops, which is assigned in NETDEV_REGISTER notify call: if (!dev->ethtool_ops) dev->ethtool_ops = &cfg80211_ethtool_ops; But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211 drivers without custom ethtool_ops), but points to &default_ethtool_ops. In order to fix the problem, provide function which will overwrite default_ethtool_ops and use it by wireless core. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net: Export __netdev_pick_tx so that it can be used in modulesAlexander Duyck
When testing with FCoE enabled we discovered that I had not exported __netdev_pick_tx. As a result ixgbe doesn't build with the RFC patches applied because ixgbe_select_queue was calling the function. This change corrects that build issue by correctly exporting __netdev_pick_tx so it can be used by modules. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11Merge tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfix from Trond Myklebust: - Fix a socket lock leak in net/sunrpc/xprt.c * tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
2013-01-11tcp: accept RST without ACK flagEric Dumazet
commit c3ae62af8e755 (tcp: should drop incoming frames without ACK flag set) added a regression on the handling of RST messages. RST should be allowed to come even without ACK bit set. We validate the RST by checking the exact sequence, as requested by RFC 793 and 5961 3.2, in tcp_validate_incoming() Reported-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Eric Wong <normalperson@yhbt.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net: Add support for XPS without sysfs being definedAlexander Duyck
This patch makes it so that we can support transmit packet steering without sysfs needing to be enabled. The reason for making this change is to make it so that a driver can make use of the XPS even while the sysfs portion of the interface is not present. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net: Rewrite netif_set_xps_queues to address several issuesAlexander Duyck
This change is meant to address several issues I found within the netif_set_xps_queues function. If the allocation of one of the maps to be assigned to new_dev_maps failed we could end up with the device map in an inconsistent state since we had already worked through a number of CPUs and removed or added the queue. To address that I split the process into several steps. The first of which is just the allocation of updated maps for CPUs that will need larger maps to store the queue. By doing this we can fail gracefully without actually altering the contents of the current device map. The second issue I found was the fact that we were always allocating a new device map even if we were not adding any queues. I have updated the code so that we only allocate a new device map if we are adding queues, otherwise if we are not adding any queues to CPUs we just skip to the removal process. The last change I made was to reuse the code from remove_xps_queue to remove the queue from the CPU. By making this change we can be consistent in how we go about adding and removing the queues from the CPUs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net: Rewrite netif_reset_xps_queue to allow for better code reuseAlexander Duyck
This patch does a minor refactor on netif_reset_xps_queue to address a few items I noticed. First is the fact that we are doing removal of queues in both netif_reset_xps_queue and netif_set_xps_queue. Since there is no need to have the code in two places I am pushing it out into a separate function and will come back in another patch and reuse the code in netif_set_xps_queue. The second item this change addresses is the fact that the Tx queues were not getting their numa_node value cleared as a part of the XPS queue reset. This patch resolves that by resetting the numa_node value if the dev_maps value is set. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net: Add functions netif_reset_xps_queue and netif_set_xps_queueAlexander Duyck
This patch adds two functions, netif_reset_xps_queue and netif_set_xps_queue. The main idea behind these two functions is to provide a mechanism through which drivers can update their defaults in regards to XPS. Currently no such mechanism exists and as a result we cannot use XPS for things such as ATR which would require a basic configuration to start in which the Tx queues are mapped to CPUs via a 1:1 mapping. With this change I am making it possible for drivers such as ixgbe to be able to use the XPS feature by controlling the default configuration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-11net: Split core bits of netdev_pick_tx into __netdev_pick_txAlexander Duyck
This change splits the core bits of netdev_pick_tx into a separate function. The main idea behind this is to make this code accessible to select queue functions when they decide to process the standard path instead of their own custom path in their select queue routine. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10net_sched: more precise pkt_len computationEric Dumazet
One long standing problem with TSO/GSO/GRO packets is that skb->len doesn't represent a precise amount of bytes on wire. Headers are only accounted for the first segment. For TCP, thats typically 66 bytes per 1448 bytes segment missing, an error of 4.5 % for normal MSS value. As consequences : 1) TBF/CBQ/HTB/NETEM/... can send more bytes than the assigned limits. 2) Device stats are slightly under estimated as well. Fix this by taking account of headers in qdisc_skb_cb(skb)->pkt_len computation. Packet schedulers should use qdisc pkt_len instead of skb->len for their bandwidth limitations, and TSO enabled devices drivers could use pkt_len if their statistics are not hardware assisted, and if they don't scratch skb->cb[] first word. Both egress and ingress paths work, thanks to commit fda55eca5a (net: introduce skb_transport_header_was_set()) : If GRO built a GSO packet, it also set the transport header for us. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <shemminger@vyatta.com> Cc: Paolo Valente <paolo.valente@unimore.it> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10nfs: fix sunrpc/clnt.c kernel-doc warningsRandy Dunlap
Fix new kernel-doc warnings in clnt.c: Warning(net/sunrpc/clnt.c:561): No description found for parameter 'flavor' Warning(net/sunrpc/clnt.c:561): Excess function parameter 'auth' description in 'rpc_clone_client_set_auth' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: linux-nfs@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10ipv6: use addrconf_get_prefix_route for prefix route lookup [v2]Romain Kuntz
Replace ip6_route_lookup() with addrconf_get_prefix_route() when looking up for a prefix route. This ensures that the connected prefix is looked up in the main table, and avoids the selection of other matching routes located in different tables as well as blackhole or prohibited entries. In addition, this fixes an Opps introduced by commit 64c6d08e (ipv6: del unreachable route when an addr is deleted on lo), that would occur when a blackhole or prohibited entry is selected by ip6_route_lookup(). Such entries have a NULL rt6i_table argument, which is accessed by __ip6_del_rt() when trying to lock rt6i_table->tb6_lock. The function addrconf_is_prefix_route() is not used anymore and is removed. [v2] Minor indentation cleanup and log updates. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10ipv6: fix the noflags test in addrconf_get_prefix_routeRomain Kuntz
The tests on the flags in addrconf_get_prefix_route() does no make much sense: the 'noflags' parameter contains the set of flags that must not match with the route flags, so the test must be done against 'noflags', and not against 'flags'. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10tcp: fix splice() and tcp collapsing interactionEric Dumazet
Under unusual circumstances, TCP collapse can split a big GRO TCP packet while its being used in a splice(socket->pipe) operation. skb_splice_bits() releases the socket lock before calling splice_to_pipe(). [ 1081.353685] WARNING: at net/ipv4/tcp.c:1330 tcp_cleanup_rbuf+0x4d/0xfc() [ 1081.371956] Hardware name: System x3690 X5 -[7148Z68]- [ 1081.391820] cleanup rbuf bug: copied AD3BCF1 seq AD370AF rcvnxt AD3CF13 To fix this problem, we must eat skbs in tcp_recv_skb(). Remove the inline keyword from tcp_recv_skb() definition since it has three call sites. Reported-by: Christian Becker <c.becker@traviangames.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10tcp: splice: fix an infinite loop in tcp_read_sock()Eric Dumazet
commit 02275a2ee7c0 (tcp: don't abort splice() after small transfers) added a regression. [ 83.843570] INFO: rcu_sched self-detected stall on CPU [ 83.844575] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 0, t=21002 jiffies, g=4457, c=4456, q=13132) [ 83.844582] Task dump for CPU 6: [ 83.844584] netperf R running task 0 8966 8952 0x0000000c [ 83.844587] 0000000000000000 0000000000000006 0000000000006c6c 0000000000000000 [ 83.844589] 000000000000006c 0000000000000096 ffffffff819ce2bc ffffffffffffff10 [ 83.844592] ffffffff81088679 0000000000000010 0000000000000246 ffff880c4b9ddcd8 [ 83.844594] Call Trace: [ 83.844596] [<ffffffff81088679>] ? vprintk_emit+0x1c9/0x4c0 [ 83.844601] [<ffffffff815ad449>] ? schedule+0x29/0x70 [ 83.844606] [<ffffffff81537bd2>] ? tcp_splice_data_recv+0x42/0x50 [ 83.844610] [<ffffffff8153beaa>] ? tcp_read_sock+0xda/0x260 [ 83.844613] [<ffffffff81537b90>] ? tcp_prequeue_process+0xb0/0xb0 [ 83.844615] [<ffffffff8153c0f0>] ? tcp_splice_read+0xc0/0x250 [ 83.844618] [<ffffffff814dc0c2>] ? sock_splice_read+0x22/0x30 [ 83.844622] [<ffffffff811b820b>] ? do_splice_to+0x7b/0xa0 [ 83.844627] [<ffffffff811ba4bc>] ? sys_splice+0x59c/0x5d0 [ 83.844630] [<ffffffff8119745b>] ? putname+0x2b/0x40 [ 83.844633] [<ffffffff8118bcb4>] ? do_sys_open+0x174/0x1e0 [ 83.844636] [<ffffffff815b6202>] ? system_call_fastpath+0x16/0x1b if recv_actor() returns 0, we should stop immediately, because looping wont give a chance to drain the pipe. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>