summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2012-07-18ipv6: add ipv6_addr_hash() helperEric Dumazet
Introduce ipv6_addr_hash() helper doing a XOR on all bits of an IPv6 address, with an optimized x86_64 version. Use it in flow dissector, as suggested by Andrew McGregor, to reduce hash collision probabilities in fq_codel (and other users of flow dissector) Use it in ip6_tunnel.c and use more bit shuffling, as suggested by David Laight, as existing hash was ignoring most of them. Use it in sunrpc and use more bit shuffling, using hash_32(). Use it in net/ipv6/addrconf.c, using hash_32() as well. As a cleanup, use it in net/ipv4/tcp_metrics.c Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew McGregor <andrewmcgr@gmail.com> Cc: Dave Taht <dave.taht@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18skbuff: Use correct allocation in skb_copy_ubufsKrishna Kumar
Use correct allocation flags during copy of user space fragments to the kernel. Also "improve" couple of for loops. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18net/ipv4: VTI support new module for ip_vti.Saurabh
New VTI tunnel kernel module, Kconfig and Makefile changes. Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com> Reviewed-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel.Saurabh
Incorporated David and Steffen's comments. Add hook for rx-path xfmr4_mode_tunnel for VTI tunnel module. Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com> Reviewed-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18tcp: refine SYN handling in tcp_validate_incomingEric Dumazet
Followup of commit 0c24604b68fc (tcp: implement RFC 5961 4.2) As reported by Vijay Subramanian, we should send a challenge ACK instead of a dup ack if a SYN flag is set on a packet received out of window. This permits the ratelimiting to work as intended, and to increase correct SNMP counters. Suggested-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18cipso: don't follow a NULL pointer when setsockopt() is calledPaul Moore
As reported by Alan Cox, and verified by Lin Ming, when a user attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL tag the kernel dies a terrible death when it attempts to follow a NULL pointer (the skb argument to cipso_v4_validate() is NULL when called via the setsockopt() syscall). This patch fixes this by first checking to ensure that the skb is non-NULL before using it to find the incoming network interface. In the unlikely case where the skb is NULL and the user attempts to add a CIPSO option with the _TAG_LOCAL tag we return an error as this is not something we want to allow. A simple reproducer, kindly supplied by Lin Ming, although you must have the CIPSO DOI #3 configure on the system first or you will be caught early in cipso_v4_validate(): #include <sys/types.h> #include <sys/socket.h> #include <linux/ip.h> #include <linux/in.h> #include <string.h> struct local_tag { char type; char length; char info[4]; }; struct cipso { char type; char length; char doi[4]; struct local_tag local; }; int main(int argc, char **argv) { int sockfd; struct cipso cipso = { .type = IPOPT_CIPSO, .length = sizeof(struct cipso), .local = { .type = 128, .length = sizeof(struct local_tag), }, }; memset(cipso.doi, 0, 4); cipso.doi[3] = 3; sockfd = socket(AF_INET, SOCK_DGRAM, 0); #define SOL_IP 0 setsockopt(sockfd, SOL_IP, IP_OPTIONS, &cipso, sizeof(struct cipso)); return 0; } CC: Lin Ming <mlin@ss.pku.edu.cn> Reported-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18ipv6: fix inet6_csk_xmit()Eric Dumazet
We should provide to inet6_csk_route_socket a struct flowi6 pointer, so that net6_csk_xmit() works correctly instead of sending garbage. Also add some consts Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-18mac80211: flush stations before stop beaconingEliad Peller
When AP interface is going down, the stations are flushed (in ieee80211_do_stop()) only after the beaconing was stopped. However, drivers might rely on stations being removed before the beaconing was stopped, in order to clean up properly. Fix it by flushing the stations on ap stop. (we already do the same for other interface types, e.g. in ieee80211_set_disassoc()) Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-18cfg80211: Fix mutex locking in reg_last_request_cell_baseMohammed Shafi Shajakhan
should fix the following issue [ 3229.815012] [ BUG: lock held when returning to user space! ] [ 3229.815016] 3.5.0-rc7-wl #28 Tainted: G W O [ 3229.815017] ------------------------------------------------ [ 3229.815019] wpa_supplicant/5783 is leaving the kernel with locks still held! [ 3229.815022] 1 lock held by wpa_supplicant/5783: [ 3229.815023] #0: (reg_mutex){+.+.+.}, at: [<fa65834d>] reg_last_request_cell_base+0x1d/0x60 [cfg80211] Cc: Luis Rodriguez <mcgrof@gmail.com> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qca.qualcomm.com> Tested-by: Luciano Coelho <coelho@ti.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-18libceph: fix messenger retrySage Weil
In ancient times, the messenger could both initiate and accept connections. An artifact if that was data structures to store/process an incoming ceph_msg_connect request and send an outgoing ceph_msg_connect_reply. Sadly, the negotiation code was referencing those structures and ignoring important information (like the peer's connect_seq) from the correct ones. Among other things, this fixes tight reconnect loops where the server sends RETRY_SESSION and we (the client) retries with the same connect_seq as last time. This bug pretty easily triggered by injecting socket failures on the MDS and running some fs workload like workunits/direct_io/test_sync_io. Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-17SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavorsTrond Myklebust
The patch "SUNRPC: Add rpcauth_list_flavors()" introduces a new error path in gss_mech_list_pseudoflavors, but fails to release the spin lock. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-07-17ipv4: fix rcu splatEric Dumazet
free_nh_exceptions() should use rcu_dereference_protected(..., 1) since its called after one RCU grace period. Also add some const-ification in recent code. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17ipv4: Fix nexthop exception hash computation.David S. Miller
Need to mask it with (FNHE_HASH_SIZE - 1). Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
2012-07-17Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2012-07-17Merge branch 'nexthop_exceptions'David S. Miller
These patches implement the final mechanism necessary to really allow us to go without the route cache in ipv4. We need a place to have long-term storage of PMTU/redirect information which is independent of the routes themselves, yet does not get us back into a situation where we have to write to metrics or anything like that. For this we use an "next-hop exception" table in the FIB nexthops. The one thing I desperately want to avoid is having to create clone routes in the FIB trie for this purpose, because that is very expensive. However, I'm willing to entertain such an idea later if this current scheme proves to have downsides that the FIB trie variant would not have. In order to accomodate this any such scheme, we need to be able to produce a full flow key at PMTU/redirect time. That required an adjustment of the interface call-sites used to propagate these events. For a PMTU/redirect with a fully specified socket, we pass that socket and use it to produce the flow key. Otherwise we use a passed in SKB to formulate the key. There are two cases that need to be distinguished, ICMP message processing (in which case the IP header is at skb->data) and output packet processing (mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)). We also have to make the code able to handle the case where the dst itself passed into the dst_ops->{update_pmtu,redirect} method is invalidated. This matters for calls from sockets that have cached that route. We provide a inet{,6} helper function for this purpose, and edit SCTP specially since it caches routes at the transport rather than socket level. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17netpoll: move np->dev and np->dev_name init into __netpoll_setup()Jiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17ipv4: Add FIB nexthop exceptions.David S. Miller
In a regime where we have subnetted route entries, we need a way to store persistent storage about destination specific learned values such as redirects and PMTU values. This is implemented here via nexthop exceptions. The initial implementation is a 2048 entry hash table with relaiming starting at chain length 5. A more sophisticated scheme can be devised if that proves necessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17tcp: implement RFC 5961 4.2Eric Dumazet
Implement the RFC 5691 mitigation against Blind Reset attack using SYN bit. Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop incoming packet, instead of resetting the session. Add a new SNMP counter to count number of challenge acks sent in response to SYN packets. (netstat -s | grep TCPSYNChallenge) Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session because of a SYN flag. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()David S. Miller
This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== I know that we're in fairly late stage to request pulls, but the IPVS people pinged me with little patches with oops fixes last week. One of them was recently introduced (during the 3.4 development cycle) while cleaning up the IPVS netns support. They are: * Fix one regression introduced in 3.4 while cleaning up the netns support for IPVS, from Julian Anastasov. * Fix one oops triggered due to resetting the conntrack attached to the skb instead of just putting it in the forward hook, from Lin Ming. This problem seems to be there since 2.6.37 according to Simon Horman. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17mac80211: go out of PS before sending disassocEliad Peller
on disassoc, ieee80211_set_disassoc() goes out of PS before indicating BSS_CHANGED_ASSOC (not sure why this is needed, but some drivers might count on the current behavior). However, it does it after sending the disassoc frame, which results in null-data frame being sent (in order to go out of ps) after we were already sent the disassoc, which is invalid. Fix it by going out of ps before sending the disassoc. Signed-off-by: Eliad Peller <eliad@wizery.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17cfg80211: remove regulatory_update()Luis R. Rodriguez
regulatory_update() just calls wiphy_update_regulatory(). wiphy_update_regulatory() assumes you already have the reg_mutex held so just move the call within locking context and kill the superfluous regulatory_update(). Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17cfg80211: make regulatory_update() staticLuis R. Rodriguez
Now that we have wiphy_regulatory_register() we can tuck away the core's regulatory_update() call there and make it static. Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17cfg80211: rename reg_device_remove() to wiphy_regulatory_deregister()Luis R. Rodriguez
This makes it clearer what we're doing. This now makes a bit more sense given that regardless of the wiphy if the cell base station hint feature is supported we will be modifying the way the regulatory core behaves. Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17cfg80211: add cellular base station regulatory hint supportLuis R. Rodriguez
Cellular base stations can provide hints to cfg80211 about where they think we are. This can be done for example on a cell phone. To enable these hints we simply allow them through as user regulatory hints but we allow userspace to clasify the hint as either coming directly from the user or coming from a cellular base station. This option is only available when you enable CONFIG_CFG80211_CERTIFICATION_ONUS. The base station hints themselves will not be processed by the core unless at least one device on the system supports this feature. Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17cfg80211: add CONFIG_CFG80211_CERTIFICATION_ONUSLuis R. Rodriguez
This adds CONFIG_CFG80211_CERTIFICATION_ONUS which is to be used for features / code which require a bit of work on the system integrator's part to ensure that the system will still pass 802.11 regulatory certification. This option is also usable for researchers and experimenters looking to add code in the kernel without impacting compliant code. We'd use CONFIG_EXPERT alone but it seems that most standard Linux distributions are enabling CONFIG_EXPERT already. This allows us to define 802.11 specific kernel features under a flag that is intended by design to be disabled by standard Linux distributions, and only enabled by system integrators or distributions that have done work to ensure regulatory certification on the system with the enabled features. Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17ipvs: fix oops in ip_vs_dst_event on rmmodJulian Anastasov
After commit 39f618b4fd95ae243d940ec64c961009c74e3333 (3.4) "ipvs: reset ipvs pointer in netns" we can oops in ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup is called after the ipvs_core_ops subsys is unregistered and net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event if ipvs is NULL. It is safe because all services and dests for the net are already freed. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-07-17cfg80211: fix set_regdom() to cancel requests with same alpha2Kalle Valo
While adding regulatory support to ath6kl I noticed that I easily got the regulatory code confused. The way to reproduce the bug was: 1. iw reg set FI (in userspace) 2. cfg80211 calls ath6kl_reg_notify(FI) 3. ath6kl sets regdomain in firmware 4. firmware sends regdomain event to notify about the new regdomain (FI) 5. ath6kl calls regulatory_hint(FI) And this (from FI to FI transition) confuses cfg80211 and after that I only get "Pending regulatory request, waiting for it to be processed...." messages and regdomain changes won't work anymore. The reason why ath6kl calls regulatory_hint() is that firmware can change the regulatory domain by it's own, for example due to 11d IEs. I could of course workaround this in ath6kl but I think it's better to handle the case in cfg80211. The fix is pretty simple, use a different error code if the regdomain is same and then just set the request processed so that it doesn't block new requests. Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17cfg80211: support TX error rate CQMThomas Pedersen
Let the user configure serveral TX error conection quality monitoring parameters: % error rate, survey interval, and # of attempted packets. On exceeding the TX failure rate over the given interval, the driver will send a CQM notify event with the actual TX failure rate and packets attempted. Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17nl80211: add wdev ID as u64 as it shouldJohannes Berg
In one of my previous patches I erroneously used nla_put_u32 for the wdev_id, fix that to use nla_put_u64. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17mac80211: fix tx-mgmt cookie value being left uninitializedNicolas Cavallari
commit "mac80211: unify SW/offload remain-on-channel" moved the cookie assignment from ieee80211_mgmt_tx() to ieee80211_start_roc_work(). But the latter is only called where offchannel is needed. If offchannel isn't needed/used, a uninitialized cookie value would be returned to userspace. This patch sets the cookie value when offchannel isn't used. Signed-off-by: Nicolas Cavallari <cavallar@lri.fr> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-07-17tcp: implement RFC 5961 3.2Eric Dumazet
Implement the RFC 5691 mitigation against Blind Reset attack using RST bit. Idea is to validate incoming RST sequence, to match RCV.NXT value, instead of previouly accepted window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND) If sequence is in window but not an exact match, send a "challenge ACK", so that the other part can resend an RST with the appropriate sequence. Add a new sysctl, tcp_challenge_ack_limit, to limit number of challenge ACK sent per second. Add a new SNMP counter to count number of challenge acks sent. (netstat -s | grep TCPChallengeACK) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kiran Kumar Kella <kkiran@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17ipv6: fix unappropriate errno returned for non-multicast addressLi Wei
We need to check the passed in multicast address and return appropriate errno(EINVAL) if it is not valid. And it's no need to walk through the ipv6_mc_list in this situation. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17irda: Fix typo in irdaMasanari Iida
Correct spelling typo in irda. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17sctp: fix sparse warning for sctp_init_cause_fixedIoan Orghici
Fix the following sparse warning: * symbol 'sctp_init_cause_fixed' was not declared. Should it be static? Signed-off-by: Ioan Orghici <ioanorghici@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17ax25: Fix missing breakAlan Cox
At least there seems to be no reason to disallow ROSE sockets when NETROM is loaded. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17netem: refine early skb orphaningEric Dumazet
netem does an early orphaning of skbs. Doing so breaks TCP Small Queue or any mechanism relying on socket sk_wmem_alloc feedback. Ideally, we should perform this orphaning after the rate module and before the delay module, to mimic what happens on a real link : skb orphaning is indeed normally done at TX completion, before the transit on the link. +-------+ +--------+ +---------------+ +-----------------+ + Qdisc +---> Device +--> TX completion +--> links / hops +-> + + + xmit + + skb orphaning + + propagation + +-------+ +--------+ +---------------+ +-----------------+ < rate limiting > < delay, drops, reorders > If netem is used without delay feature (drops, reorders, rate limiting), then we should avoid early skb orphaning, to keep pressure on sockets as long as packets are still in qdisc queue. Ideally, netem should be refactored to implement delay module as the last stage. Current algorithm merges the two phases (rate limiting + delay) so its not correct. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Mark Gordon <msg@google.com> Cc: Andreas Terzis <aterzis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17caif: Fix access to freed pernet memorySjur Brændeland
unregister_netdevice_notifier() must be called before unregister_pernet_subsys() to avoid accessing already freed pernet memory. This fixes the following oops when doing rmmod: Call Trace: [<ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif] [<ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100 [<ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif] [<ffffffff810e7734>] sys_delete_module+0x1a4/0x300 [<ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0 [<ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3 [<ffffffff81696bad>] system_call_fastpath+0x1a/0x1f RIP [<ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif] Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17net: cgroup: fix access the unallocated memory in netprio cgroupGao feng
there are some out of bound accesses in netprio cgroup. now before accessing the dev->priomap.priomap array,we only check if the dev->priomap exist.and because we don't want to see additional bound checkings in fast path, so we should make sure that dev->priomap is null or array size of dev->priomap.priomap is equal to max_prioidx + 1; so in write_priomap logic,we should call extend_netdev_table when dev->priomap is null and dev->priomap.priomap_len < max_len. and in cgrp_create->update_netdev_tables logic,we should call extend_netdev_table only when dev->priomap exist and dev->priomap.priomap_len < max_len. and it's not needed to call update_netdev_tables in write_priomap, we can only allocate the net device's priomap which we change through net_prio.ifpriomap. this patch also add a return value for update_netdev_tables & extend_netdev_table, so when new_priomap is allocated failed, write_priomap will stop to access the priomap,and return -ENOMEM back to the userspace to tell the user what happend. Change From v3: 1. add rtnl protect when reading max_prioidx in write_priomap. 2. only call extend_netdev_table when map->priomap_len < max_len, this will make sure array size of dev->map->priomap always bigger than any prioidx. 3. add a function write_update_netdev_table to make codes clear. Change From v2: 1. protect extend_netdev_table by RTNL. 2. when extend_netdev_table failed,call dev_put to reduce device's refcount. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17bridge: Fix enforcement of multicast hash_max limitThomas Graf
The hash size is doubled when it needs to grow and compared against hash_max. The >= comparison will limit the hash table size to half of what is expected i.e. the default 512 hash_max will not allow the hash table to grow larger than 256. Also print the hash table limit instead of the desirable size when the limit is reached. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17ipv6: fix RTPROT_RA markup of RA routes w/nexthopsDenis Ovsienko
Userspace implementations of network routing protocols sometimes need to tell RA-originated IPv6 routes from other kernel routes to make proper routing decisions. This makes most sense for RA routes with nexthops, namely, default routes and Route Information routes. The intended mean of preserving RA route origin in a netlink message is through indicating RTPROT_RA as protocol code. Function rt6_fill_node() tried to do that for default routes, but its test condition was taken wrong. This change is modeled after the original mailing list posting by Jeff Haran. It fixes the test condition for default route case and sets the same behaviour for Route Information case (both types use nexthops). Handling of the 3rd RA route type, Prefix Information, is left unchanged, as it stands for interface connected routes (without nexthops). Signed-off-by: Denis Ovsienko <infrastation@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-176lowpan: Change byte order when storing/accessing to len fieldTony Cheneau
Lenght field should be encoded using big endian byte order, such as intend in the specs. As it is currently written, the len field would not be decoded properly on an implementation using the correct byte ordering. Hence, it could lead to interroperability issues. Also, I rewrote the code so that iphc0 argument of lowpan_alloc_new_frame could be removed. Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-176lowpan: Change byte order when storing/accessing u16 tagTony Cheneau
The tag field should be stored and accessed using big endian byte order (as intended in the specs). Or else, when displayed with a trafic analyser, such a Wireshark, the field not properly displayed (e.g. 0x01 00 instead of 0x00 01, and so on). Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-176lowpan: Fix null pointer dereference in UDP uncompression functionTony Cheneau
When a UDP packet gets fragmented, a crash will occur at reassembly time. This is because skb->transport_header is not set during earlier period of fragment reassembly. As a consequence, call to udp_hdr() return NULL and uh (which is NULL) gets dereferenced without much test. Signed-off-by: Tony Cheneau <tony.cheneau@amnesiak.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17Merge branch 'tipc_net-next' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Paul Gortmaker says: ==================== This is the same eight commits as sent for review last week[1], with just the incorporation of the pr_fmt change as suggested by JoeP. There was no additional change requests, so unless you can see something else you'd like me to change, please pull. ... Erik Hugne (5): tipc: use standard printk shortcut macros (pr_err etc.) tipc: remove TIPC packet debugging functions and macros tipc: simplify print buffer handling in tipc_printf tipc: phase out most of the struct print_buf usage tipc: remove print_buf and deprecated log buffer code Paul Gortmaker (3): tipc: factor stats struct out of the larger link struct tipc: limit error messages relating to memory leak to one line tipc: simplify link_print by divorcing it from using tipc_printf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17sctp: Fix list corruption resulting from freeing an association on a listNeil Horman
A few days ago Dave Jones reported this oops: [22766.294255] general protection fault: 0000 [#1] PREEMPT SMP [22766.295376] CPU 0 [22766.295384] Modules linked in: [22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90 ffff880147c03a74 [22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000 [22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000, [22766.387137] Stack: [22766.387140] ffff880147c03a10 [22766.387140] ffffffffa169f2b6 [22766.387140] ffff88013ed95728 [22766.387143] 0000000000000002 [22766.387143] 0000000000000000 [22766.387143] ffff880003fad062 [22766.387144] ffff88013c120000 [22766.387144] [22766.387145] Call Trace: [22766.387145] <IRQ> [22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0 [sctp] [22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp] [22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp] [22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0 [22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210 [22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0 [22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0 [22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80 [22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680 [22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320 [22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910 [22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910 [22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40 [22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0 [22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440 [22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0 [22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130 [22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e] [22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e] [22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e] [22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0 [22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130 [22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420 [22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30 [22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110 [22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0 [22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0 [22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f [22766.387283] <EOI> [22766.387284] [22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b [22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00 48 89 fb 49 89 f5 66 c1 c0 08 66 39 46 02 [22766.387307] [22766.387307] RIP [22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp] [22766.387311] RSP <ffff880147c039b0> [22766.387142] ffffffffa16ab120 [22766.599537] ---[ end trace 3f6dae82e37b17f5 ]--- [22766.601221] Kernel panic - not syncing: Fatal exception in interrupt It appears from his analysis and some staring at the code that this is likely occuring because an association is getting freed while still on the sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable while a freed node corrupts part of the list. Nominally I would think that an mibalanced refcount was responsible for this, but I can't seem to find any obvious imbalance. What I did note however was that the two places where we create an association using sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths which free a newly created association after calling sctp_primitive_ASSOCIATE. sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to the aforementioned hash table. the sctp command interpreter that process side effects has not way to unwind previously processed commands, so freeing the association from the __sctp_connect or sctp_sendmsg error path would lead to a freed association remaining on this hash table. I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init, which allows us to proerly use hlist_unhashed to check if the node is on a hashlist safely during a delete. That in turn alows us to safely call sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths before freeing them, regardles of what the associations state is on the hash list. I noted, while I was doing this, that the __sctp_unhash_endpoint was using hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes pointers to make that function work properly, so I fixed that up in a simmilar fashion. I attempted to test this using a virtual guest running the SCTP_RR test from netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't able to recreate the problem prior to this fix, nor was I able to trigger the failure after (neither of which I suppose is suprising). Given the trace above however, I think its likely that this is what we hit. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: davej@redhat.com CC: davej@redhat.com CC: "David S. Miller" <davem@davemloft.net> CC: Vlad Yasevich <vyasevich@gmail.com> CC: Sridhar Samudrala <sri@us.ibm.com> CC: linux-sctp@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17net: make sock diag per-namespaceAndrey Vagin
Before this patch sock_diag works for init_net only and dumps information about sockets from all namespaces. This patch expands sock_diag for all name-spaces. It creates a netlink kernel socket for each netns and filters data during dumping. v2: filter accoding with netns in all places remove an unused variable. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Pavel Emelyanov <xemul@parallels.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Andrew Vagin <avagin@openvz.org> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17tcp: add OFO snmp countersEric Dumazet
Add three SNMP TCP counters, to better track TCP behavior at global stage (netstat -s), when packets are received Out Of Order (OFO) TCPOFOQueue : Number of packets queued in OFO queue TCPOFODrop : Number of packets meant to be queued in OFO but dropped because socket rcvbuf limit hit. TCPOFOMerge : Number of packets in OFO that were merged with other packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16SUNRPC: Add rpcauth_list_flavors()Chuck Lever
The gss_mech_list_pseudoflavors() function provides a list of currently registered GSS pseudoflavors. This list does not include any non-GSS flavors that have been registered with the RPC client. nfs4_find_root_sec() currently adds these extra flavors by hand. Instead, nfs4_find_root_sec() should be looking at the set of flavors that have been explicitly registered via rpcauth_register(). And, other areas of code will soon need the same kind of list that contains all flavors the kernel currently knows about (see below). Rather than cloning the open-coded logic in nfs4_find_root_sec() to those new places, introduce a generic RPC function that generates a full list of registered auth flavors and pseudoflavors. A new rpc_authops method is added that lists a flavor's pseudoflavors, if it has any. I encountered an interesting module loader loop when I tried to get the RPC client to invoke gss_mech_list_pseudoflavors() by name. This patch is a pre-requisite for server trunking discovery, and a pre-requisite for fixing up the in-kernel mount client to do better automatic security flavor selection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>