summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2011-06-09ipv4: Fix packet size calculation for raw IPsec packets in __ip_append_dataSteffen Klassert
We assume that transhdrlen is positive on the first fragment which is wrong for raw packets. So we don't add exthdrlen to the packet size for raw packets. This leads to a reallocation on IPsec because we have not enough headroom on the skb to place the IPsec headers. This patch fixes this by adding exthdrlen to the packet size whenever the send queue of the socket is empty. This issue was introduced with git commit 1470ddf7 (inet: Remove explicit write references to sk/inet in ip_append_data) Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-09net: pmtu_expires fixesEric Dumazet
commit 2c8cec5c10bc (ipv4: Cache learned PMTU information in inetpeer) added some racy peer->pmtu_expires accesses. As its value can be changed by another cpu/thread, we should be more careful, reading its value once. Add peer_pmtu_expired() and peer_pmtu_cleaned() helpers Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-05netfilter: use unsigned variables for packet lengths in ip[6]_queue.Dave Jones
Netlink message lengths can't be negative, so use unsigned variables. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-06-05netfilter: nf_conntrack: fix ct refcount leak in l4proto->error()Pablo Neira Ayuso
This patch fixes a refcount leak of ct objects that may occur if l4proto->error() assigns one conntrack object to one skbuff. In that case, we have to skip further processing in nf_conntrack_in(). With this patch, we can also fix wrong return values (-NF_ACCEPT) for special cases in ICMP[v6] that should not bump the invalid/error statistic counters. Reported-by: Zoltan Menyhart <Zoltan.Menyhart@bull.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-06-05netfilter: nf_nat: fix crash in nf_nat_csumJulian Anastasov
Fix crash in nf_nat_csum when mangling packets in OUTPUT hook where skb->dev is not defined, it is set later before POSTROUTING. Problem happens for CHECKSUM_NONE. We can check device from rt but using CHECKSUM_PARTIAL should be safe (skb_checksum_help). Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-06-05netfilter: add more values to enum ip_conntrack_infoEric Dumazet
Following error is raised (and other similar ones) : net/ipv4/netfilter/nf_nat_standalone.c: In function ‘nf_nat_fn’: net/ipv4/netfilter/nf_nat_standalone.c:119:2: warning: case value ‘4’ not in enumerated type ‘enum ip_conntrack_info’ gcc barfs on adding two enum values and getting a not enumerated result : case IP_CT_RELATED+IP_CT_IS_REPLY: Add missing enum values Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: David Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2011-06-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (40 commits) tg3: Fix tg3_skb_error_unmap() net: tracepoint of net_dev_xmit sees freed skb and causes panic drivers/net/can/flexcan.c: add missing clk_put net: dm9000: Get the chip in a known good state before enabling interrupts drivers/net/davinci_emac.c: add missing clk_put af-packet: Add flag to distinguish VID 0 from no-vlan. caif: Fix race when conditionally taking rtnl lock usbnet/cdc_ncm: add missing .reset_resume hook vlan: fix typo in vlan_dev_hard_start_xmit() net/ipv4: Check for mistakenly passed in non-IPv4 address iwl4965: correctly validate temperature value bluetooth l2cap: fix locking in l2cap_global_chan_by_psm ath9k: fix two more bugs in tx power cfg80211: don't drop p2p probe responses Revert "net: fix section mismatches" drivers/net/usb/catc.c: Fix potential deadlock in catc_ctrl_run() sctp: stop pending timers and purge queues when peer restart asoc drivers/net: ks8842 Fix crash on received packet when in PIO mode. ip_options_compile: properly handle unaligned pointer iwlagn: fix incorrect PCI subsystem id for 6150 devices ...
2011-06-02net/ipv4: Check for mistakenly passed in non-IPv4 addressMarcus Meissner
Check against mistakenly passing in IPv6 addresses (which would result in an INADDR_ANY bind) or similar incompatible sockaddrs. Signed-off-by: Marcus Meissner <meissner@suse.de> Cc: Reinhard Max <max@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-31ip_options_compile: properly handle unaligned pointerChris Metcalf
The current code takes an unaligned pointer and does htonl() on it to make it big-endian, then does a memcpy(). The problem is that the compiler decides that since the pointer is to a __be32, it is legal to optimize the copy into a processor word store. However, on an architecture that does not handled unaligned writes in kernel space, this produces an unaligned exception fault. The solution is to track the pointer as a "char *" (which removes a bunch of unpleasant casts in any case), and then just use put_unaligned_be32() to write the value to memory. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: David S. Miller <davem@zippy.davemloft.net>
2011-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: net: Kill ratelimit.h dependency in linux/net.h net: Add linux/sysctl.h includes where needed. net: Kill ether_table[] declaration. inetpeer: fix race in unused_list manipulations atm: expose ATM device index in sysfs IPVS: bug in ip_vs_ftp, same list heaad used in all netns. bug.h: Move ratelimit warn interfaces to ratelimit.h bonding: cleanup module option descriptions net:8021q:vlan.c Fix pr_info to just give the vlan fullname and version. net: davinci_emac: fix dev_err use at probe can: convert to %pK for kptr_restrict support net: fix ETHTOOL_SFEATURES compatibility with old ethtool_ops.set_flags netfilter: Fix several warnings in compat_mtw_from_user(). netfilter: ipset: fix ip_set_flush return code netfilter: ipset: remove unused variable from type_pf_tdel() netfilter: ipset: Use proper timeout value to jiffies conversion
2011-05-27inetpeer: fix race in unused_list manipulationsEric Dumazet
Several crashes in cleanup_once() were reported in recent kernels. Commit d6cc1d642de9 (inetpeer: various changes) added a race in unlink_from_unused(). One way to avoid taking unused_peers.lock before doing the list_empty() test is to catch 0->1 refcnt transitions, using full barrier atomic operations variants (atomic_cmpxchg() and atomic_inc_return()) instead of previous atomic_inc() and atomic_add_unless() variants. We then call unlink_from_unused() only for the owner of the 0->1 transition. Add a new atomic_add_unless_return() static helper With help from Arun Sharma. Refs: https://bugzilla.kernel.org/show_bug.cgi?id=32772 Reported-by: Arun Sharma <asharma@fb.com> Reported-by: Maximilian Engelhardt <maxi@daemonizer.de> Reported-by: Yann Dupont <Yann.Dupont@univ-nantes.fr> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-26Merge branches 'core-fixes-for-linus' and 'irq-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: seqlock: Get rid of SEQLOCK_UNLOCKED * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Remove smp_affinity_list when unregister irq proc
2011-05-24igmp: call ip_mc_clear_src() only when we have no users of ip_mc_listVeaceslav Falico
In igmp_group_dropped() we call ip_mc_clear_src(), which resets the number of source filters per mulitcast. However, igmp_group_dropped() is also called on NETDEV_DOWN, NETDEV_PRE_TYPE_CHANGE and NETDEV_UNREGISTER, which means that the group might get added back on NETDEV_UP, NETDEV_REGISTER and NETDEV_POST_TYPE_CHANGE respectively, leaving us with broken source filters. To fix that, we must clear the source filters only when there are no users in the ip_mc_list, i.e. in ip_mc_dec_group() and on device destroy. Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-24seqlock: Get rid of SEQLOCK_UNLOCKEDEric Dumazet
All static seqlock should be initialized with the lockdep friendly __SEQLOCK_UNLOCKED() macro. Remove legacy SEQLOCK_UNLOCKED() macro. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3E Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2011-05-24net: convert %p usage to %pKDan Rosenberg
The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". The supporting code for kptr_restrict and %pK are currently in the -mm tree. This patch converts users of %p in net/ to %pK. Cases of printing pointers to the syslog are not covered, since this would eliminate useful information for postmortem debugging and the reading of the syslog is already optionally protected by the dmesg_restrict sysctl. Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: James Morris <jmorris@namei.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Thomas Graf <tgraf@infradead.org> Cc: Eugene Teo <eugeneteo@kernel.org> Cc: Kees Cook <kees.cook@canonical.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David S. Miller <davem@davemloft.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-23net: ping: cleanups ping_v4_unhash()Eric Dumazet
net/ipv4/ping.c: In function ‘ping_v4_unhash’: net/ipv4/ping.c:140:28: warning: variable ‘hslot’ set but not used Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits) bnx2x: allow device properly initialize after hotplug bnx2x: fix DMAE timeout according to hw specifications bnx2x: properly handle CFC DEL in cnic flow bnx2x: call dev_kfree_skb_any instead of dev_kfree_skb net: filter: move forward declarations to avoid compile warnings pktgen: refactor pg_init() code pktgen: use vzalloc_node() instead of vmalloc_node() + memset() net: skb_trim explicitely check the linearity instead of data_len ipv4: Give backtrace in ip_rt_bug(). net: avoid synchronize_rcu() in dev_deactivate_many net: remove synchronize_net() from netdev_set_master() rtnetlink: ignore NETDEV_RELEASE and NETDEV_JOIN event net: rename NETDEV_BONDING_DESLAVE to NETDEV_RELEASE bridge: call NETDEV_JOIN notifiers when add a slave netpoll: disable netpoll when enslave a device macvlan: Forward unicast frames in bridge mode to lowerdev net: Remove linux/prefetch.h include from linux/skbuff.h ipv4: Include linux/prefetch.h in fib_trie.c netlabel: Remove prefetches from list handlers. drivers/net: add prefetch header for prefetch users ... Fixed up prefetch parts: removed a few duplicate prefetch.h includes, fixed the location of the igb prefetch.h, took my version of the skbuff.h code without the extra parentheses etc.
2011-05-23Add appropriate <linux/prefetch.h> include for prefetch usersPaul Gortmaker
After discovering that wide use of prefetch on modern CPUs could be a net loss instead of a win, net drivers which were relying on the implicit inclusion of prefetch.h via the list headers showed up in the resulting cleanup fallout. Give them an explicit include via the following $0.02 script. ========================================= #!/bin/bash MANUAL="" for i in `git grep -l 'prefetch(.*)' .` ; do grep -q '<linux/prefetch.h>' $i if [ $? = 0 ] ; then continue fi ( echo '?^#include <linux/?a' echo '#include <linux/prefetch.h>' echo . echo w echo q ) | ed -s $i > /dev/null 2>&1 if [ $? != 0 ]; then echo $i needs manual fixup MANUAL="$i $MANUAL" fi done echo ------------------- 8\<---------------------- echo vi $MANUAL ========================================= Signed-off-by: Paul <paul.gortmaker@windriver.com> [ Fixed up some incorrect #include placements, and added some non-network drivers and the fib_trie.c case - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-23ipv4: Give backtrace in ip_rt_bug().Dave Jones
Add a stack backtrace to the ip_rt_bug path for debugging Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-23ipv4: Include linux/prefetch.h in fib_trie.cDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.
2011-05-20Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (78 commits) Revert "rcu: Decrease memory-barrier usage based on semi-formal proof" net,rcu: convert call_rcu(prl_entry_destroy_rcu) to kfree batman,rcu: convert call_rcu(softif_neigh_free_rcu) to kfree_rcu batman,rcu: convert call_rcu(neigh_node_free_rcu) to kfree() batman,rcu: convert call_rcu(gw_node_free_rcu) to kfree_rcu net,rcu: convert call_rcu(kfree_tid_tx) to kfree_rcu() net,rcu: convert call_rcu(xt_osf_finger_free_rcu) to kfree_rcu() net/mac80211,rcu: convert call_rcu(work_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(wq_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(phonet_device_rcu_free) to kfree_rcu() perf,rcu: convert call_rcu(swevent_hlist_release_rcu) to kfree_rcu() perf,rcu: convert call_rcu(free_ctx) to kfree_rcu() net,rcu: convert call_rcu(__nf_ct_ext_free_rcu) to kfree_rcu() net,rcu: convert call_rcu(net_generic_release) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr6) to kfree_rcu() net,rcu: convert call_rcu(netlbl_unlhsh_free_addr4) to kfree_rcu() security,rcu: convert call_rcu(sel_netif_free) to kfree_rcu() net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu() net,rcu: convert call_rcu(xps_map_release) to kfree_rcu() net,rcu: convert call_rcu(rps_map_release) to kfree_rcu() ...
2011-05-19ipconfig wait for carrierMicha Nelissen
v3 -> v4: fix return boolean false instead of 0 for ic_is_init_dev Currently the ip auto configuration has a hardcoded delay of 1 second. When (ethernet) link takes longer to come up (e.g. more than 3 seconds), nfs root may not be found. Remove the hardcoded delay, and wait for carrier on at least one network device. Signed-off-by: Micha Nelissen <micha@neli.hopto.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-19net: ping: fix the coding styleChangli Gao
The characters in a line should be no more than 80. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-19net: ping: make local functions staticChangli Gao
As these functions are only used in this file. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Pass explicit destination address to rt_bind_peer().David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Pass explicit destination address to rt_get_peer().David S. Miller
This will next trickle down to rt_bind_peer(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Make caller provide flowi4 key to inet_csk_route_req().David S. Miller
This way the caller can get at the fully resolved fl4->{daddr,saddr} etc. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-18ipv4: Kill RT_CACHE_DEBUGDavid S. Miller
It's way past it's usefulness. And this gets rid of a bunch of stray ->rt_{dst,src} references. Even the comment documenting the macro was inaccurate (stated default was 1 when it's 0). If reintroduced, it should be done properly, with dynamic debug facilities. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-17ipv4: Don't use enums as bitmasks in ip_fragment.cDavid S. Miller
Noticed by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-17net: ping: fix build failureVasiliy Kulikov
If CONFIG_PROC_SYSCTL=n the building process fails: ping.c:(.text+0x52af3): undefined reference to `inet_get_ping_group_range_net' Moved inet_get_ping_group_range_net() to ping.c. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16ipv4: more compliant RFC 3168 supportEric Dumazet
Commit 6623e3b24a5e (ipv4: IP defragmentation must be ECN aware) was an attempt to not lose "Congestion Experienced" (CE) indications when performing datagram defragmentation. Stefanos Harhalakis raised the point that RFC 3168 requirements were not completely met by this commit. In particular, we MUST detect invalid combinations and eventually drop illegal frames. Reported-by: Stefanos Harhalakis <v13@v13.gr> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16ipv4: Trivial rt->rt_src conversions in net/ipv4/route.cDavid S. Miller
At these points we have a fully filled in value via the IP header the form of ip_hdr(skb)->saddr Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-16net: ping: dont call udp_ioctl()Eric Dumazet
udp_ioctl() really handles UDP and UDPLite protocols. 1) It can increment UDP_MIB_INERRORS in case first_packet_length() finds a frame with bad checksum. 2) It has a dependency on sizeof(struct udphdr), not applicable to ICMP/PING If ping sockets need to handle SIOCINQ/SIOCOUTQ ioctl, this should be done differently. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-15net: ping: small changesEric Dumazet
ping_table is not __read_mostly, since it contains one rwlock, and is static to ping.c ping_port_rover & ping_v4_lookup are static Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Remove rt->rt_dst reference from ip_forward_options().David S. Miller
At this point iph->daddr equals what rt->rt_dst would hold. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Remove route key identity dependencies in ip_rt_get_source().David S. Miller
Pass in the sk_buff so that we can fetch the necessary keys from the packet header when working with input routes. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Always call ip_options_build() after rest of IP header is filled in.David S. Miller
This will allow ip_options_build() to reliably look at the values of iph->{daddr,saddr} Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Kill spurious write to iph->daddr in ip_forward_options().David S. Miller
This code block executes when opt->srr_is_hit is set. It will be set only by ip_options_rcv_srr(). ip_options_rcv_srr() walks until it hits a matching nexthop in the SRR option addresses, and when it matches one 1) looks up the route for that nexthop and 2) on route lookup success it writes that nexthop value into iph->daddr. ip_forward_options() runs later, and again walks the SRR option addresses looking for the option matching the destination of the route stored in skb_rtable(). This route will be the same exact one looked up for the nexthop by ip_options_rcv_srr(). Therefore "rt->rt_dst == iph->daddr" must be true. All it really needs to do is record the route's source address in the matching SRR option adddress. It need not write iph->daddr again, since that has already been done by ip_options_rcv_srr() as detailed above. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13net: ipv4: add IPPROTO_ICMP socket kindVasiliy Kulikov
This patch adds IPPROTO_ICMP socket kind. It makes it possible to send ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages without any special privileges. In other words, the patch makes it possible to implement setuid-less and CAP_NET_RAW-less /bin/ping. In order not to increase the kernel's attack surface, the new functionality is disabled by default, but is enabled at bootup by supporting Linux distributions, optionally with restriction to a group or a group range (see below). Similar functionality is implemented in Mac OS X: http://www.manpagez.com/man/4/icmp/ A new ping socket is created with socket(PF_INET, SOCK_DGRAM, PROT_ICMP) Message identifiers (octets 4-5 of ICMP header) are interpreted as local ports. Addresses are stored in struct sockaddr_in. No port numbers are reserved for privileged processes, port 0 is reserved for API ("let the kernel pick a free number"). There is no notion of remote ports, remote port numbers provided by the user (e.g. in connect()) are ignored. Data sent and received include ICMP headers. This is deliberate to: 1) Avoid the need to transport headers values like sequence numbers by other means. 2) Make it easier to port existing programs using raw sockets. ICMP headers given to send() are checked and sanitized. The type must be ICMP_ECHO and the code must be zero (future extensions might relax this, see below). The id is set to the number (local port) of the socket, the checksum is always recomputed. ICMP reply packets received from the network are demultiplexed according to their id's, and are returned by recv() without any modifications. IP header information and ICMP errors of those packets may be obtained via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source quenches and redirects are reported as fake errors via the error queue (IP_RECVERR); the next hop address for redirects is saved to ee_info (in network order). socket(2) is restricted to the group range specified in "/proc/sys/net/ipv4/ping_group_range". It is "1 0" by default, meaning that nobody (not even root) may create ping sockets. Setting it to "100 100" would grant permissions to the single group (to either make /sbin/ping g+s and owned by this group or to grant permissions to the "netadmins" group), "0 4294967295" would enable it for the world, "100 4294967295" would enable it for the users, but not daemons. The existing code might be (in the unlikely case anyone needs it) extended rather easily to handle other similar pairs of ICMP messages (Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply etc.). Userspace ping util & patch for it: http://openwall.info/wiki/people/segoon/ping For Openwall GNU/*/Linux it was the last step on the road to the setuid-less distro. A revision of this patch (for RHEL5/OpenVZ kernels) is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs: http://mirrors.kernel.org/openwall/Owl/current/iso/ Initially this functionality was written by Pavel Kankovsky for Linux 2.4.32, but unfortunately it was never made public. All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I), are tested with the patch. PATCH v3: - switched to flowi4. - minor changes to be consistent with raw sockets code. PATCH v2: - changed ping_debug() to pr_debug(). - removed CONFIG_IP_PING. - removed ping_seq_fops.owner field (unused for procfs). - switched to proc_net_fops_create(). - switched to %pK in seq_printf(). PATCH v1: - fixed checksumming bug. - CAP_NET_RAW may not create icmp sockets anymore. RFC v2: - minor cleanups. - introduced sysctl'able group range to restrict socket(2). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-13ipv4: Fix 'iph' use before set.David S. Miller
I swear none of my compilers warned about this, yet it is so obvious. > net/ipv4/ip_forward.c: In function 'ip_forward': > net/ipv4/ip_forward.c:87: warning: 'iph' may be used uninitialized in this function Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Elide use of rt->rt_dst in ip_forward()David S. Miller
No matter what kind of header mangling occurs due to IP options processing, rt->rt_dst will always equal iph->daddr in the packet. So we can safely use iph->daddr instead of rt->rt_dst here. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Simplify iph->daddr overwrite in ip_options_rcv_srr().David S. Miller
We already copy the 4-byte nexthop from the options block into local variable "nexthop" for the route lookup. Re-use that variable instead of memcpy()'ing again when assigning to iph->daddr after the route lookup succeeds. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-12ipv4: Kill spurious opt->srr check in ip_options_rcv_srr().David S. Miller
All call sites conditionalize the call to ip_options_rcv_srr() with a check of opt->srr, so no need to check it again there. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-11Merge branch 'master' of ↵David S. Miller
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-3.6 Conflicts: drivers/net/benet/be_main.c
2011-05-10xfrm: Assign the inner mode output function to the dst entrySteffen Klassert
As it is, we assign the outer modes output function to the dst entry when we create the xfrm bundle. This leads to two problems on interfamily scenarios. We might insert ipv4 packets into ip6_fragment when called from xfrm6_output. The system crashes if we try to fragment an ipv4 packet with ip6_fragment. This issue was introduced with git commit ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed). The second issue is, that we might insert ipv4 packets in netfilter6 and vice versa on interfamily scenarios. With this patch we assign the inner mode output function to the dst entry when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner mode is used and the right fragmentation and netfilter functions are called. We switch then to outer mode with the output_finish functions. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10net: fix two lockdep splatsEric Dumazet
Commit e67f88dd12f6 (net: dont hold rtnl mutex during netlink dump callbacks) switched rtnl protection to RCU, but we forgot to adjust two rcu_dereference() lockdep annotations : inet_get_link_af_size() or inet_fill_link_af() might be called with rcu_read_lock or rtnl held, so use rcu_dereference_rtnl() instead of rtnl_dereference() Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: xfrm: Eliminate ->rt_src reference in policy code.David S. Miller
Rearrange xfrm4_dst_lookup() so that it works by calling a helper function __xfrm_dst_lookup() that takes an explicit flow key storage area as an argument. Use this new helper in xfrm4_get_saddr() so we can fetch the selected source address from the flow instead of from rt->rt_src Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: udp: Eliminate remaining uses of rt->rt_srcDavid S. Miller
We already track and pass around the correct flow key, so simply use it in udp_send_skb(). Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-10ipv4: icmp: Eliminate remaining uses of rt->rt_srcDavid S. Miller
On input packets, rt->rt_src always equals ip_hdr(skb)->saddr Anything that mangles or otherwise changes the IP header must relookup the route found at skb_rtable(). Therefore this invariant must always hold true. Signed-off-by: David S. Miller <davem@davemloft.net>