summaryrefslogtreecommitdiff
path: root/net/ipv6/route.c
AgeCommit message (Collapse)Author
2012-09-13ipv6: replace write lock with read lock when get route infoLi RongQing
geting route info does not write rt->rt6i_table, so replace write lock with read lock Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-13ipv6: route templates can be constEric Dumazet
We kmemdup() templates, so they can be const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10ipv6: remove some useless RCU read lockAmerigo Wang
After this commit: commit 97cac0821af4474ec4ba3a9e7a36b98ed9b6db88 Author: David S. Miller <davem@davemloft.net> Date: Mon Jul 2 22:43:47 2012 -0700 ipv6: Store route neighbour in rt6_info struct. we no longer use RCU to protect route neighbour. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-10netlink: Rename pid to portid to avoid confusionEric W. Biederman
It is a frequent mistake to confuse the netlink port identifier with a process identifier. Try to reduce this confusion by renaming fields that hold port identifiers portid instead of pid. I have carefully avoided changing the structures exported to userspace to avoid changing the userspace API. I have successfully built an allyesconfig kernel with this change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07ipv6: fix handling of throw routesNicolas Dichtel
It's the same problem that previous fix about blackhole and prohibit routes. When adding a throw route, it was handled like a classic route. Moreover, it was only possible to add this kind of routes by specifying an interface. Before the patch: $ ip route add throw 2001::2/128 RTNETLINK answers: No such device $ ip route add throw 2001::2/128 dev eth0 $ ip -6 route | grep 2001::2 2001::2 dev eth0 metric 1024 After: $ ip route add throw 2001::2/128 $ ip -6 route | grep 2001::2 throw 2001::2 dev lo metric 1024 error -11 Reported-by: Markus Stenberg <markus.stenberg@iki.fi> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-05ipv6: fix handling of blackhole and prohibit routesNicolas Dichtel
When adding a blackhole or a prohibit route, they were handling like classic routes. Moreover, it was only possible to add this kind of routes by specifying an interface. Bug already reported here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=498498 Before the patch: $ ip route add blackhole 2001::1/128 RTNETLINK answers: No such device $ ip route add blackhole 2001::1/128 dev eth0 $ ip -6 route | grep 2001 2001::1 dev eth0 metric 1024 After: $ ip route add blackhole 2001::1/128 $ ip -6 route | grep 2001 blackhole 2001::1 dev lo metric 1024 error -22 v2: wrong patch v3: add a field fc_type in struct fib6_config to store RTN_* type Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-09net: Loopback ifindex is constant nowPavel Emelyanov
As pointed out, there are places, that access net->loopback_dev->ifindex and after ifindex generation is made per-net this value becomes constant equals 1. So go ahead and introduce the LOOPBACK_IFINDEX constant and use it where appropriate. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-30ipv6: fix incorrect route 'expires' value passed to userspaceLi Wei
When userspace use RTM_GETROUTE to dump route table, with an already expired route entry, we always got an 'expires' value(2147157) calculated base on INT_MAX. The reason of this problem is in the following satement: rt->dst.expires - jiffies < INT_MAX gcc promoted the type of both sides of '<' to unsigned long, thus a small negative value would be considered greater than INT_MAX. With the help of Eric Dumazet, do the out of bound checks in rtnl_put_cacheinfo(), _after_ conversion to clock_t. Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20net: Document dst->obsolete better.David S. Miller
Add a big comment explaining how the field works, and use defines instead of magic constants for the values assigned to it. Suggested by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17Merge branch 'nexthop_exceptions'David S. Miller
These patches implement the final mechanism necessary to really allow us to go without the route cache in ipv4. We need a place to have long-term storage of PMTU/redirect information which is independent of the routes themselves, yet does not get us back into a situation where we have to write to metrics or anything like that. For this we use an "next-hop exception" table in the FIB nexthops. The one thing I desperately want to avoid is having to create clone routes in the FIB trie for this purpose, because that is very expensive. However, I'm willing to entertain such an idea later if this current scheme proves to have downsides that the FIB trie variant would not have. In order to accomodate this any such scheme, we need to be able to produce a full flow key at PMTU/redirect time. That required an adjustment of the interface call-sites used to propagate these events. For a PMTU/redirect with a fully specified socket, we pass that socket and use it to produce the flow key. Otherwise we use a passed in SKB to formulate the key. There are two cases that need to be distinguished, ICMP message processing (in which case the IP header is at skb->data) and output packet processing (mostly tunnels, and in all such cases the IP header is at ip_hdr(skb)). We also have to make the code able to handle the case where the dst itself passed into the dst_ops->{update_pmtu,redirect} method is invalidated. This matters for calls from sockets that have cached that route. We provide a inet{,6} helper function for this purpose, and edit SCTP specially since it caches routes at the transport rather than socket level. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()David S. Miller
This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17ipv6: fix RTPROT_RA markup of RA routes w/nexthopsDenis Ovsienko
Userspace implementations of network routing protocols sometimes need to tell RA-originated IPv6 routes from other kernel routes to make proper routing decisions. This makes most sense for RA routes with nexthops, namely, default routes and Route Information routes. The intended mean of preserving RA route origin in a netlink message is through indicating RTPROT_RA as protocol code. Function rt6_fill_node() tried to do that for default routes, but its test condition was taken wrong. This change is modeled after the original mailing list posting by Jeff Haran. It fixes the test condition for default route case and sets the same behaviour for Route Information case (both types use nexthops). Handling of the 3rd RA route type, Prefix Information, is left unchanged, as it stands for interface connected routes (without nexthops). Signed-off-by: Denis Ovsienko <infrastation@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-14ipv6: Initialize the struct rt6_info behind the dst_enty fieldSteffen Klassert
We start initializing the struct rt6_info at the first field behind the struct dst_enty. This is error prone because it might leave a new field uninitialized. So start initializing the struct rt6_info right behind the dst_entry. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12net: Add dummy dst_ops->redirect method where needed.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().David S. Miller
And delete rt6_redirect(), since it is no longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12ipv6: Add ip6_redirect() and ip6_sk_redirect() helper functions.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12ipv6: Pull main logic of rt6_redirect() into rt6_do_redirect().David S. Miller
Hook it into dst_ops->redirect as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-12ipv6: Move bulk of redirect handling into rt6_redirect().David S. Miller
This sets things up so that we can have the protocol error handlers call down into the ipv6 route code for redirects just as ipv4 already does. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo().David S. Miller
Nobody provides non-zero values any longer. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11inet: Kill FLOWI_FLAG_PRECOW_METRICS.David S. Miller
No longer needed. TCP writes metrics, but now in it's own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-11tcp: Move timestamps from inetpeer to metrics cache.David S. Miller
With help from Lin Ming. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05ipv6: Initialize the neighbour pointer of rt6_info on allocationSteffen Klassert
git commit 97cac082 (ipv6: Store route neighbour in rt6_info struct) added a neighbour pointer to rt6_info. Currently we don't initialize this pointer at allocation time. We assume this pointer to be valid if it is not a null pointer, so initialize it on allocation. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05ipv6: Store route neighbour in rt6_info struct.David S. Miller
This makes for a simplified conversion away from dst_get_neighbour*(). All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*(). Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05net: Pass neighbours and dest address into NETEVENT_REDIRECT events.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05net: Add optional SKB arg to dst_ops->neigh_lookup().David S. Miller
Causes the handler to use the daddr in the ipv4/ipv6 header when the route gateway is unspecified (local subnet). Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-26net/ipv6/route.c: packets originating on device match loDavid McCullough
Fix to allow IPv6 packets originating locally to match rules with the "iff" set to "lo". This allows IPv6 rule matching work the same as it does for IPv4. From the iproute2 man page: iif NAME select the incoming device to match. If the interface is loop‐ back, the rule only matches packets originating from this host. This means that you may create separate routing tables for for‐ warded and local packets and, hence, completely segregate them. Signed-off-by: David McCullough <david_mccullough@mcafee.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/usb/qmi_wwan.c net/batman-adv/translation-table.c net/ipv6/route.c qmi_wwan.c resolution provided by Bjørn Mork. batman-adv conflict is dealing merely with the changes of global function names to have a proper subsystem prefix. ipv6's route.c conflict is merely two side-by-side additions of network namespace methods. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-19ipv6: Move ipv6 proc file registration to end of init orderThomas Graf
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. Move the registration of the proc files to a later point in the init order to avoid the race. Tested :-) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv6/route.c Pull in 'net' again to get the revert of Thomas's change which introduced regressions. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16Revert "ipv6: Prevent access to uninitialized fib_table_hash via ↵David S. Miller
/proc/net/ipv6_route" This reverts commit 2a0c451ade8e1783c5d453948289e4a978d417c9. It causes crashes, because now ip6_null_entry is used before it is initialized. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-16ipv6: Fix types of ip6_update_pmtu().David S. Miller
The mtu should be a __be32, not the mark. Reported-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/ipv6/route.c This deals with a merge conflict between the net-next addition of the inetpeer network namespace ops, and Thomas Graf's bug fix in 2a0c451ade8e1783c5d453948289e4a978d417c9 which makes sure we don't register /proc/net/ipv6_route before it is actually safe to do so. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15ipv6: Prevent access to uninitialized fib_table_hash via /proc/net/ipv6_routeThomas Graf
/proc/net/ipv6_route reflects the contents of fib_table_hash. The proc handler is installed in ip6_route_net_init() whereas fib_table_hash is allocated in fib6_net_init() _after_ the proc handler has been installed. This opens up a short time frame to access fib_table_hash with its pants down. fib6_init() as a whole can't be moved to an earlier position as it also registers the rtnetlink message handlers which should be registered at the end. Therefore split it into fib6_init() which is run early and fib6_init_late() to register the rtnetlink message handlers. Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15ipv6: Handle PMTU in ICMP error handlers.David S. Miller
One tricky issue on the ipv6 side vs. ipv4 is that the ICMP callouts to handle the error pass the 32-bit info cookie in network byte order whereas ipv4 passes it around in host byte order. Like the ipv4 side, we have two helper functions. One for when we have a socket context and one for when we do not. ip6ip6 tunnels are not handled here, because they handle PMTU events by essentially relaying another ICMP packet-too-big message back to the original sender. This patch allows us to get rid of rt6_do_pmtu_disc(). It handles all kinds of situations that simply cannot happen when we do the PMTU update directly using a fully resolved route. In fact, the "plen == 128" check in ip6_rt_update_pmtu() can very likely be removed or changed into a BUG_ON() check. We should never have a prefixed ipv6 route when we get there. Another piece of strange history here is that TCP and DCCP, unlike in ipv4, never invoke the update_pmtu() method from their ICMP error handlers. This is incredibly astonishing since this is the context where we have the most accurate context in which to make a PMTU update, namely we have a fully connected socket and associated cached socket route. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Avoid potential NULL peer dereference.David S. Miller
We handle NULL in rt{,6}_set_peer but then our caller will try to pass that NULL pointer into inet_putpeer() which isn't ready for it. Fix this by moving the NULL check one level up, and then remove the now unnecessary NULL check from inetpeer_ptr_set_peer(). Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Use FIB table peer roots in routes.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-11inet: Hide route peer accesses behind helpers.David S. Miller
We encode the pointer(s) into an unsigned long with one state bit. The state bit is used so we can store the inetpeer tree root to use when resolving the peer later. Later the peer roots will be per-FIB table, and this change works to facilitate that. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10inet: Pass inetpeer root into inet_getpeer*() interfaces.David S. Miller
Otherwise we reference potentially non-existing members when ipv6 is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-10ipv6: Do not mark ipv6_inetpeer_ops as __net_initdata.David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Consolidate inetpeer_invalidate_tree() interfaces.David S. Miller
We only need one interface for this operation, since we always know which inetpeer root we want to flush. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Initialize per-netns inetpeer roots in net/ipv{4,6}/route.cDavid S. Miller
Instead of net/ipv4/inetpeer.c Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-09inet: Create and use rt{,6}_get_peer_create().David S. Miller
There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-08inetpeer: add parameter net for inet_getpeer_v4,v6Gao feng
add struct net as a parameter of inet_getpeer_v[4,6], use net to replace &init_net. and modify some places to provide net for inet_getpeer_v[4,6] Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-19ipv6: bool/const conversions phase2Eric Dumazet
Mostly bool conversions, some inline removals and const additions. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16net: ipv6: Standardize prefixes for message loggingJoe Perches
Add #define pr_fmt(fmt) as appropriate. Add "IPv6: " to appropriate files. Convert printk(KERN_<LEVEL> to pr_<level> (but not KERN_DEBUG). Standardize on "%s: " not "%s(): " when emitting __func__. Use "%s: ", __func__ instead of embedding function name. Coalesce formats, align arguments. ADDRCONF output is now prefixed with "IPv6: " Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-15net: Convert net_ratelimit uses to net_<level>_ratelimitedJoe Perches
Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/atheros/atlx/atl1.c drivers/net/ethernet/atheros/atlx/atl1.h Resolved a conflict between a DMA error bug fix and NAPI support changes in the atl1 driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-13ipv6: fix problem with expired dst cacheGao feng
If the ipv6 dst cache which copy from the dst generated by ICMPV6 RA packet. this dst cache will not check expire because it has no RTF_EXPIRES flag. So this dst cache will always be used until the dst gc run. Change the struct dst_entry,add a union contains new pointer from and expires. When rt6_info.rt6i_flags has no RTF_EXPIRES flag,the dst.expires has no use. we can use this field to point to where the dst cache copy from. The dst.from is only used in IPV6. rt6_check_expired check if rt6_info.dst.from is expired. ip6_rt_copy only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. ip6_dst_destroy release the ort. Add some functions to operate the RTF_EXPIRES flag and expires(from) together. and change the code to use these new adding functions. Changes from v5: modify ip6_route_add and ndisc_router_discovery to use new adding functions. Only set dst.from when the ort has flag RTF_ADDRCONF and RTF_DEFAULT.then hold the ort. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04ipv6: Fix 'inet6_rtm_getroute' to release 'rt->dst' in case of 'alloc_skb' ↵Shmulik Ladkani
failure In 72331bc [ipv6: Fix RTM_GETROUTE's interpretation of RTA_IIF to be consistent with ipv4] the code of 'inet6_rtm_getroute()' was re-ordered such that the reference to 'rt->dst' is incremented prior skb allocation. Hence, if 'alloc_skb()' fails, must drop a reference from 'rt->dst'. Add the missing 'dst_release()' call. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>