summaryrefslogtreecommitdiff
path: root/net/core
AgeCommit message (Collapse)Author
2013-08-27net: Check the correct namespace when spoofing pid over SCM_RIGHTSAndy Lutomirski
This is a security bug. The follow-up will fix nsproxy to discourage this type of issue from happening again. Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-15rtnetlink: remove an unneeded testDan Carpenter
We know that "dev" is a valid pointer at this point, so we can remove the test and clean up a little. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-15dev: move skb_scrub_packet() after eth_type_trans()Nicolas Dichtel
skb_scrub_packet() was called before eth_type_trans() to let eth_type_trans() set pkt_type. In fact, we should force pkt_type to PACKET_HOST, so move the call after eth_type_trans(). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-14rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg headerAsbjoern Sloth Toennesen
Fix the iproute2 command `bridge vlan show`, after switching from rtgenmsg to ifinfomsg. Let's start with a little history: Feb 20: Vlad Yasevich got his VLAN-aware bridge patchset included in the 3.9 merge window. In the kernel commit 6cbdceeb, he added attribute support to bridge GETLINK requests sent with rtgenmsg. Mar 6th: Vlad got this iproute2 reference implementation of the bridge vlan netlink interface accepted (iproute2 9eff0e5c) Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca) http://patchwork.ozlabs.org/patch/239602/ http://marc.info/?t=136680900700007 Apr 28th: Linus released 3.9 Apr 30th: Stephen released iproute2 3.9.0 The `bridge vlan show` command haven't been working since the switch to ifinfomsg, or in a released version of iproute2. Since the kernel side only supports rtgenmsg, which iproute2 switched away from just prior to the iproute2 3.9.0 release. I haven't been able to find any documentation, about neither rtgenmsg nor ifinfomsg, and in which situation to use which, but kernel commit 88c5b5ce seams to suggest that ifinfomsg should be used. Fixing this in kernel will break compatibility, but I doubt that anybody have been using it due to this bug in the user space reference implementation, at least not without noticing this bug. That said the functionality is still fully functional in 3.9, when reversing iproute2 commit 63338dca. This could also be fixed in iproute2, but thats an ugly patch that would reintroduce rtgenmsg in iproute2, and from searching in netdev it seams like rtgenmsg usage is discouraged. I'm assuming that the only reason that Vlad implemented the kernel side to use rtgenmsg, was because iproute2 was using it at the time. Signed-off-by: Asbjoern Sloth Toennesen <ast@fiberby.net> Reviewed-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10rtnetlink: Fix inverted check in ndo_dflt_fdb_del()Sridhar Samudrala
Fix inverted check when deleting an fdb entry. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10net: attempt high order allocations in sock_alloc_send_pskb()Eric Dumazet
Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09net: flow_dissector: add 802.1ad supportEric Dumazet
Same behavior than 802.1q : finds the encapsulated protocol and skip 32bit header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-09cgroup: make cgroup_taskset deal with cgroup_subsys_state instead of cgroupTejun Heo
cgroup is in the process of converting to css (cgroup_subsys_state) from cgroup as the principal subsystem interface handle. This is mostly to prepare for the unified hierarchy support where css's will be created and destroyed dynamically but also helps cleaning up subsystem implementations as css is usually what they are interested in anyway. cgroup_taskset which is used by the subsystem attach methods is the last cgroup subsystem API which isn't using css as the handle. Update cgroup_taskset_cur_cgroup() to cgroup_taskset_cur_css() and cgroup_taskset_for_each() to take @skip_css instead of @skip_cgrp. The conversions are pretty mechanical. One exception is cpuset::cgroup_cs(), which lost its last user and got removed. This patch shouldn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-09cgroup: pass around cgroup_subsys_state instead of cgroup in file methodsTejun Heo
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup. Please see the previous commit which converts the subsystem methods for rationale. This patch converts all cftype file operations to take @css instead of @cgroup. cftypes for the cgroup core files don't have their subsytem pointer set. These will automatically use the dummy_css added by the previous patch and can be converted the same way. Most subsystem conversions are straight forwards but there are some interesting ones. * freezer: update_if_frozen() is also converted to take @css instead of @cgroup for consistency. This will make the code look simpler too once iterators are converted to use css. * memory/vmpressure: mem_cgroup_from_css() needs to be exported to vmpressure while mem_cgroup_from_cont() can be made static. Updated accordingly. * cpu: cgroup_tg() doesn't have any user left. Removed. * cpuacct: cgroup_ca() doesn't have any user left. Removed. * hugetlb: hugetlb_cgroup_form_cgroup() doesn't have any user left. Removed. * net_cls: cgrp_cls_state() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-09cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methodsTejun Heo
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations for the following reasons. * With unified hierarchy, subsystems will be dynamically bound and unbound from cgroups and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup, which is different from the current state where all css's are allocated and destroyed together with the associated cgroup. This in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use. * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for. * In majority of the cases, subsystems only care about its part in the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better. This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straight-forward. A few noteworthy changes are * ->css_alloc() now takes css of the parent cgroup rather than the pointer to the new cgroup as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems. * In kernel/cgroup.c::offline_css(), unnecessary open coded css dereference is replaced with local variable access. This patch shouldn't cause any behavior differences. v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that ->css_free() invocation added by da0a12caff ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Steven Rostedt <rostedt@goodmis.org>
2013-08-09netprio_cgroup: pass around @css instead of @cgroup and kill struct ↵Tejun Heo
cgroup_netprio_state cgroup controller API will be converted to primarily use struct cgroup_subsys_state instead of struct cgroup. In preparation, make the internal functions of netprio_cgroup pass around @css instead of @cgrp. While at it, kill struct cgroup_netprio_state which only contained struct cgroup_subsys_state without serving any purpose. All functions are converted to deal with @css directly. This patch shouldn't cause any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David S. Miller <davem@davemloft.net>
2013-08-09cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/Tejun Heo
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
2013-08-07net: use skb_copy_datagram_from_iovec() in zerocopy_sg_from_iovec()Jason Wang
Use skb_copy_datagram_from_iovec() to avoid code duplication and make it easy to be read. Also we can do the skipping inside the zero-copy loop. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: use release_pages() in zerocopy_sg_from_iovec()Jason Wang
To reduce the duplicated codes. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: remove the useless comment in zerocopy_sg_from_iovec()Jason Wang
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: use skb_fill_page_desc() in zerocopy_sg_from_iovec()Jason Wang
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: move zerocopy_sg_from_iovec() to net/core/datagram.cJason Wang
To let it be reused and reduce code duplication. Also document this function. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-07net: move iov_pages() to net/core/iovec.cJason Wang
To let it be reused and reduce code duplication. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-05neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-03fib_rules: fix suppressor names and default valuesStefan Tomanek
This change brings the suppressor attribute names into line; it also changes the data types to provide a more consistent interface. While -1 indicates that the suppressor is not enabled, values >= 0 for suppress_prefixlen or suppress_ifgroup reject routing decisions violating the constraint. This changes the previously presented behaviour of suppress_prefixlen, where a prefix length _less_ than the attribute value was rejected. After this change, a prefix length less than *or* equal to the value is considered a violation of the rule constraint. It also changes the default values for default and newly added rules (disabling any suppression for those). Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02neighbour: populate neigh_parms on alloc before calling ndo_neigh_setupVeaceslav Falico
dev->ndo_neigh_setup() might need some of the values of neigh_parms, so populate them before calling it. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02fib_rules: add route suppression based on ifgroupStefan Tomanek
This change adds the ability to suppress a routing decision based upon the interface group the selected interface belongs to. This allows it to exclude specific devices from a routing decision. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-02net: check net.core.somaxconn sysctl valuesRoman Gushchin
It's possible to assign an invalid value to the net.core.somaxconn sysctl variable, because there is no checks at all. The sk_max_ack_backlog field of the sock structure is defined as unsigned short. Therefore, the backlog argument in inet_listen() shouldn't exceed USHRT_MAX. The backlog argument in the listen() syscall is truncated to the somaxconn value. So, the somaxconn value shouldn't exceed 65535 (USHRT_MAX). Also, negative values of somaxconn are meaningless. before: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 net.core.somaxconn = 65536 $ sysctl -w net.core.somaxconn=-100 net.core.somaxconn = -100 after: $ sysctl -w net.core.somaxconn=256 net.core.somaxconn = 256 $ sysctl -w net.core.somaxconn=65536 error: "Invalid argument" setting key "net.core.somaxconn" $ sysctl -w net.core.somaxconn=-100 error: "Invalid argument" setting key "net.core.somaxconn" Based on a prior patch from Changli Gao. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Reported-by: Changli Gao <xiaosuo@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLLCong Wang
Eliezer renames several *ll_poll to *busy_poll, but forgets CONFIG_NET_LL_RX_POLL, so in case of confusion, rename it too. Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-01fib_rules: add .suppress operationStefan Tomanek
This change adds a new operation to the fib_rules_ops struct; it allows the suppression of routing decisions if certain criteria are not met by its results. The first implemented constraint is a minimum prefix length added to the structures of routing rules. If a rule is added with a minimum prefix length >0, only routes meeting this threshold will be considered. Any other (more general) routing table entries will be ignored. When configuring a system with multiple network uplinks and default routes, it is often convinient to reference the main routing table multiple times - but omitting the default route. Using this patch and a modified "ip" utility, this can be achieved by using the following command sequence: $ ip route add table secuplink default via 10.42.23.1 $ ip rule add pref 100 table main prefixlength 1 $ ip rule add pref 150 fwmark 0xA table secuplink With this setup, packets marked 0xA will be processed by the additional routing table "secuplink", but only if no suitable route in the main routing table can be found. By using a minimal prefixlength of 1, the default route (/0) of the table "main" is hidden to packets processed by rule 100; packets traveling to destinations with more specific routing entries are processed as usual. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31netem: Introduce skb_orphan_partial() helperEric Dumazet
Commit 547669d483e578 ("tcp: xps: fix reordering issues") added unexpected reorders in case netem is used in a MQ setup for high performance test bed. ETH=eth0 tc qd del dev $ETH root 2>/dev/null tc qd add dev $ETH root handle 1: mq for i in `seq 1 32` do tc qd add dev $ETH parent 1:$i netem delay 100ms done As all tcp packets are orphaned by netem, TCP stack believes it can set skb->ooo_okay on all packets. In order to allow producers to send more packets, we want to keep sk_wmem_alloc from reaching sk_sndbuf limit. We can do that by accounting one byte per skb in netem queues, so that TCP stack is not fooled too much. Tested: With above MQ/netem setup, scaling number of concurrent flows gives linear results and no reorders/retransmits lpq83:~# for n in 1 10 20 30 40 50 60 70 80 90 100 do echo -n "n:$n " ; ./super_netperf $n -H 10.7.7.84; done n:1 198.46 n:10 2002.69 n:20 4000.98 n:30 6006.35 n:40 8020.93 n:50 10032.3 n:60 12081.9 n:70 13971.3 n:80 16009.7 n:90 17117.3 n:100 17425.5 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31flow_dissector: add support for IPPROTO_IPV6Tom Herbert
Support IPPROTO_IPV6 similar to IPPROTO_IPIP Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31flow_dissector: clean up IPIP caseTom Herbert
Explicitly set proto to ETH_P_IP and jump directly to ip processing. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31net: export physical port id via sysfsJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Narendra K <narendra_k@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31rtnl: export physical port id via RT netlinkJiri Pirko
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Narendra K <narendra_k@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-31net: add ndo to get id of physical port of the deviceJiri Pirko
This patch adds a ndo for getting physical port of the device. Driver which is aware of being virtual function of some physical port should implement this ndo. This is applicable not only for IOV, but for other solutions (NPAR, multichannel) as well. Basically if there is possible to have multiple netdevs on the single hw port. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-29pktgen: add needed include fileStephen Rothwell
Fixes this on PowerPC (at least): net/core/pktgen.c: In function 'fill_packet_ipv6': net/core/pktgen.c:2906:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration] udph->check = ~csum_ipv6_magic(&iph->saddr, &iph->daddr, udplen, IPPROTO_UDP, 0); ^ Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-28pktgen: Use ip_send_check() to compute checksumThomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-28pktgen: Add UDPCSUM flag to support UDP checksumsThomas Graf
UDP checksums are optional, hence pktgen has been omitting them in favour of performance. The optional flag UDPCSUM enables UDP checksumming. If the output device supports hardware checksumming the skb is prepared and marked CHECKSUM_PARTIAL, otherwise the checksum is generated in software. Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: Eric Dumazet <edumazet@google.com> Cc: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-27net: core: convert class code to use dev_groupsGreg Kroah-Hartman
The dev_attrs field of struct class is going away soon, dev_groups should be used instead. This converts the networking core class code to use the correct field. In order to do this in the "cleanest" way, some of the macros had to be changed to reflect the driver core format of naming show/store functions, which accounts for the majority of the churn in this file. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-07-26neigh: prevent overflowing params in /proc/sys/net/ipv4/neigh/Francesco Fusco
Without this patch, the fields app_solicit, gc_thresh1, gc_thresh2, gc_thresh3, proxy_qlen, ucast_solicit, mcast_solicit could have assumed negative values when setting large numbers. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25net: fix comment above build_skb()Florian Fainelli
build_skb() specifies that the data parameter must come from a kmalloc'd area, this is only true if frag_size equals 0, because then build_skb() will use kzsize(data) to figure out the actual data size. Update the comment to reflect that special condition. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25net: Make devnet_rename_seq staticThomas Gleixner
No users outside net/core/dev.c. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-25net: add sk_stream_is_writeable() helperEric Dumazet
Several call sites use the hardcoded following condition : sk_stream_wspace(sk) >= sk_stream_min_wspace(sk) Lets use a helper because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-22net: Provide a generic socket error queue delivery method for Tx time stamps.Richard Cochran
This patch moves the private error queue delivery function from the af_packet code to the core socket method. In this way, network layers only needing the error queue for transmit time stamping can share common code. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "A couple interesting SKB fragment handling fixes, plus the usual small bits here and there: 1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from Tim Gardner. 2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC address printing helper. 3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the device can't do checksumming offloads then it shouldn't say it can do SG either. From Haiyang Zhang. 4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens. 5) Don't leak DMA mappings on mapping failures, from Neil Horman. 6) We need to reset the transport header of SKBs in ipv4 before we attempt to perform early socket demux, just like ipv6 does. From Eric Dumazet. 7) Add missing locking on vxlan device removal, from Stephen Hemminger. 8) xen-netfront has to make two passes over an SKB to prepare it for transfer. One pass calculates the number of slots needed, the second massages the SKB and fills the slots. Unfortunately, the first pass doesn't calculate the number of slots properly so we can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't work out so well. Fix from Jan Beulich with help and discussion with several others. 9) Fix a similar problem in tun and macvtap, which have to split up scatter-gather elements at PAGE_SIZE boundaries. Don't do zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from Jason Wang. 10) On receive, once we've decoded the VLAN state completely, clear skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger the VLAN code again, corrupting the packet. Fix from Eric Dumazet" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: vlan: fix a race in egress prio management vlan: mask vlan prio bits macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS pkt_sched: sch_qfq: remove a source of high packet delay/jitter xen-netfront: pull on receive skb may need to happen earlier vxlan: add necessary locking on device removal hyperv: Fix the NETIF_F_SG flag setting in netvsc net: Fix sysfs_format_mac() code duplication. be2net: Fix to avoid hardware workaround when not needed macvtap: do not assume 802.1Q when send vlan packets macvtap: fix the missing ret value of TUNSETQUEUE ipv4: set transport header earlier mlx5 core: Fix __udivdi3 when compiling for 32 bit arches bgmac: add dependency to phylib net/irda: fixed style issues in irlan_eth ethtool: fixed trailing statements in ethtool ndisc: bool initializations should use true and false atl1e: unmap partially mapped skb on dma error and free skb
2013-07-18vlan: mask vlan prio bitsEric Dumazet
In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e ("vlan: don't deliver frames for unknown vlans to protocols") Florian made sure we set pkt_type to PACKET_OTHERHOST if the vlan id is set and we could find a vlan device for this particular id. But we also have a problem if prio bits are set. Steinar reported an issue on a router receiving IPv6 frames with a vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device, because skb->vlan_tci is set. Forwarded frame is completely corrupted : We can see (8100:4000) being inserted in the middle of IPv6 source address : 16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 > 9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64 0x0000: 0000 0029 8000 c7c3 7103 0001 a0ae e651 0x0010: 0000 0000 ccce 0b00 0000 0000 1011 1213 0x0020: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 0x0030: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 It seems we are not really ready to properly cope with this right now. We can probably do better in future kernels : vlan_get_ingress_priority() should be a netdev property instead of a per vlan_dev one. For stable kernels, lets clear vlan_tci to fix the bugs. Reported-by: Steinar H. Gunderson <sesse@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-16ethtool: fixed trailing statements in ethtoolDragos Foianu
Applied fixes suggested by checkpatch.pl. Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-14net: delete __cpuinit usage from all net filesPaul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the net/* uses of the __cpuinit macros from all C files. [1] https://lkml.org/lkml/2013/5/20/589 Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2013-07-12net: access page->private by using page_privateSunghan Suh
Signed-off-by: Sunghan Suh <sunghan.suh@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11gso: Update tunnel segmentation to support Tx checksum offloadAlexander Duyck
This change makes it so that the GRE and VXLAN tunnels can make use of Tx checksum offload support provided by some drivers via the hw_enc_features. Without this fix enabling GSO means sacrificing Tx checksum offload and this actually leads to a performance regression as shown below: Utilization Send Throughput local GSO 10^6bits/s % S state 6276.51 8.39 enabled 7123.52 8.42 disabled To resolve this it was necessary to address two items. First netif_skb_features needed to be updated so that it would correctly handle the Trans Ether Bridging protocol without impacting the need to check for Q-in-Q tagging. To do this it was necessary to update harmonize_features so that it used skb_network_protocol instead of just using the outer protocol. Second it was necessary to update the GRE and UDP tunnel segmentation offloads so that they would reset the encapsulation bit and inner header offsets after the offload was complete. As a result of this change I have seen the following results on a interface with Tx checksum enabled for encapsulated frames: Utilization Send Throughput local GSO 10^6bits/s % S state 7123.52 8.42 disabled 8321.75 5.43 enabled v2: Instead of replacing refrence to skb->protocol with skb_network_protocol just replace the protocol reference in harmonize_features to allow for double VLAN tag checks. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11net: rename busy poll socket op and globalsEliezer Tamir
Rename LL_SO to BUSY_POLL_SO Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll} Fix up users of these variables. Fix documentation for sysctl. a patch for the socket.7 man page will follow separately, because of limitations of my mail setup. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-07-11net: rename include/net/ll_poll.h to include/net/busy_poll.hEliezer Tamir
Rename the file and correct all the places where it is included. Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>