summaryrefslogtreecommitdiff
path: root/drivers/net/tun.c
AgeCommit message (Collapse)Author
2014-12-09tun: move internal flag defines out of uapiMichael S. Tsirkin
TUN_ flags are internal and never exposed to userspace. Any application using it is almost certainly buggy. Move them out to tun.c. Note: we remove these completely in follow-up patches, this code movement is split out for ease of review. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2014-12-09Merge branch 'iov_iter' into for-nextAl Viro
2014-12-06tun/macvtap: use consume_skb() instead of kfree_skb() when neededJason Wang
To be more friendly with drop monitor, we should only call kfree_skb() when the packets were dropped and use consume_skb() in other cases. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-03tun: Fix GSO meta-data handling in tun_get_userHerbert Xu
When we write the GSO meta-data in tun_get_user we end up advancing the IO vector twice, thus exhausting the user buffer before we can finish writing the packet. Fixes: f5ff53b4d97c ("{macvtap,tun}_get_user(): switch to iov_iter") Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-24{macvtap,tun}_get_user(): switch to iov_iterAl Viro
allows to switch macvtap and tun from ->aio_write() to ->write_iter() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-24switch drivers/net/tun.c to ->read_iter()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19tun: return NET_XMIT_DROP for dropped packetsJason Wang
After commit 5d097109257c03a71845729f8db6b5770c4bbedc ("tun: only queue packets on device"), NETDEV_TX_OK was returned for dropped packets. This will confuse pktgen since dropped packets were counted as sent ones. Fixing this by returning NET_XMIT_DROP to let pktgen count it as error packet. Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-13tun: fix issues of iovec iterators using in tun_put_user()Jason Wang
This patch fixes two issues after using iovec iterators: - vlan_offset should be initialized to zero, otherwise unexpected offset will be used in skb_copy_datagram_iter() - advance iovec iterator when vnet_hdr_sz is greater than sizeof(gso), this is the case when mergeable rx buffer were enabled for a virt guest. Fixes e0b46d0ee9c240c7430a47e9b0365674d4a04522 ("tun: Use iovec iterators") Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-07tun: Use iovec iteratorsHerbert Xu
This patch removes the use of skb_copy_datagram_const_iovec in favour of the iovec iterator-based skb_copy_datagram_iter. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-05fs: Convert show_fdinfo functions to voidJoe Perches
seq_printf functions shouldn't really check the return value. Checking seq_has_overflowed() occasionally is used instead. Update vfs documentation. Link: http://lkml.kernel.org/p/e37e6e7b76acbdcc3bb4ab2a57c8f8ca1ae11b9a.1412031505.git.joe@perches.com Cc: David S. Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Joe Perches <joe@perches.com> [ did a few clean ups ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-03tun: Fix TUN_PKT_STRIP settingHerbert Xu
We set the flag TUN_PKT_STRIP if the user buffer provided is too small to contain the entire packet plus meta-data. However, this has been broken ever since we added GSO meta-data. VLAN acceleration also has the same problem. This patch fixes this by taking both into account when setting the TUN_PKT_STRIP flag. The fact that this has been broken for six years without anyone realising means that nobody actually uses this flag. Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-11-03tun: Fix csum_start with VLAN accelerationHerbert Xu
When VLAN acceleration is in use on the xmit path, we end up setting csum_start to the wrong place. The result is that the whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location, usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN acceleration is detected. Fixes: 6680ec68eff4 ("tuntap: hardware vlan tx support") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packetsBen Hutchings
UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-31drivers/net: Disable UFO through virtioBen Hutchings
IPv6 does not allow fragmentation by routers, so there is no fragmentation ID in the fixed header. UFO for IPv6 requires the ID to be passed separately, but there is no provision for this in the virtio net protocol. Until recently our software implementation of UFO/IPv6 generated a new ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet passed through a tap, which is even worse. Unfortunately there is no distinction between UFO/IPv4 and v6 features, so disable UFO on taps and virtio_net completely until we have a proper solution. We cannot depend on VM managers respecting the tap feature flags, so keep accepting UFO packets but log a warning the first time we do this. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09security: make security_file_set_fowner, f_setown and __f_setown void returnJeff Layton
security_file_set_fowner always returns 0, so make it f_setown and __f_setown void return functions and fix up the error handling in the callers. Cc: linux-security-module@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2014-07-15net: set name_assign_type in alloc_netdev()Tom Gundersen
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-21net-tun: restructure tun_do_read for better sleep/wakeup efficiencyXi Wang
tun_do_read always adds current thread to wait queue, even if a packet is ready to read. This is inefficient because both sleeper and waker want to acquire the wait queue spin lock when packet rate is high. We restructure the read function and use common kernel networking routines to handle receive, sleep and wakeup. With the change available packets are checked first before the reading thread is added to the wait queue. Ran performance tests with the following configuration: - my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer - sender pinned to one core and receiver pinned to another core - sender send small UDP packets (64 bytes total) as fast as it can - sandy bridge cores - throughput are receiver side goodput numbers The results are baseline: 731k pkts/sec, cpu utilization at 1.50 cpus changed: 783k pkts/sec, cpu utilization at 1.53 cpus The performance difference is largely determined by packet rate and inter-cpu communication cost. For example, if the sender and receiver are pinned to different cpu sockets, the results are baseline: 558k pkts/sec, cpu utilization at 1.71 cpus changed: 690k pkts/sec, cpu utilization at 1.67 cpus Co-authored-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Xi Wang <xii@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-27drivers/net: Use RCU_INIT_POINTER(x, NULL) in tun.cMonam Agarwal
This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL) The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL) Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-20tun: remove bogus hardware vlan acceleration flags from vlan_featuresFernando Luis Vazquez Cao
Even though only the outer vlan tag can be HW accelerated in the transmission path, in the TUN/TAP driver vlan_features mirrors hw_features, which happens to have the NETIF_F_HW_VLAN_?TAG_TX flags set. Because of this, during packet tranmisssion through a stacked vlan device dev_hard_start_xmit, (incorrectly) assuming that the vlan device supports hardware vlan acceleration, does not add the vlan header to the skb payload and the inner vlan tags are lost (vlan_tci contains the outer vlan tag when userspace reads the packet from the tap device). Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17netdevice: add queue selection fallback handler for ndo_select_queueDaniel Borkmann
Add a new argument for ndo_select_queue() callback that passes a fallback handler. This gets invoked through netdev_pick_tx(); fallback handler is currently __netdev_pick_tx() as most drivers invoke this function within their customized implementation in case for skbs that don't need any special handling. This fallback handler can then be replaced on other call-sites with different queue selection methods (e.g. in packet sockets, pktgen etc). This also has the nice side-effect that __netdev_pick_tx() is then only invoked from netdev_pick_tx() and export of that function to modules can be undone. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-29tun: add device name(iff) field to proc fdinfo entryMasatake YAMATO
A file descriptor opened for /dev/net/tun and a tun device are connected with ioctl. Though understanding the connection is important for trouble shooting, no way is given to a user to know the connected device for a given file descriptor at userland. This patch adds a new fdinfo field for the device name connected to a file descriptor opened for /dev/net/tun. Here is an example of the field: # lsof | grep tun qemu-syst 4565 qemu 25u CHR 10,200 0t138 12921 /dev/net/tun ... # cat /proc/4565/fdinfo/25 pos: 138 flags: 0104002 iff: vnet0 # ip link show dev vnet0 8: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ... changelog: v2: indent iff just like the other fdinfo fields are. v3: remove unused variable. Both are suggested by David Miller <davem@davemloft.net>. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-23tuntap: Fix for a race in accessing numqueuesDominic Curran
A patch for fixing a race between queue selection and changing queues was introduced in commit 92bb73ea2("tuntap: fix a possible race between queue selection and changing queues"). The fix was to prevent the driver from re-reading the tun->numqueues more than once within tun_select_queue() using ACCESS_ONCE(). We have been experiancing 'Divide-by-zero' errors in tun_net_xmit() since we moved from 3.6 to 3.10, and believe that they come from a simular source where the value of tun->numqueues changes to zero between the first and a subsequent read of tun->numqueues. The fix is a simular use of ACCESS_ONCE(), as well as a multiply instead of a divide in the if statement. Signed-off-by: Dominic Curran <dominic.curran@citrix.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Max Krasnyansky <maxk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2014-01-10net: core: explicitly select a txq before doing l2 forwardingJason Wang
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-02tun, rfs: fix the incorrect hash valueZhi Yong Wu
The code incorrectly save the queue index as the hash, so this patch is fixing it with the hash received in the stack receive path. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-31tun: Add support for RFS on tun flowsTom Herbert
This patch adds support so that the rps_flow_tables (RFS) can be programmed using the tun flows which are already set up to track flows for the purposes of queue selection. On the receive path (corresponding to select_queue and tun_net_xmit) the rxhash is saved in the flow_entry. The original code only does flow lookup in select_queue, so this patch adds a flow lookup in tun_net_xmit if num_queues == 1 (select_queue is not called from dev_queue_xmit->netdev_pick_tx in that case). The flow is recorded (processing CPU) in tun_flow_update (TX path), and reset when flow is deleted. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/i40e/i40e_main.c drivers/net/macvtap.c Both minor merge hassles, simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17net: Change skb_get_rxhash to skb_get_hashTom Herbert
Changing name of function as part of making the hash in skbuff to be generic property, not just for receive path. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tun: unbreak truncated packet signallingJason Wang
Commit 6680ec68eff47d36f67b4351bc9836fd6cba9532 (tuntap: hardware vlan tx support) breaks the truncated packet signal by nev return a length greater than iov length in tun_put_user(). This patch fixes by always return the length of packet plus possible vlan header. Caller can detect the truncated packet by comparing the return value and the size of io length. Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11net: Revert macvtap/tun truncation signalling changes.David S. Miller
Jason Wang and Michael S. Tsirkin are still discussing how to properly fix this. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11tun: unbreak truncated packet signallingJason Wang
Commit 6680ec68eff47d36f67b4351bc9836fd6cba9532 (tuntap: hardware vlan tx support) breaks the truncated packet signal by never return a length greater than iov length in tun_put_user(). This patch fixes this by always return the length of packet plus possible vlan header. Caller can detect the truncated packet by comparing the return value and the size of iov length. Reported-by: Vlad Yasevich <vyasevich@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-11Revert "tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()"David S. Miller
This reverts commit 73713357ab58aacda1af715bb5a623528dbbfd79. MSG_TRUNC handling was broken and is going to be fixed in the 'net' tree, so revert this. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10tun: remove useless codes in tun_chr_aio_read() and tun_recvmsg()Zhi Yong Wu
By checking related codes, it is impossible that ret > len or total_len, so we should remove some useless codes in both above functions. Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Merge 'net' into 'net-next' to get the AF_PACKET bug fix that Daniel's direct transmit changes depend upon. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06tun: remove unused parameter in tun_do_read()Zhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06tun: spelling fixesstephen hemminger
Fix spelling errors in tun driver. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-06tun: update file current positionZhi Yong Wu
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-14tuntap: limit head length of skb allocatedJason Wang
We currently use hdr_len as a hint of head length which is advertised by guest. But when guest advertise a very big value, it can lead to an 64K+ allocating of kmalloc() which has a very high possibility of failure when host memory is fragmented or under heavy stress. The huge hdr_len also reduce the effect of zerocopy or even disable if a gso skb is linearized in guest. To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the head, which guarantees an order 0 allocation each time. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-08tun: don't look at current when non-blockingMichael S. Tsirkin
We play with a wait queue even if socket is non blocking. This is an obvious waste. Besides, it will prevent calling the non blocking variant when current is not valid. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-12tuntap: correctly handle error in tun_set_iff()Jason Wang
Commit c8d68e6be1c3b242f1c598595830890b65cea64a (tuntap: multiqueue support) only call free_netdev() on error in tun_set_iff(). This causes several issues: - memory of tun security were leaked - use after free since the flow gc timer was not deleted and the tfile were not detached This patch solves the above issues. Reported-by: Wannes Rombouts <wannes.rombouts@epitech.eu> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05tuntap: orphan frags before trying to set tx timestampJason Wang
sock_tx_timestamp() will clear all zerocopy flags of skb which may lead the frags never to be orphaned. This will break guest to guest traffic when zerocopy is enabled. Fix this by orphaning the frags before trying to set tx time stamp. The issue were introduced by commit eda297729171fe16bf34fe5b0419dfb69060f623 (tun: Support software transmit time stamping). Cc: Richard Cochran <richardcochran@gmail.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-05tuntap: purge socket error queue on detachJason Wang
Commit eda297729171fe16bf34fe5b0419dfb69060f623 (tun: Support software transmit time stamping) will queue skbs into error queue when tx stamping is enabled. But it forgets to purge the error queue during detach. This patch fixes this. Cc: Richard Cochran <richardcochran@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21tun: Get skfilter layoutPavel Emelyanov
The only thing we may have from tun device is the fprog, whic contains the number of filter elements and a pointer to (user-space) memory where the elements are. The program itself may not be available if the device is persistent and detached. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21tun: Allow to skip filter on attachPavel Emelyanov
There's a small problem with sk-filters on tun devices. Consider an application doing this sequence of steps: fd = open("/dev/net/tun"); ioctl(fd, TUNSETIFF, { .ifr_name = "tun0" }); ioctl(fd, TUNATTACHFILTER, &my_filter); ioctl(fd, TUNSETPERSIST, 1); close(fd); At that point the tun0 will remain in the system and will keep in mind that there should be a socket filter at address '&my_filter'. If after that we do fd = open("/dev/net/tun"); ioctl(fd, TUNSETIFF, { .ifr_name = "tun0" }); we most likely receive the -EFAULT error, since tun_attach() would try to connect the filter back. But (!) if we provide a filter at address &my_filter, then tun0 will be created and the "new" filter would be attached, but application may not know about that. This may create certain problems to anyone using tun-s, but it's critical problem for c/r -- if we meet a persistent tun device with a filter in mind, we will not be able to attach to it to dump its state (flags, owner, address, vnethdr size, etc.). The proposal is to allow to attach to tun device (with TUNSETIFF) w/o attaching the filter to the tun-file's socket. After this attach app may e.g clean the device by dropping the filter, it doesn't want to have one, or (in case of c/r) get information about the device with tun ioctls. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21tun: Report whether the queue is attached or notPavel Emelyanov
Multiqueue tun devices allow to attach and detach from its queues while keeping the interface itself set on file. Knowing this is critical for the checkpoint part of criu project. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-21tun: Add ability to create tun device with given indexPavel Emelyanov
Tun devices cannot be created with ifidex user wants, but it's required by checkpoint-restore project. Long time ago such ability was implemented for rtnl_ops-based interface for creating links (9c7dafbf net: Allow to create links with given ifindex), but the only API for creating and managing tuntap devices is ioctl-based and is evolving with adding new ones (cde8b15f tuntap: add ioctl to attach or detach a file form tuntap device). Following that trend, here's how a new ioctl that sets the ifindex for device, that _will_ be created by TUNSETIFF ioctl looks like. So those who want a tuntap device with the ifindex N, should open the tun device, call ioctl(fd, TUNSETIFINDEX, &N), then call TUNSETIFF. If the index N is busy, then the register_netdev will find this out and the ioctl would be failed with -EBUSY. If setifindex is not called, then it will be generated as before. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2013-08-15tun: signedness bug in tun_get_user()Dan Carpenter
The recent fix d9bf5f1309 "tun: compare with 0 instead of total_len" is not totally correct. Because "len" and "sizeof()" are size_t type, that means they are never less than zero. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-14tun: compare with 0 instead of total_lenWeiping Pan
Since we set "len = total_len" in the beginning of tun_get_user(), so we should compare the new len with 0, instead of total_len, or the if statement always returns false. Signed-off-by: Weiping Pan <wpan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-10net: attempt high order allocations in sock_alloc_send_pskb()Eric Dumazet
Adding paged frags skbs to af_unix sockets introduced a performance regression on large sends because of additional page allocations, even if each skb could carry at least 100% more payload than before. We can instruct sock_alloc_send_pskb() to attempt high order allocations. Most of the time, it does a single page allocation instead of 8. I added an additional parameter to sock_alloc_send_pskb() to let other users to opt-in for this new feature on followup patches. Tested: Before patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 46861.15 After patch : $ netperf -t STREAM_STREAM STREAM STREAM TEST Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 2304 212992 212992 10.00 57981.11 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>