summaryrefslogtreecommitdiff
path: root/Documentation/networking
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/networking')
-rw-r--r--Documentation/networking/.gitignore1
-rw-r--r--Documentation/networking/00-INDEX6
-rw-r--r--Documentation/networking/Makefile5
-rw-r--r--Documentation/networking/arcnet.txt7
-rw-r--r--Documentation/networking/batman-adv.txt54
-rw-r--r--Documentation/networking/bonding.txt160
-rw-r--r--Documentation/networking/can.txt217
-rw-r--r--Documentation/networking/dccp.txt4
-rw-r--r--Documentation/networking/e100.txt6
-rw-r--r--Documentation/networking/e1000.txt12
-rw-r--r--Documentation/networking/e1000e.txt16
-rw-r--r--Documentation/networking/i40e.txt115
-rw-r--r--Documentation/networking/ieee802154.txt6
-rw-r--r--Documentation/networking/ifenslave.c1105
-rw-r--r--Documentation/networking/igb.txt67
-rw-r--r--Documentation/networking/igbvf.txt8
-rw-r--r--Documentation/networking/ip-sysctl.txt86
-rw-r--r--Documentation/networking/ipvs-sysctl.txt13
-rw-r--r--Documentation/networking/ixgb.txt14
-rw-r--r--Documentation/networking/ixgbe.txt109
-rw-r--r--Documentation/networking/ixgbevf.txt6
-rw-r--r--Documentation/networking/l2tp.txt2
-rw-r--r--Documentation/networking/netdev-FAQ.txt224
-rw-r--r--Documentation/networking/netdevices.txt10
-rw-r--r--Documentation/networking/netlink_mmap.txt24
-rw-r--r--Documentation/networking/openvswitch.txt40
-rw-r--r--Documentation/networking/operstates.txt4
-rw-r--r--Documentation/networking/packet_mmap.txt141
-rw-r--r--Documentation/networking/rxrpc.txt2
-rw-r--r--Documentation/networking/scaling.txt58
-rw-r--r--Documentation/networking/sctp.txt5
-rw-r--r--Documentation/networking/stmmac.txt9
-rw-r--r--Documentation/networking/tproxy.txt5
-rw-r--r--Documentation/networking/vortex.txt6
-rw-r--r--Documentation/networking/x25-iface.txt2
35 files changed, 1102 insertions, 1447 deletions
diff --git a/Documentation/networking/.gitignore b/Documentation/networking/.gitignore
index 286a568..e69de29 100644
--- a/Documentation/networking/.gitignore
+++ b/Documentation/networking/.gitignore
@@ -1 +0,0 @@
-ifenslave
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 258d9b9..f11580f 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -86,10 +86,10 @@ generic_netlink.txt
- info on Generic Netlink
gianfar.txt
- Gianfar Ethernet Driver.
+i40e.txt
+ - README for the Intel Ethernet Controller XL710 Driver (i40e).
ieee802154.txt
- Linux IEEE 802.15.4 implementation, API and drivers
-ifenslave.c
- - Configure network interfaces for parallel routing (bonding).
igb.txt
- README for the Intel Gigabit Ethernet Driver (igb).
igbvf.txt
@@ -126,6 +126,8 @@ multiqueue.txt
- HOWTO for multiqueue network device support.
netconsole.txt
- The network console module netconsole.ko: configuration and notes.
+netdev-FAQ.txt
+ - FAQ describing how to submit net changes to netdev mailing list.
netdev-features.txt
- Network interface features API description.
netdevices.txt
diff --git a/Documentation/networking/Makefile b/Documentation/networking/Makefile
index 24c308d..0aa1ac9 100644
--- a/Documentation/networking/Makefile
+++ b/Documentation/networking/Makefile
@@ -1,11 +1,6 @@
# kbuild trick to avoid linker error. Can be omitted if a module is built.
obj- := dummy.o
-# List of programs to build
-hostprogs-y := ifenslave
-
-HOSTCFLAGS_ifenslave.o += -I$(objtree)/usr/include
-
# Tell kbuild to always build the programs
always := $(hostprogs-y)
diff --git a/Documentation/networking/arcnet.txt b/Documentation/networking/arcnet.txt
index 9ff5795..aff97f4 100644
--- a/Documentation/networking/arcnet.txt
+++ b/Documentation/networking/arcnet.txt
@@ -70,9 +70,10 @@ list, mail to linux-arcnet@tichy.ch.uj.edu.pl.
There are archives of the mailing list at:
http://epistolary.org/mailman/listinfo.cgi/arcnet
-The people on linux-net@vger.kernel.org have also been known to be very
-helpful, especially when we're talking about ALPHA Linux kernels that may or
-may not work right in the first place.
+The people on linux-net@vger.kernel.org (now defunct, replaced by
+netdev@vger.kernel.org) have also been known to be very helpful, especially
+when we're talking about ALPHA Linux kernels that may or may not work right
+in the first place.
Other Drivers and Info
diff --git a/Documentation/networking/batman-adv.txt b/Documentation/networking/batman-adv.txt
index c1d8204..89490beb 100644
--- a/Documentation/networking/batman-adv.txt
+++ b/Documentation/networking/batman-adv.txt
@@ -69,8 +69,7 @@ folder:
# aggregated_ogms gw_bandwidth log_level
# ap_isolation gw_mode orig_interval
# bonding gw_sel_class routing_algo
-# bridge_loop_avoidance hop_penalty vis_mode
-# fragmentation
+# bridge_loop_avoidance hop_penalty fragmentation
There is a special folder for debugging information:
@@ -78,7 +77,7 @@ There is a special folder for debugging information:
# ls /sys/kernel/debug/batman_adv/bat0/
# bla_backbone_table log transtable_global
# bla_claim_table originators transtable_local
-# gateways socket vis_data
+# gateways socket
Some of the files contain all sort of status information regard-
ing the mesh network. For example, you can view the table of
@@ -127,51 +126,6 @@ ously assigned to interfaces now used by batman advanced, e.g.
# ifconfig eth0 0.0.0.0
-VISUALIZATION
--------------
-
-If you want topology visualization, at least one mesh node must
-be configured as VIS-server:
-
-# echo "server" > /sys/class/net/bat0/mesh/vis_mode
-
-Each node is either configured as "server" or as "client" (de-
-fault: "client"). Clients send their topology data to the server
-next to them, and server synchronize with other servers. If there
-is no server configured (default) within the mesh, no topology
-information will be transmitted. With these "synchronizing
-servers", there can be 1 or more vis servers sharing the same (or
-at least very similar) data.
-
-When configured as server, you can get a topology snapshot of
-your mesh:
-
-# cat /sys/kernel/debug/batman_adv/bat0/vis_data
-
-This raw output is intended to be easily parsable and convertable
-with other tools. Have a look at the batctl README if you want a
-vis output in dot or json format for instance and how those out-
-puts could then be visualised in an image.
-
-The raw format consists of comma separated values per entry where
-each entry is giving information about a certain source inter-
-face. Each entry can/has to have the following values:
--> "mac" - mac address of an originator's source interface
- (each line begins with it)
--> "TQ mac value" - src mac's link quality towards mac address
- of a neighbor originator's interface which
- is being used for routing
--> "TT mac" - TT announced by source mac
--> "PRIMARY" - this is a primary interface
--> "SEC mac" - secondary mac address of source
- (requires preceding PRIMARY)
-
-The TQ value has a range from 4 to 255 with 255 being the best.
-The TT entries are showing which hosts are connected to the mesh
-via bat0 or being bridged into the mesh network. The PRIMARY/SEC
-values are only applied on primary interfaces
-
-
LOGGING/DEBUGGING
-----------------
@@ -245,5 +199,5 @@ Mailing-list: b.a.t.m.a.n@open-mesh.org (optional subscription
You can also contact the Authors:
-Marek Lindner <lindner_marek@yahoo.de>
-Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
+Marek Lindner <mareklindner@neomailbox.ch>
+Simon Wunderlich <sw@simonwunderlich.de>
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 10a015c..2cdb8b6 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -104,8 +104,7 @@ Table of Contents
==============================
Most popular distro kernels ship with the bonding driver
-already available as a module and the ifenslave user level control
-program installed and ready for use. If your distro does not, or you
+already available as a module. If your distro does not, or you
have need to compile bonding from source (e.g., configuring and
installing a mainline kernel from kernel.org), you'll need to perform
the following steps:
@@ -124,46 +123,13 @@ device support" section. It is recommended that you configure the
driver as module since it is currently the only way to pass parameters
to the driver or configure more than one bonding device.
- Build and install the new kernel and modules, then continue
-below to install ifenslave.
+ Build and install the new kernel and modules.
-1.2 Install ifenslave Control Utility
+1.2 Bonding Control Utility
-------------------------------------
- The ifenslave user level control program is included in the
-kernel source tree, in the file Documentation/networking/ifenslave.c.
-It is generally recommended that you use the ifenslave that
-corresponds to the kernel that you are using (either from the same
-source tree or supplied with the distro), however, ifenslave
-executables from older kernels should function (but features newer
-than the ifenslave release are not supported). Running an ifenslave
-that is newer than the kernel is not supported, and may or may not
-work.
-
- To install ifenslave, do the following:
-
-# gcc -Wall -O -I/usr/src/linux/include ifenslave.c -o ifenslave
-# cp ifenslave /sbin/ifenslave
-
- If your kernel source is not in "/usr/src/linux," then replace
-"/usr/src/linux/include" in the above with the location of your kernel
-source include directory.
-
- You may wish to back up any existing /sbin/ifenslave, or, for
-testing or informal use, tag the ifenslave to the kernel version
-(e.g., name the ifenslave executable /sbin/ifenslave-2.6.10).
-
-IMPORTANT NOTE:
-
- If you omit the "-I" or specify an incorrect directory, you
-may end up with an ifenslave that is incompatible with the kernel
-you're trying to build it for. Some distros (e.g., Red Hat from 7.1
-onwards) do not have /usr/include/linux symbolically linked to the
-default kernel source include directory.
-
-SECOND IMPORTANT NOTE:
- If you plan to configure bonding using sysfs or using the
-/etc/network/interfaces file, you do not need to use ifenslave.
+ It is recommended to configure bonding via iproute2 (netlink)
+or sysfs, the old ifenslave control utility is obsolete.
2. Bonding Driver Options
=========================
@@ -337,6 +303,12 @@ arp_validate
such a situation, validation of backup slaves must be
disabled.
+ The validation of ARP requests on backup slaves is mainly
+ helping bonding to decide which slaves are more likely to
+ work in case of the active slave failure, it doesn't really
+ guarantee that the backup slave will work if it's selected
+ as the next active slave.
+
This option is useful in network configurations in which
multiple bonding hosts are concurrently issuing ARPs to one or
more targets beyond a common switch. Should the link between
@@ -349,6 +321,25 @@ arp_validate
This option was added in bonding version 3.1.0.
+arp_all_targets
+
+ Specifies the quantity of arp_ip_targets that must be reachable
+ in order for the ARP monitor to consider a slave as being up.
+ This option affects only active-backup mode for slaves with
+ arp_validation enabled.
+
+ Possible values are:
+
+ any or 0
+
+ consider the slave up only when any of the arp_ip_targets
+ is reachable
+
+ all or 1
+
+ consider the slave up only when all of the arp_ip_targets
+ are reachable
+
downdelay
Specifies the time, in milliseconds, to wait before disabling
@@ -648,6 +639,15 @@ num_unsol_na
are generated by the ipv4 and ipv6 code and the numbers of
repetitions cannot be set independently.
+packets_per_slave
+
+ Specify the number of packets to transmit through a slave before
+ moving to the next one. When set to 0 then a slave is chosen at
+ random.
+
+ The valid range is 0 - 65535; the default value is 1. This option
+ has effect only in balance-rr mode.
+
primary
A string (eth0, eth2, etc) specifying which slave is the
@@ -752,21 +752,16 @@ xmit_hash_policy
protocol information to generate the hash.
Uses XOR of hardware MAC addresses and IP addresses to
- generate the hash. The IPv4 formula is
+ generate the hash. The formula is
- (((source IP XOR dest IP) AND 0xffff) XOR
- ( source MAC XOR destination MAC ))
- modulo slave count
+ hash = source MAC XOR destination MAC
+ hash = hash XOR source IP XOR destination IP
+ hash = hash XOR (hash RSHIFT 16)
+ hash = hash XOR (hash RSHIFT 8)
+ And then hash is reduced modulo slave count.
- The IPv6 formula is
-
- hash = (source ip quad 2 XOR dest IP quad 2) XOR
- (source ip quad 3 XOR dest IP quad 3) XOR
- (source ip quad 4 XOR dest IP quad 4)
-
- (((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
- XOR (source MAC XOR destination MAC))
- modulo slave count
+ If the protocol is IPv6 then the source and destination
+ addresses are first hashed using ipv6_addr_hash.
This algorithm will place all traffic to a particular
network peer on the same slave. For non-IP traffic,
@@ -788,21 +783,16 @@ xmit_hash_policy
slaves, although a single connection will not span
multiple slaves.
- The formula for unfragmented IPv4 TCP and UDP packets is
-
- ((source port XOR dest port) XOR
- ((source IP XOR dest IP) AND 0xffff)
- modulo slave count
+ The formula for unfragmented TCP and UDP packets is
- The formula for unfragmented IPv6 TCP and UDP packets is
+ hash = source port, destination port (as in the header)
+ hash = hash XOR source IP XOR destination IP
+ hash = hash XOR (hash RSHIFT 16)
+ hash = hash XOR (hash RSHIFT 8)
+ And then hash is reduced modulo slave count.
- hash = (source port XOR dest port) XOR
- ((source ip quad 2 XOR dest IP quad 2) XOR
- (source ip quad 3 XOR dest IP quad 3) XOR
- (source ip quad 4 XOR dest IP quad 4))
-
- ((hash >> 24) XOR (hash >> 16) XOR (hash >> 8) XOR hash)
- modulo slave count
+ If the protocol is IPv6 then the source and destination
+ addresses are first hashed using ipv6_addr_hash.
For fragmented TCP or UDP packets and all other IPv4 and
IPv6 protocol traffic, the source and destination port
@@ -810,10 +800,6 @@ xmit_hash_policy
formula is the same as for the layer2 transmit hash
policy.
- The IPv4 policy is intended to mimic the behavior of
- certain switches, notably Cisco switches with PFC2 as
- well as some Foundry and IBM products.
-
This algorithm is not fully 802.3ad compliant. A
single TCP or UDP conversation containing both
fragmented and unfragmented packets will see packets
@@ -824,6 +810,26 @@ xmit_hash_policy
conversations. Other implementations of 802.3ad may
or may not tolerate this noncompliance.
+ encap2+3
+
+ This policy uses the same formula as layer2+3 but it
+ relies on skb_flow_dissect to obtain the header fields
+ which might result in the use of inner headers if an
+ encapsulation protocol is used. For example this will
+ improve the performance for tunnel users because the
+ packets will be distributed according to the encapsulated
+ flows.
+
+ encap3+4
+
+ This policy uses the same formula as layer3+4 but it
+ relies on skb_flow_dissect to obtain the header fields
+ which might result in the use of inner headers if an
+ encapsulation protocol is used. For example this will
+ improve the performance for tunnel users because the
+ packets will be distributed according to the encapsulated
+ flows.
+
The default value is layer2. This option was added in bonding
version 2.6.3. In earlier versions of bonding, this parameter
does not exist, and the layer2 policy is the only policy. The
@@ -851,7 +857,7 @@ resend_igmp
==============================
You can configure bonding using either your distro's network
-initialization scripts, or manually using either ifenslave or the
+initialization scripts, or manually using either iproute2 or the
sysfs interface. Distros generally use one of three packages for the
network initialization scripts: initscripts, sysconfig or interfaces.
Recent versions of these packages have support for bonding, while older
@@ -1160,7 +1166,7 @@ not support this method for specifying multiple bonding interfaces; for
those instances, see the "Configuring Multiple Bonds Manually" section,
below.
-3.3 Configuring Bonding Manually with Ifenslave
+3.3 Configuring Bonding Manually with iproute2
-----------------------------------------------
This section applies to distros whose network initialization
@@ -1171,7 +1177,7 @@ version 8.
The general method for these systems is to place the bonding
module parameters into a config file in /etc/modprobe.d/ (as
appropriate for the installed distro), then add modprobe and/or
-ifenslave commands to the system's global init script. The name of
+`ip link` commands to the system's global init script. The name of
the global init script differs; for sysconfig, it is
/etc/init.d/boot.local and for initscripts it is /etc/rc.d/rc.local.
@@ -1183,8 +1189,8 @@ reboots, edit the appropriate file (/etc/init.d/boot.local or
modprobe bonding mode=balance-alb miimon=100
modprobe e100
ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
-ifenslave bond0 eth0
-ifenslave bond0 eth1
+ip link set eth0 master bond0
+ip link set eth1 master bond0
Replace the example bonding module parameters and bond0
network configuration (IP address, netmask, etc) with the appropriate
@@ -1371,6 +1377,12 @@ To add ARP targets:
To remove an ARP target:
# echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
+To configure the interval between learning packet transmits:
+# echo 12 > /sys/class/net/bond0/bonding/lp_interval
+ NOTE: the lp_inteval is the number of seconds between instances where
+the bonding driver sends learning packets to each slaves peer switch. The
+default interval is 1 second.
+
Example Configuration
---------------------
We begin with the same example that is shown in section 3.3,
diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index 820f553..4c07241 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -25,6 +25,12 @@ This file contains
4.1.5 RAW socket option CAN_RAW_FD_FRAMES
4.1.6 RAW socket returned message flags
4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+ 4.2.1 Broadcast Manager operations
+ 4.2.2 Broadcast Manager message flags
+ 4.2.3 Broadcast Manager transmission timers
+ 4.2.4 Broadcast Manager message sequence transmission
+ 4.2.5 Broadcast Manager receive filter timers
+ 4.2.6 Broadcast Manager multiplex message receive filter
4.3 connected transport protocols (SOCK_SEQPACKET)
4.4 unconnected transport protocols (SOCK_DGRAM)
@@ -593,6 +599,217 @@ solution for a couple of reasons:
In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.
4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
+
+ The Broadcast Manager protocol provides a command based configuration
+ interface to filter and send (e.g. cyclic) CAN messages in kernel space.
+
+ Receive filters can be used to down sample frequent messages; detect events
+ such as message contents changes, packet length changes, and do time-out
+ monitoring of received messages.
+
+ Periodic transmission tasks of CAN frames or a sequence of CAN frames can be
+ created and modified at runtime; both the message content and the two
+ possible transmit intervals can be altered.
+
+ A BCM socket is not intended for sending individual CAN frames using the
+ struct can_frame as known from the CAN_RAW socket. Instead a special BCM
+ configuration message is defined. The basic BCM configuration message used
+ to communicate with the broadcast manager and the available operations are
+ defined in the linux/can/bcm.h include. The BCM message consists of a
+ message header with a command ('opcode') followed by zero or more CAN frames.
+ The broadcast manager sends responses to user space in the same form:
+
+ struct bcm_msg_head {
+ __u32 opcode; /* command */
+ __u32 flags; /* special flags */
+ __u32 count; /* run 'count' times with ival1 */
+ struct timeval ival1, ival2; /* count and subsequent interval */
+ canid_t can_id; /* unique can_id for task */
+ __u32 nframes; /* number of can_frames following */
+ struct can_frame frames[0];
+ };
+
+ The aligned payload 'frames' uses the same basic CAN frame structure defined
+ at the beginning of section 4 and in the include/linux/can.h include. All
+ messages to the broadcast manager from user space have this structure.
+
+ Note a CAN_BCM socket must be connected instead of bound after socket
+ creation (example without error checking):
+
+ int s;
+ struct sockaddr_can addr;
+ struct ifreq ifr;
+
+ s = socket(PF_CAN, SOCK_DGRAM, CAN_BCM);
+
+ strcpy(ifr.ifr_name, "can0");
+ ioctl(s, SIOCGIFINDEX, &ifr);
+
+ addr.can_family = AF_CAN;
+ addr.can_ifindex = ifr.ifr_ifindex;
+
+ connect(s, (struct sockaddr *)&addr, sizeof(addr))
+
+ (..)
+
+ The broadcast manager socket is able to handle any number of in flight
+ transmissions or receive filters concurrently. The different RX/TX jobs are
+ distinguished by the unique can_id in each BCM message. However additional
+ CAN_BCM sockets are recommended to communicate on multiple CAN interfaces.
+ When the broadcast manager socket is bound to 'any' CAN interface (=> the
+ interface index is set to zero) the configured receive filters apply to any
+ CAN interface unless the sendto() syscall is used to overrule the 'any' CAN
+ interface index. When using recvfrom() instead of read() to retrieve BCM
+ socket messages the originating CAN interface is provided in can_ifindex.
+
+ 4.2.1 Broadcast Manager operations
+
+ The opcode defines the operation for the broadcast manager to carry out,
+ or details the broadcast managers response to several events, including
+ user requests.
+
+ Transmit Operations (user space to broadcast manager):
+
+ TX_SETUP: Create (cyclic) transmission task.
+
+ TX_DELETE: Remove (cyclic) transmission task, requires only can_id.
+
+ TX_READ: Read properties of (cyclic) transmission task for can_id.
+
+ TX_SEND: Send one CAN frame.
+
+ Transmit Responses (broadcast manager to user space):
+
+ TX_STATUS: Reply to TX_READ request (transmission task configuration).
+
+ TX_EXPIRED: Notification when counter finishes sending at initial interval
+ 'ival1'. Requires the TX_COUNTEVT flag to be set at TX_SETUP.
+
+ Receive Operations (user space to broadcast manager):
+
+ RX_SETUP: Create RX content filter subscription.
+
+ RX_DELETE: Remove RX content filter subscription, requires only can_id.
+
+ RX_READ: Read properties of RX content filter subscription for can_id.
+
+ Receive Responses (broadcast manager to user space):
+
+ RX_STATUS: Reply to RX_READ request (filter task configuration).
+
+ RX_TIMEOUT: Cyclic message is detected to be absent (timer ival1 expired).
+
+ RX_CHANGED: BCM message with updated CAN frame (detected content change).
+ Sent on first message received or on receipt of revised CAN messages.
+
+ 4.2.2 Broadcast Manager message flags
+
+ When sending a message to the broadcast manager the 'flags' element may
+ contain the following flag definitions which influence the behaviour:
+
+ SETTIMER: Set the values of ival1, ival2 and count
+
+ STARTTIMER: Start the timer with the actual values of ival1, ival2
+ and count. Starting the timer leads simultaneously to emit a CAN frame.
+
+ TX_COUNTEVT: Create the message TX_EXPIRED when count expires
+
+ TX_ANNOUNCE: A change of data by the process is emitted immediately.
+
+ TX_CP_CAN_ID: Copies the can_id from the message header to each
+ subsequent frame in frames. This is intended as usage simplification. For
+ TX tasks the unique can_id from the message header may differ from the
+ can_id(s) stored for transmission in the subsequent struct can_frame(s).
+
+ RX_FILTER_ID: Filter by can_id alone, no frames required (nframes=0).
+
+ RX_CHECK_DLC: A change of the DLC leads to an RX_CHANGED.
+
+ RX_NO_AUTOTIMER: Prevent automatically starting the timeout monitor.
+
+ RX_ANNOUNCE_RESUME: If passed at RX_SETUP and a receive timeout occured, a
+ RX_CHANGED message will be generated when the (cyclic) receive restarts.
+
+ TX_RESET_MULTI_IDX: Reset the index for the multiple frame transmission.
+
+ RX_RTR_FRAME: Send reply for RTR-request (placed in op->frames[0]).
+
+ 4.2.3 Broadcast Manager transmission timers
+
+ Periodic transmission configurations may use up to two interval timers.
+ In this case the BCM sends a number of messages ('count') at an interval
+ 'ival1', then continuing to send at another given interval 'ival2'. When
+ only one timer is needed 'count' is set to zero and only 'ival2' is used.
+ When SET_TIMER and START_TIMER flag were set the timers are activated.
+ The timer values can be altered at runtime when only SET_TIMER is set.
+
+ 4.2.4 Broadcast Manager message sequence transmission
+
+ Up to 256 CAN frames can be transmitted in a sequence in the case of a cyclic
+ TX task configuration. The number of CAN frames is provided in the 'nframes'
+ element of the BCM message head. The defined number of CAN frames are added
+ as array to the TX_SETUP BCM configuration message.
+
+ /* create a struct to set up a sequence of four CAN frames */
+ struct {
+ struct bcm_msg_head msg_head;
+ struct can_frame frame[4];
+ } mytxmsg;
+
+ (..)
+ mytxmsg.nframes = 4;
+ (..)
+
+ write(s, &mytxmsg, sizeof(mytxmsg));
+
+ With every transmission the index in the array of CAN frames is increased
+ and set to zero at index overflow.
+
+ 4.2.5 Broadcast Manager receive filter timers
+
+ The timer values ival1 or ival2 may be set to non-zero values at RX_SETUP.
+ When the SET_TIMER flag is set the timers are enabled:
+
+ ival1: Send RX_TIMEOUT when a received message is not received again within
+ the given time. When START_TIMER is set at RX_SETUP the timeout detection
+ is activated directly - even without a former CAN frame reception.
+
+ ival2: Throttle the received message rate down to the value of ival2. This
+ is useful to reduce messages for the application when the signal inside the
+ CAN frame is stateless as state changes within the ival2 periode may get
+ lost.
+
+ 4.2.6 Broadcast Manager multiplex message receive filter
+
+ To filter for content changes in multiplex message sequences an array of more
+ than one CAN frames can be passed in a RX_SETUP configuration message. The
+ data bytes of the first CAN frame contain the mask of relevant bits that
+ have to match in the subsequent CAN frames with the received CAN frame.
+ If one of the subsequent CAN frames is matching the bits in that frame data
+ mark the relevant content to be compared with the previous received content.
+ Up to 257 CAN frames (multiplex filter bit mask CAN frame plus 256 CAN
+ filters) can be added as array to the TX_SETUP BCM configuration message.
+
+ /* usually used to clear CAN frame data[] - beware of endian problems! */
+ #define U64_DATA(p) (*(unsigned long long*)(p)->data)
+
+ struct {
+ struct bcm_msg_head msg_head;
+ struct can_frame frame[5];
+ } msg;
+
+ msg.msg_head.opcode = RX_SETUP;
+ msg.msg_head.can_id = 0x42;
+ msg.msg_head.flags = 0;
+ msg.msg_head.nframes = 5;
+ U64_DATA(&msg.frame[0]) = 0xFF00000000000000ULL; /* MUX mask */
+ U64_DATA(&msg.frame[1]) = 0x01000000000000FFULL; /* data mask (MUX 0x01) */
+ U64_DATA(&msg.frame[2]) = 0x0200FFFF000000FFULL; /* data mask (MUX 0x02) */
+ U64_DATA(&msg.frame[3]) = 0x330000FFFFFF0003ULL; /* data mask (MUX 0x33) */
+ U64_DATA(&msg.frame[4]) = 0x4F07FC0FF0000000ULL; /* data mask (MUX 0x4F) */
+
+ write(s, &msg, sizeof(msg));
+
4.3 connected transport protocols (SOCK_SEQPACKET)
4.4 unconnected transport protocols (SOCK_DGRAM)
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index d718bc2..bf5dbe3 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -18,8 +18,8 @@ Introduction
Datagram Congestion Control Protocol (DCCP) is an unreliable, connection
oriented protocol designed to solve issues present in UDP and TCP, particularly
for real-time and multimedia (streaming) traffic.
-It divides into a base protocol (RFC 4340) and plugable congestion control
-modules called CCIDs. Like plugable TCP congestion control, at least one CCID
+It divides into a base protocol (RFC 4340) and pluggable congestion control
+modules called CCIDs. Like pluggable TCP congestion control, at least one CCID
needs to be enabled in order for the protocol to function properly. In the Linux
implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as
the TCP-friendly CCID3 (RFC 4342), are optional.
diff --git a/Documentation/networking/e100.txt b/Documentation/networking/e100.txt
index fcb6c71c..f862cf3 100644
--- a/Documentation/networking/e100.txt
+++ b/Documentation/networking/e100.txt
@@ -1,7 +1,7 @@
Linux* Base Driver for the Intel(R) PRO/100 Family of Adapters
==============================================================
-November 15, 2005
+March 15, 2011
Contents
========
@@ -103,7 +103,7 @@ Additional Configurations
PRO/100 Family of Adapters is e100.
As an example, if you install the e100 driver for two PRO/100 adapters
- (eth0 and eth1), add the following to a configuraton file in /etc/modprobe.d/
+ (eth0 and eth1), add the following to a configuration file in /etc/modprobe.d/
alias eth0 e100
alias eth1 e100
@@ -122,7 +122,7 @@ Additional Configurations
NOTE: This setting is not saved across reboots.
- Ethtool
+ ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diff --git a/Documentation/networking/e1000.txt b/Documentation/networking/e1000.txt
index 71ca958..437b209 100644
--- a/Documentation/networking/e1000.txt
+++ b/Documentation/networking/e1000.txt
@@ -1,8 +1,8 @@
-Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters
-===============================================================
+Linux* Base Driver for Intel(R) Ethernet Network Connection
+===========================================================
Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2010 Intel Corporation.
+Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
@@ -420,15 +420,15 @@ Additional Configurations
- The maximum MTU setting for Jumbo Frames is 16110. This value coincides
with the maximum Jumbo Frames size of 16128.
- - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or
- loss of link.
+ - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
+ poor performance or loss of link.
- Adapters based on the Intel(R) 82542 and 82573V/E controller do not
support Jumbo Frames. These correspond to the following product names:
Intel(R) PRO/1000 Gigabit Server Adapter
Intel(R) PRO/1000 PM Network Connection
- Ethtool
+ ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The ethtool
diff --git a/Documentation/networking/e1000e.txt b/Documentation/networking/e1000e.txt
index 97b5ba9..ad2d9f3 100644
--- a/Documentation/networking/e1000e.txt
+++ b/Documentation/networking/e1000e.txt
@@ -1,8 +1,8 @@
-Linux* Driver for Intel(R) Network Connection
-=============================================
+Linux* Driver for Intel(R) Ethernet Network Connection
+======================================================
Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2010 Intel Corporation.
+Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
@@ -259,13 +259,16 @@ Additional Configurations
- The maximum MTU setting for Jumbo Frames is 9216. This value coincides
with the maximum Jumbo Frames size of 9234 bytes.
- - Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in
+ - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
poor performance or loss of link.
- Some adapters limit Jumbo Frames sized packets to a maximum of
4096 bytes and some adapters do not support Jumbo Frames.
- Ethtool
+ - Jumbo Frames cannot be configured on an 82579-based Network device, if
+ MACSec is enabled on the system.
+
+ ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. We
@@ -273,6 +276,9 @@ Additional Configurations
http://ftp.kernel.org/pub/software/network/ethtool/
+ NOTE: When validating enable/disable tests on some parts (82578, for example)
+ you need to add a few seconds between tests when working with ethtool.
+
Speed and Duplex
----------------
Speed and Duplex are configured through the ethtool* utility. For
diff --git a/Documentation/networking/i40e.txt b/Documentation/networking/i40e.txt
new file mode 100644
index 0000000..f737273
--- /dev/null
+++ b/Documentation/networking/i40e.txt
@@ -0,0 +1,115 @@
+Linux Base Driver for the Intel(R) Ethernet Controller XL710 Family
+===================================================================
+
+Intel i40e Linux driver.
+Copyright(c) 2013 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Additional Configurations
+- Performance Tuning
+- Known Issues
+- Support
+
+
+Identifying Your Adapter
+========================
+
+The driver in this release is compatible with the Intel Ethernet
+Controller XL710 Family.
+
+For more information on how to identify your adapter, go to the Adapter &
+Driver ID Guide at:
+
+ http://support.intel.com/support/network/sb/CS-012904.htm
+
+
+Enabling the driver
+===================
+
+The driver is enabled via the standard kernel configuration system,
+using the make command:
+
+ Make oldconfig/silentoldconfig/menuconfig/etc.
+
+The driver is located in the menu structure at:
+
+ -> Device Drivers
+ -> Network device support (NETDEVICES [=y])
+ -> Ethernet driver support
+ -> Intel devices
+ -> Intel(R) Ethernet Controller XL710 Family
+
+Additional Configurations
+=========================
+
+ Generic Receive Offload (GRO)
+ -----------------------------
+ The driver supports the in-kernel software implementation of GRO. GRO has
+ shown that by coalescing Rx traffic into larger chunks of data, CPU
+ utilization can be significantly reduced when under large Rx load. GRO is
+ an evolution of the previously-used LRO interface. GRO is able to coalesce
+ other protocols besides TCP. It's also safe to use with configurations that
+ are problematic for LRO, namely bridging and iSCSI.
+
+ Ethtool
+ -------
+ The driver utilizes the ethtool interface for driver configuration and
+ diagnostics, as well as displaying statistical information. The latest
+ ethtool version is required for this functionality.
+
+ The latest release of ethtool can be found from
+ https://www.kernel.org/pub/software/network/ethtool
+
+ Data Center Bridging (DCB)
+ --------------------------
+ DCB configuration is not currently supported.
+
+ FCoE
+ ----
+ Fiber Channel over Ethernet (FCoE) hardware offload is not currently
+ supported.
+
+ MAC and VLAN anti-spoofing feature
+ ----------------------------------
+ When a malicious driver attempts to send a spoofed packet, it is dropped by
+ the hardware and not transmitted. An interrupt is sent to the PF driver
+ notifying it of the spoof attempt.
+
+ When a spoofed packet is detected the PF driver will send the following
+ message to the system log (displayed by the "dmesg" command):
+
+ Spoof event(s) detected on VF (n)
+
+ Where n=the VF that attempted to do the spoofing.
+
+
+Performance Tuning
+==================
+
+An excellent article on performance tuning can be found at:
+
+http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
+
+
+Known Issues
+============
+
+
+Support
+=======
+
+For general information, go to the Intel support website at:
+
+ http://support.intel.com
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+ http://e1000.sourceforge.net
+
+If an issue is identified with the released source code on the supported
+kernel with a supported adapter, email the specific information related
+to the issue to e1000-devel@lists.sourceforge.net and copy
+netdev@vger.kernel.org.
diff --git a/Documentation/networking/ieee802154.txt b/Documentation/networking/ieee802154.txt
index 67a9cb2..22bbc72 100644
--- a/Documentation/networking/ieee802154.txt
+++ b/Documentation/networking/ieee802154.txt
@@ -4,8 +4,8 @@
Introduction
============
-The IEEE 802.15.4 working group focuses on standartization of bottom
-two layers: Medium Accsess Control (MAC) and Physical (PHY). And there
+The IEEE 802.15.4 working group focuses on standardization of bottom
+two layers: Medium Access Control (MAC) and Physical (PHY). And there
are mainly two options available for upper layers:
- ZigBee - proprietary protocol from ZigBee Alliance
- 6LowPAN - IPv6 networking over low rate personal area networks
@@ -66,7 +66,7 @@ net_device, with .type = ARPHRD_IEEE802154. Data is exchanged with socket family
code via plain sk_buffs. On skb reception skb->cb must contain additional
info as described in the struct ieee802154_mac_cb. During packet transmission
the skb->cb is used to provide additional data to device's header_ops->create
-function. Be aware, that this data can be overriden later (when socket code
+function. Be aware that this data can be overridden later (when socket code
submits skb to qdisc), so if you need something from that cb later, you should
store info in the skb->data on your own.
diff --git a/Documentation/networking/ifenslave.c b/Documentation/networking/ifenslave.c
deleted file mode 100644
index ac5debb..0000000
--- a/Documentation/networking/ifenslave.c
+++ /dev/null
@@ -1,1105 +0,0 @@
-/* Mode: C;
- * ifenslave.c: Configure network interfaces for parallel routing.
- *
- * This program controls the Linux implementation of running multiple
- * network interfaces in parallel.
- *
- * Author: Donald Becker <becker@cesdis.gsfc.nasa.gov>
- * Copyright 1994-1996 Donald Becker
- *
- * This program is free software; you can redistribute it
- * and/or modify it under the terms of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * The author may be reached as becker@CESDIS.gsfc.nasa.gov, or C/O
- * Center of Excellence in Space Data and Information Sciences
- * Code 930.5, Goddard Space Flight Center, Greenbelt MD 20771
- *
- * Changes :
- * - 2000/10/02 Willy Tarreau <willy at meta-x.org> :
- * - few fixes. Master's MAC address is now correctly taken from
- * the first device when not previously set ;
- * - detach support : call BOND_RELEASE to detach an enslaved interface.
- * - give a mini-howto from command-line help : # ifenslave -h
- *
- * - 2001/02/16 Chad N. Tindel <ctindel at ieee dot org> :
- * - Master is now brought down before setting the MAC address. In
- * the 2.4 kernel you can't change the MAC address while the device is
- * up because you get EBUSY.
- *
- * - 2001/09/13 Takao Indoh <indou dot takao at jp dot fujitsu dot com>
- * - Added the ability to change the active interface on a mode 1 bond
- * at runtime.
- *
- * - 2001/10/23 Chad N. Tindel <ctindel at ieee dot org> :
- * - No longer set the MAC address of the master. The bond device will
- * take care of this itself
- * - Try the SIOC*** versions of the bonding ioctls before using the
- * old versions
- * - 2002/02/18 Erik Habbinga <erik_habbinga @ hp dot com> :
- * - ifr2.ifr_flags was not initialized in the hwaddr_notset case,
- * SIOCGIFFLAGS now called before hwaddr_notset test
- *
- * - 2002/10/31 Tony Cureington <tony.cureington * hp_com> :
- * - If the master does not have a hardware address when the first slave
- * is enslaved, the master is assigned the hardware address of that
- * slave - there is a comment in bonding.c stating "ifenslave takes
- * care of this now." This corrects the problem of slaves having
- * different hardware addresses in active-backup mode when
- * multiple interfaces are specified on a single ifenslave command
- * (ifenslave bond0 eth0 eth1).
- *
- * - 2003/03/18 - Tsippy Mendelson <tsippy.mendelson at intel dot com> and
- * Shmulik Hen <shmulik.hen at intel dot com>
- * - Moved setting the slave's mac address and openning it, from
- * the application to the driver. This enables support of modes
- * that need to use the unique mac address of each slave.
- * The driver also takes care of closing the slave and restoring its
- * original mac address upon release.
- * In addition, block possibility of enslaving before the master is up.
- * This prevents putting the system in an undefined state.
- *
- * - 2003/05/01 - Amir Noam <amir.noam at intel dot com>
- * - Added ABI version control to restore compatibility between
- * new/old ifenslave and new/old bonding.
- * - Prevent adding an adapter that is already a slave.
- * Fixes the problem of stalling the transmission and leaving
- * the slave in a down state.
- *
- * - 2003/05/01 - Shmulik Hen <shmulik.hen at intel dot com>
- * - Prevent enslaving if the bond device is down.
- * Fixes the problem of leaving the system in unstable state and
- * halting when trying to remove the module.
- * - Close socket on all abnormal exists.
- * - Add versioning scheme that follows that of the bonding driver.
- * current version is 1.0.0 as a base line.
- *
- * - 2003/05/22 - Jay Vosburgh <fubar at us dot ibm dot com>
- * - ifenslave -c was broken; it's now fixed
- * - Fixed problem with routes vanishing from master during enslave
- * processing.
- *
- * - 2003/05/27 - Amir Noam <amir.noam at intel dot com>
- * - Fix backward compatibility issues:
- * For drivers not using ABI versions, slave was set down while
- * it should be left up before enslaving.
- * Also, master was not set down and the default set_mac_address()
- * would fail and generate an error message in the system log.
- * - For opt_c: slave should not be set to the master's setting
- * while it is running. It was already set during enslave. To
- * simplify things, it is now handled separately.
- *
- * - 2003/12/01 - Shmulik Hen <shmulik.hen at intel dot com>
- * - Code cleanup and style changes
- * set version to 1.1.0
- */
-
-#define APP_VERSION "1.1.0"
-#define APP_RELDATE "December 1, 2003"
-#define APP_NAME "ifenslave"
-
-static char *version =
-APP_NAME ".c:v" APP_VERSION " (" APP_RELDATE ")\n"
-"o Donald Becker (becker@cesdis.gsfc.nasa.gov).\n"
-"o Detach support added on 2000/10/02 by Willy Tarreau (willy at meta-x.org).\n"
-"o 2.4 kernel support added on 2001/02/16 by Chad N. Tindel\n"
-" (ctindel at ieee dot org).\n";
-
-static const char *usage_msg =
-"Usage: ifenslave [-f] <master-if> <slave-if> [<slave-if>...]\n"
-" ifenslave -d <master-if> <slave-if> [<slave-if>...]\n"
-" ifenslave -c <master-if> <slave-if>\n"
-" ifenslave --help\n";
-
-static const char *help_msg =
-"\n"
-" To create a bond device, simply follow these three steps :\n"
-" - ensure that the required drivers are properly loaded :\n"
-" # modprobe bonding ; modprobe <3c59x|eepro100|pcnet32|tulip|...>\n"
-" - assign an IP address to the bond device :\n"
-" # ifconfig bond0 <addr> netmask <mask> broadcast <bcast>\n"
-" - attach all the interfaces you need to the bond device :\n"
-" # ifenslave [{-f|--force}] bond0 eth0 [eth1 [eth2]...]\n"
-" If bond0 didn't have a MAC address, it will take eth0's. Then, all\n"
-" interfaces attached AFTER this assignment will get the same MAC addr.\n"
-" (except for ALB/TLB modes)\n"
-"\n"
-" To set the bond device down and automatically release all the slaves :\n"
-" # ifconfig bond0 down\n"
-"\n"
-" To detach a dead interface without setting the bond device down :\n"
-" # ifenslave {-d|--detach} bond0 eth0 [eth1 [eth2]...]\n"
-"\n"
-" To change active slave :\n"
-" # ifenslave {-c|--change-active} bond0 eth0\n"
-"\n"
-" To show master interface info\n"
-" # ifenslave bond0\n"
-"\n"
-" To show all interfaces info\n"
-" # ifenslave {-a|--all-interfaces}\n"
-"\n"
-" To be more verbose\n"
-" # ifenslave {-v|--verbose} ...\n"
-"\n"
-" # ifenslave {-u|--usage} Show usage\n"
-" # ifenslave {-V|--version} Show version\n"
-" # ifenslave {-h|--help} This message\n"
-"\n";
-
-#include <unistd.h>
-#include <stdlib.h>
-#include <stdio.h>
-#include <ctype.h>
-#include <string.h>
-#include <errno.h>
-#include <fcntl.h>
-#include <getopt.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <sys/ioctl.h>
-#include <linux/if.h>
-#include <net/if_arp.h>
-#include <linux/if_ether.h>
-#include <linux/if_bonding.h>
-#include <linux/sockios.h>
-
-typedef unsigned long long u64; /* hack, so we may include kernel's ethtool.h */
-typedef __uint32_t u32; /* ditto */
-typedef __uint16_t u16; /* ditto */
-typedef __uint8_t u8; /* ditto */
-#include <linux/ethtool.h>
-
-struct option longopts[] = {
- /* { name has_arg *flag val } */
- {"all-interfaces", 0, 0, 'a'}, /* Show all interfaces. */
- {"change-active", 0, 0, 'c'}, /* Change the active slave. */
- {"detach", 0, 0, 'd'}, /* Detach a slave interface. */
- {"force", 0, 0, 'f'}, /* Force the operation. */
- {"help", 0, 0, 'h'}, /* Give help */
- {"usage", 0, 0, 'u'}, /* Give usage */
- {"verbose", 0, 0, 'v'}, /* Report each action taken. */
- {"version", 0, 0, 'V'}, /* Emit version information. */
- { 0, 0, 0, 0}
-};
-
-/* Command-line flags. */
-unsigned int
-opt_a = 0, /* Show-all-interfaces flag. */
-opt_c = 0, /* Change-active-slave flag. */
-opt_d = 0, /* Detach a slave interface. */
-opt_f = 0, /* Force the operation. */
-opt_h = 0, /* Help */
-opt_u = 0, /* Usage */
-opt_v = 0, /* Verbose flag. */
-opt_V = 0; /* Version */
-
-int skfd = -1; /* AF_INET socket for ioctl() calls.*/
-int abi_ver = 0; /* userland - kernel ABI version */
-int hwaddr_set = 0; /* Master's hwaddr is set */
-int saved_errno;
-
-struct ifreq master_mtu, master_flags, master_hwaddr;
-struct ifreq slave_mtu, slave_flags, slave_hwaddr;
-
-struct dev_ifr {
- struct ifreq *req_ifr;
- char *req_name;
- int req_type;
-};
-
-struct dev_ifr master_ifra[] = {
- {&master_mtu, "SIOCGIFMTU", SIOCGIFMTU},
- {&master_flags, "SIOCGIFFLAGS", SIOCGIFFLAGS},
- {&master_hwaddr, "SIOCGIFHWADDR", SIOCGIFHWADDR},
- {NULL, "", 0}
-};
-
-struct dev_ifr slave_ifra[] = {
- {&slave_mtu, "SIOCGIFMTU", SIOCGIFMTU},
- {&slave_flags, "SIOCGIFFLAGS", SIOCGIFFLAGS},
- {&slave_hwaddr, "SIOCGIFHWADDR", SIOCGIFHWADDR},
- {NULL, "", 0}
-};
-
-static void if_print(char *ifname);
-static int get_drv_info(char *master_ifname);
-static int get_if_settings(char *ifname, struct dev_ifr ifra[]);
-static int get_slave_flags(char *slave_ifname);
-static int set_master_hwaddr(char *master_ifname, struct sockaddr *hwaddr);
-static int set_slave_hwaddr(char *slave_ifname, struct sockaddr *hwaddr);
-static int set_slave_mtu(char *slave_ifname, int mtu);
-static int set_if_flags(char *ifname, short flags);
-static int set_if_up(char *ifname, short flags);
-static int set_if_down(char *ifname, short flags);
-static int clear_if_addr(char *ifname);
-static int set_if_addr(char *master_ifname, char *slave_ifname);
-static int change_active(char *master_ifname, char *slave_ifname);
-static int enslave(char *master_ifname, char *slave_ifname);
-static int release(char *master_ifname, char *slave_ifname);
-#define v_print(fmt, args...) \
- if (opt_v) \
- fprintf(stderr, fmt, ## args )
-
-int main(int argc, char *argv[])
-{
- char **spp, *master_ifname, *slave_ifname;
- int c, i, rv;
- int res = 0;
- int exclusive = 0;
-
- while ((c = getopt_long(argc, argv, "acdfhuvV", longopts, 0)) != EOF) {
- switch (c) {
- case 'a': opt_a++; exclusive++; break;
- case 'c': opt_c++; exclusive++; break;
- case 'd': opt_d++; exclusive++; break;
- case 'f': opt_f++; exclusive++; break;
- case 'h': opt_h++; exclusive++; break;
- case 'u': opt_u++; exclusive++; break;
- case 'v': opt_v++; break;
- case 'V': opt_V++; exclusive++; break;
-
- case '?':
- fprintf(stderr, "%s", usage_msg);
- res = 2;
- goto out;
- }
- }
-
- /* options check */
- if (exclusive > 1) {
- fprintf(stderr, "%s", usage_msg);
- res = 2;
- goto out;
- }
-
- if (opt_v || opt_V) {
- printf("%s", version);
- if (opt_V) {
- res = 0;
- goto out;
- }
- }
-
- if (opt_u) {
- printf("%s", usage_msg);
- res = 0;
- goto out;
- }
-
- if (opt_h) {
- printf("%s", usage_msg);
- printf("%s", help_msg);
- res = 0;
- goto out;
- }
-
- /* Open a basic socket */
- if ((skfd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
- perror("socket");
- res = 1;
- goto out;
- }
-
- if (opt_a) {
- if (optind == argc) {
- /* No remaining args */
- /* show all interfaces */
- if_print((char *)NULL);
- goto out;
- } else {
- /* Just show usage */
- fprintf(stderr, "%s", usage_msg);
- res = 2;
- goto out;
- }
- }
-
- /* Copy the interface name */
- spp = argv + optind;
- master_ifname = *spp++;
-
- if (master_ifname == NULL) {
- fprintf(stderr, "%s", usage_msg);
- res = 2;
- goto out;
- }
-
- /* exchange abi version with bonding module */
- res = get_drv_info(master_ifname);
- if (res) {
- fprintf(stderr,
- "Master '%s': Error: handshake with driver failed. "
- "Aborting\n",
- master_ifname);
- goto out;
- }
-
- slave_ifname = *spp++;
-
- if (slave_ifname == NULL) {
- if (opt_d || opt_c) {
- fprintf(stderr, "%s", usage_msg);
- res = 2;
- goto out;
- }
-
- /* A single arg means show the
- * configuration for this interface
- */
- if_print(master_ifname);
- goto out;
- }
-
- res = get_if_settings(master_ifname, master_ifra);
- if (res) {
- /* Probably a good reason not to go on */
- fprintf(stderr,
- "Master '%s': Error: get settings failed: %s. "
- "Aborting\n",
- master_ifname, strerror(res));
- goto out;
- }
-
- /* check if master is indeed a master;
- * if not then fail any operation
- */
- if (!(master_flags.ifr_flags & IFF_MASTER)) {
- fprintf(stderr,
- "Illegal operation; the specified interface '%s' "
- "is not a master. Aborting\n",
- master_ifname);
- res = 1;
- goto out;
- }
-
- /* check if master is up; if not then fail any operation */
- if (!(master_flags.ifr_flags & IFF_UP)) {
- fprintf(stderr,
- "Illegal operation; the specified master interface "
- "'%s' is not up.\n",
- master_ifname);
- res = 1;
- goto out;
- }
-
- /* Only for enslaving */
- if (!opt_c && !opt_d) {
- sa_family_t master_family = master_hwaddr.ifr_hwaddr.sa_family;
- unsigned char *hwaddr =
- (unsigned char *)master_hwaddr.ifr_hwaddr.sa_data;
-
- /* The family '1' is ARPHRD_ETHER for ethernet. */
- if (master_family != 1 && !opt_f) {
- fprintf(stderr,
- "Illegal operation: The specified master "
- "interface '%s' is not ethernet-like.\n "
- "This program is designed to work with "
- "ethernet-like network interfaces.\n "
- "Use the '-f' option to force the "
- "operation.\n",
- master_ifname);
- res = 1;
- goto out;
- }
-
- /* Check master's hw addr */
- for (i = 0; i < 6; i++) {
- if (hwaddr[i] != 0) {
- hwaddr_set = 1;
- break;
- }
- }
-
- if (hwaddr_set) {
- v_print("current hardware address of master '%s' "
- "is %2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x, "
- "type %d\n",
- master_ifname,
- hwaddr[0], hwaddr[1],
- hwaddr[2], hwaddr[3],
- hwaddr[4], hwaddr[5],
- master_family);
- }
- }
-
- /* Accepts only one slave */
- if (opt_c) {
- /* change active slave */
- res = get_slave_flags(slave_ifname);
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: get flags failed. "
- "Aborting\n",
- slave_ifname);
- goto out;
- }
- res = change_active(master_ifname, slave_ifname);
- if (res) {
- fprintf(stderr,
- "Master '%s', Slave '%s': Error: "
- "Change active failed\n",
- master_ifname, slave_ifname);
- }
- } else {
- /* Accept multiple slaves */
- do {
- if (opt_d) {
- /* detach a slave interface from the master */
- rv = get_slave_flags(slave_ifname);
- if (rv) {
- /* Can't work with this slave. */
- /* remember the error and skip it*/
- fprintf(stderr,
- "Slave '%s': Error: get flags "
- "failed. Skipping\n",
- slave_ifname);
- res = rv;
- continue;
- }
- rv = release(master_ifname, slave_ifname);
- if (rv) {
- fprintf(stderr,
- "Master '%s', Slave '%s': Error: "
- "Release failed\n",
- master_ifname, slave_ifname);
- res = rv;
- }
- } else {
- /* attach a slave interface to the master */
- rv = get_if_settings(slave_ifname, slave_ifra);
- if (rv) {
- /* Can't work with this slave. */
- /* remember the error and skip it*/
- fprintf(stderr,
- "Slave '%s': Error: get "
- "settings failed: %s. "
- "Skipping\n",
- slave_ifname, strerror(rv));
- res = rv;
- continue;
- }
- rv = enslave(master_ifname, slave_ifname);
- if (rv) {
- fprintf(stderr,
- "Master '%s', Slave '%s': Error: "
- "Enslave failed\n",
- master_ifname, slave_ifname);
- res = rv;
- }
- }
- } while ((slave_ifname = *spp++) != NULL);
- }
-
-out:
- if (skfd >= 0) {
- close(skfd);
- }
-
- return res;
-}
-
-static short mif_flags;
-
-/* Get the inteface configuration from the kernel. */
-static int if_getconfig(char *ifname)
-{
- struct ifreq ifr;
- int metric, mtu; /* Parameters of the master interface. */
- struct sockaddr dstaddr, broadaddr, netmask;
- unsigned char *hwaddr;
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFFLAGS, &ifr) < 0)
- return -1;
- mif_flags = ifr.ifr_flags;
- printf("The result of SIOCGIFFLAGS on %s is %x.\n",
- ifname, ifr.ifr_flags);
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFADDR, &ifr) < 0)
- return -1;
- printf("The result of SIOCGIFADDR is %2.2x.%2.2x.%2.2x.%2.2x.\n",
- ifr.ifr_addr.sa_data[0], ifr.ifr_addr.sa_data[1],
- ifr.ifr_addr.sa_data[2], ifr.ifr_addr.sa_data[3]);
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFHWADDR, &ifr) < 0)
- return -1;
-
- /* Gotta convert from 'char' to unsigned for printf(). */
- hwaddr = (unsigned char *)ifr.ifr_hwaddr.sa_data;
- printf("The result of SIOCGIFHWADDR is type %d "
- "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x.\n",
- ifr.ifr_hwaddr.sa_family, hwaddr[0], hwaddr[1],
- hwaddr[2], hwaddr[3], hwaddr[4], hwaddr[5]);
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFMETRIC, &ifr) < 0) {
- metric = 0;
- } else
- metric = ifr.ifr_metric;
- printf("The result of SIOCGIFMETRIC is %d\n", metric);
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFMTU, &ifr) < 0)
- mtu = 0;
- else
- mtu = ifr.ifr_mtu;
- printf("The result of SIOCGIFMTU is %d\n", mtu);
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFDSTADDR, &ifr) < 0) {
- memset(&dstaddr, 0, sizeof(struct sockaddr));
- } else
- dstaddr = ifr.ifr_dstaddr;
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFBRDADDR, &ifr) < 0) {
- memset(&broadaddr, 0, sizeof(struct sockaddr));
- } else
- broadaddr = ifr.ifr_broadaddr;
-
- strcpy(ifr.ifr_name, ifname);
- if (ioctl(skfd, SIOCGIFNETMASK, &ifr) < 0) {
- memset(&netmask, 0, sizeof(struct sockaddr));
- } else
- netmask = ifr.ifr_netmask;
-
- return 0;
-}
-
-static void if_print(char *ifname)
-{
- char buff[1024];
- struct ifconf ifc;
- struct ifreq *ifr;
- int i;
-
- if (ifname == (char *)NULL) {
- ifc.ifc_len = sizeof(buff);
- ifc.ifc_buf = buff;
- if (ioctl(skfd, SIOCGIFCONF, &ifc) < 0) {
- perror("SIOCGIFCONF failed");
- return;
- }
-
- ifr = ifc.ifc_req;
- for (i = ifc.ifc_len / sizeof(struct ifreq); --i >= 0; ifr++) {
- if (if_getconfig(ifr->ifr_name) < 0) {
- fprintf(stderr,
- "%s: unknown interface.\n",
- ifr->ifr_name);
- continue;
- }
-
- if (((mif_flags & IFF_UP) == 0) && !opt_a) continue;
- /*ife_print(&ife);*/
- }
- } else {
- if (if_getconfig(ifname) < 0) {
- fprintf(stderr,
- "%s: unknown interface.\n", ifname);
- }
- }
-}
-
-static int get_drv_info(char *master_ifname)
-{
- struct ifreq ifr;
- struct ethtool_drvinfo info;
- char *endptr;
-
- memset(&ifr, 0, sizeof(ifr));
- strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ);
- ifr.ifr_data = (caddr_t)&info;
-
- info.cmd = ETHTOOL_GDRVINFO;
- strncpy(info.driver, "ifenslave", 32);
- snprintf(info.fw_version, 32, "%d", BOND_ABI_VERSION);
-
- if (ioctl(skfd, SIOCETHTOOL, &ifr) < 0) {
- if (errno == EOPNOTSUPP) {
- goto out;
- }
-
- saved_errno = errno;
- v_print("Master '%s': Error: get bonding info failed %s\n",
- master_ifname, strerror(saved_errno));
- return 1;
- }
-
- abi_ver = strtoul(info.fw_version, &endptr, 0);
- if (*endptr) {
- v_print("Master '%s': Error: got invalid string as an ABI "
- "version from the bonding module\n",
- master_ifname);
- return 1;
- }
-
-out:
- v_print("ABI ver is %d\n", abi_ver);
-
- return 0;
-}
-
-static int change_active(char *master_ifname, char *slave_ifname)
-{
- struct ifreq ifr;
- int res = 0;
-
- if (!(slave_flags.ifr_flags & IFF_SLAVE)) {
- fprintf(stderr,
- "Illegal operation: The specified slave interface "
- "'%s' is not a slave\n",
- slave_ifname);
- return 1;
- }
-
- strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ);
- strncpy(ifr.ifr_slave, slave_ifname, IFNAMSIZ);
- if ((ioctl(skfd, SIOCBONDCHANGEACTIVE, &ifr) < 0) &&
- (ioctl(skfd, BOND_CHANGE_ACTIVE_OLD, &ifr) < 0)) {
- saved_errno = errno;
- v_print("Master '%s': Error: SIOCBONDCHANGEACTIVE failed: "
- "%s\n",
- master_ifname, strerror(saved_errno));
- res = 1;
- }
-
- return res;
-}
-
-static int enslave(char *master_ifname, char *slave_ifname)
-{
- struct ifreq ifr;
- int res = 0;
-
- if (slave_flags.ifr_flags & IFF_SLAVE) {
- fprintf(stderr,
- "Illegal operation: The specified slave interface "
- "'%s' is already a slave\n",
- slave_ifname);
- return 1;
- }
-
- res = set_if_down(slave_ifname, slave_flags.ifr_flags);
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: bring interface down failed\n",
- slave_ifname);
- return res;
- }
-
- if (abi_ver < 2) {
- /* Older bonding versions would panic if the slave has no IP
- * address, so get the IP setting from the master.
- */
- set_if_addr(master_ifname, slave_ifname);
- } else {
- res = clear_if_addr(slave_ifname);
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: clear address failed\n",
- slave_ifname);
- return res;
- }
- }
-
- if (master_mtu.ifr_mtu != slave_mtu.ifr_mtu) {
- res = set_slave_mtu(slave_ifname, master_mtu.ifr_mtu);
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: set MTU failed\n",
- slave_ifname);
- return res;
- }
- }
-
- if (hwaddr_set) {
- /* Master already has an hwaddr
- * so set it's hwaddr to the slave
- */
- if (abi_ver < 1) {
- /* The driver is using an old ABI, so
- * the application sets the slave's
- * hwaddr
- */
- res = set_slave_hwaddr(slave_ifname,
- &(master_hwaddr.ifr_hwaddr));
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: set hw address "
- "failed\n",
- slave_ifname);
- goto undo_mtu;
- }
-
- /* For old ABI the application needs to bring the
- * slave back up
- */
- res = set_if_up(slave_ifname, slave_flags.ifr_flags);
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: bring interface "
- "down failed\n",
- slave_ifname);
- goto undo_slave_mac;
- }
- }
- /* The driver is using a new ABI,
- * so the driver takes care of setting
- * the slave's hwaddr and bringing
- * it up again
- */
- } else {
- /* No hwaddr for master yet, so
- * set the slave's hwaddr to it
- */
- if (abi_ver < 1) {
- /* For old ABI, the master needs to be
- * down before setting its hwaddr
- */
- res = set_if_down(master_ifname, master_flags.ifr_flags);
- if (res) {
- fprintf(stderr,
- "Master '%s': Error: bring interface "
- "down failed\n",
- master_ifname);
- goto undo_mtu;
- }
- }
-
- res = set_master_hwaddr(master_ifname,
- &(slave_hwaddr.ifr_hwaddr));
- if (res) {
- fprintf(stderr,
- "Master '%s': Error: set hw address "
- "failed\n",
- master_ifname);
- goto undo_mtu;
- }
-
- if (abi_ver < 1) {
- /* For old ABI, bring the master
- * back up
- */
- res = set_if_up(master_ifname, master_flags.ifr_flags);
- if (res) {
- fprintf(stderr,
- "Master '%s': Error: bring interface "
- "up failed\n",
- master_ifname);
- goto undo_master_mac;
- }
- }
-
- hwaddr_set = 1;
- }
-
- /* Do the real thing */
- strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ);
- strncpy(ifr.ifr_slave, slave_ifname, IFNAMSIZ);
- if ((ioctl(skfd, SIOCBONDENSLAVE, &ifr) < 0) &&
- (ioctl(skfd, BOND_ENSLAVE_OLD, &ifr) < 0)) {
- saved_errno = errno;
- v_print("Master '%s': Error: SIOCBONDENSLAVE failed: %s\n",
- master_ifname, strerror(saved_errno));
- res = 1;
- }
-
- if (res) {
- goto undo_master_mac;
- }
-
- return 0;
-
-/* rollback (best effort) */
-undo_master_mac:
- set_master_hwaddr(master_ifname, &(master_hwaddr.ifr_hwaddr));
- hwaddr_set = 0;
- goto undo_mtu;
-undo_slave_mac:
- set_slave_hwaddr(slave_ifname, &(slave_hwaddr.ifr_hwaddr));
-undo_mtu:
- set_slave_mtu(slave_ifname, slave_mtu.ifr_mtu);
- return res;
-}
-
-static int release(char *master_ifname, char *slave_ifname)
-{
- struct ifreq ifr;
- int res = 0;
-
- if (!(slave_flags.ifr_flags & IFF_SLAVE)) {
- fprintf(stderr,
- "Illegal operation: The specified slave interface "
- "'%s' is not a slave\n",
- slave_ifname);
- return 1;
- }
-
- strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ);
- strncpy(ifr.ifr_slave, slave_ifname, IFNAMSIZ);
- if ((ioctl(skfd, SIOCBONDRELEASE, &ifr) < 0) &&
- (ioctl(skfd, BOND_RELEASE_OLD, &ifr) < 0)) {
- saved_errno = errno;
- v_print("Master '%s': Error: SIOCBONDRELEASE failed: %s\n",
- master_ifname, strerror(saved_errno));
- return 1;
- } else if (abi_ver < 1) {
- /* The driver is using an old ABI, so we'll set the interface
- * down to avoid any conflicts due to same MAC/IP
- */
- res = set_if_down(slave_ifname, slave_flags.ifr_flags);
- if (res) {
- fprintf(stderr,
- "Slave '%s': Error: bring interface "
- "down failed\n",
- slave_ifname);
- }
- }
-
- /* set to default mtu */
- set_slave_mtu(slave_ifname, 1500);
-
- return res;
-}
-
-static int get_if_settings(char *ifname, struct dev_ifr ifra[])
-{
- int i;
- int res = 0;
-
- for (i = 0; ifra[i].req_ifr; i++) {
- strncpy(ifra[i].req_ifr->ifr_name, ifname, IFNAMSIZ);
- res = ioctl(skfd, ifra[i].req_type, ifra[i].req_ifr);
- if (res < 0) {
- saved_errno = errno;
- v_print("Interface '%s': Error: %s failed: %s\n",
- ifname, ifra[i].req_name,
- strerror(saved_errno));
-
- return saved_errno;
- }
- }
-
- return 0;
-}
-
-static int get_slave_flags(char *slave_ifname)
-{
- int res = 0;
-
- strncpy(slave_flags.ifr_name, slave_ifname, IFNAMSIZ);
- res = ioctl(skfd, SIOCGIFFLAGS, &slave_flags);
- if (res < 0) {
- saved_errno = errno;
- v_print("Slave '%s': Error: SIOCGIFFLAGS failed: %s\n",
- slave_ifname, strerror(saved_errno));
- } else {
- v_print("Slave %s: flags %04X.\n",
- slave_ifname, slave_flags.ifr_flags);
- }
-
- return res;
-}
-
-static int set_master_hwaddr(char *master_ifname, struct sockaddr *hwaddr)
-{
- unsigned char *addr = (unsigned char *)hwaddr->sa_data;
- struct ifreq ifr;
- int res = 0;
-
- strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ);
- memcpy(&(ifr.ifr_hwaddr), hwaddr, sizeof(struct sockaddr));
- res = ioctl(skfd, SIOCSIFHWADDR, &ifr);
- if (res < 0) {
- saved_errno = errno;
- v_print("Master '%s': Error: SIOCSIFHWADDR failed: %s\n",
- master_ifname, strerror(saved_errno));
- return res;
- } else {
- v_print("Master '%s': hardware address set to "
- "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x.\n",
- master_ifname, addr[0], addr[1], addr[2],
- addr[3], addr[4], addr[5]);
- }
-
- return res;
-}
-
-static int set_slave_hwaddr(char *slave_ifname, struct sockaddr *hwaddr)
-{
- unsigned char *addr = (unsigned char *)hwaddr->sa_data;
- struct ifreq ifr;
- int res = 0;
-
- strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ);
- memcpy(&(ifr.ifr_hwaddr), hwaddr, sizeof(struct sockaddr));
- res = ioctl(skfd, SIOCSIFHWADDR, &ifr);
- if (res < 0) {
- saved_errno = errno;
-
- v_print("Slave '%s': Error: SIOCSIFHWADDR failed: %s\n",
- slave_ifname, strerror(saved_errno));
-
- if (saved_errno == EBUSY) {
- v_print(" The device is busy: it must be idle "
- "before running this command.\n");
- } else if (saved_errno == EOPNOTSUPP) {
- v_print(" The device does not support setting "
- "the MAC address.\n"
- " Your kernel likely does not support slave "
- "devices.\n");
- } else if (saved_errno == EINVAL) {
- v_print(" The device's address type does not match "
- "the master's address type.\n");
- }
- return res;
- } else {
- v_print("Slave '%s': hardware address set to "
- "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x.\n",
- slave_ifname, addr[0], addr[1], addr[2],
- addr[3], addr[4], addr[5]);
- }
-
- return res;
-}
-
-static int set_slave_mtu(char *slave_ifname, int mtu)
-{
- struct ifreq ifr;
- int res = 0;
-
- ifr.ifr_mtu = mtu;
- strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ);
-
- res = ioctl(skfd, SIOCSIFMTU, &ifr);
- if (res < 0) {
- saved_errno = errno;
- v_print("Slave '%s': Error: SIOCSIFMTU failed: %s\n",
- slave_ifname, strerror(saved_errno));
- } else {
- v_print("Slave '%s': MTU set to %d.\n", slave_ifname, mtu);
- }
-
- return res;
-}
-
-static int set_if_flags(char *ifname, short flags)
-{
- struct ifreq ifr;
- int res = 0;
-
- ifr.ifr_flags = flags;
- strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
-
- res = ioctl(skfd, SIOCSIFFLAGS, &ifr);
- if (res < 0) {
- saved_errno = errno;
- v_print("Interface '%s': Error: SIOCSIFFLAGS failed: %s\n",
- ifname, strerror(saved_errno));
- } else {
- v_print("Interface '%s': flags set to %04X.\n", ifname, flags);
- }
-
- return res;
-}
-
-static int set_if_up(char *ifname, short flags)
-{
- return set_if_flags(ifname, flags | IFF_UP);
-}
-
-static int set_if_down(char *ifname, short flags)
-{
- return set_if_flags(ifname, flags & ~IFF_UP);
-}
-
-static int clear_if_addr(char *ifname)
-{
- struct ifreq ifr;
- int res = 0;
-
- strncpy(ifr.ifr_name, ifname, IFNAMSIZ);
- ifr.ifr_addr.sa_family = AF_INET;
- memset(ifr.ifr_addr.sa_data, 0, sizeof(ifr.ifr_addr.sa_data));
-
- res = ioctl(skfd, SIOCSIFADDR, &ifr);
- if (res < 0) {
- saved_errno = errno;
- v_print("Interface '%s': Error: SIOCSIFADDR failed: %s\n",
- ifname, strerror(saved_errno));
- } else {
- v_print("Interface '%s': address cleared\n", ifname);
- }
-
- return res;
-}
-
-static int set_if_addr(char *master_ifname, char *slave_ifname)
-{
- struct ifreq ifr;
- int res;
- unsigned char *ipaddr;
- int i;
- struct {
- char *req_name;
- char *desc;
- int g_ioctl;
- int s_ioctl;
- } ifra[] = {
- {"IFADDR", "addr", SIOCGIFADDR, SIOCSIFADDR},
- {"DSTADDR", "destination addr", SIOCGIFDSTADDR, SIOCSIFDSTADDR},
- {"BRDADDR", "broadcast addr", SIOCGIFBRDADDR, SIOCSIFBRDADDR},
- {"NETMASK", "netmask", SIOCGIFNETMASK, SIOCSIFNETMASK},
- {NULL, NULL, 0, 0},
- };
-
- for (i = 0; ifra[i].req_name; i++) {
- strncpy(ifr.ifr_name, master_ifname, IFNAMSIZ);
- res = ioctl(skfd, ifra[i].g_ioctl, &ifr);
- if (res < 0) {
- int saved_errno = errno;
-
- v_print("Interface '%s': Error: SIOCG%s failed: %s\n",
- master_ifname, ifra[i].req_name,
- strerror(saved_errno));
-
- ifr.ifr_addr.sa_family = AF_INET;
- memset(ifr.ifr_addr.sa_data, 0,
- sizeof(ifr.ifr_addr.sa_data));
- }
-
- strncpy(ifr.ifr_name, slave_ifname, IFNAMSIZ);
- res = ioctl(skfd, ifra[i].s_ioctl, &ifr);
- if (res < 0) {
- int saved_errno = errno;
-
- v_print("Interface '%s': Error: SIOCS%s failed: %s\n",
- slave_ifname, ifra[i].req_name,
- strerror(saved_errno));
-
- }
-
- ipaddr = (unsigned char *)ifr.ifr_addr.sa_data;
- v_print("Interface '%s': set IP %s to %d.%d.%d.%d\n",
- slave_ifname, ifra[i].desc,
- ipaddr[0], ipaddr[1], ipaddr[2], ipaddr[3]);
- }
-
- return 0;
-}
-
-/*
- * Local variables:
- * version-control: t
- * kept-new-versions: 5
- * c-indent-level: 4
- * c-basic-offset: 4
- * tab-width: 4
- * compile-command: "gcc -Wall -Wstrict-prototypes -O -I/usr/src/linux/include ifenslave.c -o ifenslave"
- * End:
- */
-
diff --git a/Documentation/networking/igb.txt b/Documentation/networking/igb.txt
index 9a2a037..4ebbd65 100644
--- a/Documentation/networking/igb.txt
+++ b/Documentation/networking/igb.txt
@@ -1,8 +1,8 @@
-Linux* Base Driver for Intel(R) Network Connection
-==================================================
+Linux* Base Driver for Intel(R) Ethernet Network Connection
+===========================================================
Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2010 Intel Corporation.
+Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
@@ -36,6 +36,53 @@ Default Value: 0
This parameter adds support for SR-IOV. It causes the driver to spawn up to
max_vfs worth of virtual function.
+QueuePairs
+----------
+Valid Range: 0-1
+Default Value: 1 (TX and RX will be paired onto one interrupt vector)
+
+If set to 0, when MSI-X is enabled, the TX and RX will attempt to occupy
+separate vectors.
+
+This option can be overridden to 1 if there are not sufficient interrupts
+available. This can occur if any combination of RSS, VMDQ, and max_vfs
+results in more than 4 queues being used.
+
+Node
+----
+Valid Range: 0-n
+Default Value: -1 (off)
+
+ 0 - n: where n is the number of the NUMA node that should be used to
+ allocate memory for this adapter port.
+ -1: uses the driver default of allocating memory on whichever processor is
+ running insmod/modprobe.
+
+ The Node parameter will allow you to pick which NUMA node you want to have
+ the adapter allocate memory from. All driver structures, in-memory queues,
+ and receive buffers will be allocated on the node specified. This parameter
+ is only useful when interrupt affinity is specified, otherwise some portion
+ of the time the interrupt could run on a different core than the memory is
+ allocated on, causing slower memory access and impacting throughput, CPU, or
+ both.
+
+EEE
+---
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+ A link between two EEE-compliant devices will result in periodic bursts of
+ data followed by long periods where in the link is in an idle state. This Low
+ Power Idle (LPI) state is supported in both 1Gbps and 100Mbps link speeds.
+ NOTE: EEE support requires autonegotiation.
+
+DMAC
+----
+Valid Range: 0-1
+Default Value: 1 (enabled)
+ Enables or disables DMA Coalescing feature.
+
+
Additional Configurations
=========================
@@ -55,10 +102,10 @@ Additional Configurations
- The maximum MTU setting for Jumbo Frames is 9216. This value coincides
with the maximum Jumbo Frames size of 9234 bytes.
- - Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or
- loss of link.
+ - Using Jumbo frames at 10 or 100 Mbps is not supported and may result in
+ poor performance or loss of link.
- Ethtool
+ ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The latest
@@ -106,6 +153,14 @@ Additional Configurations
Where n=the VF that attempted to do the spoofing.
+ Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool
+ ------------------------------------------------------------
+ You can set a MAC address of a Virtual Function (VF), a default VLAN and the
+ rate limit using the IProute2 tool. Download the latest version of the
+ iproute2 tool from Sourceforge if your version does not have all the
+ features you require.
+
+
Support
=======
diff --git a/Documentation/networking/igbvf.txt b/Documentation/networking/igbvf.txt
index cbfe4ee..40db17a 100644
--- a/Documentation/networking/igbvf.txt
+++ b/Documentation/networking/igbvf.txt
@@ -1,8 +1,8 @@
-Linux* Base Driver for Intel(R) Network Connection
-==================================================
+Linux* Base Driver for Intel(R) Ethernet Network Connection
+===========================================================
Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2010 Intel Corporation.
+Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
@@ -55,7 +55,7 @@ networking link on the left to search for your adapter:
Additional Configurations
=========================
- Ethtool
+ ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The ethtool
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 3458d63..3c12d9a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -183,7 +183,7 @@ tcp_early_retrans - INTEGER
for triggering fast retransmit when the amount of outstanding data is
small and when no previously unsent data can be transmitted (such
that limited transmit could be used). Also controls the use of
- Tail loss probe (TLP) that converts RTOs occuring due to tail
+ Tail loss probe (TLP) that converts RTOs occurring due to tail
losses into fast recovery (draft-dukkipati-tcpm-tcp-loss-probe-01).
Possible values:
0 disables ER
@@ -267,17 +267,6 @@ tcp_max_orphans - INTEGER
more aggressively. Let me to remind again: each orphan eats
up to ~64K of unswappable memory.
-tcp_max_ssthresh - INTEGER
- Limited Slow-Start for TCP with large congestion windows (cwnd) defined in
- RFC3742. Limited slow-start is a mechanism to limit growth of the cwnd
- on the region where cwnd is larger than tcp_max_ssthresh. TCP increases cwnd
- by at most tcp_max_ssthresh segments, and by at least tcp_max_ssthresh/2
- segments per RTT when the cwnd is above tcp_max_ssthresh.
- If TCP connection increased cwnd to thousands (or tens of thousands) segments,
- and thousands of packets were being dropped during slow-start, you can set
- tcp_max_ssthresh to improve performance for new TCP connection.
- Default: 0 (off)
-
tcp_max_syn_backlog - INTEGER
Maximal number of remembered connection requests, which have not
received an acknowledgment from connecting client.
@@ -440,6 +429,10 @@ tcp_syncookies - BOOLEAN
SYN flood warnings in logs not being really flooded, your server
is seriously misconfigured.
+ If you want to test which effects syncookies have to your
+ network connections you can set this knob to 2 to enable
+ unconditionally generation of syncookies.
+
tcp_fastopen - INTEGER
Enable TCP Fast Open feature (draft-ietf-tcpm-fastopen) to send data
in the opening SYN packet. To use this feature, the client application
@@ -447,7 +440,7 @@ tcp_fastopen - INTEGER
connect() to perform a TCP handshake automatically.
The values (bitmap) are
- 1: Enables sending data in the opening SYN on the client.
+ 1: Enables sending data in the opening SYN on the client w/ MSG_FASTOPEN.
2: Enables TCP Fast Open on the server side, i.e., allowing data in
a SYN packet to be accepted and passed to the application before
3-way hand shake finishes.
@@ -460,7 +453,7 @@ tcp_fastopen - INTEGER
different ways of setting max_qlen without the TCP_FASTOPEN socket
option.
- Default: 0
+ Default: 1
Note that the client & server side Fast Open flags (1 and 2
respectively) must be also enabled before the rest of flags can take
@@ -478,6 +471,15 @@ tcp_syn_retries - INTEGER
tcp_timestamps - BOOLEAN
Enable timestamps as defined in RFC1323.
+tcp_min_tso_segs - INTEGER
+ Minimal number of segments per TSO frame.
+ Since linux-3.12, TCP does an automatic sizing of TSO frames,
+ depending on flow rate, instead of filling 64Kbytes packets.
+ For specific usages, it's possible to force TCP to build big
+ TSO frames. Note that TCP stack might split too big TSO packets
+ if available window is too small.
+ Default: 2
+
tcp_tso_win_divisor - INTEGER
This allows control over what percentage of the congestion window
can be consumed by a single TSO frame.
@@ -516,6 +518,19 @@ tcp_wmem - vector of 3 INTEGERs: min, default, max
this value is ignored.
Default: between 64K and 4MB, depending on RAM size.
+tcp_notsent_lowat - UNSIGNED INTEGER
+ A TCP socket can control the amount of unsent bytes in its write queue,
+ thanks to TCP_NOTSENT_LOWAT socket option. poll()/select()/epoll()
+ reports POLLOUT events if the amount of unsent bytes is below a per
+ socket value, and if the write queue is not full. sendmsg() will
+ also not add new buffers if the limit is hit.
+
+ This global variable controls the amount of unsent data for
+ sockets not using TCP_NOTSENT_LOWAT. For these sockets, a change
+ to the global variable has immediate effect.
+
+ Default: UINT_MAX (0xFFFFFFFF)
+
tcp_workaround_signed_windows - BOOLEAN
If set, assume no receipt of a window scaling option means the
remote TCP is broken and treats the window as a signed quantity.
@@ -562,9 +577,6 @@ tcp_limit_output_bytes - INTEGER
typical pfifo_fast qdiscs.
tcp_limit_output_bytes limits the number of bytes on qdisc
or device to reduce artificial RTT/cwnd and reduce bufferbloat.
- Note: For GSO/TSO enabled flows, we try to have at least two
- packets in flight. Reducing tcp_limit_output_bytes might also
- reduce the size of individual GSO packet (64KB being the max)
Default: 131072
tcp_challenge_ack_limit - INTEGER
@@ -685,6 +697,15 @@ ip_dynaddr - BOOLEAN
occurs.
Default: 0
+ip_early_demux - BOOLEAN
+ Optimize input packet processing down to one demux for
+ certain kinds of local sockets. Currently we only do this
+ for established TCP sockets.
+
+ It may add an additional cost for pure routing workloads that
+ reduces overall throughput, in such case you should disable it.
+ Default: 1
+
icmp_echo_ignore_all - BOOLEAN
If set non-zero, then the kernel will ignore all ICMP ECHO
requests sent to it.
@@ -729,7 +750,7 @@ icmp_ignore_bogus_error_responses - BOOLEAN
frames. Such violations are normally logged via a kernel warning.
If this is set to TRUE, the kernel will not give such warnings, which
will avoid log file clutter.
- Default: FALSE
+ Default: 1
icmp_errors_use_inbound_ifaddr - BOOLEAN
@@ -1013,7 +1034,15 @@ disable_policy - BOOLEAN
disable_xfrm - BOOLEAN
Disable IPSEC encryption on this interface, whatever the policy
+igmpv2_unsolicited_report_interval - INTEGER
+ The interval in milliseconds in which the next unsolicited
+ IGMPv1 or IGMPv2 report retransmit will take place.
+ Default: 10000 (10 seconds)
+igmpv3_unsolicited_report_interval - INTEGER
+ The interval in milliseconds in which the next unsolicited
+ IGMPv3 report retransmit will take place.
+ Default: 1000 (1 seconds)
tag - INTEGER
Allows you to write a number, which can be used as required.
@@ -1305,6 +1334,27 @@ ndisc_notify - BOOLEAN
1 - Generate unsolicited neighbour advertisements when device is brought
up or hardware address changes.
+mldv1_unsolicited_report_interval - INTEGER
+ The interval in milliseconds in which the next unsolicited
+ MLDv1 report retransmit will take place.
+ Default: 10000 (10 seconds)
+
+mldv2_unsolicited_report_interval - INTEGER
+ The interval in milliseconds in which the next unsolicited
+ MLDv2 report retransmit will take place.
+ Default: 1000 (1 second)
+
+force_mld_version - INTEGER
+ 0 - (default) No enforcement of a MLD version, MLDv1 fallback allowed
+ 1 - Enforce to use MLD version 1
+ 2 - Enforce to use MLD version 2
+
+suppress_frag_ndisc - INTEGER
+ Control RFC 6980 (Security Implications of IPv6 Fragmentation
+ with IPv6 Neighbor Discovery) behavior:
+ 1 - (default) discard fragmented neighbor discovery packets
+ 0 - allow fragmented neighbor discovery packets
+
icmp/*:
ratelimit - INTEGER
Limit the maximal rates for sending ICMPv6 packets.
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
index 9573d0c..7a3c047 100644
--- a/Documentation/networking/ipvs-sysctl.txt
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -181,6 +181,19 @@ snat_reroute - BOOLEAN
always be the same as the original route so it is an optimisation
to disable snat_reroute and avoid the recalculation.
+sync_persist_mode - INTEGER
+ default 0
+
+ Controls the synchronisation of connections when using persistence
+
+ 0: All types of connections are synchronised
+ 1: Attempt to reduce the synchronisation traffic depending on
+ the connection type. For persistent services avoid synchronisation
+ for normal connections, do it only for persistence templates.
+ In such case, for TCP and SCTP it may need enabling sloppy_tcp and
+ sloppy_sctp flags on backup servers. For non-persistent services
+ such optimization is not applied, mode 0 is assumed.
+
sync_version - INTEGER
default 1
diff --git a/Documentation/networking/ixgb.txt b/Documentation/networking/ixgb.txt
index d75a1f9..1e0c045 100644
--- a/Documentation/networking/ixgb.txt
+++ b/Documentation/networking/ixgb.txt
@@ -1,7 +1,7 @@
-Linux Base Driver for 10 Gigabit Intel(R) Network Connection
-=============================================================
+Linux Base Driver for 10 Gigabit Intel(R) Ethernet Network Connection
+=====================================================================
-October 9, 2007
+March 14, 2011
Contents
@@ -274,9 +274,9 @@ Additional Configurations
-------------------------------------------------
Configuring a network driver to load properly when the system is started is
distribution dependent. Typically, the configuration process involves adding
- an alias line to files in /etc/modprobe.d/ as well as editing other system
- startup scripts and/or configuration files. Many popular Linux distributions
- ship with tools to make these changes for you. To learn the proper way to
+ an alias line to /etc/modprobe.conf as well as editing other system startup
+ scripts and/or configuration files. Many popular Linux distributions ship
+ with tools to make these changes for you. To learn the proper way to
configure a network device for your system, refer to your distribution
documentation. If during this process you are asked for the driver or module
name, the name for the Linux Base Driver for the Intel 10GbE Family of
@@ -306,7 +306,7 @@ Additional Configurations
with the maximum Jumbo Frames size of 16128.
- Ethtool
+ ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
diagnostics, as well as displaying statistical information. The ethtool
diff --git a/Documentation/networking/ixgbe.txt b/Documentation/networking/ixgbe.txt
index af77ed3..96ccceb 100644
--- a/Documentation/networking/ixgbe.txt
+++ b/Documentation/networking/ixgbe.txt
@@ -1,8 +1,9 @@
-Linux Base Driver for 10 Gigabit PCI Express Intel(R) Network Connection
-========================================================================
+Linux* Base Driver for the Intel(R) Ethernet 10 Gigabit PCI Express Family of
+Adapters
+=============================================================================
-Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2010 Intel Corporation.
+Intel 10 Gigabit Linux driver.
+Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
@@ -16,8 +17,8 @@ Contents
Identifying Your Adapter
========================
-The driver in this release is compatible with 82598 and 82599-based Intel
-Network Connections.
+The driver in this release is compatible with 82598, 82599 and X540-based
+Intel Network Connections.
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
@@ -72,7 +73,7 @@ cables that comply with SFF-8431 v4.1 and SFF-8472 v10.4 specifications.
Laser turns off for SFP+ when ifconfig down
-------------------------------------------
"ifconfig down" turns off the laser for 82599-based SFP+ fiber adapters.
-"ifconfig up" turns on the later.
+"ifconfig up" turns on the laser.
82598-BASED ADAPTERS
@@ -118,6 +119,93 @@ NOTE: For 82598 backplane cards entering 1 gig mode, flow control default
behavior is changed to off. Flow control in 1 gig mode on these devices can
lead to Tx hangs.
+Intel(R) Ethernet Flow Director
+-------------------------------
+Supports advanced filters that direct receive packets by their flows to
+different queues. Enables tight control on routing a flow in the platform.
+Matches flows and CPU cores for flow affinity. Supports multiple parameters
+for flexible flow classification and load balancing.
+
+Flow director is enabled only if the kernel is multiple TX queue capable.
+
+An included script (set_irq_affinity.sh) automates setting the IRQ to CPU
+affinity.
+
+You can verify that the driver is using Flow Director by looking at the counter
+in ethtool: fdir_miss and fdir_match.
+
+Other ethtool Commands:
+To enable Flow Director
+ ethtool -K ethX ntuple on
+To add a filter
+ Use -U switch. e.g., ethtool -U ethX flow-type tcp4 src-ip 0x178000a
+ action 1
+To see the list of filters currently present:
+ ethtool -u ethX
+
+Perfect Filter: Perfect filter is an interface to load the filter table that
+funnels all flow into queue_0 unless an alternative queue is specified using
+"action". In that case, any flow that matches the filter criteria will be
+directed to the appropriate queue.
+
+If the queue is defined as -1, filter will drop matching packets.
+
+To account for filter matches and misses, there are two stats in ethtool:
+fdir_match and fdir_miss. In addition, rx_queue_N_packets shows the number of
+packets processed by the Nth queue.
+
+NOTE: Receive Packet Steering (RPS) and Receive Flow Steering (RFS) are not
+compatible with Flow Director. IF Flow Director is enabled, these will be
+disabled.
+
+The following three parameters impact Flow Director.
+
+FdirMode
+--------
+Valid Range: 0-2 (0=off, 1=ATR, 2=Perfect filter mode)
+Default Value: 1
+
+ Flow Director filtering modes.
+
+FdirPballoc
+-----------
+Valid Range: 0-2 (0=64k, 1=128k, 2=256k)
+Default Value: 0
+
+ Flow Director allocated packet buffer size.
+
+AtrSampleRate
+--------------
+Valid Range: 1-100
+Default Value: 20
+
+ Software ATR Tx packet sample rate. For example, when set to 20, every 20th
+ packet, looks to see if the packet will create a new flow.
+
+Node
+----
+Valid Range: 0-n
+Default Value: 1 (off)
+
+ 0 - n: where n is the number of NUMA nodes (i.e. 0 - 3) currently online in
+ your system
+ 1: turns this option off
+
+ The Node parameter will allow you to pick which NUMA node you want to have
+ the adapter allocate memory on.
+
+max_vfs
+-------
+Valid Range: 1-63
+Default Value: 0
+
+ If the value is greater than 0 it will also force the VMDq parameter to be 1
+ or more.
+
+ This parameter adds support for SR-IOV. It causes the driver to spawn up to
+ max_vfs worth of virtual function.
+
+
Additional Configurations
=========================
@@ -221,9 +309,10 @@ http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
Known Issues
============
- Enabling SR-IOV in a 32-bit Microsoft* Windows* Server 2008 Guest OS using
- Intel (R) 82576-based GbE or Intel (R) 82599-based 10GbE controller under KVM
- -----------------------------------------------------------------------------
+ Enabling SR-IOV in a 32-bit or 64-bit Microsoft* Windows* Server 2008/R2
+ Guest OS using Intel (R) 82576-based GbE or Intel (R) 82599-based 10GbE
+ controller under KVM
+ ------------------------------------------------------------------------
KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This
includes traditional PCIe devices, as well as SR-IOV-capable devices using
Intel 82576-based and 82599-based controllers.
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
index 5a91a41..53d8d2a 100644
--- a/Documentation/networking/ixgbevf.txt
+++ b/Documentation/networking/ixgbevf.txt
@@ -1,8 +1,8 @@
-Linux* Base Driver for Intel(R) Network Connection
-==================================================
+Linux* Base Driver for Intel(R) Ethernet Network Connection
+===========================================================
Intel Gigabit Linux driver.
-Copyright(c) 1999 - 2010 Intel Corporation.
+Copyright(c) 1999 - 2013 Intel Corporation.
Contents
========
diff --git a/Documentation/networking/l2tp.txt b/Documentation/networking/l2tp.txt
index e63fc1f..c74434d 100644
--- a/Documentation/networking/l2tp.txt
+++ b/Documentation/networking/l2tp.txt
@@ -197,7 +197,7 @@ state information because the file format is subject to change. It is
implemented to provide extra debug information to help diagnose
problems.) Users should use the netlink API.
-/proc/net/pppol2tp is also provided for backwards compaibility with
+/proc/net/pppol2tp is also provided for backwards compatibility with
the original pppol2tp driver. It lists information about L2TPv2
tunnels and sessions only. Its use is discouraged.
diff --git a/Documentation/networking/netdev-FAQ.txt b/Documentation/networking/netdev-FAQ.txt
new file mode 100644
index 0000000..0fe1c6e
--- /dev/null
+++ b/Documentation/networking/netdev-FAQ.txt
@@ -0,0 +1,224 @@
+
+Information you need to know about netdev
+-----------------------------------------
+
+Q: What is netdev?
+
+A: It is a mailing list for all network-related Linux stuff. This includes
+ anything found under net/ (i.e. core code like IPv6) and drivers/net
+ (i.e. hardware specific drivers) in the Linux source tree.
+
+ Note that some subsystems (e.g. wireless drivers) which have a high volume
+ of traffic have their own specific mailing lists.
+
+ The netdev list is managed (like many other Linux mailing lists) through
+ VGER ( http://vger.kernel.org/ ) and archives can be found below:
+
+ http://marc.info/?l=linux-netdev
+ http://www.spinics.net/lists/netdev/
+
+ Aside from subsystems like that mentioned above, all network-related Linux
+ development (i.e. RFC, review, comments, etc.) takes place on netdev.
+
+Q: How do the changes posted to netdev make their way into Linux?
+
+A: There are always two trees (git repositories) in play. Both are driven
+ by David Miller, the main network maintainer. There is the "net" tree,
+ and the "net-next" tree. As you can probably guess from the names, the
+ net tree is for fixes to existing code already in the mainline tree from
+ Linus, and net-next is where the new code goes for the future release.
+ You can find the trees here:
+
+ http://git.kernel.org/?p=linux/kernel/git/davem/net.git
+ http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git
+
+Q: How often do changes from these trees make it to the mainline Linus tree?
+
+A: To understand this, you need to know a bit of background information
+ on the cadence of Linux development. Each new release starts off with
+ a two week "merge window" where the main maintainers feed their new
+ stuff to Linus for merging into the mainline tree. After the two weeks,
+ the merge window is closed, and it is called/tagged "-rc1". No new
+ features get mainlined after this -- only fixes to the rc1 content
+ are expected. After roughly a week of collecting fixes to the rc1
+ content, rc2 is released. This repeats on a roughly weekly basis
+ until rc7 (typically; sometimes rc6 if things are quiet, or rc8 if
+ things are in a state of churn), and a week after the last vX.Y-rcN
+ was done, the official "vX.Y" is released.
+
+ Relating that to netdev: At the beginning of the 2-week merge window,
+ the net-next tree will be closed - no new changes/features. The
+ accumulated new content of the past ~10 weeks will be passed onto
+ mainline/Linus via a pull request for vX.Y -- at the same time,
+ the "net" tree will start accumulating fixes for this pulled content
+ relating to vX.Y
+
+ An announcement indicating when net-next has been closed is usually
+ sent to netdev, but knowing the above, you can predict that in advance.
+
+ IMPORTANT: Do not send new net-next content to netdev during the
+ period during which net-next tree is closed.
+
+ Shortly after the two weeks have passed (and vX.Y-rc1 is released), the
+ tree for net-next reopens to collect content for the next (vX.Y+1) release.
+
+ If you aren't subscribed to netdev and/or are simply unsure if net-next
+ has re-opened yet, simply check the net-next git repository link above for
+ any new networking-related commits.
+
+ The "net" tree continues to collect fixes for the vX.Y content, and
+ is fed back to Linus at regular (~weekly) intervals. Meaning that the
+ focus for "net" is on stabilization and bugfixes.
+
+ Finally, the vX.Y gets released, and the whole cycle starts over.
+
+Q: So where are we now in this cycle?
+
+A: Load the mainline (Linus) page here:
+
+ http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git
+
+ and note the top of the "tags" section. If it is rc1, it is early
+ in the dev cycle. If it was tagged rc7 a week ago, then a release
+ is probably imminent.
+
+Q: How do I indicate which tree (net vs. net-next) my patch should be in?
+
+A: Firstly, think whether you have a bug fix or new "next-like" content.
+ Then once decided, assuming that you use git, use the prefix flag, i.e.
+
+ git format-patch --subject-prefix='PATCH net-next' start..finish
+
+ Use "net" instead of "net-next" (always lower case) in the above for
+ bug-fix net content. If you don't use git, then note the only magic in
+ the above is just the subject text of the outgoing e-mail, and you can
+ manually change it yourself with whatever MUA you are comfortable with.
+
+Q: I sent a patch and I'm wondering what happened to it. How can I tell
+ whether it got merged?
+
+A: Start by looking at the main patchworks queue for netdev:
+
+ http://patchwork.ozlabs.org/project/netdev/list/
+
+ The "State" field will tell you exactly where things are at with
+ your patch.
+
+Q: The above only says "Under Review". How can I find out more?
+
+A: Generally speaking, the patches get triaged quickly (in less than 48h).
+ So be patient. Asking the maintainer for status updates on your
+ patch is a good way to ensure your patch is ignored or pushed to
+ the bottom of the priority list.
+
+Q: How can I tell what patches are queued up for backporting to the
+ various stable releases?
+
+A: Normally Greg Kroah-Hartman collects stable commits himself, but
+ for networking, Dave collects up patches he deems critical for the
+ networking subsystem, and then hands them off to Greg.
+
+ There is a patchworks queue that you can see here:
+ http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
+
+ It contains the patches which Dave has selected, but not yet handed
+ off to Greg. If Greg already has the patch, then it will be here:
+ http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git
+
+ A quick way to find whether the patch is in this stable-queue is
+ to simply clone the repo, and then git grep the mainline commit ID, e.g.
+
+ stable-queue$ git grep -l 284041ef21fdf2e
+ releases/3.0.84/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
+ releases/3.4.51/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
+ releases/3.9.8/ipv6-fix-possible-crashes-in-ip6_cork_release.patch
+ stable/stable-queue$
+
+Q: I see a network patch and I think it should be backported to stable.
+ Should I request it via "stable@vger.kernel.org" like the references in
+ the kernel's Documentation/stable_kernel_rules.txt file say?
+
+A: No, not for networking. Check the stable queues as per above 1st to see
+ if it is already queued. If not, then send a mail to netdev, listing
+ the upstream commit ID and why you think it should be a stable candidate.
+
+ Before you jump to go do the above, do note that the normal stable rules
+ in Documentation/stable_kernel_rules.txt still apply. So you need to
+ explicitly indicate why it is a critical fix and exactly what users are
+ impacted. In addition, you need to convince yourself that you _really_
+ think it has been overlooked, vs. having been considered and rejected.
+
+ Generally speaking, the longer it has had a chance to "soak" in mainline,
+ the better the odds that it is an OK candidate for stable. So scrambling
+ to request a commit be added the day after it appears should be avoided.
+
+Q: I have created a network patch and I think it should be backported to
+ stable. Should I add a "Cc: stable@vger.kernel.org" like the references
+ in the kernel's Documentation/ directory say?
+
+A: No. See above answer. In short, if you think it really belongs in
+ stable, then ensure you write a decent commit log that describes who
+ gets impacted by the bugfix and how it manifests itself, and when the
+ bug was introduced. If you do that properly, then the commit will
+ get handled appropriately and most likely get put in the patchworks
+ stable queue if it really warrants it.
+
+ If you think there is some valid information relating to it being in
+ stable that does _not_ belong in the commit log, then use the three
+ dash marker line as described in Documentation/SubmittingPatches to
+ temporarily embed that information into the patch that you send.
+
+Q: Someone said that the comment style and coding convention is different
+ for the networking content. Is this true?
+
+A: Yes, in a largely trivial way. Instead of this:
+
+ /*
+ * foobar blah blah blah
+ * another line of text
+ */
+
+ it is requested that you make it look like this:
+
+ /* foobar blah blah blah
+ * another line of text
+ */
+
+Q: I am working in existing code that has the former comment style and not the
+ latter. Should I submit new code in the former style or the latter?
+
+A: Make it the latter style, so that eventually all code in the domain of
+ netdev is of this format.
+
+Q: I found a bug that might have possible security implications or similar.
+ Should I mail the main netdev maintainer off-list?
+
+A: No. The current netdev maintainer has consistently requested that people
+ use the mailing lists and not reach out directly. If you aren't OK with
+ that, then perhaps consider mailing "security@kernel.org" or reading about
+ http://oss-security.openwall.org/wiki/mailing-lists/distros
+ as possible alternative mechanisms.
+
+Q: What level of testing is expected before I submit my change?
+
+A: If your changes are against net-next, the expectation is that you
+ have tested by layering your changes on top of net-next. Ideally you
+ will have done run-time testing specific to your change, but at a
+ minimum, your changes should survive an "allyesconfig" and an
+ "allmodconfig" build without new warnings or failures.
+
+Q: Any other tips to help ensure my net/net-next patch gets OK'd?
+
+A: Attention to detail. Re-read your own work as if you were the
+ reviewer. You can start with using checkpatch.pl, perhaps even
+ with the "--strict" flag. But do not be mindlessly robotic in
+ doing so. If your change is a bug-fix, make sure your commit log
+ indicates the end-user visible symptom, the underlying reason as
+ to why it happens, and then if necessary, explain why the fix proposed
+ is the best way to get things done. Don't mangle whitespace, and as
+ is common, don't mis-indent function arguments that span multiple lines.
+ If it is your first patch, mail it to yourself so you can test apply
+ it to an unpatched tree to confirm infrastructure didn't mangle it.
+
+ Finally, go back and read Documentation/SubmittingPatches to be
+ sure you are not repeating some common mistake documented there.
diff --git a/Documentation/networking/netdevices.txt b/Documentation/networking/netdevices.txt
index c7ecc70..0b1cf6b 100644
--- a/Documentation/networking/netdevices.txt
+++ b/Documentation/networking/netdevices.txt
@@ -10,12 +10,12 @@ network devices.
struct net_device allocation rules
==================================
Network device structures need to persist even after module is unloaded and
-must be allocated with kmalloc. If device has registered successfully,
-it will be freed on last use by free_netdev. This is required to handle the
-pathologic case cleanly (example: rmmod mydriver </sys/class/net/myeth/mtu )
+must be allocated with alloc_netdev_mqs() and friends.
+If device has registered successfully, it will be freed on last use
+by free_netdev(). This is required to handle the pathologic case cleanly
+(example: rmmod mydriver </sys/class/net/myeth/mtu )
-There are routines in net_init.c to handle the common cases of
-alloc_etherdev, alloc_netdev. These reserve extra space for driver
+alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver
private data which gets freed when the network device is freed. If
separately allocated data is attached to the network device
(netdev_priv(dev)) then it is up to the module exit handler to free that.
diff --git a/Documentation/networking/netlink_mmap.txt b/Documentation/networking/netlink_mmap.txt
index 1c2dab4..b261229 100644
--- a/Documentation/networking/netlink_mmap.txt
+++ b/Documentation/networking/netlink_mmap.txt
@@ -45,7 +45,7 @@ processing.
Conversion of the reception path involves calling poll() on the file
descriptor, once the socket is readable the frames from the ring are
-processsed in order until no more messages are available, as indicated by
+processed in order until no more messages are available, as indicated by
a status word in the frame header.
On kernel side, in order to make use of memory mapped I/O on receive, the
@@ -54,9 +54,9 @@ it will use an allocated socket buffer as usual and the contents will be
copied to the ring on transmission, nullifying most of the performance gains.
Dumps of kernel databases automatically support memory mapped I/O.
-Conversion of the transmit path involves changing message contruction to
+Conversion of the transmit path involves changing message construction to
use memory from the TX ring instead of (usually) a buffer declared on the
-stack and setting up the frame header approriately. Optionally poll() can
+stack and setting up the frame header appropriately. Optionally poll() can
be used to wait for free frames in the TX ring.
Structured and definitions for using memory mapped I/O are contained in
@@ -65,8 +65,8 @@ Structured and definitions for using memory mapped I/O are contained in
RX and TX rings
----------------
-Each ring contains a number of continous memory blocks, containing frames of
-fixed size dependant on the parameters used for ring setup.
+Each ring contains a number of continuous memory blocks, containing frames of
+fixed size dependent on the parameters used for ring setup.
Ring: [ block 0 ]
[ frame 0 ]
@@ -80,7 +80,7 @@ Ring: [ block 0 ]
[ frame 2 * n + 1 ]
The blocks are only visible to the kernel, from the point of view of user-space
-the ring just contains the frames in a continous memory zone.
+the ring just contains the frames in a continuous memory zone.
The ring parameters used for setting up the ring are defined as follows:
@@ -91,7 +91,7 @@ struct nl_mmap_req {
unsigned int nm_frame_nr;
};
-Frames are grouped into blocks, where each block is a continous region of memory
+Frames are grouped into blocks, where each block is a continuous region of memory
and holds nm_block_size / nm_frame_size frames. The total number of frames in
the ring is nm_frame_nr. The following invariants hold:
@@ -113,8 +113,8 @@ Some parameters are constrained, specifically:
- nm_frame_nr must equal the actual number of frames as specified above.
-When the kernel can't allocate phsyically continous memory for a ring block,
-it will fall back to use physically discontinous memory. This might affect
+When the kernel can't allocate physically continuous memory for a ring block,
+it will fall back to use physically discontinuous memory. This might affect
performance negatively, in order to avoid this the nm_frame_size parameter
should be chosen to be as small as possible for the required frame size and
the number of blocks should be increased instead.
@@ -231,7 +231,7 @@ Ring setup:
if (setsockopt(fd, NETLINK_TX_RING, &req, sizeof(req)) < 0)
exit(1)
- /* Calculate size of each invididual ring */
+ /* Calculate size of each individual ring */
ring_size = req.nm_block_nr * req.nm_block_size;
/* Map RX/TX rings. The TX ring is located after the RX ring */
@@ -274,9 +274,9 @@ This example assumes some ring parameters of the ring setup are available.
/* Get next frame header */
hdr = rx_ring + frame_offset;
- if (hdr->nm_status == NL_MMAP_STATUS_VALID)
+ if (hdr->nm_status == NL_MMAP_STATUS_VALID) {
/* Regular memory mapped frame */
- nlh = (void *hdr) + NL_MMAP_HDRLEN;
+ nlh = (void *)hdr + NL_MMAP_HDRLEN;
len = hdr->nm_len;
/* Release empty message immediately. May happen
diff --git a/Documentation/networking/openvswitch.txt b/Documentation/networking/openvswitch.txt
index 8fa2dd1..37c20ee 100644
--- a/Documentation/networking/openvswitch.txt
+++ b/Documentation/networking/openvswitch.txt
@@ -91,6 +91,46 @@ Often we ellipsize arguments not important to the discussion, e.g.:
in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
+Wildcarded flow key format
+--------------------------
+
+A wildcarded flow is described with two sequences of Netlink attributes
+passed over the Netlink socket. A flow key, exactly as described above, and an
+optional corresponding flow mask.
+
+A wildcarded flow can represent a group of exact match flows. Each '1' bit
+in the mask specifies a exact match with the corresponding bit in the flow key.
+A '0' bit specifies a don't care bit, which will match either a '1' or '0' bit
+of a incoming packet. Using wildcarded flow can improve the flow set up rate
+by reduce the number of new flows need to be processed by the user space program.
+
+Support for the mask Netlink attribute is optional for both the kernel and user
+space program. The kernel can ignore the mask attribute, installing an exact
+match flow, or reduce the number of don't care bits in the kernel to less than
+what was specified by the user space program. In this case, variations in bits
+that the kernel does not implement will simply result in additional flow setups.
+The kernel module will also work with user space programs that neither support
+nor supply flow mask attributes.
+
+Since the kernel may ignore or modify wildcard bits, it can be difficult for
+the userspace program to know exactly what matches are installed. There are
+two possible approaches: reactively install flows as they miss the kernel
+flow table (and therefore not attempt to determine wildcard changes at all)
+or use the kernel's response messages to determine the installed wildcards.
+
+When interacting with userspace, the kernel should maintain the match portion
+of the key exactly as originally installed. This will provides a handle to
+identify the flow for all future operations. However, when reporting the
+mask of an installed flow, the mask should include any restrictions imposed
+by the kernel.
+
+The behavior when using overlapping wildcarded flows is undefined. It is the
+responsibility of the user space program to ensure that any incoming packet
+can match at most one flow, wildcarded or not. The current implementation
+performs best-effort detection of overlapping wildcarded flows and may reject
+some but not all of them. However, this behavior may change in future versions.
+
+
Basic rule for evolving flow keys
---------------------------------
diff --git a/Documentation/networking/operstates.txt b/Documentation/networking/operstates.txt
index 9769457..355c6d8 100644
--- a/Documentation/networking/operstates.txt
+++ b/Documentation/networking/operstates.txt
@@ -89,8 +89,8 @@ packets. The name 'carrier' and the inversion are historical, think of
it as lower layer.
Note that for certain kind of soft-devices, which are not managing any
-real hardware, there is possible to set this bit from userpsace.
-One should use TVL IFLA_CARRIER to do so.
+real hardware, it is possible to set this bit from userspace. One
+should use TVL IFLA_CARRIER to do so.
netif_carrier_ok() can be used to query that bit.
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index 23dd80e..c012236 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -543,6 +543,14 @@ TPACKET_V2 --> TPACKET_V3:
In the AF_PACKET fanout mode, packet reception can be load balanced among
processes. This also works in combination with mmap(2) on packet sockets.
+Currently implemented fanout policies are:
+
+ - PACKET_FANOUT_HASH: schedule to socket by skb's rxhash
+ - PACKET_FANOUT_LB: schedule to socket by round-robin
+ - PACKET_FANOUT_CPU: schedule to socket by CPU packet arrives on
+ - PACKET_FANOUT_RND: schedule to socket by random selection
+ - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
+
Minimal example code by David S. Miller (try things like "./test eth0 hash",
"./test eth0 lb", etc.):
@@ -704,6 +712,12 @@ So it seems to be a good candidate to be used with packet fanout.
Minimal example code by Daniel Borkmann based on Chetan Loke's lolpcap (compile
it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.):
+/* Written from scratch, but kernel-to-user space API usage
+ * dissected from lolpcap:
+ * Copyright 2011, Chetan Loke <loke.chetan@gmail.com>
+ * License: GPL, version 2.0
+ */
+
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
@@ -722,27 +736,6 @@ it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.):
#include <linux/if_ether.h>
#include <linux/ip.h>
-#define BLOCK_SIZE (1 << 22)
-#define FRAME_SIZE 2048
-
-#define NUM_BLOCKS 64
-#define NUM_FRAMES ((BLOCK_SIZE * NUM_BLOCKS) / FRAME_SIZE)
-
-#define BLOCK_RETIRE_TOV_IN_MS 64
-#define BLOCK_PRIV_AREA_SZ 13
-
-#define ALIGN_8(x) (((x) + 8 - 1) & ~(8 - 1))
-
-#define BLOCK_STATUS(x) ((x)->h1.block_status)
-#define BLOCK_NUM_PKTS(x) ((x)->h1.num_pkts)
-#define BLOCK_O2FP(x) ((x)->h1.offset_to_first_pkt)
-#define BLOCK_LEN(x) ((x)->h1.blk_len)
-#define BLOCK_SNUM(x) ((x)->h1.seq_num)
-#define BLOCK_O2PRIV(x) ((x)->offset_to_priv)
-#define BLOCK_PRIV(x) ((void *) ((uint8_t *) (x) + BLOCK_O2PRIV(x)))
-#define BLOCK_HDR_LEN (ALIGN_8(sizeof(struct block_desc)))
-#define BLOCK_PLUS_PRIV(sz_pri) (BLOCK_HDR_LEN + ALIGN_8((sz_pri)))
-
#ifndef likely
# define likely(x) __builtin_expect(!!(x), 1)
#endif
@@ -765,7 +758,7 @@ struct ring {
static unsigned long packets_total = 0, bytes_total = 0;
static sig_atomic_t sigint = 0;
-void sighandler(int num)
+static void sighandler(int num)
{
sigint = 1;
}
@@ -774,6 +767,8 @@ static int setup_socket(struct ring *ring, char *netdev)
{
int err, i, fd, v = TPACKET_V3;
struct sockaddr_ll ll;
+ unsigned int blocksiz = 1 << 22, framesiz = 1 << 11;
+ unsigned int blocknum = 64;
fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
if (fd < 0) {
@@ -788,13 +783,12 @@ static int setup_socket(struct ring *ring, char *netdev)
}
memset(&ring->req, 0, sizeof(ring->req));
- ring->req.tp_block_size = BLOCK_SIZE;
- ring->req.tp_frame_size = FRAME_SIZE;
- ring->req.tp_block_nr = NUM_BLOCKS;
- ring->req.tp_frame_nr = NUM_FRAMES;
- ring->req.tp_retire_blk_tov = BLOCK_RETIRE_TOV_IN_MS;
- ring->req.tp_sizeof_priv = BLOCK_PRIV_AREA_SZ;
- ring->req.tp_feature_req_word |= TP_FT_REQ_FILL_RXHASH;
+ ring->req.tp_block_size = blocksiz;
+ ring->req.tp_frame_size = framesiz;
+ ring->req.tp_block_nr = blocknum;
+ ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
+ ring->req.tp_retire_blk_tov = 60;
+ ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
sizeof(ring->req));
@@ -804,8 +798,7 @@ static int setup_socket(struct ring *ring, char *netdev)
}
ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
- PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED,
- fd, 0);
+ PROT_READ | PROT_WRITE, MAP_SHARED | MAP_LOCKED, fd, 0);
if (ring->map == MAP_FAILED) {
perror("mmap");
exit(1);
@@ -835,58 +828,6 @@ static int setup_socket(struct ring *ring, char *netdev)
return fd;
}
-#ifdef __checked
-static uint64_t prev_block_seq_num = 0;
-
-void assert_block_seq_num(struct block_desc *pbd)
-{
- if (unlikely(prev_block_seq_num + 1 != BLOCK_SNUM(pbd))) {
- printf("prev_block_seq_num:%"PRIu64", expected seq:%"PRIu64" != "
- "actual seq:%"PRIu64"\n", prev_block_seq_num,
- prev_block_seq_num + 1, (uint64_t) BLOCK_SNUM(pbd));
- exit(1);
- }
-
- prev_block_seq_num = BLOCK_SNUM(pbd);
-}
-
-static void assert_block_len(struct block_desc *pbd, uint32_t bytes, int block_num)
-{
- if (BLOCK_NUM_PKTS(pbd)) {
- if (unlikely(bytes != BLOCK_LEN(pbd))) {
- printf("block:%u with %upackets, expected len:%u != actual len:%u\n",
- block_num, BLOCK_NUM_PKTS(pbd), bytes, BLOCK_LEN(pbd));
- exit(1);
- }
- } else {
- if (unlikely(BLOCK_LEN(pbd) != BLOCK_PLUS_PRIV(BLOCK_PRIV_AREA_SZ))) {
- printf("block:%u, expected len:%lu != actual len:%u\n",
- block_num, BLOCK_HDR_LEN, BLOCK_LEN(pbd));
- exit(1);
- }
- }
-}
-
-static void assert_block_header(struct block_desc *pbd, const int block_num)
-{
- uint32_t block_status = BLOCK_STATUS(pbd);
-
- if (unlikely((block_status & TP_STATUS_USER) == 0)) {
- printf("block:%u, not in TP_STATUS_USER\n", block_num);
- exit(1);
- }
-
- assert_block_seq_num(pbd);
-}
-#else
-static inline void assert_block_header(struct block_desc *pbd, const int block_num)
-{
-}
-static void assert_block_len(struct block_desc *pbd, uint32_t bytes, int block_num)
-{
-}
-#endif
-
static void display(struct tpacket3_hdr *ppd)
{
struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
@@ -916,37 +857,27 @@ static void display(struct tpacket3_hdr *ppd)
static void walk_block(struct block_desc *pbd, const int block_num)
{
- int num_pkts = BLOCK_NUM_PKTS(pbd), i;
+ int num_pkts = pbd->h1.num_pkts, i;
unsigned long bytes = 0;
- unsigned long bytes_with_padding = BLOCK_PLUS_PRIV(BLOCK_PRIV_AREA_SZ);
struct tpacket3_hdr *ppd;
- assert_block_header(pbd, block_num);
-
- ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd + BLOCK_O2FP(pbd));
+ ppd = (struct tpacket3_hdr *) ((uint8_t *) pbd +
+ pbd->h1.offset_to_first_pkt);
for (i = 0; i < num_pkts; ++i) {
bytes += ppd->tp_snaplen;
- if (ppd->tp_next_offset)
- bytes_with_padding += ppd->tp_next_offset;
- else
- bytes_with_padding += ALIGN_8(ppd->tp_snaplen + ppd->tp_mac);
-
display(ppd);
- ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd + ppd->tp_next_offset);
- __sync_synchronize();
+ ppd = (struct tpacket3_hdr *) ((uint8_t *) ppd +
+ ppd->tp_next_offset);
}
- assert_block_len(pbd, bytes_with_padding, block_num);
-
packets_total += num_pkts;
bytes_total += bytes;
}
-void flush_block(struct block_desc *pbd)
+static void flush_block(struct block_desc *pbd)
{
- BLOCK_STATUS(pbd) = TP_STATUS_KERNEL;
- __sync_synchronize();
+ pbd->h1.block_status = TP_STATUS_KERNEL;
}
static void teardown_socket(struct ring *ring, int fd)
@@ -962,7 +893,7 @@ int main(int argc, char **argp)
socklen_t len;
struct ring ring;
struct pollfd pfd;
- unsigned int block_num = 0;
+ unsigned int block_num = 0, blocks = 64;
struct block_desc *pbd;
struct tpacket_stats_v3 stats;
@@ -984,15 +915,15 @@ int main(int argc, char **argp)
while (likely(!sigint)) {
pbd = (struct block_desc *) ring.rd[block_num].iov_base;
-retry_block:
- if ((BLOCK_STATUS(pbd) & TP_STATUS_USER) == 0) {
+
+ if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
poll(&pfd, 1, -1);
- goto retry_block;
+ continue;
}
walk_block(pbd, block_num);
flush_block(pbd);
- block_num = (block_num + 1) % NUM_BLOCKS;
+ block_num = (block_num + 1) % blocks;
}
len = sizeof(stats);
diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt
index 60d05eb..b89bc82e 100644
--- a/Documentation/networking/rxrpc.txt
+++ b/Documentation/networking/rxrpc.txt
@@ -144,7 +144,7 @@ An overview of the RxRPC protocol:
(*) Calls use ACK packets to handle reliability. Data packets are also
explicitly sequenced per call.
- (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs.
+ (*) There are two types of positive acknowledgment: hard-ACKs and soft-ACKs.
A hard-ACK indicates to the far side that all the data received to a point
has been received and processed; a soft-ACK indicates that the data has
been received but may yet be discarded and re-requested. The sender may
diff --git a/Documentation/networking/scaling.txt b/Documentation/networking/scaling.txt
index 579994a..ca6977f 100644
--- a/Documentation/networking/scaling.txt
+++ b/Documentation/networking/scaling.txt
@@ -163,6 +163,64 @@ and unnecessary. If there are fewer hardware queues than CPUs, then
RPS might be beneficial if the rps_cpus for each queue are the ones that
share the same memory domain as the interrupting CPU for that queue.
+==== RPS Flow Limit
+
+RPS scales kernel receive processing across CPUs without introducing
+reordering. The trade-off to sending all packets from the same flow
+to the same CPU is CPU load imbalance if flows vary in packet rate.
+In the extreme case a single flow dominates traffic. Especially on
+common server workloads with many concurrent connections, such
+behavior indicates a problem such as a misconfiguration or spoofed
+source Denial of Service attack.
+
+Flow Limit is an optional RPS feature that prioritizes small flows
+during CPU contention by dropping packets from large flows slightly
+ahead of those from small flows. It is active only when an RPS or RFS
+destination CPU approaches saturation. Once a CPU's input packet
+queue exceeds half the maximum queue length (as set by sysctl
+net.core.netdev_max_backlog), the kernel starts a per-flow packet
+count over the last 256 packets. If a flow exceeds a set ratio (by
+default, half) of these packets when a new packet arrives, then the
+new packet is dropped. Packets from other flows are still only
+dropped once the input packet queue reaches netdev_max_backlog.
+No packets are dropped when the input packet queue length is below
+the threshold, so flow limit does not sever connections outright:
+even large flows maintain connectivity.
+
+== Interface
+
+Flow limit is compiled in by default (CONFIG_NET_FLOW_LIMIT), but not
+turned on. It is implemented for each CPU independently (to avoid lock
+and cache contention) and toggled per CPU by setting the relevant bit
+in sysctl net.core.flow_limit_cpu_bitmap. It exposes the same CPU
+bitmap interface as rps_cpus (see above) when called from procfs:
+
+ /proc/sys/net/core/flow_limit_cpu_bitmap
+
+Per-flow rate is calculated by hashing each packet into a hashtable
+bucket and incrementing a per-bucket counter. The hash function is
+the same that selects a CPU in RPS, but as the number of buckets can
+be much larger than the number of CPUs, flow limit has finer-grained
+identification of large flows and fewer false positives. The default
+table has 4096 buckets. This value can be modified through sysctl
+
+ net.core.flow_limit_table_len
+
+The value is only consulted when a new table is allocated. Modifying
+it does not update active tables.
+
+== Suggested Configuration
+
+Flow limit is useful on systems with many concurrent connections,
+where a single connection taking up 50% of a CPU indicates a problem.
+In such environments, enable the feature on all CPUs that handle
+network rx interrupts (as set in /proc/irq/N/smp_affinity).
+
+The feature depends on the input packet queue length to exceed
+the flow limit threshold (50%) + the flow history length (256).
+Setting net.core.netdev_max_backlog to either 1000 or 10000
+performed well in experiments.
+
RFS: Receive Flow Steering
==========================
diff --git a/Documentation/networking/sctp.txt b/Documentation/networking/sctp.txt
index 0c790a7..97b810c 100644
--- a/Documentation/networking/sctp.txt
+++ b/Documentation/networking/sctp.txt
@@ -19,7 +19,6 @@ of SCTP that is RFC 2960 compliant and provides an programming interface
referred to as the UDP-style API of the Sockets Extensions for SCTP, as
proposed in IETF Internet-Drafts.
-
Caveats:
-lksctp can be built as statically or as a module. However, be aware that
@@ -33,6 +32,4 @@ For more information, please visit the lksctp project website:
http://www.sf.net/projects/lksctp
Or contact the lksctp developers through the mailing list:
- <lksctp-developers@lists.sourceforge.net>
-
-
+ <linux-sctp@vger.kernel.org>
diff --git a/Documentation/networking/stmmac.txt b/Documentation/networking/stmmac.txt
index 654d2e5..cdd916d 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -123,6 +123,7 @@ struct plat_stmmacenet_data {
int bugged_jumbo;
int pmt;
int force_sf_dma_mode;
+ int force_thresh_dma_mode;
int riwt_off;
void (*fix_mac_speed)(void *priv, unsigned int speed);
void (*bus_setup)(void __iomem *ioaddr);
@@ -159,6 +160,8 @@ Where:
o pmt: core has the embedded power module (optional).
o force_sf_dma_mode: force DMA to use the Store and Forward mode
instead of the Threshold.
+ o force_thresh_dma_mode: force DMA to use the Threshold mode other than
+ the Store and Forward mode.
o riwt_off: force to disable the RX watchdog feature and switch to NAPI mode.
o fix_mac_speed: this callback is used for modifying some syscfg registers
(on ST SoCs) according to the link speed negotiated by the
@@ -172,7 +175,7 @@ Where:
registers.
o custom_cfg/custom_data: this is a custom configuration that can be passed
while initializing the resources.
- o bsp_priv: another private poiter.
+ o bsp_priv: another private pointer.
For MDIO bus The we have:
@@ -268,7 +271,7 @@ reset procedure etc).
o dwmac1000_dma.c: dma functions for the GMAC chip;
o dwmac1000.h: specific header file for the GMAC;
o dwmac100_core: MAC 100 core and dma code;
- o dwmac100_dma.c: dma funtions for the MAC chip;
+ o dwmac100_dma.c: dma functions for the MAC chip;
o dwmac1000.h: specific header file for the MAC;
o dwmac_lib.c: generic DMA functions shared among chips;
o enh_desc.c: functions for handling enhanced descriptors;
@@ -361,4 +364,4 @@ Auto-negotiated Link Parter Ability.
10) TODO:
o XGMAC is not supported.
o Complete the TBI & RTBI support.
- o extened VLAN support for 3.70a SYNP GMAC.
+ o extend VLAN support for 3.70a SYNP GMAC.
diff --git a/Documentation/networking/tproxy.txt b/Documentation/networking/tproxy.txt
index 7b5996d..ec11429 100644
--- a/Documentation/networking/tproxy.txt
+++ b/Documentation/networking/tproxy.txt
@@ -2,9 +2,8 @@ Transparent proxy support
=========================
This feature adds Linux 2.2-like transparent proxy support to current kernels.
-To use it, enable NETFILTER_TPROXY, the socket match and the TPROXY target in
-your kernel config. You will need policy routing too, so be sure to enable that
-as well.
+To use it, enable the socket match and the TPROXY target in your kernel config.
+You will need policy routing too, so be sure to enable that as well.
1. Making non-local sockets work
diff --git a/Documentation/networking/vortex.txt b/Documentation/networking/vortex.txt
index b4038ff..97282da 100644
--- a/Documentation/networking/vortex.txt
+++ b/Documentation/networking/vortex.txt
@@ -68,7 +68,7 @@ Module parameters
There are several parameters which may be provided to the driver when
its module is loaded. These are usually placed in /etc/modprobe.d/*.conf
-configuretion files. Example:
+configuration files. Example:
options 3c59x debug=3 rx_copybreak=300
@@ -178,7 +178,7 @@ max_interrupt_work=N
The driver's interrupt service routine can handle many receive and
transmit packets in a single invocation. It does this in a loop.
- The value of max_interrupt_work governs how mnay times the interrupt
+ The value of max_interrupt_work governs how many times the interrupt
service routine will loop. The default value is 32 loops. If this
is exceeded the interrupt service routine gives up and generates a
warning message "eth0: Too much work in interrupt".
@@ -359,7 +359,7 @@ steps you should take:
- OK, it's a driver problem.
You need to generate a report. Typically this is an email to the
- maintainer and/or linux-net@vger.kernel.org. The maintainer's
+ maintainer and/or netdev@vger.kernel.org. The maintainer's
email address will be in the driver source or in the MAINTAINERS file.
- The contents of your report will vary a lot depending upon the
diff --git a/Documentation/networking/x25-iface.txt b/Documentation/networking/x25-iface.txt
index 78f662e..7f213b5 100644
--- a/Documentation/networking/x25-iface.txt
+++ b/Documentation/networking/x25-iface.txt
@@ -105,7 +105,7 @@ reduced by the following measures or a combination thereof:
later.
The lapb module interface was modified to support this. Its
data_indication() method should now transparently pass the
- netif_rx() return value to the (lapb mopdule) caller.
+ netif_rx() return value to the (lapb module) caller.
(2) Drivers for kernel versions 2.2.x should always check the global
variable netdev_dropping when a new frame is received. The driver
should only call netif_rx() if netdev_dropping is zero. Otherwise