Linux Kernel Networking - Implementation and Theory.pdf

Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Preface
Chapter 1: Introduction
The Linux Network Stack
The Network Device
New API (NAPI) in Network Devices
Receiving and Transmitting Packets
The Socket Buffer
The Linux Kernel Networking Development Model
Summary
Chapter 2: Netlink Sockets
The Netlink Family
Netlink Sockets Libraries
The sockaddr_nl Structure
Userspace Packages for Controlling TCP/IP Networking
Kernel Netlink Sockets
The Netlink Message Header
NETLINK_ROUTE Messages
Adding and Deleting a Routing Entry in a Routing Table
Generic Netlink Protocol
Creating and Sending Generic Netlink Messages
Socket Monitoring Interface
Summary
Quick Reference
Chapter 3: Internet Control Message Protocol (ICMP)
ICMPv4
ICMPv4 Initialization
ICMPv4 Header
Receiving ICMPv4 Messages
Sending ICMPv4 Messages: “Destination Unreachable”
Code 2: ICMP_PROT_UNREACH (Protocol Unreachable)
Code 3: ICMP_PORT_UNREACH (“Port Unreachable”)
Code 4: ICMP_FRAG_NEEDED
Code 5: ICMP_SR_FAILED
ICMPv6
ICMPv6 Initialization
ICMPv6 Header
Receiving ICMPv6 Messages
Sending ICMPv6 Messages
Example: Sending “Hop Limit Time Exceeded” ICMPv6 Messages
Example: Sending “Fragment Reassembly Time Exceeded” ICMPv6 Messages
Example: Sending “Destination Unreachable”/“Port Unreachable” ICMPv6 Messages
Example: Sending “Fragmentation Needed” ICMPv6 Messages
Example: Sending “Parameter Problem” ICMPv6 Messages
ICMP Sockets (“Ping sockets”)
Summary
Quick Reference
Methods
int icmp_rcv(struct sk_buff *skb);
extern void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info);
struct icmp6hdr *icmp6_hdr(const struct sk_buff *skb);
void icmpv6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info);
void icmpv6_param_prob(struct sk_buff *skb, u8 code, int pos);
Tables
procfs entries
sysctl_icmp_echo_ignore_all
sysctl_icmp_echo_ignore_broadcasts
sysctl_icmp_ignore_bogus_error_responses
sysctl_icmp_ratelimit
sysctl_icmp_ratemask
sysctl_icmp_errors_use_inbound_ifaddr
Creating “Destination Unreachable” Messages with iptables
Chapter 4: IPv4
IPv4 Header
IPv4 Initialization
Receiving IPv4 Packets
Receiving IPv4 Multicast Packets
IP Options
Timestamp Option
Record Route Option
IP Options and Fragmentation
Building IP Options
Sending IPv4 Packets
Fragmentation
Fast Path
Slow Path
Defragmentation
Forwarding
Summary
Quick Reference
Methods
int ip_queue_xmit(struct sk_buff *skb, struct flowi *fl);
int ip_append_data(struct sock *sk, struct flowi4 *fl4, int getfrag(void *from, char *to, int offset, int len, int odd, str...
struct sk_buff *ip_make_skb(struct sock *sk, struct flowi4 *fl4, int getfrag(void *from, char *to, int offset, int len, int...
int ip_generic_getfrag(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb);
static int icmp_glue_bits(void *from, char *to, int offset, int len, int odd, struct sk_buff *skb);
int ip_options_compile(struct net *net, struct ip_options *opt, struct sk_buff *skb);
void ip_options_fragment(struct sk_buff *skb);
void ip_options_build(struct sk_buff *skb, struct ip_options *opt, __be32 daddr, struct rtable *rt, int is_frag);
void ip_forward_options(struct sk_buff *skb);
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev);
ip_rcv_options(struct sk_buff *skb);
int ip_options_rcv_srr(struct sk_buff *skb);
int ip_forward(struct sk_buff *skb);
static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt, struct sk_buff *skb, struct mfc_cache *c, int vifi);
static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, void *from, size_t length, struct rtable **rtp, unsigned in...
int ip_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
int ip_defrag(struct sk_buff *skb, u32 user);
bool skb_has_frag_list(const struct sk_buff *skb);
int ip_local_deliver(struct sk_buff *skb);
bool ip_is_fragment(const struct iphdr *iph);
int ip_decrease_ttl(struct iphdr *iph);
int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk, __be32 saddr, __be32 daddr, struct ip_options_rcu *opt);
int ip_mr_input(struct sk_buff *skb);
int ip_mr_forward(struct net *net, struct mr_table *mrt, struct sk_buff *skb, struct mfc_cache *cache, int local);
bool ip_call_ra_chain(struct sk_buff *skb);
Macros
IPCB(skb)
FRAG_CB(skb)
int NF_HOOK(uint8_t pf, unsigned int hook, struct sk_buff *skb, struct net_device *in, struct net_device *out, int (*okfn)(...
int NF_HOOK_COND(uint8_t pf, unsigned int hook, struct sk_buff *skb, struct net_device *in, struct net_device *out, int (*o...
IPOPT_COPIED()
Chapter 5: The IPv4 Routing Subsystem
Forwarding and the FIB
Performing a Lookup in the Routing Subsystem
FIB Tables
FIB Info
Caching
Nexthop (fib_nh)
FIB Nexthop Exceptions
Policy Routing
FIB Alias (fib_alias)
ICMPv4 Redirect Message
Generating an ICMPv4 Redirect Message
Receiving an ICMPv4 Redirect Message
IPv4 Routing Cache
Rx Path
Tx Path
Summary
Quick Reference
Methods
int fib_table_insert(struct fib_table *tb, struct fib_config *cfg);
int fib_table_delete(struct fib_table *tb, struct fib_config *cfg);
struct fib_info *fib_create_info(struct fib_config *cfg);
void free_fib_info(struct fib_info *fi);
void fib_alias_accessed(struct fib_alias *fa);
void ip_rt_send_redirect(struct sk_buff *skb);
void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flowi4 *fl4, bool kill_route);
void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw, u32 pmtu, unsigned long expires);
u32 dst_metric(const struct dst_entry *dst, int metric);
struct fib_table *fib_trie_table(u32 id);
struct leaf *fib_find_node(struct trie *t, u32 key);
Macros
FIB_RES_GW()
FIB_RES_DEV()
FIB_RES_OIF()
FIB_RES_NH()
IN_DEV_FORWARD()
IN_DEV_RX_REDIRECTS()
IN_DEV_TX_REDIRECTS()
IS_LEAF()
IS_TNODE()
change_nexthops()
Tables
Route Flags
Chapter 6: Advanced Routing
Multicast Routing
The IGMP Protocol
The Multicast Routing Table
The Multicast Forwarding Cache (MFC)
Multicast Router
The Vif Device
IPv4 Multicast Rx Path
The ip_mr_forward() Method
The ipmr_queue_xmit() Method
The ipmr_forward_finish() Method
The TTL in Multicast Traffic
Policy Routing
Policy Routing Management
Policy Routing Implementation
Multipath Routing
Summary
Quick Reference
Methods
int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsigned int optlen);
int ip_mroute_getsockopt(struct sock *sk, int optname, char __user *optval, int __user *optlen);
struct mr_table *ipmr_new_table(struct net *net, u32 id);
void ipmr_free_table(struct mr_table *mrt);
int ip_mc_join_group(struct sock *sk, struct ip_mreqn *imr);
static struct mfc_cache *ipmr_cache_find(struct mr_table *mrt, __be32 origin, __be32 mcastgrp);
bool ipv4_is_multicast(__be32 addr);
int ip_mr_input(struct sk_buff *skb);
struct mfc_cache *ipmr_cache_alloc(void);
static struct mfc_cache *ipmr_cache_alloc_unres(void);
void fib_select_multipath(struct fib_result *res);
int dev_set_allmulti(struct net_device *dev, int inc);
int igmp_rcv(struct sk_buff *skb);
static int ipmr_mfc_add(struct net *net, struct mr_table *mrt, struct mfcctl *mfc, int mrtsock, int parent);
static int ipmr_mfc_delete(struct mr_table *mrt, struct mfcctl *mfc, int parent);
static int vif_add(struct net *net, struct mr_table *mrt, struct vifctl *vifc, int mrtsock);
static int vif_delete(struct mr_table *mrt, int vifi, int notify, struct list_head *head);
static void ipmr_expire_process(unsigned long arg);
static int ipmr_cache_report(struct mr_table *mrt, struct sk_buff *pkt, vifi_t vifi, int assert);
static int ipmr_device_event(struct notifier_block *this, unsigned long event, void *ptr);
static void mrtsock_destruct(struct sock *sk);
Macros
MFC_HASH(a,b)
VIF_EXISTS(_mrt, _idx)
Procfs Multicast Entries
/proc/net/ip_mr_vif
/proc/net/ip_mr_cache
Table
Chapter 7: Linux Neighbouring Subsystem
The Neighbouring Subsystem Core
Creating and Freeing a Neighbour
Interaction Between Userspace and the Neighbouring Subsystem
Handling Network Events
The ARP protocol (IPv4)
ARP: Sending Solicitation Requests
ARP: Receiving Solicitation Requests and Replies
The arp_process( ) Method
The arp_process( ) Method—Extracting Headers:
The arp_process( ) Method—arp_ignore( ) and arp_filter( ) Methods
The NDISC Protocol (IPv6)
Duplicate Address Detection (DAD)
NDISC: Sending Solicitation Requests
NDISC: Receiving Neighbour Solicitations and Advertisements
Summary
Quick Reference
Methods
void neigh_table_init(struct neigh_table *tbl)
void neigh_table_init_no_netlink(struct neigh_table *tbl)
int neigh_table_clear(struct neigh_table *tbl)
struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device *dev)
struct neigh_hash_table *neigh_hash_alloc(unsigned int shift)
struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey, struct net_device *dev, bool want_ref)
int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
void neigh_probe(struct neighbour *neigh)
int neigh_forced_gc(struct neigh_table *tbl)
void neigh_periodic_work(struct work_struct *work)
static void neigh_timer_handler(unsigned long arg)
struct neighbour *__neigh_lookup(struct neigh_table *tbl, const void *pkey, struct net_device *dev, int creat)
neigh_hh_init(struct neighbour *n, struct dst_entry *dst)
void __init arp_init(void)
int arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
int arp_constructor(struct neighbour *neigh)
int arp_process(struct sk_buff *skb)
void arp_solicit(struct neighbour *neigh, struct sk_buff *skb)
void arp_send(int type, int ptype, __be32 dest_ip, struct net_device *dev, __be32 src_ip, const unsigned char *dest_hw, con...
void arp_xmit(struct sk_buff *skb)
struct arphdr *arp_hdr(const struct sk_buff *skb)
int arp_mc_map(__be32 addr, u8 *haddr, struct net_device *dev, int dir)
static inline int arp_fwd_proxy(struct in_device *in_dev, struct net_device *dev, struct rtable *rt)
static inline int arp_fwd_pvlan(struct in_device *in_dev, struct net_device *dev, struct rtable *rt, __be32 sip, __be32 tip)
int arp_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
int ndisc_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
int ndisc_rcv(struct sk_buff *skb)
static int neigh_blackhole(struct neighbour *neigh, struct sk_buff *skb)
static void ndisc_recv_ns(struct sk_buff *skb) and static void ndisc_recv_na(struct sk_buff *skb)
static void ndisc_recv_rs(struct sk_buff *skb) and static void ndisc_router_discovery(struct sk_buff *skb)
int ndisc_mc_map(const struct in6_addr *addr, char *buf, struct net_device *dev, int dir)
int ndisc_constructor(struct neighbour *neigh)
void ndisc_solicit(struct neighbour *neigh, struct sk_buff *skb)
int icmpv6_rcv(struct sk_buff *skb)
bool ipv6_addr_any(const struct in6_addr *a)
int inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b)
Macros
IN_DEV_PROXY_ARP(in_dev)
IN_DEV_PROXY_ARP_PVLAN(in_dev)
IN_DEV_ARPFILTER(in_dev)
IN_DEV_ARP_ACCEPT(in_dev)
IN_DEV_ARP_ANNOUNCE(in_dev)
IN_DEV_ARP_IGNORE(in_dev)
IN_DEV_ARP_NOTIFY(in_dev)
IN_DEV_SHARED_MEDIA(in_dev)
IN_DEV_ROUTE_LOCALNET(in_dev)
neigh_hold()
The neigh_statistics Structure
Table
Chapter 8: IPv6
IPv6 – Short Introduction
IPv6 Addresses
Special Addresses
Multicast Addresses
Special Multicast Addresses
IPv6 Header
Extension Headers
IPv6 Initialization
Autoconfiguration
Receiving IPv6 Packets
Local Delivery
Forwarding
Receiving IPv6 Multicast Packets
Multicast Listener Discovery (MLD)
Joining and Leaving a Multicast Group
MLDv2 Multicast Listener Report
Multicast Source Filtering (MSF)
Joining and Leaving a Multicast Group with Source Filtering
Example: Using MCAST_MSFILTER for Source Filtering
Sending IPv6 Packets
IPv6 Routing
Summary
Quick Reference
Methods
bool ipv6_addr_any(const struct in6_addr *a);
bool ipv6_addr_equal(const struct in6_addr *a1, const struct in6_addr *a2);
static inline void ipv6_addr_set(struct in6_addr *addr, __be32 w1, __be32 w2, __be32 w3, __be32 w4);
bool ipv6_addr_is_multicast(const struct in6_addr *addr);
bool ipv6_ext_hdr(u8 nexthdr);
struct ipv6hdr *ipv6_hdr(const struct sk_buff *skb);
struct inet6_dev *in6_dev_get(const struct net_device *dev);
bool ipv6_is_mld(struct sk_buff *skb, int nexthdr, int offset);
bool raw6_local_deliver(struct sk_buff *, int);
int ipv6_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev);
bool ipv6_accept_ra(struct inet6_dev *idev);
void ip6_route_input(struct sk_buff *skb);
int ip6_forward(struct sk_buff *skb);
struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk, struct flowi6 *fl6);
void in6_dev_hold(struct inet6_dev *idev); and void __in6_dev_put(struct inet6_dev *idev);
int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf);
int ip6_mc_input(struct sk_buff *skb);
int ip6_mr_input(struct sk_buff *skb);
int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr);
int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr);
bool ipv6_chk_mcast_addr(struct net_device *dev, const struct in6_addr *group, const struct in6_addr *src_addr);
inline void addrconf_addr_solict_mult(const struct in6_addr *addr, struct in6_addr *solicited)
void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr);
int ipv6_sock_mc_join(struct sock *sk, int ifindex, const struct in6_addr *addr);
int ipv6_sock_mc_drop(struct sock *sk, int ifindex, const struct in6_addr *addr);
int inet6_add_protocol(const struct inet6_protocol *prot, unsigned char protocol);
int ipv6_parse_hopopts(struct sk_buff *skb);
int ip6_local_out(struct sk_buff *skb);
int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *));
void icmpv6_param_prob(struct sk_buff *skb, u8 code, int pos);
int do_ipv6_setsockopt(struct sock *sk, int level, int optname, char __user *optval, unsigned int optlen); static int do_ip...
int igmp6_event_query(struct sk_buff *skb);
Macros
IPV6_ADDR_MC_SCOPE( )
IPV6_ADDR_MC_FLAG_TRANSIENT( )
IPV6_ADDR_MC_FLAG_PREFIX( )
IPV6_ADDR_MC_FLAG_RENDEZVOUS( )
Tables
Special Addresses
Routing Tables Management in IPv6
Chapter 9: Netfilter
Netfilter Frameworks
Netfilter Hooks
Registration of Netfilter Hooks
Connection Tracking
Connection Tracking Initialization
Connection Tracking Entries
Connection Tracking Helpers and Expectations
IPTables
Delivery to the Local Host
Forwarding the Packet
Network Address Translation (NAT)
NAT initialization
NAT Hook Callbacks and Connection Tracking Hook Callbacks
NAT Hook Callbacks
Connection Tracking Extensions
Summary
Quick Reference
Methods
struct xt_table *ipt_register_table(struct net *net, const struct xt_table *table, const struct ipt_replace *repl);
void ipt_unregister_table(struct net *net, struct xt_table *table);
int nf_register_hook(struct nf_hook_ops *reg);
int nf_register_hooks(struct nf_hook_ops *reg, unsigned int n);
void nf_unregister_hook(struct nf_hook_ops *reg);
void nf_unregister_hooks(struct nf_hook_ops *reg, unsigned int n);
static inline void nf_conntrack_get(struct nf_conntrack *nfct);
static inline void nf_conntrack_put(struct nf_conntrack *nfct);
int nf_conntrack_helper_register(struct nf_conntrack_helper *me);
static inline struct nf_conn *resolve_normal_ct(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb, unsigned int da...
struct nf_conntrack_tuple_hash *init_conntrack(struct net *net, struct nf_conn *tmpl, const struct nf_conntrack_tuple *tupl...
static struct nf_conn *__nf_conntrack_alloc(struct net *net, u16 zone, const struct nf_conntrack_tuple *orig, const struct ...
int xt_register_target(struct xt_target *target);
void xt_unregister_target(struct xt_target *target);
int xt_register_targets(struct xt_target *target, unsigned int n);
void xt_unregister_targets(struct xt_target *target, unsigned int n);
int xt_register_match(struct xt_match *target);
void xt_unregister_match(struct xt_match *target);
int xt_register_matches(struct xt_match *match, unsigned int n);
void xt_unregister_matches(struct xt_match *match, unsigned int n);
int nf_ct_extend_register(struct nf_ct_ext_type *type);
void nf_ct_extend_unregister(struct nf_ct_ext_type *type);
int __init iptable_nat_init(void);
int __init nf_conntrack_ftp_init(void);
Macro
NF_CT_DIRECTION(hash)
Tables
Tools and Libraries
Chapter 10: IPsec
General
IKE (Internet Key Exchange)
IPsec and Cryptography
The XFRM Framework
XFRM Initialization
XFRM Policies
XFRM States (Security Associations)
ESP Implementation (IPv4)
IPv4 ESP Initialization
Receiving an IPsec Packet (Transport Mode)
Sending an IPsec Packet (Transport Mode)
XFRM Lookup
NAT Traversal in IPsec
NAT-T Mode of Operation
Summary
Quick Reference
Methods
bool xfrm_selector_match(const struct xfrm_selector *sel, const struct flowi *fl, unsigned short family);
int xfrm_policy_match(const struct xfrm_policy *pol, const struct flowi *fl, u8 type, u16 family, int dir);
struct xfrm_policy *xfrm_policy_alloc(struct net *net, gfp_t gfp);
void xfrm_policy_destroy(struct xfrm_policy *policy);
void xfrm_pol_hold(struct xfrm_policy *policy);
static inline void xfrm_pol_put(struct xfrm_policy *policy);
struct xfrm_state_afinfo *xfrm_state_get_afinfo(unsigned int family);
struct dst_entry *xfrm_bundle_create(struct xfrm_policy *policy, struct xfrm_state **xfrm, int nx, const struct flowi *fl, ...
int policy_to_flow_dir(int dir);
static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net, struct dst_entry *dst, const struct flowi *fl, int num_xf...
struct xfrm_dst *xfrm_alloc_dst(struct net *net, int family);
int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl);
int xfrm_policy_delete(struct xfrm_policy *pol, int dir);
int xfrm_state_add(struct xfrm_state *x);
int xfrm_state_delete(struct xfrm_state *x);
void __xfrm_state_destroy(struct xfrm_state *x);
int xfrm_state_walk(struct net *net, struct xfrm_state_walk *walk, int (*func)(struct xfrm_state *, int, void*), void *data...
struct xfrm_state *xfrm_state_alloc(struct net *net);
void xfrm_queue_purge(struct sk_buff_head *list);
int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
static struct dst_entry *make_blackhole(struct net *net, u16 family, struct dst_entry *dst_orig);
int xdst_queue_output(struct sk_buff *skb);
struct net *xs_net(struct xfrm_state *x);
struct net *xp_net(const struct xfrm_policy *xp);
int xfrm_policy_id2dir(u32 index);
int esp_input(struct xfrm_state *x, struct sk_buff *skb);
struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb);
int verify_newpolicy_info(struct xfrm_userpolicy_info *p);
Table
Chapter 11: Layer 4 Protocols
Sockets
Creating Sockets
UDP (User Datagram Protocol)
UDP Initialization
Sending Packets with UDP
Receiving Packets from the Network Layer (L3) with UDP
TCP (Transmission Control Protocol)
TCP Header
TCP Initialization
TCP Timers
TCP Socket Initialization
TCP Connection Setup
Receiving Packets from the Network Layer (L3) with TCP
Sending Packets with TCP
SCTP (Stream Control Transmission Protocol)
SCTP Packets and Chunks
SCTP Common Header
SCTP Chunk Header
SCTP Chunk
SCTP Associations
Setting Up an SCTP Association
Receiving Packets with SCTP
Sending Packets with SCTP
SCTP HEARTBEAT
SCTP Multistreaming
SCTP Multihoming
DCCP: The Datagram Congestion Control Protocol
DCCP Header
DCCP Initialization
DCCP Socket Initialization
Receiving Packets from the Network Layer (L3) with DCCP
Sending Packets with DCCP
DCCP and NAT
Summary
Quick Reference
Methods
int ip_cmsg_send(struct net *net, struct msghdr *msg, struct ipcm_cookie *ipc);
void sock_put(struct sock *sk);
void sock_hold(struct sock *sk);
int sock_create(int family, int type, int protocol, struct socket **res);
int sock_map_fd(struct socket *sock, int flags);
bool sock_flag(const struct sock *sk, enum sock_flags flag);
int tcp_v4_rcv(struct sk_buff *skb);
void tcp_init_sock(struct sock *sk);
struct tcphdr *tcp_hdr(const struct sk_buff *skb);
int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t size);
struct tcp_sock *tcp_sk(const struct sock *sk);
int udp_rcv(struct sk_buff *skb);
struct udphdr *udp_hdr(const struct sk_buff *skb);
int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len);
struct sctphdr *sctp_hdr(const struct sk_buff *skb);
struct sctp_sock *sctp_sk(const struct sock *sk);
int sctp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t msg_len);
struct sctp_association *sctp_association_new(const struct sctp_endpoint *ep, const struct sock *sk, sctp_scope_t scope, gf...
void sctp_association_free(struct sctp_association *asoc);
void sctp_chunk_hold(struct sctp_chunk *ch);
void sctp_chunk_put(struct sctp_chunk *ch);
int sctp_rcv(struct sk_buff *skb);
static int dccp_v4_rcv(struct sk_buff *skb);
int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, size_t len);
Macros
sctp_chunk_is_data()
Tables
Chapter 12: Wireless in Linux
Mac80211 Subsystem
The 802.11 MAC Header
The Frame Control
The Other 802.11 MAC Header Members
Network Topologies
Infrastructure BSS
IBSS, or Ad Hoc Mode
Power Save Mode
Entering Power Save Mode
Exiting Power Save Mode
Handling the Multicast/Broadcast Buffer
The Management Layer (MLME)
Scanning
Authentication
Association
Reassociation
Mac80211 Implementation
Rx Path
Tx Path
Fragmentation
Mac80211 debugfs
Wireless Modes
High Throughput (ieee802.11n)
Packet Aggregation
Block Ack Request (BAR)
Block Ack
Mesh Networking (802.11s)
HWMP Protocol
Setting Up a Mesh Network
Linux Wireless Development Process
Summary
Quick Reference
Methods
void ieee80211_send_bar(struct ieee80211_vif *vif, u8 *ra, u16 tid, u16 ssn);
int ieee80211_start_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid, u16 timeout);
int ieee80211_stop_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid);
static void ieee80211_send_addba_request(struct ieee80211_sub_if_data *sdata, const u8 *da, u16 tid, u8 dialog_token, u16 s...
void ieee80211_process_addba_request(struct ieee80211_local *local, struct sta_info *sta, struct ieee80211_mgmt *mgmt, size...
static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *da, u16 tid, u8 dialog_token, u16 status, u1...
static ieee80211_rx_result debug_noinline ieee80211_rx_h_amsdu(struct ieee80211_rx_data *rx);
void ieee80211_process_delba(struct ieee80211_sub_if_data *sdata, struct sta_info *sta, struct ieee80211_mgmt *mgmt, size_t...
void ieee80211_send_delba(struct ieee80211_sub_if_data *sdata, const u8 *da, u16 tid, u16 initiator, u16 reason_code);
void ieee80211_rx_irqsafe(struct ieee80211_hw *hw, struct sk_buff *skb);
static void ieee80211_rx_reorder_ampdu(struct ieee80211_rx_data *rx, struct sk_buff_head *frames);
static bool ieee80211_sta_manage_reorder_buf(struct ieee80211_sub_if_data *sdata, struct tid_ampdu_rx *tid_agg_rx, struct s...
static ieee80211_rx_result debug_noinline ieee80211_rx_h_check(struct ieee80211_rx_data *rx);
void ieee80211_send_nullfunc(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata, int powersave);
void ieee80211_send_pspoll(struct ieee80211_local *local, struct ieee80211_sub_if_data *sdata);
static void ieee80211_send_assoc(struct ieee80211_sub_if_data *sdata);
void ieee80211_send_auth(struct ieee80211_sub_if_data *sdata, u16 transaction, u16 auth_alg, u16 status, const u8 *extra, s...
static inline bool ieee80211_check_tim(const struct ieee80211_tim_ie *tim, u8 tim_len, u16 aid);
int ieee80211_request_scan(struct ieee80211_sub_if_data *sdata, struct cfg80211_scan_request *req);
void mesh_path_tx_pending(struct mesh_path *mpath);
struct mesh_path *mesh_path_lookup(struct ieee80211_sub_if_data *sdata, const u8 *dst);
static void ieee80211_sta_create_ibss(struct ieee80211_sub_if_data *sdata);
int ieee80211_hw_config(struct ieee80211_local *local, u32 changed);
struct ieee80211_hw *ieee80211_alloc_hw(size_t priv_data_len, const struct ieee80211_ops *ops);
int ieee80211_register_hw(struct ieee80211_hw *hw);
void ieee80211_unregister_hw(struct ieee80211_hw *hw);
int sta_info_insert(struct sta_info *sta);
int sta_info_destroy_addr(struct ieee80211_sub_if_data *sdata, const u8 *addr);
struct sta_info *sta_info_get(struct ieee80211_sub_if_data *sdata, const u8 *addr);
void ieee80211_send_probe_req(struct ieee80211_sub_if_data *sdata, u8 *dst, const u8 *ssid, size_t ssid_len, const u8 *ie, ...
static inline void ieee80211_tx_skb(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb);
int ieee80211_channel_to_frequency(int chan, enum ieee80211_band band);
static int mesh_path_sel_frame_tx(enum mpath_frame_type action, u8 flags, const u8 *orig_addr, __le32 orig_sn, u8 target_fl...
static void hwmp_preq_frame_process(struct ieee80211_sub_if_data *sdata, struct ieee80211_mgmt *mgmt, const u8 *preq_elem, ...
struct ieee80211_rx_status *IEEE80211_SKB_RXCB(struct sk_buff *skb);
static bool ieee80211_tx(struct ieee80211_sub_if_data *sdata, struct sk_buff *skb, bool txpending, enum ieee80211_band band...
Table
Chapter 13: InfiniBand
RDMA and InfiniBand—General
The RDMA Stack Organization
RDMA Technology Advantages
InfiniBand Hardware Components
Addressing in InfiniBand
InfiniBand Features
InfiniBand Packets
Management Entities
RDMA Resources
RDMA Device
Protection Domain (PD)
Address Handle (AH)
Memory Region (MR)
Fast Memory Region (FMR) Pool
Memory Window (MW)
Completion Queue (CQ)
eXtended Reliable Connected (XRC) Domain
Shared Receive Queue (SRQ)
Queue Pair (QP)
QP Transport Types
QP State Machine
Work Request Processing
Supported Operations in the RDMA Architecture
Work Completion Status
Retry Flow
Receiver Not Ready (RNR) Flow
Multicast Groups
Difference Between the Userspace and the Kernel-Level RDMA API
Summary
Quick Reference
Methods
int ib_register_client(struct ib_client *client);
void ib_unregister_client(struct ib_client *client);
void ib_set_client_data(struct ib_device *device, struct ib_client *client, void *data);
void *ib_get_client_data(struct ib_device *device, struct ib_client *client);
int ib_register_event_handler(struct ib_event_handler *event_handler);
int ib_unregister_event_handler(struct ib_event_handler *event_handler);
int ib_query_device(struct ib_device *device, struct ib_device_attr *device_attr);
int ib_query_port(struct ib_device *device, u8 port_num, struct ib_port_attr *port_attr);
enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num);
int ib_query_gid(struct ib_device *device, u8 port_num, int index, union ib_gid *gid);
int ib_query_pkey(struct ib_device *device, u8 port_num, u16 index, u16 *pkey);
int ib_find_gid(struct ib_device *device, union ib_gid *gid, u8 *port_num, u16 *index);
int ib_find_pkey(struct ib_device *device, u8 port_num, u16 pkey, u16 *index);
struct ib_pd *ib_alloc_pd(struct ib_device *device);
int ib_dealloc_pd(struct ib_pd *pd);
struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc, struct ib_grh *grh, struct ib_ah_attr *ah_a...
struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, struct ib_wc *wc, struct ib_grh *grh, u8 port_num);
int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr);
int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr);
int ib_destroy_ah(struct ib_ah *ah);
struct ib_mr *ib_get_dma_mr(struct ib_pd *pd, int mr_access_flags);
static inline int ib_dma_mapping_error(struct ib_device *dev, u64 dma_addr);
static inline u64 ib_dma_map_single(struct ib_device *dev, void *cpu_addr, size_t size, enum dma_data_direction direction);
static inline void ib_dma_unmap_single(struct ib_device *dev, u64 addr, size_t size, enum dma_data_direction direction);
static inline u64 ib_dma_map_single_attrs(struct ib_device *dev, void *cpu_addr, size_t size, enum dma_data_direction direc...
static inline void ib_dma_unmap_single_attrs(struct ib_device *dev, u64 addr, size_t size, enum dma_data_direction directio...
static inline u64 ib_dma_map_page(struct ib_device *dev, struct page *page, unsigned long offset, size_t size, enum dma_dat...
static inline void ib_dma_unmap_page(struct ib_device *dev, u64 addr, size_t size, enum dma_data_direction direction);
static inline int ib_dma_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents, enum dma_data_direction direction...
static inline void ib_dma_unmap_sg(struct ib_device *dev, struct scatterlist *sg, int nents, enum dma_data_direction direct...
static inline int ib_dma_map_sg_attrs(struct ib_device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir...
static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev, struct scatterlist *sg, int nents, enum dma_data_direction ...
static inline u64 ib_sg_dma_address(struct ib_device *dev, struct scatterlist *sg);
static inline unsigned int ib_sg_dma_len(struct ib_device *dev, struct scatterlist *sg);
static inline void ib_dma_sync_single_for_cpu(struct ib_device *dev, u64 addr, size_t size, enum dma_data_direction dir);
static inline void ib_dma_sync_single_for_device(struct ib_device *dev, u64 addr, size_t size, enum dma_data_direction dir);
static inline void *ib_dma_alloc_coherent(struct ib_device *dev, size_t size, u64 *dma_handle, gfp_t flag);
static inline void ib_dma_free_coherent(struct ib_device *dev, size_t size, void *cpu_addr, u64 dma_handle);
struct ib_mr *ib_reg_phys_mr(struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phys_buf, int mr_access_flags, u...
int ib_rereg_phys_mr(struct ib_mr *mr, int mr_rereg_mask, struct ib_pd *pd, struct ib_phys_buf *phys_buf_array, int num_phy...
int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
int ib_dereg_mr(struct ib_mr *mr);
struct ib_mw *ib_alloc_mw(struct ib_pd *pd, enum ib_mw_type type);
static inline int ib_bind_mw(struct ib_qp *qp, struct ib_mw *mw, struct ib_mw_bind *mw_bind);
int ib_dealloc_mw(struct ib_mw *mw);
struct ib_cq *ib_create_cq(struct ib_device *device, ib_comp_handler comp_handler, void (*event_handler)(struct ib_event *,...
int ib_resize_cq(struct ib_cq *cq, int cqe);
int ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period);
int ib_peek_cq(struct ib_cq *cq, int wc_cnt);
static inline int ib_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
static inline int ib_req_ncomp_notif(struct ib_cq *cq, int wc_cnt);
static inline int ib_poll_cq(struct ib_cq *cq, int num_entries, struct ib_wc *wc);
struct ib_srq *ib_create_srq(struct ib_pd *pd, struct ib_srq_init_attr *srq_init_attr);
int ib_modify_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr, enum ib_srq_attr_mask srq_attr_mask);
int ib_query_srq(struct ib_srq *srq, struct ib_srq_attr *srq_attr);
int ib_destroy_srq(struct ib_srq *srq);
struct ib_qp *ib_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *qp_init_attr);
int ib_modify_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask);
int ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr, int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr);
int ib_destroy_qp(struct ib_qp *qp);
static inline int ib_post_srq_recv(struct ib_srq *srq, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr);
static inline int ib_post_recv(struct ib_qp *qp, struct ib_recv_wr *recv_wr, struct ib_recv_wr **bad_recv_wr);
static inline int ib_post_send(struct ib_qp *qp, struct ib_send_wr *send_wr, struct ib_send_wr **bad_send_wr);
int ib_attach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid);
int ib_detach_mcast(struct ib_qp *qp, union ib_gid *gid, u16 lid);
Chapter 14: Advanced Topics
Network Namespaces
Namespaces Implementation
UTS Namespaces Implementation
Network Namespaces Implementation
The Network Namespace Object (struct net)
Network Namespaces Implementation: Other Data Structures
Network Namespaces Management
Moving a Network Interface to a Different Network Namespace
Communicating Between Two Network Namespaces
Cgroups
Cgroups Implementation
Cgroup Devices Controller: A Simple Example
Cgroup Memory Controller: A Simple Example
The net_prio Module
The cls_cgroup Classifier
Mounting cgroup Subsystems
Busy Poll Sockets
Enabling Globally
Enabling Per Socket
Tuning and Configuration
Performance
The Linux Bluetooth Subsystem
HCI Layer
HCI Device
HCI and the Layer Below It (Link Controller)
HCI and the Layers Above It (L2CAP/SCO)
HCI Connection
L2CAP
BNEP
Receiving Bluetooth Packets: Diagram
L2CAP Extended Features
Bluetooth Tools
IEEE 802.15.4 and 6LoWPAN
Neighbor Discovery Optimization
Linux Kernel 6LoWPAN
6LoWPAN Initialization
Near Field Communication (NFC)
NFC Tags
NFC Devices
Communication and Operation Modes
Host-Controller Interfaces
Linux NFC support
NFC Sockets
Raw Sockets
LLCP Sockets
NFC Netlink API
NFC Initialization
Drivers API
Userspace Architecture
NFC on Android
Notifications Chains
The PCI Subsystem
Wake-On-LAN (WOL)
Teaming Network Device
The PPPoE Protocol
PPPoE Header
PPPoE Initialization
PPPoX Sockets
Sending and Receiving Packets with PPPoE
Android
Android Networking
Android internals: Resources
Summary
Quick Reference
Methods
void switch_task_namespaces(struct task_struct *p, struct nsproxy *new);
struct nsproxy *create_nsproxy(void);
void free_nsproxy(struct nsproxy *ns);
struct net *dev_net(const struct net_device *dev);
void dev_net_set(struct net_device *dev, struct net *net);
void sock_net_set(struct sock *sk, struct net *net);
struct net *sock_net(const struct sock *sk);
int net_eq(const struct net *net1, const struct net *net2);
struct net *net_alloc(void);
struct net *copy_net_ns(unsigned long flags, struct user_namespace *user_ns, struct net *old_net);
int setup_net(struct net *net, struct user_namespace *user_ns);
int proc_alloc_inum(unsigned int *inum);
struct nsproxy *task_nsproxy(struct task_struct *tsk);
struct new_utsname *utsname(void);
struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns, struct uts_namespace *old_ns);
struct uts_namespace *copy_utsname(unsigned long flags, struct user_namespace *user_ns, struct uts_namespace *old_ns);
struct net *sock_net(const struct sock *sk);
void sock_net_set(struct sock *sk, struct net *net);
int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat);
void put_net(struct net *net);
struct net *get_net(struct net *net);
void get_nsproxy(struct nsproxy *ns);
struct net *get_net_ns_by_pid(pid_t pid);
struct net *get_net_ns_by_fd(int fd);
struct pid_namespace *ns_of_pid(struct pid *pid);
void put_nsproxy(struct nsproxy *ns);
int register_pernet_device(struct pernet_operations *ops);
void unregister_pernet_device(struct pernet_operations *ops);
int register_pernet_subsys(struct pernet_operations *ops);
void unregister_pernet_subsys(struct pernet_operations *ops);
static int register_vlan_device(struct net_device *real_dev, u16 vlan_id);
void cgroup_release_agent(struct work_struct *work);
int call_usermodehelper(char * path, char ** argv, char ** envp, int wait);
int bacmp(bdaddr_t *ba1, bdaddr_t *ba2);
void bacpy(bdaddr_t *dst, bdaddr_t *src);
int hci_send_frame(struct sk_buff *skb);
int hci_register_dev(struct hci_dev *hdev);
void hci_unregister_dev(struct hci_dev *hdev);
void hci_event_packet(struct hci_dev *hdev, struct sk_buff *skb);
int lowpan_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev);
void pci_unregister_driver(struct pci_driver *dev);
int pci_enable_device(struct pci_dev *dev);
int request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags, const char *name, void *dev);
void free_irq(unsigned int irq, void *dev_id);
int nfc_init(void);
int nfc_register_device(struct nfc_dev *dev);
int nfc_hci_register_device(struct nfc_hci_dev *hdev);
int nci_register_device(struct nci_dev *ndev);
static int __init pppoe_init(void);
struct pppoe_hdr *pppoe_hdr(const struct sk_buff *skb);
static int pppoe_create(struct net *net, struct socket *sock);
int __set_item(struct pppoe_net *pn, struct pppox_sock *po);
void delete_item(struct pppoe_net *pn, __be16 sid, char *addr, int ifindex);
bool stage_session(__be16 sid);
int notifier_chain_register(struct notifier_block **nl, struct notifier_block *n);
int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n);
int register_netdevice_notifier(struct notifier_block *nb);
int unregister_netdevice_notifier(struct notifier_block *nb);
int register_inet6addr_notifier(struct notifier_block *nb);
int unregister_inet6addr_notifier(struct notifier_block *nb);
int register_netevent_notifier(struct notifier_block *nb);
int unregister_netevent_notifier(struct notifier_block *nb);
int __kprobes notifier_call_chain(struct notifier_block **nl, unsigned long val, void *v, int nr_to_call, int *nr_calls);
int call_netdevice_notifiers(unsigned long val, struct net_device *dev);
int blocking_notifier_call_chain(struct blocking_notifier_head *nh, unsigned long val, void *v);
int __atomic_notifier_call_chain(struct atomic_notifier_head *nh,unsigned long val, void *v, int nr_to_call, int *nr_calls);
Macros
pci_register_driver()
Appendix A: Linux API
The sk_buff Structure
struct skb_shared_info
The net_device structure
RDMA (Remote DMA)
RDMA Device
The ib_register_client() Method
The ib_client Struct:
The ib_unregister_client() Method
The ib_get_client_data() Method
The ib_set_client_data() Method
The INIT_IB_EVENT_HANDLER macro
The ib_register_event_handler() Method
The ib_event_handler struct:
The ib_event Struct
The ib_unregister_event_handler() Method
The ib_query_device() Method
The ib_device_attr struct:
The ib_query_port() Method
The ib_port_attr Struct
The rdma_port_get_link_layer() Method
The ib_query_gid() Method
The ib_query_pkey() Method
The ib_modify_device() Method
The ib_device_modify Struct
The ib_modify_port() Method
The ib_port_modify struct:
The ib_find_gid() Method
The ib_find_pkey() Method
The rdma_node_get_transport() Method
The rdma_node_get_transport() Method
The ib_mtu_to_int() Method
The ib_width_enum_to_int() Method
The ib_rate_to_mult() Method
The ib_rate_to_mbps() Method
The ib_rate_to_mbps() Method
Protection Domain (PD)
The ib_alloc_pd() Method
The ib_dealloc_pd() Method
eXtended Reliable Connected (XRC)
The ib_alloc_xrcd() Method
The ib_dealloc_xrcd_cq() Method
Shared Receive Queue (SRQ)
The ib_srq_attr Struct
The ib_create_srq() Method
The ib_srq_init_attr Struct
The ib_modify_srq() Method
The ib_query_srq() Method
The ib_destroy_srq() Method
The ib_post_srq_recv() Method
The ib_recv_wr Struct
The ib_sge Struct
Address Handle (AH)
The ib_ah_attr Struct
The ib_create_ah() Method
The ib_init_ah_from_wc() Method
The ib_create_ah_from_wc() Method
The ib_modify_ah() Method
The ib_query_ah() Method
The ib_destroy_ah() Method
Multicast Groups
The ib_attach_mcast() Method
The ib_detach_mcast() method
Completion Queue (CQ)
The ib_create_cq() Method
The ib_resize_cq() Method
The ib_modify_cq() Method
The ib_peek_cq() Method
The ib_req_notify_cq() Method
The ib_req_ncomp_notif() Method
The ib_poll_cq() Method
The ib_wc Struct
The ib_destroy_cq() Method
Queue Pair (QP)
The ib_qp_cap Struct
The ib_create_qp() Method
The ib_qp_init_attr Struct
The ib_modify_qp() Method
The ib_qp_attr Struct
The ib_query_qp() Method
The ib_open_qp() Method
The ib_qp_open_attr Struct
The ib_close_qp() Method
The ib_post_recv() Method
The ib_post_send() Method
The ib_send_wr Struct
The ib_mw_bind_info Struct
Memory Windows (MW)
The ib_alloc_mw() Method
The ib_bind_mw() Method
The ib_mw_bind Struct
The ib_dealloc_mw() Method
Memory Region (MR)
The ib_get_dma_mr() Method
The ib_dma_mapping_error() Method
The ib_dma_map_single() Method
The ib_dma_unmap_single() Method
The ib_dma_map_single_attrs() Method
The ib_dma_unmap_single_attrs() Method
The ib_dma_map_page() Method
The ib_dma_unmap_page() Method
The ib_dma_map_sg() Method
The ib_dma_unmap_sg() Method
The ib_dma_map_sg_attr() Method
The ib_dma_unmap_sg() Method
The ib_sg_dma_address() Method
The ib_sg_dma_len() Method
The ib_dma_sync_single_for_cpu() Method
The ib_dma_sync_single_for_device() Method
The ib_dma_alloc_coherent() Method
The ib_dma_free_coherent() method
The ib_reg_phys_mr() Method
The ib_phys_buf Struct
The ib_rereg_phys_mr() Method
The ib_query_mr() Method
The ib_mr_attr Struct
The ib_dereg_mr() Method
Appendix B: Network Administration
arp
arping
arptables
arpwatch
ApacheBench (ab)
brctl
conntrack-tools
crtools
ebtables
ether-wake
ethtool
git
hciconfig
hcidump
hcitool
ifconfig
ifenslave
iperf
Using iperf
iproute2
iptables and iptables6
ipvsadm
iw
iwconfig
libreswan Project
l2ping
lowpan-tools
lshw
lscpu
lspci
mrouted
nc
ngrep
netperf
netsniff-ng
netstat
nmap (Network Mapper)
openswan
OpenVPN
packeth
ping
pimd
poptop
ppp
pktgen
radvd
route
RP-PPPoE
sar
smcroute
snort
suricata
strongSwan
sysctl
taskset
tcpdump
top
tracepath
traceroute
tshark
tunctl
udevadm
unshare
vconfig
wpa_supplicant
wireshark
XORP
Appendix C: Glossary
Index
For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.
Contents at a Glance
About the Author
About the Technical Reviewer
Acknowledgments
Preface
Chapter 1: Introduction
Chapter 2: Netlink Sockets
Chapter 3: Internet Control Message Protocol (ICMP)
Chapter 4: IPv4
Chapter 5: The IPv4 Routing Subsystem
Chapter 6: Advanced Routing
Chapter 7: Linux Neighbouring Subsystem
Chapter 8: IPv6
Chapter 9: Netfilter
Chapter 10: IPsec
Chapter 11: Layer 4 Protocols
Chapter 12: Wireless in Linux
Chapter 13: InfiniBand
Chapter 14: Advanced Topics
Appendix A: Linux API
Appendix B: Network Administration
Appendix C: Glossary
Index
Chapter 1: Introduction

This book deals with the implementation of the Linux Kernel Networking stack and the theory behind it. You will find in the following pages an in-depth and detailed analysis of the networking subsystem and its architecture. I will not burden you with topics not directly related to networking, which you may encounter while reading kernel networking code (for example, locking and synchronization, SMP, atomic operations, and so on). There are plenty of resources about such topics. On the other hand, there are very few up-to-date resources that focus on kernel networking proper. By this I mean primarily describing the traversal of the packet in the Linux Kernel Networking stack and its interaction with various networking layers and subsystems, and how various networking protocols are implemented. This book is also not a cumbersome, line-by-line code walkthrough. I focus on the essence of the implementation of each network layer and the theory guidelines and principles that led to this implementation.

The Linux operating system has proved itself in recent years as a successful, reliable, stable, and popular operating system. And it seems that its popularity is growing steadily, in a wide variety of flavors, from mainframes, data centers, core routers, and web servers to embedded devices like wireless routers, set-top boxes, medical instruments, navigation equipment (like GPS devices), and consumer electronics devices. Many semiconductor vendors use Linux as the basis for their Board Support Packages (BSPs). The Linux operating system, which started as a project of a Finnish student named Linus Torvalds back in 1991, based on the UNIX operating system, proved to be a serious and reliable operating system and a rival for veteran proprietary operating systems. Linux began as an Intel x86-based operating system but has been ported to a very wide range of processors, including ARM, PowerPC, MIPS, SPARC, and more.

The Android operating system, based upon the Linux kernel, is common today in tablets and smartphones, and seems likely to gain popularity in the future in smart TVs. Apart from Android, Google has also contributed some kernel networking features that were merged into the mainline kernel.

Linux is an open source project, and as such it has an advantage over other proprietary operating systems: its source code is freely available under the General Public License (GPL). Other open source operating systems, like the different types of BSD, have much less popularity. I should also mention in this context the OpenSolaris project, based on the Common Development and Distribution License (CDDL). This project, started by Sun Microsystems, has not achieved the popularity that Linux has. Among the large community of active Linux developers, some contribute code on behalf of the companies they work for, and some contribute code voluntarily. All of the kernel development process is accessible via the kernel mailing lists. There is one central mailing list, the Linux Kernel Mailing List (LKML), and many subsystems have their own mailing lists. Contributing code is done via sending patches to the appropriate kernel mailing lists and to the maintainers, and these patches are discussed over the mailing lists.

The Linux Kernel Networking stack is a very important subsystem of the Linux kernel. It is quite difficult to find a Linux-based system, whether it is a desktop, a server, a mobile device or any other embedded device, that does not use any kind of networking. Even in the rare case when a machine doesn't have any hardware network devices, you will still be using networking (maybe unconsciously) when you use X-Windows, as X-Windows itself is based upon client-server networking. A wide range of projects are related to the Linux Networking stack, from core routers to small embedded devices. Some of these projects deal with adding vendor-specific features.
For example, some hardware vendors implement Generic Segmentation Offload (GSO) in some network devices. GSO is a networking feature of the kernel network stack that divides a large packet into smaller ones in the Tx path. Many hardware vendors implement checksumming in hardware in their network devices. A checksum is a mechanism to verify that a packet was not damaged in transit by calculating some hash from the packet and attaching it to the packet. Many projects provide some security enhancements for Linux. Sometimes these enhancements require some changes in the networking subsystem, as you will see, for example, in Chapter 3, when discussing the Openwall GNU/*/Linux project. In the embedded device arena there are, for example, many wireless routers that are Linux based; one example is the WRT54GL Linksys router, which runs Linux. There is also an open source, Linux-based operating system that can run on this device (and on some other devices), named OpenWrt, with a large and active community of developers (see https://openwrt.org/). Learning about how the various protocols are implemented by the Linux Kernel Networking stack and becoming familiar with the main data structures and the main paths of a packet in it are essential to understanding it better.

The Linux Network Stack

There are seven logical networking layers according to the Open Systems Interconnection (OSI) model. The lowest layer is the physical layer, which is the hardware, and the highest layer is the application layer, where userspace software processes are running. Let’s describe these seven layers:

1. The physical layer: Handles electrical signals and the low level details.
2. The data link layer: Handles data transfer between endpoints. The most common data link layer is Ethernet. The Linux Ethernet network device drivers reside in this layer.
3. The network layer: Handles packet forwarding and host addressing. In this book I discuss the most common network layers of the Linux Kernel Networking subsystem: IPv4 or IPv6. There are other, less common network layers which Linux implements, like DECnet, but they are not discussed.
4. The protocol layer/transport layer: Handles data sending between nodes. The TCP and UDP protocols are the best-known protocols.
5. The session layer: Handles sessions between endpoints.
6. The presentation layer: Handles delivery and formatting.
7. The application layer: Provides network services to end-user applications.

Figure 1-1 shows the seven layers according to the OSI model.
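As an illustration of the checksum mechanism described earlier in this chapter, here is a minimal userspace sketch of the 16-bit one's-complement Internet checksum (in the style of RFC 1071), which is the kind of value the IPv4 header carries. The function name inet_checksum16 is hypothetical and chosen only for this example; the kernel uses its own optimized, architecture-specific helpers, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): compute the 16-bit
 * one's-complement checksum over a buffer, treating it as a
 * sequence of big-endian 16-bit words.
 * Intended for packet-sized buffers; the 32-bit accumulator is
 * not guarded against overflow on very large inputs.
 */
uint16_t inet_checksum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero byte on the right. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the one's complement of the folded sum. */
    return (uint16_t)~sum;
}
```

A sender would store the returned value in the packet. A receiver that runs the same function over the data with the checksum appended gets 0 when nothing was damaged, which is how the "attach and verify" idea works in practice.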
Figure 1-1. The OSI seven-layer model

Figure 1-2 shows the three layers that the Linux Kernel Networking stack handles. The L2, L3, and L4 layers in this figure correspond to the data link layer, the network layer, and the transport layer in the seven-layer model, respectively. The essence of the Linux kernel stack is passing incoming packets from L2 (the network device drivers) to L3 (the network layer, usually IPv4 or IPv6) and then to L4 (the transport layer, where you have, for example, TCP or UDP listening sockets) if they are for local delivery, or back to L2 for transmission when the packets should be forwarded. Outgoing packets that were locally generated are passed from L4 to L3 and then to L2 for actual transmission by the network device driver. Along this way there are many stages, and many things can happen. For example:

• The packet can be changed due to protocol rules (for example, due to an IPsec rule or to a NAT rule).
• The packet can be discarded.
• The packet can cause an error message to be sent.
• The packet can be fragmented.
• The packet can be defragmented.
• A checksum should be calculated for the packet.
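The receive-path decision described above (deliver locally to L4, forward back to L2, or discard) can be sketched as a toy model. This is only an illustration under simplifying assumptions: all names here (toy_packet, toy_l3_receive, the rx_path values) are hypothetical, a host is assumed to have a single local address, and the real kernel makes this decision through a routing lookup rather than a direct address comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes of the L3 receive step in this toy model. */
enum rx_path {
    RX_DELIVER_L4,  /* packet is for this host: pass up to L4 */
    RX_FORWARD_L2,  /* packet is for another host: send back down to L2 */
    RX_DROP         /* packet is discarded */
};

/* Hypothetical, heavily reduced view of an incoming IPv4 packet. */
struct toy_packet {
    uint32_t dst_addr;  /* destination address */
    uint8_t ttl;        /* remaining hop count */
};

/*
 * Toy model of the L3 decision: local delivery, forwarding, or drop.
 * A packet whose TTL would expire is dropped instead of forwarded;
 * a real stack would also send an ICMP error message in that case.
 */
enum rx_path toy_l3_receive(const struct toy_packet *pkt,
                            uint32_t local_addr,
                            bool forwarding_enabled)
{
    assert(pkt != NULL);

    if (pkt->dst_addr == local_addr)
        return RX_DELIVER_L4;

    if (!forwarding_enabled)
        return RX_DROP;

    if (pkt->ttl <= 1)
        return RX_DROP;

    return RX_FORWARD_L2;
}
```

The sketch covers only three of the outcomes listed above (delivery, forwarding, discarding); modification by IPsec or NAT rules, fragmentation, and defragmentation are left out on purpose to keep the control flow visible.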
Figure 1-2. The Linux Kernel Networking layers

The kernel does not handle any layer above L4; those layers (the session, presentation, and application layers) are handled solely by userspace applications. The physical layer (L1) is also not handled by the Linux kernel. If you feel overwhelmed, don’t worry. You will learn a lot more about everything described here in a lot more depth in the following chapters.

The Network Device

The lower layer, Layer 2 (L2), as seen in Figure 1-2, is the link layer. The network device drivers reside in this layer. This book is not about network device driver development, because it focuses on the Linux kernel networking stack. I will briefly describe here the net_device structure, which represents a network device, and some of the concepts that are related to it. You should have a basic familiarity with the network device structure in order to better understand the network stack. Parameters of the device, like the size of the MTU (typically 1,500 bytes for Ethernet devices), determine whether a packet should be fragmented. The net_device is a very large structure, consisting of device parameters like these:

• The IRQ number of the device.
• The MTU of the device.
• The MAC address of the device.
• The name of the device (like eth0 or eth1).
• The flags of the device (for example, whether it is up or down).
• A list of multicast addresses associated with the device.
• The promiscuity counter (discussed later in this section).
• The features that the device supports (like GSO or GRO offloading).
• An object of network device callbacks (net_device_ops object), which consists of function pointers, such as for opening and stopping a device, starting to transmit, changing the MTU of the network device, and more.
• An object of ethtool callbacks, which supports getting information about the device by running the command-line ethtool utility.
• The number of Tx and Rx queues, when the device supports multiqueues.
• The timestamp of the last transmit of a packet on this device.
• The timestamp of the last reception of a packet on this device.
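To make the list above more concrete, here is a heavily reduced userspace sketch of a device descriptor holding a few of those parameters, together with the MTU comparison that decides whether a packet needs fragmentation. Everything in it is an assumption made for illustration: toy_net_device, its field names, TOY_FLAG_UP, and toy_needs_fragmentation are hypothetical and are not the kernel's definitions. The real net_device structure is far larger and its layout differs between kernel versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_FLAG_UP 0x1u  /* hypothetical "interface is up" flag bit */

/* Hypothetical, reduced stand-in for a network device descriptor. */
struct toy_net_device {
    char name[16];               /* device name, like "eth0" */
    int irq;                     /* IRQ number of the device */
    unsigned int mtu;            /* maximum transmission unit, in bytes */
    uint8_t mac_addr[6];         /* MAC address of the device */
    unsigned int flags;          /* state flags, for example TOY_FLAG_UP */
    unsigned int promiscuity;    /* promiscuity counter */
    unsigned int num_tx_queues;  /* number of Tx queues */
    unsigned int num_rx_queues;  /* number of Rx queues */
};

/* A packet larger than the device MTU cannot be sent in one piece. */
bool toy_needs_fragmentation(const struct toy_net_device *dev,
                             size_t pkt_len)
{
    assert(dev != NULL);
    return pkt_len > dev->mtu;
}

/* A device is usable for transmission only when it is marked up. */
bool toy_is_up(const struct toy_net_device *dev)
{
    assert(dev != NULL);
    return (dev->flags & TOY_FLAG_UP) != 0;
}
```

With an Ethernet-style MTU of 1,500 bytes, a 1,500-byte packet passes the check unchanged, while a 1,501-byte packet is reported as needing fragmentation.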