2015
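The TCP/IP Network Protocol Stack
Based on Linux 3.10
The book is divided into Part One and Part Two. Part One has nine chapters and
focuses on the TCP/IP data send and receive paths, that is, the IP and TCP
layers of the OSI model. Part Two also has nine chapters; their topics are not
part of TCP/IP itself but are related to networking and commonly used, such as
LC-trie routing, the netfilter packet-filtering firewall, and several
network-related command-line tools. The end of the book gives a diagram of the
IPv6 protocol stack model, and test source code is also provided.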
葛世超
Price: 0 yuan
2015/4/22
Chapter 0 Kernel Network Configuration Options
0.1 Kconfig Options
0.2 Meaning of ip-sysctl
Chapter 1 Network Subsystem Initialization
1.1 Call Order of Network Initialization Functions
1.2 A Brief Look at the Called Functions
1.3 inet_init
1.4 Summary
Chapter 2 Host-to-Network Layer (Network Card)
2.1 TCP/IP Protocol Stack Model
2.2 Network Card Data Structures
2.3 Network Card Registration Flow
Chapter 3 Socket-Related Data Structures
3.1 Kernel Structures Corresponding to a socket
3.2 struct proto_ops
3.3 struct proto
3.4 sk_buff (SKB)
3.5 softnet_data
3.6 struct packet_type
3.7 Some Abbreviations
Chapter 4 Packet Reception Flow in the Network Layer
4.1 Transition from the Host-to-Network Layer
4.2 Entering the Network Layer
Chapter 5 Transport Layer (TCP) to Network Layer (IP)
Chapter 6 Application Layer
Chapter 7 TCP Send (Transport Layer)
7.2 MSS
Chapter 8 TCP Receive (Transport Layer)
Chapter 9 TCP Congestion Control
9.1 CUBIC Congestion Control
Algorithm Used by cubic
cubic Slow Start Threshold
9.2 cubic Congestion Control Code Implementation
Slow Start
Congestion Avoidance
Fast Retransmit and Fast Recovery
Part Two: Miscellaneous Topics
Chapter 10 Network Tools
10.1 ss
10.2 netstat
Column Headings
Common Options
10.3 netstress
10.4 netperf Reference
10.5 iperf
10.6 iptraf
10.7 TcpDump
10.7.1 Data Filtering
10.7.2 Input and Output
10.8 nicstat
10.8.1 Installing nicstat
10.8.2 Using nicstat
10.9 The ethtool Tool
Chapter 11 The Linux Packet-Filtering Firewall: netfilter and iptables
11.1 The netfilter Framework
11.2 Firewall Rule Tables
11.2.1 xt_init: Initializing the Firewall Tables
11.2.2 Composition of a Rule
11.3 Firewall Rule Traversal
11.3.2 Hook Functions
11.4 iptables
Chapter 12 Routing
12.1 Core Routing Data Structures
12.2 LC-trie (Dictionary Tree, Word Search Tree)
12.3 ifconfig
12.3.1 Routing Information under /proc/net/
12.3.2 Registering the Routing Notifier Chain Functions
12.3.3 ifconfig Call Flow
12.3.4 put_child
12.4 Adding Route Entries with route
12.5 Route Cache
12.5.1 Route Cache Lookup
12.5.2 Route Cache Creation
12.5.3 Route Cache Memory Management
12.6 Route Lookup
12.6.1 Related Data Structures
12.6.2 Route Lookup for Received Packets
Chapter 13 Network Namespaces
13.1 Namespace Creation
13.2 Network Namespace Management
Chapter 14 The netlink Mechanism
14.1 Communication Supported by netlink
14.2 netlink User-Space API
14.3 netlink Kernel-Space API
Chapter 15 Techniques for Improving Network Performance
15.1 TSO/GSO
15.2 LRO/GRO
15.3 RSS (Receive Side Scaling) Queues
15.4 RPS (Receive Packet Steering) Queues
15.5 RFS (Receive Flow Steering), Accelerated Receive Flow Steering
15.6 XPS (Transmit Packet Steering)
Chapter 16 PHY
16.1 PHY
16.2 MAC Driver
16.3 PHY Driver
16.3.1 PHY Initialization
16.3.2 PHY Driver Example
16.3.3 PHY State Machine
Chapter 17 ping-icmp
Chapter 18 Introduction to IPv6
Appendix: TCP Test Program
TCP/IP State Diagram
References
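Chapter 0 Kernel Network Configuration Options
0.1 Kconfig Options
The options below are taken from an embedded configuration; other configurations may differ slightly.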
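The packet protocol (AF_PACKET) is used by applications that communicate directly with network
devices without going through the kernel's other protocols. Tools such as tcpdump require this option.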
<*> Packet socket
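As a rough illustration of what "communicating directly with a network device" looks like from user space, the following sketch tries to open an AF_PACKET raw socket. The function name `try_open_packet_socket` is hypothetical, and the constant `ETH_P_ALL` is copied from `<linux/if_ether.h>`. Opening such a socket normally needs the CAP_NET_RAW capability, so the sketch reports the outcome instead of assuming success.

```python
import socket

ETH_P_ALL = 0x0003  # value from <linux/if_ether.h>: match every protocol


def try_open_packet_socket():
    """Try to open an AF_PACKET raw socket and report what happened."""
    family = getattr(socket, "AF_PACKET", None)
    if family is None:
        return "unsupported"      # not Linux, or Python built without AF_PACKET
    try:
        sock = socket.socket(family, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    except PermissionError:
        return "no-permission"    # CAP_NET_RAW (typically root) is required
    except OSError:
        return "unsupported"      # e.g. the kernel lacks packet socket support
    sock.close()
    return "ok"


if __name__ == "__main__":
    print(try_open_packet_socket())
```

Run as an unprivileged user, this would be expected to report a permission problem rather than open the socket.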
Provides the monitoring interface for PF_PACKET sockets, which tools such as ss use to inspect them.
< > Packet: sockets monitoring interface
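UNIX domain sockets. Even on a machine with no network connection, X Window and syslog use UNIX
domain sockets, so setting this option to Y is strongly recommended.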
<*> Unix domain sockets
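To show that UNIX domain sockets work without any network connection, here is a minimal sketch that passes a short message between two connected AF_UNIX sockets inside one process. The helper name `unix_socketpair_echo` is hypothetical and is meant only for small messages.

```python
import socket


def unix_socketpair_echo(message: bytes) -> bytes:
    """Send a short message over a connected AF_UNIX pair and return what arrives."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        left.sendall(message)
        received = b""
        # A stream socket may deliver the data in pieces, so read until complete.
        while len(received) < len(message):
            chunk = right.recv(len(message) - len(received))
            if not chunk:
                break
            received += chunk
        return received
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(unix_socketpair_echo(b"hello"))
```

No network interface or IP address is involved; the data never leaves the kernel.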
Provides the monitoring interface for UNIX domain sockets that the ss tool uses.
<*> UNIX: socket monitoring interface
Supports XFRM (transformation), the framework that modifies packets as they pass through routing.
< > Transformation user configuration interface
[ ] Transformation sub policy support
[ ] Transformation migrate database
[ ] Transformation statistics
The PF_KEYv2 socket family. This option is required if you use IPsec tools ported from KAME.
< > PF_KEY sockets
Enabling this option increases the kernel size by about 400 KB.
[*] TCP/IP networking
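Multicasting. Adds about 2 KB to the kernel. It is needed for the MBONE (multicast backbone); one
use case is worldwide broadcasting of audio and video programs.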
[*] IP: multicasting
This option supports forwarding and redistributing network packets (router use); it does not cover the basic routing configuration.
[*] IP: advanced router
Statistics for the routing TRIE table, useful for measuring the performance of the TRIE algorithm.
[ ] FIB TRIE statistics
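Normally a router decides a packet's fate from its final destination address alone. With policy
routing, the source address and TOS are also taken into account.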
[ ] IP: policy routing
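The difference between destination-only routing and policy routing can be sketched with a toy model. This is not the kernel's implementation: the names `longest_prefix_match` and `policy_lookup`, the rule format, and the table names are all assumptions made for illustration. Each rule may match on source prefix and TOS, and selects a table in which an ordinary longest-prefix lookup on the destination is then performed.

```python
import ipaddress


def longest_prefix_match(table, dst):
    """table: list of (prefix, next_hop). Return the most specific match or None."""
    dst = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in table:
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, hop)
    return best[1] if best else None


def policy_lookup(rules, tables, src, dst, tos=0):
    """rules: ordered list of (src_prefix or None, tos or None, table_name)."""
    src = ipaddress.ip_address(src)
    for src_prefix, rule_tos, name in rules:
        if src_prefix is not None and src not in ipaddress.ip_network(src_prefix):
            continue  # source address does not match this rule
        if rule_tos is not None and rule_tos != tos:
            continue  # TOS does not match this rule
        hop = longest_prefix_match(tables[name], dst)
        if hop is not None:
            return hop
    return None


# Hypothetical tables and rules using documentation address ranges.
TABLES = {
    "main": [("0.0.0.0/0", "192.0.2.1"), ("10.0.0.0/8", "192.0.2.2")],
    "vpn": [("0.0.0.0/0", "198.51.100.1")],
    "lowdelay": [("0.0.0.0/0", "203.0.113.1")],
}
RULES = [("172.16.5.0/24", None, "vpn"), (None, 0x10, "lowdelay"), (None, None, "main")]

if __name__ == "__main__":
    print(policy_lookup(RULES, TABLES, "172.16.5.9", "10.1.2.3"))
    print(policy_lookup(RULES, TABLES, "172.16.9.9", "10.1.2.3"))
```

The two calls use the same destination but different sources, and the model sends them to different next hops, which destination-only routing could not do.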
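Normally the routing table gives one definite path for a packet. With this option, several paths
may exist for a given packet; the router treats them as having equal cost, and the choice among them is non-deterministic.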
[ ] IP: equal cost multipath
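One widely used way to spread traffic over equal-cost paths is to hash each flow onto a next hop, so that packets of the same connection stay on one path. The sketch below is a toy model of that idea only; it is not claimed to be what Linux 3.10 does, and `pick_next_hop` is a hypothetical name.

```python
import zlib


def pick_next_hop(next_hops, src, dst, sport, dport):
    """Map a flow onto one of several equal-cost next hops by hashing its addresses and ports."""
    key = "%s|%s|%d|%d" % (src, dst, sport, dport)
    index = zlib.crc32(key.encode("ascii")) % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    hops = ["192.0.2.1", "192.0.2.2"]
    print(pick_next_hop(hops, "10.0.0.1", "10.0.1.1", 40000, 80))
```

Different flows may land on different next hops, but a single flow always maps to the same one.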
The kernel emits verbose routing messages, which klogd records.
[ ] IP: verbose route monitoring
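Allows the kernel to configure the device's IP address and the routing table automatically at boot,
based on the kernel command line or on the BOOTP or RARP protocols. Diskless systems need this option to boot.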
[ ] IP: kernel level autoconfiguration
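On the kernel command line, this configuration is usually passed through the `ip=` parameter, whose first seven colon-separated fields are client IP, server IP, gateway, netmask, hostname, device, and autoconfiguration method. The following sketch splits such a value into named fields. `parse_ip_param` is a hypothetical helper, the sample values are made up, and fields after the seventh are ignored.

```python
IP_FIELDS = ("client_ip", "server_ip", "gw_ip", "netmask",
             "hostname", "device", "autoconf")
AUTOCONF_WORDS = {"on", "any", "off", "none", "dhcp", "bootp", "rarp", "both"}


def parse_ip_param(value):
    """Split the value of an ip= kernel parameter into its named, non-empty fields."""
    if value in AUTOCONF_WORDS:
        # Short form such as ip=dhcp: only the autoconfiguration method is given.
        return {"autoconf": value}
    parts = value.split(":")
    return {name: part for name, part in zip(IP_FIELDS, parts) if part}


if __name__ == "__main__":
    print(parse_ip_param("192.168.1.10::192.168.1.1:255.255.255.0:board:eth0:off"))
    print(parse_ip_param("dhcp"))
```

Empty fields, such as the server IP in the first call, are simply left out of the result.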
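Tunneling encapsulates the data of one protocol inside another protocol and sends it over a channel that supports
the encapsulating protocol. This option provides IP-in-IP tunnels, which can be used for host masquerading and mobile IP.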
< > IP: tunneling
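IP-in-IP encapsulation can be illustrated by building the bytes by hand: the original IPv4 packet becomes the payload of a new outer IPv4 header whose protocol field is 4. This is a sketch of the packet layout only, not of the kernel's tunnel driver; the helper names and the addresses are assumptions.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4-in-IPv4 encapsulation
HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def ipv4_checksum(header: bytes) -> int:
    """Ones'-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src, dst, proto, payload_len, ttl=64):
    """Build a minimal IPv4 header with a valid checksum."""
    fields = [(4 << 4) | 5, 0, 20 + payload_len, 0, 0, ttl, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = ipv4_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)


def ipip_encapsulate(inner_packet, tunnel_src, tunnel_dst):
    """Prepend an outer IPv4 header that carries the inner packet as payload."""
    outer = ipv4_header(tunnel_src, tunnel_dst, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet


if __name__ == "__main__":
    inner = ipv4_header("10.0.0.1", "10.0.0.2", 17, 0)  # empty UDP-type packet
    print(ipip_encapsulate(inner, "192.0.2.1", "198.51.100.1").hex())
```

The receiving tunnel endpoint would strip the first 20 bytes and process what remains as an ordinary IP packet.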
Demultiplexes GRE (Generic Routing Encapsulation) packets. This option is required when using ip_gre
or PPTP (Point-to-Point Tunneling Protocol).
< > IP: GRE demultiplexer
Support for routing packets to multiple destination addresses, as used on the MBONE.
[ ] IP: multicast routing
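The kernel maintains a cache that maps IP addresses to MAC addresses; the ARP protocol is responsible for this
mapping. Enable this option if you want a user-space daemon to perform address resolution.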
[ ] IP: ARP daemon support
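The kernel's IP-to-MAC cache can be inspected from user space through /proc/net/arp. The sketch below parses text in that file's column layout into a dictionary. `parse_proc_net_arp` is a hypothetical helper and `SAMPLE` is made-up data shaped like the real file.

```python
SAMPLE = """IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.1.20     0x1         0x2         66:77:88:99:aa:bb     *        eth0
"""


def parse_proc_net_arp(text):
    """Return {ip: (mac, device)} from text in the /proc/net/arp column layout."""
    entries = {}
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            ip, _hw_type, _flags, mac, _mask, device = fields[:6]
            entries[ip] = (mac, device)
    return entries


if __name__ == "__main__":
    print(parse_proc_net_arp(SAMPLE))
```

On a Linux machine the same function could be given the contents of /proc/net/arp instead of the sample string.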
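TCP/IP networks are vulnerable to SYN flooding, a denial-of-service (DoS) attack that prevents legitimate users
from establishing connections. SYN cookies use a cryptographic technique that lets a host keep communicating while under attack.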
[ ] IP: TCP syncookie support
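The idea behind SYN cookies is that the server derives its initial sequence number from a keyed hash of the connection's addresses and ports plus a slowly changing counter, so it can validate the final ACK without having stored any state for the SYN. The sketch below is a deliberately simplified model: the function names, the HMAC-SHA256 construction, and the counter handling are assumptions, and the real kernel cookie has a different layout that also encodes the MSS.

```python
import hashlib
import hmac
import struct


def make_cookie(secret, src, sport, dst, dport, client_isn, counter):
    """Derive a 32-bit cookie from the flow and a time counter using a keyed hash."""
    msg = "%s|%d|%s|%d|%d|%d" % (src, sport, dst, dport, client_isn, counter)
    digest = hmac.new(secret, msg.encode("ascii"), hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def check_cookie(secret, src, sport, dst, dport, client_isn, ack, counter, max_age=1):
    """Accept the ACK if ack - 1 matches a cookie for the current or a recent counter."""
    claimed = struct.pack("!I", (ack - 1) & 0xFFFFFFFF)
    for age in range(max_age + 1):
        expected = make_cookie(secret, src, sport, dst, dport, client_isn, counter - age)
        if hmac.compare_digest(struct.pack("!I", expected), claimed):
            return True
    return False


if __name__ == "__main__":
    secret = b"demo-secret"  # hypothetical server secret
    flow = ("198.51.100.7", 40000, "192.0.2.10", 80, 12345)
    cookie = make_cookie(secret, *flow, 100)      # sent as the server's ISN
    ack = (cookie + 1) & 0xFFFFFFFF               # what an honest client returns
    print(check_cookie(secret, *flow, ack, 100))
```

Because the cookie can be recomputed from the ACK packet alone, a flood of SYNs does not fill a table of half-open connections in this model.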
Support for IPsec AH (Authentication Header); see http://en.wikipedia.org/wiki/IPsec
< > IP: AH transformation
Support for IPsec ESP (Encapsulating Security Payload)
< > IP: ESP transformation
IP Payload Compression Protocol (IPComp) (RFC3173), needed by IPsec
< > IP: IPComp transformation
Support for IPsec transport mode
< > IP: IPsec transport mode
Support for IPsec tunnel mode
< > IP: IPsec tunnel mode
Support for IPsec BEET mode
< > IP: IPsec BEET mode
Support for Large Receive Offload (ipv4/tcp)
<*> Large Receive Offload (ipv4/tcp)
Support for INET (TCP, DCCP, etc) socket monitoring interface used by native Linux tools
such as ss. ss is included in iproute2
< > INET: socket monitoring interface
Selects among various TCP congestion control algorithms, such as CUBIC TCP, H-TCP, TCP Westwood+, and Binary Increase
Congestion (BIC) control. The cubic algorithm is used by default.
[ ] TCP: advanced congestion control --->
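Which algorithm a socket actually uses can be queried from user space with the Linux TCP_CONGESTION socket option. The sketch below reads it for a fresh TCP socket. `current_congestion_control` is a hypothetical helper name, and the function returns None where the option is unavailable.

```python
import socket


def current_congestion_control():
    """Return the congestion control algorithm name of a new TCP socket, or None."""
    option = getattr(socket, "TCP_CONGESTION", None)
    if option is None:
        return None  # platform or Python build without TCP_CONGESTION
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            raw = sock.getsockopt(socket.IPPROTO_TCP, option, 16)
    except OSError:
        return None
    # The kernel returns a NUL-padded name; keep the part before the first NUL.
    return raw.split(b"\x00", 1)[0].decode("ascii")


if __name__ == "__main__":
    print(current_congestion_control())
```

On a kernel configured with the default described above, this would be expected to print cubic.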
RFC2385 specifies a method of giving MD5 protection to TCP sessions.
[ ] TCP: MD5 Signature Option support (RFC2385)
<*> The IPv6 protocol --->
Security marking of network packets.
[ ] Security Marking
Lets PHY devices timestamp packets.
[ ] Timestamping in PHY devices
Netfilter, used for (1) transparent proxying and (2) packet-filtering firewalls.
[*] Network packet filtering framework (Netfilter) --->
Datagram Congestion Control Protocol
< > The DCCP Protocol --->
Stream Control Transmission Protocol
< > The SCTP Protocol --->
The RDS (Reliable Datagram Sockets) protocol provides reliable, sequenced delivery of
datagrams over Infiniband, iWARP, or TCP.
< > The RDS Protocol
Transparent Inter Process Communication (TIPC) protocol.
< > The TIPC Protocol --->
ATM is a high-speed networking technology for Local Area Networks and Wide Area
Networks.
< > Asynchronous Transfer Mode (ATM)
Support for Classical IP over ATM (Asynchronous Transfer Mode), on both PVCs (permanent virtual circuits)
and SVCs (switched virtual circuits).
Classical IP over ATM
Do not send "ICMP host unreachable" messages when no neighbor is found.
[ ] Do NOT send ICMP if no neighbor
LAN emulation.
LAN Emulation (LANE) support
Multi-Protocol over ATM lets ATM edge devices (devices that provide a service entry point, such as
routers) and ATM hosts establish direct ATM virtual circuits across subnet boundaries.
Multi-Protocol Over ATM (MPOA) support
RFC1483/2684 Bridged protocols
[ ] Per-VC IP filter kludge
< > Layer Two Tunneling Protocol (L2TP) --->
Ethernet bridge support.
< > 802.1d Ethernet Bridging
[*] IGMP/MLD snooping
[ ] VLAN filtering (NEW)