Cisco VPP Introduction (Including Performance Data).pdf

FD.io – How to Push Extreme Limits of Performance and Scale with Vector Packet Processing Technology
Keith Burns
DEVNET-1221

Cisco Spark
Questions? Use Cisco Spark to chat with the speaker after the session
How
1. Find this session in the Cisco Live Mobile App
2. Click “Join the Discussion”
3. Install Spark or go directly to the space
4. Enter messages/questions in the space
Cisco Spark spaces will be available until July 3, 2017.
cs.co/ciscolivebot#DEVNET-1221
© 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public

Agenda
• Introduction to VPP
• Packet Processing on Commodity Hardware
• Scalar and Vector Packet Processing
• Graph Scheduler
• Exploiting Multiple Cores
• Binary APIs
• Performance Data

What is Vector Packet Processing?
• High performance packet-processing stack for commodity CPUs
• x86_64, i686, ppc-64-BE, aarch64-LE
• Endian clean, 32 / 64-bit clean
• Linux user-mode process
• Leverage DPDK, widely-available kernel modules (uio, igb_uio, uio_pci_generic)
• Linux user-space
• Same image works in a VM, over a host kernel, in an LXC
• Physical NICs via PCI direct-map
• Active development since 2002
• Ships as part of Cisco embedded and server products, in volume

Packet Processing (PP) on Commodity Hardware (CH)
• Packet-processing: load/store-intensive, big tables, N-tuple problems
• PP on CH: significantly different than PP on NPUs
• NPU: e.g. 2048 outstanding prefetches, SRAM
• Commodity HW: 8 → 16 outstanding prefetches, DDRn
• NPU: thousands of PPEs processing single packets
• Commodity HW: tens of general-purpose cores
• NPU: work distributor, TCAM, specialized counter support, QoS / queueing support
• VPP solves these problems, or a useful subset, on commodity hardware
• Structure the computation for CH’s convenience

Scalar Packet Processing
• A fancy name for processing one packet at a time
• Traditional, straightforward implementation scheme
• Interrupt, a calls b calls c … return return return RFI
• Considerable stack depth
• Issue #1: thrashing the I-cache
• When code path length exceeds the primary I-cache size, each packet incurs an identical set of I-cache misses
• Only workaround: bigger caches
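
The "a calls b calls c" scheme above can be sketched as follows. This is a minimal illustration only: the names stage_a, stage_b, stage_c and scalar_process are hypothetical and stand in for whatever forwarding functions a real stack would use. Python has no notion of an I-cache, so the sketch shows only the execution order. Every packet walks the whole call chain before the next packet starts, which is why the same instruction misses would repeat per packet once the combined path no longer fits in the primary I-cache.

```python
def stage_c(pkt, trace):
    # Last stage of the hypothetical chain; records that its code ran.
    trace.append("c")
    return pkt


def stage_b(pkt, trace):
    trace.append("b")
    return stage_c(pkt, trace)  # b calls c


def stage_a(pkt, trace):
    trace.append("a")
    return stage_b(pkt, trace)  # a calls b


def scalar_process(packets):
    """Run the full a -> b -> c chain once per packet and return the stage order."""
    trace = []
    for pkt in packets:  # one "interrupt" per packet
        stage_a(pkt, trace)
    return trace


order = "".join(scalar_process(["pkt0", "pkt1", "pkt2"]))
```

With three packets the stages run in the order a, b, c, a, b, c, a, b, c, so no stage processes two packets back to back.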

Scalar Packet Processing, cont’d
• Dependent read latency on big forwarding tables
• Example: 4 x 8 mtrie walk. Assume tables do not fit in cache.
• Lookup 5.6.7.8: read root_ply[5], then ply_2[6], then ply_3[7], then ply_4[8]
• Big tables: reads stall for ~170 clocks
• Few opportunities to mitigate (“prefetch around”) read latency stalls
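
The 4 x 8 mtrie walk above can be sketched as a four-level trie with one 256-entry ply per address octet. This is a simplified illustration, not VPP's implementation: Ply, insert_host_route and lookup are hypothetical names, and only /32 routes are supported to keep it short. The point is that each read depends on the result of the previous one, so the reads cannot overlap.

```python
from ipaddress import IPv4Address

STALL_CLOCKS = 170  # per-read stall quoted on the slide for tables that miss the cache


class Ply:
    """One 256-entry level of the trie (hypothetical name, not VPP's data structure)."""

    def __init__(self, default=None):
        # Each slot holds either a child Ply or a leaf (a next-hop label, or None).
        self.slots = [default] * 256


def insert_host_route(root, addr, next_hop):
    """Install a /32 route, creating the three intermediate plies on demand."""
    octets = IPv4Address(addr).packed
    node = root
    for octet in octets[:3]:
        child = node.slots[octet]
        if not isinstance(child, Ply):
            child = Ply(default=child)  # new ply inherits the leaf it replaces
            node.slots[octet] = child
        node = child
    node.slots[octets[3]] = next_hop


def lookup(root, addr):
    """Return (leaf, number_of_dependent_reads) for a dotted-quad address."""
    node = root
    reads = 0
    for octet in IPv4Address(addr).packed:
        entry = node.slots[octet]  # cannot be issued until the previous read returns
        reads += 1
        if not isinstance(entry, Ply):
            return entry, reads
        node = entry
    raise ValueError("trie is deeper than four plies")


root = Ply()
insert_host_route(root, "5.6.7.8", "next-hop-A")
leaf, reads = lookup(root, "5.6.7.8")
worst_case_stall = reads * STALL_CLOCKS  # 4 reads * 170 clocks if every read misses
```

For 5.6.7.8 the loop performs four reads, matching root_ply[5], ply_2[6], ply_3[7], ply_4[8]. Under the slide's assumption that no ply is cached, that is roughly 4 × 170 = 680 clocks of stall for a single lookup.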