Can an ONU stick (SFP optical modem) set a VLAN tag?

tag

router

switch

Hisense

29 replies • 2023-11-27 14:13:34 +08:00

1

LGA1150

2023-06-28 20:29:36 +08:00

Doesn't your router support VLAN tag offload?

2

huangya

OP

2023-06-28 21:16:11 +08:00

@LGA1150 It looks like it doesn't.
root@OpenWrt:~# ethtool -k eth0 |grep vlan
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
rx-vlan-filter: off
vlan-challenged: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

3

FabricPath

2023-06-29 10:04:38 +08:00

Don't worry about performance; the overhead of a VLAN sub-interface is negligible at 1G.

4

huangya

OP

2023-06-29 10:07:42 +08:00

@FabricPath Actually I'm not at 1G. In my test/tinkering setup I'm hoping to get close to 4G (three 1000M accounts), so I want to get the most out of every piece of hardware.

5

huangya

OP

2023-06-29 10:08:48 +08:00

@FabricPath Right now the router is overloaded and can't reach the rate I'm hoping for.

6

FabricPath

2023-06-29 10:32:51 +08:00

@huangya Run ethtool -S xxxx and check whether all the traffic arrives on a single queue; the NIC may not support RSS for PPPoE.

7

huangya

OP

2023-06-29 11:28:00 +08:00

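@FabricPath Thanks for sharing. There really are a lot of knowledgeable people on V2EX. I'll check when I get home tonight.

Right now I have the interrupt top half on cpu0, with cpu1 and cpu2 handling the bottom half. I'll post the CPU load tonight. As I recall, cpu0 was nearly maxed out, while cpu1 and cpu2 still had some headroom.

I analyzed the likely performance hotspots.

1. The CPU has to add and strip the VLAN tag.
2. The CPU has to handle PPPoE.
3. Because this is multiple PPPoE sessions over a single line, I'm using the kernel's macvlan. Is macvlan a slow path?

OpenWrt already has flow table (fast path) enabled. Under the conditions above, does the fast path still work, or does it only work partially?

Also, the SoC I'm using is the Marvell ARMADA 8040: http://macchiatobin.net/product/macchiatobin-double-shot/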

8

FabricPath

2023-06-29 11:47:49 +08:00

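@huangya Neither macvlan nor vlan is likely to be the performance bottleneck. macvlan just re-points skb->dev on xmit; for vlan it depends on how the parent device's driver is written, but usually it just appends a header.

Saturating 3Gbps up and down is fairly demanding for these 4*A72 cores; I don't know what its Packet Processor is capable of.
If your NIC supports ntuple, you can use ntuple to force the PPPoE traffic onto other queues. For example, I use an i225, also with 3 lines, steered to 3 queues.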

ethtool -n enp3s0|grep Filter|cut -d " " -f 2|xargs -I {} ethtool -N enp3s0 delete {}
ethtool -N enp3s0 flow-type ether dst 00:00:22:11:11:00 action 1
ethtool -N enp3s0 flow-type ether dst 00:00:22:11:11:01 action 2
ethtool -N enp3s0 flow-type ether dst 00:00:22:11:11:02 action 3

If you can get RSS working, avoid RPS where possible; if the NIC is single-queue, software steering with RPS is the only option.
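The per-line steering rules above follow a simple pattern: one destination MAC per PPPoE line, each mapped to its own RX queue. A minimal sketch that only prints such rules for review is shown here; the function name print_ntuple_rules is hypothetical, the device name and MAC addresses are the ones from the example above, and whether a given driver accepts these rules varies by NIC.

```shell
#!/bin/sh
# print_ntuple_rules DEV FIRST_QUEUE MAC...
# Hypothetical helper: prints one "ethtool -N" rule per destination MAC,
# assigning consecutive RX queues starting at FIRST_QUEUE.
# It only prints the commands; nothing is applied to the NIC.
print_ntuple_rules() {
    dev=$1
    queue=$2
    shift 2
    for mac in "$@"; do
        echo "ethtool -N $dev flow-type ether dst $mac action $queue"
        queue=$((queue + 1))
    done
}

# Reproduces the three rules shown above for review.
print_ntuple_rules enp3s0 1 00:00:22:11:11:00 00:00:22:11:11:01 00:00:22:11:11:02
```

Once the printed rules look right, they can be piped to sh to apply them.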

9

huangya

OP

2023-06-29 21:52:41 +08:00

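@FabricPath It looks like a single queue. Is there a way to confirm whether the NIC supports RSS for PPPoE? With the default of no RPS configured, cpu0 is almost fully consumed. I also noticed that rxq_0_queue_full_drops keeps increasing during download tests.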

root@OpenWrt:~# ethtool -S eth0 |grep rxq
rxq_0_desc_enqueue: 9300296
rxq_0_queue_full_drops: 84685
rxq_0_packets_early_drops: 0
rxq_0_packets_bm_drops: 0
rxq_1_desc_enqueue: 0
rxq_1_queue_full_drops: 0
rxq_1_packets_early_drops: 0
rxq_1_packets_bm_drops: 0
rxq_2_desc_enqueue: 0
rxq_2_queue_full_drops: 0
rxq_2_packets_early_drops: 0
rxq_2_packets_bm_drops: 0
rxq_3_desc_enqueue: 0
rxq_3_queue_full_drops: 0
rxq_3_packets_early_drops: 0
rxq_3_packets_bm_drops: 0

10

FabricPath

2023-06-30 10:37:31 +08:00

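@huangya First check whether tx is evenly distributed; if it is, that suggests RSS is supported.
Or check ethtool -k xxx|grep hash to see whether receive-hashing is supported.
Then check whether
ethtool -n xxx rx-flow-hash tcp4
ethtool -n xxx rx-flow-hash udp4
are configured. If both are configured and everything still lands on a single queue, the NIC cannot parse the inner headers of PPPoE frames for hashing.

In that case, consider using ntuple to force the traffic onto other queues, so that each CPU handles one broadband line.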
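To judge how evenly traffic is spread, the per-queue counters can be turned into percentages. This is a minimal sketch, assuming the counters are named rxq_<N>_desc_enqueue as in the ethtool -S output earlier in this thread (other drivers use different names); the function name rxq_share is hypothetical.

```shell
#!/bin/sh
# rxq_share: read "ethtool -S <dev>" output on stdin and print, for each
# RX queue, the enqueue counter and its percentage of the total.
# Assumption: counters are named rxq_<N>_desc_enqueue (driver specific).
rxq_share() {
    awk -F': *' '
        BEGIN { n = 0; total = 0 }
        /rxq_[0-9]+_desc_enqueue/ {
            gsub(/^[ \t]+/, "", $1)      # drop any leading indentation
            name[n] = $1
            count[n] = $2
            total += $2
            n++
        }
        END {
            for (i = 0; i < n; i++)
                printf "%s %s %.1f%%\n", name[i], count[i], (total ? 100 * count[i] / total : 0)
        }'
}

# Usage on the router:
#   ethtool -S eth0 | rxq_share
```

If one queue shows close to 100% while the others stay at 0, the traffic is effectively single-queue.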

11

huangya

OP

2023-06-30 11:13:02 +08:00

@FabricPath tx looks better. txq0-txq3 all carry traffic; txq4-txq7 carry none.
root@OpenWrt:~# ethtool -S eth0 |grep txq
txq_0_desc_enqueue: 20993833
txq_0_desc_enqueue_to_ddr: 0
txq_0_buff_euqueue_to_ddr: 20993833
txq_0_desc_hardware_forwarded: 0
txq_0_packets_dequeued: 20989169
txq_0_queue_full_drops: 0
txq_0_packets_early_drops: 0
txq_0_packets_bm_drops: 0
txq_0_packets_rep_bm_drops: 0
txq_1_desc_enqueue: 4127091
txq_1_desc_enqueue_to_ddr: 0
txq_1_buff_euqueue_to_ddr: 4127091
txq_1_desc_hardware_forwarded: 0
txq_1_packets_dequeued: 4127023
txq_1_queue_full_drops: 0
txq_1_packets_early_drops: 0
txq_1_packets_bm_drops: 0
txq_1_packets_rep_bm_drops: 0
txq_2_desc_enqueue: 3610058
txq_2_desc_enqueue_to_ddr: 0
txq_2_buff_euqueue_to_ddr: 3610058
txq_2_desc_hardware_forwarded: 0
txq_2_packets_dequeued: 3609977
txq_2_queue_full_drops: 0
txq_2_packets_early_drops: 0
txq_2_packets_bm_drops: 0
txq_2_packets_rep_bm_drops: 0
txq_3_desc_enqueue: 1103662
txq_3_desc_enqueue_to_ddr: 0
txq_3_buff_euqueue_to_ddr: 1103662
txq_3_desc_hardware_forwarded: 0
txq_3_packets_dequeued: 1103615
txq_3_queue_full_drops: 0
txq_3_packets_early_drops: 0
txq_3_packets_bm_drops: 0
txq_3_packets_rep_bm_drops: 0
txq_4_desc_enqueue: 0
txq_4_desc_enqueue_to_ddr: 0
txq_4_buff_euqueue_to_ddr: 0
txq_4_desc_hardware_forwarded: 0
txq_4_packets_dequeued: 0
txq_4_queue_full_drops: 0
txq_4_packets_early_drops: 0
txq_4_packets_bm_drops: 0
txq_4_packets_rep_bm_drops: 0
txq_5_desc_enqueue: 0
txq_5_desc_enqueue_to_ddr: 0
txq_5_buff_euqueue_to_ddr: 0
txq_5_desc_hardware_forwarded: 0
txq_5_packets_dequeued: 0
txq_5_queue_full_drops: 0
txq_5_packets_early_drops: 0
txq_5_packets_bm_drops: 0
txq_5_packets_rep_bm_drops: 0
txq_6_desc_enqueue: 0
txq_6_desc_enqueue_to_ddr: 0
txq_6_buff_euqueue_to_ddr: 0
txq_6_desc_hardware_forwarded: 0
txq_6_packets_dequeued: 0
txq_6_queue_full_drops: 0
txq_6_packets_early_drops: 0
txq_6_packets_bm_drops: 0
txq_6_packets_rep_bm_drops: 0
txq_7_desc_enqueue: 0
txq_7_desc_enqueue_to_ddr: 0
txq_7_buff_euqueue_to_ddr: 0
txq_7_desc_hardware_forwarded: 0
txq_7_packets_dequeued: 0
txq_7_queue_full_drops: 0
txq_7_packets_early_drops: 0
txq_7_packets_bm_drops: 0
txq_7_packets_rep_bm_drops: 0

receive-hashing is available too, but it is off by default.
root@OpenWrt:~# ethtool -k eth0 |grep hash
receive-hashing: off
After turning it on, rx is spread across the queues, but it is all still handled on one CPU; it reaches 900+.
root@OpenWrt:~# ethtool -S eth0 |grep rxq
rxq_0_desc_enqueue: 26346082
rxq_0_queue_full_drops: 95608
rxq_0_packets_early_drops: 0
rxq_0_packets_bm_drops: 0
rxq_1_desc_enqueue: 2242533
rxq_1_queue_full_drops: 2057
rxq_1_packets_early_drops: 0
rxq_1_packets_bm_drops: 0
rxq_2_desc_enqueue: 2389831
rxq_2_queue_full_drops: 1742
rxq_2_packets_early_drops: 0
rxq_2_packets_bm_drops: 0
rxq_3_desc_enqueue: 4022202
rxq_3_queue_full_drops: 50
rxq_3_packets_early_drops: 0
rxq_3_packets_bm_drops: 0
If I then add the following command on top of that, the best case (when the load happens to be spread evenly) reaches 2100+. At that point cpu1 and cpu2 are maxed out.
for rxq in /sys/class/net/eth[01]/queues/rx*; do echo 6 > $rxq/rps_cpus; done

03:06:23 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
03:06:25 all 0.00 0.00 0.25 0.00 0.00 56.00 0.00 0.00 0.00 43.75
03:06:25 0 0.00 0.00 0.50 0.00 0.00 17.00 0.00 0.00 0.00 82.50
03:06:25 1 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
03:06:25 2 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
03:06:25 3 0.00 0.00 0.50 0.00 0.00 7.00 0.00 0.00 0.00 92.50
https://www.speedtest.net/result/c/d143eb75-99a3-4625-900e-3c77cb9172e9

I'd like to spread RPS across cpu1, cpu2 and cpu3, which might push it a bit higher. I don't know why echo 14 fails; echo 8 works.
root@OpenWrt:~# for rxq in /sys/class/net/eth[01]/queues/rx*; do echo 14 > $rxq/rps_cpus; done
ash: write error: Value too large for data type
ash: write error: Value too large for data type
ash: write error: Value too large for data type
ash: write error: Value too large for data type
ash: write error: Value too large for data type
ash: write error: Value too large for data type
ash: write error: Value too large for data type
ash: write error: Value too large for data type

12

huangya

OP

2023-06-30 11:30:17 +08:00

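@FabricPath
>Saturating 3Gbps up and down is fairly demanding for these 4*A72 cores; I don't know what its Packet Processor is capable of.
There's a BLOCK DIAGRAM here: https://en.sekorm.com/doc/1816470.html
I don't know whether the Packet Processor can be used for forwarding under OpenWrt. Maybe it's meant for "ODP (Open Data Plane) compliant" use?

Also, ntuple is on by default. The link above also says ntuple is supported.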
root@OpenWrt:~# ethtool -k eth0 |grep ntuple
ntuple-filters: on [fixed]

13

titanium98118

2023-06-30 11:52:20 +08:00

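I stumbled into your discussion and discovered a whole new world.
On my R5C, at 1G downstream, cores 0 and 2 sit at basically 100% and cores 1 and 3 at around 50%.
So I ran the commands from your discussion and found that only VLAN offload is on...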
root@r5c:~# ethtool -k eth1 |grep vlan
rx-vlan-offload: on
tx-vlan-offload: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

root@r5c:~# ethtool -k eth1|grep hash
receive-hashing: off [fixed]

root@r5c:~# ethtool -n eth1 rx-flow-hash tcp4
Cannot get RX network flow hashing options: Not supported

root@r5c:~# ethtool -k eth0 |grep ntuple
ntuple-filters: off [fixed]

Oddly enough, OpenWrt running under Hyper-V does support receive-hashing
root@VM-OpenWrt:~# ethtool -k eth0|grep hash
receive-hashing: on

root@VM-OpenWrt:~# ethtool -n eth1 rx-flow-hash tcp4
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]

root@VM-OpenWrt:~# ethtool -k eth0 |grep ntuple
ntuple-filters: off [fixed]

But for some reason, on both of my OpenWrt boxes this command shows no rxq counters?
root@r5c:~# ethtool -S eth1
NIC statistics:
tx_packets: 1019723009
rx_packets: 512296019
tx_errors: 0
rx_errors: 0
rx_missed: 5087
align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
unicast: 511816379
broadcast: 479621
multicast: 19
tx_aborted: 0
tx_underrun: 0
tx_octets: 1354315138564
rx_octets: 134934337749
rx_multicast64: 0
tx_unicast64: 1019722967
tx_broadcast64: 7
tx_multicast64: 35
tx_pause_on: 0
tx_pause_off: 0
tx_pause_all: 0
tx_deferred: 0
tx_late_collision: 0
tx_all_collision: 0
tx_aborted32: 0
align_errors32: 0
rx_frame_too_long: 0
rx_runt: 0
rx_pause_on: 0
rx_pause_off: 0
rx_pause_all: 0
rx_unknown_opcode: 0
rx_mac_error: 112
tx_underrun32: 0
rx_mac_missed: 86767
rx_tcam_dropped: 0
tdu: 0
rdu: 200992

14

FabricPath

2023-06-30 11:56:09 +08:00

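@huangya Doesn't this CPU have only 4 cores? That means only 4 bits, so the rps mask ranges over 0000 - 1111 (0~F)

First run ethtool -l xxx to see the queue count, then use ethtool -L to set the number of queues to match your CPU count.
Then run cat /proc/interrupts |grep xxx to find the IRQ number of each queue

Edit /proc/irq/xxxx/smp_affinity_list so that each interrupt is bound to one core (some NICs also have an interrupt for a management channel; ignore it)

For example, mine: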
# cat /proc/interrupts |grep enp2s0|awk '{print $1}'|cut -d ":" -f 1|xargs -I {} cat /proc/irq/{}/smp_affinity_list
0-11
1
3
5
7

If IRQ pinning + RSS takes effect, turn RPS off completely
Then run top and press 1 to see whether SI is even across the CPUs. For example, on mine the 3 queues run on 3 CPUs

top - 11:55:38 up 28 days, 11:19, 1 user, load average: 2.31, 2.50, 2.61
Tasks: 487 total, 1 running, 486 sleeping, 0 stopped, 0 zombie
%Cpu0 : 21.1 us, 1.1 sy, 0.0 ni, 77.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 7.1 us, 3.0 sy, 0.0 ni, 86.9 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
%Cpu2 : 17.2 us, 3.2 sy, 0.0 ni, 79.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 3.1 us, 2.1 sy, 0.0 ni, 91.7 id, 0.0 wa, 0.0 hi, 3.1 si, 0.0 st
%Cpu4 : 17.4 us, 3.3 sy, 0.0 ni, 79.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 7.7 us, 1.0 sy, 0.0 ni, 85.6 id, 0.0 wa, 0.0 hi, 5.8 si, 0.0 st
%Cpu6 : 23.5 us, 3.1 sy, 0.0 ni, 73.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 4.1 us, 0.0 sy, 0.0 ni, 95.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 21.7 us, 3.3 sy, 0.0 ni, 73.9 id, 0.0 wa, 0.0 hi, 1.1 si, 0.0 st
%Cpu9 : 21.1 us, 4.2 sy, 0.0 ni, 74.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 18.8 us, 6.2 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 16.7 us, 4.2 sy, 0.0 ni, 79.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
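The IRQ pinning step described in this post can be drafted as a dry run before touching /proc. This is a minimal sketch, assuming that the queue interrupts for a device are listed first in /proc/interrupts and any management interrupt comes after them (an assumption that may not hold for every driver); the function name plan_irq_affinity is hypothetical.

```shell
#!/bin/sh
# plan_irq_affinity DEV NCPUS
# Reads /proc/interrupts-style text on stdin and prints one
# "echo CPU > /proc/irq/IRQ/smp_affinity_list" line for each of the first
# NCPUS interrupts whose last column is DEV. Later interrupts of the same
# device (assumed to be management interrupts) are skipped.
# It only prints commands; nothing is written to /proc.
plan_irq_affinity() {
    awk -v dev="$1" -v ncpus="$2" '
        BEGIN { cpu = 0 }
        $NF == dev && cpu < ncpus + 0 {
            irq = $1
            sub(/:$/, "", irq)           # "47:" -> "47"
            printf "echo %d > /proc/irq/%s/smp_affinity_list\n", cpu, irq
            cpu++
        }'
}

# Usage on the router (review the output before piping it to sh):
#   plan_irq_affinity eth0 4 < /proc/interrupts
```

Printing the commands first makes it easy to check that no management interrupt was picked up by mistake.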

15

FabricPath

2023-06-30 11:58:57 +08:00

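@titanium98118 I used an R5C before as well. The CPU is too weak, and newer versions of the driver for this 8125 also force RSS off. For a software router, an Intel NIC is a better choice. Back then, at around 30Kpps, it ate about 40% CPU, and with traffic shaping enabled it jumped straight to 60%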

16

huangya

OP

2023-06-30 11:59:33 +08:00

@FabricPath I tried ntuple and got an error; still debugging
root@OpenWrt:~# ethtool -N eth0 flow-type ether dst 32:2F:61:11:3B:69 action 1
rmgr: Invalid RX class rules table size: Not supported
Cannot insert classification rule

17

FabricPath

2023-06-30 12:10:21 +08:00

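@huangya You can read the NIC driver source directly to see which capabilities it supports. Search for the `set_rxnfc` function; for igc it is igc_ethtool_set_rxnfc. ntuple is extremely flexible and most NICs support only part of it; the I225, for example, only supports matching on VLAN and the Ethernet header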

18

huangya

OP

2023-06-30 12:11:01 +08:00

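@FabricPath
>Doesn't this CPU have only 4 cores? That means only 4 bits, so the rps mask ranges over 0000 - 1111 (0~F)
I made a silly mistake: I was passing a decimal value to echo. echo e works. The best case now reaches 3200+, with cpu 1, 2 and 3 maxed out.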
04:07:02 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
04:07:04 all 0.00 0.00 0.25 0.00 0.00 83.50 0.00 0.00 0.00 16.25
04:07:04 0 0.00 0.00 0.50 0.00 0.00 38.50 0.00 0.00 0.00 61.00
04:07:04 1 0.00 0.00 0.00 0.00 0.00 100.00 0.00 0.00 0.00 0.00
04:07:04 2 0.00 0.00 0.50 0.00 0.00 96.50 0.00 0.00 0.00 3.00
04:07:04 3 0.00 0.00 0.00 0.00 0.00 99.00 0.00 0.00 0.00 1.00

https://www.speedtest.net/result/c/c0424651-4045-4a30-a408-fc84fc7a0917

Next I'll look into RSS, to see whether I can drop RPS; ideally I'd use RSS.
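The decimal-versus-hex slip is easy to avoid by computing the mask from the CPU numbers. rps_cpus is parsed as a hexadecimal bitmask, so "14" is read as 0x14, which sets bit 4, a fifth CPU that a 4-core system does not have; that most likely explains the "Value too large" error. A minimal sketch, with a hypothetical function name cpus_to_mask:

```shell
#!/bin/sh
# cpus_to_mask CPU...
# Prints the hexadecimal bitmask with one bit set per CPU index,
# in the form expected by /sys/class/net/<dev>/queues/rx-*/rps_cpus.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 1 2      # prints 6
cpus_to_mask 1 2 3    # prints e
```

The result can then be written to each queue, for example echo "$(cpus_to_mask 1 2 3)" > $rxq/rps_cpus inside the loop used earlier.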

19

huangya

OP

2023-06-30 14:44:56 +08:00

@FabricPath I've set smp_affinity_list. eth0 is WAN; eth1 (connected to a 10G test machine), eth2 and eth3 are LAN. (51, 56, 61 and 66 should be the management interrupts)
root@OpenWrt:~# grep eth /proc/interrupts
47: 11684059 0 0 0 ICU-NSR 39 Level eth0
48: 0 1767000 0 0 ICU-NSR 43 Level eth0
49: 0 0 1748609 0 ICU-NSR 47 Level eth0
50: 0 0 0 2103639 ICU-NSR 51 Level eth0
51: 4 0 0 0 ICU-NSR 129 Level eth0
52: 5962016 0 0 0 ICU-NSR 39 Level eth1
53: 0 873775 0 0 ICU-NSR 43 Level eth1
54: 0 0 957364 0 ICU-NSR 47 Level eth1
55: 0 0 0 566200 ICU-NSR 51 Level eth1
56: 16 0 0 0 ICU-NSR 129 Level eth1
57: 7114790 0 0 0 ICU-NSR 40 Level eth2
58: 0 82885 0 0 ICU-NSR 44 Level eth2
59: 0 0 71360 0 ICU-NSR 48 Level eth2
60: 0 0 0 107930 ICU-NSR 52 Level eth2
61: 1 0 0 0 ICU-NSR 128 Level eth2
62: 0 0 0 0 ICU-NSR 41 Level eth3
63: 0 0 0 0 ICU-NSR 45 Level eth3
64: 0 0 0 0 ICU-NSR 49 Level eth3
65: 0 0 0 0 ICU-NSR 53 Level eth3
66: 0 0 0 0 ICU-NSR 127 Level eth3

root@OpenWrt:~# cat /proc/irq/47/smp_affinity_list
0
root@OpenWrt:~# cat /proc/irq/48/smp_affinity_list
1
root@OpenWrt:~# cat /proc/irq/49/smp_affinity_list
2
root@OpenWrt:~# cat /proc/irq/50/smp_affinity_list
3
root@OpenWrt:~# cat /proc/irq/52/smp_affinity_list
0
root@OpenWrt:~# cat /proc/irq/53/smp_affinity_list
1
root@OpenWrt:~# cat /proc/irq/54/smp_affinity_list
2
root@OpenWrt:~# cat /proc/irq/55/smp_affinity_list
3

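But in testing, the load is spread less evenly than with RPS, so the chance of hitting a good speed is much lower. I probably need to run PT/BT downloads to know for sure; speedtest uses too few sessions.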
root@OpenWrt:~# ethtool -n eth0 rx-flow-hash tcp4
TCP over IPV4 flows use these fields for computing Hash flow key:
IP SA
IP DA
L4 bytes 0 & 1 [TCP/UDP src port]
L4 bytes 2 & 3 [TCP/UDP dst port]

20

TESTFLIGHT2021

2023-11-27 10:31:09 +08:00

Put a switch in between and you're done: H3C E508

21

huangya

OP

2023-11-27 10:33:17 +08:00

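@TESTFLIGHT2021 I know I can add a switch, but I'd like to save a device. The optical modem supplied by the ISP can apply the tag, and I'd like the ONU stick to do the same.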

22

TESTFLIGHT2021

2023-11-27 10:41:00 +08:00

@huangya Hisense's software just doesn't implement it; the chip certainly has VLAN capability

23

PLDj0j9FY2y8Wm9i

2023-11-27 11:12:19 +08:00

The Cambridge stick XE-99S can apparently have VLAN enabled if the vendor provides technical support

24

huangya

OP

2023-11-27 11:38:49 +08:00

@username1919810
@TESTFLIGHT2021
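Oh, as long as someone can provide a Cambridge stick with VLAN already configured as a reference, I should be able to get the Hisense one working by comparing the two.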

25

TESTFLIGHT2021

2023-11-27 11:44:35 +08:00

OP can bring up a VLAN and then dial over that VLAN

26

huangya

OP

2023-11-27 12:42:17 +08:00

@TESTFLIGHT2021
I don't quite follow. Does OP mean me? Or username1919810, since he started another thread.

27

TESTFLIGHT2021

2023-11-27 13:01:17 +08:00

@huangya openwrt

28

huangya

OP

2023-11-27 13:43:35 +08:00

@TESTFLIGHT2021 Yes, that's what I'm doing now. But as you can see from the earlier posts, if the router running OpenWrt has no hardware VLAN offloading, this may add to the router's load.

29

TESTFLIGHT2021

2023-11-27 14:13:34 +08:00

@huangya It's fine; on x86 there's no need to fuss over this