
Help: new N100 mini PC with dual Realtek NICs, NIC dies at random when running PVE VMs

  lilu0826 · 2023-11-13 11:24:10 +08:00 · 2550 views
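    I bought an N100 mini PC on Pinduoduo that comes with two Realtek R8169 network ports. I have installed PVE on it, and at random intervals the network drops as if the machine had frozen. I checked the system log and it seems to be reporting NIC errors. Could someone help me analyze the log?

    PVE is the latest 8.x release I could download. So far I have only set up one OpenWrt VM and one iKuai VM; nothing else is installed yet.

    Log output: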
    Nov 12 13:38:28 com kernel: ------------[ cut here ]------------
    Nov 12 13:38:28 com kernel: NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
    Nov 12 13:38:28 com kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
    Nov 12 13:38:28 com kernel: Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog sunrpc nfnetlink_log nfnetlink binfmt_misc snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence intel_rapl_msr snd_sof_intel_hda intel_rapl_common snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match x86_pkg_temp_thermal intel_powerclamp snd_soc_acpi i915 soundwire_bus coretemp snd_soc_core rtw88_8821ce rtw88_8821c snd_compress kvm_intel ac97_bus drm_buddy rtw88_pci snd_pcm_dmaengine ttm rtw88_core drm_display_helper kvm snd_hda_intel cec irqbypass snd_intel_dspcfg snd_usb_audio snd_intel_sdw_acpi rc_core mac80211 crct10dif_pclmul snd_hda_codec polyval_clmulni btusb polyval_generic snd_usbmidi_lib ghash_clmulni_intel btrtl sha512_ssse3 snd_hda_core snd_rawmidi btbcm
    Nov 12 13:38:28 com kernel: drm_kms_helper aesni_intel snd_seq_device btintel snd_hwdep btmtk snd_pcm i2c_algo_bit crypto_simd cfg80211 syscopyarea cryptd sysfillrect snd_timer bluetooth rapl cmdlinepart ecdh_generic snd intel_cstate sysimgblt ecc pcspkr soundcore libarc4 spi_nor mei_me wmi_bmof mtd mei ov13858 v4l2_fwnode v4l2_async videodev mc acpi_tad acpi_pad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul r8169 spi_intel_pci i2c_i801 realtek i2c_smbus spi_intel xhci_pci intel_lpss_pci intel_lpss xhci_pci_renesas ahci libahci idma64 xhci_hcd video wmi pinctrl_alderlake
    Nov 12 13:38:28 com kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 6.2.16-3-pve #1
    Nov 12 13:38:28 com kernel: Hardware name: /, BIOS 5.27 08/10/2023
    Nov 12 13:38:28 com kernel: RIP: 0010:dev_watchdog+0x23a/0x250
    Nov 12 13:38:28 com kernel: Code: 00 e9 2b ff ff ff 48 89 df c6 05 8a 6f 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 58 64 60 9e 48 89 c2 e8 06 ab 30 ff <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
    Nov 12 13:38:28 com kernel: RSP: 0018:ffffb8b7401b8e38 EFLAGS: 00010246
    Nov 12 13:38:28 com kernel: RAX: 0000000000000000 RBX: ffff974f84454000 RCX: 0000000000000000
    Nov 12 13:38:28 com kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    Nov 12 13:38:28 com kernel: RBP: ffffb8b7401b8e68 R08: 0000000000000000 R09: 0000000000000000
    Nov 12 13:38:28 com kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff974f844544c8
    Nov 12 13:38:28 com kernel: R13: ffff974f8445441c R14: 0000000000000000 R15: 0000000000000000
    Nov 12 13:38:28 com kernel: FS: 0000000000000000(0000) GS:ffff9752efa80000(0000) knlGS:0000000000000000
    Nov 12 13:38:28 com kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Nov 12 13:38:28 com kernel: CR2: 00007f0bfa1a8eec CR3: 000000010ae00000 CR4: 0000000000752ee0
    Nov 12 13:38:28 com kernel: PKRU: 55555554
    Nov 12 13:38:28 com kernel: Call Trace:
    Nov 12 13:38:28 com kernel: <IRQ>
    Nov 12 13:38:28 com kernel: ? __pfx_dev_watchdog+0x10/0x10
    Nov 12 13:38:28 com kernel: call_timer_fn+0x29/0x160
    Nov 12 13:38:28 com kernel: ? __pfx_dev_watchdog+0x10/0x10
    Nov 12 13:38:28 com kernel: __run_timers+0x259/0x310
    Nov 12 13:38:28 com kernel: run_timer_softirq+0x1d/0x40
    Nov 12 13:38:28 com kernel: __do_softirq+0xd6/0x346
    Nov 12 13:38:28 com kernel: ? hrtimer_interrupt+0x11f/0x250
    Nov 12 13:38:28 com kernel: __irq_exit_rcu+0xa2/0xd0
    Nov 12 13:38:28 com kernel: irq_exit_rcu+0xe/0x20
    Nov 12 13:38:28 com kernel: sysvec_apic_timer_interrupt+0x92/0xd0
    Nov 12 13:38:28 com kernel: </IRQ>
    Nov 12 13:38:28 com kernel: <TASK>
    Nov 12 13:38:28 com kernel: asm_sysvec_apic_timer_interrupt+0x1b/0x20
    Nov 12 13:38:28 com kernel: RIP: 0010:cpuidle_enter_state+0xde/0x6f0
    Nov 12 13:38:28 com kernel: Code: 2a 77 62 e8 54 7e 4a ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 82 86 49 ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00 00
    Nov 12 13:38:28 com kernel: RSP: 0018:ffffb8b74016be38 EFLAGS: 00000246
    Nov 12 13:38:28 com kernel: RAX: 0000000000000000 RBX: ffffd8b73fca1b00 RCX: 0000000000000000
    Nov 12 13:38:28 com kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
    Nov 12 13:38:28 com kernel: RBP: ffffb8b74016be88 R08: 0000000000000000 R09: 0000000000000000
    Nov 12 13:38:28 com kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff9f0c33a0
    Nov 12 13:38:28 com kernel: R13: 0000000000000001 R14: 0000000000000001 R15: 00004815ebc4326c
    Nov 12 13:38:28 com kernel: ? cpuidle_enter_state+0xce/0x6f0
    Nov 12 13:38:28 com kernel: cpuidle_enter+0x2e/0x50
    Nov 12 13:38:28 com kernel: do_idle+0x216/0x2a0
    Nov 12 13:38:28 com kernel: cpu_startup_entry+0x1d/0x20
    Nov 12 13:38:28 com kernel: start_secondary+0x122/0x160
    Nov 12 13:38:28 com kernel: secondary_startup_64_no_verify+0xe5/0xeb
    Nov 12 13:38:28 com kernel: </TASK>
    Nov 12 13:38:28 com kernel: ---[ end trace 0000000000000000 ]---
    After that, the following lines just keep repeating:
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
    Nov 12 13:38:28 com kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
    Nov 12 13:38:38 com kernel: net_ratelimit: 9 callbacks suppressed
    Addendum 1  ·  2023-12-15 13:57:09 +08:00
    Solved: I installed the r8168 driver, and it has now been running stably for 25 days!
    23 replies    2023-11-15 18:24:03 +08:00
    wizzer
        1
    wizzer  
       2023-11-13 11:26:21 +08:00
    This is usually a driver problem. Install PVE 7.x, or skip PVE and run iKuai directly.
    ArmstrongPater
        2
    ArmstrongPater  
       2023-11-13 11:28:47 +08:00   ❤️ 1
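    This error message indicates a network device (NETDEV) watchdog timeout. Specifically, transmit queue 0 of the NIC named enp1s0 (which uses the r8169 driver) timed out.

    The key part of the error message is:

    NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out

    This tells us that the watchdog saw transmit queue 0 of the enp1s0 NIC time out.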
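
    To see how often the timeout occurs on a given host, the kernel log can be filtered for that message. The following is only a minimal sketch: the function name `count_tx_timeouts` is made up for illustration, and the match string assumes the message format shown in the log above.

```shell
#!/bin/sh
# count_tx_timeouts IFACE
# Hypothetical helper: reads kernel log text on stdin and prints how many
# "NETDEV WATCHDOG" lines mention the given interface.
count_tx_timeouts() {
    iface="$1"
    # -c prints the number of matching lines, -F matches the string literally.
    # grep exits non-zero when nothing matches (it still prints 0), so
    # "|| true" keeps callers that run under "set -e" from aborting.
    grep -cF "NETDEV WATCHDOG: ${iface} " || true
}
```

    It could be fed from the kernel log, for example `journalctl -k | count_tx_timeouts enp1s0`.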
    lilu0826
        3
    lilu0826  
    OP
       2023-11-13 11:32:36 +08:00
    @wizzer OK, I'll go try PVE 7.x.
    lazyyz
        4
    lazyyz  
       2023-11-13 11:39:35 +08:00
    Most likely a driver problem. Try installing r8168-dkms.
    nygpz
        5
    nygpz  
       2023-11-13 11:48:01 +08:00
    It's a driver problem; I ran into it before. Reinstalling the NIC driver fixed it.
    dangyuluo
        6
    dangyuluo  
       2023-11-13 12:40:01 +08:00
    Go with an Intel NIC next time. I've used one for several years and it has been fairly stable.
    TsubasaHanekaw
        8
    TsubasaHanekaw  
       2023-11-13 13:38:37 +08:00
    The driver bundled with PVE doesn't support Realtek NICs well; you need to reinstall the driver yourself.
    Just4L
        9
    Just4L  
       2023-11-13 13:48:52 +08:00
    Realtek NICs frequently drop their connection.
    sunulin
        10
    sunulin  
       2023-11-13 13:56:39 +08:00
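    Same on ESXi. I'm using an I226-V. It red-screens at random, and the message also points to a network port problem. I'm starting to suspect the machine itself has some bug.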
    ixdeal
        11
    ixdeal  
       2023-11-13 15:25:36 +08:00
    @TsubasaHanekaw #8 I seem to have had the same problem, but PVE 8 has probably been updated recently; after reinstalling, it has been mostly stable.
    shuax
        12
    shuax  
       2023-11-13 15:29:04 +08:00
    Nov 05 08:57:56 pve kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
    TDH <28>
    TDT <2f>
    next_to_use <2f>
    next_to_clean <28>
    buffer_info[next_to_clean]:
    time_stamp <1019aa6f8>
    next_to_watch <28>
    jiffies <1019aa8a8>
    next_to_watch.status <0>
    MAC Status <40080083>
    PHY Status <796d>
    PHY 1000BASE-T Status <3800>
    PHY Extended Status <3000>
    PCI Status <10>
    Nov 05 08:57:58 pve kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
    TDH <28>
    TDT <2f>
    next_to_use <2f>
    next_to_clean <28>
    buffer_info[next_to_clean]:
    time_stamp <1019aa6f8>
    next_to_watch <28>
    jiffies <1019aaa98>
    next_to_watch.status <0>
    MAC Status <40080083>
    PHY Status <796d>
    PHY 1000BASE-T Status <3800>
    PHY Extended Status <3000>
    PCI Status <10>

    Intel crashes just the same.
    TsubasaHanekaw
        13
    TsubasaHanekaw  
       2023-11-13 15:29:49 +08:00
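    @ixdeal My PVE 8 still has the problem. With the default driver, iperf3 only reached about 300 Mbit/s; it went back to normal only after I switched drivers.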
    sudo apt-get install r8168-dkms
    laucenmi
        14
    laucenmi  
       2023-11-13 15:39:50 +08:00
    Some people say disabling TSO does the trick:
    ethtool -K enp1s0 tso off
    If you don't have ethtool, just apt install it.
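
    To confirm that the setting took effect, the feature list printed by `ethtool -k` (lowercase `-k`) can be checked. This is a rough sketch; `tso_state` is a made-up helper name, and it assumes the usual `tcp-segmentation-offload: on|off` line in that output.

```shell
#!/bin/sh
# tso_state
# Hypothetical helper: reads `ethtool -k <iface>` output on stdin and prints
# the value of the top-level tcp-segmentation-offload line, e.g. "on" or
# "off" (a suffix such as "[fixed]" is printed as well if present).
tso_state() {
    # Split each line at the colon; sub-features are indented with a tab,
    # so only the top-level feature line matches exactly.
    awk -F': *' '$1 == "tcp-segmentation-offload" { print $2; exit }'
}
```

    For example: `ethtool -k enp1s0 | tso_state`. Note that a setting made with `ethtool -K` does not survive a reboot. One common way to reapply it on Debian-based systems is a `post-up ethtool -K enp1s0 tso off` line in the interface's stanza in /etc/network/interfaces, but treat that as an assumption to verify on your own setup.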
    chnsatan
        15
    chnsatan  
       2023-11-13 16:41:27 +08:00
    @laucenmi I tested this myself under PVE and it works; I ran into this problem before.
    zhouqian
        16
    zhouqian  
       2023-11-13 19:00:50 +08:00
    pve (uptime: 121 days 18:48:15)
    i225
    Quite satisfied with it.
    VIRUSR
        17
    VIRUSR  
       2023-11-13 21:14:37 +08:00
    r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
    Since upgrading to PVE 8 I have hit this 3 times too...
    lilu0826
        18
    lilu0826  
    OP
       2023-11-13 22:24:39 +08:00
    root@com:~# lsmod | grep r81
    r8168 655360 0
    root@com:~# ethtool -i enp1s0
    driver: r8168
    version: 8.051.02-NAPI
    firmware-version:
    expansion-rom-version:
    bus-info: 0000:01:00.0
    supports-statistics: yes
    supports-test: no
    supports-eeprom-access: no
    supports-register-dump: yes
    supports-priv-flags: no

    Thanks for the replies, everyone. I've now installed the r8168 driver; let's see whether it still goes down.
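
    The same check can be scripted, which may be handy after kernel upgrades, since a DKMS module has to be rebuilt for each new kernel. A minimal sketch; `bound_driver` is a hypothetical helper name, and it only parses the `driver:` line of the `ethtool -i` output shown above.

```shell
#!/bin/sh
# bound_driver
# Hypothetical helper: reads `ethtool -i <iface>` output on stdin and prints
# the value of the "driver:" line.
bound_driver() {
    awk -F': *' '$1 == "driver" { print $2; exit }'
}
```

    For example, `ethtool -i enp1s0 | bound_driver` should print `r8168` on a machine in the state shown above.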
    lilu0826
        19
    lilu0826  
    OP
       2023-11-13 22:27:05 +08:00
    Update: with the r8168 driver installed, the network even feels faster!!!
    justtoxic
        20
    justtoxic  
       2023-11-14 15:37:07 +08:00
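    Is an N100 paired with a Realtek 8168 NIC even worth buying? This NIC is rubbish. I have a three-year-old Zotac mini PC with an N3160 CPU, and with this NIC I keep getting inexplicable packet loss, even with Realtek's own driver installed.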
    lilu0826
        21
    lilu0826  
    OP
       2023-11-15 13:51:37 +08:00
    At 13:30 on November 15, 2023, after a little over a day of uptime, it went down again...
    lilu0826
        22
    lilu0826  
    OP
       2023-11-15 13:54:49 +08:00   ❤️ 1
    @laucenmi I'll try your command next. If it still goes down, I'm done with PVE.
    qsx313
        23
    qsx313  
       2023-11-15 18:24:03 +08:00
    Is it Crucial RAM? If so, switching to Samsung will fix it.