[linux][system crash] What are the possible causes?

    geew · 2013-10-11 09:59:49 +08:00 · 6495 views
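    It happens often and for no apparent reason.
    The server works as a software router and also hosts several KVM guests.

    What are the possible causes, and which logs should I look at to track down the fault?

    Calling all ops experts, and experts of any kind.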
    10 replies
    megaforce · #1 · 2013-10-11 10:45:18 +08:00
    Take a look at dmesg.
    Go through whatever logs are under /var/log/ as well.
    echo1937 · #2 · 2013-10-11 11:02:43 +08:00
    Do you have kdump set up? If so, analyze the core that kdump captured.
    eth2net · #3 · 2013-10-11 11:21:28 +08:00
    Is the crash a panic or a hang? As #2 said, having kdump would be best.
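
    The panic-versus-hang question matters because kdump only captures a dump when the kernel actually panics; a hang or a soft lockup normally just logs a warning. A minimal sketch, assuming a Linux host with procfs and sysfs mounted, that reports whether a crash kernel is loaded and whether lockups are configured to escalate into a panic. The function names and the `root` parameter are illustrative, and some of these files may not exist on every kernel build:

```python
from pathlib import Path

# Sysctl files under /proc/sys/kernel that control panic behaviour.
# Some may be absent depending on the kernel configuration.
PANIC_KNOBS = ("panic", "panic_on_oops", "softlockup_panic", "hung_task_panic")


def read_flag(path):
    """Return the stripped file contents, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def crash_capture_report(root="/"):
    """Collect kdump and panic-related settings found under `root`.

    `root` exists only so the function can be pointed at a test directory
    (an assumption of this sketch, not something from the thread).
    """
    root = Path(root)
    # Reads "1" when a crash (kdump) kernel has been loaded via kexec.
    report = {"kexec_crash_loaded": read_flag(root / "sys/kernel/kexec_crash_loaded")}
    for knob in PANIC_KNOBS:
        report[knob] = read_flag(root / "proc/sys/kernel" / knob)
    return report


# Print whatever can be read on the current machine; missing files show as n/a.
for name, value in crash_capture_report().items():
    print(f"{name:20} {value if value is not None else 'n/a'}")
```

    If `kexec_crash_loaded` is 0 or `softlockup_panic` is 0, a lockup like the one discussed in this thread would probably leave no dump behind, so the syslog output is all there is to go on.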
    sdysj · #4 · 2013-10-11 11:31:44 +08:00
    With no logs, what's the point of even asking...
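    BOYPT · #5 · 2013-10-11 14:20:56 +08:00
    Not posting any logs? Then I'll just guess:

    Maybe aliens snuck into your server room and, while studying your machine's CPU, accidentally flipped the most significant bit of the SP register.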
    humiaozuzu · #6 · 2013-10-11 14:42:30 +08:00
    @sdysj
    @BOYPT The OP is asking which logs to look at in order to track down the fault.
    geew (OP) · #7 · 2013-10-11 16:18:19 +08:00
    OK, here is the *.err log. PS: I can't figure out how to format text in this editor.

    Oct 11 11:09:50 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:10:13 master pppoe[7973]: Bad TCP checksum 3200
    Oct 11 11:10:21 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:10:23 master pppoe[7973]: Bad TCP checksum 6900
    Oct 11 11:10:34 master pppoe[7973]: Bad TCP checksum 3900
    Oct 11 11:12:23 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:12:42 master pppoe[7973]: Bad TCP checksum 6100
    Oct 11 11:18:05 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:18:19 master pppoe[7973]: Bad TCP checksum 6000
    Oct 11 11:18:19 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:18:23 master pppoe[7973]: Bad TCP checksum 3400
    Oct 11 11:23:51 master pppoe[7973]: Bad TCP checksum 6900
    Oct 11 11:23:51 master pppoe[7973]: Bad TCP checksum 6900
    Oct 11 11:24:17 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:24:17 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:25:31 master pppoe[7973]: Bad TCP checksum 3d00
    Oct 11 11:30:26 master pppoe[7973]: Bad TCP checksum 3200
    Oct 11 11:38:02 master pppoe[7973]: Bad TCP checksum 3000
    Oct 11 11:44:09 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 11:50:31 master pppoe[7973]: Bad TCP checksum 6d00
    Oct 11 11:53:43 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 12:46:07 master pppoe[7973]: Bad TCP checksum 6900
    Oct 11 12:55:30 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 12:57:41 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 12:57:41 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 13:05:56 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 13:06:52 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 13:28:23 master pppoe[7973]: Bad TCP checksum 900
    Oct 11 13:51:32 master pppoe[7973]: Bad TCP checksum 3c00
    Oct 11 14:05:12 master pppoe[7973]: Bad TCP checksum 2a00
    Oct 11 14:06:08 master pppoe[7973]: Bad TCP checksum 1800
    Oct 11 14:11:33 master pppoe[7973]: Bad TCP checksum 3100
    Oct 11 15:01:45 master pppoe[7973]: Bad TCP checksum 3100
    Oct 11 15:48:58 master kernel: BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:99]
    Oct 11 15:48:58 master kernel: Stack:
    Oct 11 15:48:58 master kernel: Call Trace:
    Oct 11 15:48:58 master kernel: Code: 01 74 05 e8 92 7a d8 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
    Oct 11 15:52:49 master kernel: [drm:radeon_dp_i2c_aux_ch] *ERROR* aux i2c too many retries, giving up
    Oct 11 15:52:49 master kernel: [drm:radeon_dp_i2c_aux_ch] *ERROR* aux i2c too many retries, giving up
    Oct 11 15:52:49 master kernel: ata2.00: failed to resume link (SControl 0)
    Oct 11 15:52:49 master kernel: ata2.01: failed to resume link (SControl 0)
    Oct 11 15:53:00 master nslcd[1847]: [8b4567] no available LDAP server found
    Oct 11 15:53:10 master nslcd[1847]: [8b4567] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [7b23c6] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [7b23c6] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [3c9869] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [3c9869] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [334873] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [334873] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [b0dc51] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [b0dc51] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [495cff] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [495cff] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [e8944a] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [e8944a] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [5558ec] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [5558ec] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [8e1f29] no available LDAP server found
    Oct 11 15:53:20 master nslcd[1847]: [8e1f29] no available LDAP server found
    Oct 11 15:53:22 master automount[2097]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
    Oct 11 15:53:23 master xinetd[2133]: Server /usr/sbin/amandad is not executable [file=/etc/xinetd.d/amanda] [line=13]
    Oct 11 15:53:23 master xinetd[2133]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/amanda] [line=13]
    Oct 11 15:53:23 master dhcpd: WARNING: Host declarations are global. They are not limited to the scope you declared them in.
    Oct 11 15:53:24 master libvirtd: Could not find keytab file: /etc/libvirt/krb5.tab: No such file or directory
    Oct 11 15:53:24 master nslcd[1847]: [e87ccd] no available LDAP server found
    Oct 11 15:53:24 master nslcd[1847]: [e87ccd] no available LDAP server found
    Oct 11 15:53:27 master nslcd[1847]: [1b58ba] no available LDAP server found
    Oct 11 15:53:27 master nslcd[1847]: [1b58ba] no available LDAP server found
    Oct 11 15:53:27 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
    Oct 11 15:53:27 master nslcd[1847]: [7ed7ab] no available LDAP server found
    Oct 11 15:53:27 master nslcd[1847]: [7ed7ab] no available LDAP server found
    Oct 11 15:53:27 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
    Oct 11 15:53:28 master nslcd[1847]: [b141f2] no available LDAP server found
    Oct 11 15:53:28 master nslcd[1847]: [b141f2] no available LDAP server found
    Oct 11 15:53:28 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
    Oct 11 15:53:28 master nslcd[1847]: [b71efb] no available LDAP server found
    Oct 11 15:53:28 master nslcd[1847]: [b71efb] no available LDAP server found
    Oct 11 15:53:28 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
    Oct 11 15:53:29 master nslcd[1847]: [e2a9e3] no available LDAP server found
    Oct 11 15:53:29 master nslcd[1847]: [e2a9e3] no available LDAP server found
    Oct 11 15:53:29 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
    Oct 11 15:53:29 master nslcd[1847]: [45e146] no available LDAP server found
    Oct 11 15:53:29 master nslcd[1847]: [45e146] no available LDAP server found
    Oct 11 15:53:29 master qemu-kvm: Could not find keytab file: /etc/qemu/krb5.tab: No such file or directory
    Oct 11 15:53:47 master nslcd[1847]: [5f007c] no available LDAP server found
    Oct 11 15:53:47 master nslcd[1847]: [5f007c] no available LDAP server found
    Oct 11 15:53:49 master nslcd[1847]: [d062c2] no available LDAP server found
    Oct 11 15:53:49 master nslcd[1847]: [d062c2] no available LDAP server found
    Oct 11 15:53:50 master nslcd[1847]: [200854] no available LDAP server found
    Oct 11 15:53:50 master nslcd[1847]: [200854] no available LDAP server found
    Oct 11 15:54:11 master nslcd[1847]: [b127f8] no available LDAP server found
    Oct 11 15:54:11 master nslcd[1847]: [b127f8] no available LDAP server found
    Oct 11 15:54:11 master kernel: kvm: 2403: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
    Oct 11 15:54:13 master kernel: kvm: 2461: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
    Oct 11 15:54:13 master kernel: kvm: 2432: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
    Oct 11 15:54:14 master kernel: kvm: 2472: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
    Oct 11 15:54:14 master kernel: kvm: 2532: cpu0 unimplemented perfctr wrmsr: 0xc1 data 0xabcd
    Oct 11 15:55:11 master nslcd[1847]: [16231b] no available LDAP server found
    Oct 11 15:55:11 master nslcd[1847]: [16231b] no available LDAP server found
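
    Most of the log above is repeated pppoe and nslcd messages, which makes the few kernel lines easy to miss. A minimal sketch, assuming the classic syslog layout seen in this thread (`Mon DD HH:MM:SS host program[pid]: message`), that counts lines per program and pulls out the lines from one program. The helper names are illustrative, not part of any tool mentioned here:

```python
import re
from collections import Counter

# Classic syslog layout: "Oct 11 15:48:58 host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<prog>[^\s:\[]+)(?:\[\d+\])?: (?P<msg>.*)$"
)


def count_by_program(lines):
    """Count log lines per program tag; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if match:
            counts[match.group("prog")] += 1
    return counts


def lines_from(lines, program):
    """Return the stripped lines logged by one program, in original order."""
    selected = []
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if match and match.group("prog") == program:
            selected.append(line.strip())
    return selected


# A few lines copied from the log posted above.
SAMPLE = [
    "Oct 11 15:01:45 master pppoe[7973]: Bad TCP checksum 3100",
    "Oct 11 15:48:58 master kernel: BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:99]",
    "Oct 11 15:52:49 master kernel: ata2.00: failed to resume link (SControl 0)",
    "Oct 11 15:53:00 master nslcd[1847]: [8b4567] no available LDAP server found",
]

print(count_by_program(SAMPLE).most_common())
for line in lines_from(SAMPLE, "kernel"):
    print(line)
```

    Filtering this way puts the `BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:99]` line front and center. It is the last kernel message before what look like boot-time messages at 15:52:49, so it is probably the one worth investigating first.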
    geew (OP) · #8 · 2013-10-11 16:20:44 +08:00
    @echo1937
    @eth2net
    @sdysj
    @BOYPT
    @humiaozuzu
    Which other logs do you need?
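    ceyes · #9 · 2013-10-11 17:17:08 +08:00
    Generally speaking, a kernel bug is bound to bring the machine down.

    less /var/log/messages
    Search for "Oops", "Call Trace", and "Panic".
    Then take the relevant output to Bugzilla and ask for help there.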
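
    The keyword search suggested above can also be scripted. A minimal sketch that scans log lines for "Oops", "Call Trace", and "Panic", and prints each hit with a little trailing context. The extra markers "BUG:" and "soft lockup" are an addition (assumption), included only because the log posted in this thread uses them; the function name and `context` parameter are illustrative:

```python
import re

# "Oops", "Call Trace" and "panic" come from the reply above.
# "BUG:" and "soft lockup" are assumptions added for this thread's log.
# The leading \b keeps "debug:" from matching "BUG:".
MARKERS = re.compile(r"\b(?:Oops|Call Trace|panic|BUG:|soft lockup)", re.IGNORECASE)


def find_kernel_trouble(lines, context=2):
    """Yield (line_number, line) for each marker hit and `context` lines after it."""
    remaining = 0
    for number, line in enumerate(lines, start=1):
        if MARKERS.search(line):
            remaining = context + 1  # the hit itself plus trailing context
        if remaining > 0:
            yield number, line.rstrip("\n")
            remaining -= 1


# Lines copied from the log posted earlier in this thread.
SAMPLE = [
    "Oct 11 15:01:45 master pppoe[7973]: Bad TCP checksum 3100",
    "Oct 11 15:48:58 master kernel: BUG: soft lockup - CPU#0 stuck for 67s! [ksmd:99]",
    "Oct 11 15:48:58 master kernel: Stack:",
    "Oct 11 15:48:58 master kernel: Call Trace:",
    "Oct 11 15:53:00 master nslcd[1847]: [8b4567] no available LDAP server found",
]

for number, line in find_kernel_trouble(SAMPLE, context=1):
    print(f"{number}: {line}")
```

    To run it against a real file, pass an open file object, for example the handle returned by `open("/var/log/messages")`, instead of `SAMPLE`. The path may differ by distribution, and reading it usually needs root.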
    geew (OP) · #10 · 2013-10-12 09:34:26 +08:00
    Bump.