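As the title says: this is a WordPress image site with roughly 15,000 unique IPs and 200,000 page views per day.
With around 500 users online at the same time it returns 502 errors frequently, and the CPU load reaches
load average: 15.04, 15.60, 14.93
The CPU is an Intel(R) Xeon(R) CPU E3-1270 v3 @ 3.50GHz, 8 cores, in a dedicated server.
RAM: 32 GB
The WP Super Cache caching plugin is installed.
CentOS 7 64-bit; nginx, MySQL and PHP are all the latest stable versions.
Everything was installed with the lnmp one-click installer, and I don't know how to tune it any further.
Could someone help me work out how to optimize this and fix the frequent 502 errors during the evening peak?
Thanks!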
1
publicID002 2015-09-25 23:55:20 +08:00
|
2
oott123 2015-09-26 00:02:40 +08:00 via Android
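Uh, what good does pinging me do...
A 502 is only the symptom. You need to look at what gets logged when the 502 occurs and whether there are enough PHP processes at that moment: are lots of processes stuck, or are resources idle while the process count has already hit its limit... |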
3
Showfom 2015-09-26 00:05:09 +08:00
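1. Switch to an SSD.
2. 500 users online at the same time? Is that within a 15-minute window, or truly concurrent? If it is concurrent, I'd suggest separating static files from dynamic ones. In theory a server with these specs should handle this much traffic without trouble. |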
4
ryd994 2015-09-26 00:08:48 +08:00 via Android
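WP Super Cache cannot modify the Nginx configuration by itself; you have to edit it manually or include the rules.
If that hasn't been done, WP Super Cache has no effect and every request is a miss. |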
5
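liyucmh OP @Showfom Thanks for the answer. The server's disk is already an SSD: 240 GB - Intel 520 SSD
The 51.la statistics show that the evening peak can exceed 500 users online at once. Checking the load with uptime at that point, all three values are often close to 20: load average: 18.69, 16.46, 13.47 |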
6
liyucmh OP @publicID002 It's a dedicated server.
@oott123 My php-fpm.conf is the unmodified default, as follows:
[global]
pid = /usr/local/php/var/run/php-fpm.pid
error_log = /usr/local/php/var/log/php-fpm.log
log_level = notice
[www]
listen = /tmp/php-cgi.sock
listen.backlog = -1
listen.allowed_clients = 127.0.0.1
listen.owner = www
listen.group = www
listen.mode = 0666
user = www
group = www
pm = dynamic
pm.max_children = 20
pm.start_servers = 2
pm.min_spare_servers = 1
pm.max_spare_servers = 6
request_terminate_timeout = 100
request_slowlog_timeout = 0
slowlog = var/log/slow.log
And here is part of the error log php-fpm.log:
[26-Sep-2015 08:40:08] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[26-Sep-2015 08:41:42] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 9 total children
[26-Sep-2015 08:41:43] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 10 total children
[26-Sep-2015 08:42:03] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 17 total children
[26-Sep-2015 08:42:04] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 16 children, there are 0 idle, and 18 total children
[26-Sep-2015 08:42:05] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 19 total children
[26-Sep-2015 08:42:07] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[26-Sep-2015 08:42:46] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[26-Sep-2015 08:46:39] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 11 total children
I tried adjusting things following the tuning guides online, raising the values, or switching to static as recommended for dedicated servers, but none of it worked. Perhaps my changes were inappropriate and made things worse: after running for a while the site returns 502 and does not recover on its own! |
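To see how often the pool actually hits its ceiling, the php-fpm log can be summarised per hour. The following is a minimal sketch, assuming the log line format shown in the post above; the function name `count_max_children_hits` and the embedded sample are illustrative only.

```python
import re
from collections import Counter

# Matches lines such as:
# [26-Sep-2015 08:40:08] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
MAX_CHILDREN_RE = re.compile(
    r"^\[(?P<day>\d{2}-\w{3}-\d{4}) (?P<hour>\d{2}):\d{2}:\d{2}\] WARNING: "
    r"\[pool (?P<pool>[^\]]+)\] server reached pm\.max_children setting \((?P<limit>\d+)\)"
)


def count_max_children_hits(lines):
    """Count 'server reached pm.max_children' warnings per (day, hour, pool)."""
    hits = Counter()
    for line in lines:
        match = MAX_CHILDREN_RE.match(line.strip())
        if match:
            hits[(match["day"], match["hour"], match["pool"])] += 1
    return hits


# A few lines copied from the log excerpt in the thread.
SAMPLE = """\
[26-Sep-2015 08:40:08] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[26-Sep-2015 08:41:42] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 9 total children
[26-Sep-2015 08:42:07] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[26-Sep-2015 08:42:46] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
"""

if __name__ == "__main__":
    for (day, hour, pool), count in sorted(count_max_children_hits(SAMPLE.splitlines()).items()):
        print(f"{day} {hour}:00 pool={pool} hits={count}")
```

On a real server the lines would come from the `error_log` path in php-fpm.conf instead of `SAMPLE`. Hours with many hits are the ones worth comparing against the 502 times.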
7
ryd994 2015-09-26 09:06:44 +08:00 via Android
If it's mostly static images, 200,000 page views should be manageable...
|
8
ryd994 2015-09-26 09:07:04 +08:00 via Android
Post your Nginx config.
|
9
liyucmh OP @ryd994 Thanks for the reply. My nginx config is also the unmodified default from the lnmp installer, as follows:
user www www;
worker_processes auto;
error_log /home/wwwlogs/nginx_error.log crit;
pid /usr/local/nginx/logs/nginx.pid;
#Specifies the value for maximum file descriptors that can be opened by this process.
worker_rlimit_nofile 51200;
events {
    use epoll;
    worker_connections 51200;
    multi_accept on;
}
http {
    include mime.types;
    default_type application/octet-stream;
    server_names_hash_bucket_size 128;
    client_header_buffer_size 32k;
    large_client_header_buffers 4 32k;
    client_max_body_size 50m;
    sendfile on;
    tcp_nopush on;
    keepalive_timeout 60;
    tcp_nodelay on;
    fastcgi_connect_timeout 300;
    fastcgi_send_timeout 300;
    fastcgi_read_timeout 300;
    fastcgi_buffer_size 64k;
    fastcgi_buffers 4 64k;
    fastcgi_busy_buffers_size 128k;
    fastcgi_temp_file_write_size 256k;
    gzip on;
    gzip_min_length 1k;
    gzip_buffers 4 16k;
    gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_types text/plain application/x-javascript text/css application/xml;
    gzip_vary on;
    gzip_proxied expired no-cache no-store private auth;
    gzip_disable "MSIE [1-6]\.";
    #limit_conn_zone $binary_remote_addr zone=perip:10m;
    ##If enable limit_conn_zone,add "limit_conn perip 10;" to server section.
    server_tokens off;
    #log format
    log_format access '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" $http_x_forwarded_for';
    server {
        listen 80 default;
        #listen [::]:80 default ipv6only=on;
        server_name www.lnmp.org;
        index index.html index.htm index.php;
        root /home/wwwroot/default;
        #error_page 404 /404.html;
        location ~ [^/]\.php(/|$) {
            # comment try_files $uri =404; to enable pathinfo
            try_files $uri =404;
            fastcgi_pass unix:/tmp/php-cgi.sock;
            fastcgi_index index.php;
            include fastcgi.conf;
            #include pathinfo.conf;
        }
        location /nginx_status {
            stub_status on;
            access_log off;
        }
        location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$ {
            expires 30d;
        }
        location ~ .*\.(js|css)?$ {
            expires 12h;
        }
        access_log /home/wwwlogs/access.log access;
    }
    include vhost/*.conf;
}
Please help me see how this should be optimized. Thanks. |
10
ryd994 2015-09-26 09:41:07 +08:00
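@liyucmh There is nothing related to Super Cache in here. Check the .htaccess file in the WP directory and convert the corresponding rules into nginx rewrite rules; otherwise Super Cache has no effect.
Do all of your image files end in gif|jpg|jpeg|png|bmp? If so, at least the images are served by nginx directly. To verify, you can add access_log /home/wwwlogs/access.static.log access; inside location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$. Depending on your situation, you could also try fastcgi_keep_conn and fastcgi_cache. If you get a lot of POST requests, try increasing the fastcgi buffer settings. |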
11
liyucmh OP @ryd994 Hi, the site's own config file does contain those rules:
location / {
    if (-f $request_filename) {
        break;
    }
    set $supercache_file '';
    set $supercache_uri $request_uri;
    set $supercache 1;
    set $ihttp_host '';
    if ($request_method = POST) {
        set $supercache 0;
    }
    set $qs 0;
    if ($query_string) {
        set $qs 1;
    }
    if ($query_string ~* "^utm_source=([^&]+)&utm_medium([^&]+)&utm_campaign=([^&]+)(&utm_content=([^&]+))?$") {
        set $qs 0;
        set $supercache_uri $document_uri;
    }
    if ($qs = 1) {
        set $supercache 0;
    }
    # For logged-in users (those who have posted a comment), skip the static cache. Can be commented out during traffic peaks
    if ($http_cookie ~* "comment_author_|wordpress|wp-postpass_" ) {
        set $supercache 0;
    }
    # end
    if ($http_user_agent ~* '(iphone|ipod|aspen|incognito|webmate|android|dream|cupcake|froyo|blackberry9500|blackberry9520|blackberry9530|blackberry9550|blackberry 9800|webos|s8000|bada)') {
        set $ihttp_host '-mobile';
    }
    if ($supercache = 0) {
        set $supercache_uri '';
    }
    if ($supercache_uri ~ ^(.+)$) {
        set $supercache_file /wp-content/cache/supercache/$http_host$1/index${ihttp_host}.html;
    }
    if (-f $document_root$supercache_file) {
        #rewrite ^(.*)$ $supercache_file break;
        rewrite ^ $supercache_file last;
    }
    if (!-e $request_filename) {
        rewrite . /index.php last;
    }
}
#error_page 404 /404.html;
location ~ [^/]\.(php|sh)(/|$) {
    # comment try_files $uri =404; to enable pathinfo
    try_files $uri =404;
    fastcgi_pass unix:/tmp/php-cgi.sock;
    fastcgi_index index.php;
    include fastcgi.conf;
    #include pathinfo.conf;
}
location ~ .*\.(gif|jpg|jpeg|png|bmp|swf)$ {
    expires 30d;
}
location ~ .*\.(js|css)?$ {
    expires 12h;
}
access_log off;
}
The PHP error log says "server reached pm.max_children setting (20), consider raising it". Why does it still fail after I raised the relevant values? How should they be set? |
12
oott123 2015-09-26 10:17:12 +08:00 via Android
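Post the nginx error log from around the time of the 502.
Also run top around the 502 and look at memory and CPU. My basic read is that php-fpm runs out of processes at peak time, so no PHP process is free to handle requests. Try shortening the timeouts and see whether that helps. That said, if most pages are statically cached as you describe, there shouldn't be that many requests needing PHP... pm.max_children = 20 is not enough; raise it and see. |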
13
liyucmh OP @oott123 Thanks for the reply.
Around yesterday's 502s the nginx error log has only 2 entries:
2015/09/25 22:40:03 [crit] 9446#0: *1 connect() to unix:/tmp/php-cgi.sock failed (2: No such file or directory) while connecting to upstream, client: 162.158.255.40, server: www.test112233.com, request: "GET /wp-content/themes/test112233/timthumb.php?src=http://www.test112233.com/wp-content/uploads/2015/05/1-3.jpg&w=220&h=150&zc=1 HTTP/1.1", upstream: "fastcgi://unix:/tmp/php-cgi.sock:", host: "www.test112233.com", referrer: "http://www.test112233.com/7%b1%ea%b1%b8%e5%90%88%e9%/"
2015/09/25 23:47:22 [crit] 13606#0: *1 connect() to unix:/tmp/php-cgi.sock failed (2: No such file or directory) while connecting to upstream, client: 188.114.106.203, server:www.test112233.com, request: "GET /%ec%a7%b1%ec%a7%b1%ea%b1%b8%e5%90%88%e9%9b%8612v2-07gb/ HTTP/1.1",upstream: "fastcgi://unix:/tmp/php-cgi.sock:", host: "www.test112233.com"
Around the 502, top shows that memory is sufficient: 20-odd GB is used as cache and there is free memory as well, so memory does not look like the bottleneck. CPU usage is simply rather high. I read that on an 8-core CPU the load average reported by uptime should not exceed 8, but at peak times it sometimes gets close to 20, e.g. load average: 18.69, 16.46, 13.47. Access is slow then, with frequent 502s.
Roughly what should pm.max_children be set to? Going by the log ("you may need to increase pm.start_servers, or pm.min/max_spare_servers"), it seems pm.start_servers and pm.min/max_spare_servers need to be raised too. Earlier I followed tuning guides online that size these values by the amount of RAM, and the problem got worse: after running for a while the site returns 502 and cannot recover by itself!
By the way, the server has the opcache PHP accelerator enabled:
[opcache]
; Determines if Zend OPCache is enabled
;opcache
[Zend Opcache]
zend_extension="/usr/local/php/lib/php/extensions/no-debug-non-zts-20121212/opcache.so"
opcache.memory_consumption=128
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=4000
opcache.revalidate_freq=60
opcache.fast_shutdown=1
opcache.enable_cli=1
;opcache end
Please help me see how these config files should be adjusted. |
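The rule of thumb quoted above (load average should stay at or below the core count) amounts to dividing each load figure by the number of logical CPUs. A minimal sketch, assuming the 8 cores stated in the thread; `load_per_core` is an illustrative helper, not part of any tool mentioned here.

```python
import os


def load_per_core(load_avg, cores):
    """Divide each load-average figure by the number of logical CPUs."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return tuple(value / cores for value in load_avg)


if __name__ == "__main__":
    # Figures quoted in the thread, with the 8 cores stated by the poster.
    quoted = load_per_core((18.69, 16.46, 13.47), 8)
    for window, ratio in zip(("1 min", "5 min", "15 min"), quoted):
        print(f"{window}: {ratio:.2f} load per core")

    # On a live Unix host the same check can use os.getloadavg() and os.cpu_count().
    if hasattr(os, "getloadavg"):
        live = load_per_core(os.getloadavg(), os.cpu_count() or 1)
        print("live:", ", ".join(f"{ratio:.2f}" for ratio in live))
```

A ratio well above 1 means more tasks are waiting than there are CPUs to run them. On Linux the load average also counts tasks blocked on I/O, so a high ratio alone does not say whether CPU or disk is the limit.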
14
oott123 2015-09-26 11:17:18 +08:00
> connect() to unix:/tmp/php-cgi.sock failed (2: No such file or directory)
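Uh, has php-fpm gone down entirely...? My impression is that PHP child processes eat a fair amount of memory but not much CPU, so what is using the CPU? Apart from nginx and php-fpm, is anything else running on the web server, MySQL for example? |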
15
liyucmh OP @oott123 Yes, MySQL is on this server as well.
%Cpu(s): 28.3 us, 4.0 sy, 0.0 ni, 63.8 id, 3.8 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 32740984 total, 1820088 free, 700568 used, 30220328 buff/cache
KiB Swap: 16547836 total, 16349292 free, 198544 used. 31524392 avail Mem
  PID USER  PR NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
26122 mysql 20  0 2712468 144812  8684 S 206.0  0.4 410:40.05 mysqld
 7912 www   20  0  452204  47108 26976 S  13.0  0.1   0:08.65 php-fpm
 8024 www   20  0  451796  46284 26496 R  13.0  0.1   0:03.12 php-fpm
 7954 www   20  0  451800  46216 26500 S  12.6  0.1   0:04.74 php-fpm
 8035 www   20  0  451940  46240 26460 S   7.6  0.1   0:02.60 php-fpm
 8025 www   20  0  451440  46004 26660 S   4.7  0.1   0:02.52 php-fpm
The bottleneck does seem to be MySQL. The above is top sorted by CPU usage; after watching it for a while, mysqld's %CPU often shoots above 100. |
16
bsder 2015-09-27 00:00:10 +08:00 via Android
Adjusting the number of PHP processes is enough. For 500 concurrent users, set pm.max_children=512, and also pm=static
|
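A common way to sanity-check a value like 512 is to see how many php-fpm children fit in RAM. The sketch below uses the figures from the top output earlier in the thread (32740984 KiB total, about 47108 KiB RES per child); the 8 GiB reserved for MySQL, the page cache and the OS is purely an assumption, and `max_children_by_memory` is an illustrative name.

```python
def max_children_by_memory(total_kib, reserved_kib, per_child_kib):
    """Upper bound on php-fpm children that fit in RAM after a reserve is set aside."""
    if per_child_kib <= 0:
        raise ValueError("per_child_kib must be positive")
    usable_kib = max(total_kib - reserved_kib, 0)
    return usable_kib // per_child_kib


if __name__ == "__main__":
    total_kib = 32740984            # "KiB Mem: total" from the top output in the thread
    per_child_kib = 47108           # largest php-fpm RES value in that output
    reserved_kib = 8 * 1024 * 1024  # assumption: 8 GiB kept for MySQL, cache and the OS
    print(max_children_by_memory(total_kib, reserved_kib, per_child_kib))
```

This is only a memory ceiling. RES includes shared pages (the SHR column), so the real per-child cost is likely lower, and the bound says nothing about CPU: a pool sized this way can still overload the cores if each request is expensive.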
17
bsder 2015-09-27 00:04:38 +08:00 via Android
Your MySQL load is not high at all; it is only using 2 cores. MySQL showing 100% CPU does not mean the system load is high.
|
18
liyucmh OP @bsder Thanks! Do I only need to change those two parameters? What about the other two,
pm.min_spare_servers = 1 and pm.max_spare_servers = 6? Do they need adjusting as well? |
19
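liyucmh OP @bsder A follow-up question: after running with these settings for a while, I found the load is extremely high.
Outside peak hours, with about 280 users online, the uptime load reaches 30 or more (load average: 35.92, 35.68, 27.36), and pages respond very slowly. |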
20
bsder 2015-09-27 19:01:01 +08:00 via Android
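Post your database config file. I suspect the database configuration is making PHP wait on MySQL to compute queries, which produces high load and high I/O wait. Ideally, run vmstat 1 while the load is high to find the performance bottleneck.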
|
21
bsder 2015-09-27 19:03:47 +08:00 via Android
You can also try enabling the PHP and MySQL slow logs to verify this.
|
22
liyucmh OP @bsder Thanks for following up.
The MySQL config file my.cnf is the default:
[client]
#password = your_password
port = 3306
socket = /tmp/mysql.sock
# The MySQL server
[mysqld]
port = 3306
socket = /tmp/mysql.sock
datadir = /usr/local/mysql/var
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
#skip-networking
# Replication Master Server (default)
# binary logging is required for replication
#log-bin=mysql-bin
# binary logging format - mixed recommended
#binlog_format=mixed
# required unique id between 1 and 2^32 - 1
# defaults to 1 if master-host is not set
# but will not function as a master if omitted
server-id = 1
# Uncomment the following if you are using InnoDB tables
innodb_data_home_dir = /usr/local/mysql/var
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /usr/local/mysql/var
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 16M
innodb_additional_mem_pool_size = 2M
# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 5M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
The slow log is enabled with the threshold set to 5s. It contains the same kind of entry over and over:
[26-Sep-2015 15:08:07] [pool wwwroot] pid 4492
script_filename = /home/wwwroot/sample.com/index.php
[0x00007f9a83f8dc68] mysqli_query() /home/wwwroot/sample.com/wp-includes/wp-db.php:1739
[0x00007f9a83f8daf8] _do_query() /home/wwwroot/sample.com/wp-includes/wp-db.php:1645
[0x00007f9a83f8d988] query() /home/wwwroot/sample.com/wp-includes/wp-db.php:2195
[0x00007f9a83f8d7c8] get_results() /home/wwwroot/sample.com/wp-content/themes/sample/widgets/wid-comment.php:82
[0x00007f9a83f8d5b8] mod_newcomments() /home/wwwroot/sample.com/wp-content/themes/sample/widgets/wid-comment.php:30
[0x00007f9a83f8d470] widget() /home/wwwroot/sample.com/wp-includes/widgets.php:329
[0x00007ffe5d148ab0] display_callback() unknown:0
[0x00007f9a83f8d290] call_user_func_array() /home/wwwroot/sample.com/wp-includes/widgets.php:1272
[0x00007f9a83f8d160] dynamic_sidebar() /home/wwwroot/sample.com/wp-content/themes/sample/sidebar.php:12
[0x00007f9a83f8cfd0] +++ dump failed |
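When the slow log repeats like this, it can help to count which theme or plugin call site sits just outside WordPress core in each trace. The sketch below assumes the php-fpm slowlog frame layout seen in the post (`[address] function() file:line`); the helper name and the `/wp-includes/` marker are illustrative choices, not a standard tool.

```python
import re
from collections import Counter

# One stack frame as it appears in the slow log: "[0x...] function() /path/file.php:123"
FRAME_RE = re.compile(r"^\[0x[0-9a-f]+\] (?P<func>\S+)\(\) (?P<file>\S+):(?P<line>\d+)$")


def first_frames_outside_core(lines, core_marker="/wp-includes/"):
    """Per slowlog entry, count the innermost frame whose file is not WordPress core."""
    counts = Counter()
    recorded = True  # ignore frames until the first entry header has been seen
    for raw in lines:
        line = raw.strip()
        if line.startswith("script_filename"):
            recorded = False  # a new slowlog entry begins here
            continue
        match = FRAME_RE.match(line)
        if match and not recorded and core_marker not in match["file"]:
            counts[(match["file"], int(match["line"]), match["func"])] += 1
            recorded = True
    return counts


# The single entry quoted in the thread, truncated after the theme frames.
SAMPLE = """\
[26-Sep-2015 15:08:07] [pool wwwroot] pid 4492
script_filename = /home/wwwroot/sample.com/index.php
[0x00007f9a83f8dc68] mysqli_query() /home/wwwroot/sample.com/wp-includes/wp-db.php:1739
[0x00007f9a83f8daf8] _do_query() /home/wwwroot/sample.com/wp-includes/wp-db.php:1645
[0x00007f9a83f8d988] query() /home/wwwroot/sample.com/wp-includes/wp-db.php:2195
[0x00007f9a83f8d7c8] get_results() /home/wwwroot/sample.com/wp-content/themes/sample/widgets/wid-comment.php:82
[0x00007f9a83f8d5b8] mod_newcomments() /home/wwwroot/sample.com/wp-content/themes/sample/widgets/wid-comment.php:30
"""

if __name__ == "__main__":
    for (path, line_no, func), count in first_frames_outside_core(SAMPLE.splitlines()).most_common():
        print(f"{count:5d}  {func}()  {path}:{line_no}")
```

For the entry quoted in the thread this points at the `get_results()` call on line 82 of the theme's wid-comment.php, which is the query issued by the recent-comments widget.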
23
bsder 2015-09-28 11:06:58 +08:00
Raise the database settings. A simple approach is to use /usr/share/mysql/my-huge.cnf as a template and modify it.
|
26
bsder 2015-09-28 12:47:43 +08:00
Judging from your config file, it should be at /usr/local/share/mysql/my-huge.cnf
|
27
ericls 2015-09-29 02:40:37 +08:00 via Android
None of this should have anything to do with nginx. Look at the logs of whatever is actually running your PHP.
|
30
ericls 2015-09-29 13:11:49 +08:00
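@liyucmh nginx is just a proxy server. Apart from perl, I don't think nginx runs any scripts itself. You can see in your nginx config that when it handles PHP it proxies to php-fpm or whatever else you use.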
|
31
liyucmh OP @bsder Hi, I found that file and renamed it to my.cnf, but MySQL will not start after a restart. It reports: Starting MySQL... ERROR! The server quit without updating PID file
After swapping the file in, what else needs to be changed? |
32
bsder 2015-09-29 23:17:57 +08:00 via Android
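Did you change datadir to the path from your original my.cnf? If it will not start, check the MySQL log file named after the hostname with an .err extension, under datadir or under /var/lib/mysql (depending on the datadir parameter your MySQL is currently started with).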
|
33
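liyucmh OP @bsder Thanks, I fixed it and MySQL starts now. The problem remains, though: the load is still high and the 502s seem even more frequent. I gave up and switched back to the original default config.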
|
34
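liyucmh OP @bsder It's strange: sometimes with 300-plus users online the load stays below 5,
but at other times with 200-odd users the load goes above 20. What could be the reason? User behavior should be roughly the same. Could it be an attack? But I am behind a CDN, so an attacker should not know my real IP. Or does the load rise whenever there is an attack, CDN or not? |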
35
bsder 2015-10-01 19:49:09 +08:00
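@liyucmh No evidence, no diagnosis. Without turning on logging and debugging, it is hard to find the root cause by guesswork and experience alone. The information provided is one-sided, and I can't see what the problem is. Take a good look at your statistics, or analyze the nginx access log yourself to see which URLs are hit most often during the high-load periods. My guess is that some PHP script is eating a lot of CPU.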
|
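The access-log analysis suggested above can be done with a short script. This is a sketch that assumes the `access` log_format from the nginx.conf posted earlier in the thread; the sample lines are made up for illustration (documentation-range IPs), and `top_paths` is a hypothetical helper name. Note that the site config shown in the thread has `access_log off;`, so logging would need to be enabled first.

```python
import re
from collections import Counter

# Leading fields of the "access" log_format from the thread:
# $remote_addr - $remote_user [$time_local] "$request" $status ...
LINE_RE = re.compile(
    r'^(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+)[^"]*" (?P<status>\d{3}) '
)


def top_paths(lines, time_prefix=None, n=10):
    """Return the n most requested paths (query string stripped), optionally for one time prefix."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        if time_prefix and not match["time"].startswith(time_prefix):
            continue
        counts[match["uri"].split("?", 1)[0]] += 1
    return counts.most_common(n)


# Hypothetical sample lines, not taken from the poster's server.
SAMPLE = """\
203.0.113.5 - - [25/Sep/2015:22:40:03 +0800] "GET /wp-content/themes/test112233/timthumb.php?src=a.jpg&w=220 HTTP/1.1" 200 5120 "-" "Mozilla/5.0" -
203.0.113.6 - - [25/Sep/2015:22:40:04 +0800] "GET /wp-content/themes/test112233/timthumb.php?src=b.jpg&w=220 HTTP/1.1" 200 4096 "-" "Mozilla/5.0" -
203.0.113.7 - - [25/Sep/2015:23:01:10 +0800] "GET / HTTP/1.1" 200 20480 "-" "Mozilla/5.0" -
"""

if __name__ == "__main__":
    for path, hits in top_paths(SAMPLE.splitlines(), time_prefix="25/Sep/2015:22"):
        print(f"{hits:6d}  {path}")
```

Passing a `time_prefix` such as `25/Sep/2015:22` narrows the count to one hour, so a high-load hour can be compared with a quiet one. Paths ending in `.php` near the top of the list are the requests that occupy php-fpm children.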
36
feicao111 2016-01-10 15:00:30 +08:00
pm = dynamic
pm.max_children = 1024
pm.start_servers = 32
pm.min_spare_servers = 32
pm.max_spare_servers = 1024 |