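A NaN in a program's output means the computation has effectively crashed. If nothing checks for it, the program keeps running and wastes CPU; if the program itself checks for NaN on every step, the overhead is too high. As a compromise, I wrote a separate monitoring script: it catches the failure promptly, and the original program needs no extra work, so its performance is unaffected.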
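How the script works:
- The program's standard output is written to a log file using redirection, tee, or similar;
- Every 10 s the script inspects the latest output with tail; if it finds nan, it kills the program and exits.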
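The detection step described above can be sketched as a small function. This is a minimal illustration, not the author's script: the name `log_has_nan` and the temporary demo log are assumptions made for the example. Note that the match is a plain substring test, so a log line containing a word such as "nanosecond" would also trigger it.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): succeeds when any of
# the last 10 lines of the given log file contains "nan", case-insensitively.
log_has_nan() {
    tail -n 10 "$1" | grep -qi nan
}

# Small demonstration on a throwaway log file.
demo_log=$(mktemp)
printf 'step 1 loss=0.52\nstep 2 loss=0.31\n' > "$demo_log"
if log_has_nan "$demo_log"; then echo "nan found"; else echo "clean"; fi

# Append a line with NaN and check again.
printf 'step 3 loss=NaN\n' >> "$demo_log"
if log_has_nan "$demo_log"; then echo "nan found"; else echo "clean"; fi
```

The first check prints "clean" and the second prints "nan found". Only the tail of the log is read on each poll, which keeps the check cheap even when the log grows large.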
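How to use the script:
- Run the program and write its screen output to a file, for example:
    nohup ./ttt > ~/log.txt 2>&1 &
  or pipe it through tee:
    ./ttt | tee log.txt
- Find the program's PID with ps:
    ps aux | grep ttt | grep -v grep | awk '{print $2}'
- Start the monitor:
    ./checkNAN.sh pid log.txt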
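When the program is started in the background from the same shell, the PID lookup with ps can be skipped, because bash stores the PID of the most recent background job in `$!`. The sketch below assumes a stand-in program (`sleep 30`) and a temporary log file so that it runs anywhere; `PROGRAM` and `LOG` are hypothetical names, to be replaced with `./ttt` and the real log path.

```shell
#!/bin/bash
# Stand-in for ./ttt so the sketch runs anywhere (assumption for the example).
PROGRAM=(sleep 30)
LOG=$(mktemp)

# Start the program in the background, sending stdout and stderr to the log.
nohup "${PROGRAM[@]}" > "$LOG" 2>&1 &

# $! holds the PID of the most recently started background job.
PID=$!
echo "started pid $PID, logging to $LOG"

# The monitor would then be started with these two values:
#   ./checkNAN.sh "$PID" "$LOG"
```

This avoids the case where `grep ttt` matches more than one process and the wrong PID is passed to the monitor.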
The script:

    #!/bin/bash
    # author: qiquanji
    # link: <https://qiquanji.com/check-nan-script/>
    set -e

    usage() {
        echo "Usage: ./checkNAN.sh pid logfile"
    }

    if [ $# -lt 2 ]; then
        usage
        exit 1
    fi

    PID=$1
    LOGFILE=$2

    # Look up the command name for exactly this PID. Filtering "ps -ef" with
    # "grep $PID" would also match other lines that merely contain the digits.
    COMMAND=$(ps -p "$PID" -o comm= || true)
    if [ -z "$COMMAND" ]; then
        echo "unknown pid: $PID"
        exit 1
    fi
    if [ ! -e "$LOGFILE" ]; then
        echo "log file does not exist: $LOGFILE"
        exit 1
    fi

    echo "watch pid: $PID($COMMAND) for log file: $LOGFILE"
    count=0
    while true
    do
        # Stop watching once the process has exited on its own.
        if [ -z "$(ps -p "$PID" -o comm= || true)" ]; then
            echo "process quit!"
            exit 0
        fi

        # Case-insensitive substring match on the last 10 lines of the log.
        ret=$(tail "$LOGFILE" | grep -i nan | wc -l)
        if [[ $ret -ne 0 ]]; then
            echo "nan detected!"
            # Use -i here too, and tolerate no match: otherwise a hit such as
            # "NaN" makes grep exit 1 and "set -e" aborts before the kill.
            tail "$LOGFILE" | grep -i nan || true
            echo "kill process"
            kill -9 "$PID"
            echo "watch exit"
            exit 0
        fi

        # Print a heartbeat about once a minute (every 6 polls of 10 s).
        count=$((count+1))
        if [[ $((count % 6)) -eq 0 ]]; then
            date=$(date +'%Y-%m-%d %H-%M-%S')
            echo "$date: no nan detected..."
        fi
        sleep 10
    done
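The script decides whether the watched process is still running by reading ps output. A possible alternative, shown here only as a sketch, is `kill -0`, which sends no signal and just reports whether the PID can be signalled. The function name `is_alive` is an assumption for the example, and the check also fails for processes owned by another user, so it only suits programs started under the same account.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the PID exists and we may signal it.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Demonstrate with a short-lived background job.
sleep 5 &
pid=$!

if is_alive "$pid"; then echo "running"; fi   # prints "running"

# Terminate the job and reap it so the PID is released.
kill "$pid"
wait "$pid" 2>/dev/null || true

if ! is_alive "$pid"; then echo "gone"; fi    # prints "gone"
```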
Original link: https://www.qiquanji.com/post/4975.html