CN110457178A - Full-link monitoring and alarm method based on log collection and analysis - Google Patents

Full-link monitoring and alarm method based on log collection and analysis

Info

Publication number
CN110457178A
CN110457178A
Authority
CN
China
Prior art keywords
log
analysis
data
opentsdb
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910689380.7A
Other languages
Chinese (zh)
Inventor
陈旋
王冲
张�荣
闫辛未
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Ai Jia Household Articles Co Ltd
Original Assignee
Jiangsu Ai Jia Household Articles Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Ai Jia Household Articles Co Ltd filed Critical Jiangsu Ai Jia Household Articles Co Ltd
Priority to CN201910689380.7A
Publication of CN110457178A
Legal status: Withdrawn (current)


Classifications

    • G06F 11/3006: Monitoring arrangements specially adapted to the computing system or computing system component being monitored, where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F 11/3452: Performance evaluation by statistical analysis
    • G06F 11/3476: Data logging
    • G06F 16/168: Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G06F 16/182: Distributed file systems

Abstract

The present invention relates to a full-link monitoring and alarm method based on log collection and analysis, belonging to the technical field of computer networks. The invention addresses the shortcomings of existing link monitoring systems and comprehensively improves their compatibility, scalability and real-time performance. Link data acquisition is based on the log files that applications already print, with no code intrusion: any application that can print logs can be connected, which saves the access cost of new technology stacks. For the application, only the logs it cares about need to be recorded, with no additional resource overhead. The middleware the new system relies on supports distributed deployment and unlimited horizontal scaling, which greatly improves overall scalability. Applications can freely configure log matching rules on demand, and key business scenarios can be monitored and alerted on in real time. Data is transmitted through the high-performance Kafka message middleware, shortening the time from log generation to consumption and achieving near-real-time behavior.

Description

Full-link monitoring and alarm method based on log collection and analysis
Technical field
The invention belongs to the technical field of computer networks, and in particular to a full-link monitoring and alarm method based on log collection and analysis.
Background technique
As the concept of microservice architecture has taken hold in recent years, the number of services split out along different dimensions keeps growing, and a seemingly simple user operation on a web page or in an app often involves multiple service calls across multiple backend systems. This greatly increases the overall complexity of the system, especially when diagnosing hard, cross-system problems.
Full-link monitoring and alarm systems are intended to help us understand system behavior and provide a tool for analyzing performance issues, so that when a failure occurs we can quickly locate and resolve the problem.
Existing full-link monitoring systems based on the Java technology stack and the HTTP protocol can meet monitoring needs to a certain extent, but the following problems remain:
1. Insufficient compatibility: only the Java language and the HTTP protocol are supported, and the company's newer technology stacks (C++, Golang, etc.) require additional client development before they can be connected;
2. High performance overhead: intrusive data acquisition code places extra load on the application's memory, CPU and network bandwidth;
3. Poor scalability: link data storage becomes a bottleneck and cannot be scaled horizontally smoothly as traffic keeps growing;
4. Weak business monitoring: monitoring is limited to the system level and cannot provide effective data support for business operations;
5. Monitoring lag: the responsible person cannot be notified before a problem causes an obvious business impact.
Summary of the invention
The present invention seeks to address the shortcomings of existing link monitoring systems and to comprehensively improve their compatibility, scalability and real-time performance.
To address the deficiencies of the background art, the technical problem to be solved by the present invention is to provide a full-link monitoring and alarm method based on log collection and analysis that overcomes the shortcomings of existing link monitoring systems and comprehensively improves their compatibility, scalability and real-time performance.
The present invention adopts the following technical solution to solve the above technical problem:
A full-link monitoring and alarm method based on log collection and analysis specifically comprises the following steps:
Step 1: build the infrastructure services, specifically the distributed file storage HDFS, the distributed columnar database HBase, the message middleware Kafka, the time-series database OpenTSDB and the chart presentation unit Grafana;
wherein HDFS is used for distributed storage of data files;
HBase is used to provide the external data storage service;
OpenTSDB is used to store the business data analysis results and, by creating tables in HBase, ultimately provides an external data query service along the time dimension;
Grafana is used for the final chart visualization;
Kafka is used for efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, decoupling the two from each other;
Step 2: business system log recording and log analysis rule maintenance;
Step 3: business system log collection and analysis processing;
Step 4: alarming on log analysis results and subsequent chart presentation.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, step 2, business system log recording and log analysis rule maintenance, is specifically as follows:
Step 2.1: in the key business scenario code, record the key information into the log file;
Step 2.2: maintain the matching rule into the monitoring system under the corresponding application;
Step 2.3: enter the log file of the scenario and the corresponding key information analysis rule into the monitoring system, complete the alarm configuration and the whitelist of exceptions that do not require alarms, and then perform real-time monitoring.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, step 3 is specifically as follows: filebeat is used as the log data collector and sends the files to be collected out in real time in the form of messages, while Kafka serves as the bridge between the business applications and the monitoring system, achieving the goal of analyzing logs in real time. The specific steps are as follows:
Step 3.1: download the filebeat package and decompress it;
Step 3.2: modify the configuration file filebeat.yml to specify the file paths whose changes need to be monitored, as well as the Kafka address and topic name to send to;
Step 3.3: execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/system/system.log is then sent in real time to the company's Kafka queue;
Step 3.4: the monitoring system listens to the Kafka topic filebeatsystem configured in step 3.2; whenever a new log message is pushed, it is consumed and analyzed.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, in step 4 the alarm on log analysis results and the subsequent chart presentation are divided into two cases according to the log content, specifically as follows:
Step 4.1: if the message is an exception stack, i.e. one of the three kinds of incoming messages contains the Exception keyword, a corresponding alarm is pushed according to the alarm address, the exception is recorded into the application's exception summary data in OpenTSDB according to the principle of accumulating counts of exceptions with the same name, and the alarm notification is handled according to the configured alarm address;
if the message is not an exception, it is a normal business log record; it is then checked against the configured rules, and if a rule matches exactly, statistics for the scenario are aggregated by keyword and written into the corresponding OpenTSDB summary data;
Step 4.2: the exception and key business statistics from step 4.1 are stored in OpenTSDB as time-series data; it is then only necessary to configure the chart presentation software Grafana to display a given class of business data as a chart over time. The specific steps are:
1. create a new data source;
2. select OpenTSDB;
3. complete the OpenTSDB configuration, mainly the IP and port, and save;
4. create a new dashboard from the home page and select Add Query;
5. configure the data source and aggregation method the chart should display, and save;
6. obtain the chart presentation of the scenario data.
Compared with the prior art, the above technical solution of the present invention has the following technical effects:
1. Link data acquisition is based on the log files that applications already print, with no code intrusion; any application that can print logs can be connected, saving the access cost of new technology stacks;
2. For the application, only the logs it cares about need to be recorded, with no additional resource overhead;
3. The middleware the new system relies on supports distributed deployment and unlimited horizontal scaling, which greatly improves overall scalability;
4. Applications can freely configure log matching rules on demand, and key business scenarios can also be monitored and alerted on in real time;
5. Data is transmitted through the high-performance Kafka message middleware, shortening the time from log generation to consumption and achieving near-real-time behavior.
Detailed description of the invention
Fig. 1 is the dependency diagram between complex systems;
Fig. 2 is the overall architecture deployment diagram of the present invention;
Fig. 3 is a screenshot of an actual log of the present invention;
Fig. 4 is a schematic diagram of maintaining the rule "a build task message is received and consumed" into the system under the corresponding application;
Fig. 5 is a schematic diagram of the alarm configuration and the whitelist of exceptions that do not require alarms;
Fig. 6 shows the running status of the log collector filebeat;
Fig. 7 is a screenshot of an exception alarm of the present invention;
Fig. 8 shows the related chart presentation that can be configured in Grafana.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
The present invention seeks to address the shortcomings of existing link monitoring systems, as shown in Fig. 1, and to comprehensively improve their compatibility, scalability and real-time performance. The method consists of four steps: 1. building the infrastructure services relied on; 2. business system log recording and log analysis rule maintenance; 3. business system log collection and analysis processing; 4. alarming on log analysis results and subsequent chart presentation. The specific embodiment of each step is described below.
As shown in Fig. 2, step one is to build the infrastructure services relied on, which comprise the following services and middleware: an HDFS (distributed file storage) cluster, HBase (distributed columnar database), Kafka (message middleware), an OpenTSDB (time-series database) cluster environment, and the chart presentation software Grafana. HDFS is the distributed file storage system that ultimately stores the data files; HBase relies on the file service of HDFS to provide the external data storage service; OpenTSDB is an open-source time-series database that stores the business data analysis results and, by creating tables in HBase, ultimately provides an external data query service along the time dimension; Grafana is an open-source metric analysis and visualization suite used for the final chart visualization; Kafka is an open-source high-performance message queue middleware used for efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, ultimately decoupling them from each other.
The specific steps for building these five services and middleware are as follows:
1. Building HDFS
# download and extract
wget http://apache.claz.org/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar -zxvf hadoop-2.7.7.tar.gz
# rename the extracted folder to drop the version number
mv /opt/hadoop-2.7.7 /opt/hadoop
# create a new directory
mkdir /home/hadoop/
# add environment variables
vim /home/hadoop/.bashrc
export JAVA_HOME=/usr/local/jdk1.8.0_101
export JRE_HOME=/usr/local/jdk1.8.0_101/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#vim /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_101
# vim /opt/hadoop/etc/hadoop/yarn-env.sh
# add the environment variable after the "some Java parameters" comment in the configuration file
export JAVA_HOME=/usr/local/jdk1.8.0_101
# vim /opt/hadoop/etc/hadoop/core-site.xml
# add the following to the configuration file
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfs-01.xxx.xxxxxxx.xxx:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
# vi /opt/hadoop/etc/hadoop/hdfs-site.xml
# add the following to the configuration file:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
# cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml
vim /opt/hadoop/etc/hadoop/mapred-site.xml
# add the following to the configuration file
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:19888</value>
</property>
# vim /opt/hadoop/etc/hadoop/yarn-site.xml
# add the following to the configuration file
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8088</value>
</property>
# mkdir /opt/hadoop/etc/hadoop/tmp/
# and grant read permission on it to the hadoop user
# start the hadoop services
# 1) format the HDFS distributed file system
cd /opt/hadoop/bin
./hdfs namenode -format
# 2) start all services
cd /opt/hadoop/sbin
./start-all.sh
2. Building HBase
# download
wget http://apache-mirror.rbc.ru/pub/apache/hbase/stable/hbase-1.4.8-bin.tar.gz
# extract
tar -zxvf hbase-1.4.8-bin.tar.gz
# rename
mv hbase-1.4.8 hbase
# configure the HDFS connection address
vim /opt/hbase/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdfs-01.xxx.xxxxxxx.xxx:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183</value>
</property>
<!-- DNS resolution -->
<property>
<name>hbase.master.dns.interface</name>
<!-- set according to the actual network interface -->
<value>default</value>
</property>
<property>
<name>hbase.master.dns.nameserver</name>
<value>xxx.xx.xx.xx</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<!-- set according to the actual network interface -->
<value>default</value>
</property>
<property>
<name>hbase.regionserver.dns.nameserver</name>
<value>xxx.xx.xx.xx</value>
</property>
# configure environment variables
vi /opt/hbase/conf/hbase-env.sh
# specify the JDK path
export JAVA_HOME=/usr/local/jdk1.8.0_101
# set to false to use the shared external ZooKeeper; OpenTSDB later needs to connect to the same external ZooKeeper
export HBASE_MANAGES_ZK=false
# add environment variables
vi /etc/profile
export HBASE_HOME=/opt/hbase
export PATH=$PATH:/opt/hbase/bin
# start and stop
cd /opt/hbase/bin
# start
./start-hbase.sh
# stop
./stop-hbase.sh
# check whether the startup succeeded
http://xxx.xxx.xx.xx:16010/master-status
3. Building OpenTSDB
# download the source code
git clone git://github.com/OpenTSDB/opentsdb.git
# a dependency may need to be installed before compiling
yum -y install libtool
# compiling
cd opentsdb
./build.sh
# installation
./configure
make install
# configure and create the tables
mkdir /etc/opentsdb/
vi /etc/opentsdb/opentsdb.conf
# edit the configuration
tsd.network.port=4242
tsd.http.staticroot = /opt/opentsdb/static/
tsd.http.cachedir = /tmp/opentsdb
tsd.core.plugin_path = /opt/opentsdb/plugins
tsd.storage.hbase.zk_quorum = zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
# create the directories referenced in the configuration
mkdir /opt/opentsdb/static
mkdir /tmp/opentsdb
mkdir /opt/opentsdb/plugins
# create the HBase tables
cd /opt/opentsdb/src
env COMPRESSION=NONE HBASE_HOME=/opt/hbase ./create_table.sh
# starting
cd /opt/opentsdb/build
./tsdb tsd &
# check whether the startup succeeded
http://xxx.xxx.x.xxx:4242/
4. Building Kafka
# download
wget http://mirrors.shuosc.org/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
# extract
tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
# edit the configuration
vim /opt/kafka/config/server.properties
# connect to the shared ZooKeeper
zookeeper.connect=zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
# start
cd /opt/kafka/bin
./kafka-server-start.sh ../config/server.properties &
# test whether the startup succeeded
./kafka-topics.sh --list --zookeeper zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
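The topic that filebeat later writes to (filebeatsystem, see step three below) can be created at this point, either with the kafka-topics.sh script or programmatically. The following is a minimal Java sketch using Kafka's standard AdminClient API, offered as an illustration only; the broker address, partition count and replication factor are assumptions, not values given in this document.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateMonitoringTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // assumed broker address, matching the style of the hosts entry used later in filebeat.yml
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-01.xxx.xxxxx.xx:1111");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions and replication factor 2 are illustrative values only
            NewTopic topic = new NewTopic("filebeatsystem", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}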
5. Building Grafana
# configure the package repository
vim /etc/yum.repos.d/grafana.repo
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
# install
yum install grafana
# start the service
service grafana-server start
Step two: business system log recording and log analysis rule maintenance. For the former, the system developers record key information into the log file in the key business scenario code, for example "order xx placed successfully, order number xxxxx" or "user xx paid successfully, amount xxx yuan". Taking the company's cloud rendering project as a practical example, the developers need to monitor the number of messages entering a certain queue, so a log record is added in that part of the module. The code in the project is as follows:
/**
 * Receive a build task message.
 */
@RocketListener(topic = "xxx", consumerGroup = "xxxx")
public void onMessage(Param jobScene) {
log.info("receive construct scene msg[type:{},value:1]|", jobScene.getRenderType());
if (StringUtils.isBlank(jobScene.getJobId())) {
log.info("receive scene jobId is null: {}", jobScene);
return;
}
try {
// enqueue the build task for storage
constructQueueService.enqueue(jobScene);
} catch (Exception e) {
log.error("save construct scene error: {}, {}", jobScene, ExceptionUt ils.getStackTrace(e));
}
}
Each time a message of this type is consumed, the following text appears in the log:
receive construct scene msg[type:{},value:1]|
A screenshot of the actual log is shown in Fig. 3.
It can be clearly seen that between the first delimiter counted from the left, ->>>, and the trailing delimiter |, lies the message key we care about, receive construct scene msg. The rule "a build task message is received and consumed" is therefore maintained into the monitoring system under the corresponding application, as shown in Fig. 4.
At this point, the log of this scenario and the corresponding keyword analysis rule have been entered into our monitoring system.
In addition, because the monitoring system needs to watch for and analyze exceptions in the system and raise alarms, an additional alarm configuration and a whitelist of exceptions that do not require alarms are provided, as shown in Fig. 5.
At this point, the key business scenarios of interest, the whitelist of exceptions that can be ignored, and the contact addresses to notify when an exception occurs have all been configured; a minimal sketch of how such a keyword rule could be matched against a log line follows below.
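The following is a minimal Java sketch, not part of the original disclosure, of how a configured keyword rule could be matched against a raw log line using the delimiters described above; the class name RuleMatcher, the exact regular expression and the sample log line are assumptions for illustration only.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the business keyword found between the "->>>" and "|"/"[" delimiters
// and checks it against a configured rule keyword.
public class RuleMatcher {

    // pattern assumed from the log format described above: ... ->>> KEYWORD[...]|
    private static final Pattern KEY_PATTERN = Pattern.compile("->>>\\s*([^|\\[]+)");

    public static boolean matches(String logLine, String ruleKeyword) {
        Matcher m = KEY_PATTERN.matcher(logLine);
        return m.find() && m.group(1).trim().startsWith(ruleKeyword);
    }

    public static void main(String[] args) {
        // sample line constructed for illustration; the real format is shown in Fig. 3
        String line = "2019-07-29 10:00:00 ->>> receive construct scene msg[type:RENDER,value:1]|";
        System.out.println(matches(line, "receive construct scene msg")); // prints true
    }
}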
Step three: business system log collection and analysis processing. From step two, the scenarios we care about have been written into log files, but the company easily has dozens or even hundreds of systems, so a means is needed to deliver these logs to the monitoring system in near real time. Here filebeat is used as the log data collector, sending the files to be collected out as messages in near real time, while Kafka serves as the bridge between the business applications and the monitoring system, ultimately achieving the goal of analyzing logs in real time. The specific steps are as follows:
1. Download the filebeat package and decompress it;
2. Modify the configuration file filebeat.yml to specify the file paths whose changes need to be monitored, as well as the Kafka address and topic name to send to. Taking the system above as an example, the concrete configuration file is as follows:
filebeat.inputs:
- type: log
# input type: log
enabled: true
encoding: plain
paths:
- /opt/tomcat/system/system.log
# two custom fields to distinguish the log type and the host
fields:
appName: system
# ignore files whose last change is older than one hour
ignore_older: 1h
# start reading from the end of the file
tail_files: true
# regular expression for matching multi-line log entries
#multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:|^.+ Exception:|^\d+\serror'
#multiline.negate: false
multiline.pattern: '^.\d{4}\-\d{2}\-\d{2}.'
multiline.negate: true
multiline.match: after
# # when true, lines that do not match the pattern are appended to the preceding matching line
# multiline.negate: true
# # after: append them after the matching line
# multiline.match: after
filebeat.config.modules:
path: /opt/filebeat/modules.d/*.yml
reload.enabled: false
setup.template.settings:
index.number_of_shards: 3
output.kafka:
enabled: true
hosts: ["kafka-01.xxx.xxxxx.xx:1111"]
topic: "filebeatsystem"
# version: "2.1"
required_acks: 1
worker: 2
max_message_bytes: 10000000
max_procs: 1
processors:
- add_host_metadata:
netinfo.enabled: true
cache.ttl: 5m
3. Execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/system/system.log is then sent in real time to the company's Kafka queue;
4. The monitoring system listens to the Kafka topic filebeatsystem configured in (2); whenever a new log message is pushed, it is consumed and analyzed. A minimal consumer sketch is shown below.
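As an illustration only (the patent does not disclose the consumer code), the following minimal Java sketch shows how the monitoring system could subscribe to the filebeatsystem topic using the standard Kafka consumer API; the group id and the analyzeLogMessage entry point are assumptions.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FilebeatLogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-01.xxx.xxxxx.xx:1111"); // address from filebeat.yml above
        props.put("group.id", "log-monitor");                         // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("filebeatsystem"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // each record value is a filebeat event containing the raw log line;
                    // hand it to the (hypothetical) analysis entry point
                    analyzeLogMessage(record.value());
                }
            }
        }
    }

    private static void analyzeLogMessage(String message) {
        // placeholder: rule matching and exception detection would happen here
        System.out.println(message);
    }
}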
Step four: alarming on log analysis results and subsequent chart presentation, which is divided into two cases according to the log content:
1. If the message is an exception stack, i.e. one of the three kinds of incoming messages contains the Exception keyword, a corresponding alarm is pushed according to the alarm address configured in (2), the exception is recorded into the application's exception summary data in OpenTSDB according to the principle of accumulating counts of exceptions with the same name, and the alarm notification is handled according to the configured alarm address. The code for this part of the processing is as follows:
private void exceptionPush(String message, String ip, String logFileName, String appName) {
    List<TenantConfig> exceptionNotifies = selectExceptionNotifies(appName);
    // exception aggregation (exceptionProcess) and alarming
    if (!StringUtils.isEmpty(message) && !CollectionUtils.isEmpty(exceptionNotifies)) {
        // push alarms for the aggregated exceptions
        // whitelist handling
        // identify the exception by its exception name
        String exceptionName = RedisKeyUtils.extractExceptionNameFileBeat(message);
        if (StringUtils.isEmpty(exceptionName)) {
            return;
        }
        // push the count to opentsdb for online alarming
        log.info("filebeat warning process:{}", exceptionName);
        // skip the alarm notification if the exception is silenced
        if (checkInSilence(appName, ip, exceptionName)) {
            log.info("notify silence appname:{},ip:{},exception:{}", appName, ip, message);
            return;
        }
        // send the alarm notification, other relevant operations, etc.
    }
}
2. If the message is not an exception, it is a normal business log record; it then needs to be checked against the rules configured in (2). If a rule matches exactly, statistics for the scenario are aggregated by keyword and written into the corresponding OpenTSDB summary data. The code for this part is as follows:
public Boolean insertOpenTsdb(String ip, String appName, String logFileName, String message) {
    if (CollectorConfig.regularConfig == null || CollectionUtils.isEmpty(CollectorConfig.regularConfig.getTenantRegularConfigList())) {
        return false;
    }
    // read the nacos configuration
    Optional<RegularConfig.TenantRegularConfig> tenantRegularConfigOptional = CollectorConfig.regularConfig.getTenantRegularConfigList()
            .stream()
            .filter(tenantRegularConfig -> tenantRegularConfig.getTenantId() != null && tenantRegularConfig.getTenantId().equalsIgnoreCase(appName))
            .findAny();
    if (tenantRegularConfigOptional.isPresent()) {
        List<Regular> regularList = tenantRegularConfigOptional.get().getRegularList();
        Collections.sort(regularList);
        try {
            Regular regular = null;
            if (null != (regular = isFixedRule(regularList, message))) {
                // write to HBase in the specific logback log format (disabled)
                // insertHBaseLogContent(message, appName, data);
                // the rule engine matched; write the record to opentsdb
                LoggerData data = new LoggerData();
                data.setAppName(appName);
                data.setCreateTime(com.ihomefnt.sunfire.config.utils.StringUtils.now());
                data.setSplitExpress("");
                data.setAppIp(ip);
                data.setLoggerContent(message);
                tsdbMetricStore.put(appName, ip, logFileName, data, regular, tsdbProperties.getOpenTSDB());
            }
            return true;
        } catch (Exception e) {
            log.error("insert opentsdb exception:{}", JSON.toJSONString(message));
            return false;
        }
    }
    return false;
}
The exception and key business statistics from cases 1 and 2 above are stored in OpenTSDB as time-series data; at this point it is only necessary to configure the chart presentation software Grafana to display a given class of business data as a chart over time (a sketch of how such data points can be written to OpenTSDB over its HTTP API is shown below).
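As an illustration only (the tsdbMetricStore.put call in the code above is the patent's own storage entry point and its internals are not disclosed), the following is a minimal Java sketch of writing one counter data point to OpenTSDB's standard HTTP /api/put endpoint; the metric name business.scene.count, the tag names and the host address are assumptions.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OpenTsdbPutExample {
    public static void main(String[] args) throws Exception {
        // one data point: count of a matched business keyword, tagged by app and host
        String json = "[{"
                + "\"metric\":\"business.scene.count\","                       // assumed metric name
                + "\"timestamp\":" + (System.currentTimeMillis() / 1000) + ","
                + "\"value\":1,"
                + "\"tags\":{\"appName\":\"system\",\"ip\":\"10.0.0.1\"}"
                + "}]";

        URL url = new URL("http://opentsdb-host:4242/api/put");                // assumed OpenTSDB address
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        // OpenTSDB returns 204 No Content when the data point is accepted
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}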
The deployment of the middleware the system relies on is shown in Table 1:
Table 1
HDFS, HBase and Kafka are deployed in fully distributed mode; OpenTSDB is effectively distributed because it relies on HBase; Grafana mainly does chart presentation and is deployed as a single node.
3. Start the filebeat client on the application instances, entering the application name (i.e. the corresponding topic name) and the paths of the log files to be collected:
logs can then be collected and pushed in real time;
4. After the monitoring system consumes the messages, they are matched against the corresponding rules and included in the corresponding OpenTSDB aggregate records;
if a message is an exception, an alarm is raised through the configured alerting service, here taking a DingTalk bot as an example, as shown in Fig. 7 (a sketch of pushing a text alarm to a DingTalk robot webhook is given at the end of this description);
5. For the collected business aggregate data, related chart presentations can be configured in Grafana, as shown in Fig. 8.
This finally achieves the goals of visualizing key business scenario data and handling exception monitoring in near real time.
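The patent shows the DingTalk alarm only as the screenshot in Fig. 7. The following is a minimal Java sketch, offered as an assumption-laden illustration rather than the patent's implementation, of pushing a plain-text alarm to a DingTalk custom robot webhook; the access token placeholder and the alarm message content are hypothetical.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DingTalkAlarmExample {
    public static void main(String[] args) throws Exception {
        // DingTalk custom robot webhook; the access token is a placeholder
        URL url = new URL("https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN");
        // example alarm text; real content would come from the exception aggregation above
        String json = "{\"msgtype\":\"text\",\"text\":{\"content\":"
                + "\"[alarm] NullPointerException occurred 5 times in app system on 10.0.0.1\"}}";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json;charset=utf-8");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}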

Claims (4)

1. A full-link monitoring and alarm method based on log collection and analysis, characterized in that it specifically comprises the following steps:
Step 1: build the infrastructure services, specifically the distributed file storage HDFS, the distributed columnar database HBase, the message middleware Kafka, the time-series database OpenTSDB and the chart presentation unit Grafana;
wherein HDFS is used for distributed storage of data files;
HBase is used to provide the external data storage service;
OpenTSDB is used to store the business data analysis results and, by creating tables in HBase, ultimately provides an external data query service along the time dimension;
Grafana is used for the final chart visualization;
Kafka is used for efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, decoupling the two from each other;
Step 2: business system log recording and log analysis rule maintenance;
Step 3: business system log collection and analysis processing;
Step 4: alarming on log analysis results and subsequent chart presentation.
2. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that step 2, business system log recording and log analysis rule maintenance, is specifically as follows:
Step 2.1: in the key business scenario code, record the key information into the log file;
Step 2.2: maintain the matching rule into the monitoring system under the corresponding application;
Step 2.3: enter the log file of the scenario and the corresponding key information analysis rule into the monitoring system, complete the alarm configuration and the whitelist of exceptions that do not require alarms, and then perform real-time monitoring.
3. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that step 3 is specifically as follows: filebeat is used as the log data collector and sends the files to be collected out in real time in the form of messages, while Kafka serves as the bridge between the business applications and the monitoring system, achieving the goal of analyzing logs in real time. The specific steps are as follows:
Step 3.1: download the filebeat package and decompress it;
Step 3.2: modify the configuration file filebeat.yml to specify the file paths whose changes need to be monitored, as well as the Kafka address and topic name to send to;
Step 3.3: execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/system/system.log is then sent in real time to the company's Kafka queue;
Step 3.4: the monitoring system listens to the Kafka topic filebeatsystem configured in step 3.2; whenever a new log message is pushed, it is consumed and analyzed.
4. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that in step 4 the alarm on log analysis results and the subsequent chart presentation are divided into two cases according to the log content, specifically as follows:
Step 4.1: if the message is an exception stack, i.e. one of the three kinds of incoming messages contains the Exception keyword, a corresponding alarm is pushed according to the alarm address, the exception is recorded into the application's exception summary data in OpenTSDB according to the principle of accumulating counts of exceptions with the same name, and the alarm notification is handled according to the configured alarm address;
if the message is not an exception, it is a normal business log record; it is then checked against the configured rules, and if a rule matches exactly, statistics for the scenario are aggregated by keyword and written into the corresponding OpenTSDB summary data;
Step 4.2: the exception and key business statistics from step 4.1 are stored in OpenTSDB as time-series data; it is then only necessary to configure the chart presentation software Grafana to display a given class of business data as a chart over time. The specific steps are:
1. create a new data source;
2. select OpenTSDB;
3. complete the OpenTSDB configuration, mainly the IP and port, and save;
4. create a new dashboard from the home page and select Add Query;
5. configure the data source and aggregation method the chart should display, and save;
6. obtain the chart presentation of the scenario data.
CN201910689380.7A 2019-07-29 2019-07-29 Full-link monitoring and alarm method based on log collection and analysis Withdrawn CN110457178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910689380.7A CN110457178A (en) 2019-07-29 2019-07-29 Full-link monitoring and alarm method based on log collection and analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910689380.7A CN110457178A (en) 2019-07-29 2019-07-29 Full-link monitoring and alarm method based on log collection and analysis

Publications (1)

Publication Number Publication Date
CN110457178A true CN110457178A (en) 2019-11-15

Family

ID=68483845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910689380.7A Withdrawn CN110457178A (en) 2019-07-29 2019-07-29 A kind of full link monitoring alarm method based on log collection analysis

Country Status (1)

Country Link
CN (1) CN110457178A (en)


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990218A (en) * 2019-11-22 2020-04-10 深圳前海环融联易信息科技服务有限公司 Visualization and alarm method and device based on mass logs and computer equipment
CN110990218B (en) * 2019-11-22 2023-12-26 深圳前海环融联易信息科技服务有限公司 Visualization and alarm method and device based on massive logs and computer equipment
CN111143157A (en) * 2019-11-28 2020-05-12 华为技术有限公司 Fault log processing method and device
CN111176937A (en) * 2019-12-19 2020-05-19 深圳猛犸电动科技有限公司 Message middleware monitoring and warning system, method, terminal equipment and storage medium
CN111145405A (en) * 2019-12-31 2020-05-12 上海申铁信息工程有限公司 High-speed railway station gate machine management system
CN111367777A (en) * 2020-03-03 2020-07-03 腾讯科技(深圳)有限公司 Alarm processing method, device, equipment and computer readable storage medium
CN111371900A (en) * 2020-03-13 2020-07-03 北京奇艺世纪科技有限公司 Method and system for monitoring health state of synchronous link
CN112306636B (en) * 2020-10-28 2023-06-16 武汉大势智慧科技有限公司 Cloud rendering platform and intelligent scheduling method thereof
CN112306636A (en) * 2020-10-28 2021-02-02 武汉大势智慧科技有限公司 Cloud rendering platform and intelligent scheduling method thereof
CN112328684A (en) * 2020-11-03 2021-02-05 浪潮云信息技术股份公司 Method for synchronizing time sequence data to Kafka in real time based on OpenTsdb
CN113282557A (en) * 2021-04-27 2021-08-20 联通(江苏)产业互联网有限公司 Big data log analysis method and system based on Spring framework
CN113190415A (en) * 2021-05-27 2021-07-30 北京京东拓先科技有限公司 Internet hospital system monitoring method, equipment, storage medium and program product
CN113434619A (en) * 2021-06-25 2021-09-24 南京美慧软件有限公司 4g intelligent highway traffic road condition monitoring system
CN113407421B (en) * 2021-08-19 2021-11-30 北京江融信科技有限公司 Dynamic log record management method and system for micro-service gateway
CN113407421A (en) * 2021-08-19 2021-09-17 北京江融信科技有限公司 Dynamic log record management method and system for micro-service gateway
CN114866401A (en) * 2022-05-06 2022-08-05 辽宁振兴银行股份有限公司 Distributed transaction link log analysis method and system
CN116232963A (en) * 2023-02-20 2023-06-06 中银消费金融有限公司 Link tracking method and system
CN116232963B (en) * 2023-02-20 2024-02-09 中银消费金融有限公司 Link tracking method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2019-11-15)