CN110457178A - A kind of full link monitoring alarm method based on log collection analysis - Google Patents
- Publication number
- CN110457178A CN110457178A CN201910689380.7A CN201910689380A CN110457178A CN 110457178 A CN110457178 A CN 110457178A CN 201910689380 A CN201910689380 A CN 201910689380A CN 110457178 A CN110457178 A CN 110457178A
- Authority
- CN
- China
- Prior art keywords
- log
- analysis
- data
- opentsdb
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/168—Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
Abstract
The present invention relates to a full-link monitoring and alarm method based on log collection and analysis, belonging to the technical field of computer networks. The invention addresses the shortcomings of existing link monitoring systems, improving their compatibility, scalability, and real-time performance overall. Link data acquisition is based on the log files the application already prints, with no code intrusion: anything that can print logs can be connected, saving the access cost of new technology stacks. The application only needs to record the logs it cares about, with no additional resource overhead. The middleware the new system relies on supports distributed deployment and unlimited horizontal scaling, greatly improving overall scalability. Applications can freely configure log matching rules on demand, and key business scenarios can be monitored and alerted on in real time. Data is transmitted through the kafka high-performance message middleware, shortening the time from log generation to consumption and finally achieving near-real-time behavior.
Description
Technical field
The invention belongs to the technical field of computer networks, and in particular to a full-link monitoring and alarm method based on log collection and analysis.
Background art
As the concept of microservice architecture has recently taken hold, the number of services split out along different dimensions keeps growing, and an ordinary user's simple operation on a web page or APP often involves multiple service calls across multiple backend systems. This greatly increases the complexity of the system as a whole, especially when diagnosing "difficult cases" that span multiple systems. Full-link monitoring and alarm systems appeared to help us understand system behavior and to provide a tool for analyzing performance problems, so that when a failure occurs we can quickly locate and solve it.
Full-link monitoring systems based on the Java technology stack and the HTTP protocol can meet monitoring needs to a certain extent, but the following problems remain:
1. Insufficient compatibility: only the Java language and the HTTP protocol are supported; the company's newer technology stacks (C++, Golang, etc.) require additional client development to be connected.
2. High performance overhead: intrusive data-collection code places an extra burden on the application's memory, CPU, and network bandwidth.
3. Poor scalability: link data storage is a bottleneck and cannot be scaled horizontally smoothly as traffic continues to grow.
4. No business monitoring: monitoring is limited to the system level and cannot provide effective data support for business operations.
5. Monitoring lag: the responsible person cannot be notified before a problem causes an obvious business impact.
Summary of the invention
The present invention aims to solve the shortcomings of existing link monitoring systems, improving their compatibility, scalability, and real-time performance overall.
To address the deficiencies of the background art, the technical problem to be solved by the present invention is to provide a full-link monitoring and alarm method based on log collection and analysis that resolves the shortcomings of existing link monitoring systems and improves their compatibility, scalability, and real-time performance overall.
The present invention uses following technical scheme to solve above-mentioned technical problem:
A full-link monitoring and alarm method based on log collection and analysis specifically includes the following steps:
Step 1, build the infrastructure services, specifically comprising the distributed file storage hdfs, the distributed columnar database HBase, the message middleware kafka, the time-series database opentsDB, and the chart presentation unit Grafana;
wherein hdfs stores the distributed files of the data;
HBase externally provides the data storage service;
OpentsDB stores the business data analysis results and, by building tables in HBase, finally provides an external time-dimensioned data query service;
Grafana provides the final chart visualization;
Kafka carries the efficient message transfer between the business systems to be analyzed and the monitoring analysis system, achieving mutual decoupling;
Step 2, business system log recording and log analysis rule maintenance;
Step 3, business system log collection and analysis processing;
Step 4, log analysis result alarming and subsequent chart presentation.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, step 2, business system log recording and log analysis rule maintenance, is as follows:
Step 2.1, in the key business scene code, record the key information into the log file;
Step 2.2, maintain the rule into the monitoring system under the corresponding application;
Step 2.3, enter the log file of the scene and the corresponding key-information analysis rule into the monitoring system, and complete the alarm configuration and the whitelist of exceptions requiring no alarm, after which real-time monitoring begins.
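The key-message matching in steps 2.1 to 2.3 amounts to a pattern that pulls the concerned message out of a log line; the sketch below illustrates the idea with a hypothetical sample line and rule (both are assumptions in the spirit of the actual log shown later in the description, not taken verbatim from the patent).

```shell
# Hypothetical log line in the format the rules later match
line='2019-07-29 10:15:01 INFO - receive construct scene msg[type:cloud,value:1]|'
# A rule is essentially a pattern extracting the key message
# between the logger prefix and the trailing '|'
key=$(printf '%s' "$line" | sed -n 's/.*- \(receive construct scene msg\)\[.*|$/\1/p')
echo "$key"
```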
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, step 3 is as follows: filebeat is used as the log data collector, sending the files to be collected in real time in the form of messages, while kafka serves as the bridge between the business application and the monitoring system, achieving the goal of analyzing logs in real time. The specific steps are as follows:
Step 3.1, download the filebeat archive and decompress it;
Step 3.2, modify the configuration file filebeat.yml, specifying the file paths whose changes should be watched, together with the kafka address and topic name they are sent to;
Step 3.3, execute the command ./filebeat -e -c filebeat.yml, which sends the changed content of the file /opt/tomcat/system/system.log to the company's kafka queue in real time;
Step 3.4, the monitoring system listens on the kafka queue topic filebeatsystem configured in step 3.2; whenever a new log message is pushed, it is consumed and analyzed.
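Filebeat's kafka output wraps each log line in a JSON event; the sketch below shows only the handful of fields the analysis side would read (real events carry more metadata, and the field values here are assumptions).

```shell
# An assumed filebeat event as it would arrive on the kafka topic
event='{"@timestamp":"2019-07-29T02:15:01.000Z","message":"receive construct scene msg[type:cloud,value:1]|","fields":{"appName":"system"}}'
# The consumer routes on fields.appName and then analyzes message
appName=$(printf '%s' "$event" | python3 -c 'import json,sys; print(json.load(sys.stdin)["fields"]["appName"])')
echo "$appName"
```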
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, in step 4 the log analysis result alarming and subsequent chart presentation are divided into two cases according to the log content, as follows:
Step 4.1, if the message is an exception stack, i.e. the message sent over contains the Exception keyword, a corresponding alarm message is pushed according to the alarm address, the exception is recorded into the application's summary data in opentsdb according to the principle of accumulating counts of same-named exceptions, and the alarm notification is handled according to the configured alarm address;
if it is not exception information, it is a normal business log record; the system then judges whether it matches a configured rule, and if it matches a rule exactly, the scene is statistically analyzed by keyword and included in the corresponding opentsDB summary data;
Step 4.2, the exception and key-business statistical data from step 4.1 are stored in opentsDB as time-series data; it then only needs to be configured in the charting software Grafana to present a given class of business data as charts along a time trend. The specific steps are:
1. create a new data source;
2. select openTSDB;
3. fill in the relevant openTSDB configuration, mainly the ip and port, then save;
4. create a new dashboard from the home page and select Add Query;
5. configure the data source and corresponding aggregation method the chart should display, and save;
6. obtain the chart presentation of the scene data.
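Behind step 5 above, Grafana's OpenTSDB data source issues queries against OpenTSDB's HTTP /api/query endpoint; the body below is a minimal sketch of such a query, where the metric and tag names are illustrative assumptions rather than values from the patent.

```shell
# A minimal OpenTSDB /api/query request body, as Grafana would send it
query='{"start":"1h-ago","queries":[{"aggregator":"sum","metric":"app.system.order.count","tags":{"appName":"system"}}]}'
# Parse the body to confirm it is well-formed JSON and read back the aggregator
agg=$(printf '%s' "$query" | python3 -c 'import json,sys; print(json.load(sys.stdin)["queries"][0]["aggregator"])')
echo "$agg"
```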
Compared with the prior art, adopting the above technical scheme gives the invention the following technical effects:
1. Link data acquisition is based on the log files the application already prints, with no code intrusion; anything that can print logs can be connected, saving the access cost of new technology stacks.
2. For the application, only the logs it cares about need to be recorded, with no additional resource overhead.
3. The middleware the new system relies on supports distributed deployment and unlimited horizontal scaling, greatly improving overall scalability.
4. Applications can freely configure log matching rules on demand, and key business scenarios can be monitored and alerted on in real time.
5. Data is transmitted through the kafka high-performance message middleware, shortening the time from log generation to consumption and finally achieving near-real-time behavior.
Detailed description of the invention
Fig. 1 is a dependency diagram between complex systems;
Fig. 2 is the overall architecture deployment diagram of the present invention;
Fig. 3 is an actual log screenshot of the present invention;
Fig. 4 is a schematic diagram of maintaining the "building task message received and consumed" rule into the system under the corresponding application;
Fig. 5 is a schematic diagram of the alarm configuration and the no-alarm exception whitelist configuration;
Fig. 6 is a diagram of the running status of the log collector filebeat;
Fig. 7 is an exception alarm screenshot of the present invention;
Fig. 8 shows the related chart presentation that can be configured in Grafana.
Specific embodiment
The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:
The present invention aims to solve the shortcomings of existing link monitoring systems, as shown in Fig. 1, improving their compatibility, scalability, and real-time performance overall. The method consists of four steps: 1, building the infrastructure services relied upon; 2, business system log recording and log analysis rule maintenance; 3, business system log collection and analysis processing; 4, log analysis result alarming and subsequent chart presentation. The specific embodiment of each step is described next.
As shown in Fig. 2, first the infrastructure services relied upon are built, comprising the following services and middleware: hdfs (distributed file storage), HBase (distributed columnar database), kafka (message middleware), an opentsDB (time-series database) cluster environment, and the chart presentation software Grafana. Here hdfs is the distributed file storage system in which data files finally reside; HBase relies on the file service of hdfs to externally provide the data storage service; OpentsDB is an open-source time-series database that stores the business data analysis results and, by building tables in HBase, finally provides an external time-dimensioned data query service; Grafana is an open-source metric analysis and visualization suite used for the final chart visualization; Kafka is an open-source high-performance message queue middleware carrying the efficient message transfer between the business systems to be analyzed and the monitoring analysis system, finally achieving mutual decoupling.
The setup steps for these five services and middleware are expanded separately below:
1. hdfs is built
# download and decompress
wget http://apache.claz.org/hadoop/common/hadoop-2.7.7/hadoop-
2.7.7.tar.gz
tar -zxvf hadoop-2.7.7.tar.gz
# rename the extracted folder, dropping the version number
mv /opt/hadoop-2.7.7 /opt/hadoop
# create the directory
mkdir /home/hadoop/
# add environment variables
vim /home/hadoop/.bashrc
export JAVA_HOME=/usr/local/jdk1.8.0_101
export JRE_HOME=/usr/local/jdk1.8.0_101/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_
HOME/lib
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_
HOME/sbin
#vim /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_101
# vim /opt/hadoop/etc/hadoop/yarn-env.sh
# add the environment variable after "some Java parameters" in the configuration file
export JAVA_HOME=/usr/local/jdk1.8.0_101
#vim /opt/hadoop/etc/hadoop/core-site.xml
Add the following content to the configuration file:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfs-01.xxx.xxxxxxx.xxx:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.Spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
#vi /opt/hadoop/etc/hadoop/hdfs-site.xml
Add the following content to the configuration file:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
# cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml
vim /opt/hadoop/etc/hadoop/mapred-site.xml
Add the following content to the configuration file:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:19888</value>
</property>
#vim /opt/hadoop/etc/hadoop/yarn-site.xml
Add the following content to the configuration file:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8088</value>
</property>
# mkdir /opt/hadoop/etc/hadoop/tmp/
# and grant read permission to the hadoop user
# start the hadoop services
# 1) format the HDFS distributed file system
cd /opt/hadoop/bin
./hdfs namenode -format
# 2) start all services
cd /opt/hadoop/sbin
./start-all.sh
2. HBase is built
# downloading
wget http://apache-mirror.rbc.ru/pub/apache/hbase/stable/hbase-1.4.8-
bin.tar.gz
# decompression
tar -zxvf hbase-1.4.8-bin.tar.gz
# renames
mv hbase-1.4.8 hbase
# configure the hdfs connection address
vim /opt/hbase/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdfs-01.xxx.xxxxxxx.xxx:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-
03.xxx.xxxxxxx.xxx:2183</value>
</property>
<!-- dns resolution -->
<property>
<name>hbase.master.dns.interface</name>
<!-- set according to the actual network interface -->
<value>default</value>
</property>
<property>
<name>hbase.master.dns.nameserver</name>
<value>xxx.xx.xx.xx</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<!-- set according to the actual network interface -->
<value>default</value>
</property>
<property>
<name>hbase.regionserver.dns.nameserver</name>
<value>xxx.xx.xx.xx</value>
</property>
# configure environment variables
vi /opt/hbase/conf/hbase-env.sh
# point to the JDK location
export JAVA_HOME=/usr/local/jdk1.8.0_101
# use the shared external zookeeper (set managed zk to false), since opentsDB later needs to connect to the same zk
export HBASE_MANAGES_ZK=false
# add environment variables
vi /etc/profile
export HBASE_HOME=/opt/hbase
export PATH=$PATH:/opt/hbase/bin
# start and stop
cd /opt/hbase/bin
# start
./start-hbase.sh
# stop
./stop-hbase.sh
# check whether startup succeeded
http://xxx.xxx.xx.xx:16010/master-status
3. opentsDB is built
# download the code
git clone git://github.com/OpenTSDB/opentsdb.git
# a dependency may need to be installed before compiling
yum -y install libtool
# compiling
cd opentsdb
./build.sh
# installation
./configure
make install
# configure the environment and create the tables
mkdir /etc/opentsdb/
vi /etc/opentsdb/opentsdb.conf
# modification configuration
tsd.network.port=4242
tsd.http.staticroot = /opt/opentsdb/static/
tsd.http.cachedir = /tmp/opentsdb
tsd.core.plugin_path = /opt/opentsdb/plugins
tsd.storage.hbase.zk_quorum = zk-01.xxx.xxxxxxx.xxx:2181,zk-
02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
# create the directories referenced in the configuration
mkdir /opt/opentsdb/static
mkdir /tmp/opentsdb
mkdir /opt/opentsdb/plugins
# create the HBase tables
cd /opt/opentsdb/src
env COMPRESSION=NONE HBASE_HOME=/opt/hbase ./create_table.sh
# starting
cd /opt/opentsdb/build
./tsdb tsd &
# check whether startup succeeded
http://xxx.xxx.x.xxx:4242/
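Once the tsd is up, datapoints are written over its HTTP /api/put endpoint; the body below is a minimal sketch of what the monitoring system would POST, with the metric and tag names being illustrative assumptions.

```shell
# A minimal OpenTSDB /api/put datapoint, as the analyzer would POST it
point='{"metric":"app.system.exception.count","timestamp":1564366501,"value":1,"tags":{"appName":"system","ip":"10.0.0.1"}}'
# Read back the metric name to confirm the body is well-formed JSON
metric=$(printf '%s' "$point" | python3 -c 'import json,sys; print(json.load(sys.stdin)["metric"])')
echo "$metric"
```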
4. kafka is built
# downloading
wget http://mirrors.shuosc.org/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
# decompression
tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
# modification configuration
vim /opt/kafka/config/server.properties
# connect to the shared zookeeper
zookeeper.connect=zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,
zk-03.xxx.xxxxxxx.xxx:2183
# start
cd /opt/kafka/bin
./kafka-server-start.sh ../config/server.properties &
# test whether startup succeeded
./kafka-topics.sh --list --zookeeper zk-01.xxx.xxxxxxx.xxx:2181,zk-
02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
5. Grafana is built
# modify the yum repo configuration
vim /etc/yum.repos.d/grafana.repo
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
# installation
yum install grafana
# start the service
service grafana-server start
Second, business system log recording and log analysis rule maintenance. The former is done by the system's developers: in the key business scene code, key information is recorded into the log file, for example "xx order placed successfully, order number xxxxx" or "xx user paid successfully, amount xxx yuan". Taking the company's cloud rendering project as a practical example, the developers need to monitor the number of messages entering a certain queue, so log recording is added in that module of the project code as follows:
/**
 * receive the building task message
 */
@RocketListener(topic = "xxx", consumerGroup = "xxxx")
public void onMessage(Param jobScene) {
    log.info("receive construct scene msg[type:{},value:1]|", jobScene.getRenderType());
    if (StringUtils.isBlank(jobScene.getJobId())) {
        log.info("receive scene jobId is null: {}", jobScene);
        return;
    }
    try {
        // enqueue the building task
        constructQueueService.enqueue(jobScene);
    } catch (Exception e) {
        log.error("save construct scene error: {}, {}", jobScene, ExceptionUtils.getStackTrace(e));
    }
}
Each time a message of this type is consumed, the following text appears in the log:
receive construct scene msg[type:{},value:1]|
The actual log screenshot is shown in Fig. 3.
It can be clearly seen that between the first symbol from the left and the ending symbol | we have the key receive-message text we care about, receive construct scene msg; this "building task message received and consumed" rule is therefore maintained into the system under the corresponding application, as shown in Fig. 4.
At this point, the log of this scene and the corresponding keyword analysis rule have been entered into our monitoring system.
In addition, because the monitoring system must pay attention to and analyze the abnormal situations in the system and raise alarms, an additional alarm configuration and a whitelist of exceptions requiring no alarm must also be provided, as shown in Fig. 5.
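The whitelist configuration of Fig. 5 amounts to a suppression check before an alarm is sent; the sketch below illustrates the idea with an assumed one-exception-per-line whitelist file (the file name and entries are hypothetical).

```shell
# Hypothetical whitelist: one exception class name per line, no alarm for these
printf '%s\n' 'java.net.SocketTimeoutException' 'com.example.IgnorableException' > /tmp/alarm_whitelist.txt
exceptionName='java.net.SocketTimeoutException'
# An exact whole-line match against the whitelist decides suppression
if grep -qxF "$exceptionName" /tmp/alarm_whitelist.txt; then
  result=suppressed
else
  result=alert
fi
echo "$result"
```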
At this point, the key business scenes the business cares about, the whitelist of exceptions that need no attention, and the contact information for alarm notification after an exception occurs have all been configured.
Third, business system log collection and analysis processing. From step 2 we know that the scenes we care about have been written to files in the form of logs, but the company easily has dozens or even hundreds of systems involved, so a means is needed to deliver these logs to the monitoring system in near real time. Here we use filebeat as the log data collector; it sends the files to be collected in near real time in the form of messages, while kafka serves as the bridge between the [business application] and the [monitoring system], finally achieving the goal of analyzing logs in real time. The specific steps are as follows:
1. download the filebeat archive and decompress it;
2. modify the configuration file filebeat.yml, specifying the file paths whose changes should be watched, together with the kafka address and topic name they are sent to. Taking the system above as an example, the concrete configuration file is as follows:
filebeat.inputs:
- type: log
# input type: log file
enabled: true
encoding: plain
paths:
- /opt/tomcat/system/system.log
# two custom fields distinguishing the log type and host
fields:
appName: system
# ignore files not modified within the last hour
ignore_older: 1h
# start reading from the end of the file
tail_files: true
# regex identifying the start of a log entry
#multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:|^.+Exception:|^\d+\serror'
#multiline.negate: false
multiline.pattern: '^.\d{4}\-\d{2}\-\d{2}.'
# negate: true means lines NOT matching the pattern are continuations
multiline.negate: true
# match: after means continuations are appended to the previous event
multiline.match: after
filebeat.config.modules:
path: /opt/filebeat/modules.d/*.yml
reload.enabled: false
setup.template.settings:
index.number_of_shards: 3
output.kafka:
enabled: true
hosts: ["kafka-01.xxx.xxxxx.xx:1111"]
topic: "filebeatsystem"
# version: "2.1"
required_acks: 1
worker: 2
max_message_bytes: 10000000
max_procs: 1
processors:
- add_host_metadata:
netinfo.enabled: true
cache.ttl: 5m
3. execute the command ./filebeat -e -c filebeat.yml, which sends the changed content of the file /opt/tomcat/system/system.log to the company's kafka queue in real time;
4. the monitoring system listens on the kafka queue topic filebeatsystem configured in (2); whenever a new log message is pushed, it is consumed and analyzed.
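The multiline settings above decide which lines open a new event; the check below replays that decision on two sample lines, using a slightly relaxed ERE variant of the date-prefix pattern with grep -E standing in for filebeat's regex engine (the sample lines are assumptions).

```shell
# filebeat's multiline.pattern above keys on a leading yyyy-MM-dd date;
# a line carrying the date starts a new event, others are continuations
pattern='^.?[0-9]{4}-[0-9]{2}-[0-9]{2}'
starts_event() { printf '%s' "$1" | grep -qE "$pattern" && echo yes || echo no; }
first=$(starts_event '2019-07-29 10:15:01 ERROR - save construct scene error')
second=$(starts_event '    at com.example.Service.enqueue(Service.java:42)')
# the date-prefixed line opens an event; the stack-trace line is appended to it
echo "$first $second"
```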
Fourth, log analysis result alarming and subsequent chart presentation, divided into two cases according to the log content:
1. If the message is an exception stack, i.e. the message sent over contains the Exception keyword, a corresponding alarm message is pushed according to the alarm address from (2), the exception is recorded into the application's summary data in opentsdb according to the principle of accumulating counts of same-named exceptions, and the alarm notification is handled according to the configured alarm address. The code for this part of the processing is as follows:
private void exceptionPush(String message, String ip, String logFileName, String appName) {
    List<TenantConfig> exceptionNotifies = selectExceptionNotifies(appName);
    // aggregate the exception (exceptionProcess) and alarm
    if (!StringUtils.isEmpty(message) && !CollectionUtils.isEmpty(exceptionNotifies)) {
        // push an alarm for the aggregated exception, subject to the whitelist
        // confirm the exception by extracting its name
        String exceptionName = RedisKeyUtils.extractExceptionNameFileBeat(message);
        if (StringUtils.isEmpty(exceptionName)) {
            return;
        }
        // push the count to opentsdb for online alarm statistics
        log.info("filebeat warning process:{}", exceptionName);
        // send the alarm notification unless silenced
        if (checkInSilence(appName, ip, exceptionName)) {
            log.info("notify silence appname:{},ip:{},exception:{}", appName, ip, message);
            return;
        }
        // other related operations, etc.
    }
}
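The "same-named exceptions accumulate" principle above amounts to counting occurrences per exception name before the summary datapoint is written; a toy illustration of that aggregation, with assumed sample names:

```shell
# Count same-named exceptions from a batch of extracted names
printf '%s\n' 'NullPointerException' 'TimeoutException' 'NullPointerException' |
  sort | uniq -c | sort -rn > /tmp/exception_counts.txt
# The most frequent exception name heads the summary
top=$(head -n1 /tmp/exception_counts.txt | awk '{print $2}')
echo "$top"
```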
2. If it is not exception information, it is a normal business log record; we then need to judge whether it matches a rule configured in (2). If it matches a rule exactly, the scene is statistically analyzed by keyword and included in the corresponding opentsDB summary data. The code for this part is as follows:
public Boolean insertOpenTsdb(String ip, String appName, String logFileName, String message) {
    if (CollectorConfig.regularConfig == null || CollectionUtils.isEmpty(
            CollectorConfig.regularConfig.getTenantRegularConfigList())) {
        return false;
    }
    // read the nacos configuration for this tenant (application)
    Optional<RegularConfig.TenantRegularConfig> tenantRegularConfigOptional =
            CollectorConfig.regularConfig.getTenantRegularConfigList().stream()
                    .filter(tenantRegularConfig -> tenantRegularConfig.getTenantId() != null
                            && tenantRegularConfig.getTenantId().equalsIgnoreCase(appName))
                    .findAny();
    if (tenantRegularConfigOptional.isPresent()) {
        List<Regular> regularList = tenantRegularConfigOptional.get().getRegularList();
        Collections.sort(regularList);
        try {
            Regular regular = null;
            if (null != (regular = isFixedRule(regularList, message))) {
                // write to hbase in the specific logback log format (disabled)
                //insertHBaseLogContent(message, appName, data);
                // the rule engine matched; record the datapoint into opentsdb
                LoggerData data = new LoggerData();
                data.setAppName(appName);
                data.setCreateTime(com.ihomefnt.sunfire.config.utils.StringUtils.now());
                data.setSplitExpress("");
                data.setAppIp(ip);
                data.setLoggerContent(message);
                tsdbMetricStore.put(appName, ip, logFileName, data, regular, tsdbProperties.getOpenTSDB());
            }
            return true;
        } catch (Exception e) {
            log.error("insert opentsdb exception:{}", JSON.toJSONString(message));
            return false;
        }
    }
    return false;
}
The exception and key-business statistical data from cases 1 and 2 are thus stored in opentsDB as time-series data; at this point it only needs to be configured in the charting software Grafana to present a given class of business data as charts along a time trend.
The deployment of the middleware the system relies on is shown in Table 1:
Table 1
Here hdfs, HBase, and kafka are deployed in fully distributed mode; opentsDB is distributed by virtue of relying on HBase; Grafana mainly does chart presentation and runs as a single node.
3. Start the filebeat client on the application instance, entering the application name (the corresponding topic name) and the path of the corresponding log file to be collected; log changes are then collected and pushed in real time.
4. After the monitoring system consumes a message, it is matched to the corresponding rule and included in the corresponding opentsDB aggregation records; if it is an exception, an alarm is raised through the configured alerting service, here taking a DingTalk bot as an example, as shown in Fig. 7.
5. For the collected aggregated business data, related chart presentation can be configured in Grafana, as shown in Fig. 8.
This finally achieves the goals of visualized presentation of key business scene data and near-real-time monitoring and handling of exceptions.
Claims (4)
1. A full-link monitoring and alarm method based on log collection and analysis, characterized in that it comprises the following steps:
Step 1, build the infrastructure services, specifically comprising the distributed file store hdfs, the distributed columnar database HBase, the message
middleware kafka, the time-series database opentsDB and the chart display unit Grafana;
wherein hdfs provides the distributed storage of data files;
HBase externally provides the data storage service;
opentsDB stores the business-data analysis results by building tables in HBase and ultimately provides an external data query service with time as the
dimension;
Grafana performs the final chart visualization;
kafka carries the efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, so that the two are decoupled
from each other;
Step 2, business-system log recording and maintenance of the log analysis rules;
Step 3, business-system log collection and analysis processing;
Step 4, alerting on the log analysis results and subsequent chart display.
2. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that:
step 2, business-system log recording and maintenance of the log analysis rules, is as follows:
Step 2.1, in the code of the key business scenario, record the key information into a log file;
Step 2.2, maintain the rule in the monitoring system under the corresponding application;
Step 2.3, enter the scenario's log file and the analysis rule for its key information into the monitoring system, complete the alarm
configuration and the white list of exceptions that should not alarm, and then monitor in real time.
3. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that:
step 3 is as follows: filebeat is used as the log data collector, the files to be collected are sent out in real time in message
form, and kafka serves as the bridge between the business applications and the monitoring system, so that logs are analyzed in real
time; the specific steps are as follows:
Step 3.1, download the filebeat package and unpack it;
Step 3.2, modify the configuration file filebeat.yml, specifying the file paths whose changes are to be watched, together with the
kafka address and topic name they are sent to;
Step 3.3, execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/
system/system.log is then sent in real time to the company's kafka queue;
Step 3.4, the monitoring system listens on the kafka topic configured in step 3.2, filebeatsystem, and
analyzes each new log message as it is consumed.
4. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that:
in step 4, the alerting on the log analysis results and the subsequent chart display are divided into two cases according to the log content, as follows:
Step 4.1, if the message is an exception stack, i.e. one of the three kinds of incoming messages that contains the Exception keyword, then
push the corresponding alarm according to the alarm address, record the application exception in the opentsdb
aggregate data following the principle that counts of same-named exceptions accumulate, and handle the alarm notification according to the configured alarm address;
if it is not exception information, it is a normal business log record; judge whether it matches a rule we have configured, and if
it matches the rule exactly, analyze the scenario statistically by keyword and record it into the corresponding opentsDB aggregate
data;
Step 4.2, the anomaly and key-business statistics of step 4.1 are stored in the opentsDB time-series database; charts then only need
to be configured in the display software Grafana to present a class of business data as a trend over time; the specific steps
are:
1. create a new data source;
2. select openTSDB;
3. fill in the related openTSDB configuration, mainly the ip + port, then save;
4. create a new dashboard from the home page and select Add Query;
5. configure the data source and aggregation method the chart needs to display, and save;
6. the chart display of the scenario data is obtained.
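The data-source steps above are performed manually in the Grafana UI; the same result can be captured declaratively with Grafana's data-source provisioning format. A hedged sketch, with a placeholder host and port (4242 is OpenTSDB's conventional default):

```yaml
apiVersion: 1
datasources:
  - name: OpenTSDB
    type: opentsdb
    access: proxy
    url: http://opentsdb-host:4242   # the "ip + port" of step 3 above
    jsonData:
      tsdbVersion: 3                 # 3 = OpenTSDB >= 2.3
      tsdbResolution: 1              # 1 = second resolution
```

With the data source registered, a dashboard panel only needs the metric name and an aggregation method (e.g. sum) to render the time trend.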
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910689380.7A CN110457178A (en) | 2019-07-29 | 2019-07-29 | A kind of full link monitoring alarm method based on log collection analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110457178A true CN110457178A (en) | 2019-11-15 |
Family
ID=68483845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910689380.7A Withdrawn CN110457178A (en) | 2019-07-29 | 2019-07-29 | A kind of full link monitoring alarm method based on log collection analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110457178A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110990218A (en) * | 2019-11-22 | 2020-04-10 | 深圳前海环融联易信息科技服务有限公司 | Visualization and alarm method and device based on mass logs and computer equipment |
CN111145405A (en) * | 2019-12-31 | 2020-05-12 | 上海申铁信息工程有限公司 | High-speed railway station gate machine management system |
CN111143157A (en) * | 2019-11-28 | 2020-05-12 | 华为技术有限公司 | Fault log processing method and device |
CN111176937A (en) * | 2019-12-19 | 2020-05-19 | 深圳猛犸电动科技有限公司 | Message middleware monitoring and warning system, method, terminal equipment and storage medium |
CN111371900A (en) * | 2020-03-13 | 2020-07-03 | 北京奇艺世纪科技有限公司 | Method and system for monitoring health state of synchronous link |
CN111367777A (en) * | 2020-03-03 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Alarm processing method, device, equipment and computer readable storage medium |
CN112306636A (en) * | 2020-10-28 | 2021-02-02 | 武汉大势智慧科技有限公司 | Cloud rendering platform and intelligent scheduling method thereof |
CN112328684A (en) * | 2020-11-03 | 2021-02-05 | 浪潮云信息技术股份公司 | Method for synchronizing time sequence data to Kafka in real time based on OpenTsdb |
CN113190415A (en) * | 2021-05-27 | 2021-07-30 | 北京京东拓先科技有限公司 | Internet hospital system monitoring method, equipment, storage medium and program product |
CN113282557A (en) * | 2021-04-27 | 2021-08-20 | 联通(江苏)产业互联网有限公司 | Big data log analysis method and system based on Spring framework |
CN113407421A (en) * | 2021-08-19 | 2021-09-17 | 北京江融信科技有限公司 | Dynamic log record management method and system for micro-service gateway |
CN113434619A (en) * | 2021-06-25 | 2021-09-24 | 南京美慧软件有限公司 | 4g intelligent highway traffic road condition monitoring system |
CN114866401A (en) * | 2022-05-06 | 2022-08-05 | 辽宁振兴银行股份有限公司 | Distributed transaction link log analysis method and system |
CN116232963A (en) * | 2023-02-20 | 2023-06-06 | 中银消费金融有限公司 | Link tracking method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20191115 |