CN110457178A - Full-link monitoring and alarm method based on log collection and analysis - Google Patents

Full-link monitoring and alarm method based on log collection and analysis

Info

Publication number
CN110457178A
CN110457178A
Authority
CN
China
Prior art keywords
log
analysis
data
opentsdb
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910689380.7A
Other languages
Chinese (zh)
Inventor
陈旋
王冲
张�荣
闫辛未
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Ai Jia Household Articles Co Ltd
Original Assignee
Jiangsu Ai Jia Household Articles Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Ai Jia Household Articles Co Ltd filed Critical Jiangsu Ai Jia Household Articles Co Ltd
Priority to CN201910689380.7A
Publication of CN110457178A
Legal status: Withdrawn (current)


Classifications

    • G06F 11/3006: Monitoring arrangements specially adapted to the computing system or computing system component being monitored, where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F 11/3065: Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F 11/3452: Performance evaluation by statistical analysis
    • G06F 11/3476: Data logging
    • G06F 16/168: Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G06F 16/182: Distributed file systems

Abstract

The present invention relates to a full-link monitoring and alarm method based on log collection and analysis, belonging to the technical field of computer networks. The invention addresses the shortcomings of existing link monitoring systems and comprehensively improves their compatibility, scalability and real-time performance. Link data acquisition is based on the log files that applications already print, with no code intrusion: any application that can print logs can be connected, which saves the access cost of new technology stacks. For the application, only the logs it cares about need to be recorded, with no additional resource overhead. The middleware the new system relies on supports distributed deployment and unlimited horizontal scaling, which greatly improves overall scalability. Applications can freely configure log matching rules on demand, and key business scenarios can be monitored and alerted on in real time. Data is transmitted through the high-performance Kafka message middleware, shortening the time from log generation to consumption and achieving near-real-time behavior.

Description

Full-link monitoring and alarm method based on log collection and analysis
Technical field
The invention belongs to the technical field of computer networks, and in particular to a full-link monitoring and alarm method based on log collection and analysis.
Background technique
As the concept of microservice architecture has taken hold in recent years, the number of services split out along different dimensions keeps growing, and a seemingly simple user operation on a web page or in an app often involves multiple service calls across multiple backend systems. This greatly increases the overall complexity of the system, especially when diagnosing hard, cross-system problems.
Full-link monitoring and alarm systems are intended to help us understand system behavior and provide a tool for analyzing performance issues, so that when a failure occurs we can quickly locate and resolve the problem.
Existing full-link monitoring systems based on the Java technology stack and the HTTP protocol can meet monitoring needs to a certain extent, but the following problems remain:
1. Insufficient compatibility: only the Java language and the HTTP protocol are supported, and the company's newer technology stacks (C++, Golang, etc.) require additional client development before they can be connected;
2. High performance overhead: intrusive data acquisition code places extra load on the application's memory, CPU and network bandwidth;
3. Poor scalability: link data storage becomes a bottleneck and cannot be scaled horizontally smoothly as traffic keeps growing;
4. Weak business monitoring: monitoring is limited to the system level and cannot provide effective data support for business operations;
5. Monitoring lag: the responsible person cannot be notified before a problem causes an obvious business impact.
Summary of the invention
The present invention seeks to address the shortcomings of existing link monitoring systems and to comprehensively improve their compatibility, scalability and real-time performance.
To address the deficiencies of the background art, the technical problem to be solved by the present invention is to provide a full-link monitoring and alarm method based on log collection and analysis that overcomes the shortcomings of existing link monitoring systems and comprehensively improves their compatibility, scalability and real-time performance.
The present invention adopts the following technical solution to solve the above technical problem:
A full-link monitoring and alarm method based on log collection and analysis specifically comprises the following steps:
Step 1: build the infrastructure services, specifically the distributed file storage HDFS, the distributed columnar database HBase, the message middleware Kafka, the time-series database OpenTSDB and the chart presentation unit Grafana;
wherein HDFS is used for distributed storage of data files;
HBase is used to provide the external data storage service;
OpenTSDB is used to store the business data analysis results and, by creating tables in HBase, ultimately provides an external data query service along the time dimension;
Grafana is used for the final chart visualization;
Kafka is used for efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, decoupling the two from each other;
Step 2: business system log recording and log analysis rule maintenance;
Step 3: business system log collection and analysis processing;
Step 4: alarming on log analysis results and subsequent chart presentation.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, step 2, business system log recording and log analysis rule maintenance, is specifically as follows:
Step 2.1: in the key business scenario code, record the key information into the log file;
Step 2.2: maintain the matching rule into the monitoring system under the corresponding application;
Step 2.3: enter the log file of the scenario and the corresponding key information analysis rule into the monitoring system, complete the alarm configuration and the whitelist of exceptions that do not require alarms, and then perform real-time monitoring.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, step 3 is specifically as follows: filebeat is used as the log data collector and sends the files to be collected out in real time in the form of messages, while Kafka serves as the bridge between the business applications and the monitoring system, achieving the goal of analyzing logs in real time. The specific steps are as follows:
Step 3.1: download the filebeat package and decompress it;
Step 3.2: modify the configuration file filebeat.yml to specify the file paths whose changes need to be monitored, as well as the Kafka address and topic name to send to;
Step 3.3: execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/system/system.log is then sent in real time to the company's Kafka queue;
Step 3.4: the monitoring system listens to the Kafka topic filebeatsystem configured in step 3.2; whenever a new log message is pushed, it is consumed and analyzed.
As a further preferred scheme of the full-link monitoring and alarm method based on log collection and analysis of the present invention, in step 4 the alarm on log analysis results and the subsequent chart presentation are divided into two cases according to the log content, specifically as follows:
Step 4.1: if the message is an exception stack, i.e. one of the three kinds of incoming messages contains the Exception keyword, a corresponding alarm is pushed according to the alarm address, the exception is recorded into the application's exception summary data in OpenTSDB according to the principle of accumulating counts of exceptions with the same name, and the alarm notification is handled according to the configured alarm address;
if the message is not an exception, it is a normal business log record; it is then checked against the configured rules, and if a rule matches exactly, statistics for the scenario are aggregated by keyword and written into the corresponding OpenTSDB summary data;
Step 4.2: the exception and key business statistics from step 4.1 are stored in OpenTSDB as time-series data; it is then only necessary to configure the chart presentation software Grafana to display a given class of business data as a chart over time. The specific steps are:
1. create a new data source;
2. select OpenTSDB;
3. complete the OpenTSDB configuration, mainly the IP and port, and save;
4. create a new dashboard from the home page and select Add Query;
5. configure the data source and aggregation method the chart should display, and save;
6. obtain the chart presentation of the scenario data.
Compared with the prior art, the above technical solution of the present invention has the following technical effects:
1. Link data acquisition is based on the log files that applications already print, with no code intrusion; any application that can print logs can be connected, saving the access cost of new technology stacks;
2. For the application, only the logs it cares about need to be recorded, with no additional resource overhead;
3. The middleware the new system relies on supports distributed deployment and unlimited horizontal scaling, which greatly improves overall scalability;
4. Applications can freely configure log matching rules on demand, and key business scenarios can also be monitored and alerted on in real time;
5. Data is transmitted through the high-performance Kafka message middleware, shortening the time from log generation to consumption and achieving near-real-time behavior.
Detailed description of the invention
Fig. 1 is the dependency diagram between complex systems;
Fig. 2 is the overall architecture deployment diagram of the present invention;
Fig. 3 is a screenshot of an actual log of the present invention;
Fig. 4 is a schematic diagram of maintaining the rule "a build task message is received and consumed" into the system under the corresponding application;
Fig. 5 is a schematic diagram of the alarm configuration and the whitelist of exceptions that do not require alarms;
Fig. 6 shows the running status of the log collector filebeat;
Fig. 7 is a screenshot of an exception alarm of the present invention;
Fig. 8 shows the related chart presentation that can be configured in Grafana.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
The present invention seeks to address the shortcomings of existing link monitoring systems, as shown in Fig. 1, and to comprehensively improve their compatibility, scalability and real-time performance. The method consists of four steps: 1. building the infrastructure services relied on; 2. business system log recording and log analysis rule maintenance; 3. business system log collection and analysis processing; 4. alarming on log analysis results and subsequent chart presentation. The specific embodiment of each step is described below.
As shown in Fig. 2, step one is to build the infrastructure services relied on, which comprise the following services and middleware: an HDFS (distributed file storage) cluster, HBase (distributed columnar database), Kafka (message middleware), an OpenTSDB (time-series database) cluster environment, and the chart presentation software Grafana. HDFS is the distributed file storage system that ultimately stores the data files; HBase relies on the file service of HDFS to provide the external data storage service; OpenTSDB is an open-source time-series database that stores the business data analysis results and, by creating tables in HBase, ultimately provides an external data query service along the time dimension; Grafana is an open-source metric analysis and visualization suite used for the final chart visualization; Kafka is an open-source high-performance message queue middleware used for efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, ultimately decoupling them from each other.
The specific steps for building these five services and middleware are as follows:
1. Building HDFS
# download and extract
wget http://apache.claz.org/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar -zxvf hadoop-2.7.7.tar.gz
# rename the extracted folder to drop the version number
mv /opt/hadoop-2.7.7 /opt/hadoop
# create a new directory
mkdir /home/hadoop/
# add environment variables
vim /home/hadoop/.bashrc
export JAVA_HOME=/usr/local/jdk1.8.0_101
export JRE_HOME=/usr/local/jdk1.8.0_101/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
#vim /opt/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_101
# vim /opt/hadoop/etc/hadoop/yarn-env.sh
# add the environment variable after the "some Java parameters" comment in the configuration file
export JAVA_HOME=/usr/local/jdk1.8.0_101
# vim /opt/hadoop/etc/hadoop/core-site.xml
# add the following to the configuration file
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdfs-01.xxx.xxxxxxx.xxx:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/opt/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.spark.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.spark.groups</name>
<value>*</value>
</property>
# vi /opt/hadoop/etc/hadoop/hdfs-site.xml
# add the following to the configuration file:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/opt/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/opt/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
# cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml
vim /opt/hadoop/etc/hadoop/mapred-site.xml
# add the following to the configuration file
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:19888</value>
</property>
# vim /opt/hadoop/etc/hadoop/yarn-site.xml
# add the following to the configuration file
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdfs-01.xxx.xxxxxxx.xxx:8088</value>
</property>
# mkdir /opt/hadoop/etc/hadoop/tmp/
# and grant read permission on it to the hadoop user
# start the hadoop services
# 1) format the HDFS distributed file system
cd /opt/hadoop/bin
./hdfs namenode -format
# 2) start all services
cd /opt/hadoop/sbin
./start-all.sh
2. Building HBase
# download
wget http://apache-mirror.rbc.ru/pub/apache/hbase/stable/hbase-1.4.8-bin.tar.gz
# extract
tar -zxvf hbase-1.4.8-bin.tar.gz
# rename
mv hbase-1.4.8 hbase
# configure the HDFS connection address
vim /opt/hbase/conf/hbase-site.xml
<property>
<name>hbase.rootdir</name>
<value>hdfs://hdfs-01.xxx.xxxxxxx.xxx:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183</value>
</property>
<!-- DNS resolution -->
<property>
<name>hbase.master.dns.interface</name>
<!-- set according to the actual network interface -->
<value>default</value>
</property>
<property>
<name>hbase.master.dns.nameserver</name>
<value>xxx.xx.xx.xx</value>
</property>
<property>
<name>hbase.regionserver.dns.interface</name>
<!-- set according to the actual network interface -->
<value>default</value>
</property>
<property>
<name>hbase.regionserver.dns.nameserver</name>
<value>xxx.xx.xx.xx</value>
</property>
# configure environment variables
vi /opt/hbase/conf/hbase-env.sh
# specify the JDK path
export JAVA_HOME=/usr/local/jdk1.8.0_101
# set to false to use the shared external ZooKeeper; OpenTSDB later needs to connect to the same external ZooKeeper
export HBASE_MANAGES_ZK=false
# add environment variables
vi /etc/profile
export HBASE_HOME=/opt/hbase
export PATH=$PATH:/opt/hbase/bin
# start and stop
cd /opt/hbase/bin
# start
./start-hbase.sh
# stop
./stop-hbase.sh
# check whether the startup succeeded
http://xxx.xxx.xx.xx:16010/master-status
3. Building OpenTSDB
# download the source code
git clone git://github.com/OpenTSDB/opentsdb.git
# a dependency may need to be installed before compiling
yum -y install libtool
# compiling
cd opentsdb
./build.sh
# installation
./configure
make install
# configure and create the tables
mkdir /etc/opentsdb/
vi /etc/opentsdb/opentsdb.conf
# edit the configuration
tsd.network.port=4242
tsd.http.staticroot = /opt/opentsdb/static/
tsd.http.cachedir = /tmp/opentsdb
tsd.core.plugin_path = /opt/opentsdb/plugins
tsd.storage.hbase.zk_quorum = zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
# create the directories referenced in the configuration
mkdir /opt/opentsdb/static
mkdir /tmp/opentsdb
mkdir /opt/opentsdb/plugins
# create the HBase tables
cd /opt/opentsdb/src
env COMPRESSION=NONE HBASE_HOME=/opt/hbase ./create_table.sh
# starting
cd /opt/opentsdb/build
./tsdb tsd &
# check whether the startup succeeded
http://xxx.xxx.x.xxx:4242/
4. Building Kafka
# download
wget http://mirrors.shuosc.org/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
# extract
tar -zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
# edit the configuration
vim /opt/kafka/config/server.properties
# connect to the shared ZooKeeper
zookeeper.connect=zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
# start
cd /opt/kafka/bin
./kafka-server-start.sh ../config/server.properties &
# test whether the startup succeeded
./kafka-topics.sh --list --zookeeper zk-01.xxx.xxxxxxx.xxx:2181,zk-02.xxx.xxxxxxx.xxx:2182,zk-03.xxx.xxxxxxx.xxx:2183
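The topic that filebeat later writes to (filebeatsystem, see step three below) can be created at this point, either with the kafka-topics.sh script or programmatically. The following is a minimal Java sketch using Kafka's standard AdminClient API, offered as an illustration only; the broker address, partition count and replication factor are assumptions, not values given in this document.
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateMonitoringTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // assumed broker address, matching the style of the hosts entry used later in filebeat.yml
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-01.xxx.xxxxx.xx:1111");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions and replication factor 2 are illustrative values only
            NewTopic topic = new NewTopic("filebeatsystem", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}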
5. Building Grafana
# configure the package repository
vim /etc/yum.repos.d/grafana.repo
[grafana]
name=grafana
baseurl=https://packages.grafana.com/oss/rpm
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://packages.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
# install
yum install grafana
# start the service
service grafana-server start
Step two: business system log recording and log analysis rule maintenance. For the former, the system developers record key information into the log file in the key business scenario code, for example "order xx placed successfully, order number xxxxx" or "user xx paid successfully, amount xxx yuan". Taking the company's cloud rendering project as a practical example, the developers need to monitor the number of messages entering a certain queue, so a log record is added in that part of the module. The code in the project is as follows:
/**
 * Receive a build task message.
 */
@RocketListener(topic = "xxx", consumerGroup = "xxxx")
public void onMessage(Param jobScene) {
log.info("receive construct scene msg[type:{},value:1]|", jobScene.getRenderType());
if (StringUtils.isBlank(jobScene.getJobId())) {
log.info("receive scene jobId is null: {}", jobScene);
return;
}
try {
// enqueue the build task for storage
constructQueueService.enqueue(jobScene);
} catch (Exception e) {
log.error("save construct scene error: {}, {}", jobScene, ExceptionUt ils.getStackTrace(e));
}
}
Each time a message of this type is consumed, the following text appears in the log:
receive construct scene msg[type:{},value:1]|
A screenshot of the actual log is shown in Fig. 3.
It can be clearly seen that between the first delimiter counted from the left, ->>>, and the trailing delimiter |, lies the message key we care about, receive construct scene msg. The rule "a build task message is received and consumed" is therefore maintained into the monitoring system under the corresponding application, as shown in Fig. 4.
At this point, the log of this scenario and the corresponding keyword analysis rule have been entered into our monitoring system.
In addition, because the monitoring system needs to watch for and analyze exceptions in the system and raise alarms, an additional alarm configuration and a whitelist of exceptions that do not require alarms are provided, as shown in Fig. 5.
At this point, the key business scenarios of interest, the whitelist of exceptions that can be ignored, and the contact addresses to notify when an exception occurs have all been configured; a minimal sketch of how such a keyword rule could be matched against a log line follows below.
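The following is a minimal Java sketch, not part of the original disclosure, of how a configured keyword rule could be matched against a raw log line using the delimiters described above; the class name RuleMatcher, the exact regular expression and the sample log line are assumptions for illustration only.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the business keyword found between the "->>>" and "|"/"[" delimiters
// and checks it against a configured rule keyword.
public class RuleMatcher {

    // pattern assumed from the log format described above: ... ->>> KEYWORD[...]|
    private static final Pattern KEY_PATTERN = Pattern.compile("->>>\\s*([^|\\[]+)");

    public static boolean matches(String logLine, String ruleKeyword) {
        Matcher m = KEY_PATTERN.matcher(logLine);
        return m.find() && m.group(1).trim().startsWith(ruleKeyword);
    }

    public static void main(String[] args) {
        // sample line constructed for illustration; the real format is shown in Fig. 3
        String line = "2019-07-29 10:00:00 ->>> receive construct scene msg[type:RENDER,value:1]|";
        System.out.println(matches(line, "receive construct scene msg")); // prints true
    }
}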
Step three: business system log collection and analysis processing. From step two, the scenarios we care about have been written into log files, but the company easily has dozens or even hundreds of systems, so a means is needed to deliver these logs to the monitoring system in near real time. Here filebeat is used as the log data collector, sending the files to be collected out as messages in near real time, while Kafka serves as the bridge between the business applications and the monitoring system, ultimately achieving the goal of analyzing logs in real time. The specific steps are as follows:
1. Download the filebeat package and decompress it;
2. Modify the configuration file filebeat.yml to specify the file paths whose changes need to be monitored, as well as the Kafka address and topic name to send to. Taking the system above as an example, the concrete configuration file is as follows:
filebeat.inputs:
- type: log
# input type: log
enabled: true
encoding: plain
paths:
- /opt/tomcat/system/system.log
# two custom fields to distinguish the log type and the host
fields:
appName: system
# ignore files whose last change is older than one hour
ignore_older: 1h
# start reading from the end of the file
tail_files: true
# regular expression for matching multi-line log entries
#multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:|^.+ Exception:|^\d+\serror'
#multiline.negate: false
multiline.pattern: '^.\d{4}\-\d{2}\-\d{2}.'
multiline.negate: true
multiline.match: after
# # when true, lines that do not match the pattern are appended to the preceding matching line
# multiline.negate: true
# # after: append them after the matching line
# multiline.match: after
filebeat.config.modules:
path: /opt/filebeat/modules.d/*.yml
reload.enabled: false
setup.template.settings:
index.number_of_shards: 3
output.kafka:
enabled: true
hosts: ["kafka-01.xxx.xxxxx.xx:1111"]
topic: "filebeatsystem"
# version: "2.1"
required_acks: 1
worker: 2
max_message_bytes: 10000000
max_procs: 1
processors:
- add_host_metadata:
netinfo.enabled: true
cache.ttl: 5m
3. Execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/system/system.log is then sent in real time to the company's Kafka queue;
4. The monitoring system listens to the Kafka topic filebeatsystem configured in (2); whenever a new log message is pushed, it is consumed and analyzed. A minimal consumer sketch is shown below.
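As an illustration only (the patent does not disclose the consumer code), the following minimal Java sketch shows how the monitoring system could subscribe to the filebeatsystem topic using the standard Kafka consumer API; the group id and the analyzeLogMessage entry point are assumptions.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FilebeatLogConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-01.xxx.xxxxx.xx:1111"); // address from filebeat.yml above
        props.put("group.id", "log-monitor");                         // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("filebeatsystem"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // each record value is a filebeat event containing the raw log line;
                    // hand it to the (hypothetical) analysis entry point
                    analyzeLogMessage(record.value());
                }
            }
        }
    }

    private static void analyzeLogMessage(String message) {
        // placeholder: rule matching and exception detection would happen here
        System.out.println(message);
    }
}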
Step four: alarming on log analysis results and subsequent chart presentation, which is divided into two cases according to the log content:
1. If the message is an exception stack, i.e. one of the three kinds of incoming messages contains the Exception keyword, a corresponding alarm is pushed according to the alarm address configured in (2), the exception is recorded into the application's exception summary data in OpenTSDB according to the principle of accumulating counts of exceptions with the same name, and the alarm notification is handled according to the configured alarm address. The code for this part of the processing is as follows:
private void exceptionPush(String message, String ip, String logFileName, String appName) {
    List<TenantConfig> exceptionNotifies = selectExceptionNotifies(appName);
    // exception aggregation (exceptionProcess) and alarming
    if (!StringUtils.isEmpty(message) && !CollectionUtils.isEmpty(exceptionNotifies)) {
        // push alarms for the aggregated exceptions
        // whitelist handling
        // identify the exception by its exception name
        String exceptionName = RedisKeyUtils.extractExceptionNameFileBeat(message);
        if (StringUtils.isEmpty(exceptionName)) {
            return;
        }
        // push the count to opentsdb for online alarming
        log.info("filebeat warning process:{}", exceptionName);
        // skip the alarm notification if the exception is silenced
        if (checkInSilence(appName, ip, exceptionName)) {
            log.info("notify silence appname:{},ip:{},exception:{}", appName, ip, message);
            return;
        }
        // send the alarm notification, other relevant operations, etc.
    }
}
2. If the message is not an exception, it is a normal business log record; it then needs to be checked against the rules configured in (2). If a rule matches exactly, statistics for the scenario are aggregated by keyword and written into the corresponding OpenTSDB summary data. The code for this part is as follows:
public Boolean insertOpenTsdb(String ip, String appName, String logFileName, String message) {
    if (CollectorConfig.regularConfig == null || CollectionUtils.isEmpty(CollectorConfig.regularConfig.getTenantRegularConfigList())) {
        return false;
    }
    // read the nacos configuration
    Optional<RegularConfig.TenantRegularConfig> tenantRegularConfigOptional = CollectorConfig.regularConfig.getTenantRegularConfigList()
            .stream()
            .filter(tenantRegularConfig -> tenantRegularConfig.getTenantId() != null && tenantRegularConfig.getTenantId().equalsIgnoreCase(appName))
            .findAny();
    if (tenantRegularConfigOptional.isPresent()) {
        List<Regular> regularList = tenantRegularConfigOptional.get().getRegularList();
        Collections.sort(regularList);
        try {
            Regular regular = null;
            if (null != (regular = isFixedRule(regularList, message))) {
                // write to HBase in the specific logback log format (disabled)
                // insertHBaseLogContent(message, appName, data);
                // the rule engine matched; write the record to opentsdb
                LoggerData data = new LoggerData();
                data.setAppName(appName);
                data.setCreateTime(com.ihomefnt.sunfire.config.utils.StringUtils.now());
                data.setSplitExpress("");
                data.setAppIp(ip);
                data.setLoggerContent(message);
                tsdbMetricStore.put(appName, ip, logFileName, data, regular, tsdbProperties.getOpenTSDB());
            }
            return true;
        } catch (Exception e) {
            log.error("insert opentsdb exception:{}", JSON.toJSONString(message));
            return false;
        }
    }
    return false;
}
The exception and key business statistics from cases 1 and 2 above are stored in OpenTSDB as time-series data; at this point it is only necessary to configure the chart presentation software Grafana to display a given class of business data as a chart over time (a sketch of how such data points can be written to OpenTSDB over its HTTP API is shown below).
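As an illustration only (the tsdbMetricStore.put call in the code above is the patent's own storage entry point and its internals are not disclosed), the following is a minimal Java sketch of writing one counter data point to OpenTSDB's standard HTTP /api/put endpoint; the metric name business.scene.count, the tag names and the host address are assumptions.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OpenTsdbPutExample {
    public static void main(String[] args) throws Exception {
        // one data point: count of a matched business keyword, tagged by app and host
        String json = "[{"
                + "\"metric\":\"business.scene.count\","                       // assumed metric name
                + "\"timestamp\":" + (System.currentTimeMillis() / 1000) + ","
                + "\"value\":1,"
                + "\"tags\":{\"appName\":\"system\",\"ip\":\"10.0.0.1\"}"
                + "}]";

        URL url = new URL("http://opentsdb-host:4242/api/put");                // assumed OpenTSDB address
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        // OpenTSDB returns 204 No Content when the data point is accepted
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}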
The deployment of the middleware the system relies on is shown in Table 1:
Table 1
HDFS, HBase and Kafka are deployed in fully distributed mode; OpenTSDB is effectively distributed because it relies on HBase; Grafana mainly does chart presentation and is deployed as a single node.
3. Start the filebeat client on the application instances, entering the application name (i.e. the corresponding topic name) and the paths of the log files to be collected:
logs can then be collected and pushed in real time;
4. After the monitoring system consumes the messages, they are matched against the corresponding rules and included in the corresponding OpenTSDB aggregate records;
if a message is an exception, an alarm is raised through the configured alerting service, here taking a DingTalk bot as an example, as shown in Fig. 7 (a sketch of pushing a text alarm to a DingTalk robot webhook is given at the end of this description);
5. For the collected business aggregate data, related chart presentations can be configured in Grafana, as shown in Fig. 8.
This finally achieves the goals of visualizing key business scenario data and handling exception monitoring in near real time.
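The patent shows the DingTalk alarm only as the screenshot in Fig. 7. The following is a minimal Java sketch, offered as an assumption-laden illustration rather than the patent's implementation, of pushing a plain-text alarm to a DingTalk custom robot webhook; the access token placeholder and the alarm message content are hypothetical.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class DingTalkAlarmExample {
    public static void main(String[] args) throws Exception {
        // DingTalk custom robot webhook; the access token is a placeholder
        URL url = new URL("https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN");
        // example alarm text; real content would come from the exception aggregation above
        String json = "{\"msgtype\":\"text\",\"text\":{\"content\":"
                + "\"[alarm] NullPointerException occurred 5 times in app system on 10.0.0.1\"}}";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json;charset=utf-8");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}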

Claims (4)

1. A full-link monitoring and alarm method based on log collection and analysis, characterized in that it specifically comprises the following steps:
Step 1: build the infrastructure services, specifically the distributed file storage HDFS, the distributed columnar database HBase, the message middleware Kafka, the time-series database OpenTSDB and the chart presentation unit Grafana;
wherein HDFS is used for distributed storage of data files;
HBase is used to provide the external data storage service;
OpenTSDB is used to store the business data analysis results and, by creating tables in HBase, ultimately provides an external data query service along the time dimension;
Grafana is used for the final chart visualization;
Kafka is used for efficient message transfer between the business systems to be analyzed and the monitoring and analysis system, decoupling the two from each other;
Step 2: business system log recording and log analysis rule maintenance;
Step 3: business system log collection and analysis processing;
Step 4: alarming on log analysis results and subsequent chart presentation.
2. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that step 2, business system log recording and log analysis rule maintenance, is specifically as follows:
Step 2.1: in the key business scenario code, record the key information into the log file;
Step 2.2: maintain the matching rule into the monitoring system under the corresponding application;
Step 2.3: enter the log file of the scenario and the corresponding key information analysis rule into the monitoring system, complete the alarm configuration and the whitelist of exceptions that do not require alarms, and then perform real-time monitoring.
3. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that step 3 is specifically as follows: filebeat is used as the log data collector and sends the files to be collected out in real time in the form of messages, while Kafka serves as the bridge between the business applications and the monitoring system, achieving the goal of analyzing logs in real time. The specific steps are as follows:
Step 3.1: download the filebeat package and decompress it;
Step 3.2: modify the configuration file filebeat.yml to specify the file paths whose changes need to be monitored, as well as the Kafka address and topic name to send to;
Step 3.3: execute the command ./filebeat -e -c filebeat.yml; the changed content of the file /opt/tomcat/system/system.log is then sent in real time to the company's Kafka queue;
Step 3.4: the monitoring system listens to the Kafka topic filebeatsystem configured in step 3.2; whenever a new log message is pushed, it is consumed and analyzed.
4. The full-link monitoring and alarm method based on log collection and analysis according to claim 1, characterized in that in step 4 the alarm on log analysis results and the subsequent chart presentation are divided into two cases according to the log content, specifically as follows:
Step 4.1: if the message is an exception stack, i.e. one of the three kinds of incoming messages contains the Exception keyword, a corresponding alarm is pushed according to the alarm address, the exception is recorded into the application's exception summary data in OpenTSDB according to the principle of accumulating counts of exceptions with the same name, and the alarm notification is handled according to the configured alarm address;
if the message is not an exception, it is a normal business log record; it is then checked against the configured rules, and if a rule matches exactly, statistics for the scenario are aggregated by keyword and written into the corresponding OpenTSDB summary data;
Step 4.2: the exception and key business statistics from step 4.1 are stored in OpenTSDB as time-series data; it is then only necessary to configure the chart presentation software Grafana to display a given class of business data as a chart over time. The specific steps are:
1. create a new data source;
2. select OpenTSDB;
3. complete the OpenTSDB configuration, mainly the IP and port, and save;
4. create a new dashboard from the home page and select Add Query;
5. configure the data source and aggregation method the chart should display, and save;
6. obtain the chart presentation of the scenario data.
CN201910689380.7A 2019-07-29 2019-07-29 Full-link monitoring and alarm method based on log collection and analysis Withdrawn CN110457178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910689380.7A CN110457178A (en) 2019-07-29 2019-07-29 Full-link monitoring and alarm method based on log collection and analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910689380.7A CN110457178A (en) 2019-07-29 2019-07-29 Full-link monitoring and alarm method based on log collection and analysis

Publications (1)

Publication Number Publication Date
CN110457178A true CN110457178A (en) 2019-11-15

Family

ID=68483845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910689380.7A Withdrawn CN110457178A (en) 2019-07-29 2019-07-29 A kind of full link monitoring alarm method based on log collection analysis

Country Status (1)

Country Link
CN (1) CN110457178A (en)


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110990218A (en) * 2019-11-22 2020-04-10 深圳前海环融联易信息科技服务有限公司 Visualization and alarm method and device based on mass logs and computer equipment
CN110990218B (en) * 2019-11-22 2023-12-26 深圳前海环融联易信息科技服务有限公司 Visualization and alarm method and device based on massive logs and computer equipment
CN111143157A (en) * 2019-11-28 2020-05-12 华为技术有限公司 Fault log processing method and device
CN111176937A (en) * 2019-12-19 2020-05-19 深圳猛犸电动科技有限公司 Message middleware monitoring and warning system, method, terminal equipment and storage medium
CN111145405A (en) * 2019-12-31 2020-05-12 上海申铁信息工程有限公司 High-speed railway station gate machine management system
CN111367777A (en) * 2020-03-03 2020-07-03 腾讯科技(深圳)有限公司 Alarm processing method, device, equipment and computer readable storage medium
CN111371900A (en) * 2020-03-13 2020-07-03 北京奇艺世纪科技有限公司 Method and system for monitoring health state of synchronous link
CN112306636B (en) * 2020-10-28 2023-06-16 武汉大势智慧科技有限公司 Cloud rendering platform and intelligent scheduling method thereof
CN112306636A (en) * 2020-10-28 2021-02-02 武汉大势智慧科技有限公司 Cloud rendering platform and intelligent scheduling method thereof
CN112328684A (en) * 2020-11-03 2021-02-05 浪潮云信息技术股份公司 Method for synchronizing time sequence data to Kafka in real time based on OpenTsdb
CN113282557A (en) * 2021-04-27 2021-08-20 联通(江苏)产业互联网有限公司 Big data log analysis method and system based on Spring framework
CN113190415A (en) * 2021-05-27 2021-07-30 北京京东拓先科技有限公司 Internet hospital system monitoring method, equipment, storage medium and program product
CN113434619A (en) * 2021-06-25 2021-09-24 南京美慧软件有限公司 4g intelligent highway traffic road condition monitoring system
CN113407421B (en) * 2021-08-19 2021-11-30 北京江融信科技有限公司 Dynamic log record management method and system for micro-service gateway
CN113407421A (en) * 2021-08-19 2021-09-17 北京江融信科技有限公司 Dynamic log record management method and system for micro-service gateway
CN114866401A (en) * 2022-05-06 2022-08-05 辽宁振兴银行股份有限公司 Distributed transaction link log analysis method and system
CN116232963A (en) * 2023-02-20 2023-06-06 中银消费金融有限公司 Link tracking method and system
CN116232963B (en) * 2023-02-20 2024-02-09 中银消费金融有限公司 Link tracking method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 2019-11-15)