CN110297745A - Fault locating method and system for a storage monitoring system - Google Patents

Fault locating method and system for a storage monitoring system

Info

Publication number
CN110297745A
Authority
CN
China
Prior art keywords
log information
data
module
storage
storage array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910600199.4A
Other languages
Chinese (zh)
Inventor
曾凌波
卢宇彤
杜云飞
贾卓
杨杰
彭运勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University
Priority to CN201910600199.4A
Publication of CN110297745A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a fault locating method and system for a storage monitoring system. The system comprises an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module. The invention addresses a shortcoming of prior-art operation and maintenance (O&M) monitoring systems, in which the monitored indicators are scattered and O&M personnel cannot locate a fault quickly and intuitively. The underlying device data acquisition module is simple to deploy and provides accurate, real-time data. The monitoring function modules are mutually independent and loosely coupled, which makes the system easy to extend. Fault data is displayed centrally in the data visualization module, so that users can observe it and quickly locate the fault point.

Description

Fault locating method and system for a storage monitoring system
Technical field
The invention relates to the field of monitoring, and more particularly to a fault locating method and system for a storage monitoring system.
Background
With the continued expansion of high-performance computing workloads and the emergence of technologies that integrate HPC, big data (BD) and AI, both data centers and supercomputing centers place stricter requirements on the reliability, capacity and performance of their data storage infrastructure. As storage scale grows, the probability of software and hardware failures also rises, so ensuring the reliability of data storage while increasing storage scale and performance becomes critical. O&M personnel therefore need to monitor storage software and hardware in real time, so that problems in the storage cluster are found and handled promptly.
In existing data centers and supercomputing centers, the software and hardware of a storage system are mainly monitored separately. On the hardware side, as a storage system grows it inevitably includes equipment from different vendors, of different brands and with different architectures, and each of these may come with its own monitoring and management system. O&M personnel have to learn to use each of these systems, which adds to their routine workload. On the software side, data centers and supercomputing centers choose storage software suited to their workloads, and HPC cluster systems mainly use the Lustre file system as the data management software on top of the storage system. Many open-source monitoring tools exist, but in order to be general-purpose they leave the monitored indicators scattered. When the storage system as a whole fails, the place where the fault manifests may differ from its root cause, and O&M personnel cannot locate the root cause quickly and intuitively.
Summary of the invention
To overcome the shortcoming that the monitored indicators of prior-art O&M monitoring systems are scattered, so that O&M personnel cannot locate the root cause of a fault quickly and intuitively, the invention provides a fault locating method for a storage monitoring system.
A fault locating method for a storage monitoring system, comprising the following steps:
Step S1: deploy an underlying device data acquisition module on each storage array and storage server in the underlying device module of the storage monitoring network;
Step S2: the underlying device data acquisition module acquires storage array log information and storage server log information from each storage array and storage server, and transmits the acquired log information to the data filtering module;
Step S3: the data filtering module filters the acquired storage array log information and storage server log information with a Logstash filter. The Logstash filter processes all data acquired by the underlying device data acquisition module in a uniform way: it filters the raw data and reorganizes it into a common format, rejects noisy storage array log information and storage server log information, and transmits the filtered log information to the underlying device data collection module;
Step S4: the underlying device data collection module stores the filtered storage array log information and storage server log information in a database for persistent storage;
Step S5: the data processing module obtains the storage array log information and storage server log information stored by the underlying device data collection module and processes it. Based on the data format laid out by Logstash, it finds, one by one, the data labels that correspond to the underlying device modules, converts these data labels into attribute names, finds the fault point from the attribute names, locates the underlying device module that has failed, and sends the processed data and fault information to the data visualization module and the data communication module;
Step S6: the data visualization module displays the processed data and fault information, and the data communication module sends the processed data and fault information to a mobile terminal.
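The filtering in step S3 can be pictured with a small stand-in written in plain Python. This is not Logstash and not the patent's configuration: the raw line layout, the field names, the `NOISE_LEVELS` set and the `normalize` and `filter_logs` helpers are all assumptions made for illustration. The sketch only shows the idea of rejecting noisy records and reorganizing the rest into one common format.

```python
import json
import re
from typing import Optional

# Hypothetical raw line layout: "<timestamp> <device> <LEVEL> <message>".
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<device>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.+)$"
)
# Assumption: records at these levels count as noise and are rejected.
NOISE_LEVELS = {"DEBUG", "TRACE"}


def normalize(raw_line: str, source: str) -> Optional[dict]:
    """Return a record in the common format, or None if the line is noise."""
    match = LINE_RE.match(raw_line.strip())
    if match is None:
        return None  # lines that do not parse are treated as noise
    record = match.groupdict()
    if record["level"] in NOISE_LEVELS:
        return None
    record["source"] = source  # e.g. "storage_array" or "storage_server"
    return record


def filter_logs(lines, source):
    """Normalize every raw line and keep only the records that survive."""
    normalized = (normalize(line, source) for line in lines)
    return [record for record in normalized if record is not None]


if __name__ == "__main__":
    sample = [
        "2019-07-04T10:00:00 array01 ERROR disk 3 failed",
        "2019-07-04T10:00:01 array01 DEBUG heartbeat ok",
        "garbage",
    ]
    print(json.dumps(filter_logs(sample, "storage_array"), indent=2))
```

Giving array and server logs the same record shape is what lets the later steps handle both kinds of log information with one code path.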
Preferably, in step S5 the data processing module processes the storage array log information and storage server log information using a timer function. The timer function periodically connects to the underlying device data collection module, processes the storage array log information and storage server log information held there, filters fault information out of the collected log information, and sends the processed data and fault information to the data communication module.
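A timer-driven processing loop of this kind might look like the following sketch. The patent does not say how the timer is implemented, so the `PeriodicProcessor` class, the `FAULT_LEVELS` set, the callback names and the default interval are hypothetical. `fetch_records` stands in for the connection to the underlying device data collection module, and `send` stands in for the data communication module.

```python
import threading

# Assumption: records at these levels are treated as fault information.
FAULT_LEVELS = {"ERROR", "CRITICAL"}


def extract_faults(records):
    """Filter the fault records out of the collected log records."""
    return [record for record in records if record.get("level") in FAULT_LEVELS]


class PeriodicProcessor:
    """Polls the collection module at a fixed interval and forwards faults."""

    def __init__(self, fetch_records, send, interval_s=60.0):
        self._fetch_records = fetch_records  # hypothetical: reads stored logs
        self._send = send                    # hypothetical: pushes faults onward
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def run_once(self):
        """One polling cycle: fetch, filter faults, forward them if any."""
        faults = extract_faults(self._fetch_records())
        if faults:
            self._send(faults)
        return faults

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self.run_once()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()
```

Waiting on a `threading.Event` rather than sleeping means `stop()` takes effect immediately instead of after the current interval expires.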
Preferably, in step S5 the data processing module uses a RESTful API to serialize the processed data and fault information as JSON, so that the data visualization module can call it.
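One way to expose processed data as JSON over a RESTful endpoint is sketched here using only the Python standard library. The route `/api/faults`, the `PROCESSED` payload, its field names and the `FaultApiHandler` class are assumptions made for illustration. The patent does not name a web framework, a route or a payload schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory result of the processing step.
PROCESSED = {
    "records": [{"device": "array01", "level": "ERROR", "message": "disk 3 failed"}],
    "faults": [{"device": "array01", "attribute": "disk_status"}],
}


def to_json_bytes(payload: dict) -> bytes:
    """Serialize the processed data as UTF-8 encoded JSON."""
    return json.dumps(payload, ensure_ascii=False, sort_keys=True).encode("utf-8")


class FaultApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/api/faults":  # hypothetical route
            self.send_error(404)
            return
        body = to_json_bytes(PROCESSED)
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Serve the endpoint until interrupted; a front end can poll it."""
    HTTPServer((host, port), FaultApiHandler).serve_forever()
```

A browser-based visualization front end could then fetch the route and render the returned records without knowing how they were produced.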
The invention also provides a fault locating system for a storage monitoring system. The system is based on the above method and comprises an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module;
The underlying device data acquisition module is used to acquire the storage array log information and storage server log information of each storage array and storage server;
The data filtering module is used to filter the storage array log information and storage server log information acquired by the underlying device data acquisition module, and to reject noisy storage array log information and storage server log information;
The underlying device data collection module is used to store the storage array log information and storage server log information acquired by the underlying device data acquisition module;
The data processing module is used to analyze and process the collected storage array log information and storage server log information;
The data visualization module is used to visualize the storage array log information and storage server log information processed by the data processing module;
The data communication module is used to transmit the storage array log information and storage server log information processed by the data processing module to a mobile terminal.
When the system is running, the underlying device data acquisition module begins acquiring the storage array log information and storage server log information of each storage array and storage server, and sends the acquired log information to the data filtering module. The data filtering module filters the storage array log information and storage server log information, rejects the noisy log information, and sends the filtered log information to the underlying device data collection module for storage. The data processing module extracts the storage array log information and storage server log information from the underlying device data collection module and analyzes and processes it. The processed data is sent to both the data visualization module and the data communication module. The data visualization module visualizes the data processed by the data processing module, and the data communication module transmits the processed storage array log information and storage server log information to a mobile terminal, where staff can view it.
Preferably, multiple data processing modules are provided: one is the primary data processing module and the rest are standby data processing modules. When the primary data processing module fails, a standby data processing module is started and works until the primary data processing module returns to normal.
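The primary/standby arrangement can be illustrated with a small selector that prefers the primary whenever it is healthy. The `ProcessorFailover` class and the `is_healthy` callback are hypothetical names, and how health is actually detected is not specified in the patent. The sketch only captures the rule that a standby works until the primary recovers.

```python
class ProcessorFailover:
    """Chooses which data processing module should currently do the work."""

    def __init__(self, primary, standbys, is_healthy):
        self._primary = primary
        self._standbys = list(standbys)
        self._is_healthy = is_healthy  # hypothetical health-check callback

    def active(self):
        # The primary is always preferred, so work returns to it on recovery.
        if self._is_healthy(self._primary):
            return self._primary
        for standby in self._standbys:
            if self._is_healthy(standby):
                return standby
        raise RuntimeError("no healthy data processing module available")


if __name__ == "__main__":
    health = {"primary": True, "standby-1": True}
    failover = ProcessorFailover("primary", ["standby-1"], lambda name: health[name])
    print(failover.active())
    health["primary"] = False
    print(failover.active())
```

Because the primary is checked first on every call, no explicit switch-back step is needed once it returns to normal.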
Preferably, the data processing module and the data visualization module use a B/S (browser/server) architecture.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
The invention addresses the shortcoming that the monitored indicators of prior-art O&M monitoring systems are scattered, so that O&M personnel cannot locate a fault quickly and intuitively. The underlying device data acquisition module is simple to deploy and provides accurate, real-time data. The functional modules are mutually independent and loosely coupled, which makes the system easy to extend. The invention is easy to use, simple and effective: fault data is displayed centrally in the data visualization module, so that users can observe it and quickly locate the fault point.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a block diagram of the system.
Detailed description of the embodiments
The drawings are for illustration only and shall not be construed as limiting this patent;
To better illustrate this embodiment, some components in the drawings may be omitted, enlarged or reduced, and do not represent the dimensions of the actual product;
Those skilled in the art will understand that certain well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, a fault locating method for a storage monitoring system comprises the following steps:
Step S1: deploy an underlying device data acquisition module on each storage array and storage server in the underlying device module of the storage monitoring network;
Step S2: the underlying device data acquisition module acquires storage array log information and storage server log information from each storage array and storage server, and transmits the acquired log information to the data filtering module;
Step S3: the data filtering module filters the acquired storage array log information and storage server log information with a Logstash filter. The Logstash filter processes all data acquired by the underlying device data acquisition module in a uniform way: it filters the raw data and reorganizes it into a common format, rejects noisy storage array log information and storage server log information, and transmits the filtered log information to the underlying device data collection module;
Step S4: the underlying device data collection module stores the filtered storage array log information and storage server log information in a database for persistent storage;
Step S5: the data processing module obtains the storage array log information and storage server log information stored by the underlying device data collection module and processes it. Based on the data format laid out by Logstash, it finds, one by one, the data labels that correspond to the underlying device modules, converts these data labels into attribute names, finds the fault point from the attribute names, locates the underlying device module that has failed, and sends the processed data and fault information to the data visualization module and the data communication module;
Step S6: the data visualization module displays the processed data and fault information, and the data communication module sends the processed data and fault information to a mobile terminal.
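The fault locating logic of step S5 (data labels, then attribute names, then fault point) can be sketched as a lookup over normalized records. The label names, the attribute names, the `HEALTHY_VALUES` set and the record layout below are invented for illustration, since the patent does not list concrete labels or say how an abnormal value is recognized.

```python
# Hypothetical mapping from data labels in the normalized records to the
# attribute names used to describe the underlying devices.
LABEL_TO_ATTRIBUTE = {
    "disk_st": "disk_status",
    "ctrl_st": "controller_status",
    "ost_st": "ost_status",
}
# Assumption: any value outside this set marks a fault point.
HEALTHY_VALUES = {"ok", "online"}


def locate_faults(records):
    """Return one entry per faulty attribute, naming the device it belongs to."""
    faults = []
    for record in records:
        for label, attribute in LABEL_TO_ATTRIBUTE.items():
            value = record.get(label)
            if value is not None and value not in HEALTHY_VALUES:
                faults.append(
                    {"device": record["device"], "attribute": attribute, "value": value}
                )
    return faults


if __name__ == "__main__":
    sample = [
        {"device": "array01", "disk_st": "failed", "ctrl_st": "ok"},
        {"device": "server02", "ost_st": "online"},
    ]
    print(locate_faults(sample))
```

Each returned entry names both the device and the attribute, which is the information a dashboard needs in order to point at the failed underlying device.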
As a preferred implementation, in step S5 the data processing module processes the storage array log information and storage server log information using a timer function. The timer function periodically connects to the underlying device data collection module, processes the storage array log information and storage server log information held there, filters fault information out of the collected log information, and sends the processed data and fault information to the data communication module.
As a preferred implementation, in step S5 the data processing module uses a RESTful API to serialize the processed data and fault information as JSON, so that the data visualization module can call it.
Embodiment 2
As shown in Fig. 2, this embodiment provides a unified monitoring system for the storage monitoring network of a high-performance computing cluster. The system is based on the above method and comprises an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module;
The underlying device data acquisition module is used to acquire the storage array log information and storage server log information of each storage array and storage server;
The data filtering module is used to filter the storage array log information and storage server log information acquired by the underlying device data acquisition module, and to reject noisy storage array log information and storage server log information;
The underlying device data collection module is used to store the data acquired by the underlying device data acquisition module;
The data processing module is used to analyze and process the collected data;
The data visualization module is used to visualize the data processed by the data processing module;
The data communication module is used to transmit the storage array log information and storage server log information processed by the data processing module to a mobile terminal.
When the system is running, the underlying device data acquisition module begins acquiring the storage array log information and storage server log information of each storage array and storage server, and sends the acquired log information to the data filtering module. The data filtering module filters the storage array log information and storage server log information, rejects the noisy log information, and sends the filtered log information to the underlying device data collection module for storage. The data processing module extracts the storage array log information and storage server log information from the underlying device data collection module and analyzes and processes it. The processed data is sent to both the data visualization module and the data communication module. The data visualization module visualizes the data processed by the data processing module;
The data communication module transmits the storage array log information and storage server log information processed by the data processing module to a mobile terminal, where staff can view it.
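The storage and extraction stages of this flow (the collection module persisting filtered logs, and the processing module reading them back) can be sketched with SQLite from the Python standard library. The patent only says that the data is saved to a database, so the choice of SQLite, the `logs` table, its columns and the helper names are assumptions.

```python
import sqlite3

COLUMNS = ("source", "device", "level", "message")


def open_store(path=":memory:"):
    """Open the log store and create the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, device TEXT NOT NULL, "
        "level TEXT NOT NULL, message TEXT NOT NULL)"
    )
    return conn


def save_records(conn, records):
    """Persist filtered records; the with-block commits or rolls back."""
    with conn:
        conn.executemany(
            "INSERT INTO logs (source, device, level, message) "
            "VALUES (:source, :device, :level, :message)",
            records,
        )


def load_records(conn, level=None):
    """Read records back in insertion order, optionally for one level."""
    sql = "SELECT source, device, level, message FROM logs"
    params = ()
    if level is not None:
        sql += " WHERE level = ?"
        params = (level,)
    sql += " ORDER BY id"
    return [dict(zip(COLUMNS, row)) for row in conn.execute(sql, params)]
```

Passing a file path instead of `":memory:"` keeps the records across restarts, which is the point of the persistent storage stage.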
As a preferred implementation, two data processing modules are provided: a primary data processing module and a standby data processing module. When the primary data processing module fails, the standby data processing module is started and works until the primary data processing module returns to normal.
As a preferred implementation, the data processing module and the data visualization module use a B/S (browser/server) architecture.
The same or similar reference numerals correspond to the same or similar components;
Terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;
Obviously, the above embodiments are merely examples given to illustrate the invention clearly and do not limit its implementations. Those of ordinary skill in the art may make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the claims of the invention.

Claims (6)

1. A fault locating method for a storage monitoring system, characterized by comprising the following steps:
Step S1: deploy an underlying device data acquisition module on each storage array and storage server in the underlying device module of the storage monitoring network;
Step S2: the underlying device data acquisition module acquires storage array log information and storage server log information from each storage array and storage server, and transmits the acquired log information to the data filtering module;
Step S3: the data filtering module filters the acquired storage array log information and storage server log information with a Logstash filter. The Logstash filter processes all data acquired by the underlying device data acquisition module in a uniform way: it filters the raw data and reorganizes it into a common format, rejects noisy storage array log information and storage server log information, and transmits the filtered log information to the underlying device data collection module;
Step S4: the underlying device data collection module stores the filtered storage array log information and storage server log information in a database for persistent storage;
Step S5: the data processing module obtains the storage array log information and storage server log information stored by the underlying device data collection module and processes it. Based on the data format laid out by Logstash, it finds, one by one, the data labels that correspond to the underlying device modules, converts these data labels into attribute names, finds the fault point from the attribute names, locates the underlying device module that has failed, and sends the processed data and fault information to the data visualization module and the data communication module;
Step S6: the data visualization module displays the processed data and fault information, and the data communication module sends the processed data and fault information to a mobile terminal.
2. The fault locating method for a storage monitoring system according to claim 1, characterized in that in step S5 the data processing module processes the storage array log information and storage server log information using a timer function. The timer function periodically connects to the underlying device data collection module, processes the storage array log information and storage server log information held there, filters fault information out of the collected log information, and sends the processed data and fault information to the data communication module.
3. The fault locating method for a storage monitoring system according to claim 2, characterized in that in step S5 the data processing module uses a RESTful API to serialize the processed data and fault information as JSON, so that the data visualization module can call it.
4. A fault locating system for a storage monitoring system, the system being based on the method of any one of claims 1 to 3, characterized by comprising an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module;
The underlying device data acquisition module is used to acquire the storage array log information and storage server log information of each storage array and storage server;
The data filtering module is used to filter the storage array log information and storage server log information acquired by the underlying device data acquisition module, and to reject noisy storage array log information and storage server log information;
The underlying device data collection module is used to store the storage array log information and storage server log information acquired by the underlying device data acquisition module;
The data processing module is used to analyze and process the collected storage array log information and storage server log information;
The data visualization module is used to visualize the storage array log information and storage server log information processed by the data processing module;
The data communication module is used to transmit the storage array log information and storage server log information processed by the data processing module to a mobile terminal.
5. The fault locating system for a storage monitoring system according to claim 4, characterized in that multiple data processing modules are provided, one of which is the primary data processing module and the rest of which are standby data processing modules.
6. The fault locating system for a storage monitoring system according to claim 4, characterized in that the data processing module and the data visualization module use a B/S (browser/server) architecture.
CN201910600199.4A 2019-07-04 2019-07-04 Fault locating method and system for a storage monitoring system Pending CN110297745A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910600199.4A CN110297745A (en) 2019-07-04 2019-07-04 Fault locating method and system for a storage monitoring system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910600199.4A CN110297745A (en) 2019-07-04 2019-07-04 Fault locating method and system for a storage monitoring system

Publications (1)

Publication Number Publication Date
CN110297745A (en) 2019-10-01

Family

ID=68030352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910600199.4A Pending CN110297745A (en) 2019-07-04 2019-07-04 Fault locating method and system for a storage monitoring system

Country Status (1)

Country Link
CN (1) CN110297745A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104714863A (en) * 2015-02-06 2015-06-17 浪潮电子信息产业股份有限公司 Method for completely storing Raid card logs on basis of Linux operation system after system crashes
CN107463602A (en) * 2017-06-15 2017-12-12 努比亚技术有限公司 A kind of log processing method and server, client
CN107729214A (en) * 2017-10-13 2018-02-23 福建富士通信息软件有限公司 A kind of visual distributed system monitors O&M method and device in real time
CN107943668A (en) * 2017-12-15 2018-04-20 江苏神威云数据科技有限公司 Computer server cluster daily record monitoring method and monitor supervision platform
CN109492073A (en) * 2018-10-31 2019-03-19 北京达佳互联信息技术有限公司 Blog search method, blog search device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191001)