CN110297745A - Fault locating method and system for a storage monitoring system - Google Patents
Fault locating method and system for a storage monitoring system
- Publication number
- CN110297745A (application number CN201910600199.4A)
- Authority
- CN
- China
- Prior art keywords
- log information
- data
- module
- storage
- storage array
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3034—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The present invention relates to a fault locating method and system for a storage monitoring system. The system comprises an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module. The invention addresses a deficiency of prior-art operation and maintenance (O&M) monitoring systems, in which the monitored metrics are scattered and O&M personnel cannot locate a fault quickly and intuitively. The underlying device data acquisition module used by the invention is simple to deploy and offers high accuracy and real-time performance. The monitoring function modules are mutually independent and loosely coupled, which makes the system easy to extend. The invention is easy to use, simple and effective: fault data are displayed centrally in the data visualization module, where users can readily observe them and quickly locate the fault point.
Description
Technical field
The present invention relates to the field of monitoring, and more particularly to a fault locating method and system for a storage monitoring system.
Background
With the continuing expansion of high-performance computing workloads and the emergence of technologies that integrate HPC, big data (BD) and AI, both data centers and supercomputing centers place ever stricter requirements on the reliability, capacity and performance of their data storage infrastructure. It is well known that the probability of software and hardware failure rises as storage scale grows, so ensuring the reliability of data storage while increasing storage scale and performance becomes critical. O&M personnel therefore need to monitor storage software and hardware in real time, so that problems in a storage cluster are discovered and handled promptly.
In existing data centers and supercomputing centers, O&M monitoring of the storage system is mainly performed separately for software and hardware. On the hardware side, as the storage system grows, the storage devices inevitably come from different vendors and brands and use different architectures, and each of them may have its own monitoring and management system. O&M personnel must learn to use each of these management systems, which adds to their routine workload. On the software side, data centers and supercomputing centers use whatever storage software suits the workload, and HPC cluster systems mainly use the Lustre file system as the data management software on top of the storage system. Many open-source monitoring tools exist, but in order to remain general-purpose they leave the monitored metrics scattered. When the storage system as a whole fails, the place where the fault shows up may differ from where its root cause lies, and O&M personnel cannot quickly and intuitively locate the root cause of the fault.
Summary of the invention
To overcome the deficiency of prior-art O&M monitoring systems, in which the monitored metrics are scattered and O&M personnel cannot quickly and intuitively locate the root cause of a fault, the present invention provides a fault locating method for a storage monitoring system.
A fault locating method for a storage monitoring system comprises the following steps:
Step S1: deploy an underlying device data acquisition module on each storage array and storage server in the underlying device module of the storage monitoring network;
Step S2: the underlying device data acquisition module acquires storage array log information and storage server log information from each storage array and storage server, and transmits the acquired storage array log information and storage server log information to the data filtering module;
Step S3: the data filtering module filters the acquired storage array log information and storage server log information through a Logstash filter. The Logstash filter processes all data acquired by the underlying device data acquisition module in a unified way: it filters the raw data and reorganizes it into a common format, rejects storage array log information and storage server log information that contains noise, and transmits the filtered storage array log information and storage server log information to the underlying device data collection module;
Step S4: the underlying device data collection module stores the filtered storage array log information and storage server log information in a database for persistence;
Step S5: the data processing module obtains the storage array log information and storage server log information stored by the underlying device data collection module and processes it. Based on the data format arranged by Logstash, it finds the data labels that correspond one-to-one with the underlying device modules, converts these data labels into attribute names, finds the fault point according to the attribute names, locates the underlying device module that has failed, and sends the processed data and fault information to the data visualization module and the data communication module;
Step S6: the data visualization module displays the processed data and fault information, and the data communication module sends the processed data and fault information to a mobile terminal.
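The filtering and reformatting described in step S3 can be illustrated with a short sketch. This is a plain-Python stand-in for the filtering stage, not a Logstash configuration. The raw line layout (`timestamp|device|level|message`), the field names and the rule that DEBUG/TRACE or malformed lines count as noise are all assumptions made only for illustration.

```python
from typing import Optional

# Hypothetical raw format: "timestamp|device|level|message".
NOISE_LEVELS = {"DEBUG", "TRACE"}  # assumed definition of "noise"


def normalize(raw_line: str, source: str) -> Optional[dict]:
    """Turn one raw log line into a common-format record, or None if it is noise."""
    parts = [p.strip() for p in raw_line.split("|", 3)]
    if len(parts) != 4 or not all(parts):
        return None  # malformed lines are treated as noise
    timestamp, device, level, message = parts
    if level.upper() in NOISE_LEVELS:
        return None
    return {
        "source": source,  # "storage_array" or "storage_server"
        "timestamp": timestamp,
        "device": device,
        "level": level.upper(),
        "message": message,
    }


def filter_logs(raw_lines, source):
    """Keep only the records that survive normalization."""
    records = (normalize(line, source) for line in raw_lines)
    return [r for r in records if r is not None]
```

Because every record leaves this stage with the same keys, later stages do not need to know which vendor or device produced the original line.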
Preferably, in step S5 the data processing module processes the storage array log information and storage server log information using a timer function. The timer function periodically connects to the underlying device data collection module, processes the storage array log information and storage server log information held there, filters the fault information out of the collected storage array log information and storage server log information, and sends the processed data and fault information to the data communication module.
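One way to picture the timer function is the sketch below, which uses Python's standard `threading.Timer`. The class name `PeriodicProcessor`, the `fetch` and `notify` callables, the 60-second default interval and the rule that ERROR or CRITICAL records are faults are hypothetical and are not taken from the patent.

```python
import threading


class PeriodicProcessor:
    """Calls fetch() on a fixed interval and forwards fault records to notify()."""

    def __init__(self, fetch, notify, interval_s=60.0):
        self.fetch = fetch            # returns an iterable of record dicts
        self.notify = notify          # receives the list of fault records
        self.interval_s = interval_s  # assumed polling period
        self._timer = None
        self._stopped = False

    def run_once(self):
        """One processing pass: pull stored records and pick out the faults."""
        faults = [r for r in self.fetch() if r.get("level") in ("ERROR", "CRITICAL")]
        if faults:
            self.notify(faults)
        return faults

    def _tick(self):
        if self._stopped:
            return
        self.run_once()
        # Re-arm the timer so the next pass runs after interval_s seconds.
        self._timer = threading.Timer(self.interval_s, self._tick)
        self._timer.daemon = True
        self._timer.start()

    def start(self):
        self._stopped = False
        self._tick()

    def stop(self):
        self._stopped = True
        if self._timer is not None:
            self._timer.cancel()
```

Keeping `run_once` separate from the scheduling makes a single pass easy to exercise without waiting for the timer.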
Preferably, in step S5 the data processing module uses a RESTful API function to convert the processed data and fault information into JSON, so that the data visualization module can call it.
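The conversion to JSON might look like the following sketch. The payload layout (`record_count`, `fault_count`, `records`, `faults`) is an assumption for illustration. The patent only states that the processed data and fault information are converted to JSON.

```python
import json


def to_json_payload(records, faults):
    """Serialize processed records and fault information into one JSON document."""
    payload = {
        "record_count": len(records),
        "fault_count": len(faults),
        "records": records,
        "faults": faults,
    }
    # sort_keys gives a stable key order, which makes responses easy to compare.
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)
```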
The present invention also provides a fault locating system for a storage monitoring system. The system is based on the above method and comprises an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module;
The underlying device data acquisition module is used to acquire the storage array log information and storage server log information of each storage array and storage server;
The data filtering module is used to filter the storage array log information and storage server log information acquired by the underlying device data acquisition module, rejecting storage array log information and storage server log information that contains noise;
The underlying device data collection module is used to store the storage array log information and storage server log information acquired by the underlying device data acquisition module;
The data processing module is used to analyze and process the collected storage array log information and storage server log information;
The data visualization module is used to visualize the storage array log information and storage server log information processed by the data processing module;
The data communication module is used to transmit the storage array log information and storage server log information processed by the data processing module to a mobile terminal.
When the system is working, the underlying device data acquisition module begins acquiring the storage array log information and storage server log information of each storage array and storage server, and sends the acquired information to the data filtering module. The data filtering module filters the storage array log information and storage server log information, rejects the information that contains noise, and sends the filtered storage array log information and storage server log information to the underlying device data collection module for storage. The data processing module extracts the storage array log information and storage server log information from the underlying device data collection module and analyzes and processes it, then sends the processed data to the data visualization module and to the data communication module. The data visualization module visualizes the data processed by the data processing module. The data communication module transmits the processed storage array log information and storage server log information to a mobile terminal, where staff can conveniently view it.
Preferably, a plurality of data processing modules are provided, one of which is the primary data processing module and the rest of which are standby data processing modules. When the primary data processing module fails, a standby data processing module is started and works until the primary data processing module returns to its normal state.
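The primary/standby arrangement can be sketched as a simple failover wrapper. Here a module counts as failed when its call raises an exception. That failure signal, the class name `FailoverProcessor` and the naming of the standbys are assumptions, since the patent does not say how a failure is detected.

```python
class FailoverProcessor:
    """Routes work to the primary module and falls back to standbys on failure."""

    def __init__(self, primary, standbys):
        self.primary = primary
        self.standbys = list(standbys)

    def process(self, records):
        """Try the primary first; use a standby only if the primary raises."""
        candidates = [("primary", self.primary)] + [
            (f"standby-{i}", worker) for i, worker in enumerate(self.standbys)
        ]
        for name, worker in candidates:
            try:
                return name, worker(records)
            except Exception:
                continue  # this module is treated as failed; try the next one
        raise RuntimeError("all data processing modules failed")
```

Because the primary is tried first on every call, it takes over again as soon as it recovers, which matches the behaviour described above.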
Preferably, the data processing module and the data visualization module use the B/S (browser/server) mode.
Compared with the prior art, the technical solution of the present invention has the following beneficial effects:
The present invention overcomes the deficiency of prior-art O&M monitoring systems, in which the monitored metrics are scattered and O&M personnel cannot locate a fault quickly and intuitively. The underlying device data acquisition module used by the invention is simple to deploy and offers high accuracy and real-time performance. The functional modules are mutually independent and loosely coupled, which makes the system easy to extend. The invention is easy to use, simple and effective: fault data are displayed centrally in the data visualization module, where users can readily observe them and quickly locate the fault point.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Fig. 2 is a block diagram of the system.
Detailed description of the embodiments
The drawings are for illustrative purposes only and shall not be construed as limiting this patent;
To better illustrate the embodiments, some components in the drawings may be omitted, enlarged or reduced, and do not represent the dimensions of the actual product;
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, a fault locating method for a storage monitoring system comprises the following steps:
Step S1: deploy an underlying device data acquisition module on each storage array and storage server in the underlying device module of the storage monitoring network;
Step S2: the underlying device data acquisition module acquires storage array log information and storage server log information from each storage array and storage server, and transmits the acquired storage array log information and storage server log information to the data filtering module;
Step S3: the data filtering module filters the acquired storage array log information and storage server log information through a Logstash filter. The Logstash filter processes all data acquired by the underlying device data acquisition module in a unified way: it filters the raw data and reorganizes it into a common format, rejects storage array log information and storage server log information that contains noise, and transmits the filtered storage array log information and storage server log information to the underlying device data collection module;
Step S4: the underlying device data collection module stores the filtered storage array log information and storage server log information in a database for persistence;
Step S5: the data processing module obtains the storage array log information and storage server log information stored by the underlying device data collection module and processes it. Based on the data format arranged by Logstash, it finds the data labels that correspond one-to-one with the underlying device modules, converts these data labels into attribute names, finds the fault point according to the attribute names, locates the underlying device module that has failed, and sends the processed data and fault information to the data visualization module and the data communication module;
Step S6: the data visualization module displays the processed data and fault information, and the data communication module sends the processed data and fault information to a mobile terminal.
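The label-to-attribute conversion and fault location in step S5 can be sketched as follows. The label names, the mapping table `LABEL_TO_ATTRIBUTE`, the `labels` field of a record and the set of values that indicate a fault are all hypothetical. The patent does not list the actual labels or fault criteria.

```python
# Hypothetical mapping from normalized data labels to device attribute names.
LABEL_TO_ATTRIBUTE = {
    "ctrl_stat": "controller_status",
    "disk_stat": "disk_status",
    "ost_stat": "ost_status",
}
FAULT_VALUES = {"failed", "offline", "degraded"}  # assumed fault indicators


def locate_faults(records):
    """Return (device, attribute, value) for every attribute that reports a fault."""
    faults = []
    for record in records:
        device = record["device"]
        for label, value in record.get("labels", {}).items():
            attribute = LABEL_TO_ATTRIBUTE.get(label)
            if attribute is None:
                continue  # label does not correspond to a known device attribute
            if str(value).lower() in FAULT_VALUES:
                faults.append((device, attribute, value))
    return faults
```

Each returned tuple names the device and the attribute that reported the fault, which is the information an operator needs to go straight to the failed component.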
As a preferred embodiment, in step S5 the data processing module processes the storage array log information and storage server log information using a timer function. The timer function periodically connects to the underlying device data collection module, processes the storage array log information and storage server log information held there, filters the fault information out of the collected storage array log information and storage server log information, and sends the processed data and fault information to the data communication module.
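What each periodic connection to the underlying device data collection module might do can be sketched with Python's built-in `sqlite3` module. The patent only says that the data are persisted in a database, so the choice of SQLite, the `logs` table and its columns are assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (assumed) SQLite store and create the logs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logs ("
        "id INTEGER PRIMARY KEY, source TEXT, device TEXT, level TEXT, message TEXT)"
    )
    return conn


def save_records(conn, records):
    """Persist filtered records; each record is a dict with the four named keys."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO logs (source, device, level, message) "
            "VALUES (:source, :device, :level, :message)",
            records,
        )


def fetch_since(conn, last_id):
    """Return rows newer than last_id, so each timer pass sees only new data."""
    cur = conn.execute(
        "SELECT id, source, device, level, message FROM logs WHERE id > ? ORDER BY id",
        (last_id,),
    )
    return cur.fetchall()
```

Remembering the highest `id` seen so far lets each pass process only the records that arrived since the previous one.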
As a preferred embodiment, in step S5 the data processing module uses a RESTful API function to convert the processed data and fault information into JSON, so that the data visualization module can call it.
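A minimal sketch of how such an interface could be exposed over HTTP is given below as a WSGI application using only the Python standard library. The `/faults` path, the response layout and the `get_faults` callable are hypothetical. The patent does not define the endpoints.

```python
import json


def make_app(get_faults):
    """Build a WSGI application exposing GET /faults as JSON."""

    def app(environ, start_response):
        is_get = environ.get("REQUEST_METHOD") == "GET"
        if is_get and environ.get("PATH_INFO") == "/faults":
            body = json.dumps({"faults": get_faults()}).encode("utf-8")
            status = "200 OK"
        else:
            body = json.dumps({"error": "not found"}).encode("utf-8")
            status = "404 Not Found"
        start_response(status, [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

For a local trial, the application can be served with `wsgiref.simple_server.make_server("", 8000, make_app(...)).serve_forever()`, and a browser-based visualization page would then request `/faults`.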
Embodiment 2
As shown in Fig. 2, this embodiment provides a unified monitoring system for the storage monitoring network of a high-performance cluster. The system is based on the above method and comprises an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module;
The underlying device data acquisition module is used to acquire the storage array log information and storage server log information of each storage array and storage server;
The data filtering module is used to filter the storage array log information and storage server log information acquired by the underlying device data acquisition module, rejecting storage array log information and storage server log information that contains noise;
The underlying device data collection module is used to store the data acquired by the underlying device data acquisition module;
The data processing module is used to analyze and process the collected data;
The data visualization module is used to visualize the data processed by the data processing module;
The data communication module is used to transmit the storage array log information and storage server log information processed by the data processing module to a mobile terminal.
When the system is working, the underlying device data acquisition module begins acquiring the storage array log information and storage server log information of each storage array and storage server, and sends the acquired information to the data filtering module. The data filtering module filters the storage array log information and storage server log information, rejects the information that contains noise, and sends the filtered storage array log information and storage server log information to the underlying device data collection module for storage. The data processing module extracts the storage array log information and storage server log information from the underlying device data collection module and analyzes and processes it, then sends the processed data to the data visualization module and to the data communication module. The data visualization module visualizes the data processed by the data processing module;
The data communication module transmits the storage array log information and storage server log information processed by the data processing module to a mobile terminal, where staff can conveniently view it.
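The data flow between the modules can be summarized in a small sketch in which each module is an independent callable. The class name `MonitoringPipeline` and its fields are hypothetical, and an in-memory list stands in for the collection module's database.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class MonitoringPipeline:
    acquire: Callable[[], Iterable[dict]]  # acquisition module
    keep: Callable[[dict], bool]           # filtering module (True = not noise)
    store: List[dict] = field(default_factory=list)  # stand-in for the database
    sinks: List[Callable[[List[dict]], None]] = field(default_factory=list)

    def run(self, process):
        # Acquisition -> filtering -> collection (storage).
        self.store.extend(r for r in self.acquire() if self.keep(r))
        # Processing reads from the store, not directly from acquisition.
        result = process(list(self.store))
        # The same processed result goes to every sink
        # (for example visualization and communication).
        for sink in self.sinks:
            sink(result)
        return result
```

Since the modules only meet through plain callables, any one of them can be replaced without touching the others, which is the loose coupling the description refers to.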
As a preferred embodiment, two data processing modules are provided: a primary data processing module and a standby data processing module. When the primary data processing module fails, the standby data processing module is started and works until the primary data processing module returns to its normal state.
As a preferred embodiment, the data processing module and the data visualization module use the B/S (browser/server) mode.
The same or similar reference numerals correspond to the same or similar components;
Terms describing positional relationships in the drawings are for illustration only and shall not be construed as limiting this patent;
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its implementations. Those of ordinary skill in the art may make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.
Claims (6)
1. A fault locating method for a storage monitoring system, characterized by comprising the following steps:
Step S1: deploying an underlying device data acquisition module on each storage array and storage server in the underlying device module of the storage monitoring network;
Step S2: the underlying device data acquisition module acquires storage array log information and storage server log information from each storage array and storage server, and transmits the acquired storage array log information and storage server log information to the data filtering module;
Step S3: the data filtering module filters the acquired storage array log information and storage server log information through a Logstash filter; the Logstash filter processes all data acquired by the underlying device data acquisition module in a unified way, filters the raw data and reorganizes it into a common format, rejects storage array log information and storage server log information that contains noise, and transmits the filtered storage array log information and storage server log information to the underlying device data collection module;
Step S4: the underlying device data collection module stores the filtered storage array log information and storage server log information in a database for persistence;
Step S5: the data processing module obtains the storage array log information and storage server log information stored by the underlying device data collection module and processes it; based on the data format arranged by Logstash, it finds the data labels that correspond one-to-one with the underlying device modules, converts these data labels into attribute names, finds the fault point according to the attribute names, locates the underlying device module that has failed, and sends the processed data and fault information to the data visualization module and the data communication module;
Step S6: the data visualization module displays the processed data and fault information, and the data communication module sends the processed data and fault information to a mobile terminal.
2. The fault locating method for a storage monitoring system according to claim 1, characterized in that, in step S5, the data processing module processes the storage array log information and storage server log information using a timer function; the timer function periodically connects to the underlying device data collection module, processes the storage array log information and storage server log information held in the underlying device data collection module, filters the fault information out of the collected storage array log information and storage server log information, and sends the processed data and fault information to the data communication module.
3. The fault locating method for a storage monitoring system according to claim 2, characterized in that, in step S5, the data processing module uses a RESTful API function to convert the processed data and fault information into JSON for the data visualization module to call.
4. A fault locating system for a storage monitoring system, the system being based on the method of any one of claims 1 to 3, characterized by comprising an underlying device data acquisition module, a data filtering module, an underlying device data collection module, a data processing module, a data visualization module and a data communication module;
The underlying device data acquisition module is used to acquire the storage array log information and storage server log information of each storage array and storage server;
The data filtering module is used to filter the storage array log information and storage server log information acquired by the underlying device data acquisition module, rejecting storage array log information and storage server log information that contains noise;
The underlying device data collection module is used to store the storage array log information and storage server log information acquired by the underlying device data acquisition module;
The data processing module is used to analyze and process the collected storage array log information and storage server log information;
The data visualization module is used to visualize the storage array log information and storage server log information processed by the data processing module;
The data communication module is used to transmit the storage array log information and storage server log information processed by the data processing module to a mobile terminal.
5. The fault locating system for a storage monitoring system according to claim 4, characterized in that a plurality of data processing modules are provided, one of which is the primary data processing module and the rest of which are standby data processing modules.
6. The fault locating system for a storage monitoring system according to claim 4, characterized in that the data processing module and the data visualization module use the B/S (browser/server) mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910600199.4A CN110297745A (en) | 2019-07-04 | 2019-07-04 | Fault locating method and system for a storage monitoring system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910600199.4A CN110297745A (en) | 2019-07-04 | 2019-07-04 | Fault locating method and system for a storage monitoring system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110297745A (en) | 2019-10-01 |
Family
ID=68030352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910600199.4A Pending CN110297745A (en) | 2019-07-04 | 2019-07-04 | Fault locating method and system for a storage monitoring system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110297745A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104714863A (en) * | 2015-02-06 | 2015-06-17 | 浪潮电子信息产业股份有限公司 | Method for completely storing Raid card logs on basis of Linux operation system after system crashes |
CN107463602A (en) * | 2017-06-15 | 2017-12-12 | 努比亚技术有限公司 | A kind of log processing method and server, client |
CN107729214A (en) * | 2017-10-13 | 2018-02-23 | 福建富士通信息软件有限公司 | A kind of visual distributed system monitors O&M method and device in real time |
CN107943668A (en) * | 2017-12-15 | 2018-04-20 | 江苏神威云数据科技有限公司 | Computer server cluster daily record monitoring method and monitor supervision platform |
CN109492073A (en) * | 2018-10-31 | 2019-03-19 | 北京达佳互联信息技术有限公司 | Blog search method, blog search device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191001 |