CN109522287A - Monitoring method, system, device, and medium for a distributed file storage cluster

Info

Publication number: CN109522287A
Application number: CN201811087179.3A
Authority: CN
Inventors: 王涛
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-09-18
Filing date: 2018-09-18
Publication date: 2019-03-26
Anticipated expiration: 2038-09-18
Also published as: CN109522287B

Abstract

The invention discloses a monitoring method, system, device, and medium for a distributed file storage cluster. In the method, a monitoring server receives monitoring configuration information for the distributed file storage cluster from a monitoring platform, and receives the internal state of the cluster that a monitoring client sends at regular intervals. The monitoring server statistically analyzes the internal state to obtain real-time monitoring data for each monitoring item. If the real-time monitoring data of a monitoring item satisfies the exception condition, the monitoring server generates an abnormal problem, generates an abnormal repair instruction from that problem, and sends the instruction to the central server of the distributed file storage cluster, so that the central server calls the corresponding abnormal repair scheme to repair the problem. By monitoring the distributed file storage cluster in real time, the invention detects abnormal problems promptly, repairs them, keeps the cluster healthy, and improves the operation and maintenance efficiency of the distributed file storage cluster.

Description

Monitoring method, system, device, and medium for a distributed file storage cluster

Technical field

The present invention relates to the field of computer technology, and in particular to a monitoring method, system, device, and medium for a distributed file storage cluster.

Background

CEPH is an open-source distributed file storage system that provides object, block, and file storage. It is widely used in the data management service systems of many companies, where it improves the fault tolerance and storage efficiency of data. It can manage and analyze massive amounts of data and serve large volumes of data to thousands of users, greatly reducing human resources and administrative overhead.

However, a CEPH distributed storage deployment usually comprises many node servers, which makes monitoring and operation and maintenance complex. If a latent fault arises in the server cluster, it is hard to locate its source promptly. At present, when a problem occurs in the server cluster, the cause of the fault has to be investigated manually, so locating the problem takes a long time and the operation and maintenance efficiency of the CEPH cluster is reduced.

Summary of the invention

Embodiments of the present invention provide a monitoring method, system, device, and medium for a distributed file storage cluster, to solve the problems that faults in a CEPH cluster are not located promptly and that operation and maintenance efficiency is low.

A monitoring method for a distributed file storage cluster, comprising:

a monitoring server receives monitoring configuration information for the distributed file storage cluster sent by a monitoring platform, wherein the monitoring configuration information includes a monitoring item and an exception condition;

the monitoring server receives the internal state of the distributed file storage cluster sent at regular intervals by a monitoring client, wherein the monitoring client is deployed in advance on the node server corresponding to a monitoring node of the distributed file storage cluster, and the internal state of the distributed file storage cluster is obtained periodically by the monitoring client from the node server corresponding to the monitoring node;

the monitoring server statistically analyzes the internal state of the distributed file storage cluster according to the monitoring configuration information, to obtain real-time monitoring data of the monitoring item;

if the real-time monitoring data of the monitoring item satisfies the exception condition, the monitoring server determines the monitoring item to be the abnormal object, takes the real-time monitoring data as the abnormal data, and generates an abnormal problem from the abnormal object and the abnormal data;

the monitoring server generates an abnormal repair instruction from the abnormal problem and sends the abnormal repair instruction to the central server of the distributed file storage cluster;

if the central server receives the abnormal repair instruction, it parses the abnormal repair instruction and, according to the parsing result, calls the corresponding abnormal repair scheme to repair the abnormal problem.

A monitoring system for a distributed file storage cluster, comprising a monitoring server and a central server, wherein the monitoring server and the central server are connected through a network;

The monitoring server includes:

a monitoring configuration module, for receiving the monitoring configuration information for the distributed file storage cluster sent by the monitoring platform, wherein the monitoring configuration information includes a monitoring item and an exception condition;

a data receiving module, for receiving the internal state of the distributed file storage cluster sent at regular intervals by the monitoring client, wherein the monitoring client is deployed in advance on the node server corresponding to a monitoring node of the distributed file storage cluster, and the internal state of the distributed file storage cluster is obtained periodically by the monitoring client from the node server corresponding to the monitoring node;

a data analysis module, for statistically analyzing the internal state of the distributed file storage cluster according to the monitoring configuration information, to obtain real-time monitoring data of the monitoring item;

an abnormality confirmation module, for determining the monitoring item to be the abnormal object if the real-time monitoring data of the monitoring item satisfies the exception condition, taking the real-time monitoring data as the abnormal data, and generating an abnormal problem from the abnormal object and the abnormal data;

an abnormality notification module, for generating an abnormal repair instruction from the abnormal problem and sending the abnormal repair instruction to the central server of the distributed file storage cluster;

The central server includes:

an abnormality repair module, for parsing the abnormal repair instruction if it is received and, according to the parsing result, calling the corresponding abnormal repair scheme to repair the abnormal problem.

A computer device, including a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the steps of the above monitoring method for a distributed file storage cluster when executing the computer program.

A computer-readable storage medium storing a computer program, wherein the computer program implements the steps of the above monitoring method for a distributed file storage cluster when executed by a processor.

In the above monitoring method, system, device, and medium for a distributed file storage cluster, the monitoring server receives the monitoring configuration information that the user configures on the monitoring platform for the distributed file storage cluster. The monitoring client deployed in advance periodically obtains the internal state of the distributed file storage cluster and uploads it to the monitoring server, which statistically analyzes the internal state to obtain real-time monitoring data for each monitoring item. The monitoring server can thus monitor the distributed file storage cluster in real time, and monitoring items can be customized. Meanwhile, if the real-time monitoring data of a monitoring item satisfies the exception condition, the monitoring server generates the corresponding abnormal problem, generates an abnormal repair instruction from it, and sends the instruction to the central server of the distributed file storage cluster. After receiving the abnormal repair instruction, the central server parses it and, according to the parsing result, calls the corresponding abnormal repair scheme to repair the abnormal problem. This keeps the distributed file storage cluster healthy so that it can operate normally, improves the operation and maintenance efficiency of distributed file storage, and thereby raises the level of intelligent management of the distributed file storage cluster.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an application environment of the monitoring method for a distributed file storage cluster in an embodiment of the invention;

Fig. 2 is a flow chart of the monitoring method for a distributed file storage cluster in an embodiment of the invention;

Fig. 3 is a detailed flow chart of the monitoring server outputting monitoring data in the monitoring method for a distributed file storage cluster in an embodiment of the invention;

Fig. 4 is a detailed flow chart of the monitoring server sending alarm information in the monitoring method for a distributed file storage cluster in an embodiment of the invention;

Fig. 5 is a detailed flow chart of step S60 in the monitoring method for a distributed file storage cluster in an embodiment of the invention;

Fig. 6 is a detailed flow chart of the central server sending the repair result in the monitoring method for a distributed file storage cluster in an embodiment of the invention;

Fig. 7 is a functional block diagram of the monitoring system for a distributed file storage cluster in an embodiment of the invention;

Fig. 8 is a schematic diagram of the computer device in an embodiment of the invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the protection scope of the present invention.

The monitoring method for a distributed file storage cluster provided by the present application can be applied in the application environment shown in Fig. 1, in which the distributed file storage cluster includes a central server and several node servers. Through the network, the monitoring server receives the internal state of the distributed file storage cluster that the monitoring client obtains from a node server in real time, which gives the monitoring data. The monitoring server analyzes the monitoring data and outputs the real-time monitoring data to the monitoring platform. When an abnormal problem occurs, it sends an abnormal repair instruction through the network to the central server of the distributed file storage cluster, and the central server manages and maintains the node servers. The monitoring client and the monitoring platform may be, but are not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices. The monitoring method for a distributed file storage cluster provided by the embodiments of the invention is carried out by the monitoring server and the central server working together.

In one embodiment, Fig. 2 shows a flow chart of the monitoring method for a distributed file storage cluster. As shown in Fig. 2, the method includes steps S10 to S60, detailed as follows:

S10: The monitoring server receives the monitoring configuration information for the distributed file storage cluster sent by the monitoring platform, wherein the monitoring configuration information includes a monitoring item and an exception condition.

In embodiments of the present invention, the distributed file storage cluster is a distributed file storage system that provides object, block, and file storage. The distributed file storage system is implemented by a server cluster made up of multiple servers. The cluster includes a central server and node servers, wherein the central server manages the node servers and the node servers store and manage files.

The monitoring server is the server that monitors the internal state of the distributed file storage cluster. It may be, but is not limited to, a NAGIOS (network monitoring) server, a ZABBIX (system monitoring) server, or a GANGLIA (cluster monitoring) server. The monitoring platform is the interactive tool that the monitoring server provides for monitoring management; for example, it may be a virtual terminal such as a browser, in which the user can configure and view monitoring information.

Specifically, the user configures the monitoring configuration information for the distributed file storage cluster on the monitoring platform in advance, and the monitoring server then receives this information from the monitoring platform through the network. The monitoring configuration information includes a monitoring item and an exception condition. The monitoring item includes the monitored object and the IP address of the monitored object. The exception condition is the criterion set for the monitored object in the monitoring configuration information, used to judge whether the monitored object is in a normal state. A monitoring item may be a conventional item that the monitoring server monitors by default for the internal state of distributed file storage, for example resource utilization, disk capacity, or network traffic. It may also be a user-defined item; the specific monitoring items can be customized as needed and are not limited here.

Preferably, the distributed file storage cluster is a CEPH cluster. A CEPH cluster is an open-source distributed file storage system that offers high file storage security and high file storage efficiency.

For example, when the distributed file storage cluster is a CEPH cluster, a monitoring item may be monitoring the active-state information of the CEPH cluster, monitoring the number of OSDs (Object Storage Devices) in the CEPH cluster, or monitoring the number of port 80 connections of a node server in the CEPH cluster. The main functions of an OSD are storing data, replicating data, balancing data, and recovering data, which provide the storage service of the CEPH cluster. For the item that monitors the number of port 80 connections of a node server, the exception condition can be set as: the number of port 80 connections of the node server is less than 5. If the number of port 80 connections of some node server is detected to be less than 5, port 80 of that node server is abnormal and the exception condition in the preset monitoring configuration information is satisfied.
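The configuration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `MonitorItem`, its field names, and the placeholder IP address are hypothetical and do not come from the patent; only the port 80 example and its threshold of 5 are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MonitorItem:
    """One monitoring item plus its exception condition (hypothetical layout)."""
    name: str                             # label for the monitoring item
    monitored_object: str                 # what is being monitored
    ip_address: str                       # IP address of the monitored object
    is_abnormal: Callable[[float], bool]  # exception condition


# Example from the text: fewer than 5 port 80 connections is abnormal.
port80_item = MonitorItem(
    name="port80_connections",
    monitored_object="port 80 of a node server",
    ip_address="192.0.2.21",              # placeholder address, not from the patent
    is_abnormal=lambda connections: connections < 5,
)

if __name__ == "__main__":
    for connections in (3, 12):
        print(connections, port80_item.is_abnormal(connections))
```

Expressing the exception condition as a predicate keeps user-defined monitoring items as easy to add as the default ones.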

S20: The monitoring server receives the internal state of the distributed file storage cluster sent at regular intervals by the monitoring client, wherein the monitoring client is deployed in advance on the node server corresponding to a monitoring node of the distributed file storage cluster, and the internal state of the distributed file storage cluster is obtained periodically by the monitoring client from the node server corresponding to the monitoring node.

In embodiments of the present invention, a monitoring node of the distributed file storage cluster is a node server in the cluster that collects the internal state of the distributed file storage cluster.

Preferably, when the distributed file storage cluster is a CEPH cluster, the monitoring node is a MON node of the CEPH cluster. The MON node stores the cluster view of the CEPH cluster state, which contains real-time information on the maps of all servers in the CEPH cluster. Before reading or writing data, the CEPH cluster must send a request to the MON node for the latest map and compute the storage location of the data from that map, in order to perform the corresponding read or write operation.

Specifically, the monitoring client is deployed in advance on the node server corresponding to the monitoring node of the distributed file storage cluster. The monitoring client actively obtains the internal state of the distributed file storage cluster using a preset communication script. The communication script is a shell script written in advance that obtains the internal state of the distributed file storage cluster and sends the obtained internal state to the monitoring server.

Preferably, when the distributed file storage cluster is a CEPH cluster, the monitoring client uses built-in CEPH commands in the preset communication script to obtain the internal state of the CEPH cluster from the node server corresponding to the monitoring node, for example "ceph -s", "ceph pg stat", or "ceph osd dump". Here, "ceph -s" shows the state of the cluster, "ceph pg stat" shows the state of the PGs, and "ceph osd dump" shows the state of the OSDs. A PG is a placement group in the CEPH cluster, used to group data logically.
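The patent's communication script is a shell script; the collection step it performs can be approximated in Python as below. This is a minimal sketch, not the patent's script: the function names and the dictionary keys are assumptions, and the three commands are the ones named in the text. The command runner is injectable so the logic can be exercised on a machine without a CEPH installation.

```python
import subprocess
from typing import Callable, Dict, List

# The three commands named in the text, as argument vectors.
STATE_COMMANDS: Dict[str, List[str]] = {
    "cluster": ["ceph", "-s"],
    "pg": ["ceph", "pg", "stat"],
    "osd": ["ceph", "osd", "dump"],
}


def run_command(argv: List[str]) -> str:
    """Run one command and return its standard output as text."""
    result = subprocess.run(
        argv, capture_output=True, text=True, check=True, timeout=30
    )
    return result.stdout


def collect_internal_state(
    runner: Callable[[List[str]], str] = run_command,
) -> Dict[str, str]:
    """Collect the raw output of every state command, keyed by a short label."""
    return {label: runner(argv) for label, argv in STATE_COMMANDS.items()}
```

On a node without the `ceph` CLI, passing a stub such as `collect_internal_state(lambda argv: " ".join(argv))` shows which commands would be issued.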

Specifically, the internal state of the distributed file storage cluster obtained by the monitoring client is the monitoring data. The monitoring client sends the monitoring data to the monitoring server over the network in the form of a message, which the monitoring client assembles from the monitoring data according to a preset format. A message is the unit of data exchanged and transmitted in a network. It contains the complete data to be sent and has no length limit, so all the data that needs to be sent can be transmitted in a single message. The preset format can be configured as needed and is not limited here.
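The patent leaves the preset message format open. As one possible illustration, the sketch below frames a JSON payload with a 4-byte length prefix; the framing, the field names `node_ip`, `timestamp`, and `state`, and both function names are assumptions made for this example only.

```python
import json
import struct
import time
from typing import Dict, Optional

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length


def build_message(
    node_ip: str, state: Dict[str, str], timestamp: Optional[float] = None
) -> bytes:
    """Pack the monitoring data into one length-prefixed JSON message."""
    body = {
        "node_ip": node_ip,
        "timestamp": time.time() if timestamp is None else timestamp,
        "state": state,
    }
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return HEADER.pack(len(payload)) + payload


def parse_message(message: bytes) -> dict:
    """Reverse of build_message; raises ValueError on a truncated message."""
    (length,) = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return json.loads(payload.decode("utf-8"))
```

A length prefix lets the receiver know when a complete message has arrived, which matters because the text places no limit on message length.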

Preferably, when the distributed file storage cluster is a CEPH cluster, the monitoring client sets up a timed task that periodically uses the preset communication script to obtain the internal state of the CEPH cluster and uploads it to the monitoring server. The timed task can be configured as the application requires. For example, the crontab command can be used to edit the corresponding configuration file, and the entry "*/3 * * * * /etc/zabbix/scripts/CEPH-status.sh 192.168.1.15 CEPH_MON >> /etc/zabbix/scripts/CEPH-status.log" is written into that file. When the monitoring client runs this configuration, it obtains the internal state data of the CEPH cluster on schedule. Here, crontab is the command for scheduling instructions that are executed periodically, and the entry above means that every 3 minutes the monitoring client sends the monitoring data collected by the preset communication script to the monitoring server whose IP address is 192.168.1.15.

S30: The monitoring server statistically analyzes the internal state of the distributed file storage cluster according to the monitoring configuration information, to obtain real-time monitoring data of the monitoring item.

Specifically, the monitoring server receives the message sent by the monitoring client, parses it, and reads the monitoring data in the message, which gives the monitoring data on the internal state of the distributed file storage cluster. According to the preset monitoring configuration information, it statistically analyzes the internal state of the distributed file storage cluster and extracts the monitoring data that corresponds to the monitored object of each monitoring item, which gives the real-time monitoring data of each monitoring item.
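The text does not specify which statistics are computed. The sketch below assumes that each parsed message has already been reduced to a mapping from monitored object to a numeric value, and that the latest, minimum, maximum, and mean values are of interest; the function name and these choices are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, Iterable, List


def summarize(
    monitored_objects: Iterable[str],
    messages: List[Dict[str, float]],
) -> Dict[str, Dict[str, float]]:
    """Select the values of each configured monitored object and summarize them.

    `messages` is assumed to be ordered from oldest to newest.
    """
    summary: Dict[str, Dict[str, float]] = {}
    for obj in monitored_objects:
        values = [msg[obj] for msg in messages if obj in msg]
        if not values:
            continue  # no data reported for this monitored object yet
        summary[obj] = {
            "latest": values[-1],
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
    return summary
```

Only objects named in the monitoring configuration are summarized, which mirrors the step of matching monitoring data to the monitored object of each monitoring item.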

S40: If the real-time monitoring data of the monitoring item satisfies the exception condition, the monitoring server determines the monitoring item to be the abnormal object, takes the real-time monitoring data as the abnormal data, and generates an abnormal problem from the abnormal object and the abnormal data.

In the embodiment of the present invention, after obtaining the real-time monitoring data of each monitoring item, the monitoring server compares the real-time monitoring data of the monitoring item with the exception condition that corresponds to that item, to check whether the data satisfies the exception condition preset for the item.

Specifically, if the real-time monitoring data of the monitoring item satisfies the exception condition, the monitoring server determines the monitored object of the monitoring item to be the abnormal object, takes the IP address of the monitored object as the abnormal address, and takes the real-time monitoring data as the abnormal data. This indicates that the real-time monitoring data of the monitoring item is in an abnormal state and that the abnormal object needs maintenance. The monitoring server also generates an abnormal problem from the abnormal object and the abnormal data. The abnormal problem describes the specific object in the distributed file storage cluster on which the problem occurred and the specific abnormal data, so that operation and maintenance personnel can quickly locate the problem in the distributed file storage cluster.

It can be understood that if the real-time monitoring data of the monitoring item does not satisfy the exception condition, the real-time monitoring data of the monitoring item is in a normal state, and the monitoring item can run normally without maintenance.
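Step S40 amounts to evaluating a predicate and, when it holds, packaging the three pieces of information named above. A minimal sketch follows; the class `AbnormalProblem`, the function `check_item`, and the sample address are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AbnormalProblem:
    """Describes where the problem occurred and the data that revealed it."""
    abnormal_object: str
    abnormal_address: str
    abnormal_data: float


def check_item(
    monitored_object: str,
    ip_address: str,
    value: float,
    is_abnormal: Callable[[float], bool],
) -> Optional[AbnormalProblem]:
    """Return an AbnormalProblem if the exception condition holds, else None."""
    if not is_abnormal(value):
        return None  # normal state, no maintenance needed
    return AbnormalProblem(monitored_object, ip_address, value)
```

For example, `check_item("disk of server A1", "192.0.2.31", 97.0, lambda v: v > 95)` yields a problem record, while a value of 80.0 yields `None`.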

S50: The monitoring server generates an abnormal repair instruction from the abnormal problem and sends the abnormal repair instruction to the central server of the distributed file storage cluster.

In embodiments of the present invention, the distributed file storage cluster includes a central server and node servers. The central server is the central management server that performs management operations on the node servers, such as resource management, performance maintenance, and monitoring configuration. The node servers are the servers that perform operations such as data processing and data storage on objects, blocks, or files.

Specifically, for a monitoring item that satisfies the exception condition, the monitoring server generates the corresponding abnormal repair instruction from the abnormal problem. The abnormal repair instruction contains the command requesting maintenance, the abnormal address, the abnormal object, and the abnormal data. The monitoring server sends the abnormal repair instruction to the central server of the distributed file storage cluster to request that the cluster carry out the abnormality maintenance.
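The four parts of the abnormal repair instruction can be serialized in many ways; the patent does not fix an encoding. The sketch below assumes JSON, and the key names and the command string `REQUEST_REPAIR` are invented for this example.

```python
import json


def build_repair_instruction(
    abnormal_address: str, abnormal_object: str, abnormal_data: float
) -> str:
    """Serialize the four parts of the repair instruction (assumed JSON encoding)."""
    instruction = {
        "command": "REQUEST_REPAIR",         # hypothetical maintenance command
        "abnormal_address": abnormal_address,
        "abnormal_object": abnormal_object,
        "abnormal_data": abnormal_data,
    }
    return json.dumps(instruction, sort_keys=True)


if __name__ == "__main__":
    print(build_repair_instruction("192.0.2.31", "disk", 97.0))
```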

S60: If the central server receives the abnormal repair instruction, it parses the abnormal repair instruction and, according to the parsing result, calls the corresponding abnormal repair scheme to repair the abnormal problem.

Specifically, if the central server receives the abnormal repair instruction, it parses the instruction to obtain the abnormal address, abnormal object, and abnormal data carried in it, and determines from the abnormal object and abnormal data which abnormal problem has occurred.

According to the abnormal problem, the central server calls the corresponding abnormal repair scheme to maintain the server at the abnormal address in the distributed file storage cluster. The abnormal repair schemes are configured in advance for common abnormal conditions in the distributed file storage cluster, so that the central server can use a preset scheme to repair, promptly and intelligently, an abnormal problem that occurs in the distributed file storage cluster.

For example, suppose a monitoring item monitors the disk usage of server A1 in the distributed file storage cluster, and the corresponding exception condition is that the disk usage of server A1 exceeds 95%. If, in the monitoring data obtained by the monitoring server, the disk usage of server A1 exceeds 95%, the monitoring server determines the address of server A1 to be the abnormal address, the disk of server A1 to be the abnormal object, and the disk usage of server A1 to be the abnormal data. The monitoring server generates an abnormal problem from the abnormal object and the abnormal data, generates an abnormal repair instruction from the abnormal problem, and sends the instruction to the central server of the distributed file storage cluster, requesting that the central server maintain the disk of server A1. The central server then obtains the corresponding abnormal repair scheme according to the instruction, for example a scheme that cleans up cached log files or compresses historical files, and uses it to maintain the disk of server A1, which repairs the abnormal problem in the distributed file storage cluster.
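One straightforward way to map a parsed instruction to a preset repair scheme is a lookup table keyed by abnormal object. The sketch below follows the disk example; the JSON field names, the table `REPAIR_SCHEMES`, and the scheme function are hypothetical, and the scheme only returns a description instead of touching any files.

```python
import json
from typing import Callable, Dict

RepairScheme = Callable[[str, float], str]


def clean_cached_logs(address: str, abnormal_data: float) -> str:
    """Stand-in for a preset scheme; a real one would act on the server."""
    return f"cleaned cached log files on {address} (disk usage {abnormal_data}%)"


# Preset repair schemes keyed by abnormal object (hypothetical registry).
REPAIR_SCHEMES: Dict[str, RepairScheme] = {"disk": clean_cached_logs}


def handle_repair_instruction(
    raw: str, schemes: Dict[str, RepairScheme] = REPAIR_SCHEMES
) -> str:
    """Parse the instruction and call the scheme registered for its object."""
    instruction = json.loads(raw)
    scheme = schemes.get(instruction["abnormal_object"])
    if scheme is None:
        return "no preset repair scheme for this abnormal object"
    return scheme(instruction["abnormal_address"], instruction["abnormal_data"])
```

Returning a message for unknown objects, instead of raising, leaves room for the central server to report that manual intervention is needed.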

In this embodiment, the monitoring server receives the monitoring configuration information that the user configures on the monitoring platform for the distributed file storage cluster. The monitoring client deployed in advance periodically obtains the internal state of the distributed file storage cluster and uploads it to the monitoring server, which statistically analyzes the internal state to obtain real-time monitoring data for each monitoring item. The monitoring server can thus monitor the distributed file storage cluster in real time, and monitoring items can be customized. Meanwhile, if the real-time monitoring data of a monitoring item satisfies the exception condition, the monitoring server generates the corresponding abnormal problem, generates an abnormal repair instruction from it, and sends the instruction to the central server of the distributed file storage cluster. After receiving the abnormal repair instruction, the central server parses it and, according to the parsing result, calls the corresponding abnormal repair scheme to repair the abnormal problem. This keeps the distributed file storage cluster healthy so that it can operate normally, improves the operation and maintenance efficiency of distributed file storage, and thereby raises the level of intelligent management of the distributed file storage cluster.

In one embodiment, after step S30, that is, after the monitoring server statistically analyzes the internal state of the distributed file storage cluster according to the monitoring configuration information and obtains the real-time monitoring data of the monitoring item, the monitoring server can also output the real-time monitoring data according to a preset output template, detailed as follows:

As shown in Fig. 3, after step S30 the monitoring method for a distributed file storage cluster further includes the following steps:

S31: The monitoring server fills the real-time monitoring data into a preset output template to obtain target data.

Specifically, after configuring the monitoring configuration information on the monitoring platform, the user assigns a corresponding output template to each monitoring item. The output template is a template configured in advance for outputting the monitoring data obtained by monitoring. After the monitoring server statistically analyzes the internal state of the distributed file storage cluster and obtains the real-time monitoring data of the monitoring item, it fills the real-time monitoring data into the preset output template to obtain the target data that can be displayed in the template. The preset output template may be a sample template provided by the monitoring platform or a custom template added by the user, for example a template that presents data as a chart, text, or report. The specific presentation can be configured as needed and is not limited here.
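For a text-style output template, the filling step in S31 can be illustrated with the standard-library `string.Template`. The template wording and the placeholder names are assumptions made for this sketch; chart and report templates would need a different mechanism.

```python
from string import Template

# A hypothetical text output template; the placeholders are assumptions.
TEXT_TEMPLATE = Template("[$status] $item on $address: $value")


def fill_template(item: str, address: str, value: str, abnormal: bool) -> str:
    """Fill the real-time monitoring data into the text template."""
    return TEXT_TEMPLATE.substitute(
        status="ABNORMAL" if abnormal else "OK",
        item=item,
        address=address,
        value=value,
    )


if __name__ == "__main__":
    print(fill_template("disk usage", "192.0.2.31", "97%", abnormal=True))
```

Carrying the status in the target data gives the monitoring platform what it needs to highlight abnormal entries when it renders them.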

S32: The monitoring server outputs the target data to the monitoring platform, so that the user can view the real-time status of the distributed file storage cluster through the monitoring platform.

Specifically, the monitoring server outputs the target data to the monitoring platform, which shows the user the real-time status of the distributed file storage cluster. Target data that meets an abnormal condition is displayed on the monitoring platform in red or in enlarged form so that it stands out from monitoring items in a normal state, allowing the user to quickly identify abnormal monitoring items in the output target data.

S33: The monitoring server stores the target data in a preset historical database.

Specifically, the preset historical database is a database on the monitoring server for storing target data. The monitoring server stores the target data in the preset historical database so that the user can view the historical status data of the distributed file storage cluster. The preset historical database may be an Oracle database, a MongoDB database, or the like. The specific database type can be selected according to actual needs and is not limited here.

S34: The monitoring server analyzes the operating state of the distributed file storage cluster according to the target data in the historical database and obtains an analysis result, so that the user can maintain the distributed file storage cluster based on the analysis result.

Specifically, the analysis of the operating state of the distributed file storage cluster by the monitoring server includes analyzing the target data from within one day, within one week, and within one month. The analysis result includes the monitoring items in which anomalies occurred, the time periods in which they occurred, and the total time for which they were abnormal. The user can optimize and maintain the distributed file storage cluster according to the analysis result obtained by the monitoring server. For example, if the disk capacity of node server A2 had 6 abnormal problems within one week, the user can, based on the analysis result, expand the disk capacity of node server A2 to increase its storage capacity and improve the performance of the distributed file storage cluster.
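As a minimal sketch of the analysis described above, the number of abnormal records per monitoring item within a time window could be counted as follows. The record layout (`node`, `item`, `abnormal`, `time`) and the function name `count_anomalies` are assumptions for illustration; the embodiment does not prescribe how the historical database is structured.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_anomalies(records, now, window):
    """Count abnormal records per (node, item) within [now - window, now]."""
    start = now - window
    counts = Counter()
    for rec in records:
        if rec["abnormal"] and start <= rec["time"] <= now:
            counts[(rec["node"], rec["item"])] += 1
    return counts


# Hypothetical history: six anomalies on A2 within the last week, one older.
now = datetime(2018, 9, 18, 12, 0)
history = [
    {"node": "A2", "item": "disk_capacity", "abnormal": True,
     "time": now - timedelta(days=d)}
    for d in range(6)
]
history.append({"node": "A2", "item": "disk_capacity", "abnormal": True,
                "time": now - timedelta(days=20)})

weekly = count_anomalies(history, now, timedelta(weeks=1))
monthly = count_anomalies(history, now, timedelta(days=30))
print(weekly[("A2", "disk_capacity")], monthly[("A2", "disk_capacity")])
```

The same function serves the one-day, one-week, and one-month analyses by changing only the `window` argument.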

In this embodiment, the monitoring server fills the real-time monitoring data into the preset output template to obtain the target data and outputs it to the monitoring platform, so that the user can view the real-time status of the distributed file storage cluster through the monitoring platform, quickly identify abnormal monitoring items in the target data, and discover abnormal problems of the distributed file storage cluster in time. Meanwhile, the monitoring server stores the target data in the preset historical database and analyzes the operating state of the distributed file storage cluster according to the target data in the historical database to obtain an analysis result, so that the user can optimize and maintain the distributed file storage cluster according to the analysis result and thereby improve its performance.

In one embodiment, after step S40, that is, after the monitoring server has found that the real-time monitoring data of a monitoring item meets the abnormal condition, determined the monitoring item as the abnormal object, taken the real-time monitoring data as the abnormal data, and generated an abnormal problem from the abnormal object and the abnormal data, the monitoring server may also generate alarm information and send it to a preset alarm address, as detailed below:

As shown in Fig. 4, after step S40, the monitoring method of the distributed file storage cluster further includes the following steps:

S41: The monitoring server determines the severity of the abnormal problem according to a preset service attribute.

In this embodiment, the severity of an abnormal problem has four levels: "warning", "generally serious", "serious", and "disaster". The preset service attribute is content configured in advance according to the business function, within the distributed file storage cluster, of the object monitored by the monitoring item. After detecting an abnormal problem, the monitoring server determines its severity according to the preset service attribute.

For example, if the monitoring item monitors the number of OSD service states in the distributed file storage cluster and the real-time monitoring data of the monitoring item meets the abnormal condition, the severity of the abnormal problem occurring in the monitoring item is the "disaster" level, which indicates that the abnormal problem must be solved immediately, otherwise the distributed file storage cluster will crash.

If, on the other hand, the monitoring item monitors the number of connections on port 80 of server A3, then when the real-time monitoring data of the monitoring item meets the abnormal condition, the monitoring server can determine according to the preset service attribute that the severity of the abnormal problem occurring in the monitoring item is the "warning" level. The specific severity of an abnormal problem can be determined according to the business function of the monitored object in the distributed file storage cluster.
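The determination of severity from a preset service attribute can be sketched, under stated assumptions, as a simple lookup. The four level names come from this embodiment; their numeric ordering, the item keys, and the names `Severity`, `SERVICE_ATTRIBUTES`, and `determine_severity` are hypothetical and chosen only for the example.

```python
from enum import IntEnum


class Severity(IntEnum):
    # The four levels named in this embodiment; the ascending numeric
    # order is an assumption made for illustration.
    WARNING = 1
    GENERALLY_SERIOUS = 2
    SERIOUS = 3
    DISASTER = 4


# Hypothetical preset service attributes: monitoring item -> severity.
SERVICE_ATTRIBUTES = {
    "osd_service_state_count": Severity.DISASTER,
    "port_80_connections": Severity.WARNING,
}


def determine_severity(item_name: str) -> Severity:
    """Look up the severity configured for a monitoring item."""
    # A KeyError here means no service attribute was configured for the item.
    return SERVICE_ATTRIBUTES[item_name]


print(determine_severity("osd_service_state_count").name)
print(determine_severity("port_80_connections").name)
```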

S42: The monitoring server generates alarm information from the abnormal problem in a preset format, and selects the alarm sending mode corresponding to the severity of the abnormal problem.

Specifically, the preset format may be a format configured in advance, such as a monitoring report or an alarm email, but is not limited to these and can be set according to the needs of the actual application. The monitoring server fills the content of the abnormal problem into the preset format to generate the alarm information, and selects the alarm sending mode corresponding to the severity of the abnormal problem. The alarm sending mode is a mode of sending alarm information that is preset according to the severity of the abnormal problem. The specific alarm sending mode can be set according to the needs of the actual application.

For example, for alarm information whose abnormal problem has the "disaster" level of severity, the corresponding alarm sending mode is as follows: the monitoring server keeps sending the alarm information at the preset monitoring frequency of the monitoring item until the monitoring data of the monitoring item returns to a normal state, urging the relevant personnel to maintain the distributed file storage cluster.

For alarm information whose abnormal problem has the "warning" level of severity, the corresponding alarm sending mode is as follows: for identical alarm information, the monitoring server sends it only once, and sends it again only after the monitoring data of the monitoring item has returned to a normal state and the same alarm information then occurs again.
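The two alarm sending modes described above might be sketched as follows. The class name `AlertDispatcher` and its method are hypothetical. The embodiment specifies sending behavior only for the "disaster" and "warning" levels, so this sketch rejects other levels rather than guessing at their behavior.

```python
class AlertDispatcher:
    """Decides, once per monitoring cycle, whether alarm information is sent."""

    def __init__(self):
        # Items whose current abnormal episode has already been alerted once.
        self._alerted = set()

    def should_send(self, item: str, severity: str, abnormal: bool) -> bool:
        if not abnormal:
            # Back to normal: a later recurrence triggers a new alarm.
            self._alerted.discard(item)
            return False
        if severity == "disaster":
            # Repeat at every monitoring cycle until the data is normal again.
            return True
        if severity == "warning":
            if item in self._alerted:
                return False  # identical alarm already sent for this episode
            self._alerted.add(item)
            return True
        raise ValueError(f"no sending mode defined in this sketch for {severity!r}")


dispatcher = AlertDispatcher()
disaster_cycles = [dispatcher.should_send("osd", "disaster", True) for _ in range(3)]
warning_cycles = [
    dispatcher.should_send("port80", "warning", abnormal)
    for abnormal in (True, True, False, True)
]
print(disaster_cycles, warning_cycles)
```

Calling `should_send` once per monitoring cycle ties the repetition rate of "disaster" alarms to the preset monitoring frequency of the item, as the embodiment describes.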

S43: The monitoring server sends the alarm information to a preset alarm address according to the alarm sending mode.

Specifically, the monitoring server obtains the preset alarm address and sends the alarm information to it according to the corresponding alarm sending mode. The preset alarm address is the receiving address for alarm information and includes, but is not limited to, an email address, a Jabber address, and a short message address, where Jabber is an instant messaging server for Linux systems.

In this embodiment, the monitoring server determines the severity of the abnormal problem according to the preset service attribute, generates alarm information from the abnormal problem in the preset format, selects the alarm sending mode corresponding to the severity of the abnormal problem, and sends the alarm information to the preset alarm address according to that mode. Different sending modes are thus used to notify operation and maintenance personnel of different abnormal problems, enabling them to take the appropriate maintenance action according to the alarm information and improving the maintenance efficiency of the distributed file storage cluster.

In one embodiment, a specific implementation of step S60 is described in detail, in which the central server, upon receiving the abnormal repair instruction, parses it and calls the corresponding abnormal repair scheme according to the parsing result to repair the abnormal problem.

Referring to Fig. 5, which shows a specific flowchart of step S60, the details are as follows:

S601: The central server receives the abnormal repair instruction and, according to it, determines the abnormal problem and the node server on which the abnormal problem occurred.

Specifically, the central server receives the abnormal repair instruction sent by the monitoring server and parses it to obtain the abnormal address, the abnormal object, and the abnormal data, thereby determining the abnormal problem and the node server on which the abnormal problem occurred.
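As a sketch only, parsing an abnormal repair instruction into the three fields named above could look like the following. The embodiment does not specify a wire format; the JSON encoding, the key names, the sample address, and the names `AbnormalProblem` and `parse_repair_instruction` are all assumptions for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class AbnormalProblem:
    abnormal_address: str  # identifies the node server with the problem
    abnormal_object: str   # the monitoring item determined to be abnormal
    abnormal_data: float   # the real-time monitoring data that met the condition


def parse_repair_instruction(raw: str) -> AbnormalProblem:
    """Parse a (hypothetically JSON-encoded) abnormal repair instruction."""
    payload = json.loads(raw)
    # Missing keys raise KeyError, so malformed instructions are rejected.
    return AbnormalProblem(
        abnormal_address=payload["abnormal_address"],
        abnormal_object=payload["abnormal_object"],
        abnormal_data=payload["abnormal_data"],
    )


# Hypothetical instruction as it might arrive from the monitoring server.
raw_instruction = (
    '{"abnormal_address": "10.0.0.4", '
    '"abnormal_object": "disk_usage", "abnormal_data": 96.5}'
)
problem = parse_repair_instruction(raw_instruction)
print(problem)
```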

S602: According to the abnormal problem, the central server searches a preset abnormal repair scheme library for the abnormal repair schemes corresponding to the abnormal problem and the priority level of each abnormal repair scheme.

Specifically, based on the maintenance request command carried in the abnormal repair instruction, the central server triggers its management service operation on the node server and, according to the abnormal problem, searches the preset abnormal repair scheme library for the abnormal repair schemes corresponding to the abnormal problem. The abnormal repair schemes are repair schemes configured in advance for common abnormal situations in the distributed file storage cluster. Each abnormal repair scheme is assigned a priority level according to its repair effect and is stored in the abnormal repair scheme library, which is a database for storing abnormal repair schemes.

S603: In descending order of the priority levels of the abnormal repair schemes, the central server obtains each abnormal repair scheme in turn and uses it to repair the abnormal problem of the node server, until the real-time monitoring data of the monitoring item no longer meets the abnormal condition or every abnormal repair scheme has been called.

Specifically, the central server proceeds in descending order of priority level, for example the first repair scheme, the second repair scheme, the third repair scheme, and so on, obtaining each abnormal repair scheme in turn to repair the abnormal problem of the node server, until the real-time monitoring data of the monitoring item no longer meets the abnormal condition or every abnormal repair scheme has been called.

For example, suppose a monitoring item monitors the disk usage of node server A4, and the abnormal condition corresponding to the monitoring item is that the disk usage of server A4 exceeds 95%. If, in the monitoring data obtained by the monitoring server, the disk usage of server A4 exceeds 95%, the monitoring server sends an abnormal repair instruction to the central server so that the central server of the distributed file storage cluster maintains the disk of server A4. According to the abnormal repair instruction, the central server obtains all corresponding abnormal repair schemes from the abnormal repair scheme library and repairs server A4 in order of their priority levels. It first calls the first repair scheme to clean up the log files cached on server A4. If the disk usage of server A4 still exceeds 95% after the first repair scheme has been applied, it goes on to call the second repair scheme to compress the historical files on server A4, and so on, until the disk usage of server A4 returns to a normal state or every abnormal repair scheme has been called once.
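The priority-ordered repair loop of step S603 can be sketched as follows, using a simulated disk in place of a real node server. The scheme names, the convention that a larger number means a higher priority, the simulated usage figures, and the function name `run_repair_schemes` are all assumptions for this example.

```python
def run_repair_schemes(schemes, still_abnormal):
    """Apply repair schemes from highest to lowest priority.

    Stops as soon as the abnormal condition no longer holds, or when every
    scheme has been called once. Returns (repaired, names_of_called_schemes).
    """
    called = []
    for scheme in sorted(schemes, key=lambda s: s["priority"], reverse=True):
        scheme["action"]()
        called.append(scheme["name"])
        if not still_abnormal():
            return True, called
    return False, called


# Simulated disk of server A4 (hypothetical numbers).
disk = {"usage_percent": 97.0}


def clean_cached_logs():
    disk["usage_percent"] -= 1.0  # frees a little space: 97 -> 96


def compress_history_files():
    disk["usage_percent"] -= 4.0  # frees more space: 96 -> 92


schemes = [
    {"name": "compress_history_files", "priority": 1, "action": compress_history_files},
    {"name": "clean_cached_logs", "priority": 2, "action": clean_cached_logs},
]

repaired, called = run_repair_schemes(schemes, lambda: disk["usage_percent"] > 95)
print(repaired, called, disk["usage_percent"])
```

In this run the first scheme alone is insufficient, so the second is called, mirroring the A4 example above.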

In this embodiment, the central server receives the abnormal repair instruction, searches the preset abnormal repair scheme library for the abnormal repair schemes corresponding to the abnormal problem and the priority level of each, and, in descending order of priority level, obtains each abnormal repair scheme in turn to repair the abnormal problem of the node server. Because multiple abnormal repair schemes are available for the same abnormal problem, the repair rate of abnormal problems is improved, and because the priority levels are confirmed first, the abnormal repair scheme with the better effect is used first, which improves the maintenance efficiency of the distributed file storage cluster.

In one embodiment, after step S60, that is, after the central server has received the abnormal repair instruction, parsed it, and called the corresponding abnormal repair scheme according to the parsing result to repair the abnormal problem, the central server may also send the repair result to a preset instant messaging address, as detailed below:

As shown in Fig. 6, after step S60, the monitoring method of the distributed file storage cluster further includes the following steps:

S61: The central server checks the monitoring item of the node server after the repair and obtains a repair result.

Specifically, after repairing the abnormal problem with the abnormal repair schemes, the central server checks the monitoring item of the node server on which the anomaly occurred. If the real-time monitoring data of the monitoring item after the repair does not meet the abnormal condition corresponding to the monitoring item, the repair result is success and the monitoring item has returned to a normal state. Otherwise, if the real-time monitoring data of the monitoring item after the repair still meets the abnormal condition corresponding to the monitoring item, the repair result is failure, indicating that the monitoring item is still in an abnormal state.

Further, if the repair result is success, the central server sends the abnormal problem and the repair result to a preset work communication address, so that operation and maintenance personnel can review the repair records of the distributed file storage cluster and further optimize and maintain it. The preset work communication address is an information receiving address used by operation and maintenance personnel for handling work matters, and includes, but is not limited to, a public mailbox address, a personal mailbox address, and a short message address.

S62: If the repair result is failure, the central server sends the abnormal problem and the repair result to a preset instant messaging address, so that operation and maintenance personnel can manually maintain the distributed file storage cluster in time according to the abnormal problem.

Specifically, if the repair result is failure, this means that the abnormal repair schemes in the abnormal repair scheme library cannot solve the corresponding abnormal problem, or that the central server has no abnormal repair scheme corresponding to the abnormal repair instruction. The central server then sends the abnormal problem and the repair result to the preset instant messaging address. The preset instant messaging address is an information receiving address used by operation and maintenance personnel for handling urgent matters, and includes, but is not limited to, an IM (Instant Messenger) information receiving address, a WeChat information receiving address, and an ICQ information receiving address. Operation and maintenance personnel can thus learn the internal state of the distributed file storage cluster in time and manually maintain the abnormal problem that has occurred, preventing problems in the distributed file storage cluster from causing data loss.
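Steps S61 and S62 together amount to a check followed by a routing decision, which might be sketched as follows. The function name `check_and_route`, the address strings, and the threshold are hypothetical. Actual delivery of the message is outside the scope of this sketch.

```python
def check_and_route(value_after_repair, is_abnormal, work_addresses, im_addresses):
    """Derive the repair result and pick the addresses to notify.

    Success goes to the work communication addresses; failure goes to the
    instant messaging addresses so that manual maintenance can start promptly.
    """
    if is_abnormal(value_after_repair):
        return "failure", list(im_addresses)
    return "success", list(work_addresses)


# Hypothetical preset addresses and abnormal condition (disk usage above 95%).
WORK_ADDRESSES = ["ops-team@example.com"]
IM_ADDRESSES = ["im:oncall-engineer"]


def exceeds_95(value):
    return value > 95


print(check_and_route(92.0, exceeds_95, WORK_ADDRESSES, IM_ADDRESSES))
print(check_and_route(96.0, exceeds_95, WORK_ADDRESSES, IM_ADDRESSES))
```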

In this embodiment, the central server checks the monitoring item of the node server after the repair and obtains a repair result. If the repair result is failure, the central server sends the abnormal problem and the repair result to the preset instant messaging address, informing operation and maintenance personnel of the abnormal problem in time so that they can manually maintain the distributed file storage cluster according to the abnormal problem, which maintains the health of the distributed file storage cluster and avoids data loss.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

In one embodiment, a monitoring system for a distributed file storage cluster is provided, which corresponds to the monitoring method of the distributed file storage cluster in the above embodiments. As shown in Fig. 7, the monitoring system of the distributed file storage cluster includes a monitoring server and a central server. The monitoring server includes a monitoring configuration module 71, a data receiving module 72, a data analysis module 73, an abnormality confirmation module 74, and an abnormality notification module 75, and the central server includes an abnormality repair module 76. The functional modules are described in detail as follows:

The monitoring server includes:

a monitoring configuration module 71, configured to receive the monitoring configuration information of the distributed file storage cluster sent by the monitoring platform, where the monitoring configuration information includes a monitoring item and an abnormal condition;

a data receiving module 72, configured to receive the internal state of the distributed file storage cluster sent periodically by a monitoring client, where the monitoring client is deployed in advance on the node server corresponding to a monitoring node of the distributed file storage cluster, and the internal state of the distributed file storage cluster is obtained periodically by the monitoring client from the node server corresponding to the monitoring node;

a data analysis module 73, configured to perform statistical analysis on the internal state of the distributed file storage cluster according to the monitoring configuration information to obtain the real-time monitoring data of the monitoring item;

an abnormality confirmation module 74, configured to, if the real-time monitoring data of the monitoring item meets the abnormal condition, determine the monitoring item as the abnormal object, take the real-time monitoring data as the abnormal data, and generate an abnormal problem according to the abnormal object and the abnormal data;

an abnormality notification module 75, configured to generate an abnormal repair instruction according to the abnormal problem and send the abnormal repair instruction to the central server of the distributed file storage cluster.

The central server includes:

an abnormality repair module 76, configured to, if the abnormal repair instruction is received, parse the abnormal repair instruction and call the corresponding abnormal repair scheme according to the parsing result to repair the abnormal problem.

Further, the monitoring server also includes:

a data filling module, configured to fill the real-time monitoring data into a preset output template to obtain target data;

a data output module, configured to output the target data to the monitoring platform, so that the user can view the real-time status of the distributed file storage cluster through the monitoring platform;

a data storage module, configured to store the target data in a preset historical database;

a data statistics module, configured to analyze the operating state of the distributed file storage cluster according to the target data in the historical database and obtain an analysis result, so that the user can maintain the distributed file storage cluster based on the analysis result.

Further, the monitoring server also includes:

an abnormality level confirmation module, configured to determine the severity of the abnormal problem according to a preset service attribute;

an alarm information generation module, configured to generate alarm information from the abnormal problem in a preset format and select the alarm sending mode corresponding to the severity of the abnormal problem;

an alarm information sending module, configured to send the alarm information to a preset alarm address according to the alarm sending mode.

Further, the abnormality repair module 76 of the central server includes:

an abnormality analysis submodule, configured to receive the abnormal repair instruction and, according to it, determine the abnormal problem and the node server on which the abnormal problem occurred;

a scheme acquisition submodule, configured to search, according to the abnormal problem, a preset abnormal repair scheme library for the abnormal repair schemes corresponding to the abnormal problem and the priority level of each abnormal repair scheme;

an abnormality repair submodule, configured to obtain, in descending order of the priority levels of the abnormal repair schemes, each abnormal repair scheme in turn to repair the abnormal problem of the node server, until the real-time monitoring data of the monitoring item no longer meets the abnormal condition or every abnormal repair scheme has been called.

Further, the central server also includes:

an item detection module, configured to check the monitoring item of the node server after the repair and obtain a repair result;

an information sending module, configured to, if the repair result is failure, send the abnormal problem and the repair result to a preset instant messaging address, so that operation and maintenance personnel can manually maintain the distributed file storage cluster in time according to the abnormal problem.

For the specific limitations of the monitoring system of the distributed file storage cluster, reference may be made to the limitations of the monitoring method of the distributed file storage cluster above, which are not repeated here. Each module in the above monitoring system may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or be independent of, a processor in a computer device, or may be stored in software form in a memory of the computer device, so that the processor can call them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer program implements a monitoring method for a distributed file storage cluster.

In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the monitoring method of the distributed file storage cluster in the above embodiments, for example steps S10 to S60 shown in Fig. 2.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the monitoring method of the distributed file storage cluster in the above embodiments, or implements the functions of the modules of the monitoring system of the distributed file storage cluster in the above embodiments. To avoid repetition, details are not repeated here.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It will be clear to those skilled in the art that, for convenience and brevity of description, the above division into functional units and modules is given only as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed, that is, the internal structure of the system may be divided into different functional units or modules to complete all or part of the functions described above.

The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

1. A monitoring method for a distributed file storage cluster, characterized in that the monitoring method of the distributed file storage cluster comprises:

a monitoring server receiving monitoring configuration information of the distributed file storage cluster sent by a monitoring platform, wherein the monitoring configuration information includes a monitoring item and an abnormal condition;

the monitoring server receiving the internal state of the distributed file storage cluster sent periodically by a monitoring client, wherein the monitoring client is deployed in advance on the node server corresponding to a monitoring node of the distributed file storage cluster, and the internal state of the distributed file storage cluster is obtained periodically by the monitoring client from the node server corresponding to the monitoring node;

the monitoring server performing statistical analysis on the internal state of the distributed file storage cluster according to the monitoring configuration information to obtain real-time monitoring data of the monitoring item;

if the real-time monitoring data of the monitoring item meets the abnormal condition, the monitoring server determining the monitoring item as an abnormal object, taking the real-time monitoring data as abnormal data, and generating an abnormal problem according to the abnormal object and the abnormal data;

the monitoring server generating an abnormal repair instruction according to the abnormal problem, and sending the abnormal repair instruction to a central server of the distributed file storage cluster;

if the central server receives the abnormal repair instruction, the central server parsing the abnormal repair instruction and calling the corresponding abnormal repair scheme according to the parsing result to repair the abnormal problem.

2. The monitoring method for a distributed file storage cluster according to claim 1, characterized in that, after the monitoring server performs statistical analysis on the internal state of the distributed file storage cluster according to the monitoring configuration information to obtain the real-time monitoring data of the monitoring item, the monitoring method of the distributed file storage cluster further comprises:

the monitoring server filling the real-time monitoring data into a preset output template to obtain target data;

the monitoring server outputting the target data to the monitoring platform, so that a user views the real-time status of the distributed file storage cluster through the monitoring platform;

the monitoring server storing the target data in a preset historical database;

the monitoring server analyzing the operating state of the distributed file storage cluster according to the target data in the historical database to obtain an analysis result, so that the user maintains the distributed file storage cluster according to the analysis result.

3. The monitoring method for a distributed file storage cluster according to claim 1, characterized in that, after the monitoring server, if the real-time monitoring data of the monitoring item meets the abnormal condition, determines the monitoring item as the abnormal object, takes the real-time monitoring data as the abnormal data, and generates the abnormal problem according to the abnormal object and the abnormal data, the monitoring method of the distributed file storage cluster further comprises:

the monitoring server determining the severity of the abnormal problem according to a preset service attribute;

the monitoring server generating alarm information from the abnormal problem in a preset format, and selecting the alarm sending mode corresponding to the severity of the abnormal problem;

the monitoring server sending the alarm information to a preset alarm address according to the alarm sending mode.

4. The monitoring method for a distributed file storage cluster according to claim 1, characterized in that the central server, if it receives the abnormal repair instruction, parsing the abnormal repair instruction and calling the corresponding abnormal repair scheme according to the parsing result to repair the abnormal problem comprises:

the central server receiving the abnormal repair instruction and, according to the abnormal repair instruction, determining the abnormal problem and the node server on which the abnormal problem occurred;

the central server searching, according to the abnormal problem, a preset abnormal repair scheme library for the abnormal repair schemes corresponding to the abnormal problem and the priority level of each abnormal repair scheme;

the central server obtaining, in descending order of the priority levels of the abnormal repair schemes, each abnormal repair scheme in turn to repair the abnormal problem of the node server, until the real-time monitoring data of the monitoring item no longer meets the abnormal condition or every abnormal repair scheme has been called.

5. The monitoring method of the distributed document storage cluster as claimed in claim 4, characterized in that, after the step in which the central server, upon receiving the abnormal repair instruction, parses the abnormal repair instruction and calls the corresponding abnormal recovery scenario according to the parsing result to repair the abnormal problem, the monitoring method of the distributed document storage cluster further comprises:

The central server checks the monitoring project of the node server after the repair, obtaining a repair result;

If the repair result is failure, the central server sends the abnormal problem and the repair result to a preset instant messaging address, so that operation and maintenance personnel can promptly perform manual maintenance on the distributed document storage cluster according to the abnormal problem.
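The post-repair check and escalation of claim 5 can be sketched in a few lines of Python. The names `recheck` and `notify`, the result strings, and the payload layout are assumptions made for illustration; the claim does not specify them.

```python
from typing import Callable


def verify_and_escalate(
    problem: str,
    recheck: Callable[[], bool],
    notify: Callable[[str, dict], None],
    im_address: str,
) -> str:
    """Re-check the monitoring project after repair; escalate on failure.

    recheck() returns True when the monitoring project is healthy again.
    notify(address, payload) is a hypothetical instant-messaging sender.
    """
    repaired = recheck()
    result = "success" if repaired else "failure"
    if not repaired:
        # Hand the problem and the repair result to operations staff.
        notify(im_address, {"problem": problem, "result": result})
    return result
```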

6. A monitoring system of a distributed document storage cluster, characterized in that the monitoring system of the distributed document storage cluster comprises a monitoring server and a central server, wherein the monitoring server and the central server are connected through a network;

The monitoring server includes:

A monitoring configuration module, configured to receive monitoring configuration information of the distributed document storage cluster sent by a monitoring platform, wherein the monitoring configuration information includes a monitoring project and an exceptional condition;

A data receiving module, configured to receive the internal state of the distributed document storage cluster sent periodically by a monitoring client, wherein the monitoring client is deployed in advance on the node server corresponding to a monitoring node of the distributed document storage cluster, and the internal state of the distributed document storage cluster is obtained periodically by the monitoring client from the node server corresponding to the monitoring node;

A data analysis module, configured to perform statistical analysis on the internal state of the distributed document storage cluster according to the monitoring configuration information, obtaining real-time monitoring data of the monitoring project;

An abnormal confirmation module, configured to, if the real-time monitoring data of the monitoring project meets the exceptional condition, determine the monitoring project as an abnormal object, take the real-time monitoring data as abnormal data, and generate an abnormal problem according to the abnormal object and the abnormal data;

An abnormal notification module, configured to generate an abnormal repair instruction according to the abnormal problem and send the abnormal repair instruction to the central server of the distributed document storage cluster;

The central server includes:

An abnormal repair module, configured to, upon receiving the abnormal repair instruction, parse the abnormal repair instruction and call the corresponding abnormal recovery scenario according to the parsing result to repair the abnormal problem.

7. The monitoring system of the distributed document storage cluster as claimed in claim 6, characterized in that the monitoring server further comprises:

A data filling module, configured to fill the real-time monitoring data into a preset output template, obtaining target data;

A data output module, configured to output the target data to the monitoring platform, so that a user can view the real-time status of the distributed document storage cluster through the monitoring platform;

A data storage module, configured to store the target data in a preset historical database;

A data statistics module, configured to analyze the operating status of the distributed document storage cluster according to the target data in the historical database, obtaining an analysis result, so that the user can maintain the distributed document storage cluster according to the analysis result.
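The filling, storage, and statistics modules of claim 7 can be sketched together in Python. Everything concrete here is an assumption for illustration: the template text, the record fields (`item`, `node`, `value`), an in-memory list standing in for the historical database, and an average as the example analysis.

```python
from statistics import mean
from string import Template
from typing import Dict, List

# Hypothetical preset output template.
OUTPUT_TEMPLATE = Template("$item on $node: $value")


def fill_template(record: Dict[str, object]) -> Dict[str, object]:
    """Fill the real-time monitoring data into the template to get target data."""
    return {**record, "text": OUTPUT_TEMPLATE.substitute(record)}


def store(history: List[Dict[str, object]], target: Dict[str, object]) -> None:
    """Append target data to the (in-memory) historical database."""
    history.append(target)


def average_value(history: List[Dict[str, object]], item: str) -> float:
    """Example analysis: mean value of one monitoring project over the history."""
    return mean(r["value"] for r in history if r["item"] == item)
```

A real deployment would replace the list with a persistent store and would likely compute trends rather than a single average; the sketch only shows how the three modules hand data to one another.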

8. The monitoring system of the distributed document storage cluster as claimed in claim 6, characterized in that, in the central server, the abnormal repair module comprises:

An abnormal analysis submodule, configured to receive the abnormal repair instruction and determine, according to the abnormal repair instruction, the abnormal problem and the node server on which the abnormal problem occurred;

A scheme acquisition submodule, configured to search a preset abnormal recovery scenario library, according to the abnormal problem, for the abnormal recovery scenarios corresponding to the abnormal problem and the priority level of each abnormal recovery scenario;

An abnormal repair submodule, configured to obtain the abnormal recovery scenarios one by one in descending order of priority level and apply each to repair the abnormal problem of the node server, until the real-time monitoring data of the monitoring project no longer meets the exceptional condition or every abnormal recovery scenario has been called.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the monitoring method of the distributed document storage cluster as claimed in any one of claims 1 to 5.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the monitoring method of the distributed document storage cluster as claimed in any one of claims 1 to 5.