Summary of the invention
One or more embodiments of this specification describe a monitoring method and device for a distributed system that can discover high-availability problems in time and locate them precisely, thereby facilitating rapid emergency recovery.
In a first aspect, a monitoring method for a distributed system is provided, where the distributed system is a cluster composed of multiple single machines. The method includes:
obtaining multiple items of high-availability index data of the distributed system within a current preset time period, where the multiple items of high-availability index data include multiple items of single-machine high-availability index data for each single machine and multiple items of cluster high-availability index data for the cluster;
obtaining the anomaly measurement function corresponding to each high-availability index in the current preset time period; and
evaluating each item of high-availability index data in the current preset time period with its corresponding anomaly measurement function, to obtain a result indicating whether each item of high-availability index data requires an early warning.
In a possible implementation, obtaining the multiple items of high-availability index data of the distributed system within the current preset time period includes:
obtaining logs of the distributed system within the current preset time period; and
parsing the logs that share the same source address according to a preset model, to obtain the multiple items of single-machine high-availability index data of each single machine.
Further, obtaining the multiple items of high-availability index data of the distributed system within the current preset time period also includes:
performing an operation on the multiple items of single-machine high-availability index data of each single machine according to a preset algorithm, to determine the multiple items of cluster high-availability index data of the cluster.
Further, the logs include runtime performance logs and/or basic service logs.
Further, the runtime performance log includes at least one of CPU usage data, load data, memory usage data, and GC count data; the high-availability indexes include at least one of a CPU usage parameter, a load parameter, a memory usage parameter, and a GC parameter.
Further, the basic service log includes at least one of the time consumed by an interface method call, the result of an interface method call, the time consumed by a database-operation interface method, and the result of a database-operation interface method; the high-availability indexes include at least one of an interface method call time parameter, an interface method call result parameter, a database-operation interface method time parameter, and a database-operation interface method result parameter.
In a possible implementation, the anomaly measurement function is determined as follows:
performing statistical analysis separately on each of the multiple items of high-availability index data obtained in at least one preset time period before the current preset time period, to determine the index baseline formula corresponding to each high-availability index for the current preset time period; and
determining, from the index baseline formula corresponding to each high-availability index for the current preset time period, the anomaly measurement function corresponding to each high-availability index for the current preset time period.
Further, performing statistical analysis separately on the multiple items of high-availability index data obtained in at least one preset time period before the current preset time period, to determine the index baseline formula corresponding to each high-availability index for the current preset time period, includes:
assuming that the multiple items of high-availability index data obtained in at least one preset time period before the current preset time period follow a normal distribution, and determining the index baseline formula corresponding to each high-availability index for the current preset time period from the probability of the value distribution.
Further, determining the anomaly measurement function corresponding to each high-availability index for the current preset time period from the corresponding index baseline formula includes:
determining the anomaly measurement function corresponding to each high-availability index for the current preset time period from the index baseline formula corresponding to that index, together with the period-over-period ratio of each high-availability index between the current preset time period and the previous preset time period, and/or the year-on-year ratio.
In a possible implementation, the method further includes:
fusing the multiple items of high-availability index data and the results of whether each item requires an early warning, separately by cluster dimension and by single-machine dimension, and assembling them into early-warning messages; and
sending each early-warning message, in a preset manner, to the preset terminal corresponding to the single machine or cluster to which the message relates.
Further, the preset manner includes one or more of the following:
instant messaging (IM) notification, short message, and telephone call.
In a second aspect, a monitoring device for a distributed system is provided, where the distributed system is a cluster composed of multiple single machines. The device includes:
a first acquisition unit, configured to obtain multiple items of high-availability index data of the distributed system within a current preset time period, where the multiple items of high-availability index data include multiple items of single-machine high-availability index data for each single machine and multiple items of cluster high-availability index data for the cluster;
a second acquisition unit, configured to obtain the anomaly measurement function corresponding to each high-availability index in the current preset time period; and
an evaluation unit, configured to evaluate each item of high-availability index data in the current preset time period with the corresponding anomaly measurement function obtained by the second acquisition unit, to obtain a result indicating whether each item of high-availability index data requires an early warning.
In a third aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method of the first aspect.
In a fourth aspect, a computing device is provided, including a memory and a processor; the memory stores executable code, and when the processor executes the executable code, the method of the first aspect is implemented.
With the method and device provided by the embodiments of this specification, the distributed system is a cluster composed of multiple single machines. First, multiple items of high-availability index data of the distributed system within the current preset time period are obtained, including multiple items of single-machine high-availability index data for each single machine and multiple items of cluster high-availability index data for the cluster. Next, the anomaly measurement function corresponding to each high-availability index in the current preset time period is obtained. Each item of high-availability index data in the current preset time period is then evaluated with its corresponding anomaly measurement function, to obtain a result indicating whether each item requires an early warning. The embodiments of this specification therefore obtain not only the cluster high-availability index data of the cluster but also the single-machine high-availability index data of each single machine, and use the anomaly measurement functions to judge whether each item of high-availability index data requires an early warning (that is, whether an anomaly has occurred). Different preset time periods, or different high-availability indexes, may correspond to different anomaly measurement functions. This fine-grained monitoring of the high-availability indexes of the distributed system makes it possible to discover high-availability problems in time and locate them precisely, thereby facilitating rapid emergency recovery.
Detailed description
The solutions provided in this specification are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation scenario of one embodiment disclosed in this specification. The scenario involves monitoring a distributed system, where the distributed system is a cluster composed of multiple single machines. Referring to Fig. 1, distributed system 11 is a cluster composed of single machine A, single machine B, single machine C, and single machine D, and a user accesses or uses distributed system 11 through terminal 12 (for example, a mobile phone, tablet, or PC). It should be understood that the number of single machines in the cluster can be determined by actual demand; the number shown in the figure is only an example.
In a distributed system, a group of independent computers (that is, single machines) appears to the user as a unified whole, as if it were a single system. The system owns a variety of general-purpose physical and logical resources and can assign tasks dynamically; the dispersed physical and logical resources exchange information over a computer network. The system has a distributed operating system that manages computer resources globally. Generally, to the user a distributed system presents only one model or paradigm. A layer of software middleware on top of the operating system is responsible for implementing this model.
The distributed operating system manages system resources globally. It can schedule any network resource on behalf of the user, and the scheduling process is "transparent". When a user submits a job, the distributed operating system selects the most suitable processor (that is, single machine) in the system as needed and submits the user's job to that processor; after the processor completes the job, the result is returned to the user. Throughout this process the user is unaware that multiple processors exist, and the system appears to be a single processor.
It should be noted that, in the embodiments of this specification, a distributed system is a software system that supports distributed processing, that is, a system that executes tasks on a multiprocessor architecture interconnected by a communication network. It includes not only the distributed operating system but also distributed programming languages and their compilation (interpretation) systems, distributed file systems, distributed database systems, and so on.
In the embodiments of this specification, monitoring of the distributed system mainly means monitoring its high availability. Multiple indexes of the cluster and of the single machines are monitored to judge whether each index shows an anomaly, that is, whether each index requires an early warning. This fine-grained monitoring of the high-availability indexes of the distributed system makes it possible to discover high-availability problems in time and locate them precisely, thereby facilitating rapid emergency recovery.
Fig. 2 shows a flowchart of a monitoring method for a distributed system according to one embodiment, where the distributed system is a cluster composed of multiple single machines, such as distributed system 11 shown in Fig. 1. As shown in Fig. 2, the monitoring method in this embodiment includes the following steps: step 21, obtain multiple items of high-availability index data of the distributed system within the current preset time period, where the multiple items of high-availability index data include multiple items of single-machine high-availability index data for each single machine and multiple items of cluster high-availability index data for the cluster; step 22, obtain the anomaly measurement function corresponding to each high-availability index in the current preset time period; step 23, evaluate each item of high-availability index data in the current preset time period with its corresponding anomaly measurement function, to obtain a result indicating whether each item requires an early warning. The specific execution of each step is described below.
First, in step 21, multiple items of high-availability index data of the distributed system within the current preset time period are obtained, including multiple items of single-machine high-availability index data for each single machine and multiple items of cluster high-availability index data for the cluster. The current preset time period can be understood as the early-warning period, which can be set as required, for example to 5 minutes, half an hour, or one day.
In one example, logs of the distributed system within the current preset time period are obtained, and the logs that share the same source address are parsed according to a preset model to obtain the multiple items of single-machine high-availability index data of each single machine. It should be understood that the logs are recorded by each single machine in the distributed system, and the source address of a log identifies the single machine that produced it. Single-machine high-availability index data can therefore be mapped to its single machine, which makes it convenient to analyze later, for each single machine, whether its high-availability index data is an anomalous value and whether an early warning is needed.
In one example, an operation is performed on the multiple items of single-machine high-availability index data of each single machine according to a preset algorithm, to determine the multiple items of cluster high-availability index data of the cluster. The preset algorithm can be, but is not limited to, taking the average, the minimum, or the maximum of one item of single-machine high-availability index data across the single machines. Table 1 shows a mapping, provided by an embodiment of this specification, between the single-machine high-availability index data of the single machines and the cluster high-availability index data of the cluster.
Table 1: Mapping between single-machine high-availability index data and cluster high-availability index data
Referring to Table 1, take a distributed system that is a cluster composed of single machine A, single machine B, single machine C, and single machine D. After the high-availability index data of one high-availability index is obtained for each single machine, the cluster's high-availability index data for that index is obtained by taking the maximum. Table 1 uses the maximum as the preset algorithm; other algorithms work similarly and are not described again here.
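The derivation of a cluster value from single-machine values can be sketched as follows. This is a minimal illustration, not the implementation described in this specification: the function name `cluster_index_value`, the `AGGREGATORS` table, and the sample CPU values are assumptions introduced for the example.

```python
from statistics import mean

# Hypothetical lookup of the preset algorithms named in the text
# (average, minimum, maximum).
AGGREGATORS = {"max": max, "min": min, "avg": mean}

def cluster_index_value(machine_values, algorithm="max"):
    """Derive one cluster index value from per-machine values of the same index."""
    if not machine_values:
        raise ValueError("no single-machine data for this index")
    return AGGREGATORS[algorithm](machine_values.values())

# Illustrative CPU usage values for single machines A to D (not from the source).
cpu_by_machine = {"A": 35.0, "B": 81.0, "C": 42.5, "D": 40.0}
cluster_cpu = cluster_index_value(cpu_by_machine, "max")
```

Taking the maximum makes the cluster value track the worst single machine, while the average smooths out individual outliers; which one suits a given index is a design choice the text leaves open.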
In one example, the logs include runtime performance logs and/or basic service logs.
Further, the runtime performance log includes at least one of CPU usage data, load data, memory usage data, and GC count data; the high-availability indexes include at least one of a CPU usage parameter, a load parameter, a memory usage parameter, and a GC parameter.
Further, the basic service log includes at least one of the time consumed by an interface method call, the result of an interface method call, the time consumed by a database-operation interface method, and the result of a database-operation interface method; the high-availability indexes include at least one of an interface method call time parameter, an interface method call result parameter, a database-operation interface method time parameter, and a database-operation interface method result parameter.
It should be noted that the early-warning period is usually longer than the log recording period. For example, if the log is recorded once per minute and the early-warning period is 5 minutes, then within one early-warning period the log records multiple items of data for a single high-availability index. To simplify analysis, these items can be processed further, for example by taking the maximum, the minimum, or the average, and the resulting value is used as the high-availability index data of that index for the early-warning period. The subsequent anomaly judgment can then be made on this single processed value. Table 2 shows a mapping, provided by an embodiment of this specification, between the high-availability index data recorded in the log and the high-availability index data to be evaluated for the early-warning period.
Table 2: Mapping between the high-availability index data in the log and the high-availability index data to be evaluated
Referring to Table 2, take the CPU usage parameter values recorded in the log of single machine A within one early-warning period. After the multiple items of high-availability index data of one high-availability index in the early-warning period are obtained, the high-availability index data to be evaluated for that index is obtained by averaging. Table 2 uses averaging as the preset algorithm; other algorithms work similarly and are not described again here.
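The reduction of per-minute log values to one value per early-warning period can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `bucket_by_period`, the fixed 5-minute window aligned to the epoch, and the sample values are all introduced for the example and do not come from this specification.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

EPOCH = datetime(1970, 1, 1)

def bucket_by_period(samples, period=timedelta(minutes=5), reducer=mean):
    """Group (timestamp, value) samples into fixed windows and reduce each window to one value."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Index of the window that contains this timestamp.
        window_index = (timestamp - EPOCH) // period
        buckets[EPOCH + window_index * period].append(value)
    return {start: reducer(values) for start, values in sorted(buckets.items())}

# Illustrative per-minute CPU readings for one single machine (not from the source).
samples = [
    (datetime(2018, 3, 30, 14, 55), 20),
    (datetime(2018, 3, 30, 14, 56), 22),
    (datetime(2018, 3, 30, 14, 57), 24),
    (datetime(2018, 3, 30, 14, 58), 26),
    (datetime(2018, 3, 30, 14, 59), 28),
    (datetime(2018, 3, 30, 15, 0), 30),
]
per_period = bucket_by_period(samples)
```

Passing `reducer=max` or `reducer=min` gives the other two reductions mentioned in the text.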
Then, in step 22, the anomaly measurement function corresponding to each high-availability index in the current preset time period is obtained. In the embodiments of this specification, the anomaly measurement function can be obtained by analyzing historical data; this analysis can be performed offline or online.
Here, offline computation means computation in which all input data is known before the computation starts, the input data does not change, and a result is obtained immediately once the problem is solved. It belongs to the data computation part of big data, where its counterpart is real-time computation.
It should be understood that an anomaly measurement function needs to be determined separately for each high-availability index of each single machine or cluster.
In one example, the anomaly measurement function is determined as follows: statistical analysis is performed separately on each of the multiple items of high-availability index data obtained in at least one preset time period before the current preset time period, to determine the index baseline formula corresponding to each high-availability index for the current preset time period; the anomaly measurement function corresponding to each high-availability index for the current preset time period is then determined from the corresponding index baseline formula.
Specifically, it can be assumed that the multiple items of high-availability index data obtained in at least one preset time period before the current preset time period follow a normal distribution, and the index baseline formula corresponding to each high-availability index for the current preset time period is determined from the probability of the value distribution. For example, the 3sigma algorithm can be used for this calculation.
The index baseline formula can be understood as an interval threshold that defines the range within which normal values of the high-availability index fall. In the embodiments of this specification, whether a high-availability index value is an anomalous value, and accordingly whether an early warning is needed, can be determined by checking whether the value falls within this range.
In addition to the index baseline formula, other conditions can be combined to determine whether high-availability index data is an anomalous value. In one example, the anomaly measurement function corresponding to each high-availability index for the current preset time period is determined from the index baseline formula corresponding to that index, together with the period-over-period ratio and/or the year-on-year ratio of each high-availability index between the current preset time period and an earlier preset time period.
The period-over-period ratio is the ratio of the high-availability index data of the same high-availability index in two adjacent time periods, where a time period may be a preset time period; for example, the ratio of each high-availability index in the current preset time period to the same index in the previous preset time period.
The year-on-year ratio is the ratio of the high-availability index data of the same high-availability index in the same sub-interval of two adjacent time periods, where a time period may be one day and a sub-interval may be a preset time period; for example, the ratio of each high-availability index in today's current preset time period to the same index in the preset time period at the same time yesterday.
In one example, a period-over-period threshold for the period-over-period ratio and a year-on-year threshold for the year-on-year ratio can be preset. Whether the high-availability index data is an anomalous value, and whether an early warning is needed, is then determined comprehensively by comparing the high-availability index data with the index baseline formula, the period-over-period ratio with its threshold, and the year-on-year ratio with its threshold.
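One way to combine the three comparisons can be sketched as follows. The text does not specify how the comparisons are combined, so the policy here is an assumption: a value is flagged only when it falls outside the baseline interval and every configured ratio also exceeds its threshold. The names `ratio_exceeds` and `make_anomaly_function` and all numbers are hypothetical.

```python
def ratio_exceeds(current, reference, threshold):
    """Return True when current/reference is above threshold, or None when the check does not apply."""
    if threshold is None or reference in (None, 0):
        return None
    return current / reference > threshold

def make_anomaly_function(baseline, pop_threshold=None, yoy_threshold=None):
    """Build an anomaly measurement function from a baseline interval and optional ratio thresholds."""
    low, high = baseline

    def evaluate(current, previous=None, same_period_yesterday=None):
        checks = [
            not (low <= current <= high),                                   # baseline comparison
            ratio_exceeds(current, previous, pop_threshold),                # period-over-period
            ratio_exceeds(current, same_period_yesterday, yoy_threshold),   # year-on-year
        ]
        # Assumed policy: every applicable check must indicate an anomaly.
        return all(check for check in checks if check is not None)

    return evaluate

cpu_function = make_anomaly_function((15.5, 32.5), pop_threshold=1.5, yoy_threshold=1.5)
```

This sketch only catches upward jumps in the ratios; an index where a drop is the problem (for example a success rate) would need the comparison reversed.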
Finally, in step 23, each item of high-availability index data in the current preset time period is evaluated with its corresponding anomaly measurement function, to obtain a result indicating whether each item requires an early warning. It should be understood that when an item of high-availability index data is judged to be an anomalous value, that item requires an early warning; otherwise it does not.
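Step 23 can be sketched as a lookup of one function per index followed by one evaluation per value. This is a minimal illustration: keying both tables by a `(target, index)` pair, the helper names, and the interval bounds are assumptions made for the example.

```python
def interval_function(low, high):
    """Build a function that flags values outside the closed interval [low, high]."""
    return lambda value: not (low <= value <= high)

def evaluate_period(index_data, functions):
    """Apply to each item of index data the function registered under the same key."""
    return {key: functions[key](value) for key, value in index_data.items()}

# Hypothetical anomaly measurement functions, one per (target, index) pair.
functions = {
    ("cluster", "cpu"): interval_function(0, 70),
    ("A", "cpu"): interval_function(0, 70),
    ("A", "gc"): interval_function(0, 3),
}
# Illustrative index data for the current early-warning period.
index_data = {("cluster", "cpu"): 81, ("A", "cpu"): 81, ("A", "gc"): 1}
warnings = evaluate_period(index_data, functions)
```

Keeping a separate function per key reflects the point made in the text that different indexes, machines, and periods may each need their own anomaly measurement function.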
In one example, after step 23, the multiple items of high-availability index data and the results of whether each item requires an early warning are fused separately by cluster dimension and by single-machine dimension and assembled into early-warning messages; each early-warning message is then sent, in a preset manner, to the preset terminal corresponding to the single machine or cluster to which the message relates.
The preset manner includes one or more of the following:
instant messaging (IM) notification, short message, and telephone call.
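Grouping results by dimension and routing each message to its own recipients can be sketched as follows. This is a minimal illustration: the record fields, the message layout, the `contacts` table, and the `send` callback are all assumptions for the example, and no real IM, short message, or telephone API is used.

```python
from collections import defaultdict

def assemble_messages(results):
    """Group per-index results by (dimension, target) and render one text message per group."""
    grouped = defaultdict(list)
    for row in results:
        grouped[(row["dimension"], row["target"])].append(row)
    messages = {}
    for (dimension, target), rows in grouped.items():
        lines = [f"{dimension}: {target}", "index | value | early warning"]
        lines += [f"{r['index']} | {r['value']} | {'yes' if r['alert'] else 'no'}" for r in rows]
        messages[(dimension, target)] = "\n".join(lines)
    return messages

def dispatch(messages, contacts, send):
    """Send each message to every (channel, address) registered for its target."""
    for key, text in messages.items():
        for channel, address in contacts.get(key, []):
            send(channel, address, text)

# Illustrative data only; the channel names and addresses are hypothetical.
results = [
    {"dimension": "cluster", "target": "appname", "index": "cpu", "value": 81, "alert": True},
    {"dimension": "machine", "target": "A", "index": "cpu", "value": 81, "alert": True},
    {"dimension": "machine", "target": "A", "index": "gc", "value": 1, "alert": False},
]
contacts = {
    ("cluster", "appname"): [("im", "cluster-owner")],
    ("machine", "A"): [("sms", "machine-a-owner")],
}
outbox = []
dispatch(assemble_messages(results), contacts, lambda *args: outbox.append(args))
```

In a real deployment the `send` callback would wrap whatever notification channels are available; here it only records what would have been sent.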
It should be understood that each single machine or cluster may be maintained or managed by different personnel. The embodiments of this specification allow early-warning messages to be distributed selectively, so that each message is more likely to reach the user who cares about it, which improves the user experience.
Furthermore, the early-warning message can be assembled and distributed only when there is high-availability index data that requires an early warning, which serves as a failure reminder. Alternatively, the early-warning message can be assembled and distributed regardless of whether any high-availability index data requires an early warning, which lets the user monitor every index continuously.
With the method provided by the embodiments of this specification, the distributed system is a cluster composed of multiple single machines. First, multiple items of high-availability index data of the distributed system within the current preset time period are obtained, including multiple items of single-machine high-availability index data for each single machine and multiple items of cluster high-availability index data for the cluster. Next, the anomaly measurement function corresponding to each high-availability index in the current preset time period is obtained. Each item of high-availability index data in the current preset time period is then evaluated with its corresponding anomaly measurement function, to obtain a result indicating whether each item requires an early warning. The embodiments of this specification therefore obtain not only the cluster high-availability index data of the cluster but also the single-machine high-availability index data of each single machine, and use the anomaly measurement functions to judge whether each item of high-availability index data requires an early warning (that is, whether an anomaly has occurred). Different preset time periods, or different high-availability indexes, may correspond to different anomaly measurement functions. This fine-grained monitoring of the high-availability indexes of the distributed system makes it possible to discover high-availability problems in time and locate them precisely, thereby facilitating rapid emergency recovery.
Fig. 3 shows a schematic diagram of an implementation of the monitoring method for a distributed system according to one embodiment, where the distributed system is a cluster composed of multiple single machines, such as distributed system 11 shown in Fig. 1. As shown in Fig. 3, the monitoring method in this embodiment is implemented mainly by the following modules:
Module 31 is used for real-time metadata collection.
The metadata includes cluster high-availability metadata and single-machine high-availability metadata.
On the one hand, high-availability metadata is collected and integrated through a real-time data center to obtain the single-machine high-availability metadata.
In the embodiments of this specification, any of the real-time data centers currently available in the industry can be used. A client is installed on each server and periodically, at second-level intervals, transmits the updated logs from the server to the data analysis platform. The real-time data center collects the original log information, which includes the server's runtime performance logs, such as CPU, LOAD, memory, and GC status, as well as the basic service logs of the running system, such as the call time and result of interface methods and the time and result of database-operation (DAO) interface methods.
In one example, the cluster high-availability metadata can be obtained by performing a preset operation on the single-machine high-availability metadata.
On the other hand, besides obtaining metadata by collecting logs, part of the cluster-level high-availability data can also be obtained from other existing monitoring platforms, which reduces the cost of acquiring new data.
Module 32 is used for real-time data modeling.
In the embodiments of this specification, module 32 includes a data analysis platform that applies one or more kinds of processing, such as data filtering, data aggregation, and data modeling, to the data stream obtained from module 31. The processed data is passed to the offline data warehouse for offline backup, or to the production database (DB) in module 34 so that module 34 can perform index anomaly detection on the real-time data.
Because the log information collected by module 31 is usually original log information, basic modeling must be applied to these logs to obtain the time-axis-based data stream of each high-availability index.
For example, the log data may look like this:
2018-03-30T14:55:15.538+0800: 2.987: [CMS-concurrent-mark-start]
2018-03-30T14:55:15.541+0800: 2.991: [CMS-concurrent-mark: 0.003/0.003 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2018-03-30T14:55:15.541+0800: 2.991: [CMS-concurrent-preclean-start]
2018-03-30T14:55:15.559+0800: 3.009: [CMS-concurrent-preclean: 0.018/0.018 secs] [Times: user=0.11 sys=0.00, real=0.02 secs]
2018-03-30T14:55:15.559+0800: 3.009: [CMS-concurrent-abortable-preclean-start]
Data modeling then uses the source address of the log, together with this part of the log, to produce the following modeled information:
appname-1-1, 2018-03-30 14:55, gc, 1
Here, appname-1-1 is the server name, 2018-03-30 14:55 is the time, gc is the high-availability index (the GC count in this case), and 1 is the value of the high-availability index, that is, the high-availability index data.
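The step from raw GC log lines to a modeled record can be sketched as follows. How the GC count is derived is not stated in the text, so this sketch makes an assumption: each `[CMS-concurrent-mark-start]` line is counted as one GC cycle. The function name, the regular expression, and that counting rule are hypothetical.

```python
import re
from collections import Counter

# Matches the timestamp prefix and captures the date and the hour:minute part.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}):\d{2}\.\d{3}[+-]\d{4}")
CYCLE_START = "[CMS-concurrent-mark-start]"  # assumed marker of one GC cycle

def gc_counts_per_minute(server, lines):
    """Count assumed GC cycle starts per minute and emit 'server, minute, gc, count' records."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and CYCLE_START in line:
            counts[f"{match.group(1)} {match.group(2)}"] += 1
    return [f"{server}, {minute}, gc, {count}" for minute, count in sorted(counts.items())]

log_lines = [
    "2018-03-30T14:55:15.538+0800: 2.987: [CMS-concurrent-mark-start]",
    "2018-03-30T14:55:15.541+0800: 2.991: [CMS-concurrent-mark: 0.003/0.003 secs]",
    "2018-03-30T14:55:15.541+0800: 2.991: [CMS-concurrent-preclean-start]",
]
records = gc_counts_per_minute("appname-1-1", log_lines)
```

The server name is passed in by the caller because, as the text notes, it comes from the source address of the log rather than from the log lines themselves.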
The logs of every server are then parsed every minute with this model, which yields a time-axis-based stream of high-availability index data.
Collecting and modeling the information for all high-availability indexes of a system in this way yields complete, fine-grained high-availability index data.
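Reading modeled records back into typed values can be sketched as follows. This is a minimal illustration that assumes the comma-separated layout of the example record (server name, minute, index name, value); the `IndexSample` type and the `parse_record` function are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple

class IndexSample(NamedTuple):
    server: str       # server name taken from the log source
    minute: datetime  # time of the sample, to minute precision
    index: str        # name of the high-availability index
    value: float      # high-availability index data

def parse_record(line):
    """Parse one 'server, YYYY-MM-DD HH:MM, index, value' record."""
    server, minute, index, value = (part.strip() for part in line.split(","))
    return IndexSample(server, datetime.strptime(minute, "%Y-%m-%d %H:%M"), index, float(value))

sample = parse_record("appname-1-1, 2018-03-30 14:55, gc, 1")
```

A sequence of such samples, ordered by `minute`, is one possible in-memory form of the time-axis-based data stream.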
Data filtering and data aggregation are common technical terms in this field and are not explained further here.
Module 33 is used to obtain the high-availability index baseline.
The data trend and magnitude differ from indicator to indicator, and for the same indicator the data and trend also differ from system to system. If anomaly early-warning thresholds were set manually from experience, the workload would be very large and the thresholds would become inaccurate over time. Therefore, algorithm-based analysis of historical data is used here to obtain the anomaly measure function for the next moment, so that the final result of whether to issue an early warning is obtained intelligently. In implementation, all data obtained from the real-time data are written to an offline data warehouse; the offline data are analyzed with a common anomaly detection algorithm to obtain the indicator baseline formulas, and these dynamic formulas are written back to the production DB in real time. When selecting the anomaly measure function, a major characteristic of high-availability indicators is taken into account, namely that the overall trend of the data is relatively stable, so the classic 3-sigma anomaly detection algorithm can be chosen here to compute the baselines from the offline data.
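A minimal sketch of a 3-sigma baseline computed from offline historical data is given below. The function names, the sample history, and the choice of the population standard deviation are assumptions made for illustration; the specification does not fix these details.

```python
from statistics import mean, pstdev

def three_sigma_baseline(history):
    """Return (lower, upper) bounds as mean +/- 3 standard deviations
    of the historical values of one indicator."""
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation (an assumption)
    return mu - 3 * sigma, mu + 3 * sigma

def needs_warning(value, baseline):
    """Flag a value that falls outside the baseline interval."""
    lower, upper = baseline
    return value < lower or value > upper

# Hypothetical per-minute CPU usage history for one machine.
history = [40, 41, 42, 41, 40, 42, 41, 41]
baseline = three_sigma_baseline(history)
```

With this history the interval is roughly 41 plus or minus 2.1, so a reading of 75 would be flagged while a reading of 41 would not.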
Module 34, which may be called the high-availability anomaly detection engine, is used to perform indicator anomaly detection based on the real-time data and the dynamic baselines in the production DB, and to issue early warnings through the warning module.
After the time-axis-based high-availability indicator data modeled by the real-time data center and the high-availability indicator baseline formulas modeled from the offline data have been obtained, all high-availability indicator data of a single system are used as input to the baseline formulas to obtain the final early-warning flag (yes or no), and thus all high-availability indicator data together with the result of whether each requires an early warning. The engine then aggregates the per-indicator results along the cluster dimension and the single-machine dimension, assembles them into a warning information message, and pushes the message to the corresponding personnel by IM notification, short message, telephone, or the like.
The aggregated notification is similar to the following:
High-availability fine-grained monitoring platform - appname details
Cluster indicators:
(table)
Indicator | current anomaly value | stable reference value | anomalous
CPU | 81 | 23 | no
loadl...
FGC
Tair-xx-success rate
Tair-xx-time consumption
Interface-method-success rate
Interface-method-time consumption
Dal-time consumption
Dao-data source-success rate
Dao-data source-time consumption
Based on the above mechanism, when a high-availability anomaly occurs in the system, the system administrator accurately receives fine-grained result warning information for each dimension of the cluster and the single machines.
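The aggregation of per-indicator results into one message could look roughly like the sketch below. The record layout (scope, metric, current, baseline, anomalous), the function name build_warning_message, and the section titles are hypothetical; only the idea of grouping by cluster dimension and single-machine dimension comes from the text.

```python
from collections import defaultdict

def build_warning_message(app, results):
    """Group evaluated indicators by scope and render one text message.
    A scope of "cluster" is the cluster dimension; any other scope is
    treated as the name of a single machine."""
    sections = defaultdict(list)
    for result in results:
        sections[result["scope"]].append(result)

    lines = [f"High-availability fine-grained monitoring platform - {app} details"]
    # Cluster section first, then single machines in name order.
    for scope in sorted(sections, key=lambda s: (s != "cluster", s)):
        if scope == "cluster":
            lines.append("Cluster indicators:")
        else:
            lines.append(f"Single-machine indicators ({scope}):")
        lines.append("Indicator | current value | stable reference value | anomalous")
        for r in sections[scope]:
            flag = "yes" if r["anomalous"] else "no"
            lines.append(f"{r['metric']} | {r['current']} | {r['baseline']} | {flag}")
    return "\n".join(lines)

# Hypothetical evaluation results for one application.
results = [
    {"scope": "appname-1-1", "metric": "CPU", "current": 30, "baseline": 28, "anomalous": False},
    {"scope": "cluster", "metric": "FGC", "current": 5, "baseline": 1, "anomalous": True},
]
message = build_warning_message("appname", results)
```

Sending one message per application, rather than one per indicator, is what lets the recipient see cluster and single-machine results side by side.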
This solution leverages the real-time online data platform to obtain the source data of all high-availability indicators of the system in real time, and after each indicator is modeled, all indicator information of the cluster and the single machines can be obtained. In addition, the use of an anomaly detection algorithm reduces the cost of setting early-warning thresholds and improves precision. Finally, through information aggregation, when a high-availability anomaly occurs, system personnel accurately receive a fine-grained aggregated result for each dimension of the cluster and the single machines rather than multiple scattered single-item warnings, which facilitates precise problem location and reduces early-warning cost.
According to an embodiment of another aspect, a monitoring device for a distributed system is also provided, the distributed system being a cluster composed of multiple single machines. Fig. 4 shows a schematic block diagram of the monitoring device for a distributed system according to one embodiment. As shown in Fig. 4, the device 400 includes:
a first acquisition unit 41, configured to obtain multiple items of high-availability indicator data of the distributed system within the current preset time period, the multiple items of high-availability indicator data including multiple items of single-machine high-availability indicator data of each single machine and multiple items of cluster high-availability indicator data of the cluster;
a second acquisition unit 42, configured to obtain the anomaly measure function corresponding to each high-availability indicator in the current preset time period;
an assessment unit 43, configured to use the corresponding anomaly measure functions obtained by the second acquisition unit 42 to assess each item of high-availability indicator data in the current preset time period, and to obtain the result of whether each item of high-availability indicator data requires an early warning.
Optionally, as one embodiment, the first acquisition unit 41 is specifically configured to obtain the logs of the distributed system within the current preset time period, and to parse the logs from the same source address according to a preset model to obtain the multiple items of single-machine high-availability indicator data of each single machine.
Further, the first acquisition unit 41 is also configured to perform an operation on the multiple items of single-machine high-availability indicator data of each single machine according to a preset algorithm, so as to determine the multiple items of cluster high-availability indicator data of the cluster.
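The specification does not state which preset algorithm turns single-machine data into cluster data. The sketch below assumes, purely for illustration, that the GC count is summed across machines and every other indicator is averaged; the AGGREGATORS table and the function name cluster_indicators are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical choice of aggregation per indicator; indicators that are not
# listed here fall back to the mean.
AGGREGATORS = {"gc": sum, "cpu": mean}

def cluster_indicators(single_machine_records):
    """Combine (server, minute, indicator, value) records from all machines
    into one cluster value per (minute, indicator)."""
    grouped = defaultdict(list)
    for _server, minute, indicator, value in single_machine_records:
        grouped[(minute, indicator)].append(value)
    return {
        key: AGGREGATORS.get(key[1], mean)(values)
        for key, values in grouped.items()
    }

# Hypothetical records from two machines of the same cluster.
single_machine_records = [
    ("appname-1-1", "2018-03-30 14:55", "gc", 1),
    ("appname-1-2", "2018-03-30 14:55", "gc", 2),
    ("appname-1-1", "2018-03-30 14:55", "cpu", 20),
    ("appname-1-2", "2018-03-30 14:55", "cpu", 30),
]
cluster = cluster_indicators(single_machine_records)
```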
Further, the logs obtained by the first acquisition unit 41 include runtime performance logs and/or basic service logs.
Further, the runtime performance logs include at least one of CPU usage data, load data, memory usage data, and GC count data; the high-availability indicators include at least one of a CPU usage parameter, a load parameter, a memory usage parameter, and a GC parameter.
Further, the basic service logs include at least one of the time consumed by calling an interface method, the result of calling an interface method, the time consumed by an interface method of a database operation, and the result of an interface method of a database operation; the high-availability indicators include at least one of an interface method call time-consumption parameter, an interface method call result parameter, a database operation interface method time-consumption parameter, and a database operation interface method result parameter.
Optionally, as one embodiment, the device further includes:
a determination unit, configured to determine the anomaly measure functions obtained by the second acquisition unit 42 in the following manner:
performing statistical analysis separately on the multiple items of high-availability indicator data obtained in at least one preset time period before the current preset time period, to determine the indicator baseline formula corresponding to each high-availability indicator of the current preset time period;
determining, according to the indicator baseline formula corresponding to each high-availability indicator of the current preset time period, the anomaly measure function corresponding to each high-availability indicator of the current preset time period.
Further, the determination unit is specifically configured to assume that the multiple items of high-availability indicator data obtained in at least one preset time period before the current preset time period follow a normal distribution, and to determine, according to the probability of the value distribution, the indicator baseline formula corresponding to each high-availability indicator of the current preset time period.
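One way to read "according to the probability of the value distribution" is to fit a normal distribution to the historical values and take the interval that covers a chosen probability mass. The sketch below uses the Python standard-library class statistics.NormalDist; the coverage value 0.9973 (approximately the mass within three standard deviations) and the function name probability_baseline are assumptions.

```python
from statistics import NormalDist

def probability_baseline(history, coverage=0.9973):
    """Fit a normal distribution to the history and return the central
    interval that contains the given probability mass.
    Needs at least two samples that are not all identical."""
    dist = NormalDist.from_samples(history)
    tail = (1 - coverage) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# Hypothetical history of one indicator over earlier preset time periods.
history = [22, 23, 24, 23, 22, 24, 23, 23]
lower, upper = probability_baseline(history)
```

Choosing a smaller coverage narrows the interval and makes the resulting baseline more sensitive.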
Further, the determination unit is specifically configured to determine the anomaly measure function corresponding to each high-availability indicator of the current preset time period according to the indicator baseline formula corresponding to each high-availability indicator of the current preset time period, together with the period-over-period ratio and/or year-on-year ratio between each high-availability indicator of the current preset time period and the same indicator of the previous preset time period.
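The text does not specify how the baseline and the ratios are combined. The following sketch assumes one possible rule: a value is flagged only if it lies outside the baseline interval and also deviates from every available reference value by more than a relative threshold. The names make_measure_function, previous_value, same_period_value, and max_ratio_change are hypothetical.

```python
def make_measure_function(baseline, previous_value=None,
                          same_period_value=None, max_ratio_change=0.5):
    """Build an anomaly measure function for one indicator.
    previous_value: value in the previous period (period-over-period reference).
    same_period_value: value in the same period of an earlier cycle
    (year-on-year reference)."""
    lower, upper = baseline

    def measure(value):
        # Inside the baseline interval: no early warning.
        if lower <= value <= upper:
            return False
        # Outside the interval: suppress the warning if the value is still
        # close to a reference value (None or zero references are skipped).
        for reference in (previous_value, same_period_value):
            if reference:
                if abs(value - reference) / abs(reference) <= max_ratio_change:
                    return False
        return True

    return measure

# Hypothetical usage for a single indicator.
measure = make_measure_function((20, 26), previous_value=24, same_period_value=25)
steady = make_measure_function((20, 26), previous_value=55)
```

Under these assumptions, measure(60) requests a warning, while steady(60) does not because the previous period was already at a similar level.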
Optionally, as one embodiment, the device further includes:
an early-warning unit, configured to aggregate, along the cluster dimension and the single-machine dimension respectively, the multiple items of high-availability indicator data obtained by the first acquisition unit 41 and the results obtained by the assessment unit 43 of whether each item of high-availability indicator data requires an early warning, and to assemble them into a warning information message; and to send the warning information message, in a preset manner, to the preset terminal corresponding to the single machine or cluster to which the message relates.
Further, the preset manner includes one or more of the following:
IM notification, short message, and telephone.
In the device provided by the embodiments of this specification, the distributed system is a cluster composed of multiple single machines. First, the first acquisition unit 41 obtains multiple items of high-availability indicator data of the distributed system within the current preset time period, the multiple items of high-availability indicator data including multiple items of single-machine high-availability indicator data of each single machine and multiple items of cluster high-availability indicator data of the cluster. Next, the second acquisition unit 42 obtains the anomaly measure function corresponding to each high-availability indicator in the current preset time period. Then, the assessment unit 43 uses the corresponding anomaly measure functions to assess each item of high-availability indicator data in the current preset time period and obtains the result of whether each item requires an early warning. Thus, the embodiments of this specification obtain not only the multiple items of cluster high-availability indicator data of the cluster but also the multiple items of single-machine high-availability indicator data of each single machine, and judge according to the anomaly measure functions whether each item of high-availability indicator data requires an early warning (i.e., whether an anomaly has occurred), where high-availability indicators of different preset time periods or of different items may correspond to different anomaly measure functions. Through this fine-grained monitoring of the high-availability indicators of the distributed system, high-availability problems can be discovered in time and located precisely, which promotes rapid emergency recovery.
According to an embodiment of another aspect, a computer-readable storage medium is also provided, on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to perform the method described with reference to Fig. 2 or Fig. 3.
According to an embodiment of yet another aspect, a computing device is also provided, including a memory and a processor, where executable code is stored in the memory and, when the processor executes the executable code, the method described with reference to Fig. 2 or Fig. 3 is implemented.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.