CN103428026B - Method and system for problem determination and diagnosis in shared dynamic clouds - Google Patents
- Publication number
- CN103428026B (application number CN201310174315.3A)
- Authority
- CN
- China
- Prior art keywords
- metric
- event
- virtual machine
- deviation
- module
- Prior art date
- Legal status
- Active
Abstract
The present invention relates to a method and system for problem determination and diagnosis in shared dynamic clouds. The method includes: monitoring each virtual machine and physical server in the shared dynamic cloud environment for at least one metric; identifying problem symptoms from the monitoring and generating events; analyzing the events to determine deviations from normal behavior; and classifying the events as cloud-based anomalies or application faults according to existing knowledge.
Description
Technical field
Embodiments of the invention relate generally to information technology and, more particularly, to virtualization technology.
Background
Failures occur frequently in large-scale data centers and are a major factor in data center management costs.
Data centers can generate and monitor large amounts of data, but detecting faults and their root causes from this data is difficult. Typically, this task is performed by manually observing the data against predefined thresholds.
Additionally, virtualized cloud environments present new challenges, including shared resources, a dynamic operating environment in which virtual machines (VMs) are migrated and resized, and varying workloads. Shared resources contend for cache, disk and network resources, which makes it challenging to distinguish performance problems caused by contention from application faults spanning several resources. Distinguishing performance anomalies from workload changes poses further challenges due to this dynamism.
Virtualization is increasingly used in emerging cloud systems. The various anomalies or faults that occur in such large-scale setups significantly increase total management costs and degrade the performance of the underlying applications. Accordingly, a need exists for efficient fault management techniques for dynamic shared clouds that can distinguish cloud-related anomalies from application faults.
Summary of the invention
In one aspect of the invention, techniques for problem determination and diagnosis in a shared dynamic cloud environment are provided. An exemplary computer-implemented method for problem determination and diagnosis in a shared dynamic cloud can include the following steps: monitoring each virtual machine and physical server in the shared dynamic cloud environment for at least one metric; identifying a problem symptom from the monitoring and generating an event; analyzing the event to determine a deviation from normal behavior; and classifying the event as a cloud-based anomaly or an application fault according to existing knowledge.
In another aspect of the invention, a system for problem determination and diagnosis in a shared dynamic cloud environment is provided. The system includes a memory, at least one processor coupled to the memory, and at least one distinct software module, each distinct software module being embodied on a tangible computer-readable medium. The software modules include: a monitoring engine module, executing on the processor, for monitoring each virtual machine and physical server in the shared dynamic cloud environment for at least one metric and outputting a monitoring-data time series corresponding to each metric; an event generation engine module, executing on the processor, for identifying a problem symptom from the monitoring and generating an event; a problem determination engine module, executing on the processor, for analyzing the event to determine and localize a deviation from normal behavior; and a diagnosis engine module, executing on the processor, for classifying the event as a cloud-based anomaly or an application fault according to existing knowledge.
In yet another aspect of the invention, a method for determining virtual machine behavior under multiple operating environments in a system includes: monitoring at least one resource at the level of each virtual machine in the system; monitoring the aggregate usage of each resource at the physical host level in the system; capturing multiple metrics that govern the cumulative usage of each resource across all virtual machines on each physical host in the system; and analyzing the metrics to determine virtual machine behavior under the multiple operating environments in the system.
Another aspect of the invention, or elements thereof, can be implemented in the form of an article of manufacture tangibly embodying computer-readable instructions which, when executed, cause a computer to carry out a plurality of the method steps described herein. Furthermore, another aspect of the invention, or elements thereof, can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform the method steps. Yet further, another aspect of the invention, or elements thereof, can be implemented in the form of means for carrying out the method steps described herein, or elements thereof; the means can include (i) hardware module(s), (ii) software module(s), or (iii) a combination of hardware and software modules; any of (i)-(iii) implement the specific techniques set forth herein, and the software modules are stored in a tangible computer-readable storage medium (or multiple such media).
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
Brief description of the drawings
Fig. 1 is a diagram illustrating system architecture, according to an embodiment of the invention;
Fig. 2 is a table illustrating example fault signature parameters, according to an embodiment of the invention;
Fig. 3 is a diagram illustrating example fault signatures, according to an embodiment of the invention;
Fig. 4 is a table illustrating example system and application metrics, according to an embodiment of the invention;
Fig. 5 is a table illustrating detected fault scenarios, according to an embodiment of the invention;
Fig. 6 is a flow diagram illustrating techniques for problem determination and diagnosis in a shared dynamic cloud environment, according to an embodiment of the invention; and
Fig. 7 is a system diagram of an exemplary computer system on which at least one embodiment of the invention can be implemented.
Detailed description of the invention
As described herein, an aspect of the present invention includes problem determination and diagnosis in shared dynamic clouds. At least one embodiment of the invention includes a three-stage approach that identifies faults as early as possible with limited training or domain knowledge. After a fault is initially detected and localized to the affected virtual machines and resources, a diagnosis engine uses expert-defined fault signatures to classify the fault into one of several types (cloud-related or application-related). As detailed herein, the framework of at least one embodiment of the invention allows the system to be extended to new applications, monitored metrics and analysis techniques.
Accordingly, embodiments of the invention include distinguishing cloud-related anomalies from application faults. By way of example, cloud-related anomalies can include incorrect virtual machine (VM) sizing, interference caused by resource sharing, and reconfiguration and/or migration. Non-cloud (application) faults can include, for example, misconfigured applications, software bugs, hardware failures, workload changes, and application or operating system (OS) updates.
Additionally, as described herein, at least one embodiment of the invention adapts to rapid changes in the environment by using operating context information together with monitored VM, resource and application metrics.
Further, in an example embodiment of the invention, results from different analysis techniques can be combined to improve accuracy.
As further detailed herein, monitoring data can include various system and application metrics, such as central processing unit (CPU) and memory utilization, cache hit/miss rate, disk reads/writes, network utilization, page faults, context switches of each VM, and end-to-end latency. After pre-processing, each metric can be represented as a moving-average time series, and this standard format allows the problem detection techniques to be extended to new metrics. Monitoring can also cover the operating environment, including the context in which an application executes, in order to capture co-located applications. Physical host CPU, memory, cache and network usage can also be used for characterization.
As noted, at least one embodiment of the invention includes a three-stage approach for problem determination and reliable diagnosis. In such an example embodiment, the first stage identifies potential symptoms of anomalous behavior and generates events, each event being localized to the specific VM and resource involved. The second stage analyzes each event further by computing correlations with the other resources on the involved VM, and, for the involved resource, across the other VMs running the same application. The third, diagnosis stage identifies any significant deviation of those correlation values from normal usage behavior in order to classify the fault as one of several cloud-related and/or application-related anomalies. Thus, a comprehensive diagnosis is triggered only when the lightweight event generation stage detects a deviation from normal behavior.
At least one embodiment of the invention can treat applications as black boxes and requires limited or no training to handle new applications. Unlike many existing problem determination approaches, which monitor only one or two resources (such as CPU and memory), at least one embodiment of the invention monitors a wide range of resources, including CPU, memory, cache, disk and network, which enables accurate detection and localization of multiple classes of faults.
Furthermore, at least one embodiment of the system is automated and requires manual intervention only when an anomaly is flagged. The system indicates the potential source of the problem to aid recovery.
The diagnosis stage of at least one embodiment of the invention relies on expert knowledge to classify faults accurately. For example, expert knowledge can be provided through standardized fault signatures, which capture how each fault manifests as deviations from normal behavior in different metric data. When anomalous behavior is detected, the fault is classified by matching the nature of the deviation from normal behavior against the available fault signatures. The fault signatures can be expressed in a standardized Extensible Markup Language (XML) format, and experts can add or edit knowledge about fault scenarios for future classification.
Another concept described herein is the operating environment. To detect faults in the system accurately, at least one embodiment of the invention monitors various resources at the level of each VM and monitors the aggregate usage of each resource at the physical host level. The operating environment captures metrics that govern the cumulative usage of resources across all VMs on each physical host. The behavior of a particular VM may be normal under one operating environment but abnormal under another.
For example, consider a VM running an application whose latency exceeds the normal and/or acceptable latency. Memory utilization on this VM is high, and the physical server hosting the VM shows high page faults and cache misses. This information alone may not reveal the nature of the problem, because the symptoms could be caused by a high workload, an application-related anomaly, or a live VM migration from one server to another. Analyzing the operating context helps identify the problem. A live VM migration shows high page faults and cache misses on both the source and destination servers, while the other VMs and servers hosting the same application are unaffected. This impact is also temporary, lasting only until the migration completes, after which application performance should return to normal. An application anomaly affects only the server hosting this VM, and the other VMs running the same application are unaffected. An increased workload affects all VMs running the same application and is revealed by the correlations across VMs.
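The reasoning in this example can be expressed as a few toy rules. The sketch below is a minimal illustration under assumed inputs, not the classifier of the invention: the `Observation` record and its fields (which application VMs are slow, which hosts show abnormal page faults and cache misses, and the source and destination of any in-progress live migration) are hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical summary of what monitoring reports for one application."""
    slow_vms: set[str]                  # VMs of the application with abnormal latency
    app_vms: set[str]                   # all VMs running the same application
    hot_hosts: set[str]                 # hosts with abnormal page faults / cache misses
    migration: tuple[str, str] | None   # (source, destination) of a live migration, if any


def explain(obs: Observation) -> str:
    """Toy rules mirroring the example; real diagnosis uses correlations and signatures."""
    # A workload increase affects every VM running the application.
    if obs.slow_vms == obs.app_vms:
        return "workload increase"
    # A live migration shows up on both the source and the destination host.
    if obs.migration is not None and set(obs.migration) <= obs.hot_hosts:
        return "live VM migration"
    # An application anomaly is confined to a single VM and its host.
    if len(obs.slow_vms) == 1:
        return "application anomaly"
    return "undetermined"
```

In practice these rules are replaced by the correlation analysis and fault signatures described below; the sketch only shows why the operating context is needed to tell the three cases apart.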
Fig. 1 is a diagram illustrating system architecture, according to an embodiment of the invention. By way of illustration, Fig. 1 depicts a cloud infrastructure 102 that includes multiple clusters. For example, one cluster is shown as server host 104, which includes VMs 106, 108 and 110, and another cluster is shown as server host 112, which includes VMs 114, 116 and 118.
Fig. 1 also depicts system components, including a monitoring engine 120, an event generation engine 132, a problem determination engine 138 and a diagnosis engine 146. As described herein, the monitoring engine 120 monitors each virtual machine and physical server for various metrics (covering CPU, memory, cache, network and disk resources). The event generation engine 132 identifies potential symptoms of a fault and generates events. The problem determination engine 138 analyzes the events to determine deviations from normal behavior, and the diagnosis engine 146 classifies the anomalies according to expert knowledge. Each stage is described further below in connection with Fig. 1.
As noted, the monitoring engine 120 collects and processes, for each virtual machine and physical server, various system and application metrics related to CPU, memory, cache, network and disk resources. Specifically, a system metrics profiler module 122 collects the metrics and passes them to a data pre-processor module 126, which performs filtering and smoothing operations via sub-module 128. Via sub-module 130, pre-processing computes sample means over a moving window, removes erroneous outliers, and smooths out any kinks in the data. The monitoring engine 120 also includes an application metrics collector module 124 for collecting application metrics.
Because anomalous behavior can be identified only as a trend and not from any single sample value, the algorithms in at least one embodiment of the invention process time-series data. A data point (defined here as a series of moving averages over a fixed time interval) is the basic unit of input to the anomaly detection algorithms in at least one embodiment of the invention. Over time, multiple data points are generated for each monitored metric. The output of the monitoring engine 120 is therefore a stream of data points, each derived from the monitoring-data time series corresponding to a system or application metric. Moreover, the system can be adapted to monitor new metrics that can be represented as similar time-series data.
Consider the following example of constructing data points from time-series data. Assume we start with the following 12 CPU utilization samples: {34.2, 26.8, 32, 27.7, 38.2, 29, 30.1, 28.3, 33.5, 31.1, 27.8, 39}. Next, moving averages are taken over a window size (for example, w = 3): {31, 33, 28.8, 32.6, 32.4, 29.1, 30.6, 30.9, 30.8, 32.6, 30.3, 29.6}. Data points are then extracted from this moving-average time series. For example, if the interval length of each data point is set to k = 4, the following three data points are obtained: [31, 33, 28.8, 32.6], [32.4, 29.1, 30.6, 30.9] and [30.8, 32.6, 30.3, 29.6].
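The construction can be reproduced with a short sketch, assuming a trailing moving average over full windows and consecutive, non-overlapping data points. Note that twelve samples with w = 3 yield only ten averages (31.0, 28.8, 32.6, 31.6, 32.4, 29.1, 30.6, 31.0, 30.8, 32.6 after rounding), so the sketch forms two complete data points and drops the two leftover averages; the twelve-value averaged series quoted above does not follow exactly from the twelve listed samples.

```python
from __future__ import annotations


def moving_average(samples: list[float], w: int) -> list[float]:
    """Trailing moving average; returns len(samples) - w + 1 values."""
    return [sum(samples[i:i + w]) / w for i in range(len(samples) - w + 1)]


def to_data_points(averages: list[float], k: int) -> list[list[float]]:
    """Cut the averaged series into consecutive, non-overlapping data points of length k.

    A trailing chunk shorter than k is discarded.
    """
    return [averages[i:i + k] for i in range(0, len(averages) - k + 1, k)]


cpu = [34.2, 26.8, 32, 27.7, 38.2, 29, 30.1, 28.3, 33.5, 31.1, 27.8, 39]
averages = moving_average(cpu, w=3)
points = to_data_points(averages, k=4)
print([[round(v, 1) for v in p] for p in points])
# [[31.0, 28.8, 32.6, 31.6], [32.4, 29.1, 30.6, 31.0]]
```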
Note that system metrics are collected at both the virtual machine level and the physical host level. This is because some resources (such as CPU and memory) are typically partitioned among the VMs on a physical host, while other resources (such as cache) are shared. To capture the context or state of the physical host on which a particular VM runs, at least one embodiment of the invention defines the concept of an operating environment. As described above, the operating environment captures metrics that govern the cumulative usage of resources across all or multiple VMs on each physical host. For example, this can include the total CPU and memory utilization and the cache hit rate of each physical host (across all of its hosted VMs). For instance, in the case of a live VM migration that causes abnormal application behavior (such as increased application latency), the cache hits/misses and page faults on the physical server hosting the faulty VM can be observed to be abnormal (extreme) compared with the cache hits/misses and page faults on the other physical servers hosting other VMs of the same application.
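A per-host operating context can be derived by aggregating VM-level samples. This is a minimal sketch under assumed inputs: the `VmSample` record, its field names and units, and the choice to compute the host cache hit rate as total hits over total accesses (rather than averaging per-VM rates) are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VmSample:
    host: str
    vm: str
    cpu_pct: float       # CPU utilization as a percentage of host capacity (assumed unit)
    mem_mb: float        # memory in use, in MB (assumed unit)
    cache_hits: int
    cache_accesses: int


def operating_context(samples: list[VmSample]) -> dict[str, dict[str, float | None]]:
    """Aggregate VM-level samples into per-host totals across all hosted VMs."""
    # Per host: [cpu_pct, mem_mb, cache_hits, cache_accesses]
    totals: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for s in samples:
        t = totals[s.host]
        t[0] += s.cpu_pct
        t[1] += s.mem_mb
        t[2] += s.cache_hits
        t[3] += s.cache_accesses
    return {
        host: {
            "cpu_pct": cpu,
            "mem_mb": mem,
            # Hit rate over all accesses on the host; None if there were no accesses.
            "cache_hit_rate": hits / accesses if accesses else None,
        }
        for host, (cpu, mem, hits, accesses) in totals.items()
    }
```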
As noted, the event generation engine 132 identifies potential symptoms of anomalous behavior and triggers events, using information about the specific VM(s) and resource(s) that deviate from normal behavior. Generating an event does not by itself indicate the presence of a fault; it only indicates that a fault is possible. Further analysis may be needed to confirm whether a fault exists and, if so, to classify it. For example, a heavy (but non-faulty) workload of a magnitude not seen before may trigger an event because the memory utilization of a VM is high. However, further analysis (performed by the problem determination engine 138) may reveal that this is correlated with high utilization of other resources (such as CPU and disk) or of other VMs running the same application, in which case the situation is identified as a high-workload scenario rather than a fault.
To reiterate, the role of the event generation engine 132 is to locate potential anomalous behavior so that the particular VM(s) and resource(s) can be analyzed further. This makes it easier to scale at least one embodiment of the invention to large systems that generate large volumes of monitoring data. Additionally, the event generation engine 132 uses one of several machine learning techniques to build a model of normal usage behavior via a model builder module 134. The model detects deviations from normal behavior in the observed data and triggers events, which are output to an event analyzer module 136.
The model builder module 134 can implement modeling techniques (such as hidden Markov models (HMMs), nearest neighbor and k-means) as well as statistical techniques (such as regression). These techniques attempt to measure, qualitatively and/or quantitatively, the degree of deviation between the analyzed data point and past observations. The framework of at least one embodiment of the invention allows any modeling technique to be plugged in as part of the event generation engine 132.
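The plug-and-play arrangement can be sketched as a small interface that any normal-behavior model implements, with the event generator depending only on that interface. The `NormalBehaviorModel` protocol, the `Event` record and the toy `MeanDeviationModel` below are hypothetical stand-ins chosen for brevity; an HMM, nearest-neighbor or regression model would plug in the same way.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


class NormalBehaviorModel(Protocol):
    def fit(self, points: Sequence[Sequence[float]]) -> None: ...
    def score(self, point: Sequence[float]) -> float: ...  # larger means more deviant


@dataclass
class Event:
    vm: str
    metric: str
    score: float


class MeanDeviationModel:
    """Toy model: distance of a data point's mean from the training mean, in standard deviations."""

    def fit(self, points: Sequence[Sequence[float]]) -> None:
        means = [sum(p) / len(p) for p in points]
        self.mu = sum(means) / len(means)
        var = sum((m - self.mu) ** 2 for m in means) / len(means)
        self.sigma = var ** 0.5 or 1.0  # avoid dividing by zero on a perfectly flat history

    def score(self, point: Sequence[float]) -> float:
        return abs(sum(point) / len(point) - self.mu) / self.sigma


def generate_events(model: NormalBehaviorModel, vm: str, metric: str,
                    points: Sequence[Sequence[float]], threshold: float) -> list[Event]:
    """Emit an event for every data point whose deviation score exceeds the threshold."""
    return [Event(vm, metric, s) for s in map(model.score, points) if s > threshold]
```

Swapping in another technique only requires a class with the same `fit` and `score` methods; `generate_events` stays unchanged.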
By way of example, HMM-based modeling involves training an HMM on a large number (for example, hundreds) of data points, so that the model captures a notion of normal behavior. Because application behavior may change with load or workload mix, at least one embodiment of the invention includes training multiple HMMs (for example, one per typical application workload scenario, where knowledge of such scenarios is available). To keep this step fast and low-overhead, the number of HMMs created is limited. A new test data point can be presented to an HMM to determine how well it conforms to the model. An advantage of this approach is that an HMM captures the sequence of data samples, not merely the set of samples that make up the data point.
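Scoring a new data point against several scenario-specific HMMs can be sketched with the scaled forward algorithm. This is an illustration under stated assumptions: utilization values are discretized into three hypothetical levels, and the two models' parameters are hand-set rather than trained (training, for example with Baum-Welch, is omitted). The best-matching model is taken to be the one with the highest log-likelihood per symbol.

```python
import math


def discretize(values, edges=(40.0, 70.0)):
    """Map utilization percentages to symbols 0 (low), 1 (medium), 2 (high); edges are assumed."""
    return [sum(v >= e for e in edges) for v in values]


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-emission HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    ll = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")  # the sequence is impossible under this model
        ll += math.log(total)
        alpha = [a / total for a in alpha]  # rescale to avoid numeric underflow
    return ll


def best_match(values, models):
    """Return (scenario name, per-symbol log-likelihood) of the best-fitting HMM."""
    obs = discretize(values)
    scores = {name: log_likelihood(obs, *params) / len(obs) for name, params in models.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical hand-set models for two workload scenarios: (start, transition, emission).
MODELS = {
    "light load": ([0.9, 0.1], [[0.9, 0.1], [0.2, 0.8]], [[0.8, 0.15, 0.05], [0.3, 0.6, 0.1]]),
    "heavy load": ([0.1, 0.9], [[0.8, 0.2], [0.1, 0.9]], [[0.1, 0.6, 0.3], [0.05, 0.15, 0.8]]),
}
```

A data point whose best per-symbol log-likelihood falls below a level observed on normal data would then be treated as a deviation.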
Additionally, the nearest-neighbor technique involves computing a distance metric between the data point under consideration and a given set of model data points. The model data point set can be similar to the training set used for HMM modeling, or it can be a certain number of data points selected from the recent past. The latter approach makes it possible to compare the application's footprint over time and to detect changes at various time scales. A larger distance indicates a larger deviation from normal or expected behavior. Note that this technique does not require any form of training.
The distance metric can be chosen from several example alternatives: a simple vector difference between corresponding samples of two data points, the difference between aggregate statistics (mean and standard deviation) computed from the individual samples, the distance in Euclidean space, or any combination of these. Unlike an HMM, the distance metric provides a quantitative measure of the degree of deviation observed in the system.
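A nearest-neighbor deviation score using the alternative distance metrics listed above might look like the following sketch. It assumes equal-length data points; reading the "simple vector difference" as the sum of absolute differences, and weighting mean and standard deviation equally in `stats_diff`, are assumptions.

```python
import math
from statistics import mean, pstdev


def vector_diff(a, b):
    """Sum of absolute differences between corresponding samples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def stats_diff(a, b):
    """Difference between aggregate statistics (mean and population standard deviation)."""
    return abs(mean(a) - mean(b)) + abs(pstdev(a) - pstdev(b))


def euclidean(a, b):
    """Distance in Euclidean space."""
    return math.dist(a, b)


def nn_deviation(point, model_points, distance=euclidean):
    """Distance from a data point to its nearest neighbor in the model set.

    The model set can be training data or recent data points; no training is needed.
    Larger values indicate a larger deviation from normal behavior.
    """
    return min(distance(point, m) for m in model_points)
```

A combined metric is just another callable, for example `lambda a, b: euclidean(a, b) + stats_diff(a, b)`.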
Further, many statistical techniques have been developed for analyzing continuous data. For example, a model of normal behavior can be built using known regression techniques, and any new data point can be compared with this model to determine how well it conforms. This indicates whether the data point conforms to the model or, if it does not, the degree to which it deviates. Other statistical tests can include, for example, goodness-of-fit tests or threshold-based tests. Statistical techniques can also be applied to histograms derived from the data points rather than to the data points themselves.
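A regression-based conformance check can be sketched as an ordinary least-squares line fitted to normal data, with each new observation scored by its residual in units of the residual standard deviation. Regressing a VM's CPU utilization on its request rate is a hypothetical choice of variables made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, residual standard deviation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in residuals) / max(n - 2, 1)) ** 0.5
    return a, b, sd


def deviation(model, x, y):
    """Residual of (x, y) in units of the model's residual standard deviation."""
    a, b, sd = model
    residual = abs(y - (a + b * x))
    if sd == 0:
        return 0.0 if residual == 0 else float("inf")
    return residual / sd


# Hypothetical training data: request rate (req/s) versus CPU utilization (%).
normal = fit_line([100, 200, 300, 400, 500], [12, 21, 33, 41, 52])
```

Flagging scores above a fixed level (say 3) is one simple threshold-based test; a goodness-of-fit test over a whole data point is an alternative.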
As noted above, the problem determination engine 138 (which includes a statistical analyzer module 140 and a fault classifier module 142) uses statistical correlations across VMs and resources to identify anomalies and localize them to the affected resource(s) and VM(s). For each event generated by the event generation engine 132, this stage analyzes the data further to identify anomalies. Therefore, in at least one embodiment of the invention, the problem determination engine 138 is never invoked unless an event is generated. As an example, assume an event is generated for metric $M_j$ on virtual machine $VM_i$. This stage then computes the correlations between the data for $M_j$ and the data for each other metric on $VM_i$, and between the data for $M_j$ on $VM_i$ and the data for $M_j$ on each other VM running the same application (all of this information is available). Based on knowledge of typical correlation values under normal behavior, any significant deviation from the normal correlation values is flagged. These deviations are provided to the diagnosis engine 146 so that they can be classified into different categories of fault and non-fault behavior according to expert knowledge.
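The correlations computed for one event can be sketched as follows. The dictionary layout keyed by `(vm, metric)`, the helper names, and the convention of returning 0 for a constant series (whose correlation is undefined) are assumptions. For an event on a given metric of a given VM, the sketch returns the correlations with the other metrics on that VM and with the same metric on the other VMs of the application, and flags those that moved more than a tolerance away from a normal baseline.

```python
from __future__ import annotations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either series is constant (assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def event_correlations(series, vm, metric, app_vms):
    """series maps (vm, metric) -> equal-length list of values.

    Returns correlations of (vm, metric) with every other metric on the same VM and
    with the same metric on every other VM running the application.
    """
    target = series[(vm, metric)]
    same_vm = {m: pearson(target, s) for (v, m), s in series.items() if v == vm and m != metric}
    cross_vm = {v: pearson(target, series[(v, metric)]) for v in app_vms if v != vm}
    return same_vm, cross_vm


def flag_deviations(current, baseline, tolerance):
    """Keys whose correlation moved more than `tolerance` away from its normal value."""
    return {k for k, c in current.items() if abs(c - baseline.get(k, c)) > tolerance}
```

With m metrics per VM and v application VMs this computes (m - 1) + (v - 1) correlations per event, which is why the analysis stays cheap even as the system grows.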
As an example, let $T_{\mathrm{training}}$ denote the time series of training data points within some window $W_{\mathrm{tr}}$, and let $T_{\mathrm{test}}$ denote the time series of test data points within some window $W_{\mathrm{te}}$, such that the number of data points in $T_{\mathrm{test}}$ is less than or equal to the number of data points in $T_{\mathrm{training}}$. The training and test time series are combined to produce a correlation sample time series, denoted $T_{\mathrm{correlation}}$. Note that each system metric on each VM has a corresponding time series. Let $vm$ and $r$ denote a virtual machine and a system metric, respectively, and let $T_{X\text{-}vm\text{-}r}$ denote time series $X$ (where $X$ can be the training, test or correlation sample series) of system metric $r$ on virtual machine $vm$. Next, compute the correlation $c_1$ between $T_{\mathrm{correlation}\text{-}vm_1\text{-}r}$ and $T_{\mathrm{correlation}\text{-}vm_2\text{-}r}$, and similarly the correlation $c_2$ between $T_{\mathrm{training}\text{-}vm_1\text{-}r}$ and $T_{\mathrm{training}\text{-}vm_2\text{-}r}$. The resulting correlation values $c_1$ and $c_2$ are compared with each other to characterize the event: using a threshold $T$, the event is classified as a workload intensity change if $|c_1 - c_2| \le T$, or as an anomaly if $|c_1 - c_2| > T$.
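The comparison of c1 and c2 can be sketched directly from these definitions. Representing each time series as a plain list of values, the threshold of 0.2 and the synthetic data are assumptions made for illustration.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def classify_event(train1, train2, test1, test2, threshold):
    """train*/test*: metric r on vm1 and vm2. Returns (label, c1, c2)."""
    if len(test1) > len(train1) or len(test2) > len(train2):
        raise ValueError("test window must not be longer than the training window")
    c2 = pearson(train1, train2)                  # baseline: training windows only
    c1 = pearson(train1 + test1, train2 + test2)  # correlation series: training then test
    label = "workload change" if abs(c1 - c2) <= threshold else "anomaly"
    return label, c1, c2


train_vm1 = [10, 20, 30, 20, 10, 20, 30, 20]
train_vm2 = [12, 22, 32, 22, 12, 22, 32, 22]
# Both VMs rise together, so the cross-VM correlation is preserved.
print(classify_event(train_vm1, train_vm2, [60, 70, 80, 70], [62, 72, 82, 72], 0.2)[0])  # workload change
# Only vm1 rises, so the correlation collapses (c1 is about 0.29 against c2 = 1.0).
print(classify_event(train_vm1, train_vm2, [60, 70, 80, 70], [12, 22, 32, 22], 0.2)[0])  # anomaly
```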
This procedure handles workload changes reliably, because a change in workload (illustrated, for example, by component 144 in Fig. 1) does not significantly affect the correlations across VMs running the same application; that is, the CPU and memory utilization of all VMs increase together as the workload increases. By contrast, an anomaly affecting a particular VM and resource is reflected in the metrics governing that resource, which will no longer be correlated with those of the other VMs running the same application.
The total number of correlations computed in this process is the number of metrics plus the number of VMs running the application. Note that at least one embodiment of the invention analyzes the neighborhood of the location where the event occurred. This helps the approach scale to larger system sizes and larger numbers of monitored metrics.
The diagnosis engine 146 (which includes a fault localization module 148) uses predefined expert knowledge to classify the potentially anomalous scenarios detected by the problem determination engine 138 into one of various fault and non-fault categories. The expert knowledge can be supplied to the system in the form of standardized fault signatures. A fault signature captures a set of deviations from normal behavior, in the correlation values computed by the problem determination engine 138 and in the operating environment, that are characteristic of the fault it describes. When anomalous behavior is detected, an attempt is made to match the deviations from normal behavior against one or more known fault signatures. If a match is found, the diagnosis engine 146 classifies the fault. If no match is found, the diagnosis engine flags the behavior as anomalous behavior of which it has no knowledge. Fig. 2 describes the fault signature characteristics that help at least one embodiment of the invention distinguish between various fault scenarios.
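Signature matching can be sketched as a threshold check: a signature matches when every deviation it lists is at least as large as its threshold, and anything unmatched is reported as unclassified anomalous behavior. The `FaultSignature` record, the deviation keys and the threshold values are hypothetical; the parameters actually used are those summarized in Fig. 2.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FaultSignature:
    name: str
    thresholds: dict[str, float]  # deviation key -> minimum deviation needed to match


def diagnose(deviations: dict[str, float], signatures: list[FaultSignature]) -> list[str]:
    """Return the names of all matching signatures, or a generic label if none match."""
    matches = [
        sig.name
        for sig in signatures
        if all(deviations.get(key, 0.0) >= t for key, t in sig.thresholds.items())
    ]
    return matches or ["unclassified anomalous behavior"]


SIGNATURES = [
    FaultSignature("incorrect VM sizing",
                   {"cpu_corr": 0.3, "mem_corr": 0.3, "ctx_switch_corr": 0.3}),
    FaultSignature("live migration",
                   {"src_host_page_faults": 0.5, "dst_host_page_faults": 0.5}),
]
```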
Fig. 2 is a table 202 illustrating example fault signature parameters, according to an embodiment of the invention. At least one embodiment of the invention uses an XML format to describe fault signatures so that, as the system detects faults, it can be extended with new fault signatures (added, for example, by system experts). Thus, with expert help, at least one embodiment of the system can learn to classify new types of faults. An assumption made in connection with at least one embodiment of the invention is that each fault has a distinctive signature and that this signature can be expressed in terms of the monitored metrics.
Fig. 3 is a diagram illustrating example fault signatures 302, according to an embodiment of the present invention. By way of example, Fig. 3 shows expert-created signatures for two types of faults and one workload-change scenario. The signatures are described using different contexts, represented as tags. A signature captures only the metrics that deviate from normal behavior, and expresses them as thresholds. Each threshold represents the minimum deviation in the correlation value required to match the signature.
For example, CPU correlation is computed across all VMs that host the application. For the signature to match, the deviation of this correlation value from normal behavior must be at least as large as the threshold defined as (CPU-corr-thr). The context tags used include: (a) the VM environment context, which captures how the fault manifests at the level of virtualized resources; (b) the operating environment context, which captures metrics obtained at the physical host, expressed as aggregates across all VMs located on that host; (c) the hypervisor context, which captures any special log messages obtained from the hypervisor; and (d) the application context, which captures application-level performance metrics.
A fault signature can include one or more of the context tags that characterize the fault it defines. This allows at least one embodiment of the present invention to uniquely identify the signature. Fig. 3 shows example signatures for faults and for a workload change. A faulty live migration is distinguished from other faults by page faults on the source and destination hosts (operating environment context), and a wrong VM sizing is identified by significant deviations in CPU, memory and the number of context switches. Fig. 2 summarizes the different fault signature identifiers, which help at least one embodiment of the present invention to identify and classify each class and subclass of fault.
As detailed herein, at least one embodiment of the present invention includes monitoring various system and application metrics and detecting various faults. Accordingly, Fig. 4 is a table 402 illustrating example system and application metrics, according to an embodiment of the present invention. As described herein, at least one embodiment of the present invention includes a monitoring engine that collects metrics from the physical hosts, and from the virtual machines running on them, for fault detection and diagnosis. The monitoring data can span multiple system and application metrics. Table 402 in Fig. 4 lists the set of system metrics monitored by at least one embodiment of the present invention. The third column of table 402 specifies whether each metric is collected at the virtual machine level or at the physical server level.
Fig. 5 is a table 502 illustrating detected fault scenarios, according to an embodiment of the present invention. By way of example, Fig. 5 describes the various faults detected using at least one embodiment of the present invention. Details of the example faults are described below.
In the fault "wrong VM resource sizing", the VM resource allocation (CPU and memory) is misconfigured. The following misconfiguration scenarios are considered: (a) very low values are allocated to both the CPU reservation and the memory reservation of the target VM; (b) the CPU reservation is configured with a very low value, while the memory reservation has a valid value; and (c) the memory reservation is set to a very low value, while the CPU reservation has a valid value.
The fault "faulty VM live migration" reflects problems that can arise from the live migration of virtual machines. Two scenarios are considered. In the first scenario, a VM is migrated to a heavily congested physical host whose capacity is only just sufficient to accommodate the migrated VM. In the second scenario, a VM is migrated away from a heavily congested source host that lacks sufficient resources to perform the migration.
The "workload mix change" fault corresponds to a change in the workload mix or in the nature of the workload, which changes the degree to which the application uses different resources. The "workload intensity change" fault indicates that the intensity of the workload increases or decreases while the nature of the workload itself stays the same.
The "application misconfiguration" fault represents the situation in which one of the application parameters is set to an incorrect or invalid value. Further, "VM misconfiguration" captures situations in which a virtual machine is configured in a way that conflicts with certain configuration parameters used by the physical host. For example, if the CPUs on the source and destination hosts do not provide the same set of features to the virtual machine, activities such as VM live migration will fail. A "VM reconfiguration" fault can occur during VM reconfiguration, which can be carried out by dynamic VM resizing, by VM live migration, or by creating new VM instances in the shared cloud. The fault "impact due to resource sharing" refers to the situation in which two or more VMs hosted on the same physical machine contend for resources and the performance of one or more of those VMs suffers. For example, consider a cache-hungry application on one VM that is co-hosted with a cache-sensitive application on another VM. The cache-hungry VM uses the cache aggressively, which significantly degrades the cache hit rate and the performance of the other application. A CPU hog represents the scenario in which an application enters an infinite loop of computation that monopolizes (that is, uses most of) the CPU. This fault can be produced by introducing a block of C code that performs floating-point computation in an infinite loop. The loop consumes most of the VM's available processor cycles, which results in poor application performance.
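The CPU-hog injection described above uses a C loop. A Python equivalent is sketched here for illustration only. Unlike the infinite loop in the text, this version takes a hypothetical `duration_s` parameter so that it terminates. It returns the number of iterations performed.

```python
import math
import time


def cpu_hog(duration_s: float) -> int:
    """Burn CPU with floating-point work until duration_s seconds have elapsed.

    The fault described in the text loops forever; bounding it makes the
    sketch safe to run. Returns the number of loop iterations completed.
    """
    deadline = time.monotonic() + duration_s
    x = 0.0
    iterations = 0
    while time.monotonic() < deadline:
        # Pointless floating-point arithmetic that keeps one core busy.
        # The recurrence converges to a fixed point, so x never overflows.
        x = math.sqrt(x * x + 1.2345) * 0.999
        iterations += 1
    return iterations


print(cpu_hog(0.1) > 0)  # True
```

A single CPython thread keeps only one core busy. Saturating a multi-core VM would require running one copy per virtual CPU, for example with the `multiprocessing` module.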
A memory hog represents the scenario in which an application has a memory leak. This is realized by running a C program that consumes a large amount of memory in the target VM: it continually reserves memory from the heap through malloc and never frees the allocated blocks. Very little memory is left for the application, which causes a significant drop in application throughput.
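The leak can be imitated in Python by allocating buffers and keeping references to them so that they are never reclaimed. This is the analogue of calling `malloc` without a matching `free`. The `total_mb` cap and the `chunk_mb` size are illustrative parameters, added so that the sketch stops on its own.

```python
def memory_hog(total_mb: int, chunk_mb: int = 16) -> list[bytearray]:
    """Allocate total_mb megabytes in chunk_mb pieces and never release them.

    Holding every chunk in the returned list prevents the garbage
    collector from reclaiming it, which mimics a leak.
    """
    leaked: list[bytearray] = []
    allocated = 0
    while allocated < total_mb:
        size = min(chunk_mb, total_mb - allocated)
        # bytearray(n) allocates a buffer of n zero bytes.
        leaked.append(bytearray(size * 1024 * 1024))
        allocated += size
    return leaked


held = memory_hog(64)
print(sum(len(b) for b in held) // (1024 * 1024))  # 64
```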
Further, the disk hog case is similar to the CPU hog and memory hog scenarios. It can be realized by running multiple Hadoop sort instances in parallel to produce high disk utilization. In the fault referred to as network hog, the utilization of the network link is very high. Further, in the fault referred to as cache hog, a cache-intensive benchmark is co-located with the target VM to simulate cache hogging.
Fig. 6 is a flow diagram illustrating techniques for problem determination and diagnosis in a shared dynamic cloud environment, according to an embodiment of the present invention. Step 602 includes monitoring each virtual machine and each physical server in the shared dynamic cloud environment for at least one metric. The metrics can include metrics related to the central processing unit, memory, cache, network resources, disk resources, and so on. Monitoring can additionally include outputting a stream of data points, each data point derived from a time series of monitoring data corresponding to each system and application metric in the shared dynamic cloud environment. Also, in at least one embodiment of the present invention, monitoring includes monitoring at the virtual machine level and at the physical host level, respectively.
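Step 602 can be pictured as a stream of timestamped data points, one per (entity, metric) sample, from which per-metric time series are assembled. The record layout and the `level` field ("vm" or "host", echoing the third column of table 402) are assumptions made for illustration. A real monitoring engine would read these values from the hypervisor or from guest agents rather than from a list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DataPoint:
    timestamp: float
    level: str    # "vm" or "host"
    entity: str   # VM or physical host identifier
    metric: str   # e.g. "cpu", "memory", "disk_io"
    value: float


Series = dict[tuple[str, str], list[tuple[float, float]]]


def build_time_series(stream: Iterable[DataPoint]) -> Series:
    """Group a data-point stream into one time series per (entity, metric)."""
    series: Series = defaultdict(list)
    for point in stream:
        series[(point.entity, point.metric)].append((point.timestamp, point.value))
    # Sort each series by timestamp in case points arrived out of order.
    return {key: sorted(values) for key, values in series.items()}


stream = [
    DataPoint(2.0, "vm", "vm1", "cpu", 35.0),
    DataPoint(1.0, "vm", "vm1", "cpu", 30.0),
    DataPoint(1.0, "host", "host1", "cpu", 62.0),
]
print(build_time_series(stream)[("vm1", "cpu")])  # [(1.0, 30.0), (2.0, 35.0)]
```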
Step 604 includes identifying symptoms of a problem based on the monitoring, and generating events. Identifying symptoms can include identifying trends in the time series data. Further, at least one embodiment of the present invention includes using machine learning techniques to build a model of normal usage behavior, and using the model to detect deviations from normal behavior in the monitored data.
Step 606 includes analyzing the events to determine deviations from normal behavior. The analysis can include using statistical correlation across virtual machines and resources to localize the deviation with respect to the affected resources and virtual machines. Further, at least one embodiment of the present invention includes analyzing the neighborhood of the location where the event was generated.
Step 608 includes classifying the event as a cloud-based anomaly or an application fault based on existing knowledge. The existing (or expert) knowledge can include fault signatures, wherein a fault signature captures a set of deviations from normal behavior that are characteristic of the event. Further, at least one embodiment of the present invention includes, when an event is generated, matching the deviations from normal behavior against the fault signatures. If a match is found, the event is classified; if no match is found, the event is labeled as anomalous behavior.
As also described herein, at least one embodiment of the present invention includes determining virtual machine behavior under multiple operating environments in a system. This can include, for example: monitoring at least one resource at the level of each virtual machine in the system; monitoring the aggregate usage of each resource at the physical host level in the system; capturing multiple metrics that characterize the cumulative usage of each resource across all virtual machines on each physical host in the system; and analyzing the metrics to determine virtual machine behavior under multiple operating environments in the system. Additionally, at least one embodiment of the present invention can include detecting problems in the system based on the virtual machine behavior under multiple operating environments.
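The host-level monitoring in this method amounts to summing each resource's usage over the VMs placed on a host. The following minimal sketch assumes per-VM samples for a single time step and a known VM-to-host placement. Both data structures are hypothetical.

```python
from collections import defaultdict


def aggregate_by_host(vm_usage: dict[str, dict[str, float]],
                      placement: dict[str, str]) -> dict[str, dict[str, float]]:
    """Sum each resource's usage across all VMs on each physical host.

    vm_usage maps VM -> {resource: usage}; placement maps VM -> host.
    """
    totals: defaultdict[str, defaultdict[str, float]] = defaultdict(
        lambda: defaultdict(float))
    for vm, usage in vm_usage.items():
        host = placement[vm]
        for resource, value in usage.items():
            totals[host][resource] += value
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {host: dict(res) for host, res in totals.items()}


vm_usage = {
    "vm1": {"cpu": 1.5, "memory_gb": 2.0},
    "vm2": {"cpu": 0.5, "memory_gb": 4.0},
    "vm3": {"cpu": 2.0, "memory_gb": 1.0},
}
placement = {"vm1": "hostA", "vm2": "hostA", "vm3": "hostB"}
print(aggregate_by_host(vm_usage, placement))
# {'hostA': {'cpu': 2.0, 'memory_gb': 6.0}, 'hostB': {'cpu': 2.0, 'memory_gb': 1.0}}
```

Comparing these host totals with the per-VM view could help separate a VM that misbehaves on its own from contention among VMs that share a host.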
As described herein, the techniques depicted in Fig. 6 can also include providing a system, wherein the system includes distinct software modules, each of the distinct software modules being embodied on a tangible computer-readable recordable storage medium. All of the modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example. The modules can include any or all of the components detailed herein. In an aspect of the invention, the modules can run, for example, on a hardware processor. The method steps can then be carried out using the distinct software modules of the system, as described above, executing on a hardware processor. Further, a computer program product can include a tangible computer-readable recordable storage medium with code adapted to be executed to carry out at least one method step described herein, including the provision of the system with the distinct software modules.
Additionally, the techniques depicted in Fig. 6 can be implemented via a computer program product that can include computer usable program code stored in a computer readable storage medium in a data processing system, wherein the computer usable program code was downloaded over a network from a remote data processing system. Also, in an aspect of the invention, the computer program product can include computer usable program code stored in a computer readable storage medium in a server data processing system, wherein the computer usable program code is downloaded over a network to a remote data processing system for use in a computer readable storage medium with the remote system.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon.
An aspect of the invention or elements thereof can be implemented in the form of an apparatus including a memory and at least one processor that is coupled to the memory and operative to perform exemplary method steps.
Additionally, an aspect of the present invention can make use of software running on a general purpose computer or workstation. With reference to Fig. 7, such an implementation might employ, for example, a processor 702, a memory 704, and an input/output interface formed, for example, by a display 706 and a keyboard 708. The term "processor" as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term "processor" may refer to more than one individual processor. The term "memory" is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory), ROM (read only memory), a fixed memory device (for example, a hard drive), a removable memory device (for example, a diskette), a flash memory and the like. In addition, the phrase "input/output interface" as used herein is intended to include, for example, a mechanism for inputting data to the processing unit (for example, a mouse), and a mechanism for providing results associated with the processing unit (for example, a printer). The processor 702, memory 704, and input/output interface such as display 706 and keyboard 708 can be interconnected, for example, via bus 710 as part of a data processing unit 712. Suitable interconnections, for example via bus 710, can also be provided to a network interface 714, such as a network card, which can be provided to interface with a computer network, and to a media interface 716, such as a diskette or CD-ROM drive, which can be provided to interface with media 718.
Accordingly, computer software including instructions or code for performing the methodologies of the invention, as described herein, may be stored in associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.
A data processing system suitable for storing and/or executing program code will include at least one processor 702 coupled directly or indirectly to memory elements 704 through a system bus 710. The memory elements can include local memory employed during actual implementation of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during implementation.
Input/output or I/O devices (including but not limited to keyboards 708, displays 706, pointing devices, and the like) can be coupled to the system either directly (such as via bus 710) or through intervening I/O controllers (omitted for clarity).
Network adapters such as network interface 714 may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
As used herein, including the claims, a "server" includes a physical data processing system (for example, system 712 as shown in Fig. 7) running a server program. It will be understood that such a physical server may or may not include a display and keyboard.
As noted, aspects of the present invention may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon. Also, any combination of computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of at least one programming language, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks. Accordingly, an aspect of the invention includes an article of manufacture tangibly embodying computer readable instructions which, when implemented, cause a computer to carry out a plurality of method steps as described herein.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, component, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or by combinations of special purpose hardware and computer instructions.
It should be noted that any of the methods described herein can include an additional step of providing a system comprising distinct software modules embodied on a computer readable storage medium; the modules can include, for example, any or all of the components shown in Fig. 1. The method steps can then be carried out using the distinct software modules and/or sub-modules of the system, as described above, executing on a hardware processor 702. Further, a computer program product can include a computer-readable storage medium with code adapted to be executed to carry out at least one method step described herein, including the provision of the system with the distinct software modules.
In any case, it should be understood that the components illustrated herein may be implemented in various forms of hardware, software, or combinations thereof; for example, application specific integrated circuit(s) (ASICs), functional circuitry, an appropriately programmed general purpose digital computer with associated memory, and the like. Given the teachings of the invention provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a," "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of another feature, integer, step, operation, element, component, and/or group thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed.
At least one aspect of the present invention may provide a beneficial effect such as, for example, creating signatures for cloud reconfiguration activities in order to handle virtualization-driven reconfigurations.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (15)
1. A method for problem determination and diagnosis in a shared dynamic cloud environment, the method comprising:
monitoring each virtual machine and each physical server in the shared dynamic cloud environment for at least one metric;
identifying a symptom of a problem in the shared dynamic cloud environment based on the monitoring, and generating an event corresponding to the symptom;
analyzing the event to determine a deviation from normal behavior; and
classifying the event as a cloud-based anomaly or an application fault based on a comparison of the event with at least one fault signature, wherein the at least one fault signature captures a set of deviations from normal behavior;
wherein at least one of the steps is carried out by a computer device.
2. The method of claim 1, wherein the at least one metric comprises at least one metric related to a central processing unit, memory, cache, network resources and disk resources.
3. The method of claim 1, wherein the monitoring comprises outputting a stream of data points, each data point derived from a time series of monitoring data corresponding to each system and application metric in the shared dynamic cloud environment.
4. The method of claim 1, wherein the monitoring comprises monitoring at a virtual machine level and at a physical host level, respectively.
5. The method of claim 1, wherein the identifying comprises identifying trends from time series data.
6. The method of claim 1, comprising using machine learning techniques to build a model of normal usage behavior.
7. The method of claim 6, comprising using the model to detect deviations from normal behavior in the monitored data.
8. The method of claim 1, wherein the analyzing comprises using statistical correlation across virtual machines and resources to localize the deviation with respect to the affected resources and virtual machines.
9. The method of claim 1, comprising analyzing the neighborhood of the location where the event was generated.
10. A system for problem determination and diagnosis in a shared dynamic cloud environment, the system comprising:
a module adapted to monitor each virtual machine and each physical server in the shared dynamic cloud environment for at least one metric;
a module adapted to identify a symptom of a problem in the shared dynamic cloud environment based on the monitoring, and to generate an event corresponding to the symptom;
a module adapted to analyze the event to determine a deviation from normal behavior; and
a module adapted to classify the event as a cloud-based anomaly or an application fault based on a comparison of the event with at least one fault signature, wherein the at least one fault signature captures a set of deviations from normal behavior.
11. The system of claim 10, wherein the module adapted to monitor each virtual machine and physical server in the shared dynamic cloud environment for at least one metric comprises a sub-module adapted to output a stream of data points, each data point derived from a time series of monitoring data corresponding to each system and application metric in the shared dynamic cloud environment.
12. The system of claim 10, wherein the system comprises:
a module adapted to use machine learning techniques to build a model of normal usage behavior; and
a module adapted to use the model to detect deviations from normal behavior in the monitored data.
13. A system for problem determination and diagnosis in a shared dynamic cloud environment, the system comprising:
a memory;
at least one processor coupled to the memory; and
at least one distinct software module, each distinct software module being embodied on a tangible computer-readable medium, the at least one distinct software module comprising:
a monitoring engine module, executing on the processor, for monitoring each virtual machine and each physical server in the shared dynamic cloud environment for at least one metric, and outputting a time series of monitoring data corresponding to each metric;
an event generation engine module, executing on the processor, for identifying a symptom of a problem in the shared dynamic cloud environment based on the monitoring, and generating an event corresponding to the symptom;
a problem determination engine module, executing on the processor, for analyzing the event to determine and localize a deviation from normal behavior; and
a diagnosis engine module, executing on the processor, for classifying the event as a cloud-based anomaly or an application fault based on a comparison of the event with at least one fault signature, wherein the at least one fault signature captures a set of deviations from normal behavior.
14. A method for determining virtual machine behavior under multiple operating environments in a system, the method comprising:
monitoring at least one resource at the level of each virtual machine in the system;
monitoring the aggregate usage of each resource at the physical host level in the system;
capturing a plurality of metrics governing the cumulative usage of each resource over all virtual machines on each physical host in the system; and
analyzing the metrics to determine virtual machine behavior under the multiple operating environments in the system, wherein the analyzing comprises comparing the metrics with at least one fault signature, and wherein the at least one fault signature captures a set of deviations from normal behavior;
wherein at least one of the above steps is performed by a computer device.
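The aggregation step of claim 14, which computes the cumulative usage of each resource over all virtual machines on a physical host, can be illustrated with a short Python sketch. It assumes per-VM usage samples taken at a single instant and a known VM-to-host placement. The function `host_aggregate`, the resource names and the sample values are hypothetical. Applying the function at every timestamp would yield a per-host time series.

```python
from collections import defaultdict


def host_aggregate(vm_usage: dict[str, dict[str, float]],
                   placement: dict[str, str]) -> dict[str, dict[str, float]]:
    """Sum each resource's usage over all VMs placed on each physical host."""
    # host -> resource -> running total, created on first access
    totals = defaultdict(lambda: defaultdict(float))
    for vm, usage in vm_usage.items():
        host = placement[vm]
        for resource, value in usage.items():
            totals[host][resource] += value
    # Convert the nested defaultdicts into plain dicts for the caller.
    return {host: dict(resources) for host, resources in totals.items()}


vm_usage = {
    "vm-1": {"cpu": 30.0, "mem_gb": 2.0},
    "vm-2": {"cpu": 45.0, "mem_gb": 4.0},
    "vm-3": {"cpu": 10.0, "mem_gb": 1.0},
}
placement = {"vm-1": "host-a", "vm-2": "host-a", "vm-3": "host-b"}
print(host_aggregate(vm_usage, placement))
# {'host-a': {'cpu': 75.0, 'mem_gb': 6.0}, 'host-b': {'cpu': 10.0, 'mem_gb': 1.0}}
```

The per-host totals could then be compared with the host-level aggregate usage that the claim also monitors.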
15. The method according to claim 14, comprising detecting problems in the system based on the virtual machine behavior under the multiple operating environments.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/470,589 US8862727B2 (en) | 2012-05-14 | 2012-05-14 | Problem determination and diagnosis in shared dynamic clouds |
US13/470,589 | 2012-05-14 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103428026A (en) | 2013-12-04 |
CN103428026B (en) | 2016-11-30 |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101088072A (en) * | 2004-12-24 | 2007-12-12 | International Business Machines Corporation | A method and system for monitoring transaction based systems |
Non-Patent Citations (1)
Title |
---|
Chengwei Wang et al., "Statistical Techniques for Online Anomaly Detection in Data Centers," 2011 IFIP/IEEE International Symposium on Integrated Network Management (IM), 2011-05-27, pp. 385-391, Fig. 1 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9864676B2 (en) | Bottleneck detector application programming interface | |
US8862728B2 (en) | Problem determination and diagnosis in shared dynamic clouds | |
EP2956858B1 (en) | Periodicity optimization in an automated tracing system | |
US9658936B2 (en) | Optimization analysis using similar frequencies | |
US9767006B2 (en) | Deploying trace objectives using cost analyses | |
US8843901B2 (en) | Cost analysis for selecting trace objectives | |
US9021447B2 (en) | Application tracing by distributed objectives | |
US9424157B2 (en) | Early detection of failing computers | |
US20130283102A1 (en) | Deployment of Profile Models with a Monitoring Agent | |
CN107967485A (en) | Electro-metering equipment fault analysis method and device | |
CN111459700A (en) | Method and apparatus for diagnosing device failure, diagnostic device, and storage medium | |
WO2014200836A1 (en) | Systems and methods for monitoring system performance and availability | |
CN111949429A (en) | Server fault monitoring method and system based on density clustering algorithm | |
CN117170303B (en) | PLC fault intelligent diagnosis maintenance system based on multivariate time sequence prediction | |
Sirshar et al. | Comparative Analysis of Software Defect Prediction Techniques | |
CN103428026B (en) | Method and system for problem determination and diagnosis in shared dynamic clouds | |
CN112749003A (en) | Method, apparatus and computer-readable storage medium for system optimization | |
CN110263811A (en) | Equipment operating-state monitoring method and system based on data fusion | |
Xia et al. | Reducing the Length of Field-replay Based Load Testing | |
Tang et al. | Fine-Grained Diagnosis Method for Microservice Faults Based on Hierarchical Correlation Analysis | |
CN117873856A (en) | Software testing method, storage medium and computer equipment | |
Hentrich | Detecting Unusual Performance Behavior in Heterogeneous Environments | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20170505. Address after: Room 3, Pacific Plaza, 1 Queen's Road East, Wan Chai, Hong Kong, China. Patentee after: Oriental concept Limited. Address before: New York, USA. Patentee before: International Business Machines Corp. |