CN109933452A - An intelligent microservice monitoring method oriented to anomaly propagation - Google Patents
- Publication number
- CN109933452A (application CN201910220179.4A)
- Authority
- CN
- China
- Prior art keywords
- service
- interface
- anomaly
- metric
- microservices
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The present invention relates to an intelligent microservice monitoring method oriented to anomaly propagation. Service invocation information is monitored with agent technology, and a microservice call topology graph is built to characterize how anomalies propagate between microservices. Lasso regression is used to model the association between interface calls and metrics, and anomalous microservices are detected by monitoring changes in this association model. The anomaly degree of each microservice and of its called interfaces is then evaluated with the PageRank algorithm. The invention thus achieves transparent service monitoring, automatic prediction of metric values to detect anomalous services, and intelligent assessment of the anomaly degree of nodes in the graph to locate the root cause of problems.
Description
Technical field
The present invention relates to fault diagnosis methods for microservice software systems, and in particular to an intelligent microservice monitoring method oriented to anomaly propagation. It belongs to the field of software technology.
Background art
Monolithic architecture and SOA (service-oriented architecture) are the architectural styles that software companies have commonly adopted. After more than ten years of development, software systems have become complex, their scalability and maintainability are very low, and enterprises carry a heavy technical debt. Competition on the Internet is fierce, and user demand and the market environment change rapidly. For today's Internet applications, the scalability and flexibility of conventional software architectures are clearly insufficient, and the costs of design, development, testing, and operations rise significantly. The concept of microservices was therefore proposed. Microservices are a software architectural style in which a single application is built as a suite of software services. Each service runs in its own process, and services communicate with each other through lightweight protocols. The characteristics of the microservice architecture suit agile development and continuous integration well and address the pain points of conventional software architectures. For these reasons the microservice architecture has attracted extensive attention and research in both academia and industry.
Moving a software system to microservices improves maintainability and flexibility. However, it also makes the dependencies between services intricate, which increases both the probability of failures and the losses they cause. For example, on a high-traffic website, a delay in a single service component may exhaust all application resources. This causes a so-called avalanche effect that, in severe cases, paralyzes the whole system. Effective monitoring of the system and rapid localization of the cause of failures are therefore among the key technologies for ensuring the reliability and performance of microservices.
Existing work on microservice fault diagnosis falls mainly into the following categories.

(1) Diagnosis based on monitoring metrics. These methods collect system operating indicators such as CPU, memory, and network usage, which reflect the current state of the application and its operating trend over a period of time. If a metric exceeds a preset threshold, the system is considered to have a problem and an alarm is triggered. The administrator then resolves the problem using the monitoring data and his or her own experience (Wang T, Zhang W, Ye C, Wei J, Zhong H, Huang T. FD4C: Automatic Fault Diagnosis Framework for Web Applications in Cloud Computing. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, 46(1): 61-75; M. Farshchi, J. G. Schneider, I. Weber, and J. Grundy. Metric selection and anomaly detection for cloud operations using log and metric correlation analysis. Journal of Systems and Software, 2018, 137: 531-549).

(2) Monitoring and analysis based on logs. Logs explicitly record the operating state of the system and are easy to persist and to search. They are usually an effective means of finding the causes of failures and of supporting further business goals (ELK, https://www.elastic.co/).

(3) Monitoring and diagnosis based on distributed request tracing. The execution paths of requests are collected with annotation-based methods, and system failures are discovered by analyzing or comparing these paths (A. Nandi, A. Mandal, S. Atreja, G. B. Dasgupta, and S. Bhattacharya. Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 2016; T. Jia, P. Chen, L. Yang, Y. Li, F. Meng, and J. Xu. An Approach for Anomaly Diagnosis Based on Hybrid Graph Model with Logs for Distributed Services. IEEE International Conference on Web Services, Honolulu, HI, 2017, pp. 25-32).

The metric-based and log-based approaches are simple to implement, but they cannot reflect the state of the system as a whole or trace business flows. They also usually localize faults only at the level of service components, so amid complex microservice interactions administrators must spend a great deal of time searching for and localizing problems. Methods based on distributed request tracing use request traces, captured through logging or code instrumentation, as a reference for fault diagnosis. However, their monitoring scales poorly, is not transparent to the application, and does not take anomaly propagation into account.
Summary of the invention
Technical problem solved by the invention: to overcome the deficiencies of the prior art and provide an efficient fault diagnosis system for microservices. Call monitoring that is transparent to services improves the scalability of the system and reduces the impact of monitoring on running microservices. By analyzing the collected monitoring data, the invention achieves fine-grained fault root-cause localization at the interface level.
Technical solution of the invention: an intelligent microservice monitoring method oriented to anomaly propagation, implemented in the following steps.
First step, service call monitoring: service invocation information is monitored with agent technology, and service call relationships are recorded as tuples $N_i$ = (requestUID, serviceUID, spanUID, parentUID, info). The fields are defined as follows:
- requestUID is the request identifier, generated at the request entry point;
- serviceUID is the service identifier;
- spanUID is the identifier of the service call span;
- parentUID is the identifier of the parent span; a value of -1 indicates that the current span is the root span;
- info contains other information, expressed as the tuple info = (serviceUID, startTime, endTime, duration). Here serviceUID uniquely identifies the service component and instance number, startTime and endTime are the start and end times of the service call, and duration is the execution time of the service call.

Based on the monitored service invocation information, the service call topology graph is constructed as follows:
(1) Initially, the topology graph G is empty, and the set S contains the collected invocation information;
(2) Tuples that belong to the same request and have a call relationship are taken out of S. The service instance represented by each tuple's serviceUID is added to G as a node, and each call relationship is added as a directed edge. Nodes or edges that already exist are not added again;
(3) If S is not empty, continue with (2); otherwise, the algorithm terminates.
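The topology-construction steps above can be sketched in Python as follows. This is an illustrative sketch only, not the patent's implementation. The class names `Span` and `Info`, the function `build_topology`, and the convention that parentUID is the string "-1" for a root span are assumptions made here. Spans are grouped by requestUID so that a parent span is looked up only within its own request. Python sets provide the "do not add existing nodes or edges again" behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Info:
    service_uid: str   # service component + instance number
    start_time: float  # call start time
    end_time: float    # call end time
    duration: float    # execution time of the call


@dataclass(frozen=True)
class Span:
    request_uid: str   # generated at the request entry point
    service_uid: str
    span_uid: str
    parent_uid: str    # "-1" marks the root span (assumed string encoding)
    info: Info


def build_topology(spans: List[Span]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (nodes, directed edges) of the service call topology graph G."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    # Group the collected set S by request: call relations only exist inside one request.
    by_request: Dict[str, Dict[str, Span]] = {}
    for span in spans:
        by_request.setdefault(span.request_uid, {})[span.span_uid] = span
    for request_spans in by_request.values():
        for span in request_spans.values():
            if span.parent_uid == "-1":
                continue  # root span: no caller within this request
            parent = request_spans.get(span.parent_uid)
            if parent is None:
                continue  # parent span was not collected; skip the incomplete relation
            # Sets ignore duplicates, so existing nodes/edges are not added twice.
            nodes.update((parent.service_uid, span.service_uid))
            edges.add((parent.service_uid, span.service_uid))
    return nodes, edges
```

Each directed edge points from the caller's serviceUID to the callee's. The resulting graph can later be restricted to the services flagged as anomalous.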
Second step, anomalous service detection: a correlation model between the number of calls to the interfaces within a service and the service's monitoring metrics is constructed, as follows:
(1) Collect the monitoring data of the metrics in each service and the call counts of all interfaces in the service. For a metric $m$ of service $S$, the vector $X_i(t) = (x_{i1}(t), x_{i2}(t), \ldots, x_{iq}(t))$ denotes the numbers of calls made at time $t$ by service $i$ to the $q$ interfaces of the service. Here $x_{ij}(t)$ is the number of calls made at time $t$ by service $i$ to interface number $j$. These counts are standardized and used as the explanatory variables of the Lasso regression model. $Y_t$ denotes the monitored value of metric $m$ at time $t$ and is the response variable of the Lasso regression model;
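(2) Build the Lasso regression model from the above data. The independent variables are the vectors of service interface call counts obtained in (1), and the dependent variable is the monitored value of metric $m$ at time $t$. The regression model is $Y_t = \sum_{i=1}^{p}\sum_{j=1}^{q}\beta_{ij}x_{ij}(t) + \alpha$. Here $x_{ij}(t)$ is the number of calls made at time $t$ by service $i$ to interface $j$, $p$ is the number of services that call the service, $\beta_{ij}$ are the regression coefficients, and $\alpha$ is the random error term. Under the constraint $\sum_{i=1}^{p}\sum_{j=1}^{q}|\beta_{ij}| \le c$, coordinate descent finds the regression coefficients and error term that minimize $\sum_{t=1}^{N}\big(Y_t - \sum_{i=1}^{p}\sum_{j=1}^{q}\beta_{ij}x_{ij}(t) - \alpha\big)^2$;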
(3) The tuning parameter $c$ is selected by generalized cross-validation, $\mathrm{GCV}(c) = \frac{1}{N}\cdot\frac{\mathrm{RSS}(c)}{(1 - p(c)/N)^2}$. Here $\mathrm{RSS}(c) = \sum_{t=1}^{N}(Y_t - \hat{Y}_t)^2$ is the residual sum of squares, $p(c)$ is the number of effective (non-zero) regression coefficients in the Lasso regression, and $N$ is the number of monitored observations;
(4) While the service is running, metric values are predicted with the Lasso regression model, and the residual $R(t) = Y_t - \hat{Y}_t$ is computed, where $\hat{Y}_t$ is the predicted value. When the absolute value of the residual exceeds the configured threshold, the metric is deemed anomalous, and the service is in turn considered anomalous;
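A minimal pure-Python sketch of steps (1) to (4) follows, under stated assumptions. It solves the Lasso in its penalized form, minimizing $\frac{1}{2N}\lVert y-\alpha-Z\beta\rVert^2+\lambda\lVert\beta\rVert_1$. For a suitable $\lambda \ge 0$ this has the same solution as the constrained form with bound $c$, so here candidate values of $\lambda$ (rather than $c$) are compared by GCV. The function names, the z-score standardization using the population standard deviation, and the candidate grid are hypothetical choices for illustration. A production system would more likely use an existing solver such as scikit-learn's `Lasso`, which also uses coordinate descent.

```python
import math
from typing import List, Sequence, Tuple


def standardize(X: List[List[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Z-score each column (interface call counts); constant columns get std 1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(math.sqrt(var) if var > 0 else 1.0)
    Z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    return Z, means, stds


def soft_threshold(rho: float, lam: float) -> float:
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(Z, y, lam, iters=1000, tol=1e-10):
    """Coordinate descent for min (1/2N)||y - alpha - Z beta||^2 + lam * ||beta||_1."""
    n, p = len(Z), len(Z[0])
    alpha = sum(y) / n          # Z columns are centred, so the intercept is mean(y)
    beta = [0.0] * p
    resid = [yi - alpha for yi in y]
    for _ in range(iters):
        max_change = 0.0
        for j in range(p):
            zz = sum(Z[i][j] ** 2 for i in range(n)) / n
            if zz == 0.0:
                continue        # constant column carries no information
            # correlation of column j with the partial residual (j's term added back)
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / zz
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return alpha, beta, resid


def gcv(rss: float, p_eff: int, n: int) -> float:
    """GCV = (RSS/N) / (1 - p/N)^2, with p = number of non-zero coefficients."""
    if p_eff >= n:
        return math.inf
    return (rss / n) / (1.0 - p_eff / n) ** 2


def fit_metric_model(X, y, lambdas: Sequence[float]):
    """Fit one Lasso model per candidate penalty and keep the lowest-GCV one."""
    Z, means, stds = standardize(X)
    best = None
    for lam in lambdas:
        alpha, beta, resid = lasso_cd(Z, y, lam)
        score = gcv(sum(r * r for r in resid), sum(1 for b in beta if b != 0.0), len(y))
        if best is None or score < best[0]:
            best = (score, lam, alpha, beta)
    _, lam, alpha, beta = best
    return {"lam": lam, "alpha": alpha, "beta": beta, "means": means, "stds": stds}


def predict(model, x: Sequence[float]) -> float:
    z = [(xj - m) / s for xj, m, s in zip(x, model["means"], model["stds"])]
    return model["alpha"] + sum(b * zj for b, zj in zip(model["beta"], z))


def is_anomalous(observed: float, predicted: float, threshold: float) -> bool:
    """Flag the metric when |Y_t - Y_hat_t| exceeds the configured threshold."""
    return abs(observed - predicted) > threshold
```

Each row of `X` holds the interface call counts $x_{ij}(t)$ of one sampling instant, and `y` holds the matching metric values $Y_t$ collected during cold start. The residual threshold is a configuration value that the text leaves unspecified.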
Third step, faulty service diagnosis: using the data obtained in the first two steps, a fault propagation subgraph is built from the anomalous services according to their call relationships. In this subgraph, the PageRank algorithm scores the anomaly degree of each service, as follows:
(1) Initially, the proportion of anomalous metrics in a service is used as that service's initial PR value. $P = [p_0, p_1, \ldots, p_n]^T$ is the column vector of the initial PR values of the services, where $p_i$ is the proportion of anomalous metrics in service $i$;
(2) The PR value of service $p_i$ is computed iteratively with the PageRank formula. Here $P_k(p_i)$ is the score of service $p_i$ at the $k$-th iteration, $I(p_j)$ is the set of nodes pointing to $p_j$, and $O(p_j)$ is the set of nodes that $p_j$ points to. $q$ is the damping coefficient, whose purpose is to guarantee the convergence of the algorithm;
(3) If $P_k$ satisfies $|P_k - P_{k-1}| < \delta$, the iteration terminates; otherwise, continue with (2).
(4) The services are ranked by score, and the service with the highest score is considered the one that caused the failure. Inside that service, the anomaly degree of each service interface call is further scored with the established Lasso model, as follows:
(41) For the $j$-th interface, two quantities are normalized: the parameters $\omega_i$ of the Lasso models of the anomalous metrics related to the interface, and the prediction residuals of those anomalous metrics. This yields the new values $a_i$ and $b_i$;
(42) The anomaly score of the $j$-th interface is then computed from $a_i$ and $b_i$ for $i = 1, \ldots, n$, where $n$ is the number of anomalous metrics related to the $j$-th interface;
(43) The interfaces are ranked by anomaly degree according to the anomaly scores computed in (42).
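The third step can be sketched as follows. The text does not reproduce the exact PageRank update, so this sketch assumes a personalised-PageRank form: $P_k(p_i) = (1-q)\,r_i + q\sum_{p_j \in I(p_i)} P_{k-1}(p_j)/|O(p_j)|$. The restart weights $r_i$ are the anomalous-metric proportions normalised to sum to 1. With the uniform restart term $(1-q)/n$ of classic PageRank, the initial values would affect only the path to convergence and not the final scores. The convergence test uses the L1 norm of $P_k - P_{k-1}$. The function names, the edge direction (caller to callee), and the default $q = 0.85$ are assumptions.

```python
from typing import Dict, List, Set, Tuple


def anomaly_subgraph(edges: Set[Tuple[str, str]], anomalous: Set[str]) -> Set[Tuple[str, str]]:
    """Fault propagation subgraph: call edges whose endpoints are both anomalous."""
    return {(a, b) for (a, b) in edges if a in anomalous and b in anomalous}


def rank_services(edges: Set[Tuple[str, str]], ratio: Dict[str, float],
                  q: float = 0.85, delta: float = 1e-8,
                  max_iter: int = 1000) -> List[Tuple[str, float]]:
    """Score anomalous services; ratio[s] = share of anomalous metrics in s.

    Every edge endpoint must be a key of `ratio`.
    """
    nodes = sorted(ratio)
    total = sum(ratio.values())
    # ASSUMPTION: restart weights = anomaly ratios normalised to sum to 1.
    restart = {v: ratio[v] / total if total > 0 else 1.0 / len(nodes) for v in nodes}
    out_degree = {v: 0 for v in nodes}                       # |O(p_j)|
    incoming: Dict[str, List[str]] = {v: [] for v in nodes}  # I(p_i)
    for a, b in edges:
        out_degree[a] += 1
        incoming[b].append(a)
    pr = dict(restart)                                       # P_0 from the anomaly ratios
    for _ in range(max_iter):
        new = {
            v: (1 - q) * restart[v] + q * sum(pr[u] / out_degree[u] for u in incoming[v])
            for v in nodes
        }
        change = sum(abs(new[v] - pr[v]) for v in nodes)     # L1 norm of P_k - P_{k-1}
        pr = new
        if change < delta:
            break
    return sorted(pr.items(), key=lambda kv: kv[1], reverse=True)
```

Because edges point from caller to callee, score flows toward downstream services. A callee whose fault has spread to its callers therefore tends to rank first.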
Principle of the invention: microservices are written in multiple languages, so an agent-based mechanism monitors the call relationships between services. This achieves service call monitoring that is transparent to the services. When a service handles interface calls, it consumes corresponding system resources, so the monitored metric values change accordingly. An association model between interface calls and metric values is therefore established to characterize the influence between them. To reduce model complexity, only the interface calls with the greatest influence on a metric are retained. Lasso regression builds the association model between interface call counts and metrics, and anomalous metrics are identified with this model. Anomalous services are then identified from the proportion of anomalous metrics in each service. When one service becomes anomalous, related services are likely to become anomalous within a period of time. The service call topology graph is therefore used to characterize the propagation of anomalies between services. The PageRank algorithm scores the anomaly degree of the services to identify the service that triggered the anomaly. Inside the faulty service, the anomaly degree of each interface is scored with the regression model between interface calls and metrics, and the faulty interface is finally localized.
Compared with the prior art, the invention has the following advantages:
(1) Service-transparent monitoring: service calls are monitored with agent technology, so monitoring is transparent to the services. Business developers do not need to make any modification, and the impact of call monitoring on application performance is minimized.
(2) Automated anomalous service detection: Lasso regression builds a regression model between metrics and interface calls. While services run, the system automatically predicts metric values with the regression model. If the absolute value of the residual exceeds the threshold, an anomaly is deemed to have occurred, so anomalous services are detected automatically.
(3) Fault root-cause localization: a fault subgraph is built from the detected anomalous services and the service call topology graph, so the subgraph reflects the anomaly propagation process well. The PageRank algorithm then scores the anomaly degree of the services in the graph. Because PageRank reflects the degree of influence of nodes in a graph, it can identify the service most likely to have triggered the anomaly.
Brief description of the drawings
Fig. 1 is the implementation flowchart of the method of the invention;
Fig. 2 shows the usage environment of the example method of the invention.
Specific embodiment
The invention is described in detail below with reference to a specific implementation example and the accompanying drawings.
As shown in Fig. 1, the anomaly-propagation-oriented microservice fault diagnosis method proposed by the invention includes the following steps:
(1) An agent is deployed in each service instance to collect service call relationships and the services' monitoring metrics, and the data are persisted to a database;
(2) In the cold-start phase, the service call topology graph is built from the collected service invocation information. Lasso regression models are built from the collected metric changes and service interface call counts;
(3) In the service running phase, the Lasso regression models are used to monitor whether services are anomalous;
(4) When a service becomes anomalous, the PageRank algorithm identifies the service most likely to have caused the anomaly, and the anomalous interface calls are localized inside that service.
As shown in Fig. 2, the target microservice application in the usage environment of the embodiment is Sock-Shop, with Kubernetes as the underlying runtime environment. Service instances are deployed on pods: each of the 10 core services has one instance, the MongoDB service has three instances, and MySQL has one instance. An Agent is deployed on each pod to monitor service invocation information and changes in the metrics within the services. A workload generator simulates user requests to generate load. A fault injector injects faults into the system according to preset scripts to test the diagnostic effect of the fault diagnosis system. The fault diagnosis system performs fault diagnosis based on the collected data, and the method proposed by the invention is implemented in this system.
Method flow of the embodiment:
(1) The Agents deployed in the service instances collect service request invocation information and the monitoring metric values of each service instance. These include CPU utilization, memory utilization, disk I/O rate, requests per second, and the call counts of the interfaces within the service;
(2) In the cold-start phase, the workload generator produces load, and service request invocation information is collected. Each record is stored as a tuple $N_i$ = (requestUID, serviceUID, spanUID, parentUID, info) and added to the set $S$;
(3) The tuples in $S$ are grouped by requestUID, and the call relationships between services within the same request are discovered among tuples with the same requestUID. Services with call relationships are added to the topology graph $G$, in which nodes are service instances and edges represent the call relationships between services. Nodes or edges that already exist are not added again. The process is repeated until $S$ is empty;
(4) The monitoring metric values of each service and the call counts of the interfaces within the service are collected as the response variable and explanatory variables of the Lasso regression model, respectively. $Y_t$ denotes the monitored value of metric $m$ at time $t$ and is the response variable. The explanatory variables are the vectors $X_i(t) = (x_{i1}(t), \ldots, x_{iq}(t))$, the numbers of calls made at time $t$ by service $i$ to the $q$ interfaces of a service. Here $x_{ij}(t)$ is the number of calls made at time $t$ by service $i$ to interface number $j$. Finally, the data are standardized;
(5) The Lasso regression model built from these data is $Y_t = \sum_{i=1}^{p}\sum_{j=1}^{q}\beta_{ij}x_{ij}(t) + \alpha$. The symbols are defined as follows:
- $Y_t$ is the monitored value of metric $m$ at time $t$;
- $p$ is the number of services that call the service;
- $q$ is the number of interfaces in the service;
- $\beta_{ij}$ are the regression coefficients;
- $x_{ij}(t)$ is the number of calls made at time $t$ by service $i$ to interface number $j$;
- $\alpha$ is the random error term.

Under the constraint $\sum_{i=1}^{p}\sum_{j=1}^{q}|\beta_{ij}| \le c$, coordinate descent minimizes $\sum_{t=1}^{N}\big(Y_t - \sum_{i=1}^{p}\sum_{j=1}^{q}\beta_{ij}x_{ij}(t) - \alpha\big)^2$, where $c$ is the tuning parameter;
(6) The tuning parameter $c$ is selected by generalized cross-validation, $\mathrm{GCV}(c) = \frac{1}{N}\cdot\frac{\mathrm{RSS}(c)}{(1 - p(c)/N)^2}$. Here $\mathrm{RSS}(c) = \sum_{t=1}^{N}(Y_t - \hat{Y}_t)^2$ is the residual sum of squares, $Y_t$ is the monitored value of metric $m$ at time $t$, $p(c)$ is the number of effective (non-zero) regression coefficients in the Lasso regression, and $N$ is the number of monitored observations;
(7) While the service is running, metric values are predicted with the Lasso regression model, and the residual $R(t) = Y_t - \hat{Y}_t$ is computed. Here $Y_t$ is the monitored value of metric $m$ at time $t$ and $\hat{Y}_t$ is the model's prediction. When the absolute value of the residual exceeds the configured threshold, the metric is deemed anomalous, and the service is in turn considered anomalous;
(8) An anomaly propagation graph is built from the service call topology graph obtained in (3) and the set of anomalous services obtained in (7). The faulty service is then localized with the PageRank algorithm as follows;
(9) Initially, the proportion of anomalous metrics in each service is used as its initial PR value. $P = [p_0, p_1, \ldots, p_n]^T$ is the column vector of the initial PR values of the services, where $p_i$ is the proportion of anomalous metrics in service $i$;
(10) The PR value of each service is computed with the PageRank formula. Here $q$ is the damping coefficient, $I(p_j)$ is the set of nodes pointing to $p_j$, $O(p_j)$ is the set of nodes that $p_j$ points to, and $P_k(p_i)$ is the score of service $p_i$ at the $k$-th iteration;
(11) After several iterations, when $P_k(p_i)$ satisfies $|P_k - P_{k-1}| < \delta$, the iteration terminates;
(12) The services are ranked by anomaly score, and the highest-scoring service is considered the one most likely to have caused the anomaly. Inside that anomalous service, the anomaly degree of its interfaces is scored with the Lasso models built in (5);
(13) For the $j$-th interface, two quantities are normalized: the parameters $\omega_i$ in the Lasso models of the related anomalous metrics, and the prediction residuals of those metrics. This yields the new values $a_i$ and $b_i$;
(14) The anomaly score of the $j$-th interface is then computed from $a_i$ and $b_i$ for $i = 1, \ldots, n$, where $n$ is the number of anomalous metrics related to the $j$-th interface;
(15) The interfaces are ranked by anomaly degree according to the scores obtained in (14). This finally identifies the root-cause service of the current anomaly and the anomalous interface within that service.
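Steps (13) to (15) can be illustrated with the sketch below. The text does not specify the normalization or the exact combination formula, so the following are assumptions:
- coefficients are normalized by the largest absolute coefficient in the same metric's model ($a_i$);
- residuals are normalized by the largest absolute residual among the anomalous metrics ($b_i$);
- the interface score is taken as $\sum_{i=1}^{n} a_i b_i$ over the $n$ anomalous metrics whose Lasso model keeps the interface (non-zero coefficient).

The function name `score_interfaces` is hypothetical.

```python
from typing import Dict, List, Tuple


def score_interfaces(coefs: Dict[str, Dict[str, float]],
                     residuals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank interfaces of the faulty service by an assumed anomaly score.

    coefs[m][j]  : Lasso coefficient (omega) of interface j in the model of anomalous metric m
    residuals[m] : prediction residual R(t) of anomalous metric m
    """
    max_r = max((abs(r) for r in residuals.values()), default=0.0)
    scores: Dict[str, float] = {}
    for m, row in coefs.items():
        b = abs(residuals[m]) / max_r if max_r > 0 else 0.0   # b_i: normalised residual
        max_w = max((abs(w) for w in row.values()), default=0.0)
        for j, w in row.items():
            if w == 0.0 or max_w == 0.0:
                continue  # Lasso dropped this interface: not related to metric m
            a = abs(w) / max_w                                  # a_i: normalised coefficient
            scores[j] = scores.get(j, 0.0) + a * b              # ASSUMED: sum of a_i * b_i
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Interfaces whose coefficient Lasso shrank to zero never receive a score. This reflects the idea that only the interface calls with the greatest influence on a metric are retained.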
In summary, the invention monitors service invocation information with agent technology and builds a microservice call topology graph to characterize anomaly propagation between microservices. It models the association between interface calls and metrics with Lasso regression and detects anomalous microservices by monitoring changes in this association model. It then evaluates the anomaly degree of microservices and their called interfaces with the PageRank algorithm. The invention thus achieves transparent service monitoring, automatic prediction of metric values to detect anomalous services, and intelligent assessment of the anomaly degree of nodes in the graph to locate the root cause of problems.
Claims (2)
1. An intelligent microservice monitoring method oriented to anomaly propagation, characterized in that it comprises the following steps:
First step, service call monitoring: service invocation information is monitored with agent technology, and service call relationships are recorded as tuples N = (requestUID, serviceUID, spanUID, parentUID, info). The fields are defined as follows:
- requestUID is the request identifier, generated at the request entry point;
- serviceUID is the service identifier;
- a span denotes one service call, and spanUID is the service call span identifier;
- parentUID is the parent span identifier; a value of -1 indicates that the current span is the root span;
- info contains other related information, info = (serviceUID, startTime, endTime, duration), where startTime and endTime are the start and end times of the service call and duration is its execution time.

Based on the monitored service invocation information, a service call topology graph is built to characterize anomaly propagation;
Second step, anomalous service detection: a correlation model between service interface call counts and service monitoring metrics is constructed, and the services that have become anomalous are detected, as follows:
(1) Service interface call monitoring: $X_i(t) = (x_{i1}(t), \ldots, x_{iq}(t))$ denotes the vector of the numbers of calls made by service $i$ at time $t$ to the $q$ service interfaces. Here $x_{ij}(t)$ is the number of calls made at time $t$ by service $i$ to the service interface numbered $j$;
(2) Resource modeling based on Lasso regression: the independent variables of the regression model are the vectors of service interface call counts obtained in step (1), and the dependent variable is the monitored value of metric $m$ at time $t$. The regression model is $Y_t = \sum_{i=1}^{p}\sum_{j=1}^{q}\beta_{ij}x_{ij}(t) + \alpha$, where $\beta_{ij}$ are the regression coefficients and $\alpha$ is the random error term. Under the constraint $\sum_{i=1}^{p}\sum_{j=1}^{q}|\beta_{ij}| \le c$, coordinate descent solves for the regression coefficients and error term that minimize $\sum_{t=1}^{N}\big(Y_t - \sum_{i=1}^{p}\sum_{j=1}^{q}\beta_{ij}x_{ij}(t) - \alpha\big)^2$, where $c$ is the tuning parameter;
(3) Anomalous resource detection: while the service runs, its resource metrics are predicted with the Lasso regression model built in step (2), and the residual $R_i(t) = Y_i(t) - \hat{Y}_i(t)$ is computed. Here $Y_i(t)$ is the monitored value of the metric and $\hat{Y}_i(t)$ is the Lasso model's prediction of the metric. When the absolute value of the residual exceeds the configured threshold, the metric is deemed anomalous and the service it belongs to is detected as anomalous. This finally yields the services that have become anomalous;
Third step, faulty service diagnosis: a fault propagation subgraph is built from the anomalous services detected in the second step and the service call topology graph monitored in the first step. The anomaly degree of each service is evaluated with the PageRank algorithm;
Fourth step: inside the faulty service, the interface calls that caused the anomaly are further identified from the parameters $\omega_i$ of the built Lasso regression models and the prediction residuals $R_i(t)$.
2. The intelligent microservice monitoring method oriented to anomaly propagation according to claim 1, characterized in that the fourth step proceeds as follows. Inside the faulty service, the interface calls that caused the anomaly are identified from the parameters and prediction residuals of the built Lasso regression models, specifically:
(41) For the $j$-th interface, two quantities are normalized: the parameters $\omega_i$ in the Lasso models of the related anomalous metrics, and the prediction residuals $R_i(t)$ of those metrics. This yields the new values $a_i$ and $b_i$;
(42) The anomaly score of the $j$-th interface is then computed from $a_i$ and $b_i$ for $i = 1, \ldots, n$, where $n$ is the number of anomalous metrics related to the $j$-th interface;
(43) The interfaces are ranked by anomaly degree according to the anomaly scores computed in step (42), thereby identifying the interface calls that triggered the anomaly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910220179.4A CN109933452B (en) | 2019-03-22 | 2019-03-22 | Micro-service intelligent monitoring method facing abnormal propagation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109933452A true CN109933452A (en) | 2019-06-25 |
CN109933452B CN109933452B (en) | 2020-06-19 |
Family
ID=66988052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910220179.4A Active CN109933452B (en) | 2019-03-22 | 2019-03-22 | Micro-service intelligent monitoring method facing abnormal propagation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109933452B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427275A (en) * | 2019-07-11 | 2019-11-08 | 复旦大学 | Micro services latent fault and fault rootstock prediction technique based on trace logs study |
CN110442641A (en) * | 2019-08-06 | 2019-11-12 | 中国工商银行股份有限公司 | A kind of link topology figure methods of exhibiting, device, storage medium and equipment |
CN110825589A (en) * | 2019-11-07 | 2020-02-21 | 字节跳动有限公司 | Anomaly detection method and device for micro-service system and electronic equipment |
CN111190756A (en) * | 2019-11-18 | 2020-05-22 | 中山大学 | Root cause positioning algorithm based on call chain data |
CN111597070A (en) * | 2020-07-27 | 2020-08-28 | 北京必示科技有限公司 | Fault positioning method and device, electronic equipment and storage medium |
CN112118127A (en) * | 2020-08-07 | 2020-12-22 | 中国科学院软件研究所 | Service reliability guarantee method based on fault similarity |
CN112231187A (en) * | 2019-07-15 | 2021-01-15 | 华为技术有限公司 | Micro-service abnormity analysis method and device |
CN112615743A (en) * | 2020-12-18 | 2021-04-06 | 江苏云柜网络技术有限公司 | Topological graph drawing method and device |
CN112667457A (en) * | 2019-10-16 | 2021-04-16 | 烽火通信科技股份有限公司 | Method and system for monitoring service call under micro-service architecture |
CN112698975A (en) * | 2020-12-14 | 2021-04-23 | 北京大学 | Fault root cause positioning method and system of micro-service architecture information system |
CN112817785A (en) * | 2019-11-15 | 2021-05-18 | 亚信科技(中国)有限公司 | Anomaly detection method and device for micro-service system |
WO2021147832A1 (en) * | 2020-01-23 | 2021-07-29 | 阿里巴巴集团控股有限公司 | Data processing method and apparatus, database system, electronic device, and storage medium |
CN113190373A (en) * | 2021-05-31 | 2021-07-30 | 中国人民解放军国防科技大学 | Micro-service system fault root cause positioning method based on fault feature comparison |
CN113626288A (en) * | 2021-08-12 | 2021-11-09 | 杭州朗和科技有限公司 | Fault processing method, system, device, storage medium and electronic equipment |
CN114024837A (en) * | 2022-01-06 | 2022-02-08 | 杭州大乘智能科技有限公司 | Fault root cause positioning method of micro-service system |
CN114598742A (en) * | 2022-03-04 | 2022-06-07 | 北京北信源软件股份有限公司 | Micro-service importance determination method, device, electronic equipment and storage medium |
CN115314559A (en) * | 2022-08-03 | 2022-11-08 | 苏州创意云网络科技有限公司 | Network service system and abnormal response method thereof |
CN115396341A (en) * | 2022-08-16 | 2022-11-25 | 度小满科技(北京)有限公司 | Service stability evaluation method and device, storage medium and electronic device |
CN117520040A (en) * | 2024-01-05 | 2024-02-06 | 中国民航大学 | Micro-service fault root cause determining method, electronic equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103986625A (en) * | 2014-05-29 | 2014-08-13 | 中国科学院软件研究所 | Cloud application fault diagnosis system based on statistical monitoring |
US20170177008A1 (en) * | 2015-12-21 | 2017-06-22 | International Business Machines Corporation | Topological connectivity and relative distances from temporal sensor measurements of physical delivery system |
CN107766205A (en) * | 2017-10-10 | 2018-03-06 | 武汉大学 | A kind of monitoring system and method towards the tracking of micro services invoked procedure |
CN108322351A (en) * | 2018-03-05 | 2018-07-24 | 北京奇艺世纪科技有限公司 | Generate method and apparatus, fault determination method and the device of topological diagram |
CN108762908A (en) * | 2018-05-31 | 2018-11-06 | 阿里巴巴集团控股有限公司 | System calls method for detecting abnormality and device |
CN109144724A (en) * | 2018-07-27 | 2019-01-04 | 众安信息技术服务有限公司 | A kind of micro services resource scheduling system and method |
CN109213616A (en) * | 2018-09-25 | 2019-01-15 | 江苏润和软件股份有限公司 | A kind of micro services software systems method for detecting abnormality based on calling map analysis |
CN109254865A (en) * | 2018-09-25 | 2019-01-22 | 江苏润和软件股份有限公司 | A kind of cloud data center based on statistical analysis services abnormal root because of localization method |
Events
- 2019-03-22: Application CN201910220179.4A filed in CN (CN109933452B, status: Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103986625A (en) * | 2014-05-29 | 2014-08-13 | 中国科学院软件研究所 | Cloud application fault diagnosis system based on statistical monitoring |
US20170177008A1 (en) * | 2015-12-21 | 2017-06-22 | International Business Machines Corporation | Topological connectivity and relative distances from temporal sensor measurements of physical delivery system |
CN107766205A (en) * | 2017-10-10 | 2018-03-06 | 武汉大学 | Monitoring system and method for tracing microservice invocation processes |
CN108322351A (en) * | 2018-03-05 | 2018-07-24 | 北京奇艺世纪科技有限公司 | Method and apparatus for generating a topology graph, and fault determination method and device |
CN108762908A (en) * | 2018-05-31 | 2018-11-06 | 阿里巴巴集团控股有限公司 | System call anomaly detection method and device |
CN109144724A (en) * | 2018-07-27 | 2019-01-04 | 众安信息技术服务有限公司 | Microservice resource scheduling system and method |
CN109213616A (en) * | 2018-09-25 | 2019-01-15 | 江苏润和软件股份有限公司 | Anomaly detection method for microservice software systems based on call graph analysis |
CN109254865A (en) * | 2018-09-25 | 2019-01-22 | 江苏润和软件股份有限公司 | Statistical-analysis-based root cause localization method for service anomalies in cloud data centers |
Non-Patent Citations (1)
Title |
---|
SIGELMAN BENJAMIN H. et al.: "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure", Google Technical Report * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427275B (en) * | 2019-07-11 | 2022-11-18 | 复旦大学 | Micro-service potential error and fault source prediction method based on track log learning |
CN110427275A (en) * | 2019-07-11 | 2019-11-08 | 复旦大学 | Micro-service latent fault and fault root cause prediction method based on trace log learning |
CN112231187B (en) * | 2019-07-15 | 2022-07-26 | 华为技术有限公司 | Micro-service abnormity analysis method and device |
CN112231187A (en) * | 2019-07-15 | 2021-01-15 | 华为技术有限公司 | Micro-service abnormity analysis method and device |
CN110442641A (en) * | 2019-08-06 | 2019-11-12 | 中国工商银行股份有限公司 | Link topology graph display method and device, storage medium and equipment |
CN110442641B (en) * | 2019-08-06 | 2022-07-12 | 中国工商银行股份有限公司 | Link topology graph display method and device, storage medium and equipment |
CN112667457A (en) * | 2019-10-16 | 2021-04-16 | 烽火通信科技股份有限公司 | Method and system for monitoring service call under micro-service architecture |
CN110825589A (en) * | 2019-11-07 | 2020-02-21 | 字节跳动有限公司 | Anomaly detection method and device for micro-service system and electronic equipment |
CN110825589B (en) * | 2019-11-07 | 2024-01-05 | 字节跳动有限公司 | Abnormality detection method and device for micro-service system and electronic equipment |
CN112817785A (en) * | 2019-11-15 | 2021-05-18 | 亚信科技(中国)有限公司 | Anomaly detection method and device for micro-service system |
CN111190756B (en) * | 2019-11-18 | 2023-04-28 | 中山大学 | Root cause positioning algorithm based on call chain data |
CN111190756A (en) * | 2019-11-18 | 2020-05-22 | 中山大学 | Root cause positioning algorithm based on call chain data |
WO2021147832A1 (en) * | 2020-01-23 | 2021-07-29 | 阿里巴巴集团控股有限公司 | Data processing method and apparatus, database system, electronic device, and storage medium |
CN111597070A (en) * | 2020-07-27 | 2020-08-28 | 北京必示科技有限公司 | Fault positioning method and device, electronic equipment and storage medium |
CN112118127A (en) * | 2020-08-07 | 2020-12-22 | 中国科学院软件研究所 | Service reliability guarantee method based on fault similarity |
CN112118127B (en) * | 2020-08-07 | 2021-11-09 | 中国科学院软件研究所 | Service reliability guarantee method based on fault similarity |
CN112698975A (en) * | 2020-12-14 | 2021-04-23 | 北京大学 | Fault root cause positioning method and system of micro-service architecture information system |
CN112698975B (en) * | 2020-12-14 | 2022-09-27 | 北京大学 | Fault root cause positioning method and system of micro-service architecture information system |
CN112615743A (en) * | 2020-12-18 | 2021-04-06 | 江苏云柜网络技术有限公司 | Topological graph drawing method and device |
CN113190373B (en) * | 2021-05-31 | 2022-04-05 | 中国人民解放军国防科技大学 | Micro-service system fault root cause positioning method based on fault feature comparison |
CN113190373A (en) * | 2021-05-31 | 2021-07-30 | 中国人民解放军国防科技大学 | Micro-service system fault root cause positioning method based on fault feature comparison |
CN113626288B (en) * | 2021-08-12 | 2023-08-25 | 杭州朗和科技有限公司 | Fault processing method, system, device, storage medium and electronic equipment |
CN113626288A (en) * | 2021-08-12 | 2021-11-09 | 杭州朗和科技有限公司 | Fault processing method, system, device, storage medium and electronic equipment |
CN114024837B (en) * | 2022-01-06 | 2022-04-05 | 杭州乘云数字技术有限公司 | Fault root cause positioning method of micro-service system |
CN114024837A (en) * | 2022-01-06 | 2022-02-08 | 杭州大乘智能科技有限公司 | Fault root cause positioning method of micro-service system |
CN114598742A (en) * | 2022-03-04 | 2022-06-07 | 北京北信源软件股份有限公司 | Micro-service importance determination method, device, electronic equipment and storage medium |
CN115314559A (en) * | 2022-08-03 | 2022-11-08 | 苏州创意云网络科技有限公司 | Network service system and abnormal response method thereof |
CN115314559B (en) * | 2022-08-03 | 2023-09-29 | 苏州创意云网络科技有限公司 | Network service system, abnormal response method thereof, service unit, scheduling processing unit, electronic device and computer storage medium |
CN115396341A (en) * | 2022-08-16 | 2022-11-25 | 度小满科技(北京)有限公司 | Service stability evaluation method and device, storage medium and electronic device |
CN115396341B (en) * | 2022-08-16 | 2023-12-05 | 度小满科技(北京)有限公司 | Service stability evaluation method and device, storage medium and electronic device |
CN117520040A (en) * | 2024-01-05 | 2024-02-06 | 中国民航大学 | Micro-service fault root cause determining method, electronic equipment and storage medium |
CN117520040B (en) * | 2024-01-05 | 2024-03-08 | 中国民航大学 | Micro-service fault root cause determining method, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109933452B (en) | 2020-06-19 |
Similar Documents
Publication | Title |
---|---|
CN109933452A (en) | A kind of micro services intelligent monitoring method towards anomalous propagation | |
CN111756582B (en) | Service chain monitoring method based on NFV log alarm | |
CN105337765B (en) | A kind of distribution hadoop cluster automatic fault diagnosis repair system | |
CN109213616A (en) | A kind of micro services software systems method for detecting abnormality based on calling map analysis | |
WO2021036229A1 (en) | Method for changing service on device and service changing system | |
CN107124289B (en) | Weblog time alignment method, device and host | |
TW201941058A (en) | Anomaly detection method and device | |
CN101997709B (en) | Root alarm data analysis method and system | |
Nováczki | An improved anomaly detection and diagnosis framework for mobile network operators | |
CN111176879A (en) | Fault repairing method and device for equipment | |
CN102111797A (en) | Fault diagnosis method and fault diagnosis equipment | |
Ehlers et al. | A self-adaptive monitoring framework for component-based software systems | |
Li et al. | Fighting the fog of war: Automated incident detection for cloud systems | |
CN110032463A (en) | A kind of system fault locating method and system based on Bayesian network | |
CN113010392B (en) | Big data platform testing method, device, equipment, storage medium and system | |
KR102580916B1 (en) | Apparatus and method for managing trouble using big data of 5G distributed cloud system | |
CN115118621B (en) | Dependency graph-based micro-service performance diagnosis method and system | |
Bocciarelli et al. | BPMN-based business process modeling and simulation | |
CN114201326A (en) | Micro-service abnormity diagnosis method based on attribute relation graph | |
Yu et al. | TraceRank: Abnormal service localization with dis‐aggregated end‐to‐end tracing data in cloud native systems | |
CN107204868B (en) | Task operation monitoring information acquisition method and device | |
CN112506802B (en) | Test data management method and system | |
CN111158979A (en) | Service dial testing method, system, device and storage medium | |
Li et al. | Microservice anomaly detection based on tracing data using semi-supervised learning | |
CN109889258A (en) | A kind of optical network fault method of calibration and equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |