CN109933452A - Intelligent microservice monitoring method oriented to anomaly propagation - Google Patents

Intelligent microservice monitoring method oriented to anomaly propagation

Info

Publication number
CN109933452A
CN109933452A CN201910220179.4A CN201910220179A CN109933452A CN 109933452 A CN109933452 A CN 109933452A CN 201910220179 A CN201910220179 A CN 201910220179A CN 109933452 A CN109933452 A CN 109933452A
Authority
CN
China
Prior art keywords
service
interface
abnormal
measurement
micro services
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910220179.4A
Other languages
Chinese (zh)
Other versions
CN109933452B (en)
Inventor
王焘
张文博
薛晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN201910220179.4A
Publication of CN109933452A
Application granted
Publication of CN109933452B
Current legal status: Active
Anticipated expiration

Abstract

The present invention relates to an intelligent microservice monitoring method oriented to anomaly propagation. Service invocation information is monitored using agent technology, and a microservice call topology graph is built to characterize how anomalies propagate between microservices. Lasso regression is used to model the association between interface calls and metrics, and abnormal microservices are detected by monitoring changes in this association model. The degree of abnormality of the microservices and of their called interfaces is evaluated based on the PageRank algorithm. The invention provides transparent service monitoring, automatic prediction of metric values to discover abnormal services, and intelligent assessment of the degree of abnormality of the nodes in the graph to detect the root cause of a problem.

Description

Intelligent microservice monitoring method oriented to anomaly propagation
Technical field
The present invention relates to fault diagnosis methods for microservice software systems, and in particular to an intelligent microservice monitoring method oriented to anomaly propagation. It belongs to the field of software technology.
Background art
Monolithic and SOA architectures are the architectural styles that software companies have commonly adopted. After more than a decade of development, software systems built this way have become very complex, with low scalability and maintainability, and enterprises carry a heavy technical debt. Competition on the Internet is fierce, and user demands and market conditions change rapidly. For today's Internet applications, traditional architectural styles are clearly insufficient in scalability and flexibility, and the costs of design, development, testing, and operations rise significantly. The microservice concept was proposed in response. Microservices are an architectural style in which a single application is built as a suite of small services, each running in its own process and communicating with the others through lightweight protocols. These characteristics suit agile development and continuous integration and address the pain points of traditional architectures, so microservices have attracted wide attention and research in academia and industry.
After a software system is split into microservices, maintainability and flexibility improve, but the dependencies among services become intricate, which raises both the likelihood of failures and the losses they cause. For example, on a high-traffic website, a delay in a single service component can exhaust all application resources and cause a so-called avalanche effect, which in severe cases paralyzes the whole system. Monitoring the system effectively and locating the cause of a fault quickly are therefore key technologies for ensuring the reliability and performance of microservices.
Existing work on microservice fault diagnosis falls mainly into the following categories.
(1) Diagnosis based on monitoring metrics. This approach collects system runtime indicators such as CPU, memory, and network usage to reflect the current state of the application and its trend over a period of time. If a metric exceeds a preset threshold, the system is considered to have a problem and an alarm is triggered; the administrator then solves the problem using the monitoring data together with personal experience (Wang T, Zhang W, Ye C, Wei J, Zhong H, Huang T. FD4C: Automatic Fault Diagnosis Framework for Web Applications in Cloud Computing. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2016, 46(1): 61-75; M. Farshchi, J. G. Schneider, I. Weber, and J. Grundy, "Metric selection and anomaly detection for cloud operations using log and metric correlation analysis," Journal of Systems and Software, 2018, 137, pp. 531-549.).
(2) Monitoring and analysis based on logs. Logs explicitly record how the system runs, are easy to persist, and can be searched conveniently; they are usually an effective means of finding the cause of a failure and of supporting other business goals (ELK. https://www.elastic.co/).
(3) Monitoring and diagnosis based on distributed request tracing. The execution path of a request is obtained by a tagging-based method, and system faults are discovered by analyzing or comparing execution paths (A. Nandi, A. Mandal, S. Atreja, G. B. Dasgupta, and S. Bhattacharya, "Anomaly Detection Using Program Control Flow Graph Mining From Execution Logs," 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 2016; T. Jia, P. Chen, L. Yang, Y. Li, F. Meng and J. Xu, "An Approach for Anomaly Diagnosis Based on Hybrid Graph Model with Logs for Distributed Services," IEEE International Conference on Web Services, Honolulu, HI, 2017, pp. 25-32.).
Metric-based and log-based monitoring and diagnosis are simple to implement, but they cannot reflect the overall state of the system or trace business flows, and they usually localize a fault only to the level of a service component; with complex microservice interactions, administrators spend a great deal of time finding and locating problems. Diagnosis based on distributed request tracing records the trace of a request, through logs or code instrumentation, as the reference for fault diagnosis, but this kind of monitoring scales poorly, is not transparent to the application, and does not take anomaly propagation into account.
Summary of the invention
Technical problem solved by the invention: to overcome the deficiencies of the prior art by providing an efficient fault diagnosis system for microservices. Call monitoring that is transparent to the services improves the scalability of the system and reduces the impact of monitoring on the running microservices; analysis of the monitoring data enables fine-grained, interface-level localization of the fault root cause.
Technical solution of the invention: an intelligent microservice monitoring method oriented to anomaly propagation, implemented in the following steps:
Step 1, service call monitoring: service invocation information is monitored using agent technology, and service call relations are recorded with the tuple N_i = (requestUID, serviceUID, spanUID, parentUID, info), where requestUID is the request identifier, generated at the request entry point; serviceUID is the service identifier; spanUID is the identifier of the service call span; parentUID is the identifier of the parent span, and a value of -1 indicates that the current span is the root span; info contains further information, represented by the tuple info = (serviceUID, startTime, endTime, duration), where serviceUID uniquely identifies the service by service component and instance number, startTime and endTime are the start and end times of the service call, and duration is the execution time of the service call. Based on the monitored service invocation information, the service call topology graph is constructed as follows:
(1) Initially, the topology graph G is empty, and the set S contains the collected invocation information;
(2) Tuples that belong to the same request and have a call relation are taken out of S. The service instances represented by the serviceUID of these tuples are added to G as nodes, and the call relations are added as directed edges; a node or edge that already exists is not added again;
(3) If S is not empty, step (2) is repeated; otherwise the algorithm terminates.
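The construction above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Span` record type, the function name `build_call_graph`, and the sample service names are assumptions introduced here, and edges are assumed to point from the calling service to the called service. Instead of draining the set S tuple by tuple, the sketch iterates over all spans once, which yields the same graph.

```python
from collections import namedtuple

# Hypothetical record mirroring the tuple N_i = (requestUID, serviceUID,
# spanUID, parentUID, info) described in the text.
Span = namedtuple("Span", "requestUID serviceUID spanUID parentUID info")

def build_call_graph(spans):
    """Return (nodes, edges) of the service call topology graph G."""
    nodes, edges = set(), set()
    # Index spans by (request, span id) so a child span can find its parent.
    by_id = {(s.requestUID, s.spanUID): s for s in spans}
    for s in spans:
        nodes.add(s.serviceUID)            # a set never stores duplicates
        if s.parentUID == -1:              # root span: it has no caller
            continue
        parent = by_id.get((s.requestUID, s.parentUID))
        if parent is not None:
            # Assumed edge direction: caller -> callee.
            edges.add((parent.serviceUID, s.serviceUID))
    return nodes, edges

# Made-up spans from two requests, for illustration only.
spans = [
    Span("r1", "front-end", 1, -1, None),
    Span("r1", "orders", 2, 1, None),
    Span("r1", "payment", 3, 2, None),
    Span("r2", "front-end", 1, -1, None),
    Span("r2", "orders", 2, 1, None),
]
nodes, edges = build_call_graph(spans)
```

Using sets for nodes and edges gives the "do not add again" behavior of step (2) without an explicit membership check.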
Step 2, abnormal service detection: an association model is built between the call counts of the interfaces within a service and the monitored metrics of the service, as follows:
(1) Collect the monitoring data of the metrics within the service and the call counts of all interfaces within the service. For a metric m in a service S, the vector $X_t^i = (x_{t1}^i, x_{t2}^i, \ldots, x_{tq}^i)$ denotes the numbers of calls made at time t by service i to the q interfaces of that service, where $x_{t1}^i$ is the number of times service i calls the interface numbered 1 of that service at time t. The vector is standardized and used as the explanatory variable of the Lasso regression model. $Y_t$ denotes the monitored value of metric m at time t and is used as the response variable of the Lasso regression model;
(2) Construct the Lasso regression model from the above data. The independent variables of the model are the vectors of service interface call counts obtained in (1), and the dependent variable is the monitored value of metric m at time t. The regression model is $Y_t = \sum_{i=1}^{p}\sum_{j=1}^{q}\omega_j^i x_{tj}^i + \alpha$, where p is the number of services that call this service, $\omega_j^i$ are the regression coefficients, and $\alpha$ is the random error term. Under the constraint $\sum_{i=1}^{p}\sum_{j=1}^{q}|\omega_j^i| \le c$, coordinate descent is used to find the regression coefficients and error term that minimize $\sum_{t=1}^{N}\left(Y_t - \sum_{i=1}^{p}\sum_{j=1}^{q}\omega_j^i x_{tj}^i - \alpha\right)^2$, where N is the number of monitored values;
(3) Select the adjusting parameter c by generalized cross-validation, which takes the form $GCV(c) = \frac{1}{N}\,\frac{RSS(c)}{\left(1 - p(c)/N\right)^2}$, where RSS(c) is the residual sum of squares, $RSS(c) = \sum_{t=1}^{N}(Y_t - \hat{Y}_t)^2$, and p(c) is the number of effective regression coefficients in the Lasso regression;
(4) While the service is running, predict the metric value with the Lasso regression model and compute the residual $R_t = Y_t - \hat{Y}_t$. When the absolute value of the residual exceeds the set threshold, the metric is considered abnormal, and the service is in turn considered abnormal;
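The modeling and detection in this step can be illustrated with a small, dependency-free Python sketch. It is an illustration under stated assumptions rather than the method of the patent: it solves the penalized form of Lasso (a penalty weight `lam` on the absolute coefficients) instead of the constrained form with bound c, which is an equivalent formulation for a suitable pairing of `lam` and c; it skips the standardization step; and the function names, the data, and the threshold are made up for the example.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam (the Lasso proximal step)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, sweeps=200):
    """Coordinate descent for (1/2N)*sum((y - alpha - X w)^2) + lam*sum(|w|)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    alpha = sum(y) / n
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                partial = alpha + sum(w[k] * X[i][k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
        # The intercept is not penalized: set it to the mean residual.
        alpha = sum(y[i] - sum(w[k] * X[i][k] for k in range(d)) for i in range(n)) / n
    return w, alpha

def predict(w, alpha, x):
    return alpha + sum(wi * xi for wi, xi in zip(w, x))

def is_abnormal(w, alpha, x, observed, threshold):
    """Flag the metric when |observed - predicted| exceeds the threshold."""
    return abs(observed - predict(w, alpha, x)) > threshold

# Made-up training data: the metric depends on interface 1 only (y = 2*x1 + 1).
X = [[1, 0], [2, 1], [3, 0], [4, 1], [5, 0], [6, 1]]
y = [3, 5, 7, 9, 11, 13]
w, alpha = lasso_fit(X, y, lam=0.01)
normal = is_abnormal(w, alpha, [7, 0], observed=15.2, threshold=1.0)
abnormal = is_abnormal(w, alpha, [7, 0], observed=25.0, threshold=1.0)
```

In this toy data the coefficient of the second interface is driven toward zero, which is the sparsity property that lets the model keep only the interface calls that influence a metric.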
Step 3, faulty service diagnosis: based on the data obtained in the first two steps, a fault propagation subgraph is built from the services in which anomalies occurred, according to their call relations. In the subgraph, the PageRank algorithm is used to score the degree of abnormality of each service, as follows:
(1) Initially, the ratio of abnormal metrics in a service is used as the initial PR value of that service. P = [p_0, p_1, ..., p_n]^T is the column vector formed by the initial PR values of the services, where p_i is the ratio of abnormal metrics in service i;
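(2) The PR value of service p_i is computed with the standard PageRank update $P_k(p_i) = \frac{1-q}{M} + q\sum_{p_j \in I(p_i)}\frac{P_{k-1}(p_j)}{|O(p_j)|}$, where $P_k(p_i)$ is the score of service p_i at the k-th iteration, $I(p_i)$ is the set of nodes that point to p_i, $O(p_j)$ is the set of nodes that p_j points to, M is the number of services in the subgraph, and q is the damping coefficient, whose purpose is to guarantee that the algorithm converges;
(3) If $|P_k - P_{k-1}| < \delta$, the iteration ends; otherwise step (2) is repeated.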
(4) The services are ranked by score, and the highest-scoring service is considered the one that caused the fault. Inside that service, the degree of abnormality of its interface calls is further scored according to the established Lasso model, as follows:
(41) For the j-th interface, the parameters ω_i in the Lasso models of the abnormal metrics related to it, and the prediction residuals of those abnormal metrics, are normalized to obtain new values a_i and b_i;
(42) The anomaly score of the j-th interface is then computed from the values a_i and b_i over the n abnormal metrics related to the j-th interface;
(43) According to the interface anomaly scores calculated in (42), the interfaces are ranked by degree of abnormality.
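The service scoring in this step can be sketched as a plain PageRank iteration over the fault propagation subgraph. The sketch below is an assumption-laden illustration: it uses the standard update with a (1 - q)/M teleport term, assumes edges point from caller to callee so that score accumulates on the called services, does not redistribute the score of nodes without outgoing edges, and uses made-up service names and abnormal-metric ratios.

```python
def pagerank_scores(nodes, edges, init, q=0.85, delta=1e-6, max_iter=100):
    """Iterate the PageRank update until the total change falls below delta."""
    out_degree = {v: 0 for v in nodes}
    incoming = {v: [] for v in nodes}
    for src, dst in edges:
        out_degree[src] += 1
        incoming[dst].append(src)
    score = dict(init)                     # initial PR values
    m = len(nodes)
    for _ in range(max_iter):
        new = {}
        for v in nodes:
            inflow = sum(score[u] / out_degree[u] for u in incoming[v])
            new[v] = (1 - q) / m + q * inflow
        change = sum(abs(new[v] - score[v]) for v in nodes)
        score = new
        if change < delta:
            break
    return score

# Made-up subgraph: front-end calls orders, orders calls payment.
nodes = ["front-end", "orders", "payment"]
edges = [("front-end", "orders"), ("orders", "payment")]
# Hypothetical ratios of abnormal metrics, used as initial PR values.
init = {"front-end": 0.2, "orders": 0.4, "payment": 0.6}

scores = pagerank_scores(nodes, edges, init)
ranked = sorted(scores, key=scores.get, reverse=True)
```

One property of this plain update is worth keeping in mind: for q < 1 the iteration converges to a single fixed point, so with a tight δ the final ranking is determined by the graph structure, and the initial values mainly affect how many iterations are needed.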
Principle of the invention: because microservices are written in multiple languages, an agent-based mechanism is used to monitor the call relations between services, so that call monitoring is transparent to the services. When a service handles interface calls it occupies system resources, and the monitored metric values change accordingly; an association model between interface calls and metric values is therefore built to characterize how the former influence the latter. To reduce the complexity of the model and retain the interface calls that influence a metric most, the Lasso regression method is used to build the association model between interface call counts and metrics. Abnormal metrics are identified with this model, and abnormal services are then identified from the ratio of abnormal metrics in each service. When a service becomes abnormal, it is likely to cause related services to become abnormal as well within a period of time. The service call topology graph is therefore used to characterize the propagation of anomalies between services, and the PageRank algorithm is used to score the degree of abnormality of the services and find the service that triggered the anomaly. Inside the faulty service, the degree of abnormality of each interface is scored based on the regression model between interface calls and metrics, which finally locates the faulty interface.
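The trade-off between model complexity and fit mentioned here is what the generalized cross-validation step of the second step quantifies when it selects the adjusting parameter. A minimal sketch follows, assuming the standard criterion GCV(c) = (RSS(c)/N) / (1 - p(c)/N)^2; the function names and the candidate values are hypothetical and only illustrate the selection.

```python
def gcv(rss, n_effective, n_samples):
    """Generalized cross-validation score; smaller is better."""
    if n_effective >= n_samples:
        raise ValueError("need fewer effective coefficients than samples")
    return (rss / n_samples) / (1.0 - n_effective / n_samples) ** 2

def select_parameter(candidates, n_samples):
    """candidates: list of (c, RSS(c), p(c)); return the c with the lowest GCV."""
    return min(candidates, key=lambda item: gcv(item[1], item[2], n_samples))[0]

# Made-up fits for three values of c on N = 20 samples:
# a tight bound fits poorly, a loose bound keeps many coefficients.
candidates = [(0.5, 40.0, 1), (1.0, 12.0, 2), (2.0, 11.5, 6)]
best_c = select_parameter(candidates, n_samples=20)
```

In this example the middle value wins: loosening the bound further lowers the residual sum of squares only slightly while tripling the number of nonzero coefficients, and the denominator penalizes that.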
The invention has the following advantages over the prior art:
(1) Service-transparent monitoring: service calls are monitored using agent technology, so monitoring is transparent to the services. Business developers do not need to make any modification, and the impact of call monitoring on application performance is kept to a minimum.
(2) Automated abnormal service detection: a regression model between metrics and interface calls is built with the Lasso regression method. While the service is running, the system predicts metric values automatically with the regression model; if the absolute value of the residual exceeds the threshold, an anomaly is considered to have occurred, so abnormal services are discovered automatically.
(3) Fault root cause localization: a fault subgraph is built from the detected abnormal services and the service call topology graph, so the fault subgraph reflects the anomaly propagation process well. The PageRank algorithm is then used to score the degree of abnormality of the services in the graph. Because the PageRank algorithm reflects the degree of influence of the nodes in a graph, it can find the service most likely to have triggered the anomaly.
Brief description of the drawings
Fig. 1 is the implementation flowchart of the method of the invention;
Fig. 2 shows the environment in which the method of the embodiment is used.
Detailed description of the embodiments
The invention is described in detail below with reference to a specific implementation example and the drawings.
As shown in Fig. 1, the microservice fault diagnosis method oriented to anomaly propagation proposed by the invention comprises the following steps: (1) an agent is deployed in each service instance to collect the service call relations and the monitored metric data of the service, and the data are persisted in a database; (2) in the cold-start phase, the service call topology graph is built from the collected service invocation information, and the Lasso regression model is built from the collected metric data and service interface call counts; (3) in the service running phase, the built Lasso regression model is used to monitor whether a service is abnormal; (4) when a service becomes abnormal, the PageRank algorithm is used to find the service most likely to have caused the anomaly, and the abnormal interface call is located inside the abnormal service.
As shown in Fig. 2, in the environment used by the embodiment, the target microservice application is Sock-Shop, with Kubernetes as the underlying runtime environment and service instances deployed in pods. Each of the 10 core services has one instance, the MongoDB service has three instances, and MySQL has one instance. An agent is deployed in each pod to monitor service invocation information and changes in the metrics of the service. The load generator simulates user requests to generate load; the fault injector injects faults into the system through preset scripts to test the effectiveness of the fault diagnosis system; and the fault diagnosis system diagnoses faults from the collected data. The method proposed by the invention is implemented in the fault diagnosis system.
Method flow of the embodiment:
(1) The agents deployed in the service instances collect the monitored metric values of each service instance, including CPU utilization, memory usage, disk I/O rate, requests per second, and the call counts of the interfaces in the service, as well as the service request invocation information;
(2) In the cold-start phase, load is generated by the load generator and the service request invocation information is collected, recorded as tuples N_i = (requestUID, serviceUID, spanUID, parentUID, info), and added to the set S;
(3) The tuples in S are grouped by requestUID. The call relations between services within one request are found among the tuples that share a requestUID, and the services that have call relations are added to the topology graph G, where the nodes are service instances and the edges denote call relations between services; a node or edge that is already in the graph is not added again. This process is repeated until S is empty;
(4) Collect the monitored metric values in the service and the call counts of the interfaces in the service, which serve as the response variable and the explanatory variables of the Lasso regression model respectively. $Y_t$ denotes the monitored value of metric m at time t and is the response variable; the vector $X_t^i = (x_{t1}^i, x_{t2}^i, \ldots, x_{tq}^i)$ denotes the numbers of calls made at time t by service i to the q interfaces of a given service and is the explanatory variable, where $x_{t1}^i$ is the number of times service i calls the interface numbered 1 of that service at time t. Finally, the data are standardized;
(5) Construct the Lasso regression model from the above data. Its expression is $Y_t = \sum_{i=1}^{p}\sum_{j=1}^{q}\omega_j^i x_{tj}^i + \alpha$, where $Y_t$ is the monitored value of metric m at time t, p is the number of services that call this service, q is the number of interfaces in the service, $\omega_j^i$ are the regression coefficients, $x_{tj}^i$ is the number of times service i calls the interface numbered j of the service at time t, and $\alpha$ is the random error term. Under the constraint $\sum_{i=1}^{p}\sum_{j=1}^{q}|\omega_j^i| \le c$, coordinate descent is used to minimize $\sum_{t=1}^{N}\left(Y_t - \sum_{i=1}^{p}\sum_{j=1}^{q}\omega_j^i x_{tj}^i - \alpha\right)^2$, where c is the adjusting parameter;
(6) Select the adjusting parameter c by generalized cross-validation, which takes the form $GCV(c) = \frac{1}{N}\,\frac{RSS(c)}{\left(1 - p(c)/N\right)^2}$, where RSS(c) is the residual sum of squares, $RSS(c) = \sum_{t=1}^{N}(Y_t - \hat{Y}_t)^2$, $Y_t$ is the monitored value of metric m at time t, p(c) is the number of effective regression coefficients in the Lasso regression, and N is the number of monitored metric values;
(7) While the service is running, predict the metric value with the Lasso regression model and compute the residual $R_t = Y_t - \hat{Y}_t$, where $Y_t$ is the monitored value of metric m at time t. When the absolute value of the residual exceeds the set threshold, the metric is considered abnormal, and the service is in turn considered abnormal;
(8) Build the anomaly propagation graph from the service call topology graph obtained in (3) and the set of abnormal services obtained in (7); the faulty service is then located with the PageRank algorithm as follows;
(9) Initially, the ratio of abnormal metrics in a service is used as the initial PR value of that service. P = [p_0, p_1, ..., p_n]^T is the column vector formed by the initial PR values of the services, where p_i is the ratio of abnormal metrics in service i;
(10) Compute the PR value of each service with the standard PageRank update $P_k(p_i) = \frac{1-q}{M} + q\sum_{p_j \in I(p_i)}\frac{P_{k-1}(p_j)}{|O(p_j)|}$, where q is the damping coefficient, $I(p_i)$ is the set of nodes that point to p_i, $O(p_j)$ is the set of nodes that p_j points to, M is the number of services in the graph, and $P_k(p_i)$ is the score of service p_i at the k-th iteration;
(11) After several iterations, the iteration ends when $|P_k - P_{k-1}| < \delta$;
(12) The services are ranked by anomaly score, and the highest-scoring service is considered the one most likely to have caused the anomaly. Inside the abnormal service, the degree of abnormality of the interfaces is scored according to the Lasso model built in (5);
(13) For the j-th interface, the parameters ω_i in the Lasso models of the abnormal metrics related to it, and the prediction residuals of those abnormal metrics, are normalized to obtain new values a_i and b_i;
(14) The anomaly score of the j-th interface is then computed from the values a_i and b_i over the n abnormal metrics related to the j-th interface;
(15) The interfaces are ranked by degree of abnormality according to the anomaly scores obtained in (14). This finally identifies the root-cause service of the anomaly and the abnormal interface within that service.
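The interface scoring of steps (13) to (15) can be illustrated as follows. The text does not preserve the exact normalization or the score formula, so this sketch makes two loudly labeled assumptions: values are normalized by dividing their absolute value by the sum of absolute values, and the score of an interface is the sum over abnormal metrics of a_i times b_i. The metric names, interface names, coefficients, and residuals are invented for the example.

```python
def normalize(values):
    """Scale absolute values so they sum to 1 (one possible normalization)."""
    total = sum(abs(v) for v in values.values())
    return {k: (abs(v) / total if total else 0.0) for k, v in values.items()}

def interface_scores(coefs, residuals):
    """coefs[metric][interface]: Lasso coefficient of that interface in the
    model of an abnormal metric; residuals[metric]: its prediction residual.
    Assumed score: sum over abnormal metrics of a_i * b_i."""
    b = normalize(residuals)               # b_i: share of each residual
    scores = {}
    for metric, per_interface in coefs.items():
        a = normalize(per_interface)       # a_i: share of each interface
        for interface, a_value in a.items():
            scores[interface] = scores.get(interface, 0.0) + a_value * b[metric]
    return scores

# Hypothetical abnormal metrics of the root-cause service.
coefs = {
    "cpu": {"getOrders": 0.8, "addOrder": 0.2},
    "memory": {"getOrders": 0.1, "addOrder": 0.9},
}
residuals = {"cpu": 3.0, "memory": 1.0}

scores = interface_scores(coefs, residuals)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions an interface ranks high when it carries a large coefficient in the model of a metric whose prediction residual is large, which matches the intent of combining the two normalized quantities.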
In short, the invention monitors service invocation information using agent technology and builds a microservice call topology graph to characterize how anomalies propagate between microservices. It uses Lasso regression to model the association between interface calls and metrics, and detects abnormal microservices by monitoring changes in this association model. It evaluates the degree of abnormality of the microservices and of their called interfaces based on the PageRank algorithm. The invention provides transparent service monitoring, automatic prediction of metric values to discover abnormal services, and intelligent assessment of the degree of abnormality of the nodes in the graph to detect the root cause of a problem.

Claims (2)

1. An intelligent microservice monitoring method oriented to anomaly propagation, characterized by comprising the following steps:
Step 1, service call monitoring: service invocation information is monitored using agent technology, and service call relations are recorded with the tuple N = (requestUID, serviceUID, spanUID, parentUID, info), where requestUID is the request identifier, generated at the request entry point; serviceUID is the service identifier; a span denotes one service call, and spanUID is the service call span identifier; parentUID is the parent span identifier, and a value of -1 indicates that the current span is the root span; info contains other related information, info = (serviceUID, startTime, endTime, duration), where startTime and endTime are the start and end times of the service call and duration is the execution time of the service call; based on the monitored service invocation information, a service call topology graph is built to characterize anomaly propagation;
Step 2, abnormal service detection: an association model between the call counts of the service interfaces and the monitored service metrics is built, and the services in which anomalies occur are detected, as follows:
(1) Service interface call monitoring: $X_t^i = (x_{t1}^i, x_{t2}^i, \ldots, x_{tq}^i)$ denotes the vector formed by the call counts of the q service interfaces in service i at time t, where $x_{t1}^i$ is the call count of the service interface numbered 1 in service i at time t;
(2) Resource modeling based on Lasso regression: the independent variables of the regression model are the vectors of service interface call counts obtained in step (1), and the dependent variable is the monitored value of a metric m at time t. The regression model is $Y_t = \sum_{i=1}^{p}\sum_{j=1}^{q}\omega_j^i x_{tj}^i + \alpha$, where $\omega_j^i$ are the regression coefficients and $\alpha$ is the random error term; under the constraint $\sum_{i=1}^{p}\sum_{j=1}^{q}|\omega_j^i| \le c$, coordinate descent is used to solve for the regression coefficients and error term that minimize $\sum_{t=1}^{N}\left(Y_t - \sum_{i=1}^{p}\sum_{j=1}^{q}\omega_j^i x_{tj}^i - \alpha\right)^2$, where c is the adjusting parameter;
(3) Abnormal resource detection: while the service is running, the resource metric values of the service are predicted with the Lasso regression model built in step (2), and the residual is computed as $R_i(t) = Y_i(t) - \hat{Y}_i(t)$, where $Y_i(t)$ is the monitored value of the metric and $\hat{Y}_i(t)$ is the value of the metric predicted by the Lasso model. When the absolute value of the residual exceeds the set threshold, the metric is determined to be abnormal and the service it belongs to is detected as abnormal; the detection finally yields the services in which anomalies occurred;
Step 3, faulty service diagnosis: a fault propagation subgraph is built from the abnormal services detected in step 2 and the service call topology graph obtained by monitoring in step 1, and the degree of abnormality of each service is evaluated with the PageRank algorithm;
Step 4: inside the faulty service, based on the parameters ω_i of the built Lasso regression model and the prediction residuals R_i(t), the interface call that caused the anomaly is further identified.
2. The intelligent microservice monitoring method oriented to anomaly propagation according to claim 1, characterized in that in step 4, inside the faulty service, the interface call that caused the anomaly is identified based on the parameters and prediction residuals of the built Lasso regression model, specifically as follows:
(41) For the j-th interface, the parameters ω_i in the Lasso models of the abnormal metrics related to it, and the prediction residuals R_i(t) of those abnormal metrics, are normalized to obtain new values a_i and b_i;
(42) The anomaly score of the j-th interface is then computed from the values a_i and b_i over the n abnormal metrics related to the j-th interface;
(43) According to the interface anomaly scores calculated in step (42), the interfaces are ranked by degree of abnormality, thereby identifying the interface call that caused the anomaly.
CN201910220179.4A 2019-03-22 2019-03-22 Micro-service intelligent monitoring method facing abnormal propagation Active CN109933452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910220179.4A CN109933452B (en) 2019-03-22 2019-03-22 Micro-service intelligent monitoring method facing abnormal propagation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910220179.4A CN109933452B (en) 2019-03-22 2019-03-22 Micro-service intelligent monitoring method facing abnormal propagation

Publications (2)

Publication Number Publication Date
CN109933452A true CN109933452A (en) 2019-06-25
CN109933452B CN109933452B (en) 2020-06-19

Family

ID=66988052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910220179.4A Active CN109933452B (en) 2019-03-22 2019-03-22 Micro-service intelligent monitoring method facing abnormal propagation

Country Status (1)

Country Link
CN (1) CN109933452B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427275A (en) * 2019-07-11 2019-11-08 复旦大学 Micro services latent fault and fault rootstock prediction technique based on trace logs study
CN110442641A (en) * 2019-08-06 2019-11-12 中国工商银行股份有限公司 A kind of link topology figure methods of exhibiting, device, storage medium and equipment
CN110825589A (en) * 2019-11-07 2020-02-21 字节跳动有限公司 Anomaly detection method and device for micro-service system and electronic equipment
CN111190756A (en) * 2019-11-18 2020-05-22 中山大学 Root cause positioning algorithm based on call chain data
CN111597070A (en) * 2020-07-27 2020-08-28 北京必示科技有限公司 Fault positioning method and device, electronic equipment and storage medium
CN112118127A (en) * 2020-08-07 2020-12-22 中国科学院软件研究所 Service reliability guarantee method based on fault similarity
CN112231187A (en) * 2019-07-15 2021-01-15 华为技术有限公司 Micro-service abnormity analysis method and device
CN112615743A (en) * 2020-12-18 2021-04-06 江苏云柜网络技术有限公司 Topological graph drawing method and device
CN112667457A (en) * 2019-10-16 2021-04-16 烽火通信科技股份有限公司 Method and system for monitoring service call under micro-service architecture
CN112698975A (en) * 2020-12-14 2021-04-23 北京大学 Fault root cause positioning method and system of micro-service architecture information system
CN112817785A (en) * 2019-11-15 2021-05-18 亚信科技(中国)有限公司 Anomaly detection method and device for micro-service system
WO2021147832A1 (en) * 2020-01-23 2021-07-29 阿里巴巴集团控股有限公司 Data processing method and apparatus, database system, electronic device, and storage medium
CN113190373A (en) * 2021-05-31 2021-07-30 中国人民解放军国防科技大学 Micro-service system fault root cause positioning method based on fault feature comparison
CN113626288A (en) * 2021-08-12 2021-11-09 杭州朗和科技有限公司 Fault processing method, system, device, storage medium and electronic equipment
CN114024837A (en) * 2022-01-06 2022-02-08 杭州大乘智能科技有限公司 Fault root cause positioning method of micro-service system
CN114598742A (en) * 2022-03-04 2022-06-07 北京北信源软件股份有限公司 Micro-service importance determination method, device, electronic equipment and storage medium
CN115314559A (en) * 2022-08-03 2022-11-08 苏州创意云网络科技有限公司 Network service system and abnormal response method thereof
CN115396341A (en) * 2022-08-16 2022-11-25 度小满科技(北京)有限公司 Service stability evaluation method and device, storage medium and electronic device
CN117520040A (en) * 2024-01-05 2024-02-06 中国民航大学 Micro-service fault root cause determining method, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103986625A (en) * 2014-05-29 2014-08-13 中国科学院软件研究所 Cloud application fault diagnosis system based on statistical monitoring
US20170177008A1 (en) * 2015-12-21 2017-06-22 International Business Machines Corporation Topological connectivity and relative distances from temporal sensor measurements of physical delivery system
CN107766205A (en) * 2017-10-10 2018-03-06 武汉大学 Monitoring system and method for tracing microservice invocation processes
CN108322351A (en) * 2018-03-05 2018-07-24 北京奇艺世纪科技有限公司 Method and apparatus for generating a topology graph, and fault determination method and apparatus
CN108762908A (en) * 2018-05-31 2018-11-06 阿里巴巴集团控股有限公司 System call anomaly detection method and device
CN109144724A (en) * 2018-07-27 2019-01-04 众安信息技术服务有限公司 Microservice resource scheduling system and method
CN109213616A (en) * 2018-09-25 2019-01-15 江苏润和软件股份有限公司 Anomaly detection method for microservice software systems based on call graph analysis
CN109254865A (en) * 2018-09-25 2019-01-22 江苏润和软件股份有限公司 Statistical-analysis-based method for locating root causes of service anomalies in cloud data centers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIGELMAN BENJAMIN H. et al.: "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure", Google Technical Report *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427275B (en) * 2019-07-11 2022-11-18 复旦大学 Micro-service potential error and fault source prediction method based on track log learning
CN110427275A (en) * 2019-07-11 2019-11-08 复旦大学 Microservice latent error and fault root cause prediction method based on trace log learning
CN112231187B (en) * 2019-07-15 2022-07-26 华为技术有限公司 Micro-service abnormity analysis method and device
CN112231187A (en) * 2019-07-15 2021-01-15 华为技术有限公司 Micro-service abnormity analysis method and device
CN110442641A (en) * 2019-08-06 2019-11-12 中国工商银行股份有限公司 Link topology graph display method and device, storage medium and equipment
CN110442641B (en) * 2019-08-06 2022-07-12 中国工商银行股份有限公司 Link topology graph display method and device, storage medium and equipment
CN112667457A (en) * 2019-10-16 2021-04-16 烽火通信科技股份有限公司 Method and system for monitoring service call under micro-service architecture
CN110825589A (en) * 2019-11-07 2020-02-21 字节跳动有限公司 Anomaly detection method and device for micro-service system and electronic equipment
CN110825589B (en) * 2019-11-07 2024-01-05 字节跳动有限公司 Abnormality detection method and device for micro-service system and electronic equipment
CN112817785A (en) * 2019-11-15 2021-05-18 亚信科技(中国)有限公司 Anomaly detection method and device for micro-service system
CN111190756B (en) * 2019-11-18 2023-04-28 中山大学 Root cause positioning algorithm based on call chain data
CN111190756A (en) * 2019-11-18 2020-05-22 中山大学 Root cause positioning algorithm based on call chain data
WO2021147832A1 (en) * 2020-01-23 2021-07-29 阿里巴巴集团控股有限公司 Data processing method and apparatus, database system, electronic device, and storage medium
CN111597070A (en) * 2020-07-27 2020-08-28 北京必示科技有限公司 Fault positioning method and device, electronic equipment and storage medium
CN112118127A (en) * 2020-08-07 2020-12-22 中国科学院软件研究所 Service reliability guarantee method based on fault similarity
CN112118127B (en) * 2020-08-07 2021-11-09 中国科学院软件研究所 Service reliability guarantee method based on fault similarity
CN112698975A (en) * 2020-12-14 2021-04-23 北京大学 Fault root cause positioning method and system of micro-service architecture information system
CN112698975B (en) * 2020-12-14 2022-09-27 北京大学 Fault root cause positioning method and system of micro-service architecture information system
CN112615743A (en) * 2020-12-18 2021-04-06 江苏云柜网络技术有限公司 Topological graph drawing method and device
CN113190373B (en) * 2021-05-31 2022-04-05 中国人民解放军国防科技大学 Micro-service system fault root cause positioning method based on fault feature comparison
CN113190373A (en) * 2021-05-31 2021-07-30 中国人民解放军国防科技大学 Micro-service system fault root cause positioning method based on fault feature comparison
CN113626288B (en) * 2021-08-12 2023-08-25 杭州朗和科技有限公司 Fault processing method, system, device, storage medium and electronic equipment
CN113626288A (en) * 2021-08-12 2021-11-09 杭州朗和科技有限公司 Fault processing method, system, device, storage medium and electronic equipment
CN114024837B (en) * 2022-01-06 2022-04-05 杭州乘云数字技术有限公司 Fault root cause positioning method of micro-service system
CN114024837A (en) * 2022-01-06 2022-02-08 杭州大乘智能科技有限公司 Fault root cause positioning method of micro-service system
CN114598742A (en) * 2022-03-04 2022-06-07 北京北信源软件股份有限公司 Micro-service importance determination method, device, electronic equipment and storage medium
CN115314559A (en) * 2022-08-03 2022-11-08 苏州创意云网络科技有限公司 Network service system and abnormal response method thereof
CN115314559B (en) * 2022-08-03 2023-09-29 苏州创意云网络科技有限公司 Network service system, abnormal response method thereof, service unit, scheduling processing unit, electronic device and computer storage medium
CN115396341A (en) * 2022-08-16 2022-11-25 度小满科技(北京)有限公司 Service stability evaluation method and device, storage medium and electronic device
CN115396341B (en) * 2022-08-16 2023-12-05 度小满科技(北京)有限公司 Service stability evaluation method and device, storage medium and electronic device
CN117520040A (en) * 2024-01-05 2024-02-06 中国民航大学 Micro-service fault root cause determining method, electronic equipment and storage medium
CN117520040B (en) * 2024-01-05 2024-03-08 中国民航大学 Micro-service fault root cause determining method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109933452B (en) 2020-06-19

Similar Documents

Publication Publication Date Title
CN109933452A (en) Intelligent microservice monitoring method oriented toward anomaly propagation
CN111756582B (en) Service chain monitoring method based on NFV log alarm
CN105337765B (en) Automatic fault diagnosis and repair system for distributed Hadoop clusters
CN109213616A (en) Anomaly detection method for microservice software systems based on call graph analysis
WO2021036229A1 (en) Method for changing service on device and service changing system
CN107124289B (en) Weblog time alignment method, device and host
TW201941058A (en) Anomaly detection method and device
CN101997709B (en) Root alarm data analysis method and system
Nováczki An improved anomaly detection and diagnosis framework for mobile network operators
CN111176879A (en) Fault repairing method and device for equipment
CN102111797A (en) Fault diagnosis method and fault diagnosis equipment
Ehlers et al. A self-adaptive monitoring framework for component-based software systems
Li et al. Fighting the fog of war: Automated incident detection for cloud systems
CN110032463A (en) System fault locating method and system based on Bayesian network
CN113010392B (en) Big data platform testing method, device, equipment, storage medium and system
KR102580916B1 (en) Apparatus and method for managing trouble using big data of 5G distributed cloud system
CN115118621B (en) Dependency graph-based micro-service performance diagnosis method and system
Bocciarelli et al. BPMN-based business process modeling and simulation
CN114201326A (en) Micro-service abnormity diagnosis method based on attribute relation graph
Yu et al. TraceRank: Abnormal service localization with dis‐aggregated end‐to‐end tracing data in cloud native systems
CN107204868B (en) Task operation monitoring information acquisition method and device
CN112506802B (en) Test data management method and system
CN111158979A (en) Service dial testing method, system, device and storage medium
Li et al. Microservice anomaly detection based on tracing data using semi-supervised learning
CN109889258A (en) Optical network fault verification method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant