CN109800052A - Abnormality detection and localization method and device applied to distributed container cloud platform - Google Patents

Publication number
CN109800052A
Authority
CN
China
Prior art keywords
component
tcp
information
subgraph
exception
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811537333.2A
Other languages
Chinese (zh)
Other versions
CN109800052B (en
Inventor
叶可江
卢澄志
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811537333.2A priority Critical patent/CN109800052B/en
Publication of CN109800052A publication Critical patent/CN109800052A/en
Priority to PCT/CN2019/123989 priority patent/WO2020119627A1/en
Application granted granted Critical
Publication of CN109800052B publication Critical patent/CN109800052B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to the field of container cloud platforms, and in particular to an abnormality detection and localization method and device applied to a distributed container cloud platform. The method and device first obtain the TCP delay information of each container component; analyze the TCP delay information of each container component with a sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate component state information key-value pairs; construct a component abnormality subgraph from the component state information key-value pairs; and locate the container component node where the abnormality originated according to the component abnormality subgraph. By using TCP delay information to judge the abnormal state, the method and device reduce the overhead of data collection and improve the accuracy and timeliness of abnormal-state judgment. In addition, taking into account the interference among components and between physical machines and components, a component abnormality subgraph is proposed to represent the propagation of the abnormal state, which improves the accuracy of abnormality localization.

Description

Abnormality detection and localization method and device applied to distributed container cloud platform
Technical field
The present invention relates to the field of container cloud platforms, and in particular to an abnormality detection and localization method and device applied to a distributed container cloud platform.
Background art
Cloud computing, as a new way of delivering services, has won the favor of both industry and academia. The key technology of cloud computing is virtualization: by virtualizing all kinds of resources, cloud service providers can easily customize resources and deliver them to users, and numerous applications have gradually begun to migrate to cloud computing clusters. Traditional virtualization technologies include KVM, Xen and others. However, because traditional virtualization is too heavyweight, creating, modifying and migrating a component in an application cluster are all complicated operations, so cloud service providers need a more lightweight virtualization technology. Container technology is a lightweight, operating-system-level virtualization technology. Whereas traditional virtualization virtualizes the hardware layer, container virtualization stays at the operating-system layer, so creating, modifying or migrating a container is very convenient. Container technology was quickly adopted by cloud service providers. Because of these characteristics, users often run each component of an application in a separate container when deploying it, so that the application is easier to maintain, which makes the internal structure of a container cloud complex. At the same time, the weaker isolation of containers makes interference between containers more serious. Once a container becomes abnormal, the abnormality propagates rapidly and affects other application components. Cloud service providers therefore need a method that can localize abnormalities in complex application clusters built from containers.
Typically, an application deployed on a container cloud consists of hundreds of components that depend on one another, forming a complex graph whose nodes are components. Using graph theory, the root cause of an abnormality can be located in this complex graph. A cloud computing platform based on container technology usually consists of thousands of physical machines, each typically running dozens of containers, so it is more complex than a traditional cloud computing platform. Compared with traditional virtual machines, containers are less well isolated and interfere with one another more seriously, and are therefore more likely to affect one another. Moreover, because containers are deployed on the operating system of the physical machine, an abnormality of the physical machine can also cause the containers deployed on it to become abnormal. Existing abnormality detection and localization schemes lack analysis of the correlation between components and between components and physical machines; they also use performance metric data for detection and localization, which incurs large storage and transmission overhead. They therefore cannot adapt well to a distributed container cloud platform environment with severe interference.
Nguyen et al., in Chapter 3 of "Insight: in-situ online service failure path inference in production computing infrastructures", propose an online black-box abnormality localization system for locating abnormal components. The system uses virtual machine performance metrics to build a normal-fluctuation model of the metrics, identifies data points that change abnormally, and localizes abnormal components by combining the timing of the changed data points with the dependencies between components. Although the system can detect and localize abnormalities, it relies on performance metrics for detection and judgment, so for a complex distributed container cloud platform the overhead of monitoring performance metrics would be very large.
Summary of the invention
The embodiments of the present invention provide an abnormality detection and localization method and device applied to a distributed container cloud platform, so as to at least solve the technical problem that traditional abnormality detection methods based on a single component are not applicable to distributed container clouds.
According to an embodiment of the present invention, an abnormality detection and localization method applied to a distributed container cloud platform is provided, comprising the following steps:
Obtain the TCP delay information of each container component;
Analyze the TCP delay information of each container component with a sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate component state information key-value pairs;
Construct a component abnormality subgraph from the component state information key-value pairs;
Locate the container component node where the abnormality originated according to the component abnormality subgraph.
Further, analyzing the TCP delay information of each container component with the sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate the component state information key-value pairs includes:
Initialize the sliding window [L_0, L_k] of the component and input TCP delay information until the number of TCP delay values in the sliding window reaches k; initialize the mean and the cumulative sum S_k = 0, where [L_0, L_k] is a queue storing TCP delay information from index 0 to k, and k is an integer with 0 < k < 60;
Input the next TCP delay value L_t, insert L_t into the sliding window, and delete the earliest TCP delay value L_{t-k} from the window; compute the mean within the window and the cumulative sum S_t, where L_t is the TCP delay information at time t and t is an integer with t > k;
Compute the warning value S_diff = S_max - S_min, where S_max, S_min ∈ [S_{t-k}, S_t], and S_{t-k} is the cumulative sum at the time of the earliest TCP delay value;
Judge whether S_diff lies within the normal threshold range [-h, h]; if so, the state Status of the component is judged to be normal; otherwise it is judged to be abnormal;
Generate a component state information key-value pair <CID:MID:Status> from the state information of each component, where CID is the number of the component, MID is the number of the physical machine on which the component runs, and Status is the state of the component: Status is 1 when the component is abnormal and 0 when it is normal.
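The <CID:MID:Status> key-value pair can be pictured as a small record with a colon-separated text form. The following is a minimal sketch only: the class name `ComponentStatus`, the use of strings for CID and MID, and the `encode`/`decode` helpers are illustrative assumptions, since the source fixes only the three fields and the 0/1 meaning of Status.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentStatus:
    """Hypothetical record for one <CID:MID:Status> key-value pair."""
    cid: str     # number of the component (CID)
    mid: str     # number of the physical machine hosting the component (MID)
    status: int  # 1 = abnormal, 0 = normal

    def encode(self) -> str:
        # Colon-separated text form, mirroring the <CID:MID:Status> notation.
        return f"{self.cid}:{self.mid}:{self.status}"

    @staticmethod
    def decode(text: str) -> "ComponentStatus":
        # Raises ValueError if the text does not have exactly three fields.
        cid, mid, status = text.split(":")
        value = int(status)
        if value not in (0, 1):
            raise ValueError(f"Status must be 0 or 1, got {value}")
        return ComponentStatus(cid, mid, value)


pair = ComponentStatus(cid="c3", mid="m1", status=1)
print(pair.encode())
```

A frozen dataclass is used here so that pairs can be compared and stored in sets; any other record type would serve equally well.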
Further, constructing the component abnormality subgraph from the component state information key-value pairs includes:
Input the component dependency graph G, whose matrix representation is G = (E_ij), where i and j denote components in the application cluster and E_ij denotes the dependency between component i and component j: E_ij is 1 if component i depends on component j, and 0 otherwise;
Traverse the component state information key-value pairs; when Status is 0, delete the row and column with i = CID or j = CID from the component dependency graph G; when the traversal is finished, the component dependency subgraph G1 is obtained;
Determine whether the component dependency subgraph G1 contains independent component nodes, that is, component nodes that neither depend on any other component node nor are depended on by any other component node; delete such component nodes to construct the component abnormality subgraph G'.
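The two pruning passes described above (drop normal components, then drop isolated ones) can be sketched as follows. This is an illustration under assumptions: the dependency graph is held as a dictionary of sets rather than as a matrix, and the function name and sample components are hypothetical.

```python
from typing import Dict, Set


def build_abnormality_subgraph(
    deps: Dict[str, Set[str]], status: Dict[str, int]
) -> Dict[str, Set[str]]:
    """deps[i] is the set of components j with E_ij = 1 (i depends on j).

    status[cid] is 1 for abnormal and 0 for normal components.
    """
    # Pass 1: remove every component whose Status is 0, together with all
    # edges touching it. The result corresponds to the subgraph G1.
    abnormal = {cid for cid, s in status.items() if s == 1}
    g1 = {i: {j for j in deps.get(i, set()) if j in abnormal} for i in abnormal}

    # Pass 2: remove independent nodes, i.e. nodes with no outgoing edge
    # and no incoming edge inside G1. The result corresponds to G'.
    depended_on = set().union(*g1.values())
    keep = {i for i in g1 if g1[i] or i in depended_on}
    return {i: g1[i] for i in keep}


# Hypothetical cluster: A depends on B, B depends on C and E.
deps = {"A": {"B"}, "B": {"C", "E"}, "C": set(), "D": set(), "E": set()}
status = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 0}
print(build_abnormality_subgraph(deps, status))
```

In this example E is dropped because it is normal, and D is dropped because, although abnormal, it is connected to no other abnormal component.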
Further, locating the container component node where the abnormality originated according to the component abnormality subgraph includes:
Traverse the component abnormality subgraph G' and compute δ_i = Σ_{j∈G'} E_ij; if δ_i = 0, component node i is an abnormal root node.
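The root-node test is a row sum over the adjacency matrix of G': a node that depends on no other abnormal node has δ_i = 0. A minimal sketch, assuming the matrix is given as a list of rows and that component names are supplied in the same order (both are illustrative choices, as is the function name):

```python
from typing import List


def abnormal_root_nodes(names: List[str], matrix: List[List[int]]) -> List[str]:
    """Return the components i with delta_i = sum_j E_ij = 0 in G'."""
    roots = []
    for i, row in enumerate(matrix):
        delta = sum(row)  # number of abnormal components that i depends on
        if delta == 0:
            roots.append(names[i])
    return roots


# Hypothetical G': A depends on B, B depends on C, C depends on nothing.
names = ["A", "B", "C"]
matrix = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(abnormal_root_nodes(names, matrix))  # ['C']
```

Here C is reported as the root because the abnormal states of A and B can both be traced back to it through the dependency edges.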
Further, after locating the container component node where the abnormality originated according to the component abnormality subgraph, the method also includes:
Judge whether the MIDs of all abnormal root nodes are identical; if they are, judge that the physical machine numbered MID is abnormal.
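The physical-machine check reduces to asking whether all abnormal root nodes share one MID. The sketch below assumes a hypothetical mapping from component number to machine number; the source does not say how a single root node should be treated, so the behavior for that case (the lone MID is returned) is an assumption.

```python
from typing import Dict, List, Optional


def abnormal_machine(roots: List[str], mid_of: Dict[str, str]) -> Optional[str]:
    """Return the shared MID if every abnormal root node runs on the same
    physical machine, otherwise None."""
    mids = {mid_of[cid] for cid in roots}
    if len(mids) == 1:
        return next(iter(mids))
    return None  # no roots, or roots spread over several machines


mid_of = {"C": "m1", "F": "m1", "K": "m2"}  # hypothetical placement
print(abnormal_machine(["C", "F"], mid_of))  # m1
print(abnormal_machine(["C", "K"], mid_of))  # None
```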
Further, obtaining the TCP delay information of each container component includes:
Collect the TCP delay information of each component using the software tcprstat.
According to another embodiment of the present invention, an abnormality detection and localization device applied to a distributed container cloud platform is provided, comprising:
A delay information acquisition unit, configured to obtain the TCP delay information of each container component;
A state information acquisition unit, configured to analyze the TCP delay information of each container component with a sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate component state information key-value pairs;
A component abnormality subgraph construction unit, configured to construct a component abnormality subgraph from the component state information key-value pairs;
An abnormality localization unit, configured to locate the container component node where the abnormality originated according to the component abnormality subgraph.
Further, the device also includes:
An abnormality judgment unit, configured to judge whether the MIDs of all abnormal root nodes are identical and, if they are, to judge that the physical machine numbered MID is abnormal.
A storage medium storing a program file capable of implementing any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
A processor configured to run a program, wherein the program, when run, executes any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
The abnormality detection and localization method and device applied to a distributed container cloud platform in the embodiments of the present invention use TCP delay information to judge the abnormal state, which reduces the overhead of data collection and improves the accuracy and timeliness of abnormal-state judgment. In addition, taking into account the interference among components and between physical machines and components, a component abnormality subgraph is proposed to represent the propagation of the abnormal state, which improves the accuracy of abnormality localization.
Brief description of the drawings
The drawings described here are provided for further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the abnormality detection and localization method of the present invention applied to a distributed container cloud platform;
Fig. 2 is a preferred flow chart of the abnormality detection and localization method of the present invention applied to a distributed container cloud platform;
Fig. 3 is a module diagram of the abnormality detection and localization device of the present invention applied to a distributed container cloud platform;
Fig. 4 is a preferred module diagram of the abnormality detection and localization device of the present invention applied to a distributed container cloud platform.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product or device.
As container technology matures, cloud computing systems based on container technology, that is, container clouds, have gradually begun to replace traditional cloud computing systems based on virtual machines. Because containers are lightweight, they are more convenient to deploy, and the internal composition of a container cloud is therefore more complex than that of a traditional cloud computing platform. In addition, containers isolate system resources less strongly than virtual machines do, and multiple containers run on the same physical host, so interference between containers is comparatively strong. Once a container in a container cloud becomes abnormal, the abnormality propagates rapidly and affects the entire cluster. Because of the complex internal environment of a container cloud, traditional abnormality detection methods based on a single component are no longer suitable for distributed container cloud environments. The prior art analyzes abnormalities using performance metrics, which increases the overhead of data collection and requires building a normal-fluctuation model; for container cloud platforms whose metrics fluctuate frequently and in complex ways, its accuracy is low and it lacks timeliness.
For container cloud platforms, the present invention provides an abnormality detection and localization method and device applied to a distributed container cloud platform. The method and device can detect and localize abnormalities on more complex distributed container cloud platforms, and the component abnormality subgraph improves the accuracy of abnormality localization.
Embodiment 1
According to an embodiment of the present invention, an abnormality detection and localization method applied to a distributed container cloud platform is provided; referring to Fig. 1, it comprises the following steps:
S101: obtain the TCP delay information of each container component;
S102: analyze the TCP delay information of each container component with a sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate component state information key-value pairs;
S103: construct a component abnormality subgraph from the component state information key-value pairs;
S104: locate the container component node where the abnormality originated according to the component abnormality subgraph.
The method uses TCP delay information to judge the abnormal state, which reduces the overhead of data collection and improves the accuracy and timeliness of abnormal-state judgment. In addition, taking into account the interference among components and between physical machines and components, a component abnormality subgraph is proposed to represent the propagation of the abnormal state, which improves the accuracy of abnormality localization.
As a preferred technical solution, analyzing the TCP delay information of each container component with the sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate the component state information key-value pairs includes:
Initialize the sliding window [L_0, L_k] of the component and input TCP (Transmission Control Protocol) delay information until the number of TCP delay values in the sliding window reaches k; initialize the mean and the cumulative sum S_k = 0, which are initial values, with S_k = S_{k-1} = ... = S_0 = 0 and L_k = L_{k-1} = ... = L_0 = 0. Here [L_0, L_k] is a queue storing TCP delay information from index 0 to k; the size of the queue is k, which is given as an input; k is an integer with 0 < k < 60 and is usually set to 10;
Input the next TCP delay value L_t, insert L_t into the sliding window, and delete the earliest TCP delay value L_{t-k} from the window; compute the mean within the window and the cumulative sum S_t. This is an iterative calculation: when t is k+1, S_{t-1} = S_k = 0. Here L_t is the TCP delay information at time t, and t is an integer with t > k;
Compute the warning value S_diff = S_max - S_min, where S_max, S_min ∈ [S_{t-k}, S_t], and S_{t-k} is the cumulative sum at the time of the earliest TCP delay value;
Judge whether S_diff lies within the normal threshold range [-h, h]; if so, the state Status of the component is judged to be normal; otherwise it is judged to be abnormal. Here h denotes the acceptable range of S_diff and is one of the input parameters.
Generate a component state information key-value pair <CID:MID:Status> from the state information of each component, where CID is the number of the component, MID is the number of the physical machine on which the component runs, and Status is the state of the component: Status is 1 when the component is abnormal and 0 when it is normal.
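The detection steps can be sketched as a small streaming detector. The expressions for the window mean and the cumulative sum are not legible in this text, so the sketch assumes a common cumulative-sum update: the mean is the arithmetic mean of the k values in the window, and S_t = S_{t-1} + (L_t - mean). That update rule, the class name, and the sample threshold are assumptions; only k, h, S_diff = S_max - S_min and the [-h, h] test come from the description.

```python
from collections import deque


class SlidingCusumDetector:
    """Hypothetical sliding-window cumulative-sum detector for one component."""

    def __init__(self, k: int = 10, h: float = 5.0):
        if not 0 < k < 60:
            raise ValueError("k must satisfy 0 < k < 60")
        self.k = k
        self.h = h
        self.window = deque(maxlen=k)            # latest k delay values
        self.sums = deque([0.0], maxlen=k + 1)   # S_{t-k} .. S_t, starting at 0

    def update(self, delay: float) -> int:
        """Feed one TCP delay value; return Status (1 abnormal, 0 normal)."""
        warming_up = len(self.window) < self.k
        self.window.append(delay)  # a full deque evicts the earliest value
        if warming_up:
            return 0  # window still filling; cumulative sum stays at 0

        mean = sum(self.window) / self.k
        # Assumed update rule: accumulate the deviation from the window mean.
        s_t = self.sums[-1] + (delay - mean)
        self.sums.append(s_t)

        s_diff = max(self.sums) - min(self.sums)
        return 0 if -self.h <= s_diff <= self.h else 1


detector = SlidingCusumDetector(k=3, h=1.0)
statuses = [detector.update(x) for x in [10, 10, 10, 10, 40]]
print(statuses)  # [0, 0, 0, 0, 1]
```

Because S_diff is a maximum minus a minimum it is never negative, so in this sketch only the upper bound h of the [-h, h] range can actually be crossed.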
As a preferred technical solution, constructing the component abnormality subgraph from the component state information key-value pairs includes:
Input the component dependency graph G, whose matrix representation is G = (E_ij), where i and j denote components in the application cluster and E_ij denotes the dependency between component i and component j: E_ij is 1 if component i depends on component j, and 0 otherwise;
Traverse the component state information key-value pairs; when Status is 0, delete the row and column with i = CID or j = CID from the component dependency graph G; when the traversal is finished, the component dependency subgraph G1 is obtained;
Determine whether the component dependency subgraph G1 contains independent component nodes, that is, component nodes that neither depend on any other component node nor are depended on by any other component node; delete such component nodes to construct the component abnormality subgraph G'.
As a preferred technical solution, locating the container component node where the abnormality originated according to the component abnormality subgraph includes:
Traverse the component abnormality subgraph G' and compute δ_i = Σ_{j∈G'} E_ij; if δ_i = 0, component node i is an abnormal root node.
As a preferred technical solution, referring to Fig. 2, after locating the container component node where the abnormality originated according to the component abnormality subgraph, the method also includes:
S105: judge whether the MIDs of all abnormal root nodes are identical; if they are, judge that the physical machine numbered MID is abnormal.
As a preferred technical solution, obtaining the TCP delay information of each container component includes:
Collect the TCP delay information of each component using the software tcprstat.
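The source names tcprstat as the collector but does not show what its output looks like or which statistic is fed to the detector. Assuming, purely for illustration, that the collected output is a whitespace-separated table with a header row and that the `avg` column is used as the delay value, the series could be extracted as below. The sample text, the column choice and the function name are all hypothetical.

```python
from typing import List


def parse_delay_table(text: str, column: str = "avg") -> List[float]:
    """Extract one numeric column from a whitespace-separated table whose
    first non-empty line is a header naming the columns."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    index = header.index(column)  # raises ValueError if the column is absent
    return [float(row[index]) for row in data]


# Made-up sample in the assumed layout; not captured from a real run.
sample = """
timestamp   count  max  min  avg
1544000000  12     900  100  350
1544000001  9      800  120  310
"""
print(parse_delay_table(sample))  # [350.0, 310.0]
```

Reading the column position from the header keeps the sketch independent of the exact column order, which is not specified here.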
The method is described in detail below with a specific embodiment. The abnormality detection and localization method of the present invention applied to a distributed container cloud platform comprises the following steps:
The service manager submits an abnormality localization request to the service agent;
After receiving the abnormality localization request, the service agent collects the TCP delay information of the components using the software tcprstat. tcprstat is a free, open-source TCP-layer analysis tool that collects statistics on request response times; it can be used for ad hoc analysis or run as a scheduled task for information collection;
The service agent analyzes the collected TCP delay information of the components with the sliding-window cumulative-sum abnormality detection algorithm to obtain the state information Status of each component and generate the component state information key-value pair <CID:MID:Status>;
The service agent submits the component state information key-value pairs <CID:MID:Status> to the service manager;
After collecting all component state information key-value pairs, the service manager constructs the component abnormality subgraph G';
The service manager traverses the component abnormality subgraph G' and computes δ_i = Σ_{j∈G'} E_ij; if δ_i = 0, component node i is an abnormal root node;
Judge whether the MIDs of all abnormal root nodes are identical; if they are, the physical machine numbered MID is abnormal.
Embodiment 2
According to another embodiment of the present invention, an abnormality detection and localization device applied to a distributed container cloud platform is provided; referring to Fig. 3, it comprises:
A delay information acquisition unit 201, configured to obtain the TCP delay information of each container component;
A state information acquisition unit 202, configured to analyze the TCP delay information of each container component with a sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate component state information key-value pairs;
A component abnormality subgraph construction unit 203, configured to construct a component abnormality subgraph from the component state information key-value pairs;
An abnormality localization unit 204, configured to locate the container component node where the abnormality originated according to the component abnormality subgraph.
The abnormality detection and localization device of the present invention for a distributed container cloud platform uses TCP delay information to judge the abnormal state, which reduces the overhead of data collection and improves the accuracy and timeliness of abnormal-state judgment. In addition, taking into account the interference among components and between physical machines and components, a component abnormality subgraph is proposed to represent the propagation of the abnormal state, which improves the accuracy of abnormality localization.
As a preferred technical solution, referring to Fig. 4, the device also includes:
An abnormality judgment unit 205, configured to judge whether the MIDs of all abnormal root nodes are identical and, if they are, to judge that the physical machine numbered MID is abnormal.
Embodiment 3
A storage medium storing a program file capable of implementing any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
Embodiment 4
A processor configured to run a program, wherein the program, when run, executes any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or take other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An abnormality detection and localization method applied to a distributed container cloud platform, characterized by comprising the following steps:
Obtain the TCP delay information of each container component;
Analyze the TCP delay information of each container component with a sliding-window cumulative-sum abnormality detection algorithm to obtain the state information of each component and generate component state information key-value pairs;
Construct a component abnormality subgraph from the component state information key-value pairs;
Locate the container component node where the abnormality originated according to the component abnormality subgraph.
2. The method according to claim 1, characterized in that analyzing the TCP delay information of each container component with the sliding-window cumulative-sum anomaly detection algorithm to obtain the state information of each component and generate the component state information key-value pairs comprises:
Initializing the sliding window $[L_0, L_k]$ of the component and inputting TCP delay information until the number of TCP delay values in the sliding window reaches k; initializing the average value and the cumulative sum $S_k = 0$; where $[L_0, L_k]$ is a queue storing the TCP delay information from 0 to k, and k is an integer with $0 < k < 60$;
Inputting further TCP delay information $L_t$, inserting $L_t$ into the sliding window, deleting the earliest TCP delay information $L_{t-k}$ from the sliding window, calculating the average value within the window, and calculating the cumulative sum; where $L_t$ is the TCP delay information at time t, and t is an integer with $t > k$;
Calculating the early-warning value $S_{diff} = S_{max} - S_{min}$, where $S_{max}, S_{min} \in [S_{t-k}, S_t]$ and $S_{t-k}$ is the cumulative sum at the time of the earliest TCP delay information;
Judging whether $S_{diff}$ lies within the normal threshold range $[-h, h]$; if so, judging the state Status of the component to be normal, otherwise judging the state Status of the component to be abnormal;
Generating the component state information key-value pair <CID:MID:Status> from the state information of each component, where CID is the number of the component, MID is the number of the physical machine on which the component is located, and Status is the state of the component; Status is 1 when the component state is abnormal and 0 when it is normal.
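The per-component detection steps of claim 2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the class name `SlidingCusum`, the helper `state_pair`, and the sample values are hypothetical, and because the averaging and cumulative-sum formulas are not legible in this text, the update "previous sum plus the newest delay minus the window mean" is an assumed textbook cumulative-sum form.

```python
from collections import deque


class SlidingCusum:
    """Illustrative sliding-window cumulative-sum detector for one component."""

    def __init__(self, k: int, h: float):
        if not 0 < k < 60:
            raise ValueError("the claim restricts k to 0 < k < 60")
        self.k = k
        self.h = h
        self.window = deque(maxlen=k)           # the k most recent TCP delays
        self.sums = deque([0.0], maxlen=k + 1)  # S_{t-k} .. S_t, starting at S_k = 0
        self.s = 0.0                            # current cumulative sum

    def update(self, delay: float) -> int:
        """Feed one TCP delay sample; return Status (1 = abnormal, 0 = normal)."""
        warming_up = len(self.window) < self.k
        self.window.append(delay)               # a full deque drops the oldest sample
        if warming_up:
            return 0                            # still filling the initial window
        mean = sum(self.window) / self.k
        # ASSUMPTION: textbook CUSUM update; the source formula is not legible.
        self.s += delay - mean
        self.sums.append(self.s)
        s_diff = max(self.sums) - min(self.sums)
        return 0 if -self.h <= s_diff <= self.h else 1


def state_pair(cid: str, mid: str, status: int) -> tuple:
    """Build the <CID:MID:Status> record as a plain tuple (hypothetical layout)."""
    return (cid, mid, status)
```

Feeding a steady delay followed by a spike, for example `[10.0, 10.0, 10.0, 10.0, 40.0]` with `k=3` and `h=5.0`, leaves the status at 0 until the spike pushes the spread of recent cumulative sums above `h`.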
3. The method according to claim 2, characterized in that constructing the component abnormal subgraph from the component state information key-value pairs comprises:
Inputting the component dependency graph G, whose matrix representation is $G = (E_{ij})$, where i and j denote components in the application cluster and $E_{ij}$ denotes the dependency between component i and component j; $E_{ij}$ is 1 if component i depends on component j, otherwise $E_{ij}$ is 0;
Traversing the component state information key-value pairs and, when Status is 0, deleting from the component dependency graph G the row and column with i = CID or j = CID; when the traversal is finished, the component dependency subgraph G1 is obtained;
Determining whether the component dependency subgraph G1 contains independent component nodes, an independent component node being a node that depends on no other component node and on which no other component node depends; after such component nodes are deleted, the component abnormal subgraph G' is obtained.
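The subgraph construction of claim 3 can be sketched as follows. The function name `build_abnormal_subgraph`, the list-of-lists matrix, and the `(cid, mid, status)` tuple layout are assumptions made for illustration; the claim itself only fixes the meaning of the matrix entries and the two deletion steps.

```python
def build_abnormal_subgraph(components, E, states):
    """Return (nodes, sub): the CIDs kept in G' and the matching sub-matrix.

    components: list of CIDs; components[i] labels row i and column i of E.
    E:          E[i][j] == 1 if component i depends on component j, else 0.
    states:     iterable of (cid, mid, status) records, status 1 = abnormal.
    """
    normal = {cid for cid, _mid, status in states if status == 0}

    # Step 1: delete the row and column of every normal component, giving G1.
    keep = [i for i, cid in enumerate(components) if cid not in normal]

    # Step 2: delete independent nodes, i.e. nodes with neither an outgoing
    # nor an incoming dependency edge inside G1.
    # ASSUMPTION: self-loops are ignored when testing for independence.
    keep = [i for i in keep
            if any(E[i][j] or E[j][i] for j in keep if j != i)]

    nodes = [components[i] for i in keep]
    sub = [[E[i][j] for j in keep] for i in keep]
    return nodes, sub
```

A single pass suffices for the second step: any edge that keeps a node also keeps the node at its other end, so no kept node becomes independent afterwards.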
4. The method according to claim 3, characterized in that locating the container component node where the abnormality occurs according to the component abnormal subgraph comprises:
Traversing the component abnormal subgraph G' and calculating $\delta_i = \sum_{j \in G'} E_{ij}$; if $\delta_i = 0$, component node i is an abnormal root node.
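As a short illustration of this test, the row sum of the sub-matrix counts how many other abnormal components a node depends on, so a zero row marks a root. The function name `abnormal_roots` and the `(nodes, sub)` input layout are hypothetical.

```python
def abnormal_roots(nodes, sub):
    """Return the CIDs whose row sum (delta_i) in the abnormal subgraph is 0.

    nodes: list of CIDs in G'; nodes[i] labels row i and column i of sub.
    sub:   sub[i][j] == 1 if node i depends on node j within G', else 0.
    """
    return [cid for cid, row in zip(nodes, sub) if sum(row) == 0]
```

With nodes `["a", "b"]` and the matrix `[[0, 1], [0, 0]]` (a depends on b), only `b` has a zero row, so `b` is reported as the abnormal root node.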
5. The method according to claim 4, characterized in that, after locating the container component node where the abnormality occurs according to the component abnormal subgraph, the method further comprises:
Judging whether the MIDs of the abnormal root nodes are identical and, if so, judging that the physical machine numbered MID is abnormal.
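The machine-level check can be sketched in a few lines. The function name `abnormal_machine` and the `(cid, mid, status)` tuple layout are assumptions for illustration only.

```python
def abnormal_machine(root_cids, states):
    """Return the shared MID if every abnormal root node sits on one machine.

    root_cids: CIDs of the abnormal root nodes.
    states:    iterable of (cid, mid, status) records.
    Returns None when there are no roots or the roots span several machines.
    """
    roots = set(root_cids)
    mids = {mid for cid, mid, _status in states if cid in roots}
    return next(iter(mids)) if len(mids) == 1 else None
```

Read literally, the check also succeeds when there is only one abnormal root node; a deployment that wants stronger evidence before blaming a physical machine might additionally require at least two roots, which is a design choice outside the claim.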
6. The method according to claim 1, characterized in that obtaining the TCP delay information of each container component comprises:
Collecting the TCP delay information of each component using the software tcprstat.
7. An abnormality detection and localization device applied to a distributed container cloud platform, characterized by comprising:
A delay information acquisition unit, configured to obtain the TCP delay information of each container component;
A state information acquisition unit, configured to analyze the TCP delay information of each container component with a sliding-window cumulative-sum anomaly detection algorithm to obtain the state information of each component and generate component state information key-value pairs;
A component abnormal subgraph construction unit, configured to construct a component abnormal subgraph from the component state information key-value pairs;
An abnormality localization unit, configured to locate the container component node where the abnormality occurs according to the component abnormal subgraph.
8. The device according to claim 7, characterized in that the device further comprises:
An abnormality judging unit, configured to judge whether the MIDs of the abnormal root nodes are identical and, if so, to judge that the physical machine numbered MID is abnormal.
9. A storage medium, characterized in that the storage medium stores a program file capable of implementing the abnormality detection and localization method applied to a distributed container cloud platform according to any one of claims 1 to 6.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the abnormality detection and localization method applied to a distributed container cloud platform according to any one of claims 1 to 6.
CN201811537333.2A 2018-12-15 2018-12-15 Anomaly detection and positioning method and device applied to distributed container cloud platform Active CN109800052B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811537333.2A CN109800052B (en) 2018-12-15 2018-12-15 Anomaly detection and positioning method and device applied to distributed container cloud platform
PCT/CN2019/123989 WO2020119627A1 (en) 2018-12-15 2019-12-09 Abnormality detection and positioning method and apparatus applied to distributed container cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811537333.2A CN109800052B (en) 2018-12-15 2018-12-15 Anomaly detection and positioning method and device applied to distributed container cloud platform

Publications (2)

Publication Number Publication Date
CN109800052A true CN109800052A (en) 2019-05-24
CN109800052B CN109800052B (en) 2020-11-24

Family

ID=66556890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811537333.2A Active CN109800052B (en) 2018-12-15 2018-12-15 Anomaly detection and positioning method and device applied to distributed container cloud platform

Country Status (2)

Country Link
CN (1) CN109800052B (en)
WO (1) WO2020119627A1 (en)

Also Published As

Publication number Publication date
WO2020119627A1 (en) 2020-06-18
CN109800052B (en) 2020-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant