CN109800052A - Abnormality detection and localization method and device applied to distributed container cloud platform - Google Patents

Abnormality detection and localization method and device applied to distributed container cloud platform

Info

Publication number
CN109800052A
Authority
CN
China
Prior art keywords
component
tcp
information
subgraph
container
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811537333.2A
Other languages
Chinese (zh)
Other versions
CN109800052B (en)
Inventor
叶可江
卢澄志
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811537333.2A
Publication of CN109800052A
Priority to PCT/CN2019/123989 (WO2020119627A1)
Application granted
Publication of CN109800052B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the field of container cloud platforms, and in particular to an abnormality detection and localization method and device applied to a distributed container cloud platform. The method and device first obtain the TCP latency information of each container component; analyze that information with a sliding-window cumulative-sum anomaly detection algorithm to obtain the state of each component and generate component state information key-value pairs; construct a component anomaly subgraph from the key-value pairs; and locate, from the component anomaly subgraph, the container component node where the anomaly occurred. Using TCP latency information to judge abnormal states reduces the overhead of data collection and improves the accuracy and timeliness of the judgment. In addition, taking into account the interference between components and between physical machines and components, the component anomaly subgraph is proposed to represent how abnormal states propagate, which improves the accuracy of anomaly localization.

Description

Abnormality detection and localization method and device applied to distributed container cloud platform
Technical field
The present invention relates to the field of container cloud platforms, and in particular to an abnormality detection and localization method and device applied to a distributed container cloud platform.
Background
Cloud computing, as a new way of delivering services, has won favor in both industry and academia. Its key technology is virtualization: by virtualizing all kinds of resources, a cloud provider can easily customize resources and deliver them to users, and many applications have gradually begun to migrate to cloud clusters. Traditional virtualization technologies include KVM and Xen. Because they are heavyweight, however, creating, modifying, and migrating an individual component of an application cluster is complicated, so cloud providers need a more lightweight virtualization technology. Container technology is a lightweight, operating-system-level virtualization technology. Whereas traditional virtualization virtualizes the hardware layer, container virtualization stays at the operating-system layer, so creating, modifying, or migrating a container is very convenient. Container technology was quickly adopted by cloud providers. Given these properties, users deploying an application often run each component in its own container to make the application easier to maintain, which makes the internal structure of a container cloud complex. At the same time, the weaker isolation of containers makes interference between containers more severe. Once one container becomes abnormal, the anomaly propagates quickly and affects other application components. Cloud providers therefore need a method that can localize anomalies in complex application clusters built from containers.
Typically, an application deployed on a container cloud consists of hundreds of components, and the components depend on one another, forming a complex graph whose nodes are components. Using graph theory, the root cause of an anomaly can be located within this complex graph. A container-based cloud platform usually consists of thousands of physical machines, each typically running dozens of containers, so it is more complex than a traditional cloud platform. Compared with traditional virtual machines, containers are less well isolated and interfere with each other more severely, so they are more likely to affect one another. In addition, because containers are deployed on the host operating system, an anomaly in a physical machine can also cause anomalies in the containers deployed on it. Existing anomaly detection and localization schemes do not analyze the correlations between components or between components and physical machines. They also rely on performance metric data for detection and localization, which incurs large storage and transmission overhead, so they are poorly suited to a distributed container cloud platform with severe interference.
In Section 3 of "Insight: in-situ online service failure path inference in production computing infrastructures", Nguyen et al. propose an online black-box anomaly localization system for locating abnormal components. The system uses virtual machine performance metrics to build a model of each metric's normal fluctuation, identifies data points with abnormal changes, and localizes the abnormal component by combining the timing of the changed data points with the dependencies between components. Although the system can detect and localize anomalies, it relies on performance metrics, and for a complex distributed container cloud platform the overhead of monitoring those metrics would be very large.
Summary of the invention
The embodiments of the present invention provide an abnormality detection and localization method and device applied to a distributed container cloud platform, to solve at least the technical problem that traditional single-component anomaly detection methods are not suitable for a distributed container cloud.
According to one embodiment of the present invention, an abnormality detection and localization method applied to a distributed container cloud platform is provided, comprising the following steps:
obtaining the TCP latency information of each container component;
analyzing the TCP latency information of each container component with a sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs;
constructing a component anomaly subgraph from the component state information key-value pairs;
locating, from the component anomaly subgraph, the container component node where the anomaly occurred.
Further, analyzing the TCP latency information of each container component with the sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs comprises:
initializing the component's sliding window $[L_0, L_k]$ and feeding in TCP latency information until the window holds $k$ samples, then initializing the average value and the cumulative sum $S_k = 0$; here $[L_0, L_k]$ is the queue storing TCP latency information from index 0 to $k$, and $k$ is an integer with $0 < k < 60$;
feeding in the next TCP latency sample $L_t$, inserting $L_t$ into the sliding window and deleting the earliest TCP latency sample $L_{t-k}$ from the window, then computing the average value over the window and the cumulative sum $S_t$; here $L_t$ is the TCP latency information at time $t$, and $t$ is an integer with $t > k$;
computing the early-warning value $S_{\text{diff}} = S_{\max} - S_{\min}$, where $S_{\max}, S_{\min} \in [S_{t-k}, S_t]$ and $S_{t-k}$ is the cumulative sum at the time of the earliest TCP latency sample;
judging whether $S_{\text{diff}}$ lies within the normal threshold range $[-h, h]$: if so, the state Status of the component is judged normal; otherwise it is judged abnormal;
generating, from the state information of each component, the component state information key-value pair <CID:MID:Status>, where CID is the number of the component, MID is the number of the physical machine hosting the component, and Status is the state of the component: Status is 1 when the component is abnormal and 0 when it is normal.
Further, constructing the component anomaly subgraph from the component state information key-value pairs comprises:
taking as input the component dependency graph $G$, whose matrix representation is $G = (E_{ij})$, where $i$ and $j$ denote components in the application cluster and $E_{ij}$ denotes the dependency between component $i$ and component $j$: $E_{ij} = 1$ if component $i$ depends on component $j$, and $E_{ij} = 0$ otherwise;
traversing the component state information key-value pairs and, whenever Status is 0, deleting from $G$ the row and column with $i = \text{CID}$ or $j = \text{CID}$; when the traversal finishes, the result is the component dependency subgraph $G_1$;
determining whether $G_1$ contains independent component nodes, that is, nodes that depend on no other component node and on which no other component node depends; deleting such nodes yields the component anomaly subgraph $G'$.
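The construction of $G'$ described above can be sketched in Python as follows. This is an illustration only: the function name `build_anomaly_subgraph`, the use of list indices as component numbers, and the nested-list matrix are assumptions made for the example and do not appear in the patent text.

```python
def build_anomaly_subgraph(E, status):
    """Return (nodes, sub) describing the component anomaly subgraph G'.

    E[i][j] == 1 means component i depends on component j.
    status[i] is 1 for an abnormal component and 0 for a normal one.
    nodes lists the component indices kept in G'; sub is E restricted to them.
    """
    n = len(E)

    # G1: drop the rows and columns of every normal component (Status == 0).
    abnormal = [i for i in range(n) if status[i] == 1]

    def is_independent(i):
        # Independent: depends on no other remaining node,
        # and no other remaining node depends on it.
        return not any(E[i][j] or E[j][i] for j in abnormal if j != i)

    # G': drop the independent nodes from G1.
    nodes = [i for i in abnormal if not is_independent(i)]
    sub = [[E[i][j] for j in nodes] for i in nodes]
    return nodes, sub


# Hypothetical cluster: 0 depends on 1, 1 depends on 2, 3 stands alone.
E = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(build_anomaly_subgraph(E, [1, 1, 0, 1]))  # -> ([0, 1], [[0, 1], [0, 0]])
```

In this example component 2 is removed because it is normal, and component 3 is removed because, although abnormal, it has no dependency edge to any other abnormal component.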
Further, locating the container component node where the anomaly occurred from the component anomaly subgraph comprises:
traversing the component anomaly subgraph $G'$ and computing $\delta_i = \sum_{j \in G'} E_{ij}$; if $\delta_i = 0$, component node $i$ is an abnormal root node.
Further, after locating the container component node where the anomaly occurred from the component anomaly subgraph, the method further comprises:
judging whether the MIDs of all abnormal root nodes are identical; if they are, the physical machine numbered MID is judged to be abnormal.
Further, obtaining the TCP latency information of each container component comprises:
collecting the TCP latency information of each component with the software tcprstat.
According to another embodiment of the present invention, an abnormality detection and localization device applied to a distributed container cloud platform is provided, comprising:
a latency information acquisition unit, for obtaining the TCP latency information of each container component;
a state information acquisition unit, for analyzing the TCP latency information of each container component with the sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs;
a component anomaly subgraph construction unit, for constructing the component anomaly subgraph from the component state information key-value pairs;
an anomaly localization unit, for locating, from the component anomaly subgraph, the container component node where the anomaly occurred.
Further, the device also comprises:
an anomaly judgment unit, for judging whether the MIDs of all abnormal root nodes are identical and, if they are, judging that the physical machine numbered MID is abnormal.
A storage medium storing a program file capable of implementing any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
A processor configured to run a program, wherein the program, when run, performs any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
The abnormality detection and localization method and device applied to a distributed container cloud platform in the embodiments of the present invention use TCP latency information to judge abnormal states, which reduces the overhead of data collection and improves the accuracy and timeliness of the judgment. In addition, taking into account the interference between components and between physical machines and components, the component anomaly subgraph is proposed to represent how abnormal states propagate, which improves the accuracy of anomaly localization.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the abnormality detection and localization method applied to a distributed container cloud platform according to the present invention;
Fig. 2 is a preferred flowchart of the abnormality detection and localization method applied to a distributed container cloud platform according to the present invention;
Fig. 3 is a module diagram of the abnormality detection and localization device applied to a distributed container cloud platform according to the present invention;
Fig. 4 is a preferred module diagram of the abnormality detection and localization device applied to a distributed container cloud platform according to the present invention.
Specific embodiment
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Note that the terms "first", "second", and so on in the description, the claims, and the drawings distinguish similar objects and do not describe a particular order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. Furthermore, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
As container technology matures, cloud computing systems based on it, that is, container clouds, have gradually begun to replace traditional cloud computing systems based on virtual machines. Because containers are lightweight, they are more convenient to deploy, and the internal composition of a container cloud is therefore more complex than that of a traditional cloud platform. Moreover, containers isolate system resources less strongly than virtual machines do, and many containers run on the same physical host, so interference between containers is comparatively strong. Once one container in a container cloud becomes abnormal, the anomaly propagates quickly and affects the whole cluster. Because of this complex internal environment, traditional single-component anomaly detection methods are no longer suitable for a distributed container cloud. The prior art analyzes anomalies using performance metrics, which increases the overhead of data collection and requires building a model of normal fluctuation; for a container cloud platform whose fluctuations are frequent and complex, such methods have low accuracy and lack timeliness.
For container cloud platforms, the present invention provides an abnormality detection and localization method and device applied to a distributed container cloud platform. The method and device can detect and localize anomalies on a more complex distributed container cloud platform, and the component anomaly subgraph improves the accuracy of anomaly localization.
Embodiment 1
According to one embodiment of the present invention, an abnormality detection and localization method applied to a distributed container cloud platform is provided. Referring to Fig. 1, it comprises the following steps:
S101: obtaining the TCP latency information of each container component;
S102: analyzing the TCP latency information of each container component with a sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs;
S103: constructing a component anomaly subgraph from the component state information key-value pairs;
S104: locating, from the component anomaly subgraph, the container component node where the anomaly occurred.
The method uses TCP latency information to judge abnormal states, which reduces the overhead of data collection and improves the accuracy and timeliness of the judgment. In addition, taking into account the interference between components and between physical machines and components, the component anomaly subgraph is proposed to represent how abnormal states propagate, which improves the accuracy of anomaly localization.
As a preferred technical solution, analyzing the TCP latency information of each container component with the sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs comprises:
initializing the component's sliding window $[L_0, L_k]$ and feeding in TCP (Transmission Control Protocol) latency information until the window holds $k$ samples, then initializing the average value and the cumulative sum $S_k = 0$. These are initial values: $S_k = S_{k-1} = \dots = S_0 = 0$ and $L_k = L_{k-1} = \dots = L_0 = 0$. Here $[L_0, L_k]$ is the queue storing TCP latency information from index 0 to $k$, and the queue size is $k$; $k$ is an input parameter, an integer with $0 < k < 60$, and is usually set to 10;
feeding in the next TCP latency sample $L_t$, inserting $L_t$ into the sliding window and deleting the earliest TCP latency sample $L_{t-k}$ from the window, then computing the average value over the window and the cumulative sum $S_t$. This is an iterative calculation: when $t = k + 1$, $S_{t-1} = S_k = 0$. Here $L_t$ is the TCP latency information at time $t$, and $t$ is an integer with $t > k$;
computing the early-warning value $S_{\text{diff}} = S_{\max} - S_{\min}$, where $S_{\max}, S_{\min} \in [S_{t-k}, S_t]$ and $S_{t-k}$ is the cumulative sum at the time of the earliest TCP latency sample;
judging whether $S_{\text{diff}}$ lies within the normal threshold range $[-h, h]$: if so, the state Status of the component is judged normal; otherwise it is judged abnormal. Here $h$ denotes the acceptable range of $S_{\text{diff}}$ and is one of the input parameters;
generating, from the state information of each component, the component state information key-value pair <CID:MID:Status>, where CID is the number of the component, MID is the number of the physical machine hosting the component, and Status is the state of the component: Status is 1 when the component is abnormal and 0 when it is normal.
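The detection steps above can be sketched as a small Python class. Two points are assumptions rather than statements from the text: the exact formulas for the window average and for $S_t$ are not legible in this copy, so the sketch uses the common cumulative-sum update $S_t = S_{t-1} + (L_t - \text{mean of the window})$, and it treats $S_{\max}$ and $S_{\min}$ as the largest and smallest of the stored sums $S_{t-k}, \dots, S_t$. The class name `SlidingCusumDetector` and the sample values are hypothetical.

```python
from collections import deque


class SlidingCusumDetector:
    """Sliding-window cumulative-sum check on one component's TCP latency."""

    def __init__(self, k=10, h=50.0):
        if not 0 < k < 60:
            raise ValueError("k must be an integer with 0 < k < 60")
        self.k = k
        self.h = h  # acceptable range of S_diff (input parameter)
        self.window = deque(maxlen=k)           # the last k latency samples
        self.sums = deque([0.0], maxlen=k + 1)  # S_{t-k} .. S_t, starting at S_k = 0

    def update(self, latency):
        """Feed one latency sample; return Status (1 = abnormal, 0 = normal)."""
        filling = len(self.window) < self.k
        self.window.append(latency)  # a full deque drops its oldest sample
        if filling:
            return 0  # window not yet full: the cumulative sum stays at 0

        mean = sum(self.window) / self.k
        # Assumed update rule (standard cumulative sum of deviations).
        self.sums.append(self.sums[-1] + (latency - mean))

        s_diff = max(self.sums) - min(self.sums)
        return 0 if -self.h <= s_diff <= self.h else 1


detector = SlidingCusumDetector(k=3, h=5.0)
statuses = [detector.update(x) for x in [10, 10, 10, 10, 10, 40]]
print(statuses)  # -> [0, 0, 0, 0, 0, 1]
```

One detector instance would be kept per component, and its return value would become the Status field of that component's <CID:MID:Status> pair. Because $S_{\text{diff}}$ is a maximum minus a minimum it is never negative, so under this reading only the upper bound $h$ is ever exceeded.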
As a preferred technical solution, constructing the component anomaly subgraph from the component state information key-value pairs comprises:
taking as input the component dependency graph $G$, whose matrix representation is $G = (E_{ij})$, where $i$ and $j$ denote components in the application cluster and $E_{ij}$ denotes the dependency between component $i$ and component $j$: $E_{ij} = 1$ if component $i$ depends on component $j$, and $E_{ij} = 0$ otherwise;
traversing the component state information key-value pairs and, whenever Status is 0, deleting from $G$ the row and column with $i = \text{CID}$ or $j = \text{CID}$; when the traversal finishes, the result is the component dependency subgraph $G_1$;
determining whether $G_1$ contains independent component nodes, that is, nodes that depend on no other component node and on which no other component node depends; deleting such nodes yields the component anomaly subgraph $G'$.
As a preferred technical solution, locating the container component node where the anomaly occurred from the component anomaly subgraph comprises:
traversing the component anomaly subgraph $G'$ and computing $\delta_i = \sum_{j \in G'} E_{ij}$; if $\delta_i = 0$, component node $i$ is an abnormal root node.
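The root-node rule amounts to finding the rows of $G'$ that sum to zero: an abnormal component that depends on no other abnormal component has nothing upstream to blame. A minimal Python sketch follows; the function name `find_root_nodes` and the component names are hypothetical.

```python
def find_root_nodes(nodes, sub):
    """Return the components whose row sum delta_i in G' is zero.

    nodes[r] is the component identifier for row r of sub,
    and sub[r][c] == 1 means nodes[r] depends on nodes[c].
    """
    return [cid for cid, row in zip(nodes, sub) if sum(row) == 0]


# Hypothetical abnormal chain: web depends on api, api depends on db.
nodes = ["web", "api", "db"]
sub = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
print(find_root_nodes(nodes, sub))  # -> ['db']
```

Here `web` and `api` each depend on another abnormal component, so only `db` is reported as the abnormal root node.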
As a preferred technical solution, referring to Fig. 2, after locating the container component node where the anomaly occurred from the component anomaly subgraph, the method further comprises:
S105: judging whether the MIDs of all abnormal root nodes are identical; if they are, the physical machine numbered MID is judged to be abnormal.
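Step S105 can be sketched in Python as a check that every abnormal root node reports the same MID. The function name `faulty_machine` and the machine identifiers are hypothetical. The sketch follows the rule literally, so a single root node also satisfies it; whether that case is intended is not stated in the text.

```python
def faulty_machine(root_mids):
    """Return the shared MID if all abnormal root nodes sit on one machine.

    root_mids holds the MID of each abnormal root node.
    Returns None when there are no root nodes or the MIDs differ.
    """
    distinct = set(root_mids)
    return distinct.pop() if len(distinct) == 1 else None


print(faulty_machine(["m1", "m1"]))  # -> m1
print(faulty_machine(["m1", "m2"]))  # -> None
```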
As a preferred technical solution, obtaining the TCP latency information of each container component comprises:
collecting the TCP latency information of each component with the software tcprstat.
The method is described in detail below with a specific embodiment. The abnormality detection and localization method applied to a distributed container cloud platform according to the present invention comprises the following steps:
the service manager submits an anomaly localization request to the service agent;
after receiving the anomaly localization request, the service agent collects the TCP latency information of the components with the software tcprstat. tcprstat is a free, open-source TCP-layer analysis tool that computes statistics on request response times; it can be used for ad hoc analysis or run as a scheduled task to collect information;
the service agent analyzes the collected TCP latency information of each component with the sliding-window cumulative-sum anomaly detection algorithm, obtains the component's state information Status, and generates the component state information key-value pair <CID:MID:Status>;
the service agent submits the component state information key-value pair <CID:MID:Status> to the service manager;
after collecting all component state information key-value pairs, the service manager constructs the component anomaly subgraph $G'$;
the service manager traverses the component anomaly subgraph $G'$ and computes $\delta_i = \sum_{j \in G'} E_{ij}$; if $\delta_i = 0$, component node $i$ is an abnormal root node;
the service manager judges whether the MIDs of all abnormal root nodes are identical; if they are, the physical machine numbered MID is abnormal.
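On the service manager side, the reported key-value pairs have to be gathered before $G'$ can be built. The Python sketch below assumes, purely for illustration, that each pair arrives as the literal string `<CID:MID:Status>`; the text gives this only as notation and does not specify a wire format. The function names `parse_pair` and `collect` and the identifiers are hypothetical.

```python
def parse_pair(text):
    """Parse one '<CID:MID:Status>' string into (cid, mid, status)."""
    cid, mid, status = text.strip().strip("<>").split(":")
    return cid, mid, int(status)


def collect(pairs):
    """Gather reported pairs into per-component status and machine lookups."""
    status = {}   # CID -> 0 (normal) or 1 (abnormal)
    machine = {}  # CID -> MID of the hosting physical machine
    for text in pairs:
        cid, mid, s = parse_pair(text)
        status[cid] = s
        machine[cid] = mid
    return status, machine


reports = ["<c1:m1:1>", "<c2:m1:0>", "<c3:m2:1>"]
status, machine = collect(reports)
print(status)   # -> {'c1': 1, 'c2': 0, 'c3': 1}
print(machine)  # -> {'c1': 'm1', 'c2': 'm1', 'c3': 'm2'}
```

The `status` lookup decides which rows and columns of $G$ are removed, and the `machine` lookup supplies the MIDs compared in the final step.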
Embodiment 2
According to another embodiment of the present invention, an abnormality detection and localization device applied to a distributed container cloud platform is provided. Referring to Fig. 3, it comprises:
a latency information acquisition unit 201, for obtaining the TCP latency information of each container component;
a state information acquisition unit 202, for analyzing the TCP latency information of each container component with the sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs;
a component anomaly subgraph construction unit 203, for constructing the component anomaly subgraph from the component state information key-value pairs;
an anomaly localization unit 204, for locating, from the component anomaly subgraph, the container component node where the anomaly occurred.
The abnormality detection and localization device for a distributed container cloud platform according to the present invention uses TCP latency information to judge abnormal states, which reduces the overhead of data collection and improves the accuracy and timeliness of the judgment. In addition, taking into account the interference between components and between physical machines and components, the component anomaly subgraph is proposed to represent how abnormal states propagate, which improves the accuracy of anomaly localization.
As a preferred technical solution, referring to Fig. 4, the device further comprises:
an anomaly judgment unit 205, for judging whether the MIDs of all abnormal root nodes are identical and, if they are, judging that the physical machine numbered MID is abnormal.
Embodiment 3
A storage medium storing a program file capable of implementing any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
Embodiment 4
A processor configured to run a program, wherein the program, when run, performs any of the above abnormality detection and localization methods applied to a distributed container cloud platform.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The system embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. A person of ordinary skill in the art may make improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. An abnormality detection and localization method applied to a distributed container cloud platform, characterized by comprising the following steps:
obtaining the TCP latency information of each container component;
analyzing the TCP latency information of each container component with a sliding-window cumulative-sum anomaly detection algorithm, obtaining the state information of each component, and generating component state information key-value pairs;
constructing a component anomaly subgraph from the component state information key-value pairs;
locating, from the component anomaly subgraph, the container component node where the anomaly occurred.
2. The method according to claim 1, characterized in that analyzing the TCP delay information of each container component with the sliding-window cumulative sum anomaly detection algorithm to obtain the status information of each component and generate the component status information key-value pairs comprises:
initializing the sliding window $[L_0, L_k]$ of the component, inputting TCP delay information until the number of TCP delay values in the sliding window reaches $k$, and initializing the average value and the cumulative sum $S_k = 0$, where $[L_0, L_k]$ is a queue storing TCP delay information from 0 to $k$, and $k$ is an integer with $0 < k < 60$;
inputting further TCP delay information $L_t$, inserting $L_t$ into the sliding window, deleting the earliest TCP delay information $L_{t-k}$ from the sliding window, calculating the average value within the window, and calculating the cumulative sum $S_t$, where $L_t$ is the TCP delay information at time $t$, and $t$ is an integer with $t > k$;
calculating the early-warning value $S_{diff} = S_{max} - S_{min}$, where $S_{max}, S_{min} \in [S_{t-k}, S_t]$, and $S_{t-k}$ is the cumulative sum at the time of the earliest TCP delay information;
judging whether $S_{diff}$ lies within the normal threshold range $[-h, h]$; if so, judging the state Status of the component to be normal; otherwise, judging the state Status of the component to be abnormal;
generating the component status information key-value pair <CID:MID:Status> from the status information of each component, where CID is the number of the component, MID is the number of the physical machine on which the component is located, and Status is the state of the component, with a value of 1 when the component is abnormal and 0 when it is normal.
3. The method according to claim 2, characterized in that constructing the component anomaly subgraph from the component status information key-value pairs comprises:
inputting a component dependency graph $G$, whose matrix representation is $G = (E_{ij})$, where $i$ and $j$ denote components in the application cluster and $E_{ij}$ denotes the dependency between component $i$ and component $j$: $E_{ij} = 1$ if component $i$ depends on component $j$, and $E_{ij} = 0$ otherwise;
traversing the component status information key-value pairs and, whenever the Status value is 0, deleting from the component dependency graph $G$ the row and column with $i$ = CID or $j$ = CID; when the traversal finishes, the component dependency subgraph $G_1$ is obtained;
determining whether the component dependency subgraph $G_1$ contains independent component nodes, an independent component node being one that neither depends on any other component node nor is depended on by any other component node, and deleting such component nodes to construct the component anomaly subgraph $G'$.
4. The method according to claim 3, characterized in that locating, according to the component anomaly subgraph, the container component node at which the abnormality occurred comprises:
traversing the component anomaly subgraph $G'$ and calculating $\delta_i = \sum_{j \in G'} E_{ij}$; if $\delta_i = 0$, component node $i$ is an abnormal root node.
5. The method according to claim 4, characterized in that, after locating the container component node at which the abnormality occurred according to the component anomaly subgraph, the method further comprises:
judging whether the MIDs of all abnormal root nodes are identical; if so, judging that the physical machine numbered MID is abnormal.
6. The method according to claim 1, characterized in that obtaining the TCP delay information of each container component comprises:
collecting the TCP delay information of each component using the software tcprstat.
7. An abnormality detection and localization device applied to a distributed container cloud platform, characterized by comprising:
a delay information acquisition unit, configured to obtain TCP delay information of each container component;
a status information acquisition unit, configured to analyze the TCP delay information of each container component with a sliding-window cumulative sum anomaly detection algorithm to obtain status information of each component and generate component status information key-value pairs;
a component anomaly subgraph construction unit, configured to construct a component anomaly subgraph from the component status information key-value pairs;
an abnormality localization unit, configured to locate, according to the component anomaly subgraph, the container component node at which the abnormality occurred.
8. The device according to claim 7, characterized in that the device further comprises:
an abnormality judging unit, configured to judge whether the MIDs of all abnormal root nodes are identical and, if so, to judge that the physical machine numbered MID is abnormal.
9. A storage medium, characterized in that the storage medium stores a program file capable of implementing the abnormality detection and localization method applied to a distributed container cloud platform according to any one of claims 1 to 6.
10. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the abnormality detection and localization method applied to a distributed container cloud platform according to any one of claims 1 to 6.
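
The detection and localization steps recited in claims 2 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The formulas for the window average and the cumulative sum are not legible in this text, so the sketch assumes the textbook cumulative-sum update S_t = S_{t-1} + (L_t - window mean). The function names, the edge-set representation of the dependency graph G, and the sample data are hypothetical.

```python
from collections import deque
from statistics import fmean


def detect_status(delays, k, h):
    """Claim 2: return 1 (abnormal) or 0 (normal) for one component."""
    window = deque(delays[:k], maxlen=k)  # the k most recent TCP delays
    sums = deque([0.0], maxlen=k + 1)     # S_{t-k}..S_t, starting at S_k = 0
    for delay in delays[k:]:
        window.append(delay)              # maxlen evicts the earliest delay
        # ASSUMED update rule: S_t = S_{t-1} + (L_t - window mean)
        sums.append(sums[-1] + (delay - fmean(window)))
    s_diff = max(sums) - min(sums)        # early-warning value S_max - S_min
    return 0 if -h <= s_diff <= h else 1


def abnormal_subgraph(edges, pairs):
    """Claim 3: build G' from edges {(i, j)} where E_ij = 1 (i depends on j).

    pairs is a list of (CID, MID, Status) tuples.
    """
    abnormal = {cid for cid, _mid, status in pairs if status == 1}
    # Deleting rows and columns of normal components keeps only edges
    # whose two endpoints are both abnormal (subgraph G1).
    kept = {(i, j) for i, j in edges if i in abnormal and j in abnormal}
    # Independent nodes have no remaining edge, so they drop out here.
    nodes = {n for edge in kept for n in edge}
    return nodes, kept


def locate_roots(nodes, kept):
    """Claim 4: node i is an abnormal root when delta_i = sum_j E_ij = 0."""
    delta = {n: 0 for n in nodes}
    for i, _j in kept:
        delta[i] += 1
    return sorted(n for n, d in delta.items() if d == 0)


def abnormal_machine(roots, pairs):
    """Claim 5: return the shared MID of all roots, or None if they differ."""
    mids = {mid for cid, mid, _status in pairs if cid in roots}
    return mids.pop() if len(mids) == 1 else None


# Hypothetical sample: web depends on api and cache, api depends on db.
series = {
    "web": [10] * 10 + [50] * 5,
    "api": [10] * 10 + [50] * 5,
    "db": [10] * 10 + [50] * 5,
    "cache": [10] * 15,
}
machines = {"web": "m1", "api": "m1", "db": "m2", "cache": "m2"}
edges = {("web", "api"), ("api", "db"), ("web", "cache")}

pairs = [(cid, machines[cid], detect_status(d, k=5, h=20))
         for cid, d in series.items()]
nodes, kept = abnormal_subgraph(edges, pairs)
roots = locate_roots(nodes, kept)
print(pairs, roots, abnormal_machine(roots, pairs))
```

In this sample the delay jump is seen by web, api, and db, but only db depends on no other abnormal component, so it is reported as the root. Because the difference between the maximum and minimum is never negative, the lower bound -h of the threshold range has no effect in this sketch.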
CN201811537333.2A 2018-12-15 2018-12-15 Anomaly detection and location method and device applied to distributed container cloud platform Active CN109800052B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811537333.2A CN109800052B (en) 2018-12-15 2018-12-15 Anomaly detection and location method and device applied to distributed container cloud platform
PCT/CN2019/123989 WO2020119627A1 (en) 2018-12-15 2019-12-09 Abnormality detection and positioning method and apparatus applied to distributed container cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811537333.2A CN109800052B (en) 2018-12-15 2018-12-15 Anomaly detection and location method and device applied to distributed container cloud platform

Publications (2)

Publication Number Publication Date
CN109800052A (en) 2019-05-24
CN109800052B (en) 2020-11-24

Family

ID=66556890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811537333.2A Active CN109800052B (en) 2018-12-15 2018-12-15 Anomaly detection and location method and device applied to distributed container cloud platform

Country Status (2)

Country Link
CN (1) CN109800052B (en)
WO (1) WO2020119627A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106487633A (en) * 2016-10-11 2017-03-08 中国银联股份有限公司 A kind of abnormal monitoring method of virtual machine and device
CN106776005A (en) * 2016-11-23 2017-05-31 华中科技大学 A kind of resource management system and method towards containerization application
CN107612787A (en) * 2017-11-06 2018-01-19 南京易捷思达软件科技有限公司 A kind of cloud hostdown detection method for cloud platform of being increased income based on Openstack
US20180032903A1 (en) * 2016-07-28 2018-02-01 International Business Machines Corporation Optimized re-training for analytic models
WO2018084912A1 (en) * 2016-11-02 2018-05-11 Qualcomm Incorporated Methods and systems for anomaly detection using function specifications derived from server input/output (i/o) behavior
CN108259241A (en) * 2018-01-11 2018-07-06 上海有云信息技术有限公司 A kind of abnormal localization method and device of cloud platform monitoring system
CN108306747A (en) * 2017-01-11 2018-07-20 阿里巴巴集团控股有限公司 A kind of cloud security detection method, device and electronic equipment
CN108337108A (en) * 2017-12-28 2018-07-27 天津麒麟信息技术有限公司 A kind of cloud platform failure automation localization method based on association analysis
CN108491306A (en) * 2018-03-19 2018-09-04 广东电网有限责任公司珠海供电局 One kind being based on enterprise's private clound credibility monitoring method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3345626B2 (en) * 1994-09-29 2002-11-18 富士通株式会社 Processor error countermeasure device in multiprocessor system and processor error countermeasure method in multiprocessor system
CN101505243B (en) * 2009-03-10 2011-01-05 中国科学院软件研究所 Performance exception detecting method for Web application
CN105242971B (en) * 2015-10-20 2019-02-22 北京航空航天大学 Stream processing system-oriented memory object management method and system
CN108306879B (en) * 2018-01-30 2020-11-06 福建师范大学 Distributed real-time anomaly location method based on Web session flow
CN109800052B (en) * 2018-12-15 2020-11-24 深圳先进技术研究院 Anomaly detection and location method and device applied to distributed container cloud platform

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JORDAN HOCHENBAUM: "Automatic Anomaly Detection in the Cloud", 《ARXIV》 *
TAO WANG: "Self-adaptive cloud monitoring with online anomaly detection", 《FUTURE GENERATION COMPUTER SYSTEMS》 *
王桂平: "云环境下面向可信的虚拟机异常检测关键技术研究", 《中国博士学位论文全文数据库 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020119627A1 (en) * 2018-12-15 2020-06-18 深圳先进技术研究院 Abnormality detection and positioning method and apparatus applied to distributed container cloud platform
CN111061586A (en) * 2019-12-05 2020-04-24 深圳先进技术研究院 Anomaly detection method, system and electronic device for container cloud platform
WO2021109048A1 (en) * 2019-12-05 2021-06-10 深圳先进技术研究院 Container cloud platform abnormality detection method and system, and electronic device
CN111061586B (en) * 2019-12-05 2023-09-19 深圳先进技术研究院 Container cloud platform anomaly detection method and system and electronic equipment

Also Published As

Publication number Publication date
WO2020119627A1 (en) 2020-06-18
CN109800052B (en) 2020-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant