CN111338897A - Identification method of abnormal node in application host, monitoring equipment and electronic equipment - Google Patents


Info

Publication number
CN111338897A
Authority
CN
China
Prior art keywords
application host
clustering
algorithm
index data
application
Prior art date
Legal status
Granted
Application number
CN202010110736.XA
Other languages
Chinese (zh)
Other versions
CN111338897B (en)
Inventor
陈楚濠
张静
Current Assignee
JD Digital Technology Holdings Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd
Priority to CN202010110736.XA
Publication of CN111338897A
Application granted
Publication of CN111338897B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The disclosure provides a method for identifying an abnormal node in an application host, a monitoring device, an electronic device and a computer storage medium, and relates to the technical field of computers. The method comprises the following steps: clustering the application hosts into two classes according to a plurality of index data of each application host by means of a clustering algorithm; calculating an anomaly score for each application host by means of an iForest (Isolation Forest) algorithm; selecting N indexes from the plurality of indexes by means of an ensemble learning algorithm; extracting M principal components from the N selected indexes by means of a PCA (principal component analysis) dimensionality reduction algorithm; and clustering the application hosts by means of a density-based clustering algorithm according to the M-dimensional index data of each application host, thereby obtaining the abnormal nodes among the application hosts. The method enables automatic monitoring, reduces the interference of noise in the data, and improves identification efficiency and accuracy.

Description

Identification method of abnormal node in application host, monitoring equipment and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method for identifying an abnormal node in an application host, a monitoring device, an electronic device, and a computer storage medium.
Background
In the related art, the performance monitoring indexes of the hosts are checked one by one according to rules based on human experience: application hosts that may be abnormal are found, the range of abnormal hosts is narrowed down, the differences in each index are compared, and the abnormal host is finally located.
This requires manually checking whether each index of each machine under an application differs from the others. Because the monitoring indexes are numerous and difficult to compare, the process is inefficient and genuinely abnormal nodes are hard to find.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a method for identifying an abnormal node in an application host, a monitoring device, an electronic device, and a computer storage medium, which overcome, at least to some extent, the problems of inefficiency and difficulty in finding an abnormal node due to the limitations of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a method for identifying an abnormal node in an application host, including: acquiring a plurality of index data of each application host; clustering the application hosts into two classes according to the index data by means of a clustering algorithm; calculating an anomaly score for each application host by means of an iForest algorithm; selecting N indexes from the plurality of indexes by means of an ensemble learning algorithm; extracting M principal components from the N selected indexes by means of a PCA (principal component analysis) dimensionality reduction algorithm, so that the index data of each application host are reduced to M-dimensional index data; and clustering the application hosts by means of a density-based clustering algorithm according to the M-dimensional index data of each application host, thereby obtaining the abnormal nodes among the application hosts; where N and M are natural numbers and M is less than N.
In one embodiment of the present disclosure, M is equal to 2, and after the application hosts are clustered by the density-based clustering algorithm according to their M-dimensional index data, the method further includes: mapping the clustering result onto a two-dimensional scatter plot, in which the abscissa is the first principal component and the ordinate is the second principal component.
In one embodiment of the present disclosure, clustering the application hosts into two classes according to the index data by means of a clustering algorithm includes: clustering the application hosts into two classes according to the index data by means of a K-means clustering algorithm.
In one embodiment of the present disclosure, the ensemble learning algorithm includes a random forest (RF), Bagging, GBDT or XGBoost algorithm.
In one embodiment of the present disclosure, N is equal to 5, and/or the density-based clustering algorithm is the DBSCAN algorithm.
In one embodiment of the present disclosure, M is equal to 3, and after the application hosts are clustered by the density-based clustering algorithm according to their M-dimensional index data, the method further includes: mapping the clustering result onto a three-dimensional scatter plot, in which the X-axis coordinate is the first principal component, the Z-axis coordinate is the second principal component, and the Y-axis coordinate is the third principal component.
According to another aspect of the present disclosure, there is provided an application host monitoring apparatus, including: an index acquisition module configured to acquire a plurality of index data of each application host; a first clustering module configured to cluster the application hosts into two classes according to the index data by means of a clustering algorithm; an anomaly score acquisition module configured to calculate an anomaly score for each application host by means of an iForest algorithm; an index screening module configured to select N indexes from the plurality of indexes by means of an ensemble learning algorithm; a dimensionality reduction module configured to extract M principal components from the N selected indexes by means of a PCA dimensionality reduction algorithm, so as to reduce the index data of each application host to M-dimensional index data; and a second clustering module configured to cluster the application hosts by means of a density-based clustering algorithm according to the M-dimensional index data of each application host, so as to obtain the abnormal nodes among the application hosts; where N and M are natural numbers and M is less than N.
In one embodiment of the present disclosure, M is equal to 2, and the apparatus further includes a scatter plot display module configured to map the clustering result onto a two-dimensional scatter plot, in which the abscissa is the first principal component and the ordinate is the second principal component.
According to yet another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the above-mentioned method for identifying an abnormal node in the application host by executing the executable instructions.
According to yet another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method for identifying an abnormal node in an application host.
With the method, monitoring device, electronic device and computer storage medium for identifying abnormal nodes in application hosts provided by the present disclosure, the application hosts are clustered into two classes by a clustering algorithm, an anomaly score is calculated for each application host by an iForest algorithm, the N most important indexes are selected from the plurality of indexes by an ensemble learning algorithm, M principal components are then extracted by a PCA (principal component analysis) dimensionality reduction algorithm, and the application hosts are clustered by a density-based clustering algorithm according to the M principal components to obtain the abnormal nodes. This enables automatic monitoring, reduces noise interference in the data, and improves identification efficiency.
Furthermore, displaying the clustering result as a two-dimensional or three-dimensional scatter plot lets abnormal nodes be identified more intuitively and quickly, which makes monitoring and handling easier for monitoring personnel and further improves efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 is a schematic diagram illustrating an exemplary system architecture to which the method for identifying an abnormal node in an application host, or the application host monitoring apparatus, of embodiments of the present disclosure may be applied;
FIG. 2 is a flow diagram illustrating a method for identifying an abnormal node in an application host according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for identifying an abnormal node in an application host according to another embodiment of the present disclosure;
FIG. 4 is a flow chart of a method for identifying an abnormal node in an application host according to another embodiment of the present disclosure;
FIG. 5 is a block diagram of an application host monitoring device in one embodiment of the present disclosure;
FIG. 6 is a block diagram of an application host monitoring device in another embodiment of the present disclosure;
FIG. 7 shows a graphical representation of a two-dimensional scatter plot in an embodiment of the disclosure;
FIG. 8 is a block diagram showing the configuration of a computer device to which the method for identifying an abnormal node in an application host of embodiments of the present disclosure is applied; and
FIG. 9 shows a schematic diagram of a computer storage medium of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the scheme provided by the present disclosure, a dimensionality reduction algorithm is applied to the indexes of the hosts under each application, the Top N abnormal index dimensions are extracted, clustering is then performed to obtain a multi-dimensional scatter plot, and the abnormal nodes together with a table of their abnormal index dimensions are identified. To facilitate understanding, several terms used in the present disclosure are explained below.
Machine Learning (ML) is a multi-domain interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It specifically studies how computers simulate or implement human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures, so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
iForest (Isolation Forest) is a fast anomaly detection method based on ensemble learning, with linear time complexity and high accuracy. iForest is suited to anomaly detection on continuous data and defines anomalies as "outliers that are easily isolated", which can be understood as points that are sparsely distributed and far from any dense population. Statistically, a sparsely populated region of the data space is one in which data are unlikely to occur, so data falling in such regions can be considered abnormal. iForest is a non-parametric, unsupervised method: it requires neither a predefined mathematical model nor labeled training data.
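To make the iForest idea concrete, the following is a minimal sketch using scikit-learn's `IsolationForest`, assuming a toy matrix in which each row is one application host and each column one index (here CPU and memory utilization). The data and parameters are illustrative assumptions, not values from this disclosure. scikit-learn's `score_samples` returns the opposite of the anomaly score defined in the original iForest paper (lower means more abnormal), so the sketch negates it.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical index data: one row per application host,
# columns = [CPU utilization %, memory utilization %].
rng = np.random.default_rng(0)
normal_hosts = rng.normal(loc=[40.0, 55.0], scale=[3.0, 3.0], size=(20, 2))
odd_host = np.array([[95.0, 97.0]])  # one host far from the dense population
X = np.vstack([normal_hosts, odd_host])

forest = IsolationForest(n_estimators=100, random_state=0).fit(X)

# score_samples is the negated paper-style anomaly score;
# negate it so that a larger value means "more abnormal".
anomaly_score = -forest.score_samples(X)

most_abnormal = int(np.argmax(anomaly_score))
print(most_abnormal, round(float(anomaly_score[most_abnormal]), 3))
```

The isolated host (row 20) receives the largest score, because random axis-parallel splits separate it from the dense group after very few cuts.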
Ensemble learning is a machine learning method that trains a series of learners and combines their results according to some rule, so as to achieve better performance than any single learner.
The idea of ensemble learning is to combine several single classifiers when classifying a new instance and to decide the final class from some combination of their individual results, thereby outperforming a single classifier. If a single classifier is likened to one decision maker, ensemble learning is equivalent to several decision makers reaching a decision together.
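The "several decision makers" analogy corresponds to a combination rule such as hard majority voting. The pure-Python sketch below shows only that rule; the classifier outputs are made-up labels, and majority voting is just one of several possible combination rules (averaging predicted probabilities or weighted voting are common alternatives).

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote, sample by sample.

    predictions: list of equal-length label lists, one per classifier.
    On a tie, Counter.most_common returns the label encountered first.
    """
    combined = []
    for labels_for_sample in zip(*predictions):
        label, _count = Counter(labels_for_sample).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical outputs of three single classifiers on four hosts
# ("N" = normal, "A" = abnormal).
clf_1 = ["N", "A", "N", "A"]
clf_2 = ["N", "A", "A", "N"]
clf_3 = ["A", "A", "N", "A"]
print(majority_vote([clf_1, clf_2, clf_3]))  # ['N', 'A', 'N', 'A']
```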
Principal Component Analysis (PCA) uses the idea of dimensionality reduction to convert many indexes into a few composite indexes. In statistics, PCA is a technique for simplifying a data set. It is a linear transformation that maps the data into a new coordinate system such that the greatest variance of any projection of the data lies on the first coordinate (called the first principal component), the second greatest variance on the second coordinate (the second principal component), and so on. PCA is often used to reduce the dimensionality of a data set while retaining the features that contribute most to its variance. This is done by keeping the leading (low-order) principal components and discarding the higher-order ones, since the leading components tend to retain the most important aspects of the data.
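The transformation described above can be sketched directly in NumPy: center the data, eigen-decompose the covariance matrix, order the eigenvectors by decreasing eigenvalue (variance), and project. The toy data are an assumption, and the function name `pca` is hypothetical rather than a library API.

```python
import numpy as np

def pca(X, m):
    """Project X (samples x features) onto its first m principal components.

    Returns (scores, components, ratio): components has shape (m, n_features),
    each row a unit-length coefficient vector; ratio is each kept component's
    share of the total variance (its contribution rate).
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :m].T
    scores = Xc @ components.T              # coordinates in the new system
    ratio = eigvals[:m] / eigvals.sum()
    return scores, components, ratio

rng = np.random.default_rng(1)
t = rng.normal(size=200)
# Three toy features; the first two are driven mostly by one latent factor t.
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), rng.normal(scale=0.1, size=200)])
scores, components, ratio = pca(X, 2)
print(scores.shape, ratio.round(3))
```

Because two of the three toy features move together, almost all of the variance lands on the first component.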
A density-based clustering algorithm can find clusters of various shapes and sizes in noisy data. The core idea is to find points of high density and then gradually connect nearby high-density points into one piece, thereby forming clusters. Concretely, a circle of radius eps (called the eps-neighborhood) is drawn around each data point, and the number of points inside the circle is counted; this number is the density value of the point. A density threshold MinPts is then chosen: a point whose circle contains fewer than MinPts points is a low-density point, and a point whose circle contains at least MinPts points is a high-density point (called a core point). If a high-density point lies within the circle of another high-density point, the two are connected, so that many points can be chained together in sequence. If a low-density point lies within the circle of a high-density point, it is also connected to the nearest high-density point and is called a border point. In this way, all points that can be joined together form one cluster, while low-density points that are not within the circle of any high-density point are outliers.
Density-based clustering is a very intuitive clustering method: adjacent high-density regions are joined together to form clusters. The method can find clusters of various sizes and shapes and is fairly robust to noise.
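The core/border/outlier procedure described above can be written out directly. The following pure-Python function is a minimal sketch of standard DBSCAN under two assumptions: the neighborhood count includes the point itself, and a border point joins the cluster of whichever core point reaches it first (the text says "nearest", which some implementations handle differently).

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns 0, 1, 2, ... per cluster and -1 for outliers."""
    n = len(points)
    # eps-neighborhood of every point (includes the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outward from an unassigned core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] is None:
                    labels[q] = cluster      # core or border point joins
                    if is_core[q]:
                        frontier.append(q)   # only core points keep expanding
        cluster += 1
    # Points never reached from any core point are outliers.
    return [-1 if lab is None else lab for lab in labels]

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

With eps = 1.5 and MinPts = 3, the square of four points and the triangle of three points each form a cluster, and the isolated point (50, 50) is labeled -1.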
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 to which the method for identifying an abnormal node in an application host, or the application host monitoring apparatus, of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of application hosts 101, 102, 103, a network 104, and a monitoring device 105. The network 104 serves as a medium for providing communication links between the application hosts 101, 102, 103 and the monitoring device 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
It should be understood that the numbers of application hosts, networks, and monitoring devices in Fig. 1 are merely illustrative. There may be any number of application hosts, networks, and monitoring devices as required by the implementation. For example, the monitoring device 105 may be a server cluster composed of a plurality of servers.
The application hosts 101, 102, 103 may interact with the monitoring device 105 over the network 104 to receive or send messages and the like. The application hosts 101, 102, 103 may be servers or various electronic devices, including but not limited to smart phones, tablets, portable computers, desktop computers, wearable devices, virtual reality devices, smart home devices, and the like.
Hereinafter, each step of the method for identifying an abnormal node in an application host in the present exemplary embodiment will be described in more detail with reference to the drawings and the embodiments.
Fig. 2 is a flowchart illustrating a method for identifying an abnormal node in an application host according to an embodiment of the present disclosure.
As shown in fig. 2, in step S21, a plurality of index data of each application host is acquired. The index data includes, for example, CPU utilization, memory utilization, network speed, and other various data.
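All later steps operate on a hosts-by-indexes matrix. A minimal pure-Python sketch of arranging the acquired index data that way is shown below; the host names, index names and the nested-dictionary input format are hypothetical.

```python
def to_index_matrix(samples, index_names):
    """Arrange per-host index data as a hosts-by-indexes matrix.

    samples: {host_name: {index_name: value}}; every host must report every
    index in index_names (a missing value raises KeyError in this sketch).
    Returns (host_names, rows), with columns in the fixed order index_names.
    """
    host_names = sorted(samples)
    rows = [[float(samples[h][name]) for name in index_names] for h in host_names]
    return host_names, rows

samples = {
    "host-b": {"cpu_util": 41.0, "mem_util": 63.5, "net_mbps": 120.0},
    "host-a": {"cpu_util": 38.2, "mem_util": 60.1, "net_mbps": 118.4},
}
hosts, matrix = to_index_matrix(samples, ["cpu_util", "mem_util", "net_mbps"])
print(hosts, matrix)
# ['host-a', 'host-b'] [[38.2, 60.1, 118.4], [41.0, 63.5, 120.0]]
```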
In step S22, the application hosts are clustered into two classes according to the index data by means of a clustering algorithm. Various clustering algorithms may be used, such as the K-MEANS algorithm, the K-MEDOIDS (K center point) algorithm, or the CLARANS (Clustering Large Applications based upon RANdomized Search) algorithm.
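A minimal sketch of step S22 with scikit-learn's `KMeans`, assuming the index data are already a hosts-by-indexes matrix. Standardizing first, so that indexes with different units contribute comparably, is an added assumption, as are the toy values. K-MEDOIDS and CLARANS are not part of scikit-learn's core API and are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical index matrix: rows = application hosts,
# columns = [CPU %, memory %, network speed in MB/s].
X = np.array([
    [35.0, 50.0, 120.0],
    [38.0, 52.0, 118.0],
    [36.0, 49.0, 125.0],
    [37.0, 51.0, 121.0],
    [92.0, 95.0, 15.0],   # a host that behaves very differently
    [90.0, 93.0, 18.0],
])

X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per index
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_std)
print(kmeans.labels_)
```

Which of the two groups receives label 0 or 1 is arbitrary; only the grouping itself is meaningful.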
In step S23, an abnormality score is calculated for each application host by using the iForest (Isolation Forest) algorithm.
In step S24, the Top N indexes are selected from the plurality of indexes by an ensemble learning algorithm, such as the RF (Random Forest) algorithm, Bagging (bootstrap aggregating), GBDT (Gradient Boosting Decision Tree), or XGBoost (eXtreme Gradient Boosting). N is a natural number, for example 4, 5, 6 or 7.
In step S25, M principal components are extracted from the N selected indexes by a PCA (Principal Component Analysis) dimensionality reduction algorithm, so that the index data of each application host are reduced to M-dimensional index data, where M is a natural number and M < N. For example, M may be 2, 3 or 4.
In step S26, the application hosts are clustered by a density-based clustering algorithm according to their M-dimensional index data, thereby obtaining the abnormal nodes among the application hosts. The density-based clustering algorithm may be the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm.
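A minimal sketch of step S24 with a random forest, one of the ensemble options listed. The text does not state what target the forest is trained on; this sketch assumes a per-host binary label (for example derived from steps S22 and S23) and ranks indexes by scikit-learn's impurity-based `feature_importances_`. The index names, the data and the `top_n_indexes` helper are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def top_n_indexes(X, y, index_names, n, seed=0):
    """Rank indexes by random-forest impurity importance and keep the top n."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return [index_names[i] for i in order[:n]]

rng = np.random.default_rng(0)
names = ["cpu", "mem", "disk_io", "net_in", "net_out", "load", "threads"]
X = rng.normal(size=(300, len(names)))
# Hypothetical labels: only "cpu" and "mem" actually drive abnormality.
y = (X[:, 0] + X[:, 1] > 1.5).astype(int)
print(top_n_indexes(X, y, names, n=2))
```

Impurity-based importances can favor high-cardinality features; `sklearn.inspection.permutation_importance` is an alternative ranking if that is a concern.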
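A minimal sketch of step S25 using scikit-learn's `PCA`. Standardizing the N selected indexes before PCA is an assumption (PCA is sensitive to index scale); the toy data and the `reduce_to_m` helper are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_to_m(X_selected, m):
    """Standardize the N selected indexes and keep the first m principal components."""
    n_indexes = X_selected.shape[1]
    if not 0 < m < n_indexes:
        raise ValueError("M must satisfy 0 < M < N")
    X_std = StandardScaler().fit_transform(X_selected)
    pca = PCA(n_components=m)
    reduced = pca.fit_transform(X_std)  # one M-dimensional row per host
    return reduced, pca

rng = np.random.default_rng(0)
X_selected = rng.normal(size=(50, 5))   # 50 hosts x N=5 selected indexes (toy data)
reduced, pca = reduce_to_m(X_selected, m=2)
print(reduced.shape, pca.explained_variance_ratio_.round(3))
```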
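A minimal sketch of step S26 with scikit-learn's `DBSCAN` on 2-dimensional principal-component coordinates. It assumes that hosts labeled as noise (label -1) are the abnormal nodes; `eps`, `min_samples` and the data are illustrative and would need tuning for real index data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D principal-component coordinates, one row per host.
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(30, 2)),  # dense population of normal hosts
    [[4.0, 4.0]],                                   # one isolated host
])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(coords)
abnormal_hosts = np.flatnonzero(labels == -1)  # DBSCAN marks noise with -1
print(abnormal_hosts)
```

A few hosts in the sparse tail of the normal population may also come out as noise with these settings, which is one reason `eps` and `min_samples` need tuning.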
In this embodiment, the application hosts are clustered into two classes by a clustering algorithm, an anomaly score is calculated for each application host by the iForest algorithm, the N most important indexes are selected from the plurality of indexes by an ensemble learning algorithm, M principal components are then extracted by PCA dimensionality reduction, and the application hosts are clustered by a density-based clustering algorithm according to the M principal components to obtain the abnormal nodes. This enables automatic monitoring, reduces the interference of noise in the data, and improves identification efficiency and accuracy.
In one embodiment, after an abnormal host is identified, its M-dimensional indexes are traced back to the principal components: the original indexes with large coefficients in those components are the abnormal indexes, which yields the abnormal indexes of the abnormal node.
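One way to trace an abnormal host back to its abnormal indexes is sketched below. It assumes the host is explained mainly by the principal component on which its coordinate has the largest magnitude, and then reports the original indexes with the largest absolute coefficients (loadings, scikit-learn's `components_`) in that component. The helper name, index names and data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA

def abnormal_indexes(pca, host_coords, index_names, top_k=1):
    """Trace an abnormal host's PCA coordinates back to original indexes.

    Picks the component on which the host deviates most (largest |coordinate|)
    and returns the indexes with the largest |coefficient| in that component.
    """
    k = int(np.argmax(np.abs(host_coords)))
    loadings = np.abs(pca.components_[k])
    order = np.argsort(loadings)[::-1]
    return [index_names[i] for i in order[:top_k]]

rng = np.random.default_rng(0)
names = ["cpu", "mem", "disk_io", "net_in", "net_out"]
X = rng.normal(size=(40, 5))
X[0, 1] += 12.0            # host 0 has an extreme memory value (toy example)
pca = PCA(n_components=2).fit(X)
coords = pca.transform(X)
print(abnormal_indexes(pca, coords[0], names))  # ['mem']
```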
Fig. 3 is a flow chart illustrating a method for identifying an abnormal node in an application host according to another embodiment of the present disclosure.
Steps S31-S34 of FIG. 3 are similar to steps S21-S24 of FIG. 2 and will not be described in detail herein for brevity.
In step S35, 2 principal components are extracted from the N selected indexes by the PCA dimensionality reduction algorithm, so that the index data of each application host are reduced to 2-dimensional index data.
In step S36, the application hosts are clustered by a density-based clustering algorithm according to their 2-dimensional index data.
In step S37, the clustering result of step S36 is mapped onto a two-dimensional scatter plot, in which the abscissa is the first principal component and the ordinate is the second principal component.
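Step S37 can be sketched with matplotlib: the first principal component on the abscissa, the second on the ordinate, and DBSCAN noise points (assumed here to be the abnormal nodes) drawn with a distinct marker. The `plot_clusters` helper, colors, toy coordinates and output path are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters(coords, labels, path):
    """Draw hosts in principal-component space and save the figure.

    coords: (n_hosts, 2) array, column 0 = first PC, column 1 = second PC.
    labels: cluster label per host; -1 marks noise (treated as abnormal here).
    """
    fig, ax = plt.subplots(figsize=(5, 4))
    clustered = labels != -1
    ax.scatter(coords[clustered, 0], coords[clustered, 1],
               c=labels[clustered], cmap="viridis", s=20, label="clustered hosts")
    ax.scatter(coords[~clustered, 0], coords[~clustered, 1],
               c="red", marker="x", s=60, label="abnormal nodes")
    ax.set_xlabel("First principal component")
    ax.set_ylabel("Second principal component")
    ax.legend()
    fig.savefig(path, dpi=100)
    plt.close(fig)
    return ax

coords = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [3.5, 2.8]])
labels = np.array([0, 0, 0, -1])
ax = plot_clusters(coords, labels, "pc_scatter.png")
```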
The principle of the PCA dimensionality reduction algorithm is to project the raw data from the original space into the principal component space and then map the projection back into the original space. The first principal components reflect the variance of normal values, while the last principal components reflect the variance of outliers. If only the first principal components are used for projection and reconstruction, the reconstruction error is small for most data but remains relatively large for outliers. In this embodiment, the first two principal components are selected, since their cumulative contribution rate can reach 80%. The larger an element of the eigenvector corresponding to a principal component, the larger the coefficient of the corresponding index; that is, the indexes that mainly distinguish the abnormal host from the normal population are the abnormal indexes, and the number of abnormal indexes is adjusted according to the characteristics of the actual data.
In one embodiment, 3 principal components are extracted from the N selected indexes by the PCA dimensionality reduction algorithm, reducing the index data of each application host to 3-dimensional index data; the application hosts are clustered by a density-based clustering algorithm according to their 3-dimensional index data; and the clustering result is mapped onto a three-dimensional scatter plot, in which the X-axis coordinate is the first principal component, the Z-axis coordinate is the second principal component, and the Y-axis coordinate is the third principal component.
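The selection rule and the reconstruction-error intuition above can be illustrated on toy data. The sketch below, assuming scikit-learn's `PCA`, picks the smallest M whose cumulative contribution rate reaches 80% and then measures each host's squared reconstruction error; the data, the 0.8 threshold argument and the `smallest_m` helper are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

def smallest_m(explained_variance_ratio, threshold=0.8):
    """Smallest number of leading PCs whose cumulative contribution rate >= threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)

rng = np.random.default_rng(0)
factors = rng.normal(size=(100, 2))
# Toy data: five indexes driven by two latent factors plus small noise.
mixing = np.array([[1.0, 1.0, 0.0, 0.0, 0.5],
                   [0.0, 0.0, 1.0, 1.0, 0.5]])
X = factors @ mixing + 0.05 * rng.normal(size=(100, 5))
X[0, 0] += 3.0   # host 0 breaks the usual correlation
X[0, 1] -= 3.0   # between the first two indexes

m = smallest_m(PCA().fit(X).explained_variance_ratio_, threshold=0.8)
pca = PCA(n_components=m).fit(X)
reconstructed = pca.inverse_transform(pca.transform(X))
error = np.sum((X - reconstructed) ** 2, axis=1)  # squared reconstruction error per host
print(m, int(np.argmax(error)))  # 2 0
```

Host 0 keeps a large reconstruction error because its deviation lies outside the subspace spanned by the two retained components.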
Fig. 4 is a flow chart illustrating a method for identifying an abnormal node in an application host according to another embodiment of the present disclosure.
As shown in fig. 4, in step S41, a plurality of index data of each application host are acquired.
And step S42, clustering the application hosts into two types by a K-MEANS clustering algorithm according to the index data. And obtaining the score of each application host based on the result of the clustering algorithm, and adding an initial label to the application host according to the score.
In step S43, an anomaly score is calculated for each application host by the iForest algorithm. And replacing the initial label of each application host with the obtained abnormal score.
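Steps S42 and S43 leave open how a score is derived from the K-means result. The sketch below is purely hypothetical: it assumes hosts in the smaller K-means cluster get initial label 1 (suspicious) and the rest 0, and then replaces that label with a continuous iForest anomaly score, using scikit-learn's `KMeans` and `IsolationForest`. The `label_hosts` helper and the data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

def label_hosts(X, seed=0):
    """Hypothetical sketch of steps S42-S43.

    Assumption: the initial label is 1 for hosts in the smaller K-means
    cluster and 0 otherwise. It is then replaced by a continuous iForest
    anomaly score (larger = more abnormal).
    """
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(X)
    minority = int(np.argmin(np.bincount(km.labels_, minlength=2)))
    initial_label = (km.labels_ == minority).astype(int)

    forest = IsolationForest(random_state=seed).fit(X)
    anomaly_score = -forest.score_samples(X)  # this replaces the initial label
    return initial_label, anomaly_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 4)),    # 40 ordinary hosts
               rng.normal(10.0, 0.5, size=(3, 4))])   # 3 hosts far from the rest
initial_label, anomaly_score = label_hosts(X)
print(initial_label[-3:], anomaly_score[-3:].round(3))
```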
In step S44, the Top N indexes are selected from the plurality of indexes through a random forest algorithm, where N is a natural number, for example 5, 6 or 7.
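The disclosure does not specify how the random forest ranks the indexes. One common choice is impurity-based importance; the sketch below assumes that the anomaly scores from step S43 serve as a regression target and, to stay short, uses a forest of depth-1 regression trees (stumps), each trained on a bootstrap sample of hosts with a random subset of indexes, summing each split's variance reduction per index. It is a simplified stand-in for a full random forest, and all names and parameters are hypothetical.

```python
import random


def variance(values):
    """Population variance; 0.0 for an empty list."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def best_split(X, y, features):
    """Best single-threshold split among `features`: returns (feature, variance reduction)."""
    parent = variance(y) * len(y)  # total sum of squares before splitting
    best = (None, 0.0)
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, y) if row[f] < thr]
            right = [t for row, t in zip(X, y) if row[f] >= thr]
            if not left or not right:
                continue
            gain = parent - variance(left) * len(left) - variance(right) * len(right)
            if gain > best[1]:
                best = (f, gain)
    return best


def stump_forest_importance(X, y, n_trees=50, max_features=2, seed=0):
    """Normalised impurity-based importance from a bagged forest of depth-1 trees."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    importance = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample of hosts
        feats = rng.sample(range(d), min(max_features, d))  # random subset of indexes
        f, gain = best_split([X[i] for i in idx], [y[i] for i in idx], feats)
        if f is not None:
            importance[f] += gain
    total = sum(importance) or 1.0
    return [v / total for v in importance]


def top_n_indexes(X, y, n, **kwargs):
    """Column positions of the n most important indexes, most important first."""
    imp = stump_forest_importance(X, y, **kwargs)
    return sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)[:n]
```

With a full library, scikit-learn's `RandomForestRegressor` exposes the same kind of ranking through its `feature_importances_` attribute, and deeper trees would capture interactions between indexes that stumps cannot.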
In step S45, 2 principal components are extracted from the selected Top N indexes through the PCA dimensionality reduction algorithm, so that the index data of each application host are reduced to 2-dimensional index data. As described above, PCA projects the raw data from the original space into the principal component space and then maps the projection back into the original space; because the first principal components mainly reflect the variance of normal values and the last ones that of outliers, reconstruction from the first principal components yields a small error for most data but a relatively large error for outliers. In this scenario, the first two principal components are selected because their cumulative contribution rate can reach 80%.
In step S46, each application host is clustered through the DBSCAN algorithm according to its 2-dimensional index data. DBSCAN is a density-based clustering algorithm: it requires that the number of objects contained in a region (neighborhood) of the clustering space be no less than a given threshold. It can find clusters of arbitrary shape in a noisy spatial database, connects adjacent regions of sufficiently high density, and handles abnormal data effectively.
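The density rule in step S46 can be sketched as a minimal DBSCAN in plain Python. This is an illustrative implementation, not the one used in the embodiment: `eps` is the neighbourhood radius, `min_pts` is the density threshold (counted including the point itself, as scikit-learn's `min_samples` is), clusters are numbered from 0, and points that belong to no dense region are labelled -1, which matches the 0 / -1 labelling used for the scatter diagram.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Every point within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:  # not a core point; may later become a border point
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels
```

Applied to the 2-dimensional principal-component coordinates of the hosts, the dense group of normal hosts forms cluster 0 and an isolated host is returned as -1. The neighbourhood search here is O(n^2), which is acceptable for a sketch; scikit-learn's `DBSCAN` uses spatial indexes instead.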
In step S47, the clustering result is mapped in a two-dimensional scatter diagram whose abscissa is the first principal component and whose ordinate is the second principal component, so as to show the abnormal-index dimension table of the abnormal nodes. The first principal component is mapped to the abscissa axis of the scatter diagram, and the value on that axis is calculated from the linear expression of the screened important indexes in the first principal component; the second principal component is mapped to the ordinate axis, and the value on that axis is calculated from the linear expression of the screened important indexes in the second principal component.
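Written out as formulas, and assuming (the text does not say so) that each screened index is mean-centred before PCA, the two plotted coordinates of host $i$ are the following linear expressions, where $x_{ij}$ is the value of screened index $j$ for host $i$, $\bar{x}_j$ is its mean over all hosts, and $\mathbf{a}_1$, $\mathbf{a}_2$ are the unit eigenvectors of the covariance matrix of the $N$ screened indexes with the two largest eigenvalues $\lambda_1 \ge \lambda_2$:

$$
z_{i1} = \sum_{j=1}^{N} a_{1j}\,(x_{ij} - \bar{x}_j), \qquad
z_{i2} = \sum_{j=1}^{N} a_{2j}\,(x_{ij} - \bar{x}_j), \qquad
\text{cumulative contribution rate} = \frac{\lambda_1 + \lambda_2}{\sum_{k=1}^{N} \lambda_k}
$$

Host $i$ is drawn at $(z_{i1}, z_{i2})$, so indexes with large $|a_{1j}|$ or $|a_{2j}|$ dominate where a host lands, which is why they are read as the abnormal indexes of a host far from the main cluster.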
In the above embodiment, the first two principal components are extracted by PCA dimensionality reduction, that is, the multi-index data of each application host are reduced to two dimensions; the hosts are then clustered by density through the DBSCAN clustering algorithm, and the clustering result is mapped in the two-dimensional scatter diagram, with the first principal component on the abscissa axis and the second principal component on the ordinate axis. The class containing the majority of hosts is labeled class 0, and the class deviating from the majority is labeled class -1. Fig. 7 shows a two-dimensional scatter diagram in an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an application host monitoring device in an embodiment of the present disclosure. As shown in Fig. 5, the application host monitoring device includes: an index acquisition module 51, configured to acquire a plurality of index data of each application host; a first clustering module 52, configured to cluster the application hosts into two classes according to the index data through a clustering algorithm; an anomaly score acquisition module 53, configured to calculate the anomaly score of each application host through the iForest algorithm; an index screening module 54, configured to select N indexes from the plurality of indexes through an ensemble learning algorithm; a dimensionality reduction processing module 55, configured to extract M principal components from the selected N indexes through the PCA dimensionality reduction algorithm, so as to reduce the index data of each application host to M-dimensional index data; and a second clustering module 56, configured to cluster the application hosts through a density-based clustering algorithm according to their M-dimensional index data, so as to obtain the abnormal nodes among the application hosts; where N and M are natural numbers and M is less than N.
Fig. 6 shows a block diagram of an application host monitoring device in another embodiment of the present disclosure. As shown in Fig. 6, the device includes: an index acquisition module 61, configured to acquire a plurality of index data of each application host; a first clustering module 62, configured to cluster the application hosts into two classes according to the index data through a clustering algorithm; an anomaly score acquisition module 63, configured to calculate the anomaly score of each application host through the iForest algorithm; an index screening module 64, configured to select N indexes from the plurality of indexes through an ensemble learning algorithm; a dimensionality reduction processing module 65, configured to extract 2 principal components from the selected N indexes through the PCA dimensionality reduction algorithm, so as to reduce the index data of each application host to 2-dimensional index data; a second clustering module 66, configured to cluster the application hosts through a density-based clustering algorithm according to their 2-dimensional index data; and a scatter diagram display module 67, configured to map the clustering result in a two-dimensional scatter diagram whose abscissa is the first principal component and whose ordinate is the second principal component, so as to obtain the abnormal nodes among the application hosts.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit", "module" or "system".
An electronic device 800 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 800 shown in fig. 8 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
The storage unit stores program code that can be executed by the processing unit 810, so that the processing unit 810 performs the steps according to the various exemplary embodiments of the present invention described in the "exemplary methods" section above. For example, the processing unit 810 may perform the steps shown in Fig. 2: S21, acquiring a plurality of index data of each application host; S22, clustering the application hosts into two classes according to the index data through a clustering algorithm; S23, calculating the anomaly score of each application host through the iForest algorithm; S24, selecting N indexes from the plurality of indexes through an ensemble learning algorithm; S25, extracting M principal components from the selected N indexes through the PCA dimensionality reduction algorithm, so as to reduce the index data of each application host to M-dimensional index data, where M is a natural number less than N; and S26, clustering the application hosts through a density-based clustering algorithm according to their M-dimensional index data, thereby obtaining the abnormal nodes among the application hosts.
The storage unit 820 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., a router, a modem) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may take place via an input/output (I/O) interface 850. The electronic device 800 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device or a network device) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 9, a program product 900 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method for identifying abnormal nodes in an application host is characterized by comprising the following steps:
acquiring a plurality of index data of each application host;
clustering the application hosts into two types according to the index data through a clustering algorithm;
calculating the anomaly score of each application host through an isolation forest (iForest) algorithm;
selecting N indexes from the plurality of indexes through an ensemble learning algorithm;
extracting M principal components from the N selected indexes through Principal Component Analysis (PCA) dimensionality reduction algorithm, and reducing the index data of each application host into M-dimensional index data;
clustering each application host through a density-based clustering algorithm according to the M-dimensional index data of each application host, thereby obtaining abnormal nodes in each application host;
wherein, N and M are natural numbers, and M is less than N.
2. The method for identifying an abnormal node in an application host according to claim 1, wherein M is equal to 2;
after clustering each application host through a density-based clustering algorithm according to the M-dimensional index data of each application host, the method further comprises:
mapping the clustering result in a two-dimensional scatter diagram, wherein the abscissa is the first principal component and the ordinate is the second principal component.
3. The method for identifying abnormal nodes in application hosts according to claim 1 or 2, wherein the clustering the application hosts into two classes according to the index data by a clustering algorithm comprises:
clustering the application hosts into two classes by a K-means clustering algorithm according to the index data.
4. The method for identifying abnormal nodes in an application host according to claim 3, wherein the ensemble learning algorithm comprises a random forest (RF) algorithm, a bootstrap aggregating (Bagging) algorithm, a gradient boosting decision tree (GBDT) algorithm, or an extreme gradient boosting (XGBoost) algorithm.
5. The method for identifying an abnormal node in an application host according to claim 1 or 2, wherein N is equal to 5 and/or the density-based clustering algorithm is the DBSCAN algorithm.
6. The method for identifying an abnormal node in an application host according to claim 1, wherein M is equal to 3;
after clustering each application host through a density-based clustering algorithm according to the M-dimensional index data of each application host, the method further comprises:
mapping the clustering result in a three-dimensional scatter diagram, wherein the X-axis coordinate is the first principal component, the Z-axis coordinate is the second principal component, and the Y-axis coordinate is the third principal component.
7. An application host monitoring device, comprising:
the index acquisition module is used for acquiring a plurality of index data of each application host;
the first clustering module is used for clustering the application hosts into two types according to the index data through a clustering algorithm;
the anomaly score acquisition module is used for calculating the anomaly score of each application host through an isolation forest (iForest) algorithm;
the index screening module is used for selecting N indexes from the multiple indexes through an ensemble learning algorithm;
the dimensionality reduction processing module is used for extracting M principal components from the selected N indexes through a PCA dimensionality reduction algorithm so as to reduce the index data of each application host into M-dimensional index data;
the second clustering module is used for clustering each application host through a density-based clustering algorithm according to the M-dimensional index data of each application host so as to obtain abnormal nodes in each application host;
wherein, N and M are natural numbers, and M is less than N.
8. The application host monitoring device of claim 7, wherein M equals 2, the device further comprising:
a scatter diagram display module, used for mapping the clustering result in a two-dimensional scatter diagram, wherein the abscissa is the first principal component and the ordinate is the second principal component.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to execute the executable instructions to perform the method for identifying an abnormal node in the application host according to any one of claims 1 to 6.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method for identifying an abnormal node in an application host according to any one of claims 1 to 6.
CN202010110736.XA 2020-02-24 2020-02-24 Method for identifying abnormal node in application host, monitoring equipment and electronic equipment Active CN111338897B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010110736.XA CN111338897B (en) 2020-02-24 2020-02-24 Method for identifying abnormal node in application host, monitoring equipment and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010110736.XA CN111338897B (en) 2020-02-24 2020-02-24 Method for identifying abnormal node in application host, monitoring equipment and electronic equipment

Publications (2)

Publication Number Publication Date
CN111338897A true CN111338897A (en) 2020-06-26
CN111338897B CN111338897B (en) 2024-07-19

Family

ID=71183685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010110736.XA Active CN111338897B (en) 2020-02-24 2020-02-24 Method for identifying abnormal node in application host, monitoring equipment and electronic equipment

Country Status (1)

Country Link
CN (1) CN111338897B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859069A (en) * 2020-07-15 2020-10-30 北京市燃气集团有限责任公司 Network malicious crawler identification method, system, terminal and storage medium
CN112836926A (en) * 2020-12-27 2021-05-25 四川大学 Enterprise operation condition evaluation method based on electric power big data
CN113762717A (en) * 2021-08-03 2021-12-07 国能国华(北京)电力研究院有限公司 Equipment running state monitoring method and device, electronic equipment and storage medium
CN113822379A (en) * 2021-11-22 2021-12-21 成都数联云算科技有限公司 Process process anomaly analysis method and device, electronic equipment and storage medium
CN114780338A (en) * 2022-04-14 2022-07-22 京东科技信息技术有限公司 Host information processing method and device, electronic equipment and computer readable medium
CN115438035A (en) * 2022-10-27 2022-12-06 江西师范大学 Data exception handling method based on KPCA and mixed similarity
CN117113235A (en) * 2023-10-20 2023-11-24 深圳市互盟科技股份有限公司 Cloud computing data center energy consumption optimization method and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070282777A1 (en) * 2006-05-30 2007-12-06 Honeywell International Inc. Automatic fault classification for model-based process monitoring
CN106951776A (en) * 2017-01-18 2017-07-14 中国船舶重工集团公司第七0九研究所 A kind of Host Anomaly Detection method and system
CN107977301A (en) * 2017-11-21 2018-05-01 东软集团股份有限公司 Detection method, device, storage medium and the electronic equipment of unit exception
CN108512827A (en) * 2018-02-09 2018-09-07 世纪龙信息网络有限责任公司 The identification of abnormal login and method for building up, the device of supervised learning model
CN109978070A (en) * 2019-04-03 2019-07-05 北京市天元网络技术股份有限公司 A kind of improved K-means rejecting outliers method and device
CN110046665A (en) * 2019-04-17 2019-07-23 成都信息工程大学 Based on isolated two abnormal classification point detecting method of forest, information data processing terminal
CN110059775A (en) * 2019-05-22 2019-07-26 湃方科技(北京)有限责任公司 Rotary-type mechanical equipment method for detecting abnormality and device
CN110132598A (en) * 2019-05-13 2019-08-16 中国矿业大学 Slewing rolling bearing fault noise diagnostics algorithm
CN110390358A (en) * 2019-07-23 2019-10-29 杨勇 A kind of deep learning method based on feature clustering
CN110505179A (en) * 2018-05-17 2019-11-26 中国科学院声学研究所 A kind of detection method and system of exception flow of network
CN110533314A (en) * 2019-08-23 2019-12-03 西安交通大学 A kind of wind power plant exception unit recognition methods based on probability density distribution

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
叶枫, 江永省: "Imbalanced classification method based on clustering-fusion undersampling" (基于聚类融合欠采样的不平衡分类方法), vol. 37, no. 1, pages 292-297 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859069A (en) * 2020-07-15 2020-10-30 北京市燃气集团有限责任公司 Network malicious crawler identification method, system, terminal and storage medium
CN111859069B (en) * 2020-07-15 2021-10-15 北京市燃气集团有限责任公司 Network malicious crawler identification method, system, terminal and storage medium
CN112836926A (en) * 2020-12-27 2021-05-25 四川大学 Enterprise operation condition evaluation method based on electric power big data
CN112836926B (en) * 2020-12-27 2022-03-11 四川大学 Enterprise operation condition evaluation method based on electric power big data
CN113762717A (en) * 2021-08-03 2021-12-07 国能国华(北京)电力研究院有限公司 Equipment running state monitoring method and device, electronic equipment and storage medium
CN113822379A (en) * 2021-11-22 2021-12-21 成都数联云算科技有限公司 Process process anomaly analysis method and device, electronic equipment and storage medium
CN113822379B (en) * 2021-11-22 2022-02-22 成都数联云算科技有限公司 Process process anomaly analysis method and device, electronic equipment and storage medium
CN114780338A (en) * 2022-04-14 2022-07-22 京东科技信息技术有限公司 Host information processing method and device, electronic equipment and computer readable medium
CN115438035A (en) * 2022-10-27 2022-12-06 江西师范大学 Data exception handling method based on KPCA and mixed similarity
CN117113235A (en) * 2023-10-20 2023-11-24 深圳市互盟科技股份有限公司 Cloud computing data center energy consumption optimization method and system
CN117113235B (en) * 2023-10-20 2024-01-26 深圳市互盟科技股份有限公司 Cloud computing data center energy consumption optimization method and system

Also Published As

Publication number Publication date
CN111338897B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
CN111338897B (en) Method for identifying abnormal node in application host, monitoring equipment and electronic equipment
Varma et al. Snuba: Automating weak supervision to label training data
Ciaburro MATLAB for machine learning
US20200104409A1 (en) Method and system for extracting information from graphs
KR20210023452A (en) Apparatus and method for review analysis per attribute
CN110795568A (en) Risk assessment method and device based on user information knowledge graph and electronic equipment
US20210157819A1 (en) Determining a collection of data visualizations
CN115244587A (en) Efficient ground truth annotation
US12032605B2 (en) Searchable data structure for electronic documents
CN109657056B (en) Target sample acquisition method and device, storage medium and electronic equipment
CN110389932B (en) Automatic classification method and device for power files
Norris Machine Learning with the Raspberry Pi
Concolato et al. Data science: A new paradigm in the age of big-data science and analytics
CN110708285A (en) Flow monitoring method, device, medium and electronic equipment
Azizi et al. Graph-based generative representation learning of semantically and behaviorally augmented floorplans
Alymani et al. Graph machine learning classification using architectural 3D topological models
Tanha A multiclass boosting algorithm to labeled and unlabeled data
CN115905528A (en) Event multi-label classification method and device with time sequence characteristics and electronic equipment
US20230138367A1 (en) Generation of graphical user interface prototypes
US20240028831A1 (en) Apparatus and a method for detecting associations among datasets of different types
Luna Pattern mining: Current status and emerging topics
US11532174B2 (en) Product baseline information extraction
US20230162518A1 (en) Systems for Generating Indications of Relationships between Electronic Documents
Mengle et al. Mastering machine learning on Aws: advanced machine learning in Python using SageMaker, Apache Spark, and TensorFlow
Ghosh et al. Understanding machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant