CN109040155B - Asset identification method and computer equipment

Info

Publication number: CN109040155B
Application number: CN201710428726.9A
Authority: CN
Inventors: 严子洋
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Group Zhejiang Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Group Zhejiang Co Ltd
Priority date: 2017-06-08
Filing date: 2017-06-08
Publication date: 2021-06-08
Anticipated expiration: 2037-06-08
Also published as: CN109040155A

Abstract

The embodiment of the invention provides an asset identification method and computer equipment. The method comprises the following steps: acquiring first transmission data of an asset to be identified within a preset time period, wherein the first transmission data comprise the number of assets that exchange data with the asset to be identified and the number of data packets exchanged; mapping the first transmission data to a first space, and determining the position of the first transmission data in the first space; and determining the system type of the asset to be identified according to the position of the first transmission data in the first space and a predetermined correspondence between system types and the positions of first sample transmission data in the first space. Because the system type is determined from the position of the first transmission data relative to the positions of the first sample transmission data, the system to which an asset belongs can be identified automatically and intelligently, and the working efficiency is improved.

Description

Asset identification method and computer equipment

Technical Field

The embodiment of the invention relates to the technical field of information, in particular to an asset identification method and computer equipment.

Background

A business refers to the sum of a series of processes, such as production, operation and transaction processing, carried out by enterprises and organizations.

With the adoption of information technology (IT), businesses have become tightly coupled with IT.

From the IT perspective, a service includes an IT support system (referred to as a service support system for short), service data, a service process, and service participants. The service support system is the cornerstone of the service, and includes the various software and hardware IT resources that carry service operation, such as network equipment, security equipment, hosts, databases, middleware, and the like.

The operation and maintenance department of the business is responsible for managing hardware assets and software assets; the hardware and software of the equipment, and combinations of the two, are collectively called assets. These IT resources are organically combined and share a set of tasks that generate specific customer value, thereby forming a business support system.

The service support system topological graph is a view of the relationships between assets, built with the service as the link on the basis of traditional asset management. Based on the service topology, the user can see at a glance the system to which each asset belongs and the current operation state and security state of the service running on the asset. In the service topology, the actual state of each asset is represented by a visual icon, so that it is immediately visible whether the asset is normal or unavailable and whether an alarm exists. If the service fails, it can be quickly checked whether the host, the database or the switch has a problem, and service failure diagnosis can be carried out conveniently and quickly along the service topology.

The topological graph of the service support system is usually first drawn manually, as part of the integration scheme planned at the initial stage of system construction. However, after the system formally goes online, the topology of the service support system may need to be updated because of modifications and adjustments prompted by performance problems, service expansion, and the like.

In the prior art, the topological graph of the service support system is mainly updated by manual maintenance: according to update data from the process flow or update data obtained by automatic acquisition, the system to which each asset to be identified belongs is determined manually, and the topological structures are matched one by one, so as to update the system topological graph.

It can be understood that this method of updating the system topological graph has the following defect: asset update data relies on repetitive manual identification, which is inefficient.

At present, no corresponding method is available in the prior art to solve the problem of low efficiency of manual identification.

Disclosure of Invention

In order to overcome the defects in the prior art, the embodiment of the invention provides an asset identification method and computer equipment.

In one aspect, an embodiment of the present invention provides an asset identification method, including: acquiring first transmission data of assets to be identified in a preset time period, wherein the first transmission data comprise the asset quantity and the data packet quantity which are subjected to data transmission with the assets to be identified; mapping the first transmission data to a first space, and determining the position of the first transmission data in the first space; and determining the system type of the to-be-identified asset corresponding to the first transmission data according to the position of the first transmission data in the first space and the corresponding relation between the predetermined system type and the position of the first sample transmission data in the first space.

In another aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, a bus, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the following method:

acquiring first transmission data of assets to be identified in a preset time period, wherein the first transmission data comprise the asset quantity and the data packet quantity which are subjected to data transmission with the assets to be identified; mapping the first transmission data to a first space, and determining the position of the first transmission data in the first space; and determining the system type of the to-be-identified asset corresponding to the first transmission data according to the position of the first transmission data in the first space and the corresponding relation between the predetermined system type and the position of the first sample transmission data in the first space.

The first space is provided with a predetermined logistic regression curve, the logistic regression curve corresponds to a preset system type and is determined according to first sample transmission data of the system; correspondingly, the determining the system type of the asset to be identified corresponding to the first transmission data specifically includes: and determining whether the assets to be identified corresponding to the first transmission data belong to the system or not according to the relative position of the first transmission data in the first space and the logistic regression curve.

After determining the system type of the asset to be identified corresponding to the first transmission data, the method further includes:

acquiring second transmission data of assets to be identified in a preset time period, wherein the second transmission data comprise data packet numbers of the assets to be identified and assets in the system for data transmission and application user types of the assets in the system; mapping the second transmission data to a second space, and determining the position of the second transmission data in the second space; and determining the asset type of the asset to be identified corresponding to the second transmission data according to the position of the second transmission data in the second space and the corresponding relation between the predetermined asset type and the position of the second sample transmission data in the second space.

The second space has at least one predetermined centroid, which corresponds to a preset asset type and is determined from second sample transmission data of the asset;

correspondingly, determining the asset type of the asset to be identified corresponding to the second transmission data specifically comprises: taking the asset type corresponding to the centroid closest to the second transmission data in the second space as the asset type of the asset to be identified.
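The nearest-centroid rule described above can be sketched as follows. This is a minimal illustration only: the function name `nearest_asset_type`, the centroid coordinates and the asset type labels are hypothetical, Euclidean distance is assumed as the distance in the second space, and in practice the two coordinates would likely need to be brought to comparable scales first.

```python
import math

def nearest_asset_type(point, centroids):
    """Return the asset type whose centroid is closest to `point`.

    point:     (x, y) position of the second transmission data.
    centroids: dict mapping asset type -> (x, y) centroid (assumed values).
    """
    return min(centroids, key=lambda asset_type: math.dist(point, centroids[asset_type]))

# Hypothetical centroids in a two-dimensional second space.
centroids = {
    "web server": (1.0, 200.0),
    "database server": (3.0, 900.0),
}
asset_type = nearest_asset_type((2.8, 850.0), centroids)
```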

After determining the asset type of the asset to be identified corresponding to the second transmission data, the method further includes:

acquiring third transmission data of the asset to be identified within a preset time period, wherein the third transmission data comprise the number of data packets transmitted between the asset to be identified and assets of the same asset type in a cluster, the cluster comprising a plurality of assets whose asset type is the same as that of the asset to be identified; mapping the third transmission data to a third space, and determining the position of the third transmission data in the third space; and determining the cluster type of the asset to be identified corresponding to the third transmission data according to the position of the third transmission data in the third space and the corresponding relation between the predetermined cluster type and the position of the third sample transmission data in the third space.

The third space is provided with a predetermined multivariate fitting curve, the multivariate fitting curve corresponds to a preset cluster type and is determined according to third sample transmission data of the cluster; correspondingly, the third transmission data are mapped to a third space, and the position of the third transmission data in the third space is determined, specifically, a fitting curve of the third transmission data in the third space is obtained;

determining the cluster type of the asset to be identified corresponding to the third transmission data according to the position of the third transmission data in the third space and the corresponding relationship between the predetermined cluster type and the position of the third sample transmission data in the third space, specifically:

determining a fitting coefficient of a fitting curve of the third transmission data and a predetermined fitting curve; and taking the cluster type corresponding to the fitting curve when the fitting coefficient is maximum as the cluster type of the assets to be identified.
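The selection of the cluster with the largest fitting coefficient can be sketched as follows. The text does not define the fitting coefficient, so this sketch assumes, purely for illustration, that it is the coefficient of determination (R²) between the asset's fitted curve and a cluster's predetermined curve, both sampled at the same points. The polynomial coefficients, cluster names and sample points are hypothetical.

```python
def poly_eval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def r_squared(reference, candidate, xs):
    """Assumed fitting coefficient: R^2 of `candidate` against `reference` on xs.

    Undefined when the reference curve is constant on xs (zero variance).
    """
    ref = [poly_eval(reference, x) for x in xs]
    cand = [poly_eval(candidate, x) for x in xs]
    mean_ref = sum(ref) / len(ref)
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    ss_res = sum((r - c) ** 2 for r, c in zip(ref, cand))
    return 1.0 - ss_res / ss_tot

def best_cluster(asset_curve, cluster_curves, xs):
    """Return the cluster whose predetermined curve gives the largest coefficient."""
    return max(cluster_curves,
               key=lambda name: r_squared(cluster_curves[name], asset_curve, xs))

# Hypothetical predetermined curves (polynomial coefficients, lowest order first).
cluster_curves = {"cluster A": [1.0, 2.0, 0.5], "cluster B": [10.0, -1.0, 0.1]}
asset_curve = [1.1, 1.9, 0.5]
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
cluster = best_cluster(asset_curve, cluster_curves, xs)
```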

According to the technical scheme, the asset identification method and computer equipment provided by the embodiment of the invention obtain the position of the first transmission data in the first space from the first transmission data of the asset to be identified, and determine the system type of the asset to be identified according to the corresponding relation between the system type and the position of the first sample transmission data in the first space, so that the system type to which the asset belongs can be automatically and intelligently identified, and the working efficiency is improved.

Drawings

FIG. 1 is a schematic flow chart of an asset identification method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of an asset identification method according to another embodiment of the present invention;

FIG. 3 is an image of a sigmoid function provided by yet another embodiment of the present invention;

FIG. 4 is a schematic diagram of a logistic regression curve obtained by machine training of an asset identification method according to another embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating automatic system attribution determination according to an asset identification method according to another embodiment of the present invention;

FIG. 6 is a schematic flow chart of a method for asset identification according to yet another embodiment of the present invention;

FIG. 7 is a schematic flow chart of a method for asset identification according to yet another embodiment of the present invention;

FIGS. 8-11 are schematic diagrams of a k-means algorithm of an asset identification method according to another embodiment of the present invention;

FIG. 12 is a schematic diagram of a server asset class attribution model obtained by machine training of an asset identification method according to yet another embodiment of the present invention;

FIG. 13 is a schematic diagram illustrating a server asset class attribution model determination of an asset identification method according to yet another embodiment of the present invention;

FIG. 14 is a schematic flow chart diagram illustrating a method for asset identification according to yet another embodiment of the present invention;

FIG. 15 is a schematic flow chart diagram illustrating a method for asset identification according to yet another embodiment of the present invention;

FIG. 16 is a schematic view of a multi-fit curve for a cluster of a method for asset identification according to yet another embodiment of the present invention;

FIG. 17 is a schematic diagram of a multi-element fitted curve of a plurality of clusters in a system for asset identification according to yet another embodiment of the present invention;

FIG. 18 is a schematic diagram illustrating cluster attribution model determination of an asset identification method according to yet another embodiment of the present invention;

FIG. 19 is a schematic view of an asset identification system provided in accordance with yet another embodiment of the present invention;

FIG. 20 is a schematic structural diagram of a computer device according to yet another embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.

In the present embodiment, the hardware, software and a combination of both of the devices are collectively referred to as assets.

For convenience of explanation, the asset is exemplified as a server.

In the operation and maintenance of the service support system, assets may be changed and adjusted because of performance problems, service expansion and the like, so unknown assets appear in the process. For such assets to be identified, asset identification can be carried out based on machine learning to determine the system to which they belong.

The service support system topology includes a plurality of application systems, such as a CRM system (Customer Relationship Management), a BOSS system (Business & Operation Support System), a customer service system, and the like.

Fig. 1 is a schematic flow chart illustrating an asset identification method according to an embodiment of the present invention.

Referring to fig. 1, the method provided by the embodiment of the present invention specifically includes the following steps:

step 11, obtaining first transmission data of assets to be identified in a preset time period, wherein the first transmission data comprises the asset quantity and the data packet quantity of data transmission with the assets to be identified.

Optionally, first transmission data of the asset to be identified within a preset time period is acquired with a network packet capture tool, based on the identifier of the asset to be identified.

The asset to be identified may be a server to be identified. In that case the asset quantity is the number of other servers that exchange data with the server to be identified, and the data packet quantity is the number of data packets transmitted between the server to be identified and those other servers.

The network packet capture tool captures the data packets transmitted between the asset to be identified and other assets, and the packets are parsed to obtain information about the other assets that have an association relationship with the asset to be identified.

Specifically, an association relationship exists when an asset exchanges data with the asset to be identified. The information about other assets includes their number, i.e., how many assets exchange data with the asset to be identified, and their attributes. These attributes are parameters such as the system to which each asset belongs, its asset type, and the cluster to which it belongs. In addition, the number of data packets transmitted by the asset to be identified can be obtained by parsing the data packets.

For example, based on the unknown server IP, the tcpdump command is used to grab the traffic packets passing through the network card of the server within a specified time period.

tcpdump (dump the traffic on a network) is a packet analysis tool that intercepts data packets on the network according to user-defined criteria. It supports filtering by network layer, protocol, host, network or port, and provides logical operators such as and, or and not.

Specifically, each captured traffic packet record includes the system time, the source host and port, the target host and port (written as source host.port > target host.port), data packet parameters, and the like. The records are parsed to find the other IPs that have an association relationship with the server IP, and the number of traffic packets exchanged between that IP and each associated IP is counted.
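The parsing and counting step can be sketched as follows. This is a minimal illustration that assumes tcpdump text output for IPv4 with numeric addresses (for example, captured with `tcpdump -n`); the helper name `summarize_traffic`, the regular expression and the sample lines are hypothetical and not part of the described method.

```python
import re
from collections import Counter

# Matches the "src_ip.src_port > dst_ip.dst_port:" part of a tcpdump text line (IPv4 only).
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def summarize_traffic(lines, target_ip):
    """Return (asset_count, packet_count, per_peer_counts) for target_ip."""
    per_peer = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip non-IPv4 or malformed lines
        src, dst = match.group(1), match.group(3)
        if src == target_ip and dst != target_ip:
            per_peer[dst] += 1
        elif dst == target_ip and src != target_ip:
            per_peer[src] += 1
    # Asset quantity = distinct peers; packet quantity = packets exchanged with them.
    return len(per_peer), sum(per_peer.values()), per_peer

# Hypothetical captured lines for illustration.
sample = [
    "12:00:00.000001 IP 10.0.0.5.44321 > 10.0.0.9.80: Flags [S], seq 1, win 64240, length 0",
    "12:00:00.000209 IP 10.0.0.9.80 > 10.0.0.5.44321: Flags [S.], seq 7, ack 2, win 65160, length 0",
    "12:00:01.104000 IP 10.0.0.7.5432 > 10.0.0.5.51000: Flags [P.], seq 1:9, ack 1, win 501, length 8",
    "12:00:02.000000 IP 10.0.0.8.22 > 10.0.0.3.50022: Flags [.], ack 1, win 501, length 0",
]
asset_count, packet_count, per_peer = summarize_traffic(sample, "10.0.0.5")
```

The pair (packet_count, asset_count) is then the kind of value that step 12 maps to a position in the first space.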

Step 12, mapping the first transmission data to a first space, and determining the position of the first transmission data in the first space.

In this embodiment, the position of the first transmission data in the first space may be obtained by using a plurality of data visualization manners.

Alternatively, the first space may be a two-dimensional coordinate system, and the first transmission data may be mapped to one two-dimensional coordinate system in various ways. The ordinate of the transmission data can be determined according to the asset number, and the abscissa of the transmission data can be determined according to the data packet number, so that the position of the first transmission data in the first space is determined.

Step 13, determining the system type of the asset to be identified corresponding to the first transmission data according to the position of the first transmission data in the first space and the corresponding relation between the predetermined system type and the position of the first sample transmission data in the first space.

Before the step, acquiring the corresponding relation between the predetermined system type and the position of the first sample transmission data in the first space, wherein the corresponding relation is obtained through a machine learning algorithm.

Optionally, based on a known server IP, capturing a traffic packet passing through the network card of the server within a specified time period through a tcpdump command, analyzing other IPs in the traffic packet, which have an association relationship with the IP, and counting the number of the traffic packets between the IPs.

The parsed IPs are compared against an asset database, which stores the hardware and software configuration information of all equipment for which the operation and maintenance department is responsible; each server asset has at least one IP record. The unknown assets are identified by this lookup, and a relation matrix between the unknown assets and the known assets is compiled statistically, as shown in Table 1 below.

In the table, the horizontal entries are unknown assets, the column entries are known assets, and the table entries are the number of traffic packets.

For example, the number of the mutual traffic packets between the two assets w1 and a1 in the collection period is 5.
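A relation matrix of this kind can be sketched as follows. This is an illustrative sketch only: the function name `build_relation_matrix`, the asset names and the flow list are hypothetical, and the matrix is represented as a nested dictionary indexed first by unknown asset and then by known asset.

```python
from collections import defaultdict

def build_relation_matrix(flows, known_assets):
    """Count packets between each unknown asset and each known asset.

    flows:        iterable of (endpoint_a, endpoint_b), one entry per captured packet.
    known_assets: set of asset identifiers present in the asset database.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for a, b in flows:
        if a in known_assets and b not in known_assets:
            matrix[b][a] += 1
        elif b in known_assets and a not in known_assets:
            matrix[a][b] += 1
        # Packets between two known or two unknown assets are ignored here.
    return matrix

# Hypothetical packets: 3 from w1 to a1, 2 from a1 to w1, 1 from w2 to a2.
flows = [("w1", "a1")] * 3 + [("a1", "w1")] * 2 + [("w2", "a2")]
matrix = build_relation_matrix(flows, known_assets={"a1", "a2"})
```

With these inputs the entry for w1 and a1 is 5, matching the example in the text.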

Asset historical data of a known system is taken as first sample transmission data and used to train a machine learning algorithm, yielding a system attribution model of the known assets, namely the correspondence between each system type and the position of the first sample transmission data in the first space. The position of the first transmission data in the first space is then matched against the position of the first sample transmission data of a system. If they match, the asset to be identified corresponding to the first transmission data is determined to belong to that system; if not, the position is matched against the first sample transmission data of another system, and so on. In this way the model can automatically judge whether a new unknown asset belongs to a given system.

The asset identification method provided by the embodiment at least has the following technical effects:

The position of the first transmission data in the first space is obtained from the first transmission data of the asset to be identified, and the system type of that asset is determined according to the predetermined correspondence between system types and the positions of first sample transmission data in the first space, so that the system type to which the asset belongs can be identified automatically and intelligently, and the working efficiency is improved.

Fig. 2 is a flowchart illustrating an asset identification method according to another embodiment of the present invention.

Referring to fig. 2, the asset identification method provided by the present embodiment builds on the above embodiment. The first space has a predetermined logistic regression curve, which corresponds to a preset system type and is determined according to first sample transmission data of the system.

For each application system in the service support system, there is a corresponding logistic regression curve in the first space, which can be used to judge whether the asset to be identified belongs to the system corresponding to that curve.

In step 13 of the method, there are various ways for determining the system type of the asset to be identified corresponding to the first transmission data, and this embodiment illustrates one of them.

Step 13', determining whether the asset to be identified corresponding to the first transmission data belongs to the system according to the relative position of the first transmission data and the logistic regression curve.

After determining the position of the first transmission data in the first space, classifying the first transmission data according to the classification characteristic of the logistic regression curve through the relative position of the first transmission data in the first space and the logistic regression curve, and determining whether the assets to be identified belong to the system.

Before step 13', training is carried out through a machine learning algorithm of logistic regression based on asset historical data of the known system, and a system attribution model of the known asset is obtained.

Specifically, logistic regression (LR), also called logistic regression analysis, is a classification and prediction algorithm that predicts future outcomes from the behaviour of historical data.

Specifically, first sample transmission data of a known asset in a preset time period is obtained, wherein the first sample transmission data comprises the asset number and the data packet number transmitted by the known asset; and training the first sample transmission data by adopting a logistic regression algorithm, and determining the logistic regression curve corresponding to a preset system type.

Logistic regression algorithms use known independent variables to predict the value of a discrete dependent variable. The algorithm is as follows: the goal of a conventional regression algorithm is to fit a polynomial function f(x) such that the error between the predicted and true values is minimized. The specific formula is as follows:

$$f(x) = c_0 + c_1 x_1 + \dots + c_n x_n$$

where n is the number of features and $c_i$ is the fitting coefficient of the i-th feature ($c_0$ being the constant term).

Suppose a data set has n independent features, where $x_1$ to $x_n$ are the n features of a sample and $[x_1, \dots, x_n]$ is the input vector. The training process then determines $[c_0, c_1, \dots, c_n]$ such that the expression gives the most accurate output values for the plurality of input vectors.

For f(x) to support a logical judgment, it should directly express the probability that a sample with feature x is classified into a certain class: for example, f(x) > 0.5 indicates that x is classified into the positive class, f(x) < 0.5 indicates that x is classified into the negative class, and f(x) always lies in [0,1]. For this purpose a sigmoid function is introduced. This function is defined as follows:

$$g(z) = \frac{1}{1 + e^{-z}}$$

Fig. 3 shows an image of a sigmoid function provided by another embodiment of the present invention.

Referring to fig. 3, the sigmoid function has the characteristics required by the present embodiment: its domain is all real numbers, its range lies within [0,1], and its value at 0 is 0.5.

The way to convert f(x) into a sigmoid function is as follows.

Let $P(y = 1 \mid x) = p(x)$ be the probability that a sample with feature x is classified into category 1; then $p(x)/[1-p(x)]$ is defined as the odds ratio.

Introducing the logarithm:

$$\ln\frac{p(x)}{1-p(x)} = f(x)$$

Solving the above formula for p(x) gives:

$$p(x) = \frac{1}{1 + e^{-f(x)}}$$

After the required sigmoid function is obtained, only the coefficients c in the formula need to be fitted, as in conventional linear regression. This transformation is called a logit transformation, also called a logistic transformation.
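The relationship between the sigmoid function and the logit transformation can be checked numerically with a short sketch. The function names `sigmoid` and `logit` are chosen here for illustration; the two branches in `sigmoid` are only a guard against overflow for large negative inputs.

```python
import math

def sigmoid(z):
    """Sigmoid function 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Log of p / (1 - p), the inverse of the sigmoid for 0 < p < 1."""
    return math.log(p / (1.0 - p))

value_at_zero = sigmoid(0.0)            # midpoint of the curve
round_trip = logit(sigmoid(1.7))        # recovers f(x) from p(x)
```

Applying `logit` to `sigmoid(f)` returns `f` up to rounding error, which is the statement that p(x) = sigmoid(f(x)) is equivalent to ln(p(x)/(1-p(x))) = f(x).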

The logistic regression curve is determined according to the first sample transmission data of the system, specifically as follows.

Machine training is performed on the known system assets, i.e., the first sample transmission data, through a logistic regression algorithm, with the following variables:

The independent variables are two ratios. The numerator of the first ratio is the number of nodes within the designated system that are associated with the known system asset, and its denominator is the total number of nodes of the designated system. The numerator of the second ratio is the number of traffic packets exchanged between the known system asset and its associated nodes within the designated system, and its denominator is the total number of traffic packets of the designated system.

The dependent variable is whether a known system asset belongs to a known system.
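The two independent variables can be computed as in the following sketch. The function name `attribution_features`, its parameters and the sample values are hypothetical; the sketch assumes the per-peer packet counts of the asset and the node set and total packet count of the designated system are already available.

```python
def attribution_features(peer_packets, system_nodes, system_total_packets):
    """Return (node_ratio, packet_ratio) for one asset and one designated system.

    peer_packets:         dict peer IP -> packets exchanged with the asset.
    system_nodes:         set of node IPs in the designated system.
    system_total_packets: total number of traffic packets of the designated system.
    """
    in_system = {ip: n for ip, n in peer_packets.items() if ip in system_nodes}
    node_ratio = len(in_system) / len(system_nodes)
    packet_ratio = sum(in_system.values()) / system_total_packets
    return node_ratio, packet_ratio

# Hypothetical data: the asset talks to 2 of the system's 4 nodes and to 1 outside node.
features = attribution_features(
    peer_packets={"a1": 5, "a2": 15, "z9": 7},
    system_nodes={"a1", "a2", "a3", "a4"},
    system_total_packets=100,
)
```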

Based on the above variables, training is performed by a logistic regression algorithm, and a logistic regression curve of a designated system, i.e., a preset system, can be obtained according to p (x) training.
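Training on the two ratios can be sketched with plain batch gradient descent on the log loss. This is one possible way to fit the coefficients, not necessarily the training procedure used in the embodiment; the sample points, labels, learning rate and epoch count are invented for illustration.

```python
import math

def sigmoid(z):
    """Sigmoid function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(samples, labels, lr=0.5, epochs=5000):
    """Fit p(x) = sigmoid(c0 + c1*x1 + c2*x2) by batch gradient descent."""
    c = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(samples, labels):
            err = sigmoid(c[0] + c[1] * x1 + c[2] * x2) - y
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        c = [ci - lr * gi / n for ci, gi in zip(c, grad)]
    return c

def predict(c, x1, x2):
    """Probability that a node with ratios (x1, x2) belongs to the system."""
    return sigmoid(c[0] + c[1] * x1 + c[2] * x2)

# Hypothetical samples: (node ratio, packet ratio); label 1 = belongs to the system.
samples = [(0.6, 0.5), (0.7, 0.4), (0.8, 0.7), (0.5, 0.6),
           (0.1, 0.05), (0.05, 0.1), (0.2, 0.1), (0.1, 0.2)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
coeffs = train_logistic(samples, labels)
```

The curve p(x) = 0.5, i.e. c0 + c1·x1 + c2·x2 = 0, is then the boundary that separates nodes belonging to the system from those that do not.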

FIG. 4 is a diagram illustrating a logistic regression curve obtained by machine training of an asset identification method according to another embodiment of the present invention.

Referring to fig. 4, in the first space, the ordinate is the first ratio, the abscissa is the second ratio, and the curve in the diagram is a logistic regression curve.

The round nodes at the lower left of the curve are IPs that do not belong to the system, and the square nodes at the upper right of the curve are IPs that belong to the system. Nodes can therefore be classified according to the classification characteristic of the logistic regression curve to determine whether they belong to the system.

For an unknown asset node found by comparing the captured traffic packets against the asset database, the number of associations and the number of traffic packets between the unknown node and other nodes are counted, and the system attribution model is used to automatically judge whether the unknown node belongs to the system.

Fig. 5 is a schematic diagram illustrating system ownership automatic determination of an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 5, if the position of the first transmission data in the first space is judged to be the circle mark at the lower left of the curve, the asset to be identified does not belong to the system corresponding to the logistic regression curve. If the position is judged to be the circle mark at the upper right of the curve, the asset to be identified belongs to the system corresponding to the logistic regression curve.
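The judgment against each system's curve can be sketched as follows. The coefficient values, system names and the 0.5 threshold are assumptions for illustration; because the two ratios are computed relative to a designated system, the sketch takes a separate feature pair per system.

```python
import math

def membership_probability(coeffs, node_ratio, packet_ratio):
    """p(x) = sigmoid(c0 + c1*node_ratio + c2*packet_ratio)."""
    c0, c1, c2 = coeffs
    z = c0 + c1 * node_ratio + c2 * packet_ratio
    return 1.0 / (1.0 + math.exp(-z))

def identify_system(models, features_by_system, threshold=0.5):
    """Return the first system whose curve places the asset on the 'belongs' side."""
    for system, coeffs in models.items():
        node_ratio, packet_ratio = features_by_system[system]
        if membership_probability(coeffs, node_ratio, packet_ratio) > threshold:
            return system
    return None  # the asset matches none of the known systems

# Hypothetical trained coefficients (c0, c1, c2) for two systems.
models = {"CRM": (-4.0, 5.0, 5.0), "BOSS": (-3.0, 4.0, 6.0)}
# Hypothetical (node ratio, packet ratio) of one unknown asset relative to each system.
features = {"CRM": (0.1, 0.1), "BOSS": (0.7, 0.6)}
result = identify_system(models, features)
```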

By setting a logistic regression curve in the first space and determining whether the asset to be identified corresponding to the first transmission data belongs to the system according to the relative position of the first transmission data and the logistic regression curve, the system type to which the asset belongs can be identified automatically and intelligently, so that assets are identified quickly and accurately.

Fig. 6 is a flowchart illustrating an asset identification method according to another embodiment of the present invention.

Referring to fig. 6, on the basis of the above embodiment, after the step 13, the method provided by the embodiment of the present invention may be further applied to identify the asset type in the system.

Alternatively, there are multiple types of assets within a system, such as APP servers, Web servers, cache servers, interface servers, file servers, database servers, and the like.

The method provided by the embodiment of the invention specifically comprises the following steps:

Step 21, acquiring second transmission data of the asset to be identified within a preset time period, wherein the second transmission data comprise the number of data packets transmitted between the asset to be identified and the assets in the system, and the application user types of the assets in the system.

Optionally, the second transmission data of the asset to be identified within the preset time period are acquired with a network packet capture tool, based on the identifier of the asset to be identified.

The asset to be identified may be a server to be identified; the number of data packets is the number of data packets transmitted between the server to be identified and the servers in the system; and the application user type refers to the type of user of the application software on the servers in the system.

For example, if the servers in the system are APP servers, there may be several types of users on the APP servers besides APP-type users.

The network packet capture tool obtains the data packets transmitted between the asset to be identified and other assets. Parsing these packets yields the number of data packets transmitted between the asset to be identified and the assets in the system, and the application user types on the assets in the system.
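As a rough illustration of the parsing step above, the following sketch counts the packets exchanged between the asset to be identified and each in-system asset. It assumes the capture has already been reduced to (source IP, destination IP) pairs; the function name `count_packets_with_system`, the IP addresses, and the input layout are hypothetical and are not taken from the embodiment.

```python
from collections import Counter


def count_packets_with_system(packets, target_ip, system_ips):
    """Count packets exchanged between target_ip and each in-system asset.

    packets: iterable of (src_ip, dst_ip) pairs (assumed pre-parsed capture).
    system_ips: set of IPs known to belong to the system.
    """
    counts = Counter()
    for src, dst in packets:
        if src == target_ip and dst in system_ips:
            counts[dst] += 1
        elif dst == target_ip and src in system_ips:
            counts[src] += 1
    return counts


# Hypothetical capture: 10.0.0.9 is the asset to be identified.
packets = [
    ("10.0.0.9", "10.0.0.1"),
    ("10.0.0.1", "10.0.0.9"),
    ("10.0.0.9", "10.0.0.2"),
    ("10.0.0.9", "8.8.8.8"),   # peer outside the system, ignored
    ("10.0.0.3", "10.0.0.1"),  # does not involve the target, ignored
]
per_peer = count_packets_with_system(packets, "10.0.0.9", {"10.0.0.1", "10.0.0.2"})
total_packets = sum(per_peer.values())
```

The per-peer counts can be summed, as in `total_packets`, when only the overall packet number is needed as a coordinate.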

And step 22, mapping the second transmission data to a second space, and determining the position of the second transmission data in the second space.

In this embodiment, the position of the second transmission data in the second space may be obtained by using a plurality of data visualization manners. Optionally, the second space may be a two-dimensional coordinate system, and there may be various ways of mapping the second transmission data to one two-dimensional coordinate system.

Specifically, the abscissa of the second transmission data may be determined according to the application user type of the asset, and the ordinate of the second transmission data may be determined according to the number of data packets transmitted with the assets in the system, so as to obtain the position of the second transmission data in the second space.
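A minimal sketch of such a mapping is given below, assuming the application user types are encoded as integer indices. The table `USER_TYPE_INDEX` and the function `to_second_space` are hypothetical names introduced only for illustration; the embodiment does not prescribe a particular encoding.

```python
# Hypothetical encoding of application user types as abscissa values;
# the embodiment does not prescribe a particular encoding.
USER_TYPE_INDEX = {"app": 1, "web": 2, "cache": 3, "db": 4}


def to_second_space(user_type, packet_count):
    """Map second transmission data to an (x, y) point.

    x: index of the application user type (abscissa).
    y: number of packets exchanged with assets in the system (ordinate).
    """
    if user_type not in USER_TYPE_INDEX:
        raise ValueError(f"unknown application user type: {user_type!r}")
    return (float(USER_TYPE_INDEX[user_type]), float(packet_count))


point = to_second_space("web", 1200)
```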

And step 23, determining the asset type of the asset to be identified corresponding to the second transmission data according to the position of the second transmission data in the second space and the corresponding relation between the predetermined asset type and the position of the second sample transmission data in the second space.

Before this step, a correspondence between the predetermined asset type and the position of the second sample transmission data in the second space is obtained, the correspondence being obtained by a machine learning algorithm.

Optionally, based on the system to which the asset to be identified belongs, the traffic packets passing through the network card of the server within a specified time period are captured with the tcpdump command. The other IPs in the system that are associated with the server's IP are parsed from the traffic packets, and the number of traffic packets between the IPs and the application software users on the asset are counted.

The parsed IPs are then compared with an asset database, which stores the hardware and software configuration information of the equipment that each operation and maintenance department is responsible for; each server asset has at least one IP record. The relation between the unknown asset and the known system is obtained by looking up Table 1.

For example, if the system to which the asset to be identified belongs is determined to be the known system 1 in table 1 through step 13, information related to the asset in the known system 1 can be obtained.

And taking the asset historical data of the known system as second sample transmission data, training by a machine learning algorithm to obtain an asset attribution model of the known asset, namely the corresponding relation between each asset type and the position of the second sample transmission data in the second space, and matching the position of the second transmission data in the second space with the position of the second sample transmission data in the second space.

For example, suppose a position of second sample transmission data in the second space corresponds to assets whose asset type is APP server. If the position of the second transmission data in the second space matches the position of the APP server assets, the asset to be identified corresponding to the second transmission data can be determined to be an APP server. If not, the position is matched against the position in the second space of the second sample transmission data corresponding to the Web server, and so on. In this way the model automatically determines which type of server asset a new unknown asset belongs to.

The position of the second transmission data in the second space is obtained from the second transmission data of the asset to be identified, and the asset type of the asset to be identified is determined according to the correspondence between the predetermined asset types and the positions of the second sample transmission data in the second space, so that the asset type can be identified automatically and intelligently and working efficiency is improved.

Fig. 7 is a flowchart illustrating an asset identification method according to another embodiment of the present invention.

Referring to fig. 7, this embodiment provides an asset identification method on the basis of the above embodiment. The second space has at least one predetermined centroid, which corresponds to a preset asset type and is determined from second sample transmission data of the asset.

In step 23 of the method, there are various ways for determining the asset type of the asset to be identified corresponding to the second transmission data, and this embodiment illustrates one of the ways.

And step 23', taking the asset type corresponding to the centroid closest to the second transmission data as the asset type of the asset to be identified.

After the position of the second transmission data in the second space is determined, the second transmission data can be classified according to the classification characteristic of the centroid through the distance between the second transmission data and the centroid in the second space, and whether the asset to be identified belongs to the asset type is determined.
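The nearest-centroid decision of step 23' can be sketched as follows. The centroid coordinates and the function name `nearest_asset_type` are hypothetical, and Euclidean distance is assumed as the distance measure.

```python
import math


def nearest_asset_type(point, centroids):
    """Return the asset type whose centroid is closest to point.

    centroids: dict mapping asset type -> (x, y) centroid in the second space.
    """
    return min(centroids, key=lambda t: math.dist(point, centroids[t]))


# Hypothetical centroids for three asset types.
centroids = {
    "APP server": (1.0, 5000.0),
    "Web server": (2.0, 1500.0),
    "database server": (4.0, 9000.0),
}
asset_type = nearest_asset_type((2.0, 1200.0), centroids)
```

Because the two axes in this example have very different scales, the packet count dominates the distance; rescaling the coordinates before comparison may be worth considering in practice.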

Before step 23', after the system attribution of the unknown asset has been determined, machine training is performed on the asset historical data of the known system with the K-Means algorithm to obtain a model, which is used to judge which asset type the unknown asset belongs to.

Specifically, the k-means algorithm is a very common clustering algorithm, and its basic idea is: by iteratively searching a partitioning scheme of k clusters, the overall error obtained when the mean of the k clusters is used to represent the samples of the corresponding classes is minimized.

Specifically, second sample transmission data of the known assets in a preset time period are obtained, wherein the second sample transmission data comprise the number of data packets transmitted with the known assets in the system and the application user types of the known assets; and training the second sample transmission data by adopting a k-means clustering algorithm, and determining the centroid corresponding to the preset asset type.

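The k-means algorithm is based on the minimum sum-of-squared-errors criterion. The cost function is:

$$J(c,\mu)=\sum_{i=1}^{m}\left\|x^{(i)}-\mu_{c^{(i)}}\right\|^{2}$$

where $m$ is the number of samples, $x^{(i)}$ is the i-th sample, $c^{(i)}$ is the cluster to which it is assigned, and $\mu_{c^{(i)}}$ is the mean of that cluster, i.e., the mean of the feature values of the samples in it.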

In this step, the smaller the cost function, the better. Intuitively, the more similar the samples within each class are, the smaller the squared error between them and the class mean. Summing the squared errors over all classes therefore indicates whether a partition into k classes is optimal.
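To make the criterion concrete, the sketch below evaluates the cost as the sum of squared distances between each sample and the mean of its assigned cluster, for a good and a poor assignment of the same samples. The data and the function name `kmeans_cost` are hypothetical.

```python
def kmeans_cost(samples, labels, centroids):
    """Sum of squared distances from each sample to its assigned centroid."""
    total = 0.0
    for x, c in zip(samples, labels):
        mu = centroids[c]
        total += sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return total


samples = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
centroids = [(0.0, 1.0), (10.0, 10.0)]
good = kmeans_cost(samples, [0, 0, 1], centroids)  # nearby samples grouped together
bad = kmeans_cost(samples, [0, 1, 1], centroids)   # second sample misassigned
```

The misassignment raises the cost sharply, which is the behaviour the criterion relies on.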

The cost function above cannot be minimized analytically; only an iterative method is available. The k-means algorithm clusters the samples into k clusters, where k is given. The solving process assigns each sample to a class $c^{(i)}$ and iterates until convergence. The algorithm is described as follows:

1. randomly selecting k clustering centroid points;

2. the following process is repeated until convergence.

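For each sample i, calculate the class to which it should belong:

$$c^{(i)} := \arg\min_{j}\left\|x^{(i)}-\mu_{j}\right\|^{2}$$

where $x^{(i)}$ is the i-th sample, $c^{(i)}$ represents the class of sample i, and $\mu_j$ represents the centroid of class j.

For each class j, the centroid of the class is recalculated:

$$\mu_{j} := \frac{\sum_{i=1}^{m} 1\{c^{(i)}=j\}\,x^{(i)}}{\sum_{i=1}^{m} 1\{c^{(i)}=j\}}$$

where $m$ is the number of samples and $\mu_j$ represents the centroid of class j.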
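The two alternating steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the embodiment's own code: initial centroids are drawn from the samples with a fixed seed, ties go to the lower cluster index, an empty cluster keeps its previous centroid, and the sample data are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(samples, k, seed=0, max_iter=100):
    """Plain k-means: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct samples as the initial centroids.
    centroids = rng.sample(samples, k)
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign every sample to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(x, centroids[j]))
            for x in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Step 2b: move each centroid to the mean of its assigned samples.
        for j in range(k):
            members = [x for x, c in zip(samples, labels) if c == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels


# Hypothetical samples forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, labels = kmeans(samples, k=2)
```

On real data the result depends on the initial centroids, so running the procedure with several seeds and keeping the lowest-cost result is a common precaution.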

Fig. 8-11 respectively show k-means algorithm diagrams of an asset identification method according to still another embodiment of the present invention.

Referring to figs. 8-11, the figures show 4 centroids obtained by calculation in the same example.

The centroids are determined from the second sample transmission data of the system. Specifically, machine training is performed on the known system assets, i.e., the second sample transmission data, through the k-means algorithm, implemented as follows:

Fig. 12 is a schematic diagram illustrating a server asset class attribution model obtained by machine training of an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 12, 6 clustering centroid points are taken, representing 6 classes of server assets. The types of application users on the known servers and the number of traffic packets between the known servers and other nodes are counted; the class attribution of each known asset is confirmed through calculation; a distribution graph of the known servers is drawn; and the centroid of each class of assets is recalculated. This process is repeated until all known server assets are classified in the graph.

Fig. 13 is a schematic diagram illustrating a server asset class attribution model determination of an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 13, for a newly added unknown server asset, the types of application users on it and the number of traffic packets exchanged with other nodes are counted and plotted on the distribution graph in the same manner. The class of server asset to which the unknown asset belongs is determined automatically from its distance to the different centroid points.

For example, if it is determined that the position of the second transmission data in the second space is closest to the centroid of the APP server, it indicates that the second transmission data and the APP server belong to the same asset type.

Centroids are set in the second space, and the type of the asset to be identified corresponding to the second transmission data is determined according to the distance between the second transmission data and each centroid, so that the asset type can be identified automatically and intelligently and assets can be identified quickly and accurately.

Fig. 14 is a flowchart illustrating an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 14, on the basis of the foregoing embodiment, after step 23, the method provided by the embodiment of the present invention may be further applied to identify the cluster type in the system.

Optionally, assets of the same asset type may form multiple clusters in the same system, each cluster including multiple assets of the same asset type.

Step 31, acquiring third transmission data of the asset to be identified within a preset time period, wherein the third transmission data comprise the number of data packets transmitted between the asset to be identified and the assets of the same asset type, and the identifiers of those assets in the system.

Optionally, the third transmission data of the asset to be identified within the preset time period are acquired with a network packet capture tool, based on the identifier of the asset to be identified.

The asset to be identified may be a server to be identified; the number of data packets is the number of data packets transmitted between the server to be identified and the servers in the system that are of the same type as it; and the identifier of an asset in the system may be the serial number of a server of the same type as the server to be identified.

For example, if the server to be identified is an APP server and the system contains 7 known APP servers with server identifiers such as numbers 1 to 7, the distribution of the number of data packets transmitted between the APP server to be identified and each of the 7 APP servers is obtained.

The network packet capture tool obtains the data packets transmitted between the asset to be identified and other assets. Parsing these packets yields the identifiers of the assets that transmit data with the asset to be identified and the number of data packets transmitted with each of them.

And step 32, mapping the third transmission data to a third space, and determining the position of the third transmission data in the third space.

The position of the third transmission data in the third space can be obtained by adopting a plurality of data visualization modes. Optionally, the third space may be a two-dimensional coordinate system, and there may be a plurality of ways to map the third transmission data to a two-dimensional coordinate system. Specifically, the abscissa of the third transmission data may be determined according to the asset identifier, and the ordinate of the third transmission data may be determined according to the number of data packets transmitted by the asset, so as to obtain the position of the third transmission data in the third space.

And step 33, determining the cluster type of the asset to be identified corresponding to the third transmission data according to the position of the third transmission data in the third space and the corresponding relationship between the predetermined cluster type and the position of the third sample transmission data in the third space.

Before this step, a corresponding relation between the predetermined cluster type and the position of the third sample transmission data in the third space is obtained through a machine learning algorithm.

Optionally, based on the system to which the asset to be identified belongs, the traffic packets passing through the network card of the server within a specified time period are captured with the tcpdump command. The other IPs in the system that are associated with the server's IP are parsed from the traffic packets, and the number of traffic packets between the IPs is counted.

The parsed IPs are then compared with an asset database, which stores the hardware and software configuration information of the equipment that each operation and maintenance department is responsible for; each server asset has at least one IP record. The cluster types of the assets of the known system are obtained by table lookup.

And taking the asset historical data of the known system as third sample transmission data, training by a machine learning algorithm to obtain a cluster attribution model of the known asset, namely the corresponding relation between the cluster to which each asset belongs and the position of the third sample transmission data in a third space, and matching the position of the third transmission data in the third space with the position of the third sample transmission data in the third space.

For example, suppose a position of third sample transmission data in the third space corresponds to assets of the first cluster type. If the position of the third transmission data in the third space matches the position of the assets of the first cluster, the asset to be identified corresponding to the third transmission data can be determined to belong to the first cluster. If not, the position is matched against the position in the third space of the assets of the second cluster type, and so on. In this way the model automatically determines which cluster a new unknown asset belongs to.

The position of the third transmission data in the third space is obtained from the third transmission data of the asset to be identified, and the cluster type of the asset to be identified is determined according to the correspondence between the predetermined cluster types and the positions of the third sample transmission data in the third space, so that the cluster type to which the asset belongs can be identified automatically and intelligently and working efficiency is improved.

Fig. 15 is a flowchart illustrating an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 15, this embodiment provides an asset identification method on the basis of the above embodiment. The third space has a predetermined multivariate fitting curve, which corresponds to a preset cluster type and is determined from third sample transmission data of the cluster. A system in the business support system contains multiple asset types, and assets of the same type form multiple clusters. Each cluster corresponds to one multivariate fitting curve in the third space, which can be used to judge whether an asset to be identified belongs to that cluster.

In step 32 of the method, there are various ways to determine the position of the third transmission data in the third space, and this embodiment illustrates one of them.

The method comprises the following steps:

Step 32', obtaining a fitting curve of the third transmission data in the third space.

For the server asset, the IPs and number of its associated servers and the number of traffic packets are counted, and a fitting curve f_w(x) is calculated.

Specifically, the abscissa of the third transmission data is determined according to the number of assets transmitted to the assets to be identified, the ordinate of the third transmission data is determined according to the number of data packets, and a fitting curve of the third transmission data in a third space can be obtained according to a fitting algorithm in the prior art.

For example, a multivariate fitting algorithm is used:

Given a function type $y=f(x)$ and a set of m data points $(x_i, y_i)$, the parameters of $f$ are determined so that the sum of squares of the absolute deviations $|y_i-f(x_i)|$ is minimized:

$$S=\sum_{i=1}^{m}\left(y_i-f(x_i)\right)^{2}$$

In this formula, the parameters of the function are the independent variables and the sum of squared deviations is the objective function; the problem can be solved using the extremum theory of multivariate functions.

For example, let the form of the expected model be:

$$y=f(x)=a_0+a_1x+\dots+a_nx^{n}$$

where n is fixed.

Least-squares estimation requires minimizing (for example, with n = 2):

$$S(a_0,a_1,a_2)=\sum_{i=1}^{m}\left(y_i-a_0-a_1x_i-a_2x_i^{2}\right)^{2}$$

Taking the partial derivative of S with respect to each parameter and setting it equal to zero gives the equations:

$$\frac{\partial S}{\partial a_k}=-2\sum_{i=1}^{m}\left(y_i-a_0-a_1x_i-a_2x_i^{2}\right)x_i^{k}=0,\qquad k=0,1,2$$

In matrix form, with

$$A=\begin{pmatrix}1&x_1&x_1^{2}\\ \vdots&\vdots&\vdots\\ 1&x_m&x_m^{2}\end{pmatrix},\quad a=\begin{pmatrix}a_0\\a_1\\a_2\end{pmatrix},\quad y=\begin{pmatrix}y_1\\ \vdots\\ y_m\end{pmatrix}$$

the normal system of equations is:

$$(A^{T}A)\,a=A^{T}y$$

If $A^{T}A$ is invertible, the solution of the normal equations is:

$$a=(A^{T}A)^{-1}A^{T}y$$

Substituting the coefficients a into f(x) gives the y values, i.e., the fitting curve of the third transmission data in the third space.
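The least-squares fit can be sketched in plain Python by building the normal equations in their standard form and solving them by Gaussian elimination. The function names `polyfit_normal_equations` and `evaluate`, the choice of degree 2, and the sample data are assumptions made for illustration only.

```python
def polyfit_normal_equations(xs, ys, degree=2):
    """Least-squares polynomial fit by solving (A^T A) a = A^T y."""
    n = degree + 1
    # Entries of A^T A and A^T y, where row i of A is [1, x_i, ..., x_i^degree].
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular; the fit is not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs  # [a0, a1, ..., a_degree]


def evaluate(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(a * x ** k for k, a in enumerate(coeffs))


# Hypothetical data lying exactly on y = 1 + 2x + 3x^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
a = polyfit_normal_equations(xs, ys, degree=2)
```

Solving the normal equations directly is adequate for low degrees; for higher degrees or widely spread x values a numerically more stable method, such as a QR-based solver, is usually preferred.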

In step 33 of the method, there are various ways to determine the cluster type of the asset to be identified corresponding to the third transmission data, and this embodiment illustrates one of them.

The method comprises the following steps:

Step 331, determining a fitting coefficient between the fitting curve of the third transmission data and each predetermined fitting curve.

Step 332, taking the cluster type corresponding to the fitting curve with the largest fitting coefficient as the cluster type of the asset to be identified.

The fitting coefficient between the fitting curve of the third transmission data in the third space and each predetermined fitting curve is calculated. The larger the fitting coefficient with a given curve, the closer the asset's curve is to that curve, and the asset is attributed to the corresponding cluster.

By calculating these fitting coefficients in the third space, the third transmission data can be classified to determine which cluster the asset to be identified belongs to.

Before step 331, once the asset type of the unknown asset has been determined, it is still necessary to determine which cluster the asset belongs to, because assets of the same asset type form a plurality of clusters. A model can be obtained by machine training on the asset historical data of the known system with a multivariate fitting machine learning algorithm, and the model is used to judge which cluster the unknown asset belongs to.

Specifically, third sample transmission data of the known assets in a preset time period are obtained, wherein the third sample transmission data comprise the quantity of the assets transmitted by the known assets and the assets outside the cluster, and the quantity of transmitted data packets; and training the third sample transmission data by adopting a multivariate fitting algorithm, and determining the multivariate fitting curve corresponding to a preset cluster type.

The real-time traffic packet capture information of the known system clusters is trained with a multivariate fitting machine learning algorithm to obtain a model, so that the cluster of a newly added unknown node can be judged automatically.

The number of traffic packets exchanged between servers outside the cluster and each server inside the cluster is counted and averaged, and the fitting curve f(x) of the known cluster is calculated with the multivariate fitting algorithm.

Fig. 16 is a diagram illustrating a multi-fit curve of a cluster of an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 16, each of the clusters within a system is analyzed. The number of assets outside the cluster with which each cluster transmits data is taken as the abscissa of the third space, the total number of traffic packets transmitted by each cluster is taken as the ordinate, and a multivariate fitting curve of each known cluster in the third space is obtained.

Fig. 17 is a diagram illustrating a multi-cluster multivariate fit curve in a system of an asset identification method according to yet another embodiment of the present invention.

Referring to fig. 17, in the same way, multivariate fitting curves f₁(x), f₂(x), …, f_n(x) can be obtained for all clusters in the system.

Fig. 18 is a schematic diagram illustrating a cluster attribution model determining method according to yet another embodiment of the present invention.

Referring to fig. 18, for a newly added unknown server asset, the number of assets with which it transmits data and the number of traffic packets are counted and plotted in the third space in the same manner.

Specifically, the IPs and number of its associated servers and the number of traffic packets are counted, a fitting curve f_w(x) is calculated, and this curve is compared with the fitting curve f₁(x) of the known cluster.

The fitting coefficient is calculated by the following formula; the larger the fitting coefficient with a given curve, the closer the asset's curve is to that curve, and the asset is attributed to the corresponding cluster.

In the formula, R is the fitting coefficient, n is the cluster number, and f_w(x_i) is the y value of the curve at each x_i.
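The formula for R does not appear in this text, so the sketch below substitutes a common goodness-of-fit measure, the coefficient of determination, purely as an assumption; it should not be read as the embodiment's formula. The function names, the sample points `xs`, and the curve coefficients are all hypothetical.

```python
def fitting_coefficient(reference, candidate, xs):
    """R^2-style agreement of candidate with reference over sample points xs.

    Assumed stand-in for the fitting coefficient R; 1.0 means identical values.
    """
    ref = [reference(x) for x in xs]
    cand = [candidate(x) for x in xs]
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - c) ** 2 for r, c in zip(ref, cand))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    if ss_tot == 0:
        raise ValueError("reference curve is constant on xs")
    return 1.0 - ss_res / ss_tot


def attribute_cluster(f_w, cluster_curves, xs):
    """Pick the cluster whose curve gives the largest fitting coefficient."""
    scores = {name: fitting_coefficient(f, f_w, xs)
              for name, f in cluster_curves.items()}
    return max(scores, key=scores.get), scores


# Hypothetical cluster curves and an unknown asset's curve f_w.
cluster_curves = {
    "cluster 1": lambda x: 1.0 + 2.0 * x + 0.5 * x * x,
    "cluster 2": lambda x: 20.0 - 1.0 * x + 0.1 * x * x,
}
f_w = lambda x: 1.2 + 1.9 * x + 0.5 * x * x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
best, scores = attribute_cluster(f_w, cluster_curves, xs)
```

Whatever measure is used, the decision rule stays the same as in steps 331 and 332: compute one coefficient per known cluster and take the largest.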

The fitting curve of each cluster in the system is set in the third space, and the cluster to which the asset to be identified belongs is determined according to the fitting coefficient between the fitting curve of the third transmission data and each cluster's fitting curve, so that the cluster type to which the asset belongs can be identified automatically and intelligently and assets can be identified rapidly and accurately.

Fig. 19 is a schematic diagram of an asset identification system according to yet another embodiment of the present invention.

Referring to fig. 19, a system for identifying assets according to still another embodiment of the present invention includes a traffic packet capturing module, a system attribution judging module, an asset attribution judging module, and a cluster attribution judging module.

The traffic packet capturing module is used for capturing, based on a known server IP, the traffic packets passing through the network card of the server within a specified time period with the tcpdump command, parsing from the traffic packets the other IPs associated with that IP, and counting the number of traffic packets between the IPs. The parsed IPs are compared with the asset database to find unknown assets, and a relationship matrix between the unknown assets and the known assets is derived statistically, as shown in Table 1.
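A minimal sketch of the comparison and counting performed by this module is shown below. It assumes the captured traffic has already been reduced to (source IP, destination IP) pairs and that the asset database can be represented as a mapping from IP to system name; the function `relationship_matrix`, the data, and the output layout are hypothetical and do not reproduce Table 1.

```python
from collections import defaultdict


def relationship_matrix(packets, asset_db):
    """Count packets between each unknown IP and each known system.

    packets: iterable of (src_ip, dst_ip) pairs (assumed pre-parsed capture).
    asset_db: dict mapping known IP -> system name (stand-in for the asset database).
    Returns {unknown_ip: {system_name: packet_count}}.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for src, dst in packets:
        src_known, dst_known = src in asset_db, dst in asset_db
        if src_known == dst_known:
            continue  # both known or both unknown: no unknown-to-known relation
        unknown, known = (dst, src) if src_known else (src, dst)
        matrix[unknown][asset_db[known]] += 1
    return {ip: dict(row) for ip, row in matrix.items()}


# Hypothetical asset database and capture.
asset_db = {"10.0.0.1": "system 1", "10.0.0.2": "system 1", "10.0.1.1": "system 2"}
packets = [
    ("10.0.0.1", "10.0.0.50"),
    ("10.0.0.50", "10.0.0.2"),
    ("10.0.0.50", "10.0.1.1"),
    ("10.0.0.1", "10.0.0.2"),
]
matrix = relationship_matrix(packets, asset_db)
```

Any IP that appears as a key of `matrix` is absent from the asset database and is therefore a candidate unknown asset.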

The system attribution judging module is used for training through a machine learning algorithm of logistic regression based on asset historical data of a known system to obtain a system attribution model of the known asset, and can automatically judge whether a new unknown asset belongs to the system through the model.

In particular, the system attribution determination module may run the steps implementing the methods of fig. 1 and 2.

The logistic regression curve as shown in fig. 4 can be trained by machine training of the known system assets through the logistic regression algorithm.

For an unknown asset node found by comparing the captured traffic packets with the asset database, the number of its associations with other nodes and the number of traffic packets are counted, and the system attribution model is applied to judge automatically whether the unknown node belongs to the system, as in the automatic system attribution judgment shown in fig. 5.
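The training and judgment performed by the system attribution model can be sketched as follows. This is an illustrative logistic regression under stated assumptions: batch gradient descent with a fixed learning rate, a decision threshold of 0.5, features already scaled to a small range, and hypothetical training data. None of these choices is specified by the embodiment.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.05, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for k, xk in enumerate(x):
                grad_w[k] += err * xk
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def belongs_to_system(x, w, b):
    """True if the point falls on the in-system side of the fitted curve."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


# Hypothetical training data: (associated asset count, packet count in thousands).
samples = [(8.0, 9.0), (9.0, 8.0), (7.0, 9.0), (1.0, 2.0), (2.0, 1.0), (1.0, 1.0)]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(samples, labels)
```

A new node is then judged by calling `belongs_to_system` with its own association count and packet count, scaled in the same way as the training data.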

The asset attribution judging module is used for, after the system attribution of the unknown asset has been determined, performing machine training on the known asset information of the system with the K-Means algorithm to obtain a model, and judging which asset type the unknown asset belongs to.

In particular, the asset attribution judging module may run the steps implementing the methods of fig. 6 and 7.

The machine training is carried out on the known system assets through a k-means algorithm, and the method is specifically realized as follows:

6 clustering centroid points are taken, representing 6 classes of server assets. The types of application users on the known servers and the number of traffic packets between the known server assets and other nodes are counted; the class attribution of each known asset is confirmed through calculation; a distribution graph of the known servers is drawn; and the centroid of each class of assets is recalculated. This process is repeated until all known server assets are classified in the graph, which trains the server asset class attribution model shown in fig. 12.

For a newly added unknown server asset, the types of application users on it and the number of traffic packets exchanged with other nodes are counted and plotted on the distribution graph in the same manner. The class of server asset to which the unknown asset belongs is determined automatically from its distance to the different centroid points, as in the server asset class attribution model judgment shown in fig. 13.

The cluster attribution judging module is used for training the real-time traffic packet capture information of the known system clusters with a multivariate fitting machine learning algorithm to obtain a model, so that the cluster of a newly added unknown node is judged automatically.

In particular, the cluster attribution determination module may run the steps implementing the methods of fig. 14 and 15.

The number of traffic packets exchanged between servers outside the cluster and each server inside the cluster is counted and averaged, and the fitting curve f(x) of the known cluster is calculated with the multivariate fitting algorithm; in this way the cluster multivariate fitting curve shown in fig. 16 can be trained.

In the same way, a fitted curve for all clusters in the system can be obtained, as shown in fig. 17.

For a newly added unknown server asset, the IPs and number of its associated servers and the number of traffic packets are counted, a fitting curve f_w(x) is calculated and compared with the fitting curve f₁(x) of the known cluster, and the fitting coefficient is calculated. The larger the fitting coefficient with a given curve, the closer the asset's curve is to that curve, and the asset is attributed to the corresponding cluster, as shown in fig. 18.

The asset identification system provided by the embodiment of the invention automatically captures asset information, compares it with the asset database to find unknown asset information, classifies and identifies the unknown assets by system attribution, asset attribution, cluster attribution and the like through machine learning algorithms, and updates the unknown assets into the system topology graph. The system may be specifically used for implementing the above method embodiments, which are not described again here.

The asset identification system provided by the embodiment has at least the following technical effects:

the system to which an unknown asset belongs, its asset type within the system, and its cluster within the asset type are judged automatically through machine learning algorithms, so that manual confirmation is reduced and the accuracy of data updating is improved.

Referring to fig. 20, the embodiment of the present invention provides a computer device, which includes a memory 201, a processor 202, a bus 203, and a computer program stored in the memory 201 and runnable on the processor 202. The processor 202 and the memory 201 communicate with each other through the bus 203.

Optionally, the computer device may further include a communication Interface 204, where the communication Interface 204 is used for information transmission between the device and other communication devices.

The processor 202 is used to call the program instructions in the memory 201 to implement the method of figs. 1-2 when executing the program, and also implements the following method:

the first space has a predetermined logistic regression curve, the logistic regression curve corresponds to a preset system type, and is determined according to first sample transmission data of the system, and specifically, the logistic regression curve is determined by: acquiring first sample transmission data of a known asset in a preset time period, wherein the first sample transmission data comprises the asset quantity and the data packet quantity transmitted by the known asset; and training the first sample transmission data by adopting a logistic regression algorithm, and determining the logistic regression curve corresponding to a preset system type.

In another embodiment, the processor, when executing the program, implements the method of figs. 6-7, and further implements the following method:

the second space has at least one predetermined centroid, the centroid corresponds to a preset asset type, and is determined according to second sample transmission data of the asset, specifically: obtaining second sample transmission data of the known assets in a preset time period, wherein the second sample transmission data comprise the number of data packets transmitted with the known assets in the system and the application user types of the known assets; and training the second sample transmission data by adopting a k-means clustering algorithm, and determining the centroid corresponding to the preset asset type.

In another embodiment, the processor, when executing the program, implements the method of FIGS. 14-15 and further implements the following method:

the third space has a predetermined multivariate fitting curve, which corresponds to a preset cluster type and is determined according to third sample transmission data of the cluster. Specifically: acquiring third sample transmission data of known assets in a preset time period, wherein the third sample transmission data comprises the number of assets outside the cluster with which the known assets perform data transmission and the number of data packets transmitted; and training on the third sample transmission data with a multivariate fitting algorithm to determine the multivariate fitting curve corresponding to the preset cluster type.
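One possible reading of the multivariate fitting step is sketched below. The source does not specify the fitting model or how the fitting coefficient is computed, so this sketch assumes an ordinary least-squares linear model with two predictors and uses the coefficient of determination (R squared) of the asset's observations against each cluster's curve as a stand-in for the fitting coefficient. The predictors, sample values, and function names are hypothetical.

```python
def fit_linear(rows, targets):
    """Least-squares [b0, b1, ...] for y = b0 + b1*x1 + ... (normal equations)."""
    X = [[1.0] + list(r) for r in rows]
    m = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(m)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def r_squared(coef, rows, targets):
    """Coefficient of determination; assumes the targets are not all equal."""
    mean = sum(targets) / len(targets)
    ss_tot = sum((y - mean) ** 2 for y in targets)
    ss_res = sum((y - predict(coef, r)) ** 2 for r, y in zip(rows, targets))
    return 1.0 - ss_res / ss_tot


def identify_cluster(models, rows, targets):
    """Cluster type whose predetermined curve fits the observations best."""
    return max(models, key=lambda name: r_squared(models[name], rows, targets))


# Hypothetical training data: two predictors per observation, packet count as y.
TRAIN_ROWS = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5)]
CLUSTER_MODELS = {
    "cluster_a": fit_linear(TRAIN_ROWS, [3, 5, 8, 11, 15]),
    "cluster_b": fit_linear(TRAIN_ROWS, [3.5, 4, 7.5, 11, 17.5]),
}

# Observations of the asset to be identified (hypothetical).
best = identify_cluster(CLUSTER_MODELS, [(2, 2), (3, 1), (6, 4)], [6, 7, 16])
```

Here the observations follow the first cluster's curve, so `best` is `"cluster_a"`. A different model family or a different similarity measure between curves would change only `fit_linear` and `r_squared`, not the selection of the best-fitting cluster type.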

The computer device provided in this embodiment may be configured to execute the program corresponding to the method in the foregoing method embodiments; the details are not repeated here.

The computer device provided by the embodiment at least has the following technical effects:

when the processor executes the program, the position of the first transmission data in the first space is obtained according to the first transmission data of the assets to be identified, and the system type of the assets to be identified is determined according to the corresponding relation between the system type and the position of the first sample transmission data in the first space, so that the system type to which the assets belong can be identified automatically and intelligently, and the working efficiency is improved.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention, and not to limit the same; although embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An asset identification method, comprising:

acquiring first transmission data of assets to be identified in a preset time period, wherein the first transmission data comprise the asset quantity and the data packet quantity which are subjected to data transmission with the assets to be identified;

mapping the first transmission data to a first space, and determining the position of the first transmission data in the first space;

and determining the system type of the to-be-identified asset corresponding to the first transmission data according to the position of the first transmission data in the first space and the corresponding relation between the predetermined system type and the position of the first sample transmission data in the first space.

2. The method of claim 1, wherein: the first space is provided with a predetermined logistic regression curve, the logistic regression curve corresponds to a preset system type and is determined according to first sample transmission data of the system;

correspondingly, the determining, according to the position of the first transmission data in the first space and the corresponding relationship between the predetermined system type and the position of the first sample transmission data in the first space, the system type of the asset to be identified corresponding to the first transmission data is specifically:

and determining whether the assets to be identified corresponding to the first transmission data belong to the system or not according to the relative position of the first transmission data in the first space and the logistic regression curve.

3. The method of claim 2, wherein: the first space has a predetermined logistic regression curve, the logistic regression curve corresponds to a preset system type and is determined according to first sample transmission data of the system, and the method specifically comprises the following steps:

acquiring first sample transmission data of a known asset in a preset time period, wherein the first sample transmission data comprises the asset quantity and the data packet quantity transmitted by the known asset;

and training the first sample transmission data by adopting a logistic regression algorithm, and determining the logistic regression curve corresponding to a preset system type.

4. The method of any one of claims 1 to 3, wherein: after determining the system type of the asset to be identified corresponding to the first transmission data, the method further includes:

acquiring second transmission data of assets to be identified in a preset time period, wherein the second transmission data comprise the number of data packets transmitted between the assets to be identified and the assets of the system and the application user types of the assets in the system;

mapping the second transmission data to a second space, and determining the position of the second transmission data in the second space;

and determining the asset type of the asset to be identified corresponding to the second transmission data according to the position of the second transmission data in the second space and the corresponding relation between the predetermined asset type and the position of the second sample transmission data in the second space.

5. The method of claim 4, wherein: the second space has at least one predetermined centroid corresponding to a preset asset type and determined according to second sample transmission data of the asset;

correspondingly, according to the position of the second transmission data in the second space and the corresponding relationship between the predetermined asset type and the position of the second sample transmission data in the second space, determining the asset type of the asset to be identified corresponding to the second transmission data, specifically:

and taking the asset type corresponding to the centroid that is closest, by distance in the second space, to the second transmission data as the asset type of the asset to be identified.

6. The method of claim 5, wherein: the second space has at least one predetermined centroid, which corresponds to a preset asset type and is determined according to second sample transmission data of the asset, specifically:

acquiring second sample transmission data of the known assets in a preset time period, wherein the second sample transmission data comprises the number of data packets transmitted with the known assets in the system and the application user types of the known assets;

and training the second sample transmission data by adopting a k-means clustering algorithm, and determining the centroid corresponding to the preset asset type.

7. The method of claim 4, wherein: after determining the asset type of the asset to be identified corresponding to the second transmission data, the method further includes:

acquiring third transmission data of assets to be identified in a preset time period, wherein the third transmission data comprise the number of data packets transmitted between the assets to be identified and the assets of the same asset type and the identification of the assets in the system;

mapping the third transmission data to a third space, and determining the position of the third transmission data in the third space;

and determining the cluster type of the to-be-identified asset corresponding to the third transmission data according to the position of the third transmission data in the third space and the corresponding relation between the predetermined cluster type and the position of the third sample transmission data in the third space.

8. The method of claim 7, wherein: the third space is provided with a predetermined multivariate fitting curve, the multivariate fitting curve corresponds to a preset cluster type and is determined according to third sample transmission data of the cluster;

correspondingly, mapping the third transmission data to a third space and determining the position of the third transmission data in the third space specifically comprises: obtaining a multivariate fitting curve of the third transmission data in the third space;

determining a fitting coefficient of a multivariate fitting curve of the third transmission data and a predetermined multivariate fitting curve;

and taking the cluster type corresponding to the multivariate fitting curve with the maximum fitting coefficient as the cluster type of the assets to be identified.

9. The method of claim 8, wherein: the third space has a predetermined multivariate fit curve, the multivariate fit curve corresponds to a preset cluster type, and is determined according to third sample transmission data of the cluster, specifically:

acquiring third sample transmission data of the known assets in a preset time period, wherein the third sample transmission data comprises the quantity of the assets transmitted by the known assets and the assets outside the cluster, and the quantity of transmitted data packets;

and training the third sample transmission data by adopting a multivariate fitting algorithm, and determining the multivariate fitting curve corresponding to a preset cluster type.

10. A computer device comprising a memory, a processor, a bus and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-9 when executing the program.