CN114139629A

CN114139629A - Self-guided mixed data representation learning method and system based on metric learning

Info

Publication number: CN114139629A
Application number: CN202111463166.3A
Authority: CN
Inventors: 蹇松雷; 黄辰林; 谭郁松; 李宝; 董攀; 丁滟; 任怡; 王晓川; 张建锋; 谭霜
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2021-12-02
Filing date: 2021-12-02
Publication date: 2022-03-04

Abstract

The invention discloses a metric learning-based self-guided mixed data characterization learning method and a system, the method comprises the step of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine, and each round of alternate training comprises the following steps: and calculating the representation corresponding to the triple in the P bootstrap, calculating the guide information according to the obtained representation, inputting the guide information into the C bootstrap to guide the training of the C bootstrap, updating the parameters in the C bootstrap, calculating the representation corresponding to the triple in the C bootstrap, transmitting the calculated guide information into the P bootstrap to guide the training of the P bootstrap according to the representation in the C bootstrap, and updating the parameters in the P bootstrap. The invention can not only reflect the coupling relation between the characteristics from the characteristic level and learn the mixed data representation containing the coupling relation of the discrete characteristics and the continuous characteristics, but also effectively reflect the difference between the data objects and realize the distinction between the data objects through a mutual learning mechanism.

Description

Self-guided mixed data representation learning method and system based on metric learning

Technical Field

The invention belongs to the field of computer data science, and particularly relates to a metric learning-based self-guided mixed data representation learning method and system.

Background

The existing network intrusion detection method adopts a mode of directly splicing continuous features and converted discrete features aiming at mixed data, and adopts a simple one-hot coding mode of converting the discrete features, namely setting each appeared feature value as 1 and setting other values as 0. The coding mode (1) ignores the heterogeneous correlation relationship between the discrete characteristic and the continuous characteristic; (2) the correlation between discrete features is ignored. Hybrid data, which refers to attribute data containing discrete features and continuous features, is a common type of data. The learning of the characterization for the mixed data is very important for the subsequent machine learning task, and the characterization of the mixed data is also challenging due to the heterogeneity between features. In order to solve the characterization problem of mixed data, a neural network and metric learning are introduced into a characterization model, so that the characterization which is more suitable for a subsequent clustering algorithm is learned. Although mixed data is widespread in the real world, little research has been done on the characterization of mixed data. At the feature level, a good characterization should capture heterogeneous coupling relationships (e.g., complex interactions and dependencies) between discrete features and continuous features. At the data object level, a good representation should be able to distinguish objects well, thereby facilitating the subsequent learning tasks (e.g., clustering, classification, etc.). However, most of the existing characterization methods only focus on the relationship of the feature level, and ignore the distinctiveness of the object level.

Most existing methods of hybrid data characterization ignore or partially ignore the heterogeneous relationships between discrete and continuous features. For example, k-prototype quantifies the relationship between mixed data objects by computing Euclidean distances between consecutive features and Hamming distances between discrete features. This approach treats the individual features as independent of each other, while ignoring the correlation between features. Other methods convert continuous features into discrete features by discretizing the continuous features and then using discrete feature processing methods to compute the correlation between features. For example, the mADD uses an isometric discretization method (i.e., a continuous value in a certain interval is replaced by a discrete value) to convert a continuous feature into a discrete feature, then models the relationship between the continuous and discrete features, and introduces a weight parameter to control the importance of each feature. Both SpectraLCAT and CopledMC use k-means clustering to convert continuous features into discrete features, using the cluster labels as new discrete features. Because these methods process mixed data based on continuous feature discretization, they cannot directly capture the distribution of continuous features, resulting in information loss. Some model-based approaches attempt to capture heterogeneous coupling relationships by transforming the data space. For example, EGMCM transforms mixture features into an ordering space, learning dependencies between attributes through a Gaussian mixture associative structure (Copula). This approach not only results in loss of information but also fails to capture distinctiveness between attribute objects. In recent years, the rise of neural networks has largely supported characterization learning. For example, an automatic coding machine (auto encoder) is a typical fully-connected neural network model, and the fully-connected network structure can capture the correlation between features to a certain extent, so that the correlation between continuous features is captured by using the automatic coding machine. However, these methods focus on the representation of the feature level only, and do not enhance the distinctiveness between objects.

Disclosure of Invention

The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a self-guided mixed data representation learning method and system based on metric learning, which not only can reflect the coupling relation between features from a feature level and learn the mixed data representation containing the coupling relation between discrete features and continuous features, but also can effectively reflect the difference between data objects and can realize the distinction between the data objects through a mutual learning mechanism.

In order to solve the technical problems, the invention adopts the technical scheme that:

a self-guided mixed data characterization learning method based on metric learning comprises the following steps of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine:

1) generating two coding spaces for a P guiding machine and a C guiding machine, and initializing network parameters;

2) performing multiple rounds of alternating training for the P and C leaders, and each round of alternating training comprises: constructing a triple serving as input data, calculating the representation corresponding to the triple in the P leader, calculating the guide information of the C leader according to the obtained representation, inputting the guide information of the C leader into the C leader to guide the training of the C leader, updating the parameters in the C leader, further calculating the representation corresponding to the triple in the C leader, calculating the guide information of the P leader according to the representation in the C leader, transmitting the guide information of the P leader into the P leader to guide the training of the P leader, and updating the parameters in the P leader.

Optionally, the step 1) of generating two coding spaces for the P and C booters means generating a naive coding space F of the P booter^pCoupled code space F of C guide machine^cSaid plain code space F^pThe system is used for converting discrete features in the data object into one-hot representation and splicing the one-hot representation with continuous features in the data object to obtain a naive encoding vector of the data object; the coupled coding space F^cThe method is used for generating a correlation matrix from correlation among all discrete features and continuous features in the data object and then generating the correlation matrixAnd taking the data vector obtained by the flattening operation as a coupling encoding vector of the data object.

Optionally, the calculation function expression of the correlation between the discrete features and the continuous features is as follows:

in the above formula, the first and second carbon atoms are,

representing discrete features

Continuous characteristic v_jThe correlation relationship between the two components is shown,

representing continuous features

And discrete features v_jτ is a threshold parameter and λ is a scaling factor, wherein the joint density

The formula of the calculation function is:

in the above formula, N is the number of data objects,

as discrete characteristic values

And v_jThe kernel function of (a) to (b),

is a kernel function of a continuous feature,

represents the variable A_iContinuous feature value f on the k-th data object_i，

Represents the variable A_iContinuous feature value f on the x-th data object_i，h_iA bandwidth parameter representing a continuous characteristic; wherein the kernel function

The definition function expression of (1) is:

in the above formula, the first and second carbon atoms are,

representing discrete features v_jAnd λ is a proportionality coefficient in the corresponding characteristic value on the kth data object.

Optionally, the three layers of neural networks of the P-bootstrap include:

an input layer for inputting triplets<x，x_i，x_j>；

A characterization layer for computing triplets<x，x_i，x_j>The corresponding representation of each element in the group to obtain a representation group

Automatic metric learning layer for computing token groups

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

And C guidance information delta of the leader^c；

The three-layer neural network of the C guiding machine respectively comprises:

an input layer for inputting triplets<x，x_i，x_j>；

Automatic metric learning layer for computing token groups

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

And guidance information δ of the P leader^p。

Optionally, the characterization layer of the P-bootstrap calculates the triplets<x，x_i，x_j>The function expression of the corresponding representation of each element in the table is as follows:

h^p＝σ(f^pW₁)，

in the above formula, h^pFor the characterization corresponding to this element, σ is the logistic function, f^pIs a naive encoding vector, W, of the element₁Is the weight of the element;

the characterization layer calculation triplets of the C-boot<x，x_i，x_j>The function expression of the corresponding representation of each element in the table is as follows:

h^c＝σ(f^cW₂)，

in the above formula, h^cFor the characterization corresponding to this element, σ is the logistic function, f^cFor a coupled coded vector of the element, W₂Is the weight of the element.

Optionally, the automated metric learning layer of the P-bootstrap calculates a set of tokens

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

The functional expression of (a) is:

in the above formula, the first and second carbon atoms are,

representing the characteristic binary group h,

Measure of distance between

Or characterizing the doublet h,

Measure of distance between

h^p，

For the calculated token doublet, W₃Is a learning parameter;

computing characterization groups of the automatic metric learning layer of the C-bootstrap

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

The functional expression of (a) is:

in the above formula, the first and second carbon atoms are,

representing the characteristic binary group h,

Measure of distance between

Or characterizing the doublet h,

Measure of distance between

h^c，

For the calculated token doublet, W₄Is a learning parameter;

the automatic measurement learning layer of the P guiding machine calculates the guiding information delta of the C guiding machine^cAnd calculating guiding information delta of the P guiding machine by the automatic metric learning layer of the C guiding machine^pThe functional expression of (a) is:

in the above formula, δ_h(h_i，h_j) Guidance information δ representing C guide^cOr guidance information δ of the P-boot^p，<h，h_i，h_j>Representing the characterization group calculated by the automatic metric learning layer of the bootstrap, and d is a distance function.

Optionally, when the guidance information of the C-boot apparatus is input into the C-boot apparatus in step 2) to guide the training of the C-boot apparatus, the loss function adopted is:

in the above formula, the first and second carbon atoms are,

representing the loss function employed to train the C leader,<x，x_i，x_j>for the three-tuple of the input,

representing distance magnitude relationship in C leader

Guidance information with respect to P-boot

The log probability of (a) of (b),

and

indicating guidance information delta of P leader^p，

To characterize the binary group h,

A measure of the distance between the two,

to characterize the binary group h,

σ is a logistic function.

Optionally, when the guidance information of the P-boot apparatus in step 2) is transmitted to the P-boot apparatus to guide the training of the P-boot apparatus, the loss function adopted is:

in the above formula, the first and second carbon atoms are,

representing the loss function employed to train the P leader,<x，x_i，x_j>for the three-tuple of the input,

representing distance magnitude relationship in P-booter

Guidance information with respect to a C leader

The log probability of (a) of (b),

and

guidance information δ representing C guide^c，

To characterize the binary group h,

A measure of the distance between the two,

to characterize the binary group h,

Measure of distance between

σ is a logistic function.

In addition, the invention also provides a feature extraction method for network intrusion data, which comprises the following steps: collecting network behavior data comprising discrete features and continuous features; and inputting the network behavior data containing discrete features and continuous features into a P leader and a C leader which finish training by adopting the metric learning-based self-guided mixed data characterization learning method to obtain the network behavior features corresponding to the network behavior data.

In addition, the invention also provides a metric learning-based self-guided hybrid data characterization learning system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the metric learning-based self-guided hybrid data characterization learning method or the steps of the network intrusion data-specific feature extraction method.

Furthermore, the present invention also provides a computer readable storage medium, in which a computer program programmed or configured to execute the metric learning-based self-guided hybrid data characterization learning method or the feature extraction method for network intrusion data is stored.

Compared with the prior art, the invention has the following advantages: the existing network intrusion detection method adopts a mode of directly splicing continuous features and converted discrete features aiming at mixed data, and adopts a simple one-hot coding mode of converting the discrete features, namely setting each appeared feature value as 1 and setting other values as 0. The coding mode (1) ignores the heterogeneous correlation relationship between the discrete characteristic and the continuous characteristic; (2) the correlation between discrete features is ignored. Aiming at the characteristic that the existing mixed data characterization cannot effectively capture the complex coupling relation between different types of features, the invention provides a self-guided characterization learning mechanism based on a complementary coding mode, which can strengthen the relation between attribute objects and is realized as a new characterization learning method by metric learning, wherein the self-guided characterization learning model consists of two mutually cooperative guiding machines. A bootstrap machine infers the distance relationship of a triple through pairwise similarity based on a naive coding space, and then the distance relationship of the triple is used as guidance information and input into another bootstrap machine based on a coupled coding space for metric learning. Similarly, the triplet distance relationship generated by the bootstrap is also used as guiding information to input into the original bootstrap for metric learning. In the process of the interaction and automatic learning, the two guiding machines continuously improve mutual consensus degree, so that a stable state is achieved. Finally, the self-guiding opportunity learns a representation which can effectively distinguish the data objects, and on one hand, the self-guiding opportunity learns a mixed data representation containing discrete features and continuous feature coupling relations; on the other hand, the distinction between the data objects can be realized through a mutual learning mechanism.

Drawings

Fig. 1 is a schematic flow chart of alternate training performed in the embodiment of the present invention.

FIG. 2 shows a naive encoding space F in an embodiment of the invention^pCoupled code space F of C guide machine^cSchematic view of the structure of (1).

Fig. 3 is a diagram of a neural network architecture formed by a P-bootstrap and a C-bootstrap in an embodiment of the present invention.

Detailed Description

The invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.

As shown in fig. 1, the method for learning the metric learning-based self-guided hybrid data representation in this embodiment includes the steps of performing alternate training on two mutually coupled three-layer neural networks, namely, a P-leader and a C-leader:

The data embedded learning is mainly responsible for integrating the continuous features and the discrete features learned by CDRL to form a complete representation of data, and the discrete features and the continuous features can be simultaneously mapped to the same continuous space in such a way, and the coupling relation between the features can be learned. The naive coding space converts the discrete data into 0-1 one-hot representation, and then the continuous features are spliced to obtain the naive code of each data object. The coupling code describes the coupling relation between the discrete characteristic and the continuous characteristic, firstly, a correlation relation matrix of all discrete characteristic values and continuous data is generated, namely, each data object generates a data matrix, and then, a data vector is obtained by flattening operation, namely, splicing each row of the matrix, namely, the coupling code of one data object. As shown in fig. 2, the generation of two coding spaces for the P and C directors in step 1) of this embodiment means that a naive coding space F of the P bootstrap is generated^pCoupled code space F of C guide machine^cPlain code space F^pThe system is used for converting discrete features in the data object into one-hot representation and splicing the one-hot representation with continuous features in the data object to obtain a naive encoding vector of the data object; coupled coding space F^cAnd the method is used for generating a correlation relation matrix according to the correlation relation between all discrete features and continuous features in the data object, and then using a data vector obtained by flattening the correlation relation matrix as a coupling encoding vector of the data object. Plain coding space F by P-bootstrap in this embodiment^pCoupled code space F of C guide machine^cThe construction of two complementary coding spaces allows each data object to be coded from the original table of information into two vectors, called na iotave-coded vectors and coupled-coded vectors. The two coding spaces describe the same object from different angles based on different assumptions. In the naive coding space, each feature is treated as equivalent and independent of each otherThe variable of (2). The na iotave encoding vector distinguishes between discrete and continuous features. However, in the coupled coding space, the continuous features are considered to be highly correlated with the discrete features, and the coupled coding vector can be constructed by estimating the joint probability density of any one continuous feature and any one discrete feature.

Plain coding space F^pThe method is characterized by comprising the converted discrete features and the converted continuous features, and the most complete information in the original information table is contained. We use one-hot encoding to convert discrete variables into binary features. Each binary feature has a unique 1 corresponding to a discrete variable value, and the others are all 0. After splicing the converted discrete features to the original continuous features, a new naive code is formed, and the final naive code vector can be expressed as:

wherein d is_nIs the dimension of the continuous characteristic, | | is the number of discrete characteristic values.

Coupled coding space F^cThe data object is formed by splicing coupling coding matrixes of all data objects. In a coupled coding matrix, the rows represent continuous features and the columns represent discrete feature values. Each position in the matrix is estimated from the joint probability density of feature doublets of mixed types, which quantifies the interaction between discrete and continuous features. For each feature doublet, i.e. one discrete feature and one continuous feature

We consider it as two variable sets, i.e.<A_i，V_j>And then estimates its density using the product kernel. So from the variable A_iContinuous characteristic value of

And from variable V_iIs measured by the discrete characteristic value v_jHas a combined density of

Therefore, in this embodiment, the calculation function expression of the correlation between the discrete features and the continuous features is:

in the above formula, the first and second carbon atoms are,

representing discrete features

representing continuous features

The formula of the calculation function is:

in the above formula, N is the number of data objects,

as discrete characteristic values

And v_jThe kernel function of (a) to (b),

is a kernel function of a continuous feature,

The definition function expression of (1) is:

in the above formula, the first and second carbon atoms are,

For the kernel function of continuous feature, a gaussian kernel is used as the kernel function of continuous variable in this embodiment.

After we have derived the density estimates of any two discrete variable and continuous variable doublets, we define the coupling encoding matrix M for the data object x_xThe following were used:

wherein the content of the first and second substances,

representing a correlation between discrete features, continuous features, i.e.

Is derived from the density estimate of the mixed dyad, reflecting the interaction between the continuous and discrete variables.

In this embodiment, a neural network formed by the P-leader and the C-leader is referred to as MAI (metal-based auto indicator), and the MAI is composed of the P-leader and the C-leader in two different coding spaces. The P and C booters are coupled to each other. The first layer of the model is a set of two eigenvectors of a triplet, the two eigenvectors are encoded from the mixed data, and the coding spaces in the two directors are called a naive coding space and a coupled coding space, respectively. The second layer is a characterization layer that updates the characterization of this layer by a distance metric optimization function from the previous layer. The third layer is an automatic metric learning layer that can enhance the distinguishing information between data objects through the triplet distance relationship and provide guidance information for another leader. In the training process, firstly generating two coding spaces, initializing parameters in two guiding machines, and constructing input data in a small batch parameter updating mode, wherein each group of input data comprises a plurality of triples; and then, calculating the representation corresponding to the triplet in the P leader, calculating according to the obtained representation to obtain guidance information, inputting the guidance information into the C leader for training, updating the parameters in the C leader to obtain the representation corresponding to the triplet in the C leader, calculating according to the representation in the C leader to obtain new guidance information, transmitting the guidance information into the P leader to guide the training of the P leader, and finally obtaining a stable parameter result according to the alternate training mode. And splicing the finally obtained data representations in the two guiding machines to obtain the representation of the final mixed data object. As shown in fig. 3, the three layers of neural networks of the P-bootstrap in this embodiment respectively include:

an input layer for inputting triplets<x，x_i，x_j>；

Automatic metric learning layer for computing token groups

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

And C guidance information delta of the leader^c；

The three-layer neural network of the C guiding machine respectively comprises:

an input layer for inputting triplets<x，x_i，x_j>；

Automatic metric learning layer for computing token groups

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

And guidance information δ of the P leader^p。

Triple of calculation of characterization layer of P-bootstrap in this embodiment<x,x_i,x_j>The function expression of the corresponding representation of each element in the table is as follows:

h^p＝σ(f^pW₁)，

in the above formula, h^pFor the characterization corresponding to this element, σ is the logistic function, f^pIs a naive encoding vector, W, of the element₁Is the weight of the element; the features obtained from the na iotave coding space are independent and not related, where the logistic function defines the functional expression as: σ ═ 1/(1+ e)^-z) Where z is an independent variable, in order to capture the coupling relationship between features, the present embodiment uses the characterization layer of the P-bootstrap to encode the vector f^pConversion into a token vector h of length K over a fully-connected network^p。

Triple calculation by the characterization layer of the C-director in this embodiment<x,x_i,x_j>The function expression of the corresponding representation of each element in the table is as follows:

h^c＝σ(f^cW₂)，

in the above formula, h^cFor the characterization corresponding to this element, σ is the logistic function, f^cFor a coupled coded vector of the element, W₂Is the weight of the element. In this embodiment, the C-director is used to couple the coded vector f^cMapped as a token vector h of length J over another fully-connected network^c. Through two fully-connected networks, the feature vectors of two coding spaces are respectively mapped into a new characterization vector h^pAnd h^c. At this stage, the token vector captures only the coupling at the feature level, similar to an automatic coding machine. To enhance data objects in representation h^pAnd h^cThe above distinctiveness, the present embodiment introduces an automatic metric learning layer, and then none is used through this layerThe bounded objective function achieves optimization of the characterization.

In this embodiment, the automatic metric learning layer of the P-boot machine calculates the characterization group

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

The functional expression of (a) is:

in the above formula, the first and second carbon atoms are,

representing the characteristic binary group h,

Measure of distance between

Or characterizing the doublet h,

Measure of distance between

h^p,

For the calculated token doublet, W₃Is a learning parameter;

automatic metric learning layer calculation characterization group of C guiding machine

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

The functional expression of (a) is:

in the above formula, the first and second carbon atoms are,

representing the characteristic binary group h,

Measure of distance between

Or characterizing the doublet h,

Measure of distance between

h^c,

For the calculated token doublet, W₄Is a learning parameter;

calculating guiding information delta of the C guiding machine by the automatic measurement learning layer of the P guiding machine^cAnd calculating guiding information delta of the P guiding machine by the automatic metric learning layer of the C guiding machine^pThe functional expression of (a) is:

In this embodiment, for the P-boot engine of the naive encoding space, data objects x and x_iDeriving distance measures of the two by their characterization

Similarly, for a C-boot machine coupled to the code space, data objects x and x_iDeriving distance measures of the two by their characterization

Given a reference data object x and two comparison data objects x_iAnd x_jWe can easily calculate the distance measure between them according to the above definition. In the conventional metric learning method, we need to provide a metric sequential relationship of data object doublets, which is generally given by a label. But in unsupervised learning we do not have distance labels. To solve this problem, we design a bootstrap process to obtain the distance magnitude relationship of two pairs of duplets, so as to provide guidance information for the bootstrap. We define a binary function delta_hTo represent the distance magnitude relationship of two tuples formed by the representation of a triplet delta_h(h_i，h_j)：

In the above formula, δ_h(h_i，h_j) Guidance information δ representing C guide^cOr guidance information δ of the P-boot^p，<h，h_i,h_j>And d is a distance function, such as Euclidean distance, cosine distance and the like. Here we use the following distance function: d (h, h)_i)＝||h-h_i||₂。

Given a triplet<x,x_i,x_j>C guide machine calculates a distance size relation

And guiding the metric learning process of the P bootstrap. Therefore, the distance magnitude relationship in the P-boot

With respect to the guidance information

The conditional probabilities in the two directors with respect to the triplets can thus be combined in a multi-objective loss function. In this embodiment, when the guidance information of the C leader is input into the C leader in step 2) to guide the training of the C leader, the loss function adopted is:

in the above formula, the first and second carbon atoms are,

representing the loss function employed to train the C leader,<x,x_i，x_j>for the three-tuple of the input,

representing distance magnitude relationship in C leader

Guidance information with respect to P-boot

The log probability of (a) of (b),

and

indicating guidance information delta of P leader^p，

To characterize the binary group h,

A measure of the distance between the two,

to characterize the binary group h,

σ is a logistic function.

In this embodiment, when the guidance information of the P-boot apparatus in step 2) is transmitted to the P-boot apparatus to guide the training of the P-boot apparatus, the loss function adopted is:

in the above formula, the first and second carbon atoms are,

representing the loss function employed to train the P leader,<x,x_i,x_j>for the three-tuple of the input,

representing distance magnitude relationship in P-booter

Guidance information with respect to a C leader

The log probability of (a) of (b),

and

guidance information δ representing C guide^c，

To characterize the binary group h,

A measure of the distance between the two,

to characterize the binary group h,

Measure of distance between

σ is a logistic function. This form of the loss function is common to all of the above-mentioned hungse a variant of the loss function, i.e. a probabilistic version of infinite boundary (infinite margin) loss function is used. Therefore, in the embodiment, a metric learning objective function based on an infinite boundary is constructed in the layer, and the boundary of the data object of the characterization layer can be enhanced by optimizing the objective function. Based on the objective optimization function, parameter estimation can be obtained in a gradient descent mode.

The existing network intrusion detection method adopts a mode of directly splicing continuous features and converted discrete features aiming at mixed data, and adopts a simple one-hot coding mode of converting the discrete features, namely setting each appeared feature value as 1 and setting other values as 0. The coding mode (1) ignores the heterogeneous correlation relationship between the discrete characteristic and the continuous characteristic; (2) the correlation between discrete features is ignored. In view of the problems of the existing methods, the present embodiment also provides a method for extracting features from network intrusion data, including: collecting network behavior data comprising discrete features and continuous features; the method comprises the steps of inputting network behavior data containing discrete features and continuous features into a P leader and a C leader which finish training by adopting the metric learning-based self-guided mixed data characterization learning method to obtain network behavior features corresponding to the network behavior data, learning network intrusion behavior characterization with higher distinctiveness by the metric learning-based self-guided mixed data characterization learning method, and inputting the network behavior features corresponding to the network behavior data into a preset network intrusion detection model after the network behavior features corresponding to the network behavior data are obtained, so that a more accurate network intrusion detection result can be obtained.

In addition, the present embodiment also provides a metric learning-based self-guided hybrid data characterization learning system, which includes a microprocessor and a memory connected to each other, where the microprocessor is programmed or configured to execute the steps of the metric learning-based self-guided hybrid data characterization learning method or the steps of the feature extraction method for network intrusion data.

Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program programmed or configured to execute the foregoing metric learning-based self-guided hybrid data characterization learning method or the foregoing feature extraction method for network intrusion data is stored.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims

1. A self-guided hybrid data characterization learning method based on metric learning is characterized by comprising the following steps of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine:

2. The metric learning-based self-guided hybrid data characterization learning method according to claim 1, wherein the generation of two coding spaces for the P and C guides in step 1) is to generate a naive coding space F of the P guide^pCoupled code space F of C guide machine^cSaid plain code space F^pThe system is used for converting discrete features in the data object into one-hot representation and splicing the one-hot representation with continuous features in the data object to obtain a naive encoding vector of the data object; the coupled code spaceM F^cAnd the method is used for generating a correlation relation matrix according to the correlation relation between all discrete features and continuous features in the data object, and then using a data vector obtained by flattening the correlation relation matrix as a coupling encoding vector of the data object.

3. The metric learning-based self-guided hybrid data characterization learning method according to claim 2, wherein the calculation function expression of the correlation between the discrete features and the continuous features is as follows:

in the above formula, the first and second carbon atoms are,

representing discrete features

representing continuous features

The formula of the calculation function is:

in the above formula, N is the number of data objects,

as discrete characteristic values

And v_jThe kernel function of (a) to (b),

is a kernel function of a continuous feature,

The definition function expression of (1) is:

in the above formula, the first and second carbon atoms are,

4. The metric learning-based self-guided hybrid data characterization learning method according to claim 1, 2 or 3, wherein the three-layer neural network of the P-guided machine respectively comprises:

an input layer for inputting triplets<x，x_i，x_j>；

Automatic metric learning layer for computing token groups

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

And C guidance information delta of the leader^c；

The three-layer neural network of the C guiding machine respectively comprises:

an input layer for inputting triplets<x，x_i，x_j>；

Automatic metric learning layer for computing token groups

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

And guidance information δ of the P leader^p。

5. The metric learning-based self-guided hybrid data representation learning method of claim 4, wherein the representation layer of the P-guide computes triplets<x，x_i，x_j>The function expression of the corresponding representation of each element in the table is as follows:

h^p＝σ(f^pW₁)，

h^c＝σ(f^cW₂)，

6. The metric learning-based self-guided hybrid data characterization learning method according to claim 4, wherein the automatic metric learning layer of the P-guide machine calculates the characterization group

The characteristic binary group h,

Measure of distance between

And characterizing the dyads h,

Measure of distance between

The functional expression of (a) is:

in the above formula, the first and second carbon atoms are,

representing the characteristic binary group h,

Measure of distance between

Or characterizing the doublet h,

Measure of distance between

h^p，

For the calculated token doublet, W₃Is a learning parameter;

The characteristic binary group h,

Measure of distance between

Positive characteristic of binary group h,

Measure of distance between

The functional expression of (a) is:

in the above formula, the first and second carbon atoms are,

representing the characteristic binary group h,

Measure of distance between

Or characterizing the doublet h,

Measure of distance between

h^c，

For the calculated token doublet, W₄Is a learning parameter;

the automatic measurement learning layer of the P guiding machine calculates the guiding information delta of the C guiding machine_cAnd calculating guiding information delta of the P guiding machine by the automatic metric learning layer of the C guiding machine^pThe functional expression of (a) is:

7. The metric learning-based self-guided hybrid data characterization learning method according to claim 1, wherein in the step 2), when the guidance information of the C-guided machine is input into the C-guided machine to guide the training of the C-guided machine, the loss function is:

in the above formula, the first and second carbon atoms are,

representing distance magnitude relationship in C leader

Guidance information with respect to P-boot

The log probability of (a) of (b),

and

guide information 6P indicating the P leader,

to characterize the binary group h,

A measure of the distance between the two,

to characterize the binary group h,

σ is a logistic function; in the step 2), when the guiding information of the P leader is transmitted into the P leader to guide the training of the P leader, the adopted loss function is as follows:

in the above formula, the first and second carbon atoms are,

representing distance magnitude relationship in P-booter

Guidance information with respect to a C leader

The log probability of (a) of (b),

and

guidance information δ representing C guide^c，

To characterize the binary group h,

A measure of the distance between the two,

to characterize the binary group h,

Measure of distance between

σ is a logistic function.

8. A feature extraction method for network intrusion data is characterized by comprising the following steps: collecting network behavior data comprising discrete features and continuous features; inputting network behavior data containing discrete features and continuous features into a P bootstrap and a C bootstrap which are trained by adopting the metric learning-based self-guided mixed data characterization learning method of any one of claims 1-7, and obtaining network behavior features corresponding to the network behavior data.

9. A metric learning-based self-guided hybrid data representation learning system, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the steps of the metric learning-based self-guided hybrid data representation learning method according to any one of claims 1 to 7 or the steps of the feature extraction method for network intrusion data according to claim 8.

10. A computer-readable storage medium, wherein a computer program programmed or configured to perform the metric learning-based self-guided hybrid data characterization learning method according to any one of claims 1 to 7 or the feature extraction method for network intrusion data according to claim 8 is stored in the computer-readable storage medium.