CN114139629A - Self-guided mixed data representation learning method and system based on metric learning - Google Patents
Self-guided mixed data representation learning method and system based on metric learning Download PDFInfo
- Publication number
- CN114139629A CN114139629A CN202111463166.3A CN202111463166A CN114139629A CN 114139629 A CN114139629 A CN 114139629A CN 202111463166 A CN202111463166 A CN 202111463166A CN 114139629 A CN114139629 A CN 114139629A
- Authority
- CN
- China
- Prior art keywords
- leader
- learning
- data
- distance
- representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 47
- 238000012512 characterization method Methods 0.000 claims abstract description 67
- 238000012549 training Methods 0.000 claims abstract description 32
- 230000008878 coupling Effects 0.000 claims abstract description 19
- 238000010168 coupling process Methods 0.000 claims abstract description 19
- 238000005859 coupling reaction Methods 0.000 claims abstract description 19
- 238000013528 artificial neural network Methods 0.000 claims abstract description 14
- 230000006870 function Effects 0.000 claims description 67
- 239000013598 vector Substances 0.000 claims description 29
- 125000004432 carbon atom Chemical group C* 0.000 claims description 18
- 230000006399 behavior Effects 0.000 claims description 17
- 238000004364 calculation method Methods 0.000 claims description 11
- 239000011159 matrix material Substances 0.000 claims description 11
- 238000004590 computer program Methods 0.000 claims description 10
- 238000000605 extraction Methods 0.000 claims description 8
- 229910052799 carbon Inorganic materials 0.000 claims description 7
- 238000003860 storage Methods 0.000 claims description 7
- 238000005259 measurement Methods 0.000 claims description 3
- 230000007246 mechanism Effects 0.000 abstract description 4
- 230000000875 corresponding effect Effects 0.000 description 25
- 238000010586 diagram Methods 0.000 description 9
- 230000008569 process Effects 0.000 description 6
- 238000001514 detection method Methods 0.000 description 5
- 230000003993 interaction Effects 0.000 description 4
- 238000012545 processing Methods 0.000 description 4
- 238000013459 approach Methods 0.000 description 3
- 238000005457 optimization Methods 0.000 description 3
- 230000000295 complement effect Effects 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 241000039077 Copula Species 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000006243 chemical reaction Methods 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 238000003064 k means clustering Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 239000002184 metal Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a metric learning-based self-guided mixed data characterization learning method and a system, the method comprises the step of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine, and each round of alternate training comprises the following steps: and calculating the representation corresponding to the triple in the P bootstrap, calculating the guide information according to the obtained representation, inputting the guide information into the C bootstrap to guide the training of the C bootstrap, updating the parameters in the C bootstrap, calculating the representation corresponding to the triple in the C bootstrap, transmitting the calculated guide information into the P bootstrap to guide the training of the P bootstrap according to the representation in the C bootstrap, and updating the parameters in the P bootstrap. The invention can not only reflect the coupling relation between the characteristics from the characteristic level and learn the mixed data representation containing the coupling relation of the discrete characteristics and the continuous characteristics, but also effectively reflect the difference between the data objects and realize the distinction between the data objects through a mutual learning mechanism.
Description
Technical Field
The invention belongs to the field of computer data science, and particularly relates to a metric learning-based self-guided mixed data representation learning method and system.
Background
The existing network intrusion detection method adopts a mode of directly splicing continuous features and converted discrete features aiming at mixed data, and adopts a simple one-hot coding mode of converting the discrete features, namely setting each appeared feature value as 1 and setting other values as 0. The coding mode (1) ignores the heterogeneous correlation relationship between the discrete characteristic and the continuous characteristic; (2) the correlation between discrete features is ignored. Hybrid data, which refers to attribute data containing discrete features and continuous features, is a common type of data. The learning of the characterization for the mixed data is very important for the subsequent machine learning task, and the characterization of the mixed data is also challenging due to the heterogeneity between features. In order to solve the characterization problem of mixed data, a neural network and metric learning are introduced into a characterization model, so that the characterization which is more suitable for a subsequent clustering algorithm is learned. Although mixed data is widespread in the real world, little research has been done on the characterization of mixed data. At the feature level, a good characterization should capture heterogeneous coupling relationships (e.g., complex interactions and dependencies) between discrete features and continuous features. At the data object level, a good representation should be able to distinguish objects well, thereby facilitating the subsequent learning tasks (e.g., clustering, classification, etc.). However, most of the existing characterization methods only focus on the relationship of the feature level, and ignore the distinctiveness of the object level.
Most existing methods of hybrid data characterization ignore or partially ignore the heterogeneous relationships between discrete and continuous features. For example, k-prototype quantifies the relationship between mixed data objects by computing Euclidean distances between consecutive features and Hamming distances between discrete features. This approach treats the individual features as independent of each other, while ignoring the correlation between features. Other methods convert continuous features into discrete features by discretizing the continuous features and then using discrete feature processing methods to compute the correlation between features. For example, the mADD uses an isometric discretization method (i.e., a continuous value in a certain interval is replaced by a discrete value) to convert a continuous feature into a discrete feature, then models the relationship between the continuous and discrete features, and introduces a weight parameter to control the importance of each feature. Both SpectraLCAT and CopledMC use k-means clustering to convert continuous features into discrete features, using the cluster labels as new discrete features. Because these methods process mixed data based on continuous feature discretization, they cannot directly capture the distribution of continuous features, resulting in information loss. Some model-based approaches attempt to capture heterogeneous coupling relationships by transforming the data space. For example, EGMCM transforms mixture features into an ordering space, learning dependencies between attributes through a Gaussian mixture associative structure (Copula). This approach not only results in loss of information but also fails to capture distinctiveness between attribute objects. In recent years, the rise of neural networks has largely supported characterization learning. For example, an automatic coding machine (auto encoder) is a typical fully-connected neural network model, and the fully-connected network structure can capture the correlation between features to a certain extent, so that the correlation between continuous features is captured by using the automatic coding machine. However, these methods focus on the representation of the feature level only, and do not enhance the distinctiveness between objects.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: aiming at the problems in the prior art, the invention provides a self-guided mixed data representation learning method and system based on metric learning, which not only can reflect the coupling relation between features from a feature level and learn the mixed data representation containing the coupling relation between discrete features and continuous features, but also can effectively reflect the difference between data objects and can realize the distinction between the data objects through a mutual learning mechanism.
In order to solve the technical problems, the invention adopts the technical scheme that:
a self-guided mixed data characterization learning method based on metric learning comprises the following steps of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine:
1) generating two coding spaces for a P guiding machine and a C guiding machine, and initializing network parameters;
2) performing multiple rounds of alternating training for the P and C leaders, and each round of alternating training comprises: constructing a triple serving as input data, calculating the representation corresponding to the triple in the P leader, calculating the guide information of the C leader according to the obtained representation, inputting the guide information of the C leader into the C leader to guide the training of the C leader, updating the parameters in the C leader, further calculating the representation corresponding to the triple in the C leader, calculating the guide information of the P leader according to the representation in the C leader, transmitting the guide information of the P leader into the P leader to guide the training of the P leader, and updating the parameters in the P leader.
Optionally, the step 1) of generating two coding spaces for the P and C booters means generating a naive coding space F of the P booterpCoupled code space F of C guide machinecSaid plain code space FpThe system is used for converting discrete features in the data object into one-hot representation and splicing the one-hot representation with continuous features in the data object to obtain a naive encoding vector of the data object; the coupled coding space FcThe method is used for generating a correlation matrix from correlation among all discrete features and continuous features in the data object and then generating the correlation matrixAnd taking the data vector obtained by the flattening operation as a coupling encoding vector of the data object.
Optionally, the calculation function expression of the correlation between the discrete features and the continuous features is as follows:
in the above formula, the first and second carbon atoms are,representing discrete featuresContinuous characteristic vjThe correlation relationship between the two components is shown,representing continuous featuresAnd discrete features vjτ is a threshold parameter and λ is a scaling factor, wherein the joint densityThe formula of the calculation function is:
in the above formula, N is the number of data objects,as discrete characteristic valuesAnd vjThe kernel function of (a) to (b),is a kernel function of a continuous feature,represents the variable AiContinuous feature value f on the k-th data objecti,Represents the variable AiContinuous feature value f on the x-th data objecti,hiA bandwidth parameter representing a continuous characteristic; wherein the kernel functionThe definition function expression of (1) is:
in the above formula, the first and second carbon atoms are,representing discrete features vjAnd λ is a proportionality coefficient in the corresponding characteristic value on the kth data object.
Optionally, the three layers of neural networks of the P-bootstrap include:
an input layer for inputting triplets<x,xi,xj>;
A characterization layer for computing triplets<x,xi,xj>The corresponding representation of each element in the group to obtain a representation group
Automatic metric learning layer for computing token groupsThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenAnd C guidance information delta of the leaderc;
The three-layer neural network of the C guiding machine respectively comprises:
an input layer for inputting triplets<x,xi,xj>;
A characterization layer for computing triplets<x,xi,xj>The corresponding representation of each element in the group to obtain a representation group
Automatic metric learning layer for computing token groupsThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenAnd guidance information δ of the P leaderp。
Optionally, the characterization layer of the P-bootstrap calculates the triplets<x,xi,xj>The function expression of the corresponding representation of each element in the table is as follows:
hp=σ(fpW1),
in the above formula, hpFor the characterization corresponding to this element, σ is the logistic function, fpIs a naive encoding vector, W, of the element1Is the weight of the element;
the characterization layer calculation triplets of the C-boot<x,xi,xj>The function expression of the corresponding representation of each element in the table is as follows:
hc=σ(fcW2),
in the above formula, hcFor the characterization corresponding to this element, σ is the logistic function, fcFor a coupled coded vector of the element, W2Is the weight of the element.
Optionally, the automated metric learning layer of the P-bootstrap calculates a set of tokensThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenThe functional expression of (a) is:
in the above formula, the first and second carbon atoms are,representing the characteristic binary group h,Measure of distance betweenOr characterizing the doublet h,Measure of distance betweenhp,For the calculated token doublet, W3Is a learning parameter;
computing characterization groups of the automatic metric learning layer of the C-bootstrapThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenThe functional expression of (a) is:
in the above formula, the first and second carbon atoms are,representing the characteristic binary group h,Measure of distance betweenOr characterizing the doublet h,Measure of distance betweenhc,For the calculated token doublet, W4Is a learning parameter;
the automatic measurement learning layer of the P guiding machine calculates the guiding information delta of the C guiding machinecAnd calculating guiding information delta of the P guiding machine by the automatic metric learning layer of the C guiding machinepThe functional expression of (a) is:
in the above formula, δh(hi,hj) Guidance information δ representing C guidecOr guidance information δ of the P-bootp,<h,hi,hj>Representing the characterization group calculated by the automatic metric learning layer of the bootstrap, and d is a distance function.
Optionally, when the guidance information of the C-boot apparatus is input into the C-boot apparatus in step 2) to guide the training of the C-boot apparatus, the loss function adopted is:
in the above formula, the first and second carbon atoms are,representing the loss function employed to train the C leader,<x,xi,xj>for the three-tuple of the input,representing distance magnitude relationship in C leaderGuidance information with respect to P-bootThe log probability of (a) of (b),andindicating guidance information delta of P leaderp,To characterize the binary group h,A measure of the distance between the two,to characterize the binary group h,σ is a logistic function.
Optionally, when the guidance information of the P-boot apparatus in step 2) is transmitted to the P-boot apparatus to guide the training of the P-boot apparatus, the loss function adopted is:
in the above formula, the first and second carbon atoms are,representing the loss function employed to train the P leader,<x,xi,xj>for the three-tuple of the input,representing distance magnitude relationship in P-booterGuidance information with respect to a C leaderThe log probability of (a) of (b),andguidance information δ representing C guidec,To characterize the binary group h,A measure of the distance between the two,to characterize the binary group h,Measure of distance betweenσ is a logistic function.
In addition, the invention also provides a feature extraction method for network intrusion data, which comprises the following steps: collecting network behavior data comprising discrete features and continuous features; and inputting the network behavior data containing discrete features and continuous features into a P leader and a C leader which finish training by adopting the metric learning-based self-guided mixed data characterization learning method to obtain the network behavior features corresponding to the network behavior data.
In addition, the invention also provides a metric learning-based self-guided hybrid data characterization learning system, which comprises a microprocessor and a memory which are connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the metric learning-based self-guided hybrid data characterization learning method or the steps of the network intrusion data-specific feature extraction method.
Furthermore, the present invention also provides a computer readable storage medium, in which a computer program programmed or configured to execute the metric learning-based self-guided hybrid data characterization learning method or the feature extraction method for network intrusion data is stored.
Compared with the prior art, the invention has the following advantages: the existing network intrusion detection method adopts a mode of directly splicing continuous features and converted discrete features aiming at mixed data, and adopts a simple one-hot coding mode of converting the discrete features, namely setting each appeared feature value as 1 and setting other values as 0. The coding mode (1) ignores the heterogeneous correlation relationship between the discrete characteristic and the continuous characteristic; (2) the correlation between discrete features is ignored. Aiming at the characteristic that the existing mixed data characterization cannot effectively capture the complex coupling relation between different types of features, the invention provides a self-guided characterization learning mechanism based on a complementary coding mode, which can strengthen the relation between attribute objects and is realized as a new characterization learning method by metric learning, wherein the self-guided characterization learning model consists of two mutually cooperative guiding machines. A bootstrap machine infers the distance relationship of a triple through pairwise similarity based on a naive coding space, and then the distance relationship of the triple is used as guidance information and input into another bootstrap machine based on a coupled coding space for metric learning. Similarly, the triplet distance relationship generated by the bootstrap is also used as guiding information to input into the original bootstrap for metric learning. In the process of the interaction and automatic learning, the two guiding machines continuously improve mutual consensus degree, so that a stable state is achieved. Finally, the self-guiding opportunity learns a representation which can effectively distinguish the data objects, and on one hand, the self-guiding opportunity learns a mixed data representation containing discrete features and continuous feature coupling relations; on the other hand, the distinction between the data objects can be realized through a mutual learning mechanism.
Drawings
Fig. 1 is a schematic flow chart of alternate training performed in the embodiment of the present invention.
FIG. 2 shows a naive encoding space F in an embodiment of the inventionpCoupled code space F of C guide machinecSchematic view of the structure of (1).
Fig. 3 is a diagram of a neural network architecture formed by a P-bootstrap and a C-bootstrap in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the method for learning the metric learning-based self-guided hybrid data representation in this embodiment includes the steps of performing alternate training on two mutually coupled three-layer neural networks, namely, a P-leader and a C-leader:
1) generating two coding spaces for a P guiding machine and a C guiding machine, and initializing network parameters;
2) performing multiple rounds of alternating training for the P and C leaders, and each round of alternating training comprises: constructing a triple serving as input data, calculating the representation corresponding to the triple in the P leader, calculating the guide information of the C leader according to the obtained representation, inputting the guide information of the C leader into the C leader to guide the training of the C leader, updating the parameters in the C leader, further calculating the representation corresponding to the triple in the C leader, calculating the guide information of the P leader according to the representation in the C leader, transmitting the guide information of the P leader into the P leader to guide the training of the P leader, and updating the parameters in the P leader.
The data embedded learning is mainly responsible for integrating the continuous features and the discrete features learned by CDRL to form a complete representation of data, and the discrete features and the continuous features can be simultaneously mapped to the same continuous space in such a way, and the coupling relation between the features can be learned. The naive coding space converts the discrete data into 0-1 one-hot representation, and then the continuous features are spliced to obtain the naive code of each data object. The coupling code describes the coupling relation between the discrete characteristic and the continuous characteristic, firstly, a correlation relation matrix of all discrete characteristic values and continuous data is generated, namely, each data object generates a data matrix, and then, a data vector is obtained by flattening operation, namely, splicing each row of the matrix, namely, the coupling code of one data object. As shown in fig. 2, the generation of two coding spaces for the P and C directors in step 1) of this embodiment means that a naive coding space F of the P bootstrap is generatedpCoupled code space F of C guide machinecPlain code space FpThe system is used for converting discrete features in the data object into one-hot representation and splicing the one-hot representation with continuous features in the data object to obtain a naive encoding vector of the data object; coupled coding space FcAnd the method is used for generating a correlation relation matrix according to the correlation relation between all discrete features and continuous features in the data object, and then using a data vector obtained by flattening the correlation relation matrix as a coupling encoding vector of the data object. Plain coding space F by P-bootstrap in this embodimentpCoupled code space F of C guide machinecThe construction of two complementary coding spaces allows each data object to be coded from the original table of information into two vectors, called na iotave-coded vectors and coupled-coded vectors. The two coding spaces describe the same object from different angles based on different assumptions. In the naive coding space, each feature is treated as equivalent and independent of each otherThe variable of (2). The na iotave encoding vector distinguishes between discrete and continuous features. However, in the coupled coding space, the continuous features are considered to be highly correlated with the discrete features, and the coupled coding vector can be constructed by estimating the joint probability density of any one continuous feature and any one discrete feature.
Plain coding space FpThe method is characterized by comprising the converted discrete features and the converted continuous features, and the most complete information in the original information table is contained. We use one-hot encoding to convert discrete variables into binary features. Each binary feature has a unique 1 corresponding to a discrete variable value, and the others are all 0. After splicing the converted discrete features to the original continuous features, a new naive code is formed, and the final naive code vector can be expressed as:wherein d isnIs the dimension of the continuous characteristic, | | is the number of discrete characteristic values.
Coupled coding space FcThe data object is formed by splicing coupling coding matrixes of all data objects. In a coupled coding matrix, the rows represent continuous features and the columns represent discrete feature values. Each position in the matrix is estimated from the joint probability density of feature doublets of mixed types, which quantifies the interaction between discrete and continuous features. For each feature doublet, i.e. one discrete feature and one continuous featureWe consider it as two variable sets, i.e.<Ai,Vj>And then estimates its density using the product kernel. So from the variable AiContinuous characteristic value ofAnd from variable ViIs measured by the discrete characteristic value vjHas a combined density of
Therefore, in this embodiment, the calculation function expression of the correlation between the discrete features and the continuous features is:
in the above formula, the first and second carbon atoms are,representing discrete featuresContinuous characteristic vjThe correlation relationship between the two components is shown,representing continuous featuresAnd discrete features vjτ is a threshold parameter and λ is a scaling factor, wherein the joint densityThe formula of the calculation function is:
in the above formula, N is the number of data objects,as discrete characteristic valuesAnd vjThe kernel function of (a) to (b),is a kernel function of a continuous feature,represents the variable AiContinuous feature value f on the k-th data objecti,Represents the variable AiContinuous feature value f on the x-th data objecti,hiA bandwidth parameter representing a continuous characteristic; wherein the kernel functionThe definition function expression of (1) is:
in the above formula, the first and second carbon atoms are,representing discrete features vjAnd λ is a proportionality coefficient in the corresponding characteristic value on the kth data object.For the kernel function of continuous feature, a gaussian kernel is used as the kernel function of continuous variable in this embodiment.
After we have derived the density estimates of any two discrete variable and continuous variable doublets, we define the coupling encoding matrix M for the data object xxThe following were used:
wherein the content of the first and second substances,representing a correlation between discrete features, continuous features, i.e.Is derived from the density estimate of the mixed dyad, reflecting the interaction between the continuous and discrete variables.
In this embodiment, a neural network formed by the P-leader and the C-leader is referred to as MAI (metal-based auto indicator), and the MAI is composed of the P-leader and the C-leader in two different coding spaces. The P and C booters are coupled to each other. The first layer of the model is a set of two eigenvectors of a triplet, the two eigenvectors are encoded from the mixed data, and the coding spaces in the two directors are called a naive coding space and a coupled coding space, respectively. The second layer is a characterization layer that updates the characterization of this layer by a distance metric optimization function from the previous layer. The third layer is an automatic metric learning layer that can enhance the distinguishing information between data objects through the triplet distance relationship and provide guidance information for another leader. In the training process, firstly generating two coding spaces, initializing parameters in two guiding machines, and constructing input data in a small batch parameter updating mode, wherein each group of input data comprises a plurality of triples; and then, calculating the representation corresponding to the triplet in the P leader, calculating according to the obtained representation to obtain guidance information, inputting the guidance information into the C leader for training, updating the parameters in the C leader to obtain the representation corresponding to the triplet in the C leader, calculating according to the representation in the C leader to obtain new guidance information, transmitting the guidance information into the P leader to guide the training of the P leader, and finally obtaining a stable parameter result according to the alternate training mode. And splicing the finally obtained data representations in the two guiding machines to obtain the representation of the final mixed data object. As shown in fig. 3, the three layers of neural networks of the P-bootstrap in this embodiment respectively include:
an input layer for inputting triplets<x,xi,xj>;
A characterization layer for computing triplets<x,xi,xj>The corresponding representation of each element in the group to obtain a representation group
Automatic metric learning layer for computing token groupsThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenAnd C guidance information delta of the leaderc;
The three-layer neural network of the C guiding machine respectively comprises:
an input layer for inputting triplets<x,xi,xj>;
A characterization layer for computing triplets<x,xi,xj>The corresponding representation of each element in the group to obtain a representation group
Automatic metric learning layer for computing token groupsThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenAnd guidance information δ of the P leaderp。
Triple of calculation of characterization layer of P-bootstrap in this embodiment<x,xi,xj>The function expression of the corresponding representation of each element in the table is as follows:
hp=σ(fpW1),
in the above formula, hpFor the characterization corresponding to this element, σ is the logistic function, fpIs a naive encoding vector, W, of the element1Is the weight of the element; the features obtained from the na iotave coding space are independent and not related, where the logistic function defines the functional expression as: σ ═ 1/(1+ e)-z) Where z is an independent variable, in order to capture the coupling relationship between features, the present embodiment uses the characterization layer of the P-bootstrap to encode the vector fpConversion into a token vector h of length K over a fully-connected networkp。
Triple calculation by the characterization layer of the C-director in this embodiment<x,xi,xj>The function expression of the corresponding representation of each element in the table is as follows:
hc=σ(fcW2),
in the above formula, hcFor the characterization corresponding to this element, σ is the logistic function, fcFor a coupled coded vector of the element, W2Is the weight of the element. In this embodiment, the C-director is used to couple the coded vector fcMapped as a token vector h of length J over another fully-connected networkc. Through two fully-connected networks, the feature vectors of two coding spaces are respectively mapped into a new characterization vector hpAnd hc. At this stage, the token vector captures only the coupling at the feature level, similar to an automatic coding machine. To enhance data objects in representation hpAnd hcThe above distinctiveness, the present embodiment introduces an automatic metric learning layer, and then none is used through this layerThe bounded objective function achieves optimization of the characterization.
In this embodiment, the automatic metric learning layer of the P-boot machine calculates the characterization groupThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenThe functional expression of (a) is:
in the above formula, the first and second carbon atoms are,representing the characteristic binary group h,Measure of distance betweenOr characterizing the doublet h,Measure of distance betweenhp,For the calculated token doublet, W3Is a learning parameter;
automatic metric learning layer calculation characterization group of C guiding machineThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenThe functional expression of (a) is:
in the above formula, the first and second carbon atoms are,representing the characteristic binary group h,Measure of distance betweenOr characterizing the doublet h,Measure of distance betweenhc,For the calculated token doublet, W4Is a learning parameter;
calculating guiding information delta of the C guiding machine by the automatic measurement learning layer of the P guiding machinecAnd calculating guiding information delta of the P guiding machine by the automatic metric learning layer of the C guiding machinepThe functional expression of (a) is:
in the above formula, δh(hi,hj) Guidance information δ representing C guidecOr guidance information δ of the P-bootp,<h,hi,hj>Representing the characterization group calculated by the automatic metric learning layer of the bootstrap, and d is a distance function.
In this embodiment, for the P-boot engine of the naive encoding space, data objects x and xiDeriving distance measures of the two by their characterizationSimilarly, for a C-boot machine coupled to the code space, data objects x and xiDeriving distance measures of the two by their characterizationGiven a reference data object x and two comparison data objects xiAnd xjWe can easily calculate the distance measure between them according to the above definition. In the conventional metric learning method, we need to provide a metric sequential relationship of data object doublets, which is generally given by a label. But in unsupervised learning we do not have distance labels. To solve this problem, we design a bootstrap process to obtain the distance magnitude relationship of two pairs of duplets, so as to provide guidance information for the bootstrap. We define a binary function deltahTo represent the distance magnitude relationship of two tuples formed by the representation of a triplet deltah(hi,hj):
In the above formula, δh(hi,hj) Guidance information δ representing C guidecOr guidance information δ of the P-bootp,<h,hi,hj>And d is a distance function, such as Euclidean distance, cosine distance and the like. Here we use the following distance function: d (h, h)i)=||h-hi||2。
Given a triplet<x,xi,xj>C guide machine calculates a distance size relationAnd guiding the metric learning process of the P bootstrap. Therefore, the distance magnitude relationship in the P-bootWith respect to the guidance informationThe conditional probabilities in the two directors with respect to the triplets can thus be combined in a multi-objective loss function. In this embodiment, when the guidance information of the C leader is input into the C leader in step 2) to guide the training of the C leader, the loss function adopted is:
in the above formula, the first and second carbon atoms are,representing the loss function employed to train the C leader,<x,xi,xj>for the three-tuple of the input,representing distance magnitude relationship in C leaderGuidance information with respect to P-bootThe log probability of (a) of (b),andindicating guidance information delta of P leaderp,To characterize the binary group h,A measure of the distance between the two,to characterize the binary group h,σ is a logistic function.
In this embodiment, when the guidance information of the P-boot apparatus in step 2) is transmitted to the P-boot apparatus to guide the training of the P-boot apparatus, the loss function adopted is:
in the above formula, the first and second carbon atoms are,representing the loss function employed to train the P leader,<x,xi,xj>for the three-tuple of the input,representing distance magnitude relationship in P-booterGuidance information with respect to a C leaderThe log probability of (a) of (b),andguidance information δ representing C guidec,To characterize the binary group h,A measure of the distance between the two,to characterize the binary group h,Measure of distance betweenσ is a logistic function. This form of the loss function is common to all of the above-mentioned hungse a variant of the loss function, i.e. a probabilistic version of infinite boundary (infinite margin) loss function is used. Therefore, in the embodiment, a metric learning objective function based on an infinite boundary is constructed in the layer, and the boundary of the data object of the characterization layer can be enhanced by optimizing the objective function. Based on the objective optimization function, parameter estimation can be obtained in a gradient descent mode.
The existing network intrusion detection method adopts a mode of directly splicing continuous features and converted discrete features aiming at mixed data, and adopts a simple one-hot coding mode of converting the discrete features, namely setting each appeared feature value as 1 and setting other values as 0. The coding mode (1) ignores the heterogeneous correlation relationship between the discrete characteristic and the continuous characteristic; (2) the correlation between discrete features is ignored. In view of the problems of the existing methods, the present embodiment also provides a method for extracting features from network intrusion data, including: collecting network behavior data comprising discrete features and continuous features; the method comprises the steps of inputting network behavior data containing discrete features and continuous features into a P leader and a C leader which finish training by adopting the metric learning-based self-guided mixed data characterization learning method to obtain network behavior features corresponding to the network behavior data, learning network intrusion behavior characterization with higher distinctiveness by the metric learning-based self-guided mixed data characterization learning method, and inputting the network behavior features corresponding to the network behavior data into a preset network intrusion detection model after the network behavior features corresponding to the network behavior data are obtained, so that a more accurate network intrusion detection result can be obtained.
In addition, the present embodiment also provides a metric learning-based self-guided hybrid data characterization learning system, which includes a microprocessor and a memory connected to each other, where the microprocessor is programmed or configured to execute the steps of the metric learning-based self-guided hybrid data characterization learning method or the steps of the feature extraction method for network intrusion data.
Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program programmed or configured to execute the foregoing metric learning-based self-guided hybrid data characterization learning method or the foregoing feature extraction method for network intrusion data is stored.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.
Claims (10)
1. A self-guided hybrid data characterization learning method based on metric learning is characterized by comprising the following steps of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine:
1) generating two coding spaces for a P guiding machine and a C guiding machine, and initializing network parameters;
2) performing multiple rounds of alternating training for the P and C leaders, and each round of alternating training comprises: constructing a triple serving as input data, calculating the representation corresponding to the triple in the P leader, calculating the guide information of the C leader according to the obtained representation, inputting the guide information of the C leader into the C leader to guide the training of the C leader, updating the parameters in the C leader, further calculating the representation corresponding to the triple in the C leader, calculating the guide information of the P leader according to the representation in the C leader, transmitting the guide information of the P leader into the P leader to guide the training of the P leader, and updating the parameters in the P leader.
2. The metric learning-based self-guided hybrid data characterization learning method according to claim 1, wherein the generation of two coding spaces for the P and C guides in step 1) is to generate a naive coding space F of the P guidepCoupled code space F of C guide machinecSaid plain code space FpThe system is used for converting discrete features in the data object into one-hot representation and splicing the one-hot representation with continuous features in the data object to obtain a naive encoding vector of the data object; the coupled code spaceM FcAnd the method is used for generating a correlation relation matrix according to the correlation relation between all discrete features and continuous features in the data object, and then using a data vector obtained by flattening the correlation relation matrix as a coupling encoding vector of the data object.
3. The metric learning-based self-guided hybrid data characterization learning method according to claim 2, wherein the calculation function expression of the correlation between the discrete features and the continuous features is as follows:
in the above formula, the first and second carbon atoms are,representing discrete featuresContinuous characteristic vjThe correlation relationship between the two components is shown,representing continuous featuresAnd discrete features vjτ is a threshold parameter and λ is a scaling factor, wherein the joint densityThe formula of the calculation function is:
in the above formula, N is the number of data objects,as discrete characteristic valuesAnd vjThe kernel function of (a) to (b),is a kernel function of a continuous feature,represents the variable AiContinuous feature value f on the k-th data objecti,Represents the variable AiContinuous feature value f on the x-th data objecti,hiA bandwidth parameter representing a continuous characteristic; wherein the kernel functionThe definition function expression of (1) is:
4. The metric learning-based self-guided hybrid data characterization learning method according to claim 1, 2 or 3, wherein the three-layer neural network of the P-guided machine respectively comprises:
an input layer for inputting triplets<x,xi,xj>;
A characterization layer for computing triplets<x,xi,xj>The corresponding representation of each element in the group to obtain a representation group
Automatic metric learning layer for computing token groupsThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenAnd C guidance information delta of the leaderc;
The three-layer neural network of the C guiding machine respectively comprises:
an input layer for inputting triplets<x,xi,xj>;
A characterization layer for computing triplets<x,xi,xj>The corresponding representation of each element in the group to obtain a representation group
5. The metric learning-based self-guided hybrid data representation learning method of claim 4, wherein the representation layer of the P-guide computes triplets<x,xi,xj>The function expression of the corresponding representation of each element in the table is as follows:
hp=σ(fpW1),
in the above formula, hpFor the characterization corresponding to this element, σ is the logistic function, fpIs a naive encoding vector, W, of the element1Is the weight of the element;
the characterization layer calculation triplets of the C-boot<x,xi,xj>The function expression of the corresponding representation of each element in the table is as follows:
hc=σ(fcW2),
in the above formula, hcFor the characterization corresponding to this element, σ is the logistic function, fcFor a coupled coded vector of the element, W2Is the weight of the element.
6. The metric learning-based self-guided hybrid data characterization learning method according to claim 4, wherein the automatic metric learning layer of the P-guide machine calculates the characterization groupThe characteristic binary group h,Measure of distance betweenAnd characterizing the dyads h,Measure of distance betweenThe functional expression of (a) is:
in the above formula, the first and second carbon atoms are,representing the characteristic binary group h,Measure of distance betweenOr characterizing the doublet h,Measure of distance betweenhp,For the calculated token doublet, W3Is a learning parameter;
computing characterization groups of the automatic metric learning layer of the C-bootstrapThe characteristic binary group h,Measure of distance betweenPositive characteristic of binary group h,Measure of distance betweenThe functional expression of (a) is:
in the above formula, the first and second carbon atoms are,representing the characteristic binary group h,Measure of distance betweenOr characterizing the doublet h,Measure of distance betweenhc,For the calculated token doublet, W4Is a learning parameter;
the automatic measurement learning layer of the P guiding machine calculates the guiding information delta of the C guiding machinecAnd calculating guiding information delta of the P guiding machine by the automatic metric learning layer of the C guiding machinepThe functional expression of (a) is:
in the above formula, δh(hi,hj) Guidance information δ representing C guidecOr guidance information δ of the P-bootp,<h,hi,hj>Representing the characterization group calculated by the automatic metric learning layer of the bootstrap, and d is a distance function.
7. The metric learning-based self-guided hybrid data characterization learning method according to claim 1, wherein in the step 2), when the guidance information of the C-guided machine is input into the C-guided machine to guide the training of the C-guided machine, the loss function is:
in the above formula, the first and second carbon atoms are,representing the loss function employed to train the C leader,<x,xi,xj>for the three-tuple of the input,representing distance magnitude relationship in C leaderGuidance information with respect to P-bootThe log probability of (a) of (b),andguide information 6P indicating the P leader,to characterize the binary group h,A measure of the distance between the two,to characterize the binary group h,σ is a logistic function; in the step 2), when the guiding information of the P leader is transmitted into the P leader to guide the training of the P leader, the adopted loss function is as follows:
in the above formula, the first and second carbon atoms are,representing the loss function employed to train the P leader,<x,xi,xj>for the three-tuple of the input,representing distance magnitude relationship in P-booterGuidance information with respect to a C leaderThe log probability of (a) of (b),andguidance information δ representing C guidec,To characterize the binary group h,A measure of the distance between the two,to characterize the binary group h,Measure of distance betweenσ is a logistic function.
8. A feature extraction method for network intrusion data is characterized by comprising the following steps: collecting network behavior data comprising discrete features and continuous features; inputting network behavior data containing discrete features and continuous features into a P bootstrap and a C bootstrap which are trained by adopting the metric learning-based self-guided mixed data characterization learning method of any one of claims 1-7, and obtaining network behavior features corresponding to the network behavior data.
9. A metric learning-based self-guided hybrid data representation learning system, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the steps of the metric learning-based self-guided hybrid data representation learning method according to any one of claims 1 to 7 or the steps of the feature extraction method for network intrusion data according to claim 8.
10. A computer-readable storage medium, wherein a computer program programmed or configured to perform the metric learning-based self-guided hybrid data characterization learning method according to any one of claims 1 to 7 or the feature extraction method for network intrusion data according to claim 8 is stored in the computer-readable storage medium.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111463166.3A CN114139629A (en) | 2021-12-02 | 2021-12-02 | Self-guided mixed data representation learning method and system based on metric learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111463166.3A CN114139629A (en) | 2021-12-02 | 2021-12-02 | Self-guided mixed data representation learning method and system based on metric learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114139629A true CN114139629A (en) | 2022-03-04 |
Family
ID=80387370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111463166.3A Pending CN114139629A (en) | 2021-12-02 | 2021-12-02 | Self-guided mixed data representation learning method and system based on metric learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114139629A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110070183A (en) * | 2019-03-11 | 2019-07-30 | 中国科学院信息工程研究所 | A kind of the neural network model training method and device of weak labeled data |
CN111919223A (en) * | 2018-03-26 | 2020-11-10 | 平衡媒体技术有限责任公司 | Abstract interface for machine learning algorithm gameplay |
CN113158577A (en) * | 2021-04-30 | 2021-07-23 | 中国人民解放军国防科技大学 | Discrete data characterization learning method and system based on hierarchical coupling relation |
CN113179276A (en) * | 2021-04-30 | 2021-07-27 | 中国人民解放军国防科技大学 | Intelligent intrusion detection method and system based on explicit and implicit feature learning |
US20210334664A1 (en) * | 2020-04-24 | 2021-10-28 | Adobe Inc. | Domain Adaptation for Machine Learning Models |
-
2021
- 2021-12-02 CN CN202111463166.3A patent/CN114139629A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111919223A (en) * | 2018-03-26 | 2020-11-10 | 平衡媒体技术有限责任公司 | Abstract interface for machine learning algorithm gameplay |
CN110070183A (en) * | 2019-03-11 | 2019-07-30 | 中国科学院信息工程研究所 | A kind of the neural network model training method and device of weak labeled data |
US20210334664A1 (en) * | 2020-04-24 | 2021-10-28 | Adobe Inc. | Domain Adaptation for Machine Learning Models |
CN113158577A (en) * | 2021-04-30 | 2021-07-23 | 中国人民解放军国防科技大学 | Discrete data characterization learning method and system based on hierarchical coupling relation |
CN113179276A (en) * | 2021-04-30 | 2021-07-27 | 中国人民解放军国防科技大学 | Intelligent intrusion detection method and system based on explicit and implicit feature learning |
Non-Patent Citations (1)
Title |
---|
蹇松雷: "基于复杂异构数据的表征学习研究", 工学博士学位论文, pages 1 - 60 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111581405B (en) | Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning | |
CN109783682B (en) | Point-to-point similarity-based depth non-relaxed Hash image retrieval method | |
CN107871014A (en) | A kind of big data cross-module state search method and system based on depth integration Hash | |
CN109960737B (en) | Remote sensing image content retrieval method for semi-supervised depth confrontation self-coding Hash learning | |
CN109753571B (en) | Scene map low-dimensional space embedding method based on secondary theme space projection | |
CN111274398A (en) | Method and system for analyzing comment emotion of aspect-level user product | |
CN112000772B (en) | Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer | |
CN114239560B (en) | Three-dimensional image classification method, apparatus, device, and computer-readable storage medium | |
CN115018021A (en) | Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism | |
CN114817568B (en) | Knowledge hypergraph link prediction method combining attention mechanism and convolutional neural network | |
CN115909002A (en) | Image translation method based on contrast learning | |
CN112214570A (en) | Cross-modal retrieval method and device based on counterprojection learning hash | |
CN114896434A (en) | Hash code generation method and device based on center similarity learning | |
CN117152504A (en) | Space correlation guided prototype distillation small sample classification method | |
CN116739100A (en) | Vulnerability detection method of quantum neural network and automatic driving vulnerability detection method | |
CN114139629A (en) | Self-guided mixed data representation learning method and system based on metric learning | |
CN115599984A (en) | Retrieval method | |
CN115422945A (en) | Rumor detection method and system integrating emotion mining | |
Wang et al. | Interpolation normalization for contrast domain generalization | |
CN113297385B (en) | Multi-label text classification system and method based on improved GraphRNN | |
CN112862007B (en) | Commodity sequence recommendation method and system based on user interest editing | |
Jang et al. | Observational learning algorithm for an ensemble of neural networks | |
CN115861196A (en) | Active learning method for multi-modal medical images | |
CN115408536A (en) | Knowledge graph complementing method based on context information fusion | |
CN113158577A (en) | Discrete data characterization learning method and system based on hierarchical coupling relation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |