CN114139629A - Self-guided mixed data representation learning method and system based on metric learning - Google Patents

Self-guided mixed data representation learning method and system based on metric learning

Info

Publication number
CN114139629A
CN114139629A (application CN202111463166.3A)
Authority
CN
China
Prior art keywords
leader
learning
data
distance
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111463166.3A
Other languages
Chinese (zh)
Inventor
蹇松雷
黄辰林
谭郁松
李宝
董攀
丁滟
任怡
王晓川
张建锋
谭霜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202111463166.3A priority Critical patent/CN114139629A/en
Publication of CN114139629A publication Critical patent/CN114139629A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a metric learning-based self-guided mixed data representation learning method and system. The method alternately trains two mutually coupled three-layer neural networks, a P leader and a C leader, and each round of alternate training comprises: computing the representations corresponding to an input triplet in the P leader; computing guidance information from those representations and feeding it into the C leader to guide its training; updating the parameters of the C leader; then computing the representations corresponding to the triplet in the C leader; computing guidance information from them and passing it into the P leader to guide its training; and updating the parameters of the P leader. The invention not only reflects the coupling relations between features at the feature level and learns a mixed-data representation that captures the coupling between discrete and continuous features, but also, through the mutual learning mechanism, effectively reflects the differences between data objects and makes them distinguishable.

Description

Self-guided mixed data representation learning method and system based on metric learning
Technical Field
The invention belongs to the field of computer data science, and particularly relates to a metric learning-based self-guided mixed data representation learning method and system.
Background
Existing network intrusion detection methods handle mixed data by directly concatenating the continuous features with converted discrete features, where the discrete features are converted by simple one-hot encoding, i.e., the position of the observed feature value is set to 1 and all other positions to 0. This encoding (1) ignores the heterogeneous correlation between discrete and continuous features, and (2) ignores the correlations among the discrete features themselves. Mixed data, i.e., attribute data containing both discrete and continuous features, is a common data type. Learning representations of mixed data is very important for subsequent machine learning tasks, and it is also challenging because of the heterogeneity between features. To address the representation problem of mixed data, neural networks and metric learning are introduced into the representation model, so that representations better suited to subsequent clustering algorithms can be learned. Although mixed data is widespread in the real world, little research has been devoted to its representation. At the feature level, a good representation should capture the heterogeneous coupling relationships (e.g., complex interactions and dependencies) between discrete and continuous features. At the data-object level, a good representation should distinguish objects well, thereby facilitating subsequent learning tasks (e.g., clustering and classification). However, most existing representation methods focus only on feature-level relationships and ignore object-level distinctiveness.
Most existing mixed-data representation methods ignore, or partially ignore, the heterogeneous relationships between discrete and continuous features. For example, k-prototypes quantifies the relationship between mixed data objects by computing Euclidean distances on the continuous features and Hamming distances on the discrete features; this treats the individual features as mutually independent and ignores the correlations between them. Other methods discretize the continuous features and then use discrete-feature techniques to compute the correlations between features. For example, mADD uses equal-width discretization (replacing continuous values within an interval by a discrete value) to convert continuous features into discrete ones, then models the relationship between continuous and discrete features and introduces a weight parameter to control the importance of each feature. SpectralCAT and CoupledMC both use k-means clustering to convert continuous features into discrete ones, taking the cluster labels as new discrete features. Because these methods process mixed data via discretization of the continuous features, they cannot directly capture the distribution of the continuous features, which causes information loss. Some model-based approaches attempt to capture heterogeneous coupling relationships by transforming the data space. For example, EGMCM maps the mixed features into an ordinal space and learns dependencies between attributes through a Gaussian mixture copula; this approach not only loses information but also fails to capture the distinctiveness between data objects. In recent years, the rise of neural networks has greatly advanced representation learning. For example, the autoencoder is a typical fully connected neural network model whose fully connected structure can capture correlations between features to some extent, so autoencoders have been used to capture correlations between continuous features. However, these methods focus only on feature-level representation and do not enhance the distinctiveness between objects.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems in the prior art, the invention provides a self-guided mixed data representation learning method and system based on metric learning, which not only reflects the coupling relations between features at the feature level and learns a mixed-data representation containing the coupling relations between discrete and continuous features, but also effectively reflects the differences between data objects and makes them distinguishable through a mutual learning mechanism.
In order to solve the technical problems, the invention adopts the technical scheme that:
a self-guided mixed data characterization learning method based on metric learning comprises the following steps of alternately training two mutually coupled three-layer neural networks of a P guiding machine and a C guiding machine:
1) generating two coding spaces for a P guiding machine and a C guiding machine, and initializing network parameters;
2) performing multiple rounds of alternating training for the P and C leaders, and each round of alternating training comprises: constructing a triple serving as input data, calculating the representation corresponding to the triple in the P leader, calculating the guide information of the C leader according to the obtained representation, inputting the guide information of the C leader into the C leader to guide the training of the C leader, updating the parameters in the C leader, further calculating the representation corresponding to the triple in the C leader, calculating the guide information of the P leader according to the representation in the C leader, transmitting the guide information of the P leader into the P leader to guide the training of the P leader, and updating the parameters in the P leader.
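For illustration only, the following Python sketch outlines one way the alternate training of step 2) could be organized. The class name Leader, its methods, and the update placeholder are hypothetical stand-ins rather than the patent's implementation; each leader sees the same triplets through its own encoding (naive for the P leader, coupled for the C leader).

import numpy as np

rng = np.random.default_rng(0)

class Leader:
    """Minimal stand-in for a three-layer leader network (hypothetical API)."""
    def __init__(self, input_dim, rep_dim):
        self.W = rng.normal(scale=0.1, size=(input_dim, rep_dim))

    def represent(self, x):                        # characterization layer
        return 1.0 / (1.0 + np.exp(-x @ self.W))   # sigma(x W)

    def guidance(self, x, x_i, x_j):               # automatic metric learning layer
        h, h_i, h_j = (self.represent(v) for v in (x, x_i, x_j))
        return 1.0 if np.sum((h - h_i) ** 2) < np.sum((h - h_j) ** 2) else 0.0

    def update(self, triplet, guidance_from_other, lr=0.01):
        pass  # gradient step on the guided metric loss (see the loss sketch later on)

def alternate_training(p_leader, c_leader, triplets_p, triplets_c, rounds=10):
    """Step 2): in each round the P leader guides the C leader, then vice versa."""
    for _ in range(rounds):
        for t_p, t_c in zip(triplets_p, triplets_c):   # same triplets, two encodings
            delta_c = p_leader.guidance(*t_p)          # guidance computed in P
            c_leader.update(t_c, delta_c)              # guides the C leader's training
            delta_p = c_leader.guidance(*t_c)          # guidance computed in C
            p_leader.update(t_p, delta_p)              # guides the P leader's training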
Optionally, generating two coding spaces for the P leader and the C leader in step 1) means generating a naive coding space F_p for the P leader and a coupled coding space F_c for the C leader. The naive coding space F_p is used to convert the discrete features of a data object into a one-hot representation and concatenate it with the continuous features of the data object to obtain the naive encoding vector of the data object. The coupled coding space F_c is used to build a correlation matrix from the correlations between all discrete features and continuous features of the data object, and then to take the data vector obtained by flattening this matrix as the coupled encoding vector of the data object.
Optionally, the correlation between the discrete features and the continuous features is computed as follows (the exact expressions are given as formula images): the correlation between a discrete feature value v_j of the discrete variable V_j and a continuous feature f_i of the continuous variable A_i is derived from their joint density p(f_i, v_j), with τ a threshold parameter and λ a scaling factor. The joint density p(f_i, v_j) is estimated over the N data objects with a product kernel, in which K_d(v_j^(k), v_j) is the kernel over the discrete feature values, K_c is the kernel over the continuous features, f_i^(k) is the value of the continuous variable A_i on the k-th data object, f_i^(x) is its value on the x-th data object, and h_i is the bandwidth parameter of the continuous feature. The discrete kernel K_d is defined in terms of v_j^(k), the value of the discrete feature v_j on the k-th data object, with λ as the proportionality coefficient.
Optionally, the three-layer neural network of the P leader comprises:
an input layer, for inputting triplets ⟨x, x_i, x_j⟩;
a characterization layer, for computing the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ to obtain the representation group ⟨h, h_i, h_j⟩;
an automatic metric learning layer, for computing, from the representation group ⟨h, h_i, h_j⟩, the distance measure d_p(h, h_i) between the representation pair (h, h_i), the distance measure d_p(h, h_j) between the representation pair (h, h_j), and the guidance information δ_c for the C leader.
The three-layer neural network of the C leader comprises:
an input layer, for inputting triplets ⟨x, x_i, x_j⟩;
a characterization layer, for computing the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ to obtain the representation group ⟨h, h_i, h_j⟩;
an automatic metric learning layer, for computing, from the representation group ⟨h, h_i, h_j⟩, the distance measure d_c(h, h_i) between the representation pair (h, h_i), the distance measure d_c(h, h_j) between the representation pair (h, h_j), and the guidance information δ_p for the P leader.
Optionally, the characterization layer of the P leader computes the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ as
h_p = σ(f_p W_1),
where h_p is the representation corresponding to the element, σ is the logistic function, f_p is the naive encoding vector of the element, and W_1 is the weight matrix;
the characterization layer of the C leader computes the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ as
h_c = σ(f_c W_2),
where h_c is the representation corresponding to the element, σ is the logistic function, f_c is the coupled encoding vector of the element, and W_2 is the weight matrix.
Optionally, the automatic metric learning layer of the P leader computes, from the representation group ⟨h, h_i, h_j⟩, the distance measures d_p(h, h_i) and d_p(h, h_j) between the representation pairs (h, h_i) and (h, h_j) according to an expression (given as a formula image) over the computed representation pairs, with W_3 as a learning parameter;
the automatic metric learning layer of the C leader computes, from the representation group ⟨h, h_i, h_j⟩, the distance measures d_c(h, h_i) and d_c(h, h_j) between the representation pairs (h, h_i) and (h, h_j) according to the analogous expression over the computed representation pairs, with W_4 as a learning parameter;
the automatic metric learning layer of the P leader computes the guidance information δ_c for the C leader, and the automatic metric learning layer of the C leader computes the guidance information δ_p for the P leader, as a binary function δ_h(h_i, h_j) (defined by a formula image) that encodes the ordering of the distances d(h, h_i) and d(h, h_j); here δ_h(h_i, h_j) stands for the guidance information δ_c of the C leader or the guidance information δ_p of the P leader, ⟨h, h_i, h_j⟩ is the representation group computed by the automatic metric learning layer of the corresponding leader, and d is a distance function.
Optionally, when the guidance information of the C leader is input into the C leader in step 2) to guide the training of the C leader, the loss function adopted (given as a formula image) is defined from the log probability, over the input triplets ⟨x, x_i, x_j⟩, of the distance ordering in the C leader conditioned on the guidance information of the P leader; in it, L_c denotes the loss used to train the C leader, δ_p denotes the guidance information of the P leader, d_c(h, h_i) and d_c(h, h_j) denote the distance measures between the representation pairs (h, h_i) and (h, h_j) in the C leader, and σ is the logistic function.
Optionally, when the guidance information of the P leader is passed into the P leader in step 2) to guide the training of the P leader, the loss function adopted (given as a formula image) is defined from the log probability, over the input triplets ⟨x, x_i, x_j⟩, of the distance ordering in the P leader conditioned on the guidance information of the C leader; in it, L_p denotes the loss used to train the P leader, δ_c denotes the guidance information of the C leader, d_p(h, h_i) and d_p(h, h_j) denote the distance measures between the representation pairs (h, h_i) and (h, h_j) in the P leader, and σ is the logistic function.
In addition, the invention also provides a feature extraction method for network intrusion data, which comprises: collecting network behavior data containing discrete features and continuous features; and inputting the network behavior data containing discrete features and continuous features into a P leader and a C leader that have been trained with the above metric learning-based self-guided mixed data representation learning method, to obtain the network behavior features corresponding to the network behavior data.
In addition, the invention also provides a metric learning-based self-guided mixed data representation learning system, which comprises a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the metric learning-based self-guided mixed data representation learning method or the steps of the feature extraction method for network intrusion data.
Furthermore, the present invention also provides a computer readable storage medium, in which a computer program programmed or configured to execute the metric learning-based self-guided hybrid data characterization learning method or the feature extraction method for network intrusion data is stored.
Compared with the prior art, the invention has the following advantages. Existing network intrusion detection methods handle mixed data by directly concatenating the continuous features with converted discrete features, where the discrete features are converted by simple one-hot encoding, i.e., the position of the observed feature value is set to 1 and all other positions to 0. This encoding (1) ignores the heterogeneous correlation between discrete and continuous features and (2) ignores the correlations among the discrete features. Since existing mixed-data representations cannot effectively capture the complex coupling relationships between features of different types, the invention provides a self-guided representation learning mechanism based on complementary encodings, which strengthens the relationships between data objects and is realized as a new representation learning method through metric learning; the self-guided representation learning model consists of two cooperating leaders. One leader infers the distance relationship of a triplet from pairwise similarities in the naive coding space, and this distance relationship is then fed as guidance information into the other leader, which performs metric learning in the coupled coding space. Likewise, the triplet distance relationship produced by that leader is fed back as guidance information into the original leader for metric learning. Through this interactive, automatic learning process, the two leaders continuously increase their mutual consensus and reach a stable state. In the end, the self-guided machine learns a representation that effectively distinguishes data objects: on the one hand, it learns a mixed-data representation containing the coupling relations between discrete and continuous features; on the other hand, the mutual learning mechanism makes the data objects distinguishable.
Drawings
Fig. 1 is a schematic flow chart of alternate training performed in the embodiment of the present invention.
FIG. 2 is a schematic structural view of the naive coding space F_p of the P leader and the coupled coding space F_c of the C leader in an embodiment of the invention.
Fig. 3 is a diagram of a neural network architecture formed by a P-bootstrap and a C-bootstrap in an embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the metric learning-based self-guided mixed data representation learning method in this embodiment includes the following steps for alternately training two mutually coupled three-layer neural networks, a P leader and a C leader:
1) generating two coding spaces for the P leader and the C leader, and initializing the network parameters;
2) performing multiple rounds of alternate training for the P leader and the C leader, where each round of alternate training comprises: constructing triplets as input data; computing the representations corresponding to each triplet in the P leader; computing the guidance information for the C leader from the obtained representations; feeding this guidance information into the C leader to guide its training and updating the parameters of the C leader; then computing the representations corresponding to the triplet in the C leader; computing the guidance information for the P leader from the representations in the C leader; passing this guidance information into the P leader to guide its training; and updating the parameters of the P leader.
Data embedding learning is mainly responsible for integrating the continuous features and the discrete features learned by CDRL to form a complete representation of the data; in this way the discrete and continuous features can be mapped simultaneously into the same continuous space, and the coupling relationships between features can be learned. The naive coding space converts the discrete data into a 0-1 one-hot representation and then concatenates the continuous features to obtain the naive code of each data object. The coupled code describes the coupling relationship between the discrete and continuous features: first, a correlation matrix between all discrete feature values and the continuous data is generated, i.e., each data object yields a data matrix; then a data vector is obtained by a flattening operation, i.e., by concatenating the rows of the matrix, and this vector is the coupled code of the data object. As shown in fig. 2, generating two coding spaces for the P leader and the C leader in step 1) of this embodiment means generating the naive coding space F_p of the P leader and the coupled coding space F_c of the C leader. The naive coding space F_p is used to convert the discrete features of a data object into a one-hot representation and concatenate it with the continuous features of the data object to obtain the naive encoding vector of the data object; the coupled coding space F_c is used to generate a correlation matrix from the correlations between all discrete and continuous features of the data object, and the data vector obtained by flattening this matrix is taken as the coupled encoding vector of the data object. In this embodiment, by constructing these two complementary coding spaces, each data object is encoded from the original information table into two vectors, called the naive encoding vector and the coupled encoding vector. The two coding spaces describe the same object from different angles based on different assumptions. In the naive coding space, each feature is treated as an equivalent variable independent of the others, and the naive encoding vector distinguishes between discrete and continuous features. In the coupled coding space, however, the continuous features are considered to be highly correlated with the discrete features, and the coupled encoding vector is constructed by estimating the joint probability density of any continuous feature with any discrete feature.
The naive coding space F_p consists of the converted discrete features together with the continuous features, and contains the most complete information from the original information table. We use one-hot encoding to convert the discrete variables into binary features: each binary feature has a single 1 corresponding to one discrete variable value, and all other entries are 0. Concatenating the converted discrete features with the original continuous features forms the new naive code; the final naive encoding vector f_p (its dimension is given by a formula image) combines the continuous dimensions with one binary dimension per discrete feature value, where d_n is the dimension of the continuous features and |·| denotes the number of discrete feature values.
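As a concrete illustration of the naive coding space F_p, the sketch below one-hot encodes the discrete columns of a small mixed table and concatenates them with the continuous columns; the helper name naive_encode, the column layout, and the toy feature names are assumptions for illustration only.

import numpy as np

def naive_encode(objects, discrete_values):
    """Build naive encoding vectors: [continuous features | one-hot discrete features].

    objects: list of (continuous_vector, discrete_dict) pairs.
    discrete_values: dict mapping discrete feature name -> ordered list of its values.
    """
    encoded = []
    for cont, disc in objects:
        onehot = []
        for feat, values in discrete_values.items():
            bits = [0.0] * len(values)
            bits[values.index(disc[feat])] = 1.0   # single 1 at the observed value
            onehot.extend(bits)
        encoded.append(np.concatenate([np.asarray(cont, float), onehot]))
    return np.vstack(encoded)

# toy example: 2 continuous features, 2 discrete features
objs = [([0.3, 1.2], {"proto": "tcp", "flag": "SF"}),
        ([0.9, 0.4], {"proto": "udp", "flag": "S0"})]
vals = {"proto": ["tcp", "udp"], "flag": ["SF", "S0"]}
f_p = naive_encode(objs, vals)   # shape (2, 2 + 2 + 2)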
The coupled coding space F_c is formed by concatenating the coupled coding matrices of all data objects. In a coupled coding matrix, the rows correspond to continuous features and the columns to discrete feature values. Each entry of the matrix is estimated from the joint probability density of a mixed-type feature pair, which quantifies the interaction between a discrete and a continuous feature. Each feature pair, i.e., one discrete feature and one continuous feature, is viewed as a pair of variables ⟨A_i, V_j⟩, and its density is estimated with a product kernel. Thus the joint density of the continuous feature value f_i from variable A_i and the discrete feature value v_j from variable V_j is written p(f_i, v_j).
Therefore, in this embodiment, the correlation between the discrete features and the continuous features is computed as follows (the exact expressions are given as formula images): the correlation between a discrete feature value v_j of the discrete variable V_j and a continuous feature f_i of the continuous variable A_i is derived from their joint density p(f_i, v_j), with τ a threshold parameter and λ a scaling factor. The joint density p(f_i, v_j) is estimated over the N data objects with a product kernel, in which K_d(v_j^(k), v_j) is the kernel over the discrete feature values, K_c is the kernel over the continuous features, f_i^(k) is the value of the continuous variable A_i on the k-th data object, f_i^(x) is its value on the x-th data object, and h_i is the bandwidth parameter of the continuous feature. The discrete kernel K_d is defined in terms of v_j^(k), the value of the discrete feature v_j on the k-th data object, with λ as the proportionality coefficient. For the continuous kernel K_c, a Gaussian kernel is used as the kernel function of the continuous variable in this embodiment.
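The density expressions themselves appear only as formula images, so the sketch below shows one standard product-kernel estimator consistent with the surrounding definitions (Gaussian kernel for the continuous feature, a simple match/λ kernel for the discrete value, bandwidth h_i); the precise kernels and normalization used by the patent may differ.

import numpy as np

def joint_density(f_col, v_col, f_x, v_j, h_i, lam=0.5):
    """Product-kernel estimate of p(f_i = f_x, V_j = v_j) over N data objects.

    f_col: continuous feature values of A_i over all N objects.
    v_col: discrete feature values of V_j over all N objects.
    """
    f_col = np.asarray(f_col, float)
    k_c = np.exp(-0.5 * ((f_col - f_x) / h_i) ** 2) / (h_i * np.sqrt(2 * np.pi))
    k_d = np.where(np.asarray(v_col) == v_j, 1.0 - lam, lam)  # assumed discrete kernel
    return float(np.mean(k_c * k_d))

# toy example over N = 4 objects
density = joint_density([0.1, 0.2, 0.35, 0.8], ["a", "a", "b", "b"],
                        f_x=0.3, v_j="b", h_i=0.2)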
After obtaining the density estimate for every pair of one discrete and one continuous variable, we define the coupled encoding matrix M_x of a data object x (its exact expression is given as a formula image): each entry of M_x is the correlation between a discrete feature value and a continuous feature, i.e., it is derived from the density estimate of the mixed pair and reflects the interaction between the continuous and the discrete variable.
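Building on the density sketch above, the following hypothetical routine assembles a coupled coding matrix M_x for one data object (rows for continuous features, columns for discrete feature values) and flattens it into the coupled encoding vector f_c; the thresholding with tau shown here is an assumed simplification of the thresholded correlation described in the text.

import numpy as np

def _joint_density(f_col, v_col, f_x, v_j, h_i, lam=0.5):
    # same product-kernel estimate as in the previous sketch
    f_col = np.asarray(f_col, float)
    k_c = np.exp(-0.5 * ((f_col - f_x) / h_i) ** 2) / (h_i * np.sqrt(2 * np.pi))
    k_d = np.where(np.asarray(v_col) == v_j, 1.0 - lam, lam)
    return float(np.mean(k_c * k_d))

def coupled_encode(obj_cont, all_cont, all_disc, disc_values, bandwidths, tau=1e-3):
    """Coupled encoding vector f_c of one object: flatten its correlation matrix M_x.

    obj_cont[i]: the object's value of continuous feature i.
    all_cont[i]: continuous feature i over all objects.
    all_disc[j]: discrete feature j over all objects.
    disc_values[j]: the possible values of discrete feature j.
    """
    rows = []
    for i, f_x in enumerate(obj_cont):              # one row per continuous feature
        row = []
        for j, values in enumerate(disc_values):    # one column per discrete value
            for v in values:
                p = _joint_density(all_cont[i], all_disc[j], f_x, v, bandwidths[i])
                row.append(p if p > tau else 0.0)   # assumed thresholding with tau
        rows.append(row)
    M_x = np.asarray(rows)
    return M_x.flatten()                            # row-wise flattening gives f_c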
In this embodiment, the neural network formed by the P leader and the C leader is called MAI (metric-based auto-instructor); the MAI consists of the P leader and the C leader operating in two different coding spaces, and the two leaders are coupled to each other. The first layer of the model takes a group of feature vectors of a triplet, encoded from the mixed data; the coding spaces of the two leaders are called the naive coding space and the coupled coding space, respectively. The second layer is the characterization layer, which updates the representations of this layer through a distance metric optimization function. The third layer is the automatic metric learning layer, which enhances the distinguishing information between data objects through the triplet distance relationship and provides guidance information for the other leader. In the training process, the two coding spaces are first generated and the parameters of the two leaders are initialized; the input data are constructed in a mini-batch parameter-updating fashion, each batch consisting of several triplets (a sketch of this triplet construction is given after the layer descriptions below). Then the representations corresponding to a triplet are computed in the P leader, guidance information is computed from the obtained representations and input into the C leader for training, and the parameters of the C leader are updated; the representations corresponding to the triplet in the C leader are then computed, and new guidance information is computed from them and passed into the P leader to guide its training. Following this alternate training scheme, a stable set of parameters is finally obtained, and the representations of a data object from the two leaders are concatenated to form the representation of the final mixed data object. As shown in fig. 3, the three-layer neural network of the P leader in this embodiment comprises:
an input layer, for inputting triplets ⟨x, x_i, x_j⟩;
a characterization layer, for computing the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ to obtain the representation group ⟨h, h_i, h_j⟩;
an automatic metric learning layer, for computing, from the representation group ⟨h, h_i, h_j⟩, the distance measure d_p(h, h_i) between the representation pair (h, h_i), the distance measure d_p(h, h_j) between the representation pair (h, h_j), and the guidance information δ_c for the C leader.
The three-layer neural network of the C leader comprises:
an input layer, for inputting triplets ⟨x, x_i, x_j⟩;
a characterization layer, for computing the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ to obtain the representation group ⟨h, h_i, h_j⟩;
an automatic metric learning layer, for computing, from the representation group ⟨h, h_i, h_j⟩, the distance measure d_c(h, h_i) between the representation pair (h, h_i), the distance measure d_c(h, h_j) between the representation pair (h, h_j), and the guidance information δ_p for the P leader.
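The patent states that the input data are constructed in mini-batches of triplets but does not spell out how the triplets are drawn; the sketch below simply samples random index triples and leaves both encodings to be looked up afterwards, purely as an assumed construction.

import numpy as np

def sample_triplets(n_objects, batch_size, rng=None):
    """Sample a mini-batch of index triplets <x, x_i, x_j> with distinct members."""
    rng = rng or np.random.default_rng()
    triplets = []
    while len(triplets) < batch_size:
        x, x_i, x_j = rng.choice(n_objects, size=3, replace=False)
        triplets.append((int(x), int(x_i), int(x_j)))
    return triplets

# each triplet is then encoded twice: rows of F_p feed the P leader, rows of F_c the C leader
batch = sample_triplets(n_objects=100, batch_size=16)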
In this embodiment the characterization layer of the P leader computes the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ as
h_p = σ(f_p W_1),
where h_p is the representation corresponding to the element, σ is the logistic function, f_p is the naive encoding vector of the element, and W_1 is the weight matrix. The features obtained from the naive coding space are independent and uncorrelated; here the logistic function is defined as σ(z) = 1/(1 + e^(-z)), where z is the argument. In order to capture the coupling relationships between features, this embodiment uses the characterization layer of the P leader to convert the encoding vector f_p into a representation vector h_p of length K through a fully connected network.
In this embodiment the characterization layer of the C leader computes the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ as
h_c = σ(f_c W_2),
where h_c is the representation corresponding to the element, σ is the logistic function, f_c is the coupled encoding vector of the element, and W_2 is the weight matrix. In this embodiment the C leader maps the coupled encoding vector f_c into a representation vector h_c of length J through another fully connected network. Through the two fully connected networks, the feature vectors of the two coding spaces are mapped into new representation vectors h_p and h_c, respectively. At this stage the representation vectors capture only the feature-level coupling, similar to an autoencoder. To enhance the distinctiveness of the data objects in the representations h_p and h_c, this embodiment introduces an automatic metric learning layer, and the representations are then optimized through this layer with an infinite-margin (unbounded) objective function.
In this embodiment, the automatic metric learning layer of the P leader computes, from the representation group ⟨h, h_i, h_j⟩, the distance measures d_p(h, h_i) and d_p(h, h_j) between the representation pairs (h, h_i) and (h, h_j) according to an expression (given as a formula image) over the computed representation pairs in the naive space, with W_3 as a learning parameter.
The automatic metric learning layer of the C leader computes, from the representation group ⟨h, h_i, h_j⟩, the distance measures d_c(h, h_i) and d_c(h, h_j) between the representation pairs (h, h_i) and (h, h_j) according to the analogous expression over the computed representation pairs in the coupled space, with W_4 as a learning parameter.
The automatic metric learning layer of the P leader computes the guidance information δ_c for the C leader, and the automatic metric learning layer of the C leader computes the guidance information δ_p for the P leader, as a binary function δ_h(h_i, h_j) (defined by a formula image) that encodes the ordering of the distances d(h, h_i) and d(h, h_j); here δ_h(h_i, h_j) stands for the guidance information δ_c of the C leader or the guidance information δ_p of the P leader, ⟨h, h_i, h_j⟩ is the representation group computed by the automatic metric learning layer of the corresponding leader, and d is a distance function.
In this embodiment, for the P leader in the naive coding space, the distance measure d_p between data objects x and x_i is derived from their representations; similarly, for the C leader in the coupled coding space, the distance measure d_c between data objects x and x_i is derived from their representations. Given a reference data object x and two comparison data objects x_i and x_j, the distance measures between them can be computed directly from these definitions. In conventional metric learning we would need the ordering of the distances of the two pairs, which is usually given by labels; in unsupervised learning, however, no distance labels are available. To solve this problem, we design a self-guided process that obtains the ordering of the distances of the two pairs and thereby provides guidance information for the other leader. We define a binary function δ_h(h_i, h_j) to represent the ordering of the distances of the two pairs formed from the representations of a triplet (its exact definition is given as a formula image); δ_h(h_i, h_j) stands for the guidance information δ_c of the C leader or the guidance information δ_p of the P leader, ⟨h, h_i, h_j⟩ is the representation group computed by the automatic metric learning layer of the leader, and d is a distance function such as the Euclidean or cosine distance. Here we use the distance function d(h, h_i) = ||h - h_i||_2.
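The binary guidance function δ_h is only shown as an image; a natural reading consistent with the text ("the distance magnitude relationship of the two pairs") is an indicator of which pair is closer, sketched below with the Euclidean distance named in this paragraph. The 1/0 encoding is an assumption.

import numpy as np

def guidance(h, h_i, h_j):
    """delta_h(h_i, h_j): 1 if x is closer to x_i than to x_j in this leader's
    representation space, else 0 (assumed encoding of the distance ordering)."""
    d_i = np.linalg.norm(h - h_i)   # d(h, h_i)
    d_j = np.linalg.norm(h - h_j)   # d(h, h_j)
    return 1.0 if d_i < d_j else 0.0   # ordering is unchanged if distances are squared

# the value computed in the P leader is passed to the C leader as delta_c, and vice versa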
Given a triplet ⟨x, x_i, x_j⟩, the C leader computes a distance ordering and uses it to guide the metric learning process of the P leader; accordingly, the distance ordering in the P leader is conditioned on the guidance information from the C leader, and the conditional probabilities of the triplets in the two leaders can thus be combined into a multi-objective loss function. In this embodiment, when the guidance information of the C leader is input into the C leader in step 2) to guide its training, the loss function adopted (given as a formula image) is defined from the log probability, over the input triplets ⟨x, x_i, x_j⟩, of the distance ordering in the C leader conditioned on the guidance information of the P leader; in it, L_c denotes the loss used to train the C leader, δ_p denotes the guidance information of the P leader, d_c(h, h_i) and d_c(h, h_j) denote the distance measures between the representation pairs (h, h_i) and (h, h_j) in the C leader, and σ is the logistic function.
In this embodiment, when the guidance information of the P leader is passed into the P leader in step 2) to guide its training, the loss function adopted (given as a formula image) is defined from the log probability, over the input triplets ⟨x, x_i, x_j⟩, of the distance ordering in the P leader conditioned on the guidance information of the C leader; in it, L_p denotes the loss used to train the P leader, δ_c denotes the guidance information of the C leader, d_p(h, h_i) and d_p(h, h_j) denote the distance measures between the representation pairs (h, h_i) and (h, h_j) in the P leader, and σ is the logistic function. This form of loss is a variant of the commonly used hinge loss, namely a probabilistic version of the infinite-margin loss. Therefore, in this embodiment an infinite-margin metric learning objective function is constructed in this layer; optimizing this objective enhances the boundaries between the data objects in the characterization layer, and the parameter estimates can be obtained by gradient descent on this objective.
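The two loss expressions are given only as images; the sketch below shows one plausible probabilistic infinite-margin form consistent with the description (a logistic likelihood of the guided distance ordering, accumulated over the triplets of a batch), offered as an assumption rather than the patent's exact formula.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def guided_triplet_loss(d_near, d_far, delta_other):
    """Assumed loss for one triplet in one leader.

    d_near = d(h, h_i) and d_far = d(h, h_j) are computed in this leader;
    delta_other is the binary guidance (1 or 0) produced by the other leader.
    The probability that this leader agrees with the guidance is modeled with a
    logistic function of the distance gap (a probabilistic infinite-margin loss).
    """
    p_agree = sigmoid(d_far - d_near)          # prob. that x is closer to x_i
    p = delta_other * p_agree + (1.0 - delta_other) * (1.0 - p_agree)
    return -np.log(p + 1e-12)                  # negative log probability

# example: the other leader says x_i is the closer object (delta = 1)
loss = guided_triplet_loss(d_near=0.4, d_far=1.3, delta_other=1.0)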
Existing network intrusion detection methods handle mixed data by directly concatenating the continuous features with converted discrete features, where the discrete features are converted by simple one-hot encoding, i.e., the position of the observed feature value is set to 1 and all other positions to 0. This encoding (1) ignores the heterogeneous correlation between discrete and continuous features and (2) ignores the correlations among the discrete features. In view of these problems of existing methods, this embodiment also provides a feature extraction method for network intrusion data, comprising: collecting network behavior data containing discrete features and continuous features; and inputting the network behavior data containing discrete features and continuous features into a P leader and a C leader trained with the above metric learning-based self-guided mixed data representation learning method, to obtain the network behavior features corresponding to the network behavior data. Because the metric learning-based self-guided mixed data representation learning method learns a more distinctive representation of network intrusion behavior, feeding the obtained network behavior features into a preset network intrusion detection model yields a more accurate network intrusion detection result.
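As an illustration of this feature extraction step, the sketch below encodes collected network behavior records in both coding spaces, runs them through the trained characterization layers, and concatenates the two representations as the final network behavior features; the function name extract_features and the random stand-in weights are hypothetical, reusing the conventions of the earlier sketches.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(F_p, F_c, W1, W2):
    """Network behavior features = [h_p | h_c] from the trained P and C leaders.

    F_p: naive encoding matrix of the collected records (one row per record).
    F_c: coupled encoding matrix of the same records.
    W1, W2: trained characterization weights of the P and C leaders.
    """
    H_p = sigmoid(F_p @ W1)                 # representations in the P leader
    H_c = sigmoid(F_c @ W2)                 # representations in the C leader
    return np.hstack([H_p, H_c])            # concatenated mixed-data representation

rng = np.random.default_rng(0)
F_p, F_c = rng.random((5, 10)), rng.random((5, 40))            # 5 collected records
W1, W2 = rng.normal(size=(10, 16)), rng.normal(size=(40, 16))  # stand-in trained weights
features = extract_features(F_p, F_c, W1, W2)                  # shape (5, 32)
# these features can then be fed into a downstream intrusion detection model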
In addition, the present embodiment also provides a metric learning-based self-guided hybrid data characterization learning system, which includes a microprocessor and a memory connected to each other, where the microprocessor is programmed or configured to execute the steps of the metric learning-based self-guided hybrid data characterization learning method or the steps of the feature extraction method for network intrusion data.
Furthermore, the present embodiment also provides a computer-readable storage medium, in which a computer program programmed or configured to execute the foregoing metric learning-based self-guided hybrid data characterization learning method or the foregoing feature extraction method for network intrusion data is stored.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A self-guided mixed data representation learning method based on metric learning, characterized by comprising the following steps for alternately training two mutually coupled three-layer neural networks, a P leader and a C leader:
1) generating two coding spaces for the P leader and the C leader, and initializing the network parameters;
2) performing multiple rounds of alternate training for the P leader and the C leader, where each round of alternate training comprises: constructing triplets as input data; computing the representations corresponding to each triplet in the P leader; computing the guidance information for the C leader from the obtained representations; feeding this guidance information into the C leader to guide its training and updating the parameters of the C leader; then computing the representations corresponding to the triplet in the C leader; computing the guidance information for the P leader from the representations in the C leader; passing this guidance information into the P leader to guide its training; and updating the parameters of the P leader.
2. The metric learning-based self-guided mixed data representation learning method according to claim 1, characterized in that generating two coding spaces for the P leader and the C leader in step 1) means generating a naive coding space F_p for the P leader and a coupled coding space F_c for the C leader; the naive coding space F_p is used to convert the discrete features of a data object into a one-hot representation and concatenate it with the continuous features of the data object to obtain the naive encoding vector of the data object; the coupled coding space F_c is used to generate a correlation matrix from the correlations between all discrete features and continuous features of the data object, and then to take the data vector obtained by flattening this matrix as the coupled encoding vector of the data object.
3. The metric learning-based self-guided mixed data representation learning method according to claim 2, characterized in that the correlation between the discrete features and the continuous features is computed as follows (the exact expressions are given as formula images): the correlation between a discrete feature value v_j of the discrete variable V_j and a continuous feature f_i of the continuous variable A_i is derived from their joint density p(f_i, v_j), with τ a threshold parameter and λ a scaling factor; the joint density p(f_i, v_j) is estimated over the N data objects with a product kernel, in which K_d(v_j^(k), v_j) is the kernel over the discrete feature values, K_c is the kernel over the continuous features, f_i^(k) is the value of the continuous variable A_i on the k-th data object, f_i^(x) is its value on the x-th data object, and h_i is the bandwidth parameter of the continuous feature; the discrete kernel K_d is defined in terms of v_j^(k), the value of the discrete feature v_j on the k-th data object, with λ as the proportionality coefficient.
4. The metric learning-based self-guided mixed data representation learning method according to claim 1, 2 or 3, characterized in that the three-layer neural network of the P leader comprises:
an input layer, for inputting triplets ⟨x, x_i, x_j⟩;
a characterization layer, for computing the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ to obtain the representation group ⟨h, h_i, h_j⟩;
an automatic metric learning layer, for computing, from the representation group ⟨h, h_i, h_j⟩, the distance measure d_p(h, h_i) between the representation pair (h, h_i), the distance measure d_p(h, h_j) between the representation pair (h, h_j), and the guidance information δ_c for the C leader;
and the three-layer neural network of the C leader comprises:
an input layer, for inputting triplets ⟨x, x_i, x_j⟩;
a characterization layer, for computing the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ to obtain the representation group ⟨h, h_i, h_j⟩;
an automatic metric learning layer, for computing, from the representation group ⟨h, h_i, h_j⟩, the distance measure d_c(h, h_i) between the representation pair (h, h_i), the distance measure d_c(h, h_j) between the representation pair (h, h_j), and the guidance information δ_p for the P leader.
5. The metric learning-based self-guided mixed data representation learning method according to claim 4, characterized in that the characterization layer of the P leader computes the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ as
h_p = σ(f_p W_1),
where h_p is the representation corresponding to the element, σ is the logistic function, f_p is the naive encoding vector of the element, and W_1 is the weight matrix;
and the characterization layer of the C leader computes the representation corresponding to each element of the triplet ⟨x, x_i, x_j⟩ as
h_c = σ(f_c W_2),
where h_c is the representation corresponding to the element, σ is the logistic function, f_c is the coupled encoding vector of the element, and W_2 is the weight matrix.
6. The metric learning-based self-guided mixed data representation learning method according to claim 4, wherein the automatic metric learning layer of the P leader computes, from the representation group <h^p, h_i^p, h_j^p>, the distance measure d(h^p, h_i^p) between the representation pair (h^p, h_i^p) and the distance measure d(h^p, h_j^p) between the representation pair (h^p, h_j^p) by the functional expression given in formula image FDA00033893960100000218, in which d(·,·) denotes the distance measure between the representation pair (h^p, h_i^p) or the representation pair (h^p, h_j^p), h^p and h_i^p (or h_j^p) form the computed representation pair, and W_3 is a learning parameter;
the automatic metric learning layer of the C leader computes, from the representation group <h^c, h_i^c, h_j^c>, the distance measure d(h^c, h_i^c) between the representation pair (h^c, h_i^c) and the distance measure d(h^c, h_j^c) between the representation pair (h^c, h_j^c) by the functional expression given in formula image FDA00033893960100000230, in which d(·,·) denotes the distance measure between the representation pair (h^c, h_i^c) or the representation pair (h^c, h_j^c), h^c and h_i^c (or h_j^c) form the computed representation pair, and W_4 is a learning parameter;
the automatic metric learning layer of the P leader computes the guidance information δ_c of the C leader, and the automatic metric learning layer of the C leader computes the guidance information δ_p of the P leader, by the functional expression given in formula image FDA0003389396010000031, in which δ_h(h_i, h_j) denotes the guidance information δ_c of the C leader or the guidance information δ_p of the P leader, <h, h_i, h_j> denotes the representation group computed by the automatic metric learning layer of the corresponding leader, and d is a distance function.
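The exact distance and guidance expressions of claim 6 are available only as formula images in the published text, so the Python sketch below uses stand-in forms: a learned weighted distance d(h, h') = σ(|h − h'|·W) and a guidance signal that simply records which of the two pair distances is smaller. These forms, the function names, and the dimensions are assumptions made for illustration, not the patent's definitions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learned_distance(h, h_other, W):
    # stand-in distance measure between a representation pair;
    # W plays the role of the learning parameter (W_3 or W_4)
    return float(sigmoid(np.abs(h - h_other) @ W))

def guidance(h, h_i, h_j, W):
    # stand-in guidance information delta_h(h_i, h_j): records which element
    # of the pair the leader currently places closer to h
    return 1.0 if learned_distance(h, h_i, W) < learned_distance(h, h_j, W) else -1.0

rng = np.random.default_rng(1)
W3 = rng.normal(size=6)              # learning parameter of the P leader's metric layer
h, h_i, h_j = rng.random((3, 6))     # representation group <h, h_i, h_j>
d_i = learned_distance(h, h_i, W3)   # d(h, h_i)
d_j = learned_distance(h, h_j, W3)   # d(h, h_j)
delta_c = guidance(h, h_i, h_j, W3)  # guidance passed from the P leader to the C leader
print(round(d_i, 3), round(d_j, 3), delta_c)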
7. The metric learning-based self-guided mixed data representation learning method according to claim 1, wherein in step 2), when the guidance information for the C leader is input into the C leader to guide its training, the loss function adopted is given in formula images FDA0003389396010000032 and FDA0003389396010000033; there, the loss used to train the C leader is taken over the input triplets <x, x_i, x_j> and is defined through the log probability of the distance magnitude relation between d(h^c, h_i^c) and d(h^c, h_j^c) in the C leader given the guidance information δ_p of the P leader, where δ_p denotes the guidance information of the P leader, d(h^c, h_i^c) is the distance measure between the representation pair (h^c, h_i^c), d(h^c, h_j^c) is the distance measure between the representation pair (h^c, h_j^c), and σ is the logistic function;
in step 2), when the guidance information for the P leader is passed into the P leader to guide its training, the loss function adopted is given in formula images FDA00033893960100000314 and FDA00033893960100000315; there, the loss used to train the P leader is taken over the input triplets <x, x_i, x_j> and is defined through the log probability of the distance magnitude relation between d(h^p, h_i^p) and d(h^p, h_j^p) in the P leader given the guidance information δ_c of the C leader, where δ_c denotes the guidance information of the C leader, d(h^p, h_i^p) is the distance measure between the representation pair (h^p, h_i^p), d(h^p, h_j^p) is the distance measure between the representation pair (h^p, h_j^p), and σ is the logistic function.
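Because the loss expressions of claim 7 are likewise published only as formula images, the following sketch illustrates the general idea with an assumed likelihood: the probability that one leader's distance ordering agrees with the other leader's guidance is modeled as σ(δ·(d_j − d_i)), and the loss is its negative log. The functional form, the ±1 encoding of the guidance, and the numeric values are illustrative assumptions, not the patent's formulas.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mutual_loss(d_i, d_j, delta_other):
    # d_i, d_j    : distance measures d(h, h_i), d(h, h_j) computed by this leader
    # delta_other : guidance information produced by the other leader (+1 or -1 here)
    # assumed likelihood that this leader's distance ordering agrees with the guidance
    p = sigmoid(delta_other * (d_j - d_i))
    return -np.log(p + 1e-12)

# one alternate-training step in miniature, with hypothetical distance values
loss_c = mutual_loss(d_i=0.31, d_j=0.72, delta_other=+1.0)  # C leader guided by delta_p
loss_p = mutual_loss(d_i=0.55, d_j=0.40, delta_other=-1.0)  # P leader guided by delta_c
print(round(float(loss_c), 4), round(float(loss_p), 4))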
8. A feature extraction method for network intrusion data, characterized by comprising the following steps: collecting network behavior data comprising discrete features and continuous features; and inputting the network behavior data comprising discrete features and continuous features into a P leader and a C leader trained by the metric learning-based self-guided mixed data representation learning method of any one of claims 1-7, to obtain the network behavior features corresponding to the network behavior data.
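A small end-to-end sketch of the claim 8 workflow follows. The record fields, the encoders, the weight matrices (random stand-ins for trained leaders), and the concatenation of the two leaders' outputs are all assumptions introduced for illustration; the claim itself does not specify how the two representations are combined.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical network behavior record: one discrete and two continuous features
record = {"protocol": "tcp", "duration": 0.3, "bytes_sent": 1024.0}
PROTOCOLS = ["tcp", "udp", "icmp"]

def encode_naive(rec):
    # one-hot for the discrete feature plus scaled continuous values (illustrative only)
    onehot = [1.0 if rec["protocol"] == p else 0.0 for p in PROTOCOLS]
    return np.array(onehot + [rec["duration"], rec["bytes_sent"] / 1500.0])

def encode_coupled(rec):
    # placeholder for the coupled encoding; a real implementation would inject
    # feature-coupling information here
    return encode_naive(rec)

rng = np.random.default_rng(2)
W1 = rng.normal(size=(5, 4))   # stand-in for the trained P leader's weights
W2 = rng.normal(size=(5, 4))   # stand-in for the trained C leader's weights

h_p = sigmoid(encode_naive(record) @ W1)
h_c = sigmoid(encode_coupled(record) @ W2)
network_behavior_feature = np.concatenate([h_p, h_c])  # combining by concatenation is an assumption
print(network_behavior_feature.shape)  # (8,)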
9. A metric learning-based self-guided mixed data representation learning system, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to perform the steps of the metric learning-based self-guided mixed data representation learning method according to any one of claims 1-7 or the steps of the feature extraction method for network intrusion data according to claim 8.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program programmed or configured to perform the metric learning-based self-guided mixed data representation learning method according to any one of claims 1-7 or the feature extraction method for network intrusion data according to claim 8.
CN202111463166.3A 2021-12-02 2021-12-02 Self-guided mixed data representation learning method and system based on metric learning Pending CN114139629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111463166.3A CN114139629A (en) 2021-12-02 2021-12-02 Self-guided mixed data representation learning method and system based on metric learning


Publications (1)

Publication Number Publication Date
CN114139629A true CN114139629A (en) 2022-03-04

Family

ID=80387370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111463166.3A Pending CN114139629A (en) 2021-12-02 2021-12-02 Self-guided mixed data representation learning method and system based on metric learning

Country Status (1)

Country Link
CN (1) CN114139629A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111919223A (en) * 2018-03-26 2020-11-10 平衡媒体技术有限责任公司 Abstract interface for machine learning algorithm gameplay
CN110070183A (en) * 2019-03-11 2019-07-30 中国科学院信息工程研究所 A kind of the neural network model training method and device of weak labeled data
US20210334664A1 (en) * 2020-04-24 2021-10-28 Adobe Inc. Domain Adaptation for Machine Learning Models
CN113158577A (en) * 2021-04-30 2021-07-23 中国人民解放军国防科技大学 Discrete data characterization learning method and system based on hierarchical coupling relation
CN113179276A (en) * 2021-04-30 2021-07-27 中国人民解放军国防科技大学 Intelligent intrusion detection method and system based on explicit and implicit feature learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
蹇松雷: "基于复杂异构数据的表征学习研究", 工学博士学位论文, pages 1 - 60 *

Similar Documents

Publication Publication Date Title
CN111581405B (en) Cross-modal generalization zero sample retrieval method for generating confrontation network based on dual learning
CN109783682B (en) Point-to-point similarity-based depth non-relaxed Hash image retrieval method
CN107871014A (en) A kind of big data cross-module state search method and system based on depth integration Hash
CN109960737B (en) Remote sensing image content retrieval method for semi-supervised depth confrontation self-coding Hash learning
CN109753571B (en) Scene map low-dimensional space embedding method based on secondary theme space projection
CN111274398A (en) Method and system for analyzing comment emotion of aspect-level user product
CN112000772B (en) Sentence-to-semantic matching method based on semantic feature cube and oriented to intelligent question and answer
CN114239560B (en) Three-dimensional image classification method, apparatus, device, and computer-readable storage medium
CN115018021A (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
CN114817568B (en) Knowledge hypergraph link prediction method combining attention mechanism and convolutional neural network
CN115909002A (en) Image translation method based on contrast learning
CN112214570A (en) Cross-modal retrieval method and device based on counterprojection learning hash
CN114896434A (en) Hash code generation method and device based on center similarity learning
CN117152504A (en) Space correlation guided prototype distillation small sample classification method
CN116739100A (en) Vulnerability detection method of quantum neural network and automatic driving vulnerability detection method
CN114139629A (en) Self-guided mixed data representation learning method and system based on metric learning
CN115599984A (en) Retrieval method
CN115422945A (en) Rumor detection method and system integrating emotion mining
Wang et al. Interpolation normalization for contrast domain generalization
CN113297385B (en) Multi-label text classification system and method based on improved GraphRNN
CN112862007B (en) Commodity sequence recommendation method and system based on user interest editing
Jang et al. Observational learning algorithm for an ensemble of neural networks
CN115861196A (en) Active learning method for multi-modal medical images
CN115408536A (en) Knowledge graph complementing method based on context information fusion
CN113158577A (en) Discrete data characterization learning method and system based on hierarchical coupling relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination