CN110009013B - Encoder training and representation information extraction method and device - Google Patents

Encoder training and representation information extraction method and device

Info

Publication number
CN110009013B
CN110009013B CN201910219343.XA
Authority
CN
China
Prior art keywords
loss
sample data
image
data
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910219343.XA
Other languages
Chinese (zh)
Other versions
CN110009013A (en)
Inventor
Jiao Jianbo
Bao Linchao
Wei Yunchao
Shi Honghui
Liu Yongxiong
Liu Wei
Huang Xutao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201910219343.XA
Publication of CN110009013A
Application granted
Publication of CN110009013B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The method comprises: for original sample data and at least two loss data of the original sample data, using encoders with the same model parameters to obtain corresponding encoding features; decoding each encoding feature with a corresponding decoder; and obtaining a predicted loss based on the encoding features, the original sample data, and the decoding features. If the predicted loss meets a preset convergence condition, a target encoder is initialized with the model parameters and used to obtain the characterization information of data. This improves the efficiency and effect of encoder training and the validity of the extracted characterization information.

Description

Encoder training and representation information extraction method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for encoder training and representation information extraction.
Background
Machine learning: an algorithm family that automatically analyzes data to obtain rules and uses those rules to predict unknown data. Because machine learning algorithms involve a large number of statistical theories, machine learning is particularly closely associated with inferential statistics and is also called statistical learning theory. In terms of algorithm design, machine learning theory is concerned with learning algorithms that are realizable and effective. Machine learning is the core of artificial intelligence and the fundamental approach to making computers intelligent; it is applied in all fields of artificial intelligence and mainly uses induction and synthesis rather than deduction.
Machine learning tasks, such as classification problems, typically require input that is easy to process mathematically or computationally. However, real-world data such as pictures, video, and sensor measurements are complex, redundant, and variable, so how to extract and express features effectively is very important.
Since traditional manual feature extraction requires a great deal of manpower, relies on highly specialized knowledge, and is hard to generalize, characterization learning emerged. So-called characterization learning is a collection of techniques for learning features, i.e., for converting original sample data into a form that machine learning can exploit efficiently. It avoids the trouble of manually extracting features and allows a computer to learn how to extract features while learning to use them.
In the prior art, during characterization learning, an encoder is usually trained by learning multiple tasks simultaneously or by means of discriminant learning. The characterization information of data is then extracted by the trained encoder, a required target model is built on top of the trained encoder, and data processing is performed with that target model; for example, transfer learning can be carried out further using the characterization information.
Because an encoder for extracting the representation information is a key link of data processing in machine learning, how to improve the training efficiency and effect of the encoder is a problem to be considered at present.
Disclosure of Invention
The embodiment of the application provides a method and a device for training an encoder and extracting characterization information, which are used for improving the training efficiency and effect of the encoder and the effectiveness of the extracted characterization information.
In one aspect, an encoder training method is provided, including:
carrying out noise superposition processing on original sample data to obtain at least two loss data;
aiming at original sample data and at least two loss data, respectively adopting encoders with the same model parameters to carry out encoding processing to obtain corresponding encoding characteristics;
decoding the obtained coding features by adopting a corresponding decoder to obtain corresponding decoding features;
obtaining a discrimination loss based on each coding feature, and obtaining a reconstruction loss based on original sample data and each decoding feature;
obtaining corresponding triple training data according to original sample data;
respectively adopting an encoder with model parameters to carry out feature extraction processing on each training data in the triple training data of the original sample data to obtain corresponding feature vectors;
determining a triplet loss representing the distance relationship between the feature vectors;
obtaining a predicted loss based on the reconstruction loss, the discrimination loss and the triple loss, wherein the predicted loss is positively correlated with the reconstruction loss, the discrimination loss and the triple loss;
and if the predicted loss conforms to the preset convergence condition, determining the model parameter as a reference value of the target parameter of the encoder, and if the predicted loss does not conform to the preset convergence condition, adjusting the model parameter until the predicted loss conforms to the preset convergence condition.
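As an illustration only, the training objective described in the steps above can be sketched in NumPy with a toy one-layer encoder. The patent does not fix the concrete loss functions at this level of generality, so the encoder form, the particular mean-squared losses, the noise scales, and the 0.5 convergence threshold are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    # shared-parameter encoder: a single tanh-activated linear map
    return np.tanh(x @ w)

def decode(z, w):
    # decoder tied to the encoder weights, purely for illustration
    return z @ w.T

def triplet_loss(anchor, positive, negative, margin=1.0):
    # hinge on squared distances between the three feature vectors
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# original sample data and two noise-superimposed loss data
x = rng.normal(size=(1, 16))
corrupted = [x + rng.normal(scale=0.1, size=x.shape) for _ in range(2)]

w = rng.normal(scale=0.1, size=(16, 8))  # model parameters shared by all encoders

# encoding features for the original sample data and each loss data
z_all = [encode(v, w) for v in [x] + corrupted]

# reconstruction loss: decoding features vs. the original sample data
rec_loss = sum(np.mean((decode(z, w) - x) ** 2) for z in z_all)

# discrimination loss: encodings of clean and corrupted data should agree
disc_loss = sum(np.mean((z - z_all[0]) ** 2) for z in z_all[1:])

# triplet loss over (anchor, positive, negative) feature vectors
pos = encode(x + rng.normal(scale=0.05, size=x.shape), w)  # distorted original
neg = encode(rng.normal(size=(1, 16)), w)                  # a different sample
tri_loss = triplet_loss(z_all[0], pos, neg)

# predicted loss, positively correlated with all three component losses
predicted_loss = rec_loss + disc_loss + tri_loss
converged = predicted_loss < 0.5  # assumed preset convergence threshold
```

In a real training loop, when `converged` is false the model parameters `w` would be adjusted (e.g. by gradient descent) and the computation repeated until the predicted loss meets the convergence condition.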
In one aspect, a method for extracting characterization information is provided, including:
obtaining a target model parameter of a target encoder by adopting a reference value of the target parameter of the encoder obtained by the encoder training method;
initializing a target encoder according to the target model parameters;
and adopting the target encoder to obtain the characterization information of the data.
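A minimal sketch of this extraction procedure, assuming for illustration that the trained reference value is a single weight matrix and that the target encoder is a one-layer map:

```python
import numpy as np

def init_target_encoder(reference_params):
    # copy the converged reference value into a fresh encoder; in a real
    # system this would load weights into a network, here the "encoder"
    # is a one-layer map closed over the parameter matrix
    w = np.array(reference_params, dtype=float)
    return lambda data: np.tanh(data @ w)

# stand-in for the reference value produced by the training method
trained_w = np.eye(4)

target_encoder = init_target_encoder(trained_w)
features = target_encoder(np.ones((2, 4)))  # characterization information
```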
In one aspect, an encoder training apparatus is provided, including:
the superposition unit is used for carrying out noise superposition processing on the original sample data to obtain at least two loss data;
the encoding unit is used for respectively adopting encoders with the same model parameters to carry out encoding processing on the original sample data and the at least two loss data to obtain corresponding encoding characteristics;
the decoding unit is used for decoding the obtained coding characteristics by adopting a corresponding decoder to obtain corresponding decoding characteristics;
the first obtaining unit is used for obtaining the discrimination loss based on each coding characteristic and obtaining the reconstruction loss based on the original sample data and each decoding characteristic;
the second obtaining unit is used for obtaining corresponding triple training data according to the original sample data;
the extracting unit is used for respectively adopting an encoder with model parameters to carry out feature extraction processing on each training data in the triple training data of the original sample data to obtain corresponding feature vectors;
the first determining unit is used for determining the triple loss representing the distance relation among the characteristic vectors;
the prediction unit is used for obtaining prediction loss based on the reconstruction loss, the discrimination loss and the triple loss, and the prediction loss is positively correlated with the reconstruction loss, the discrimination loss and the triple loss;
and the second determining unit is used for determining the model parameter as the reference value of the target parameter of the encoder if the prediction loss conforms to the preset convergence condition, and adjusting the model parameter until the prediction loss conforms to the preset convergence condition if the prediction loss does not conform to the preset convergence condition.
In one aspect, a representation information extraction apparatus is provided, including:
an obtaining unit, configured to obtain a target model parameter of a target encoder from a reference value of an encoder target parameter obtained by the encoder training method;
a setting unit for initializing the target encoder according to the target model parameters;
and the extraction unit is used for acquiring the characterization information of the data by adopting the target encoder.
In one aspect, there is provided a control apparatus comprising:
at least one memory for storing program instructions;
and the at least one processor is used for calling the program instructions stored in the memory and executing the steps of any one of the encoder training methods or the characterization information extraction method according to the obtained program instructions.
In one aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of any of the encoder training methods or the characterization information extraction methods described above.
In the encoder training and characterization information extraction method and device, corresponding encoding features and decoding features are obtained for original sample data and for at least two loss data of the original sample data; a discrimination loss is obtained based on the encoding features, and a reconstruction loss is obtained based on the original sample data and the decoding features. A feature vector is obtained for each training data in the triple training data of the original sample data, and a triplet loss characterizing the distance relationship between the feature vectors is determined. A predicted loss is obtained based on the reconstruction loss, the discrimination loss and the triplet loss; if the predicted loss meets the preset convergence condition, a target encoder is initialized with the model parameters and used to obtain the characterization information of data. This improves the efficiency and effect of encoder training; no special processing is needed for the data whose characterization information is to be extracted, so various data formats and modalities can be handled, the application range is wide, and the validity of the extracted characterization information is improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of an encoder training in an embodiment of the present application;
FIG. 2 is a flowchart illustrating an implementation of a method for training an encoder according to an embodiment of the present disclosure;
FIG. 3a is a schematic diagram of a loss data acquisition in an embodiment of the present application;
FIG. 3b is a schematic diagram of noise superposition according to an embodiment of the present disclosure;
FIG. 3c is a schematic diagram illustrating a noise superposition effect according to an embodiment of the present disclosure;
FIG. 3d is a diagram illustrating an image random warping process according to an embodiment of the present disclosure;
FIG. 3e is a diagram illustrating a comparison of characterization learning results according to an embodiment of the present disclosure;
fig. 4 is a flowchart of an implementation of a method for extracting characterization information according to an embodiment of the present application;
FIG. 5a is a schematic structural diagram of an encoder training apparatus according to an embodiment of the present disclosure;
FIG. 5b is a schematic structural diagram of a representation information extraction apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a control device in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solution and beneficial effects of the present application more clear and more obvious, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
First, some terms referred to in the embodiments of the present application are explained to facilitate understanding by those skilled in the art.
Machine learning: mainly the design and analysis of algorithms that allow computers to "learn" automatically. A machine learning algorithm automatically analyzes data to obtain rules and uses those rules to predict unknown data. Because machine learning algorithms involve a large number of statistical theories, machine learning is particularly closely related to inferential statistics and is also called statistical learning theory. In terms of algorithm design, machine learning theory is concerned with learning algorithms that are realizable and effective. Machine learning is the core of artificial intelligence and the fundamental approach to making computers intelligent; it is applied in all fields of artificial intelligence and mainly uses induction and synthesis rather than deduction.
Characterization learning: a collection of techniques for learning features, i.e., for converting original sample data into a form that machine learning can exploit efficiently. It avoids the trouble of manually extracting features and allows a computer to learn how to extract features while learning to use them.
Laplace transform: an integral transform commonly used in engineering mathematics. The Laplace transform is a linear transform that converts a function of a real variable t (t ≥ 0) into a function of a complex variable s.
Supervised learning: a machine learning task of inferring a function from labeled training data. The task of supervised learning is to learn a model that predicts the corresponding output for a given input. This model typically takes the form of a decision function Y = f(X) or a conditional probability distribution P(Y|X).
Unsupervised learning: machine learning without manually labeled data, in contrast to supervised learning.
Spatial domain: also known as data space (image space), the space made up of data pixels. Processing pixel values directly in data space, with length (distance) as the argument, is called spatial-domain processing.
Gaussian pyramid: a technique used in data processing, computer vision, and signal processing. A Gaussian pyramid is essentially a multi-scale representation of a signal: the same signal or picture is Gaussian-blurred and down-sampled multiple times to generate multiple sets of signals or pictures at different scales for subsequent processing.
Laplacian pyramid: obtained by subtracting, from each layer of the Gaussian pyramid, the prediction formed by upsampling and Gaussian-convolving the next (coarser) layer, yielding a series of difference data. During the operation of the Gaussian pyramid, the convolution and downsampling operations lose part of the high-frequency detail information; the Laplacian pyramid is defined in order to describe this high-frequency information.
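The two pyramid constructions can be sketched as follows. For simplicity this uses 2x2 average pooling in place of a true Gaussian blur (an assumption for illustration), which makes the Laplacian reconstruction exact:

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling (stands in for Gaussian blur + subsampling)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # nearest-neighbour 2x upsampling
    return np.kron(img, np.ones((2, 2)))

def build_laplacian_pyramid(img, levels):
    gaussian = [img]
    for _ in range(levels - 1):
        gaussian.append(downsample(gaussian[-1]))
    # each layer minus the upsampled prediction from the coarser layer
    laplacian = [g - upsample(g_next)
                 for g, g_next in zip(gaussian[:-1], gaussian[1:])]
    laplacian.append(gaussian[-1])  # coarsest layer is kept as-is
    return laplacian

def reconstruct(laplacian):
    img = laplacian[-1]
    for diff in reversed(laplacian[:-1]):
        img = diff + upsample(img)
    return img

img = np.arange(64, dtype=float).reshape(8, 8)
pyr = build_laplacian_pyramid(img, 3)
restored = reconstruct(pyr)  # identical to img by construction
```

Because each difference layer stores exactly what the upsampled coarser layer misses, summing back up the pyramid recovers the original data, which is why the Laplacian pyramid preserves the high-frequency detail that the Gaussian pyramid alone discards.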
Affine transformation: the affine transformation between two vector spaces consists of a non-singular linear transformation and a translation transformation.
Discriminant model: a method of modeling the relationship between unobserved and observed data. In a probabilistic framework, given the input variable x, a discriminant model predicts the output y by solving the conditional probability distribution P(y|x).
Convolutional Neural Network (CNN): a feedforward neural network whose artificial neurons can respond to surrounding units, suitable for large-scale data processing. A convolutional neural network includes convolutional layers and pooling layers.
Generative Adversarial Network (GAN): consists of a generator network and a discriminator network. The generator takes a random sample from a latent space as input, and its output needs to mimic the real samples in the training set as closely as possible. The discriminator's input is either a real sample or the generator's output, and its purpose is to distinguish the generator's output from real samples as well as possible, while the generator tries to fool the discriminator. The two networks compete with each other and continuously adjust their parameters; the final goal is that the discriminator cannot tell whether the generator's output is real.
The design concept of the embodiment of the present application is described below.
As society moves into the digital information age, real-world data (e.g., pictures, video, and sensor measurements) are becoming more complex and varied, which presents significant challenges to data management and analysis. For example, machine learning tasks typically require input data that is convenient to process mathematically or computationally, so valid features must be extracted and expressed in advance.
Since traditional manual feature extraction requires a great deal of manpower, relies on highly specialized knowledge, and is hard to generalize, characterization learning emerged. It avoids the trouble of manually extracting features and allows a computer to learn how to extract features while learning to use them. Visual characterization learning, for example, uses a camera and a computer in place of human eyes to perform machine-vision tasks such as identification, tracking, and measurement of a target, followed by graphics processing, so that the computer turns the data into a form more suitable for human observation or for transmission to an instrument for detection. It can be applied to visual object identification, such as automatic Web data labeling, mass data search, data content filtering, and remote medical consultation; to visual object detection, such as industrial robots and driverless cars; and to visual object tracking, such as identifying and tracking people in video surveillance.
In the traditional scheme, the following modes are mainly adopted during characterization learning:
the first mode is as follows: by reconstructing the original sample data, the compressed features are learned. However, in this way, the learned characterization effect is weak because the task of reconstructing data is simple.
The second way is: the characterization learning is performed by defining different related tasks, such as the relative position relationship of the predicted data blocks, the rotation angle of the predicted data, and the like. However, in this way, strong a priori knowledge is required and there are specific requirements on the format and modality of the input data.
The third mode is as follows: and the characterization learning is realized by integrating multiple tasks and learning at the same time. For example, the relative relationship task, the coloring task, the template task, and the motion segmentation task are merged into one framework. However, since each task corresponds to a respective objective function, input data requires special processing for multi-task learning.
The fourth mode is as follows: and (4) realizing characterization learning by adopting discriminant learning. For example, a twin network or ternary twin network structure is used to distinguish between different samples. However, this method requires large-scale labeling, has a small application range, and consumes a lot of manpower and material resources.
The applicant analyzes the traditional technology and finds that an encoder for extracting the representation information is a key link of data processing, but the traditional technology does not provide a technical scheme of the encoder capable of directly extracting the effective representation information of the original data, so that the training efficiency and the training effect of the encoder are problems to be considered.
In view of this, the applicant considers that laplace transform and noise superposition may be adopted to obtain multiple damaged data of original sample data, and a discriminant inference method may be adopted to perform random distortion processing on the original sample data, so as to obtain triple training data including the original data, and an encoder created based on a convolutional neural network is trained by using the original sample data, the damaged data, and the triple training data, so as to obtain a target encoder, so that the characterization information of the data may be extracted according to the target encoder.
In view of the above analysis, the embodiments of the present application provide a technical scheme for encoder training and characterization information extraction. Laplace transform and noise superposition are applied to original sample data to obtain multiple damaged data of the original sample data; random distortion processing is applied to the original sample data using a discriminant inference method to obtain positive sample data, yielding triple training data containing anchor sample data (i.e., the original sample data), the positive sample data, and negative sample data. According to the original sample data and its at least two damaged data, encoders with the same model parameters are used to obtain the discrimination loss and the reconstruction loss; encoders with the same model parameters are likewise used to obtain the feature vector of each training data in the triple training data, and a triplet loss characterizing the distance relationship between the feature vectors is determined. If the predicted loss obtained from the reconstruction loss, the discrimination loss, and the triplet loss meets the convergence condition, the target encoder is obtained from the encoding parameters; otherwise the model parameters are adjusted and the procedure returns to the step of applying Laplace transform and noise superposition to the original sample data. Further, the target encoder is used to extract the characterization information of data. This improves the efficiency and effect of encoder training; no special processing is needed for the data whose characterization information is to be extracted, so various data formats and modalities can be handled, the application range is wide, and the validity of the extracted characterization information is improved.
According to the technical scheme for encoder training and representation information extraction, the target encoder used for accurately extracting the representation information can be obtained, and further, a target model applied to the fields of image classification, target detection, automatic driving, robots and the like can be built based on the target encoder.
To further illustrate the technical solutions provided by the embodiments of the present application, a detailed description is given below with reference to the accompanying drawings. Although the embodiments provide the method operation steps shown in the following embodiments or figures, the method may include more or fewer operation steps based on conventional or non-inventive labor. For steps that have no necessary logical causal relationship, the order of execution is not limited to that provided by the embodiments of the present application. In an actual processing procedure or device, the method can be executed sequentially or in parallel according to the embodiments or the figures.
Referring to fig. 1, a schematic diagram of an encoder training scheme provided in the present application is shown. The main principles of encoder training are as follows:
s101: the encoding characteristics and the decoding characteristics of original sample data and damaged data are obtained through the Laplace distillation module 101, and the feature vectors corresponding to the training data in the triple training data are obtained through the discriminant reasoning module 102.
The damaged data is obtained by performing laplacian transform and noise superposition on original sample data. The triplet training data includes: anchor sample data, positive sample data, and negative sample data. The anchor sample data is the original sample data. The positive sample data is data obtained by randomly distorting original sample data. The negative sample data is data different from the original sample data.
S102: obtaining the discrimination loss through each coding feature; obtaining reconstruction loss through each decoding characteristic and original sample data; and obtaining the triple loss according to each feature vector.
S103: and obtaining the predicted loss according to the discrimination loss, the reconstruction loss and the triple loss.
S104: if the prediction loss meets the preset convergence condition, obtaining a target encoder based on the model parameters of each encoder, otherwise, adjusting the model parameters of the laplacian distillation module 101 and the discriminant inference module 102 according to the prediction loss. Optionally, the predicted loss meets a preset convergence condition, and may be that the predicted loss is lower than a preset threshold.
Wherein the laplace distillation module 101: the method comprises the steps of performing Laplace transform and noise superposition processing on original sample data to obtain at least two damaged data; respectively adopting encoders with the same model parameters to carry out encoding processing on original sample data and each damaged data to obtain corresponding encoding characteristics; and respectively adopting a corresponding decoder to decode each coding characteristic to obtain corresponding decoding characteristics.
Wherein the discriminant inference module 102: the method comprises the steps of performing random distortion processing on original sample data to obtain positive sample data; combining the positive sample data, the negative sample data and the anchor point data into triple training data; and respectively carrying out coding processing and full connection on each training data in the triple training data by adopting a coder to obtain corresponding characteristic vectors.
In fig. 1, an image of a dog is used as the original sample data, an image of a cat as the negative sample data, and random noise, information-removal noise, and blurring noise as three different superimposed noises, by way of example. In practical applications, the original sample data, the negative sample data, and the noise types can be selected according to actual requirements; for example, the noise type may also be true random noise, multi-scale blurring, information loss, and so on, which is not limited herein. The model parameters of the encoders in the encoder set 103 are shared.
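The discriminant inference path of Fig. 1 can be sketched as follows. The random horizontal flip standing in for random warping, and the single shared matrix standing in for the encoder plus full connection, are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_distort(img):
    # stand-in for random warping: random horizontal flip plus small jitter
    out = img[:, ::-1] if rng.random() < 0.5 else img
    return out + rng.normal(scale=0.01, size=out.shape)

def encode(img, w):
    # shared-parameter encoder followed by a full connection,
    # collapsed into a single matrix for illustration
    return img.reshape(-1) @ w

anchor = rng.random((4, 4))        # original sample data (the anchor)
positive = random_distort(anchor)  # randomly distorted original
negative = rng.random((4, 4))      # data different from the original

w = rng.normal(scale=0.1, size=(16, 8))  # parameters shared by all three paths
feats = [encode(v, w) for v in (anchor, positive, negative)]

d_pos = np.linalg.norm(feats[0] - feats[1])  # anchor-positive distance
d_neg = np.linalg.norm(feats[0] - feats[2])  # anchor-negative distance
```

Training would then push `d_pos` below `d_neg` via the triplet loss; before training there is no guarantee on their relative order.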
Referring to fig. 2, a flowchart of an implementation of an encoder training method provided in the present application is shown. The method comprises the following specific processes:
step 201: the control equipment acquires original sample data and negative sample data to be processed.
Specifically, when step 201 is executed, the negative sample data is data different from the original sample data. Optionally, any data different from the original sample data may be selected from the data set. The original sample data can be data in image, video, multi-frame data and other formats.
For example, the original sample data is a peony image, and the negative sample data is a rose image.
Step 202: the control equipment performs Laplace transform and noise superposition on the original sample data to obtain at least two damaged data.
Specifically, referring to fig. 3a, a schematic diagram of the acquisition of loss data is shown.
S2021: and the control equipment performs Gaussian transformation on the original sample data to obtain a Gaussian pyramid.
The gaussian pyramid is a technique used in encoder training, computer vision, and signal processing. The gaussian pyramid is essentially a multi-scale representation of the signal, i.e., the same signal or picture is gaussian blurred multiple times and down-sampled to generate multiple sets of signals or pictures at different scales for subsequent processing.
S2022: the control device performs laplacian transformation on the gaussian pyramid to obtain a laplacian pyramid.
In the operation process of the gaussian pyramid, partial high-frequency detail information can be lost by data through convolution and downsampling operations, and in order to describe the high-frequency information, the laplacian pyramid is defined. The laplacian pyramid is: and subtracting the predicted data after the upsampling and Gaussian convolution of the data of the upper layer from each layer of data of the Gaussian pyramid to obtain a series of difference data. The laplacian pyramid contains at least two layers of sampled data.
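The Gaussian and Laplacian pyramid construction, and the inverse transformation used later, can be sketched as follows. This is a minimal NumPy sketch: the helper names, the 5-tap binomial blur kernel, and the zero-insertion upsampling are illustrative assumptions, not the patent's exact operators.

```python
import numpy as np

KERNEL = np.array([1., 4., 6., 4., 1.]) / 16.0  # binomial approximation of a Gaussian

def blur(img):
    # separable convolution with reflect padding along each axis
    for axis in (0, 1):
        img = np.apply_along_axis(
            lambda v: np.convolve(np.pad(v, 2, mode="reflect"), KERNEL, mode="valid"),
            axis, img)
    return img

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        img = blur(img)[::2, ::2]          # blur then downsample by 2
        pyr.append(img)
    return pyr

def upsample(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img[:(shape[0] + 1) // 2, :(shape[1] + 1) // 2]
    return blur(up) * 4.0                  # compensate for the inserted zeros

def laplacian_pyramid(img, levels):
    gauss = gaussian_pyramid(img, levels)
    lap = [gauss[i] - upsample(gauss[i + 1], gauss[i].shape)
           for i in range(levels - 1)]
    lap.append(gauss[-1])                  # coarsest residual kept as-is
    return lap

def inverse_laplacian(lap):
    img = lap[-1]
    for level in reversed(lap[:-1]):
        img = level + upsample(img, level.shape)
    return img
```

Because each band stores exactly the difference against the upsampled coarser level, summing the bands back up reconstructs the input exactly; this is what allows noise added in the Laplacian domain to be mapped back to a spatial-domain loss image.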
S2023: the control device performs the following steps for each noise type in the noise set: superimpose noise of that type on a randomly selected level of sampled data in the Laplacian pyramid, and perform an inverse Laplacian transformation on the noise-superimposed Laplacian pyramid to obtain the corresponding loss data.
Optionally, for different superimposed noise types, when obtaining loss data, the following formula may also be adopted:
Lap_c = Lap⁻¹(P̃_{l,c}(x)), c ∈ C;

wherein Lap_c is the loss data, x is the original sample data, P̃_{l,c}(x) is the Laplacian pyramid of x after noise of type c is superimposed on a randomly selected level l, Lap⁻¹(·) denotes the inverse Laplacian transformation, and c is a noise type in the noise set C.
Specifically, when obtaining the loss data based on the random noise, the following formula may be adopted:
Lap_Dn = Lap⁻¹(P̃_{l,Dn}(x));

wherein Lap_Dn is the loss data, x is the original sample data, P̃_{l,Dn}(x) is the Laplacian pyramid of x after noise is superimposed on a randomly selected level l, and Dn denotes random noise.
Specifically, when obtaining the loss data based on information-removal noise, the following formula may be adopted:

Lap_In = Lap⁻¹(P̃_{l,In}(x));

wherein Lap_In is the loss data, x is the original sample data, P̃_{l,In}(x) is the Laplacian pyramid of x after noise is superimposed on a randomly selected level l, and In denotes information-removal noise.
Specifically, when obtaining the loss data based on the blurring noise, the following formula may be adopted:
Lap_SR = Lap⁻¹(P̃_{l,SR}(x));

wherein Lap_SR is the loss data, x is the original sample data, P̃_{l,SR}(x) is the Laplacian pyramid of x after noise is superimposed on a randomly selected level l, and SR denotes blurring noise.
Optionally, if the original sample data is image data, the original sample data may be resized to a set length and width and then randomly cropped. For example, the image is resized to 256×256 and randomly cropped to 227×227.
In the embodiments of the present application, the noise types in the noise set are described by taking random noise, information-removal noise, and blurring noise as examples. The noise type may also be true random noise, multi-scale blurring, information loss, and the like, which is not limited herein.
Optionally, the random noise may be gaussian random noise with a set variance (e.g., 25), and when the random noise is superimposed, a layer of sampling images is randomly selected from the laplacian pyramid for superimposition.
Optionally, for information-removal noise, a set percentage of pixel points can be randomly removed; when superimposing information-removal noise, a level of sampled images is randomly selected from the Laplacian pyramid for the superposition.
Blurring noise removes the high-frequency information by discarding the information in the bottom (finest) level of the Gaussian pyramid.
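The three corruption types described above can be sketched as operations on a list of pyramid levels. The following NumPy sketch is illustrative: the function names, the 30% removal fraction, and the choice to exclude the coarsest residual from random level selection are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_random_noise(level, sigma=25.0):
    # "Dn" type: Gaussian random noise with a set variance
    return level + rng.normal(0.0, sigma, size=level.shape)

def remove_information(level, fraction=0.3):
    # "In" type: zero out a set percentage of randomly chosen pixels
    mask = rng.random(level.shape) >= fraction
    return level * mask

def blur_pyramid(pyramid):
    # one possible reading of the "SR" blurring noise:
    # drop the finest (highest-frequency) band entirely
    out = list(pyramid)
    out[0] = np.zeros_like(out[0])
    return out

def corrupt(pyramid, noise_fn):
    # apply noise_fn to one randomly selected level, leave the rest intact
    out = list(pyramid)
    l = rng.integers(len(out) - 1)   # exclude the coarsest residual level
    out[l] = noise_fn(out[l])
    return out
```

Each corrupted pyramid would then be passed through the inverse Laplacian transformation to obtain spatial-domain loss data.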
In the embodiment of the application, original sample data is constructed into the laplacian pyramid, noise superposition is respectively carried out in the laplacian pyramid through noises of various noise types, and the laplacian pyramid after the noises are superposed is reconstructed into lost data. The original sample data of the spatial domain is converted into a Laplacian pyramid of the Laplacian domain through Laplacian transformation, and then is inversely transformed into lost data of the spatial domain.
In this way, noise superposition is performed in the laplacian domain, rather than spatial domain superposition noise in the traditional approach, such that changes to the data are accompanied by global semantic information, rather than just local semantic information. Because it is difficult to capture the non-local semantic concepts only by the local semantic information, in the embodiment of the application, better representation can be learned by the global semantic information.
Further, in the embodiment of the application, noise superposition is performed by parallelly adopting the noises of multiple noise types, so that an encoder can learn a more difficult task, obtain stronger learning capability and learn better characterization information.
Referring to fig. 3b, a schematic diagram of noise superposition is shown; fig. 3b shows the results of superimposing noise at different Laplacian pyramid levels. The images shown in fig. 3b are, in order: the original sample data, conventional data obtained by conventionally superimposing noise (i.e., directly in the spatial domain), loss data with the noise-superposition level (LPS) of 4, loss data with LPS of 6, and loss data with LPS of 8.
As can be seen from fig. 3b, compared with the conventional method in which noise is directly superimposed in the spatial domain, the loss data obtained by superimposing noise in the laplacian transform domain focuses not only on the local information but also on the global information. And loss data obtained by superposing noise at different LPS levels also shows different ranges on interference scale, so that a better encoder for extracting characterization information can be obtained in subsequent steps.
Fig. 3c is a schematic diagram illustrating the noise superposition effect. The images shown in fig. 3c are, in order: the original sample data, loss data with random noise superimposed, loss data with information-removal noise superimposed, and loss data with blurring noise superimposed. As can be seen in fig. 3c, superimposing noise of different noise types produces different noise effects, but each image reflects features combining local and global information.
Step 203: and the control equipment acquires triple training data according to the original sample data and the negative sample data.
Specifically, the triplet training data includes: anchor sample data, positive sample data, and negative sample data. The anchor sample data is the original sample data. The positive sample data is obtained by randomly distorting original sample data. The negative sample data is data different from the original sample data. The random warping process may be implemented by perspective transformation, affine transformation, rotation transformation, and the like, which is not limited herein.
When the control device obtains positive sample data, the following steps can be adopted:
s2031: and randomly sampling the original sample data to obtain randomly sampled data.
Specifically, the original sample data is normalized, and random sampling is performed in a designated area to obtain each random sampling data.
S2032: and obtaining an affine transformation matrix according to the random sampling data and the target data.
Specifically, the affine transformation matrix satisfies the following conditions: and multiplying the affine transformation matrix and the random sampling data into target data.
If the original sample data is an original image, the length and width of the original image are normalized (for example, 256 × 256), and random sampling is performed in designated areas (for example, 100 × 100 at four corners) of the original image respectively, so as to obtain random sampling coordinate points, and obtain a quadrilateral area. The affine transformation matrix satisfies the following formula:
(t·x_i′, t·y_i′, t)ᵀ = M · (x_i, y_i, 1)ᵀ, i = 0, 1, 2, 3;

wherein M is the affine transformation matrix, i is the sequence number of a randomly sampled coordinate point, t is a transformation coefficient, the randomly sampled coordinate point is src(i) = (x_i, y_i) with x_i, y_i its abscissa and ordinate, and the corresponding target point is dst(i) = (x_i′, y_i′) with x_i′, y_i′ its abscissa and ordinate.
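Given the four sampled point pairs, the transformation matrix can be recovered by least squares. A minimal NumPy sketch for the affine case (the helper names are illustrative; a production system would typically use an OpenCV routine such as `getPerspectiveTransform` for the four-point perspective case):

```python
import numpy as np

def solve_affine(src, dst):
    """Least-squares 2x3 affine matrix M with dst_i ~= M @ [x_i, y_i, 1]."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    A = np.hstack([src, np.ones((len(src), 1))])   # homogeneous source points
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2) coefficient block
    return M.T                                      # (2, 3) affine matrix

def apply_affine(M, pts):
    pts = np.asarray(pts, float)
    return (M @ np.hstack([pts, np.ones((len(pts), 1))]).T).T
```

With four non-collinear point pairs the system is consistent and the least-squares solution reproduces the warp exactly.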
S2033: and randomly distorting the original sample data according to the affine transformation matrix, and cutting and scaling the randomly distorted original sample data to obtain positive sample data.
Specifically, since the affine transformation matrix satisfies the condition that its product with the random sampling data is the target data, applying the affine transformation matrix to the original sample data realizes the random warping. The edges of the randomly warped data can then be cropped and padded, and the data scaled back to the original size.
Optionally, when positive sample data is obtained, the following formula may be adopted:
x_p = Pers(x);

wherein x is the original sample data, x_p is the positive sample data, and Pers(·) is the random warping function. Optionally, the random warping function may be implemented with an affine transformation matrix or a perspective transformation.
For example, referring to fig. 3d, a schematic diagram of an image random warping process is shown. The images shown in fig. 3d are, in order: and carrying out random sampling on the original sample data, and carrying out perspective transformation on the original sample data to obtain positive sample data.
As shown in fig. 3d, the control device performs random sampling on the original sample data to obtain each random sampling coordinate point, and performs perspective transformation on the original sample data according to the random sampling coordinate point and the coordinate point of the target point to obtain the positive sample data.
In the embodiment of the application, original sample data is used as anchor data, the original sample data is transformed to obtain positive sample data, and a sample different from the original sample data is selected as negative sample data. And combining anchor point data, positive sample data and negative sample data into triple training data. In this way, after the original sample data is randomly warped, although the positive sample data is deformed and warped more greatly than the original sample data (e.g. dog in the image in fig. 3 d), the main semantic information in the original sample data is retained in the positive sample data.
In the embodiment of the present application, only the step 202 is executed first and then the step 203 is executed as an example for description, in practical applications, the execution sequence of the step 202 and the step 203 may be executed sequentially or simultaneously, which is not limited to this.
Step 204: the control device obtains the encoding features and decoding features of the original sample data and each loss data, and obtains the feature vector of each training data in the triplet training data.

Specifically, the control device builds a CNN model, uses the CNN model to obtain the encoding features and decoding features of the original sample data and each loss data, and obtains the feature vector of each training data in the triplet training data. The CNN model mainly comprises an encoder and a decoder.

When obtaining the encoding features of the original sample data and each loss data, the following step can be adopted: encoders with the same model parameters are used to encode each loss data and the original sample data respectively, obtaining the corresponding encoding features.

When obtaining the decoding features of the original sample data and each loss data, the following step can be adopted: for each encoding feature, a corresponding decoder is used for decoding, obtaining the corresponding decoding feature.
When obtaining the feature vector of each training data in the triplet training data, the following steps may be adopted:
and respectively carrying out coding processing and full-connection processing on the characteristics by adopting encoders with the same model parameters aiming at each training data in the triple training data to obtain corresponding characteristic vectors. In the embodiment of the present application, the model parameters of each encoder are shared.
The CNN model body may have any structure, and in the embodiment of the present application, an AlexNet structure is taken as an example for description. The encoder adopts AlexNet, and the decoder is a three-layer deconvolution (deconv) layer and is used for decoding and reconstructing the coding characteristics obtained by the encoder into a data structure with the same size as the original sample data. The encoder is also used to extract feature vectors of the training data.
As shown in fig. 1, in the embodiment of the present application, since noise superposition processing is performed on original sample data by using three types of noise, three AlexNet with the same structure are used to perform encoding processing on each loss data, and three decoders are used to decode each obtained encoding feature. Wherein, the model parameters in each encoder are shared, and the model parameters in each decoder may not be shared. And fully connecting the feature vectors output by the encoder through a full connection layer aiming at each training data in the triple training data to obtain the fully connected feature vectors.
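Parameter sharing across the encoder branches simply means one set of weights is applied to every input. A toy NumPy stand-in for the shared encoder plus fully connected head — the layer sizes and the linear+ReLU structure are illustrative assumptions, not the AlexNet architecture itself:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(64, 128)) * 0.01   # one shared set of "encoder" weights
W_fc  = rng.normal(size=(128, 32)) * 0.01   # shared fully connected head

def encode(x):
    # the SAME parameters are applied to anchor, positive and negative samples,
    # which is what "model parameters of each encoder are shared" means here
    h = np.maximum(x @ W_enc, 0.0)           # toy stand-in for conv layers: linear + ReLU
    return h @ W_fc                          # fully connected feature vector

anchor, pos, neg = (rng.normal(size=64) for _ in range(3))
f_a, f_p, f_n = encode(anchor), encode(pos), encode(neg)
```

Because the three branches reference the same weight arrays, a gradient update to the shared parameters affects all branches identically.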
The characterization learned by the model is mainly embodied in the model parameters of the encoder, and the quality of the characterization can be verified by verifying the performance of the model parameters.
Step 205: and the control equipment obtains the prediction loss according to the original sample data, each coding characteristic, each decoding characteristic and each characteristic vector.
Specifically, the control device obtains the discrimination loss through each coding feature, obtains the reconstruction loss through each decoding feature and the original sample data, obtains the triple loss according to each feature vector, and obtains the prediction loss according to the discrimination loss, the reconstruction loss and the triple loss.
The discriminant loss represents the similarity of the encoding characteristics output by the encoder and the encoding characteristics of the original sample data in the characteristic distribution. The reconstruction loss is used for judging the similarity degree of the output data of the decoder and the original sample data in a spatial domain. The triplet penalty is used to represent: and (4) triplet loss of distance relation between feature vectors of training data in the triplet training data.
Wherein, when the discriminant loss is obtained, a discriminant sub-function may be employed:
L_D = E_x[log D(G(x))] + ∑_{c∈C} E_c[log(1 − D(G(Lap_c)))];

wherein L_D is the discrimination loss, x is the original sample data, G(x) is the encoding feature of the original sample data, G(Lap_c) is the encoding feature of the loss data, D(·) is the discriminator network, E denotes the data expectation, and c is a noise type in the noise set C.
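The discrimination subfunction can be evaluated directly from discriminator outputs. A hedged NumPy sketch — batch means stand in for the expectations, and the epsilon guard is an implementation detail, not part of the formula:

```python
import numpy as np

def discrimination_loss(d_real, d_lossy):
    """L_D = E_x[log D(G(x))] + sum_c E_c[log(1 - D(G(Lap_c)))].

    d_real:  discriminator outputs on features of original samples, values in (0, 1)
    d_lossy: dict mapping noise type c -> discriminator outputs on lossy-sample features
    """
    eps = 1e-12  # numerical guard for log near 0
    loss = np.mean(np.log(np.asarray(d_real) + eps))
    for d_c in d_lossy.values():
        loss += np.mean(np.log(1.0 - np.asarray(d_c) + eps))
    return loss
```

An undecided discriminator (all outputs 0.5) yields log(0.5) per term, which is the usual sanity check for GAN-style losses.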
In the embodiment of the application, the discriminant subfunction refers to the concept of GAN, a CNN model is used as a generator G, the generator G is implemented by adopting 4 layers of convolution (conv), and the output of an encoder is used as the input of the discriminant subfunction. Conventionally, GAN networks usually use discriminators for image domains, and in the embodiment of the present application, discriminators are used for feature domains, so as to expect to obtain similarity of feature planes. Therefore, the coding characteristics obtained by the coder and the coding characteristics obtained by the original sample data can be ensured to keep consistency in characteristic distribution, namely the similarity of data distribution.
Wherein, when the reconstruction loss is obtained, a reconstruction subfunction may be adopted:
L_rec = ∑_{c∈C} E_x‖x − z_c‖² + E_x‖x − z_x‖²;

wherein L_rec is the reconstruction loss, E is the mathematical expectation, x is the original sample data, z_c is the decoding feature of the loss data of noise type c, z_x is the decoding feature of the original sample data, and c is a noise type in the noise set C.
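The reconstruction subfunction is a sum of squared-error terms. In this minimal NumPy sketch, a per-element mean squared error stands in for the expected squared norm E‖·‖² (an assumption about normalization; the formula itself does not fix it):

```python
import numpy as np

def reconstruction_loss(x, z_by_noise, z_x):
    """sum_c E||x - z_c||^2 + E||x - z_x||^2, with MSE standing in for E||.||^2.

    x:          original sample data
    z_by_noise: dict mapping noise type c -> reconstruction z_c from the lossy branch
    z_x:        reconstruction of the original sample data itself
    """
    loss = sum(np.mean((x - z_c) ** 2) for z_c in z_by_noise.values())
    return loss + np.mean((x - z_x) ** 2)
```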
Therefore, the reconstruction sub-function comprehensively judges the performance of all reconstruction processes according to each loss data and the reconstruction data of the original sample data, namely the decoding characteristics.
When the triple loss is obtained, a triple loss function can be adopted:
L_trip = |d(F_θ(x), F_θ(x_p)) − d(F_θ(x), F_θ(y)) + δ|₊;

wherein L_trip is the triplet loss, x is the original sample data, y is the negative sample data, x_p is the positive sample data, F_θ(·) is the feature-vector mapping, |·|₊ denotes taking the positive part (i.e., the value is 0 when the argument is negative and unchanged when it is non-negative), d(·,·) is a distance function (optionally the Euclidean distance), and δ denotes the minimum margin between the feature vector of the positive sample data and the feature vector of the negative sample data; optionally, δ may be set to 20.
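The triplet subfunction is a hinge on the difference of two distances. A minimal NumPy sketch, using the Euclidean distance and the margin δ = 20 mentioned in the text:

```python
import numpy as np

def triplet_loss(f_anchor, f_pos, f_neg, delta=20.0):
    """|d(anchor, pos) - d(anchor, neg) + delta|_+ with Euclidean d."""
    d_pos = np.linalg.norm(f_anchor - f_pos)   # anchor-positive distance
    d_neg = np.linalg.norm(f_anchor - f_neg)   # anchor-negative distance
    return max(d_pos - d_neg + delta, 0.0)     # positive-part clamp
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is exactly the geometry the triplet training data is meant to enforce.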
Wherein the predicted loss can be obtained by using the following formula:
L = min_G max_D (L_trip + L_rec + L_D);

wherein L is the prediction loss, L_trip is the triplet loss, L_rec is the reconstruction loss, L_D is the discrimination loss, G is the generator used to obtain the encoding features, and D is the discriminator network.
Step 206: the control device determines whether the predicted loss meets a predetermined convergence condition, if so, performs step 207, otherwise, performs step 208.
Step 207: the control device determines the model parameters as reference values for the encoder target parameters.
Step 208: the control device adjusts the model parameters of the encoder and decoder according to the prediction loss, and step 201 is performed.
Specifically, when the steps 206 to 208 are executed, if the prediction loss meets the preset convergence condition, the control device determines the model parameter as the reference value of the encoder target parameter. And if the predicted loss does not accord with the preset convergence condition, the control equipment adjusts the model parameters until the predicted loss accords with the preset convergence condition.
After obtaining the reference value of the target parameter of the encoder, the target encoder may be initialized according to the reference value of the target parameter of the encoder, and the target encoder is used to obtain the characterization information of the data. Fig. 4 is a flowchart of an implementation of the method for extracting the characterization information. The method comprises the following specific processes:
step 401: and controlling the reference value of the target parameter of the encoder of the equipment to obtain the target model parameter of the target encoder.
Step 402: and the control equipment initializes the target encoder according to the reference value of the encoder target parameter and adopts the target encoder to obtain the characterization information of the data.
Further, the control device may build a desired target model according to the target encoder, and perform data processing using the target model.
The target model is mainly a model which needs to extract the representation information of data and process the data according to the representation information, and can be applied to the fields of image classification, target detection, automatic driving, visual object tracking, Web data automatic labeling, mass data searching, data content filtering, medical remote consultation, robots and the like.
For example, the object model may be a classification model, an object detection model, a semantic segmentation model, and the like. The target tasks may be classification tasks, target detection tasks, and semantic segmentation tasks.
In the embodiment of the application, the validity of the characterization information extracted by the target encoder is evaluated from the perspective of convolutional layer output, model initialization and transfer learning.
Evaluation scenario one: evaluation based on the encoding features output by the convolutional layers. FIG. 3e is a diagram comparing characterization learning results. Graph (a) in fig. 3e shows the encoding features output by the first convolutional layer when a conventional fully supervised characterization learning method extracts the encoding features of an image. Graph (b) in fig. 3e shows the encoding features output by the first convolutional layer when the target encoder obtained with the present scheme extracts the encoding features of the image.
As can be seen, graph (b) in fig. 3e is similar to graph (a): the target encoder obtains encoding features comparable to those of the conventional fully supervised characterization learning method, and learns accurate edge filters and color filters.
Table 1.
[Table 1 image: classification accuracy of the five initialization modes at each convolutional layer]
Evaluation scenario two: evaluation from the perspective of model initialization. Referring to table 1, a model initialization evaluation table is shown. Table 1 contains 5 initialization modes, in order: random initialization, spatial-domain initialization, Laplacian initialization, discriminative inference initialization, and the target encoder of the present scheme, i.e., encoders obtained by training respectively with random parameters, in the spatial domain, in the Laplacian domain, with discriminative inference, and with the present scheme.
Specifically, each encoder is obtained by one of the 5 modes in table 1, and a linear classifier is attached to each of the 5 convolutional layers of each encoder (e.g., AlexNet) to evaluate the classification performance, i.e., the classification accuracy, of the encoder on a data set such as the ImageNet data set.
As can be seen from table 1, the target encoder obtained by the present scheme is significantly higher than the encoders obtained by other schemes in the classification performance.
Evaluation scenario three: evaluation from the perspective of transfer learning, i.e., testing whether the obtained encoder can help characterization learning on other data and tasks.
Referring to table 2, a migration learning evaluation table is shown. The values in table 2 represent the task scores. The 5 methods shown in table 2 are sequentially adopted to obtain each encoder, and a corresponding classification model, a target detection model and a semantic segmentation model are obtained based on each encoder respectively, so as to execute the tasks of classification, target detection and semantic segmentation.
Table 2.
[Table 2 image: task scores of the five methods on the classification, target detection, and semantic segmentation tasks]
Fc6-8 means that, when training the classification model's encoder, the model parameters of the first 5 convolutional layers are fixed and not updated, and only the coefficients of the fully connected layers Fc6-8 are updated. Correspondingly, ALL means that all model parameters are updated and learned when training the encoder. As can be seen from table 2, the task scores of models built on the target encoder of the present scheme are significantly higher than those of the other modes, i.e., the present scheme is significantly better than the other schemes.
In the embodiment of the application, on one hand, the original sample data is subjected to Laplace transformation, the original sample data in a spatial domain is converted into a Laplace pyramid in a Laplace domain, noise is superimposed on a random layer of the Laplace pyramid to obtain loss data, combination of bottom-layer representation and high-layer representation is achieved, and features sensitive to edge features can be learned;
on the other hand, random warping is applied to the data through a discriminative inference method, and the triplet loss characterizing the distance relationship between the feature vectors of the triplet training data is obtained; this increases the distance between different contents in the feature space while reducing the difference between similar contents, so that the encoder can also capture high-level semantic information of the data.
And moreover, discrimination loss and reconstruction loss of original sample data and triple loss of triple training data are obtained, and prediction loss is obtained based on the discrimination loss, the reconstruction loss and the triple loss, so that the scheme simultaneously considers the similarity of the spatial domain and the feature domain distribution and the feature vector similarity between positive sample data and negative sample data, jointly restricts the training process, and adopts multi-task learning (such as superposition of various noises) to make the representation information obtained by training more robust.
The embodiments of the present application impose no strong constraints on the input data: various data formats and modalities can be used, no special processing of the input data is required, the applicability is wide, more low-level semantic information can be extracted, and more robust and representative model parameters are obtained for subsequent applications. For example, instead of performing model training on a large-scale labeled data set, the encoder training method provided in the embodiments of the present application can be used to build and initialize a target model applied to the fields of image classification, target detection, automatic driving, visual object tracking, Web data automatic labeling, mass data search, data content filtering, medical remote consultation, robots, and the like.
Based on the same inventive concept, the embodiment of the present application further provides an encoder training apparatus, and as the principle of the apparatus and the device for solving the problem is similar to an encoder training method, the implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted.
Fig. 5a is a schematic structural diagram of an encoder training apparatus according to an embodiment of the present application. An encoder training apparatus comprising:
a superposition unit 510, configured to perform noise superposition processing on original sample data to obtain at least two loss data;
the encoding unit 511 is configured to perform encoding processing on the original sample data and the at least two pieces of loss data by using encoders with the same model parameter, respectively, to obtain corresponding encoding characteristics;
a decoding unit 512, configured to perform decoding processing on the obtained encoding features by using corresponding decoders, so as to obtain corresponding decoding features;
a first obtaining unit 513, configured to obtain a discrimination loss based on each coding feature, and obtain a reconstruction loss based on original sample data and each decoding feature;
a second obtaining unit 514, configured to obtain, according to the original sample data, corresponding triple training data;
an extracting unit 515, configured to perform, for each training data in the triple training data of the original sample data, feature extraction processing by using an encoder with model parameters, respectively, to obtain a corresponding feature vector;
a first determining unit 516, configured to determine a triplet loss characterizing a distance relationship between feature vectors;
the prediction unit 517 is configured to obtain a prediction loss based on the reconstruction loss, the discrimination loss, and the triplet loss, where the prediction loss is positively correlated with the reconstruction loss, the discrimination loss, and the triplet loss;
a second determining unit 518, configured to determine the model parameter as a reference value of the target parameter of the encoder if the prediction loss meets a preset convergence condition, and adjust the model parameter until the prediction loss meets the preset convergence condition if the prediction loss does not meet the preset convergence condition.
Preferably, the at least two loss data are obtained by performing Laplacian transformation on the original sample data and superimposing noise;
the triplet training data includes: the anchor point sample data is original sample data, the positive sample data is obtained by randomly distorting the original sample data, and the negative sample data is different from the original sample data.
Preferably, the first determining unit 516 is configured to:
determining a first distance between the feature vector of the anchor point sample data and the feature vector of the positive sample data;
determining a second distance between the feature vector of the anchor point sample data and the feature vector of the negative sample data;
a triplet penalty is determined based on a difference between the first distance and the second distance.
Preferably, the first obtaining unit 513 is configured to:
respectively obtaining, with a preset discrimination function, an original discrimination value for the encoding feature of the original sample data and a loss discrimination value for the encoding feature of each loss data;
determining a discrimination loss based on the original discrimination value and each loss discrimination value;
the discrimination loss represents the similarity degree of the encoding characteristics output by the encoder and the encoding characteristics of the original sample data on the characteristic distribution, and the discrimination loss is positively correlated with the original discrimination value and negatively correlated with the loss discrimination value.
Preferably, the first obtaining unit 513 is configured to:
respectively determining a decoding difference value between each decoding characteristic and original sample data;
obtaining a reconstruction loss based on each decoded difference;
the reconstruction loss is used for judging the similarity degree of the output data of the decoder and the original sample data in a spatial domain.
Based on the same inventive concept, the embodiment of the present application further provides a device for extracting characterization information, and since the principle of the device and the apparatus for solving the problem is similar to that of a method for extracting characterization information, the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.
Fig. 5b is a schematic structural diagram of a characterization information extraction apparatus according to an embodiment of the present application. The characterization information extraction apparatus includes:
an obtaining unit 521, configured to obtain a target model parameter of a target encoder according to a reference value of a target parameter of the encoder obtained by the above-mentioned encoder training method;
a setting unit 522 for initializing the target encoder according to the target model parameters;
an extracting unit 523, configured to obtain the characterization information of the data by using the target encoder.
In the encoder training and characterization information extraction methods and apparatuses above, corresponding coding features and decoding features are obtained for the original sample data and for at least two pieces of loss data derived from it; a discrimination loss is obtained from the coding features, and a reconstruction loss is obtained from the original sample data and the decoding features. A feature vector is obtained for each item of training data in the triple training data of the original sample data, and a triplet loss representing the distance relationship among the feature vectors is determined. A predicted loss is then obtained based on the reconstruction loss, the discrimination loss and the triplet loss. If the predicted loss meets the preset convergence condition, the model parameters are used to initialize a target encoder, and the target encoder is used to obtain the characterization information of the data. This improves both the efficiency and the effect of encoder training; the data whose characterization information is to be extracted needs no special processing, so the scheme applies to a wide range of data formats and modalities and improves the effectiveness of the extracted characterization information.
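The combination of the three losses into the predicted loss can be sketched as a weighted sum with positive weights, which preserves the required positive correlation with each component; the weights themselves are illustrative and not given in the text:

```python
def predicted_loss(recon, disc, triplet,
                   w_recon=1.0, w_disc=1.0, w_triplet=1.0):
    """Weighted sum of the three component losses; any positive weights
    keep the predicted loss positively correlated with each component."""
    return w_recon * recon + w_disc * disc + w_triplet * triplet
```

During training, the model parameters would be adjusted until this value meets the preset convergence condition.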
Fig. 6 is a schematic structural diagram of a control device. Based on the same technical concept, the embodiment of the present application further provides a control device, which may include a memory 601 and a processor 602.
The memory 601 is used for storing the computer program executed by the processor 602. The memory 601 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created during use of the device, and the like. The processor 602 may be a central processing unit (CPU), a digital processing unit, or the like. The specific connection medium between the memory 601 and the processor 602 is not limited in the embodiments of the present application. In Fig. 6, the memory 601 and the processor 602 are connected by a bus 603, which is represented by a thick line; the connection manner between other components is shown merely for illustration and is not limiting. The bus 603 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 6, but this does not mean that there is only one bus or one type of bus.
The memory 601 may be a volatile memory, such as a random-access memory (RAM); the memory 601 may also be a non-volatile memory, such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 601 may also be a combination of the above memories.
The processor 602 is configured to, when invoking the computer program stored in the memory 601, execute the encoder training method provided in the embodiment shown in Fig. 2 and the characterization information extraction method provided in the embodiment shown in Fig. 4.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the encoder training method and the representation information extraction method in any of the above method embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general hardware platform, and certainly also by hardware. Based on this understanding, the technical solutions above, in essence or in the parts contributing to the related art, may be embodied in the form of a software product. The software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a control device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. An encoder training method for image processing, comprising:
performing noise superposition processing on image sample data to obtain at least two image loss data, wherein the at least two image loss data are obtained by performing Laplace transform on the image sample data and superimposing noise;
aiming at the image sample data and the at least two image loss data, respectively adopting encoders with the same model parameters to carry out encoding processing to obtain corresponding image encoding characteristics;
adopting a corresponding decoder to decode the obtained image coding features to obtain corresponding image decoding features;
obtaining a discrimination loss based on each image coding feature, and obtaining a reconstruction loss based on the image sample data and each image decoding feature;
obtaining corresponding triple image training data according to the image sample data;
respectively adopting an encoder with the model parameters to carry out feature extraction processing on each training data in the triple image training data of the image sample data to obtain corresponding feature vectors;
determining a triplet loss representing the distance relationship between the feature vectors;
obtaining a predicted loss based on the reconstruction loss, the discrimination loss and the triplet loss, wherein the predicted loss is positively correlated with the reconstruction loss, the discrimination loss and the triplet loss;
and if the prediction loss conforms to the preset convergence condition, determining the model parameter as a reference value of the target parameter of the encoder, and if the prediction loss does not conform to the preset convergence condition, adjusting the model parameter until the prediction loss conforms to the preset convergence condition.
2. The method of claim 1, wherein the triple image training data comprises anchor point sample data, positive sample data and negative sample data, wherein the anchor point sample data is the image sample data, the positive sample data is obtained by performing random warping on the image sample data, and the negative sample data is data different from the image sample data.
3. The method of claim 2, wherein determining a triplet of penalties that characterize the distance relationship between feature vectors comprises:
determining a first distance between the feature vector of the anchor point sample data and the feature vector of the positive sample data;
determining a second distance between the feature vector of the anchor point sample data and the feature vector of the negative sample data;
determining a triplet penalty based on a difference between the first distance and the second distance.
4. The method of claim 1, 2 or 3, wherein obtaining a discriminant loss based on each image coding feature comprises:
respectively obtaining, by using a preset discrimination function, an original discrimination value of the image coding feature of the image sample data and a loss discrimination value of the image coding feature of each image loss data;
determining a discrimination loss based on the original discrimination value and each loss discrimination value;
wherein the discrimination loss represents the degree of similarity, in feature distribution, between the image coding features output by the encoder and the image coding features of the image sample data, and the discrimination loss is positively correlated with the original discrimination value and negatively correlated with each loss discrimination value.
5. The method of claim 1, 2 or 3, wherein obtaining a reconstruction loss based on the image sample data and image decoding features comprises:
respectively determining a decoding difference value between each image decoding feature and the image sample data;
obtaining a reconstruction loss based on the decoding difference values;
wherein the reconstruction loss is used for judging the degree of similarity, in the spatial domain, between the output data of the decoder and the image sample data.
6. An image representation information extraction method is characterized by comprising the following steps:
obtaining target model parameters of a target encoder by using the reference values of the target parameters of the encoder obtained by the method according to any one of claims 1 to 5;
initializing the target encoder according to the target model parameters;
and obtaining image representation information of the image data by adopting the target encoder.
7. An apparatus for training an encoder for image processing, comprising:
the superposition unit is used for performing noise superposition processing on image sample data to obtain at least two image loss data, wherein the at least two image loss data are obtained by performing Laplace transform on the image sample data and superimposing noise;
the encoding unit is used for respectively adopting encoders with the same model parameters to carry out encoding processing on the image sample data and the at least two image loss data to obtain corresponding image encoding characteristics;
the decoding unit is used for decoding the obtained image coding features by adopting a corresponding decoder to obtain corresponding image decoding features;
the first obtaining unit is used for obtaining discrimination loss based on each image coding characteristic and obtaining reconstruction loss based on the image sample data and each image decoding characteristic;
a second obtaining unit, configured to obtain, according to the image sample data, corresponding triple image training data;
an extracting unit, configured to perform feature extraction processing on each training data in the triple image training data of the image sample data by using an encoder having the model parameter, respectively, to obtain a corresponding feature vector;
the first determining unit is used for determining the triple loss representing the distance relation among the characteristic vectors;
a prediction unit, configured to obtain a prediction loss based on the reconstruction loss, the discrimination loss, and the triplet loss, where the prediction loss is positively correlated with the reconstruction loss, the discrimination loss, and the triplet loss;
and the second determining unit is used for determining the model parameter as a reference value of the target parameter of the encoder if the prediction loss meets a preset convergence condition, and adjusting the model parameter until the prediction loss meets the preset convergence condition if the prediction loss does not meet the preset convergence condition.
8. The apparatus of claim 7, wherein the triple image training data comprises anchor point sample data, positive sample data and negative sample data, wherein the anchor point sample data is the image sample data, the positive sample data is obtained by performing random warping on the image sample data, and the negative sample data is data different from the image sample data.
9. The apparatus of claim 8, wherein the first determination unit is to:
determining a first distance between the feature vector of the anchor point sample data and the feature vector of the positive sample data;
determining a second distance between the feature vector of the anchor point sample data and the feature vector of the negative sample data;
determining a triplet penalty based on a difference between the first distance and the second distance.
10. The apparatus of claim 7, 8 or 9, wherein the first obtaining unit is to:
respectively obtaining, by using a preset discrimination function, an original discrimination value of the image coding feature of the image sample data and a loss discrimination value of the image coding feature of each image loss data;
determining a discrimination loss based on the original discrimination value and each loss discrimination value;
wherein the discrimination loss represents the degree of similarity, in feature distribution, between the image coding features output by the encoder and the image coding features of the image sample data, and the discrimination loss is positively correlated with the original discrimination value and negatively correlated with each loss discrimination value.
11. The apparatus of claim 7, 8 or 9, wherein the first obtaining unit is to:
respectively determining a decoding difference value between each image decoding feature and the image sample data;
obtaining a reconstruction loss based on the decoding difference values;
wherein the reconstruction loss is used for judging the degree of similarity, in the spatial domain, between the output data of the decoder and the image sample data.
12. An image representation information extraction apparatus characterized by comprising:
an obtaining unit, configured to obtain target model parameters of a target encoder by using the reference values of the encoder target parameters obtained by the method according to any one of claims 1 to 5;
a setting unit, configured to initialize the target encoder according to the target model parameter;
and the extraction unit is used for obtaining the image representation information of the image data by adopting the target encoder.
13. A control apparatus, characterized by comprising:
at least one memory for storing program instructions;
at least one processor for calling program instructions stored in said memory and for executing the steps of the method of any of the preceding claims 1-5 or 6 according to the obtained program instructions.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 5 or 6.
CN201910219343.XA 2019-03-21 2019-03-21 Encoder training and representation information extraction method and device Active CN110009013B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910219343.XA CN110009013B (en) 2019-03-21 2019-03-21 Encoder training and representation information extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910219343.XA CN110009013B (en) 2019-03-21 2019-03-21 Encoder training and representation information extraction method and device

Publications (2)

Publication Number Publication Date
CN110009013A CN110009013A (en) 2019-07-12
CN110009013B true CN110009013B (en) 2021-04-27

Family

ID=67167770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910219343.XA Active CN110009013B (en) 2019-03-21 2019-03-21 Encoder training and representation information extraction method and device

Country Status (1)

Country Link
CN (1) CN110009013B (en)

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442804A (en) * 2019-08-13 2019-11-12 北京市商汤科技开发有限公司 A kind of training method, device, equipment and the storage medium of object recommendation network
CN110910982A (en) * 2019-11-04 2020-03-24 广州金域医学检验中心有限公司 Self-coding model training method, device, equipment and storage medium
CN110889338A (en) * 2019-11-08 2020-03-17 中国铁道科学研究院集团有限公司基础设施检测研究所 Unsupervised railway track bed foreign matter detection and sample construction method and unsupervised railway track bed foreign matter detection and sample construction device
CN111046655B (en) * 2019-11-14 2023-04-07 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN117197615A (en) * 2019-12-09 2023-12-08 杭州海康威视数字技术股份有限公司 Model training method, feature extraction method and device
CN113159288B (en) * 2019-12-09 2022-06-28 支付宝(杭州)信息技术有限公司 Coding model training method and device for preventing private data leakage
CN111400754B (en) * 2020-03-11 2021-10-01 支付宝(杭州)信息技术有限公司 Construction method and device of user classification system for protecting user privacy
CN111291190B (en) * 2020-03-23 2023-04-07 腾讯科技(深圳)有限公司 Training method of encoder, information detection method and related device
CN111489803B (en) * 2020-03-31 2023-07-21 重庆金域医学检验所有限公司 Report form coding model generation method, system and equipment based on autoregressive model
CN111768457B (en) * 2020-05-14 2022-10-04 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium
CN111639684B (en) * 2020-05-15 2024-03-01 北京三快在线科技有限公司 Training method and device for data processing model
CN111723812B (en) * 2020-06-05 2023-07-07 南强智视(厦门)科技有限公司 Real-time semantic segmentation method based on sequence knowledge distillation
CN111680787B (en) * 2020-06-12 2022-12-09 中国人民解放军战略支援部队信息工程大学 Side channel curve processing method and device and electronic equipment
CN111710346B (en) * 2020-06-18 2021-07-27 腾讯科技(深圳)有限公司 Audio processing method and device, computer equipment and storage medium
CN111738351B (en) * 2020-06-30 2023-12-19 创新奇智(重庆)科技有限公司 Model training method and device, storage medium and electronic equipment
CN112288699B (en) * 2020-10-23 2024-02-09 北京百度网讯科技有限公司 Method, device, equipment and medium for evaluating relative definition of image
CN112565763A (en) * 2020-11-30 2021-03-26 北京达佳互联信息技术有限公司 Abnormal image sample generation method and device, and image detection method and device
CN112541944B (en) * 2020-12-10 2022-07-12 山东师范大学 Probability twin target tracking method and system based on conditional variational encoder
CN114625871B (en) * 2020-12-14 2023-06-23 四川大学 Ternary grouping method based on attention position joint coding
CN113268631B (en) * 2021-04-21 2024-04-19 北京点众快看科技有限公司 Video screening method and device based on big data
CN113240021B (en) * 2021-05-19 2021-12-10 推想医疗科技股份有限公司 Method, device and equipment for screening target sample and storage medium
CN113836866B (en) * 2021-06-04 2024-05-24 腾讯科技(深圳)有限公司 Text encoding method, text encoding device, computer readable medium and electronic equipment
CN113378921A (en) * 2021-06-09 2021-09-10 北京百度网讯科技有限公司 Data screening method and device and electronic equipment
CN113592769B (en) * 2021-06-23 2024-04-12 腾讯医疗健康(深圳)有限公司 Abnormal image detection and model training method, device, equipment and medium
CN113470758B (en) * 2021-07-06 2023-10-13 北京科技大学 Chemical reaction yield prediction method based on causal discovery and multi-structure information coding
CN114429179B (en) * 2022-01-11 2024-02-09 中国人民解放军国防科技大学 Unmanned platform-oriented capability computing method and system
CN114418069B (en) * 2022-01-19 2024-06-14 腾讯科技(深圳)有限公司 Encoder training method, encoder training device and storage medium
CN114490950B (en) * 2022-04-07 2022-07-12 联通(广东)产业互联网有限公司 Method and storage medium for training encoder model, and method and system for predicting similarity
CN114915786B (en) * 2022-04-26 2023-07-28 哈尔滨工业大学(深圳) Asymmetric semantic image compression method for Internet of things scene
CN115116451A (en) * 2022-06-15 2022-09-27 腾讯科技(深圳)有限公司 Audio decoding method, audio encoding method, audio decoding device, audio encoding device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study
CN107862668A (en) * 2017-11-24 2018-03-30 河海大学 A kind of cultural relic images restored method based on GNN
CN109002488A (en) * 2018-06-26 2018-12-14 北京邮电大学 A kind of recommended models training method and device based on first path context

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980641B (en) * 2017-02-09 2020-01-21 上海媒智科技有限公司 Unsupervised Hash quick picture retrieval system and unsupervised Hash quick picture retrieval method based on convolutional neural network
US10255681B2 (en) * 2017-03-02 2019-04-09 Adobe Inc. Image matting using deep learning
US10574959B2 (en) * 2017-07-05 2020-02-25 Qualcomm Incorporated Color remapping for non-4:4:4 format video content
US11734955B2 (en) * 2017-09-18 2023-08-22 Board Of Trustees Of Michigan State University Disentangled representation learning generative adversarial network for pose-invariant face recognition
CN108537742B (en) * 2018-03-09 2021-07-09 天津大学 Remote sensing image panchromatic sharpening method based on generation countermeasure network
CN108428221A (en) * 2018-03-26 2018-08-21 广东顺德西安交通大学研究院 A kind of neighborhood bivariate shrinkage function denoising method based on shearlet transformation
CN108226892B (en) * 2018-03-27 2021-09-28 天津大学 Deep learning-based radar signal recovery method in complex noise environment
CN108600750A (en) * 2018-04-10 2018-09-28 山东师范大学 Multiple description coded, coding/decoding method based on KSVD and system
CN108829685A (en) * 2018-05-07 2018-11-16 内蒙古工业大学 A kind of illiteracy Chinese inter-translation method based on single language training
CN109033938A (en) * 2018-06-01 2018-12-18 上海阅面网络科技有限公司 A kind of face identification method based on ga s safety degree Fusion Features
CN108875818B (en) * 2018-06-06 2020-08-18 西安交通大学 Zero sample image classification method based on combination of variational self-coding machine and antagonistic network
CN109063731B (en) * 2018-06-26 2020-11-10 北京航天自动控制研究所 Scene adaptability criterion training sample set generation method
CN109145129B (en) * 2018-09-07 2020-03-31 深圳码隆科技有限公司 Depth measurement learning method and device based on hierarchical triple loss function

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437077A (en) * 2017-08-04 2017-12-05 深圳市唯特视科技有限公司 A kind of method that rotation face based on generation confrontation network represents study
CN107862668A (en) * 2017-11-24 2018-03-30 河海大学 A kind of cultural relic images restored method based on GNN
CN109002488A (en) * 2018-06-26 2018-12-14 北京邮电大学 A kind of recommended models training method and device based on first path context

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Exploring Asymmetric Encoder-Decoder Structure for Context-based Sentence Representation Learning"; Shuai Tang, et al.; 《arXiv》; 20180601; full text *
"Recent Advances in Autoencoder-Based Representation Learning"; Michael Tschannen, et al.; 《arXiv》; 20181212; full text *

Also Published As

Publication number Publication date
CN110009013A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
CN110009013B (en) Encoder training and representation information extraction method and device
CN111209952B (en) Underwater target detection method based on improved SSD and migration learning
Fu et al. Removing rain from single images via a deep detail network
Zhang et al. Adaptive residual networks for high-quality image restoration
CN109948796B (en) Self-encoder learning method, self-encoder learning device, computer equipment and storage medium
CN112132959B (en) Digital rock core image processing method and device, computer equipment and storage medium
Tewari et al. Diffusion with forward models: Solving stochastic inverse problems without direct supervision
Shi et al. Unsharp mask guided filtering
Zhao et al. A deep cascade of neural networks for image inpainting, deblurring and denoising
CN114170184A (en) Product image anomaly detection method and device based on embedded feature vector
Zhang et al. Hierarchical attention aggregation with multi-resolution feature learning for GAN-based underwater image enhancement
CN111199197A (en) Image extraction method and processing equipment for face recognition
Kratzwald et al. Improving video generation for multi-functional applications
Wang et al. An efficient remote sensing image denoising method in extended discrete shearlet domain
Zin et al. Local image denoising using RAISR
Yang et al. Infrared image super-resolution with parallel random Forest
Tsuji et al. Non-guided depth completion with adversarial networks
Viriyavisuthisakul et al. Parametric regularization loss in super-resolution reconstruction
Yang et al. Single frame image super resolution via learning multiple anfis mappings
Chen et al. A deep motion deblurring network using channel adaptive residual module
Pahwa et al. LVRNet: Lightweight image restoration for aerial images under low visibility
Zhang et al. Se-dcgan: a new method of semantic image restoration
Wyzykowski et al. A Universal Latent Fingerprint Enhancer Using Transformers
Han et al. Semantic-Aware Face Deblurring with Pixel-Wise Projection Discriminator
Khan et al. Perceptual adversarial non-residual learning for blind image denoising

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40008583

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant