CN116739075A - Unsupervised pre-training method of neural network for image processing - Google Patents
Unsupervised pre-training method of neural network for image processing
- Publication number: CN116739075A (application CN202310656829.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- loss
- input
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06N3/088 — Non-supervised learning, e.g. competitive learning
- G06N3/048 — Activation functions
- G06N3/0499 — Feedforward networks
- G06V10/40 — Extraction of image or video features
- G06V10/764 — Image or video recognition using classification, e.g. of video objects
- G06V10/82 — Image or video recognition using neural networks
Abstract
The invention relates to the technical field of unsupervised learning of neural networks, and in particular to an unsupervised pre-training method for a neural network for image processing, comprising the following steps: first divide an image into patches and apply a masking operation; then compute a perceptual loss, a contrastive loss, and a reconstruction loss; finally train with these losses. After training, an input image is processed by the trained model to obtain a class feature vector and a reconstructed image vector. The perceptual loss measures the influence of the masking operation on the neural network, the contrastive loss makes the learned features more discriminative, and the reconstruction loss teaches the network how to abstract an image into features while reducing the information lost during abstraction, thereby improving the network's ability to extract image features.
Description
Technical Field
The invention relates to the technical field of unsupervised learning of neural networks, in particular to an unsupervised pre-training method of a neural network for image processing.
Background
As neural networks have developed, machine learning's appetite for data has grown, but building dataset labels is a time-consuming and laborious task. In particular, datasets are now scaled in the billions, and manually tagging them would take a near-astronomical amount of time. Thus, to alleviate this data hunger, unsupervised learning approaches can be employed.
Common unsupervised learning algorithms are classified into clustering, dimensionality reduction, and self-supervised learning. Clustering is among the earliest unsupervised algorithms; it partitions elements by minimizing intra-class distances and maximizing inter-class distances, a problem that is NP-hard. Existing clustering methods converge quickly and give good results when the elements are few in number and low in dimension, but the cost of clustering becomes great when facing high-dimensional features. Dimensionality-reduction methods map high-dimensional data to a low-dimensional space through some mapping while preserving the distance relations of the original data, but they cannot capture abstract links between data points.
Self-supervised learning uses auxiliary tasks to mine supervision signals from the data itself, trains a neural network with the constructed supervision, and extracts the features required by downstream tasks. It can be divided into two main directions: generation-based and discrimination-based. One of the first generation-based architectures is the autoencoder. The data is first fed into an encoder that makes the neural network learn its features, known as encoding; the learned features are then used to reconstruct the original input data with a decoder, known as decoding. The goal of the autoencoder is to make the reconstructed data differ as little as possible from the input data; the encoder is then the desired feature extractor. The denoising autoencoder later proposed obtaining a more general feature-extraction capability by "zeroing out" certain entries. The masked autoencoder is inspired by denoising autoencoders and by BERT in the natural-language domain; it was found that constructing noise with a large proportion (about 75%) of 16 x 16 masks forces the network to learn higher-order semantic information. The other main direction is discrimination-based. CPC uses the InfoNCE loss to build an autoregressive model that predicts latent-space features by contrast, inspiring the direction of obtaining supervision by comparing differences between samples. SimCLR uses a Siamese (twin) network to generate two different augmented views of the same image; the two views are taken as positive samples, and the augmented views of the other images in the batch as negative samples, yielding contrastive information.
Existing self-supervised learning methods still suffer from insufficient generalization ability, and in particular from insufficient image-feature-extraction ability when processing images at large scale.
Disclosure of Invention
The invention aims to provide an unsupervised pre-training method for a neural network for image processing, which reduces the loss of image features during abstraction through a perceptual loss, a contrastive loss, and a reconstruction loss, and solves the technical problem of insufficient image-feature-extraction capability in existing unsupervised pre-training methods.
To achieve the above object, the present invention provides an unsupervised pretraining method for a neural network for image processing, comprising the steps of:
step 1: introducing a dataset having a plurality of classes of sample images;
step 2: performing a masking operation on the images of the input dataset to obtain an original dataset and a masked dataset of the images, respectively;
step 3: dividing the neural network into a plurality of stages, using a plurality of vision transformers as a backbone network in each stage, inputting the original dataset and the masked dataset respectively, and recording the difference between the two outputs of each stage as a perceptual loss;
step 4: at the last layer of the neural network, obtaining the vision transformer output and dividing it into a class token and image tokens; for the class token, calculating the difference between the network output for the masked input of an image and the network outputs for the original input of that image and of other images, recorded as a contrastive loss; for the image tokens, calculating the difference between the network output for the masked input and the pixel values of the original image, recorded as a reconstruction loss;
step 5: training the neural network using the perceptual loss, the contrastive loss, and the reconstruction loss together as a total loss function;
step 6: after training, the model takes an image as input and outputs a class feature vector and a reconstructed image vector.
Optionally, the process of masking the images input into the dataset comprises the following steps:
Define an image batch as $X \in \mathbb{R}^{B \times H \times W \times C}$, where B is the number of input images per batch and H, W, C are the height, width, and channel dimensions of the image, respectively. First divide each image into patches of size $P \times P$, and define the set of patches $Patches$, with $N = HW/P^2$ patches per image. Input the patches into a linear layer of the neural network to obtain their vector-form embeddings, and concatenate a randomly initialized class token $V_{CLS}$, obtaining the token set $T$, where D is the feature dimension, expressed as:
$T = \mathrm{Concat}(Patches, V_{CLS})$
Set a mask rate $m_r \in [0, 1]$ and construct a mask $M \in \{0, 1\}^{B \times N}$ satisfying $(\sum_{M[i]=1} 1)/N \approx m_r$. Construct a randomly initialized mask vector $V_{mask}$ and thereby build the masked input from the mask, formulating the operation as:
$T_{mask}[b, i] = V_{mask}$ if $M[b, i] = 1$, else $T[b, i]$
where $M[b, i]$ denotes the mask flag of mask M for the i-th patch token, and a value of 1 means that this patch token is masked (overwritten by the mask vector).
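As an illustrative sketch (not part of the patent text), the patch splitting and masking above can be written in NumPy. The function names, the row-major patch ordering, and the uniform random choice of masked positions are assumptions; the patent only fixes the mask rate constraint, not the sampling scheme.

```python
import numpy as np

def patchify(images, patch):
    # images: (B, H, W, C) -> (B, N, patch*patch*C) with N = (H//patch)*(W//patch)
    B, H, W, C = images.shape
    gh, gw = H // patch, W // patch
    x = images.reshape(B, gh, patch, gw, patch, C)
    x = x.transpose(0, 1, 3, 2, 4, 5).reshape(B, gh * gw, patch * patch * C)
    return x

def apply_mask(tokens, mask_rate, rng, mask_vector):
    # Replace ~mask_rate of the N patch tokens of each image with a shared
    # (here: fixed) mask vector; returns the masked tokens and the mask M.
    B, N, D = tokens.shape
    n_mask = int(round(N * mask_rate))
    masked = tokens.copy()
    mask = np.zeros((B, N), dtype=bool)
    for b in range(B):
        idx = rng.choice(N, size=n_mask, replace=False)  # uniform sampling (assumption)
        mask[b, idx] = True
        masked[b, idx] = mask_vector
    return masked, mask
```

In practice the mask vector would be a learnable parameter; here it is a plain array for illustration.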
Optionally, the neural network is divided into a plurality of stages, and the difference between the two outputs of each stage is computed and recorded as the perceptual loss, comprising the following steps:
step 3.1: dividing the neural network f into n stages, with stage i denoted $\mathrm{Stage}_i(X)$:
$f_j(X) = \mathrm{Stage}_j \odot \mathrm{Stage}_{j-1} \odot \cdots \odot \mathrm{Stage}_1(X)$
where "$\odot$" denotes function composition;
step 3.2: each stage $\mathrm{Stage}_i(X)$ contains $\lambda_i$ vision transformers, whose flow is expressed as:
$X'^{(l)} = X^{(l)} + \mathrm{MSA}(\mathrm{LN}(X^{(l)}))$
$X^{(l+1)} = X'^{(l)} + \mathrm{FFN}(\mathrm{LN}(X'^{(l)}))$
where l denotes the l-th layer of the network, LN denotes layer normalization, MSA the multi-head self-attention mechanism, and FFN the feed-forward network;
the flow of MSA is formulated as:
$\mathrm{MSA}(X) = \mathrm{Concat}(\mathrm{SelfAttention}^{(1)}(X), \ldots, \mathrm{SelfAttention}^{(N_h)}(X))$
where Concat refers to the concatenation operation and $N_h$ to the number of attention heads, i.e. the $N_h$ attention outputs are concatenated; the attention of the h-th head is defined as:
$\mathrm{SelfAttention}^{(h)}(X) := [\phi^{(h)}(X)]\,V^{(h)}$
where $\phi^{(h)}(X)$ is a function that provides spatial attention based on the content of the input data and serves to aggregate $V^{(h)}$; it is defined as:
$\phi^{(h)}(X) := \mathrm{softmax}\!\big(X W_Q^{(h)} (X W_K^{(h)})^{T} / \tau_\phi\big), \qquad V^{(h)} = X W_V^{(h)}$
where $W_Q^{(h)}, W_K^{(h)}, W_V^{(h)}$ are linear projection matrices and $\tau_\phi$ is a temperature parameter;
the flow of FFN is described as:
$\mathrm{FFN}(X) = \sigma(X W_1) W_2$
where $W_1, W_2$ are linear projection matrices and σ is an activation function;
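The per-stage transformer block above can be sketched in NumPy. This is a minimal single-example (unbatched) illustration; the weight shapes, the output projection `Wo` after head concatenation, and the tanh approximation of GeLU are assumptions not fixed by the patent text.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # LN: normalize each token over the feature dimension.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def attention_head(x, Wq, Wk, Wv, tau):
    # phi(X) = softmax(Q K^T / tau) provides spatial attention and aggregates V.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / tau) @ v

def vit_block(x, heads, Wo, W1, W2, tau):
    # Pre-LN block: X' = X + MSA(LN(X)); X_next = X' + FFN(LN(X')).
    msa = np.concatenate([attention_head(layer_norm(x), Wq, Wk, Wv, tau)
                          for Wq, Wk, Wv in heads], axis=-1) @ Wo
    x = x + msa
    gelu = lambda z: 0.5 * z * (1 + np.tanh(np.sqrt(2 / np.pi) * (z + 0.044715 * z**3)))
    return x + gelu(layer_norm(x) @ W1) @ W2  # FFN(X) = sigma(X W1) W2
```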
step 3.3: calculating the perceptual loss, evaluated only over the masked area:
$\mathcal{L}_{perc} = \sum_{j} \xi_j \cdot \dfrac{1}{\sum_{b,i} M[b,i]} \sum_{b,\,i:\,M[b,i]=1} \big\| f_j(T)[b,i] - f_j(T_{mask})[b,i] \big\|$
where $\xi_j$ is a hyperparameter coefficient weighting the perceptual loss of each stage, with $\xi_j < \xi_{j+1}$, and $T[b, i]$ denotes the i-th patch token of the b-th image.
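The stage-wise perceptual loss might be sketched as follows in NumPy. The L1 distance and the mean reduction over masked positions are assumptions where the patent's exact norm is not recoverable from the garbled formula image.

```python
import numpy as np

def perceptual_loss(stage_outs_orig, stage_outs_masked, mask, xis):
    # Sum over stages j of xi_j times the mean discrepancy, at masked
    # positions only, between the stage outputs for the original input
    # and for the masked input. xis should be increasing (xi_j < xi_{j+1}).
    total = 0.0
    for xi, t_orig, t_masked in zip(xis, stage_outs_orig, stage_outs_masked):
        diff = np.abs(t_orig - t_masked)   # (B, N, D) per-token discrepancy
        total += xi * diff[mask].mean()    # restrict to masked patches
    return total
```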
Optionally, the output of the network fed the original image is re-divided into image tokens $T_{img}$ and a class token $T_{CLS}$; likewise, the output of the network fed the masked image is divided into $\hat{T}_{img}$ and $\hat{T}_{CLS}$.
Construct a similarity function sim(·) to measure the similarity between class tokens, with the formula:
$\mathrm{sim}(a, b) = a^{T} b$
A cross-entropy function is used as the contrastive loss:
$\mathcal{L}_{con} = -\dfrac{1}{B} \sum_{b=1}^{B} \log \dfrac{\exp(\mathrm{sim}(\hat{T}_{CLS}[b], T_{CLS}[b]) / \tau)}{\sum_{b'=1}^{B} \exp(\mathrm{sim}(\hat{T}_{CLS}[b], T_{CLS}[b']) / \tau)}$
where $T_{CLS}[b]$ and $\hat{T}_{CLS}[b]$ refer to the class tokens corresponding to the b-th image of the input data, and τ is a temperature parameter for controlling the inter-class distance.
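An InfoNCE-style reading of this contrastive loss, sketched in NumPy: for image b the positive is its own original class token, and the negatives are the other images' original class tokens. The log-sum-exp stabilization is an implementation detail, not in the patent.

```python
import numpy as np

def contrastive_loss(cls_masked, cls_orig, tau):
    # cls_masked, cls_orig: (B, D) class tokens; sim(a, b) = a^T b.
    logits = (cls_masked @ cls_orig.T) / tau         # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()                 # cross entropy on the diagonal
```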
The L1 distance is used to calculate the reconstruction loss over the masked region:
$\mathcal{L}_{rec} = \dfrac{1}{\sum_{b,i} M[b,i]} \sum_{b,\,i:\,M[b,i]=1} \big| \hat{T}_{img}[b,i] - Patches[b,i] \big|$
where $Patches[b, i]$ and $\hat{T}_{img}[b, i]$ refer to the i-th patch of the b-th image in the original image and in the output, respectively.
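The masked L1 reconstruction loss can be sketched likewise in NumPy; averaging over all elements of the masked patches is an assumption about the reduction, which the garbled formula does not fix.

```python
import numpy as np

def reconstruction_loss(pred_patches, orig_patches, mask):
    # L1 distance between reconstructed patch tokens and the original
    # pixel patches, computed over masked positions only.
    return np.abs(pred_patches[mask] - orig_patches[mask]).mean()
```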
Optionally, the total loss function is:
$\mathcal{L} = \mathcal{L}_{perc} + \beta \mathcal{L}_{con} + \gamma \mathcal{L}_{rec}$
where ξ, β, and γ are hyperparameter coefficients; ξ refers to the set of per-stage perceptual-loss weights $\xi_j$ applied inside $\mathcal{L}_{perc}$.
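Combining the three terms is then a weighted sum; in this sketch the β and γ defaults are placeholders, not values taken from the patent.

```python
def total_loss(l_perc, l_con, l_rec, beta=1.0, gamma=1.0):
    # L = L_perc (already xi-weighted per stage) + beta * L_con + gamma * L_rec
    return l_perc + beta * l_con + gamma * l_rec
```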
In summary, the invention provides an unsupervised pre-training method for a neural network for image processing, comprising: first dividing an image into patches and applying a masking operation; then computing a perceptual loss, a contrastive loss, and a reconstruction loss; and finally training with these losses. After training, an input image is processed by the trained model to obtain a class feature vector and a reconstructed image vector. The perceptual loss measures the influence of the masking operation on the neural network, the contrastive loss makes the learned features more discriminative, and the reconstruction loss teaches the network how to abstract an image into features while reducing the information lost during abstraction, thereby improving the network's ability to extract image features.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the unsupervised pre-training method of the neural network for image processing of the present invention.
Fig. 2 is an original image data diagram of a pre-training input of an embodiment of the present invention.
Fig. 3 is a schematic diagram showing the effect of masking operation according to an embodiment of the present invention.
Fig. 4 is a training effect diagram of a training network in accordance with an embodiment of the present invention.
Fig. 5 is a schematic diagram of network output without masking operation according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The invention provides an embodiment of an unsupervised pre-training method for a neural network for image classification, comprising the following steps:
s1: introducing a dataset having a plurality of types of sample images;
s2: performing masking operation on the image input of the input data set to respectively obtain a raw data set and a masked data set of the image;
s3: dividing the neural network into a plurality of stages, using a plurality of vision converters as a backbone network in each stage, respectively inputting an original data set and a masked data set, checking the difference output by the two stages, and recording as a perception loss;
s4: at the last layer of the neural network, the output of the visual transducer is obtained and divided into a classification unit and an image unit; for the classification unit, calculating the difference between mask input of the image and neural network output of original input of the image and original input of other images, and recording the difference as contrast loss; for an image unit, calculating the difference between the output of the neural network input by the image mask and the pixel value of the original image, and recording the difference as reconstruction loss;
s5: training the neural network by using the perceived loss, the contrast loss and the reconstruction loss together as a total loss function;
s6: after training, the model inputs the image and outputs the category feature vector and the reconstructed image vector.
The detailed flow of steps is shown in figure 1.
Further, the following describes the steps of the present invention in connection with the specific embodiments:
In step S1, the image dataset is ImageNet-1K, which contains more than 1.4 million images across 1000 image categories.
The steps of masking the images in step S2 are:
2.1 Define an image batch as $X \in \mathbb{R}^{B \times H \times W \times C}$, where B is the number of input images per batch and H, W, C are the height, width, and channel dimensions of the image, respectively. First divide each image into patches, setting the patch size to P×P = 16×16, and define the set of patches $Patches$. Input the patches into a linear layer of the neural network to obtain their vector-form embeddings, and concatenate a randomly initialized class token $V_{CLS}$, obtaining the token set $T$, where D is the feature dimension. Described by the formula as:
$T = \mathrm{Concat}(Patches, V_{CLS})$   (1)
2.2 Set a mask rate $m_r = 0.75$. Construct a mask $M \in \{0, 1\}^{B \times N}$ satisfying $m_r \approx (\sum_{M[i]=1} 1)/N$. Construct a randomly initialized mask vector $V_{mask}$, thereby building the masked input from the mask. This operation is described by the formula:
$T_{mask}[b, i] = V_{mask}$ if $M[b, i] = 1$, else $T[b, i]$   (2)
where $M[b, i]$ refers to the mask flag of mask M corresponding to the i-th patch token; a value of 1 indicates that this patch token is masked, i.e. the mask vector overwrites the original feature vector.
In step S3, the steps of checking the output difference of each stage and recording it as the perceptual loss are:
3.1 Divide the neural network f into n stages, with stage i denoted $S_i(X)$:
$f_j(X) = S_j(S_{j-1}(\cdots S_1(X)))$   (3)
3.2 Each stage $S_i(X)$ contains $\lambda_i$ vision transformers, set as $\{\lambda_i\} = \{2, 2, 6, 2\}$. The vision transformer flow is formulated as:
$X'^{(l)} = X^{(l)} + \mathrm{MSA}(\mathrm{LN}(X^{(l)})), \qquad X^{(l+1)} = X'^{(l)} + \mathrm{FFN}(\mathrm{LN}(X'^{(l)}))$   (4)
where l denotes the l-th layer of the network, LN refers to layer normalization, MSA to the multi-head self-attention mechanism, and FFN to the feed-forward network.
The flow of MSA is formulated as:
$\mathrm{MSA}(X) = \mathrm{Concat}(\mathrm{SelfAttention}^{(1)}(X), \ldots, \mathrm{SelfAttention}^{(N_h)}(X))$   (5)
where Concat refers to the concatenation operation and $N_h$ to the number of attention heads, i.e. the $N_h$ attention outputs are concatenated. The attention of the h-th head is defined as:
$\mathrm{SelfAttention}^{(h)}(X) := [\phi^{(h)}(X)]\,V^{(h)}$   (6)
where $\phi^{(h)}(X)$ is a function that provides spatial attention based on the content of the input data; its function is to aggregate $V^{(h)}$. It is defined as:
$\phi^{(h)}(X) := \mathrm{softmax}\!\big(X W_Q^{(h)} (X W_K^{(h)})^{T} / \tau_\phi\big), \qquad V^{(h)} = X W_V^{(h)}$   (7)
where $W_Q^{(h)}, W_K^{(h)}, W_V^{(h)}$ are linear projection matrices and $\tau_\phi$ is a temperature parameter.
The flow of FFN is described as:
$\mathrm{FFN}(X) = \sigma(X W_1) W_2$   (8)
where $W_1, W_2$ are linear projection matrices and σ is the GeLU activation function.
3.3 Compute the perceptual loss, evaluated only over the masked area:
$\mathcal{L}_{perc} = \sum_{j} \xi_j \cdot \dfrac{1}{\sum_{b,i} M[b,i]} \sum_{b,\,i:\,M[b,i]=1} \big\| f_j(T)[b,i] - f_j(T_{mask})[b,i] \big\|$   (9)
where $\xi_j$ is a hyperparameter coefficient weighting the perceptual loss of each stage, with $\xi_j < \xi_{j+1}$, and $T[b, i]$ denotes the i-th patch token of the b-th image.
The steps of calculating the contrastive loss and the reconstruction loss in step S4 are:
4.1 Re-divide the output of the network fed the original image into image tokens $T_{img}$ and a class token $T_{CLS}$; likewise divide the output of the network fed the masked image into $\hat{T}_{img}$ and $\hat{T}_{CLS}$.   (10)
4.2 Construct a similarity function sim(·) to measure similarity between class tokens:
$\mathrm{sim}(a, b) = a^{T} b$   (11)
A cross-entropy function is used as the contrastive loss:
$\mathcal{L}_{con} = -\dfrac{1}{B} \sum_{b=1}^{B} \log \dfrac{\exp(\mathrm{sim}(\hat{T}_{CLS}[b], T_{CLS}[b]) / \tau)}{\sum_{b'=1}^{B} \exp(\mathrm{sim}(\hat{T}_{CLS}[b], T_{CLS}[b']) / \tau)}$   (12)
where $T_{CLS}[b]$ and $\hat{T}_{CLS}[b]$ refer to the class tokens corresponding to the b-th image of the input data, and τ is a temperature parameter for controlling the inter-class distance.
4.3 Use the L1 distance to calculate the reconstruction loss over the masked region:
$\mathcal{L}_{rec} = \dfrac{1}{\sum_{b,i} M[b,i]} \sum_{b,\,i:\,M[b,i]=1} \big| \hat{T}_{img}[b,i] - Patches[b,i] \big|$   (13)
where $Patches[b, i]$ and $\hat{T}_{img}[b, i]$ refer to the i-th patch of the b-th image in the original image and in the output, respectively.
The total loss in step S5 is calculated as:
$\mathcal{L} = \mathcal{L}_{perc} + \beta \mathcal{L}_{con} + \gamma \mathcal{L}_{rec}$   (14)
where ξ, β, and γ are hyperparameter coefficients; ξ refers to the set of per-stage weights $\xi_j$ appearing in formula (9).
The neural network is trained with existing neural network training tools for a suitable number of epochs.
The downstream-task fine-tuning process after training in step S6 is:
6.1 Input a batch of image data into the neural network $f_n$, obtain the model's final vision transformer output, and divide it into the class token and the image tokens. The class token is passed through a linear layer and an activation function to produce a one-hot-style output $\hat{P}$.
6.2 Train the entire network using the cross-entropy function as the classification loss, with existing neural network training tools, for a suitable number of epochs.
Finally, the process of performing a classification task with the network is as follows: compute the category of each image, described as:
$\mathrm{Class}[b] = \arg\max_i \hat{P}[b, i]$
where Class is a series of positive integers indicating the most likely class index that the neural network assigns to each image in the batch, and $\hat{P}[b, i]$ denotes the one-hot-style predicted probability that the network assigns image b to the i-th class.
Further, an embodiment of the present invention is provided to assist the explanation; the effect of the pre-training is shown in Figs. 2 to 5. Specifically, Fig. 2 illustrates the input image data, Fig. 3 the masking operation applied by the network, Fig. 4 the training effect of the pre-training network, and Fig. 5 the network output without the masking operation. As can be seen from Fig. 4, the invention enables the network to extract higher-level semantics and gives it a degree of reasoning capability. As can be seen from Fig. 5, the invention allows the neural network to retain much of the structural and color information of the original image, which helps in training downstream tasks.
The above disclosure is only a preferred embodiment of the present invention, and it should be understood that the scope of the invention is not limited thereto, and those skilled in the art will appreciate that all or part of the procedures described above can be performed according to the equivalent changes of the claims, and still fall within the scope of the present invention.
Claims (5)
1. An unsupervised pre-training method for a neural network for image processing, comprising the following steps:
step 1: introducing a dataset having a plurality of classes of sample images;
step 2: performing a masking operation on the images of the input dataset to obtain an original dataset and a masked dataset of the images, respectively;
step 3: dividing the neural network into a plurality of stages, using a plurality of vision transformers as a backbone network in each stage, inputting the original dataset and the masked dataset respectively, and recording the difference between the two outputs of each stage as a perceptual loss;
step 4: at the last layer of the neural network, obtaining the vision transformer output and dividing it into a class token and image tokens; for the class token, calculating the difference between the network output for the masked input of an image and the network outputs for the original input of that image and of other images, recorded as a contrastive loss; for the image tokens, calculating the difference between the network output for the masked input and the pixel values of the original image, recorded as a reconstruction loss;
step 5: training the neural network using the perceptual loss, the contrastive loss, and the reconstruction loss together as a total loss function;
step 6: after training, the model takes an image as input and outputs a class feature vector and a reconstructed image vector.
2. The unsupervised pre-training method for an image-processing neural network according to claim 1, wherein the process of masking the images input into the dataset comprises the following steps:
defining an image batch as $X \in \mathbb{R}^{B \times H \times W \times C}$, where B is the number of input images per batch and H, W, C are the height, width, and channel dimensions of the image, respectively; first dividing each image into patches of size $P \times P$, and defining the set of patches $Patches$, with $N = HW/P^2$ patches per image; inputting the patches into a linear layer of the neural network to obtain their vector-form embeddings, and concatenating a randomly initialized class token $V_{CLS}$, obtaining the token set $T$, where D is the feature dimension, expressed as:
$T = \mathrm{Concat}(Patches, V_{CLS})$
setting a mask rate $m_r \in [0, 1]$ and constructing a mask $M \in \{0, 1\}^{B \times N}$ satisfying $(\sum_{M[i]=1} 1)/N \approx m_r$; constructing a randomly initialized mask vector $V_{mask}$ and thereby building the masked input from the mask, formulating the operation as:
$T_{mask}[b, i] = V_{mask}$ if $M[b, i] = 1$, else $T[b, i]$
where $M[b, i]$ denotes the mask flag of mask M for the i-th patch token, and a value of 1 means that this patch token is masked.
3. The unsupervised pre-training method for an image-processing neural network according to claim 2, wherein the process of dividing the neural network into a plurality of stages, computing the difference between the two outputs of each stage, and recording it as the perceptual loss comprises the following steps:
step 3.1: dividing the neural network f into n stages, with stage i denoted $\mathrm{Stage}_i(X)$:
$f_j(X) = \mathrm{Stage}_j \odot \mathrm{Stage}_{j-1} \odot \cdots \odot \mathrm{Stage}_1(X)$
where "$\odot$" denotes function composition;
step 3.2: each stage $\mathrm{Stage}_i(X)$ contains $\lambda_i$ vision transformers, whose flow is expressed as:
$X'^{(l)} = X^{(l)} + \mathrm{MSA}(\mathrm{LN}(X^{(l)}))$
$X^{(l+1)} = X'^{(l)} + \mathrm{FFN}(\mathrm{LN}(X'^{(l)}))$
where l denotes the l-th layer of the network, LN denotes layer normalization, MSA the multi-head self-attention mechanism, and FFN the feed-forward network,
the flow of MSA being formulated as:
$\mathrm{MSA}(X) = \mathrm{Concat}(\mathrm{SelfAttention}^{(1)}(X), \ldots, \mathrm{SelfAttention}^{(N_h)}(X))$
where Concat refers to the concatenation operation and $N_h$ to the number of attention heads, i.e. the $N_h$ attention outputs are concatenated, the attention of the h-th head being defined as:
$\mathrm{SelfAttention}^{(h)}(X) := [\phi^{(h)}(X)]\,V^{(h)}$
where $\phi^{(h)}(X)$ is a function that provides spatial attention based on the content of the input data and serves to aggregate $V^{(h)}$, defined as:
$\phi^{(h)}(X) := \mathrm{softmax}\!\big(X W_Q^{(h)} (X W_K^{(h)})^{T} / \tau_\phi\big), \qquad V^{(h)} = X W_V^{(h)}$
where $W_Q^{(h)}, W_K^{(h)}, W_V^{(h)}$ are linear projection matrices and $\tau_\phi$ is a temperature parameter,
the flow of FFN being described as:
$\mathrm{FFN}(X) = \sigma(X W_1) W_2$
where $W_1, W_2$ are linear projection matrices and σ is an activation function;
step 3.3: calculating the perceptual loss, evaluated only over the masked area:
$\mathcal{L}_{perc} = \sum_{j} \xi_j \cdot \dfrac{1}{\sum_{b,i} M[b,i]} \sum_{b,\,i:\,M[b,i]=1} \big\| f_j(T)[b,i] - f_j(T_{mask})[b,i] \big\|$
where $\xi_j$ is a hyperparameter coefficient weighting the perceptual loss of each stage, with $\xi_j < \xi_{j+1}$, and $T[b, i]$ denotes the i-th patch token of the b-th image.
4. The unsupervised pre-training method for an image-processing neural network according to claim 3, wherein the specific implementation of step 4 comprises the following steps:
re-dividing the output of the network fed the original image into image tokens $T_{img}$ and a class token $T_{CLS}$, and likewise dividing the output of the network fed the masked image into $\hat{T}_{img}$ and $\hat{T}_{CLS}$;
constructing a similarity function sim(·) to measure the similarity between class tokens, with the formula:
$\mathrm{sim}(a, b) = a^{T} b$
a cross-entropy function being used as the contrastive loss:
$\mathcal{L}_{con} = -\dfrac{1}{B} \sum_{b=1}^{B} \log \dfrac{\exp(\mathrm{sim}(\hat{T}_{CLS}[b], T_{CLS}[b]) / \tau)}{\sum_{b'=1}^{B} \exp(\mathrm{sim}(\hat{T}_{CLS}[b], T_{CLS}[b']) / \tau)}$
where $T_{CLS}[b]$ and $\hat{T}_{CLS}[b]$ correspond to the b-th image of the input data, and τ is a temperature parameter for controlling the inter-class distance;
the L1 distance being used to calculate the reconstruction loss over the masked region:
$\mathcal{L}_{rec} = \dfrac{1}{\sum_{b,i} M[b,i]} \sum_{b,\,i:\,M[b,i]=1} \big| \hat{T}_{img}[b,i] - Patches[b,i] \big|$
where $Patches[b, i]$ and $\hat{T}_{img}[b, i]$ refer to the i-th patch of the b-th image in the original image and in the output, respectively.
5. The unsupervised pre-training method for an image-processing neural network according to claim 4, wherein the total loss function is:
$\mathcal{L} = \mathcal{L}_{perc} + \beta \mathcal{L}_{con} + \gamma \mathcal{L}_{rec}$
where ξ, β, and γ are hyperparameter coefficients; ξ refers to the set of per-stage perceptual-loss weights $\xi_j$ of step 3.3, applied inside $\mathcal{L}_{perc}$.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310656829.6A | 2023-06-05 | 2023-06-05 | Unsupervised pre-training method of neural network for image processing |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN116739075A | 2023-09-12 |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117459737A | 2023-12-22 | 2024-01-26 | 中国科学技术大学 | Training method of image preprocessing network and image preprocessing method |
| CN117459737B | 2023-12-22 | 2024-03-29 | 中国科学技术大学 | Training method of image preprocessing network and image preprocessing method |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |