CN116739075A - Unsupervised pre-training method of neural network for image processing - Google Patents

Unsupervised pre-training method of neural network for image processing

Info

Publication number
CN116739075A
Authority
CN
China
Prior art keywords
image
neural network
loss
input
stage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310656829.6A
Other languages
Chinese (zh)
Inventor
蓝如师
陈颖贤
罗笑南
杨睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Guidian Electronic Technology Research Institute Co ltd
Guilin University of Electronic Technology
Original Assignee
Nanning Guidian Electronic Technology Research Institute Co ltd
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Guidian Electronic Technology Research Institute Co ltd, Guilin University of Electronic Technology filed Critical Nanning Guidian Electronic Technology Research Institute Co ltd
Priority to CN202310656829.6A priority Critical patent/CN116739075A/en
Publication of CN116739075A publication Critical patent/CN116739075A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention relates to the technical field of unsupervised learning of neural networks, and in particular to an unsupervised pre-training method of a neural network for image processing, comprising the following steps: first, the image is divided into image blocks and a masking operation is performed; a perceptual loss, a contrastive loss and a reconstruction loss are then calculated; finally the network is trained with these losses. After training, an input image is processed by the trained model to obtain a class feature vector and a reconstructed image vector. In the invention, the perceptual loss measures the influence of the masking operation on the neural network, the contrastive loss makes the features learned by the neural network more discriminative, and the reconstruction loss teaches the network how to abstract the image into features while reducing the information lost in the abstraction process, thereby improving the feature extraction capability of the neural network on images.

Description

Unsupervised pre-training method of neural network for image processing
Technical Field
The invention relates to the technical field of unsupervised learning of neural networks, in particular to an unsupervised pre-training method of a neural network for image processing.
Background
As neural networks have developed, the appetite of machine learning for data has grown, but creating dataset labels is a time-consuming and laborious task. In particular, datasets now reach the scale of billions of samples, and the cost of labeling them by hand is nearly astronomical. Therefore, to alleviate this data hunger, unsupervised learning methods can be employed.
Common unsupervised learning algorithms are classified into clustering, dimensionality reduction and self-supervised learning. Clustering is one of the earliest unsupervised algorithms; it partitions elements by minimizing intra-class distances and maximizing inter-class distances, a problem that is NP-hard. Existing clustering methods converge quickly and give good results when the number of elements is small and their dimensionality is low, but the cost of clustering becomes very large when facing high-dimensional features. Dimensionality reduction maps high-dimensional data to a low-dimensional space through some mapping while preserving the distance relations of the original data, but it cannot capture abstract links between data.
Self-supervised learning uses auxiliary tasks to mine supervision information from the data itself, trains a neural network with this constructed supervision, and extracts the features required by downstream tasks. It can be divided into two main directions: generation-based and discrimination-based. One of the earliest generation-based architectures is the autoencoder. The data are first fed into an encoder that makes the neural network learn their characteristics, a step known as encoding. The learned features are then used by a decoder to reconstruct the original input data, a step referred to as decoding. The goal of the autoencoder is to make the reconstructed data differ as little as possible from the input data; the encoder is then the desired feature extractor. The denoising autoencoder obtains more general feature extraction capability by "zeroing out" certain entries of the input. Masked autoencoders are inspired by denoising autoencoders and by BERT in the natural language domain; it was found that constructing noise by masking a large proportion (about 75%) of 16 x 16 blocks forces the network to learn higher-order semantic information. The other main direction is discrimination-based. CPC uses the InfoNCE loss to build an autoregressive model that predicts latent-space features by contrast, which inspired the direction of obtaining supervision information by comparing differences between samples. SimCLR uses a twin (Siamese) network to generate two different augmented views of the same image; the two views are taken as positive samples, and the augmented views of the other images in the batch are taken as negative samples, so as to obtain contrastive information. Existing self-supervised learning methods still suffer from insufficient generalization capability, and in particular from insufficient image feature extraction capability when processing images at large scale.
Disclosure of Invention
The invention aims to provide an unsupervised pre-training method for a neural network for image processing, which reduces the loss of image features during abstraction through a perceptual loss, a contrastive loss and a reconstruction loss, and thereby addresses the technical problem of insufficient image feature extraction capability in existing unsupervised pre-training methods.
To achieve the above object, the present invention provides an unsupervised pre-training method for a neural network for image processing, comprising the following steps:
step 1: introducing a dataset having a plurality of types of sample images;
step 2: performing a masking operation on the images input from the dataset to obtain, respectively, an original dataset and a masked dataset of images;
step 3: dividing the neural network into a plurality of stages, with a plurality of vision transformers serving as the backbone network in each stage; inputting the original dataset and the masked dataset separately, and comparing the two outputs of each stage, the difference being recorded as a perceptual loss;
step 4: at the last layer of the neural network, obtaining the vision transformer output and dividing it into a class unit and image units; for the class unit, calculating the difference between the neural network output for the masked input of an image and the outputs for the original input of the same image and the original inputs of other images, recorded as a contrastive loss; for the image units, calculating the difference between the neural network output for the masked input and the pixel values of the original image, recorded as a reconstruction loss;
step 5: training the neural network using the perceptual loss, the contrastive loss and the reconstruction loss together as the total loss function;
step 6: after training, the trained model takes an image as input and outputs a class feature vector and a reconstructed image vector.
Optionally, the process of masking the images input into the dataset comprises the following steps:
defining an image as X ∈ R^(B×H×W×C), where B refers to the number of input images per batch, and H, W and C refer to the height, width and channel dimensions of the image, respectively; first dividing the image into a plurality of image blocks of size P×P, and defining the set of divided image blocks as Patches ∈ R^(B×N×(P·P·C)), where N = HW/P² is the number of blocks; inputting the blocks into a linear layer of the neural network to obtain their vector-form embeddings, and concatenating a randomly initialized class unit V_CLS ∈ R^(B×1×D) to obtain the image block feature set T ∈ R^(B×(N+1)×D), where D refers to the feature dimension; this is expressed as:
T = Concat(Patches, V_CLS)
setting a mask rate m_r ∈ [0, 1] and constructing a mask M ∈ {0,1}^(B×N) such that (Σ_{M[i]=1} 1)/N ≈ m_r; constructing a randomly initialized mask vector V_MASK ∈ R^D and building the masked input from the mask, the operation being formulated as:
T_m[b, i] = M[i]·V_MASK + (1 − M[i])·T[b, i]
where M[i] denotes the mask flag of mask M for the i-th image block feature, and a value of 1 denotes that this block feature is to be masked, in which case the mask vector overwrites the original feature vector.
Optionally, the neural network is divided into a plurality of stages, and the difference between the two outputs of each stage is computed and recorded as the perceptual loss; the process comprises the following steps:
step 3.1: dividing the neural network f into n stages, with stage i denoted Stage_i(X); the composition of the first j stages is
f_j(X) = Stage_j ⊙ Stage_{j-1} ⊙ ... ⊙ Stage_1(X)
where "⊙" denotes function composition;
step 3.2: each Stage_i(X) contains λ_i vision transformer blocks, the flow of a vision transformer block being expressed as:
X′^(l) = X^(l) + MSA(LN(X^(l)))
X^(l+1) = X′^(l) + FFN(LN(X′^(l)))
where l denotes the l-th layer of the neural network, LN denotes layer normalization, MSA denotes the multi-head self-attention mechanism, and FFN denotes the feed-forward network;
the flow of MSA is formulated as:
MSA(X) = Concat(SelfAttention^(1)(X), ..., SelfAttention^(N_h)(X))
where Concat refers to the concatenation operation and N_h refers to the number of attention heads, i.e. there are N_h attention outputs; the attention of the h-th head is defined as:
SelfAttention^(h)(X) := [φ^(h)(X)] V^(h)
where φ^(h)(X) is a function that provides spatial attention based on the content of the input data, whose role is to aggregate V^(h), the value projection of X; it is defined as:
φ^(h)(X) := softmax((X W_Q^(h))(X W_K^(h))^T / τ_φ)
where W_Q^(h) and W_K^(h) are linear projection matrices and τ_φ is a temperature parameter;
the flow of FFN is described as:
FFN(X) = σ(X W_1) W_2
where W_1 and W_2 are linear projection matrices and σ is an activation function;
step 3.3: calculating the perceptual loss, which is computed only over the masked region: at each stage j, the stage outputs of the original input and of the masked input are compared at the masked image block positions, and the stage-wise differences are accumulated with weights ξ_j,
where ξ_j is a hyper-parameter coefficient weighting the perceptual loss of stage j, with the ξ_j ordered monotonically from stage to stage, and T[b, i] denotes the i-th image block feature of the b-th image.
Optionally, the output of the network fed with the original image is re-divided into image units T_Patches ∈ R^(B×N×D) and a class unit T_CLS ∈ R^(B×1×D); the output of the network fed with the masked image is likewise divided into T̂_Patches and T̂_CLS.
A similarity function sim(·,·) is constructed to measure the similarity between class units, with the formula:
sim(a, b) = a^T b
A cross-entropy function is used as the contrastive loss: for each image b, the class unit T̂_CLS[b] produced by the masked input is compared against the class units T_CLS[b′] produced by the original inputs, where the original input of the same image, i.e. the b-th image of the input data, serves as the positive sample and the original inputs of the other images in the batch serve as negative samples; τ is a temperature parameter used to control the inter-class distance.
The L1 loss is used to calculate the reconstruction loss of the masked region, where Patches[b, i] refers to the i-th image block of the b-th original image and T̂_Patches[b, i] refers to the i-th image block feature of the b-th image in the output of the masked branch.
Optionally, the total loss function is a weighted sum of the perceptual loss, the contrastive loss and the reconstruction loss,
where ξ, β and γ are hyper-parameter coefficients, and ξ refers to the set of per-stage perceptual-loss weights ξ_j.
The invention provides an unsupervised pre-training method of a neural network for image processing, comprising the following steps: first, the image is divided into image blocks and a masking operation is performed; a perceptual loss, a contrastive loss and a reconstruction loss are then calculated; finally the network is trained with these losses. After training, an input image is processed by the trained model to obtain a class feature vector and a reconstructed image vector. In the invention, the perceptual loss measures the influence of the masking operation on the neural network, the contrastive loss makes the features learned by the neural network more discriminative, and the reconstruction loss teaches the network how to abstract the image into features while reducing the information lost in the abstraction process, thereby improving the feature extraction capability of the neural network on images.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an unsupervised pretraining method of the neural network for image processing of the present invention.
Fig. 2 is an original image data diagram of a pre-training input of an embodiment of the present invention.
Fig. 3 is a schematic diagram showing the effect of masking operation according to an embodiment of the present invention.
Fig. 4 is a training effect diagram of a training network in accordance with an embodiment of the present invention.
Fig. 5 is a schematic diagram of network output without masking operation according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The invention provides an embodiment of an unsupervised pre-training method for a neural network for image classification, comprising the following steps:
S1: introducing a dataset having a plurality of types of sample images;
S2: performing a masking operation on the images input from the dataset to obtain, respectively, an original dataset and a masked dataset of images;
S3: dividing the neural network into a plurality of stages, with a plurality of vision transformers serving as the backbone network in each stage; inputting the original dataset and the masked dataset separately, and comparing the two outputs of each stage, the difference being recorded as a perceptual loss;
S4: at the last layer of the neural network, obtaining the vision transformer output and dividing it into a class unit and image units; for the class unit, calculating the difference between the neural network output for the masked input of an image and the outputs for the original input of the same image and the original inputs of other images, recorded as a contrastive loss; for the image units, calculating the difference between the neural network output for the masked input and the pixel values of the original image, recorded as a reconstruction loss;
S5: training the neural network using the perceptual loss, the contrastive loss and the reconstruction loss together as the total loss function;
S6: after training, the trained model takes an image as input and outputs a class feature vector and a reconstructed image vector.
The detailed flow of steps is shown in figure 1.
Further, the following describes the steps of the present invention in connection with the specific embodiments:
in step S1, the image dataset is imageNet-1K. The method comprises 140 or more than ten thousand pictures and 1000 image categories.
The steps of masking the images in step S2 are:
2.1 Define an image as X ∈ R^(B×H×W×C), where B refers to the number of input images per batch, and H, W and C refer to the height, width and channel dimensions of the image, respectively. First divide the image into a plurality of image blocks, with the divided block size set to P×P = 16×16. Define the set of divided image blocks as Patches ∈ R^(B×N×(P·P·C)), where N = HW/P² is the number of blocks. Input the blocks into a linear layer of the neural network to obtain their vector-form embeddings, and concatenate a randomly initialized class unit vector V_CLS ∈ R^(B×1×D) to obtain the image block feature set T ∈ R^(B×(N+1)×D), where D refers to the feature dimension. This is described by the formula
T = Concat(Patches, V_CLS) (1)
2.2 Set a mask rate m_r = 0.75. Construct a mask M ∈ {0,1}^(B×N) such that (Σ_{M[i]=1} 1)/N ≈ m_r. Construct a randomly initialized mask vector V_MASK ∈ R^D, and build the masked input from the mask. This operation is described by the formula
T_m[b, i] = M[i]·V_MASK + (1 − M[i])·T[b, i] (2)
where M[i] refers to the mask flag of mask M for the i-th image block feature; a value of 1 indicates that this block feature is to be masked, in which case the mask vector overwrites the original feature vector.
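By way of illustration only, the following PyTorch sketch shows how steps 2.1 and 2.2 could be implemented; the module name PatchMasker, the default sizes and the initialization choices are assumptions made for the example and are not prescribed by the embodiment.

```python
import torch
import torch.nn as nn

class PatchMasker(nn.Module):
    """Sketch of step 2: split an image into P x P blocks, embed them,
    prepend a class token, and overwrite a random subset with a mask vector."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=768, mask_ratio=0.75):
        super().__init__()
        self.patch_size = patch_size
        self.num_patches = (img_size // patch_size) ** 2               # N = HW / P^2
        self.mask_ratio = mask_ratio                                   # m_r
        self.embed = nn.Linear(patch_size * patch_size * in_chans, dim)  # linear patch embedding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))          # V_CLS, randomly initialised
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))         # V_MASK, randomly initialised
        nn.init.normal_(self.cls_token, std=0.02)
        nn.init.normal_(self.mask_token, std=0.02)

    def patchify(self, x):
        # x: (B, C, H, W) -> Patches: (B, N, P*P*C)
        B, C, H, W = x.shape
        P = self.patch_size
        x = x.unfold(2, P, P).unfold(3, P, P)                          # (B, C, H/P, W/P, P, P)
        return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * P * P)

    def forward(self, x):
        patches = self.patchify(x)                                     # (B, N, P*P*C)
        tokens = self.embed(patches)                                   # (B, N, D)
        B, N, D = tokens.shape
        # Boolean mask M in {0,1}^(B x N) with roughly m_r ones per image.
        num_mask = int(N * self.mask_ratio)
        noise = torch.rand(B, N, device=tokens.device)
        mask = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
        mask.scatter_(1, noise.argsort(dim=1)[:, :num_mask], True)
        # T_m[b, i] = V_MASK where M[b, i] = 1, else T[b, i].
        masked_tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, D), tokens)
        cls = self.cls_token.expand(B, 1, D)
        T = torch.cat([cls, tokens], dim=1)                            # original-branch input
        T_m = torch.cat([cls, masked_tokens], dim=1)                   # masked-branch input
        return patches, T, T_m, mask
```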
In step S3, the steps of checking the output difference of each stage and recording it as the perceptual loss are as follows:
3.1 Divide the neural network f into n stages, with stage i denoted S_i(X); the composition of the first j stages is
f_j(X) = S_j(S_{j-1}(...(S_1(X)))) (3)
3.2 Each stage S_i(X) contains λ_i vision transformer blocks, set here as {λ_i} = {2, 2, 6, 2}. The flow of a vision transformer block is formulated as:
X′^(l) = X^(l) + MSA(LN(X^(l)))
X^(l+1) = X′^(l) + FFN(LN(X′^(l)))
where l represents the l-th layer of the neural network, LN refers to layer normalization, MSA refers to the multi-head self-attention mechanism, and FFN refers to the feed-forward network.
The flow of MSA is formulated as:
MSA(X) = Concat(SelfAttention^(1)(X), ..., SelfAttention^(N_h)(X))
where Concat refers to the concatenation operation and N_h refers to the number of attention heads, i.e. there are N_h attention outputs. The attention of the h-th head is defined as
SelfAttention^(h)(X) := [φ^(h)(X)] V^(h) (6)
where φ^(h)(X) is a function that provides spatial attention based on the content of the input data; its role is to aggregate V^(h), the value projection of X. It is defined as:
φ^(h)(X) := softmax((X W_Q^(h))(X W_K^(h))^T / τ_φ)
where W_Q^(h) and W_K^(h) are linear projection matrices and τ_φ is a temperature parameter.
The flow of FFN is described as:
FFN(X) = σ(X W_1) W_2 (8)
where W_1 and W_2 are linear projection matrices and σ is the GeLU activation function.
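To make the block structure concrete, a minimal PyTorch sketch of one such pre-norm vision transformer block is given below; the class name ViTBlock, the head count, the output projection after head concatenation, and the choice of τ_φ as the square root of the per-head dimension are illustrative assumptions rather than values fixed by the embodiment.

```python
import math
import torch
import torch.nn as nn

class ViTBlock(nn.Module):
    """One pre-norm transformer block: X' = X + MSA(LN(X)); X_next = X' + FFN(LN(X'))."""
    def __init__(self, dim=768, num_heads=12, mlp_ratio=4.0):
        super().__init__()
        self.num_heads = num_heads                       # N_h
        self.head_dim = dim // num_heads
        self.tau = math.sqrt(self.head_dim)              # temperature tau_phi (assumed sqrt of head dim)
        self.norm1 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3)               # W_Q, W_K, W_V for all heads
        self.proj = nn.Linear(dim, dim)                  # output projection after Concat (assumed)
        self.norm2 = nn.LayerNorm(dim)
        hidden = int(dim * mlp_ratio)
        self.w1 = nn.Linear(dim, hidden)                 # W_1
        self.w2 = nn.Linear(hidden, dim)                 # W_2
        self.act = nn.GELU()                             # sigma = GeLU

    def msa(self, x):
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]                 # each (B, N_h, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) / self.tau      # phi(X): softmax(Q K^T / tau_phi)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, D)  # Concat over heads
        return self.proj(out)

    def ffn(self, x):
        return self.w2(self.act(self.w1(x)))             # FFN(X) = sigma(X W1) W2

    def forward(self, x):
        x = x + self.msa(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x
```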
3.3 Calculate the perceptual loss; only the perceptual loss of the masked region is computed: at each stage j, the stage outputs of the original input and of the masked input are compared at the masked image block positions, and the stage-wise differences are accumulated with weights ξ_j,
where ξ_j is a coefficient hyper-parameter weighting the perceptual loss of stage j, with the ξ_j ordered monotonically from stage to stage, and T[b, i] denotes the i-th image block feature of the b-th image.
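A minimal sketch of how such a stage-wise perceptual loss could be computed follows; the use of a mean squared distance per block and the averaging over masked positions are assumptions, since the exact distance measure is not reproduced here.

```python
import torch

def perceptual_loss(stages, T, T_m, mask, xi):
    """Stage-wise perceptual loss on masked block positions only.

    stages: list of n callables, stage j maps (B, N+1, D) -> (B, N+1, D)
    T, T_m: original and masked inputs, each with a class token at position 0
    mask:   (B, N) boolean, True where a block was masked
    xi:     list of n per-stage weights xi_j
    """
    loss = T.new_zeros(())
    x, x_m = T, T_m
    for stage_fn, w in zip(stages, xi):
        x = stage_fn(x)                      # f_j(T), computed stage by stage
        x_m = stage_fn(x_m)                  # f_j(T_m)
        diff = x[:, 1:, :] - x_m[:, 1:, :]   # drop class token, keep block features
        per_block = (diff ** 2).mean(dim=-1)             # assumed squared distance per block
        loss = loss + w * per_block[mask].mean()         # only masked positions contribute
    return loss
```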
The steps of calculating the contrastive loss and the reconstruction loss in step S4 are:
4.1 Re-divide the output of the network fed with the original image into image units T_Patches ∈ R^(B×N×D) and a class unit T_CLS ∈ R^(B×1×D); the output of the network fed with the masked image is likewise divided into T̂_Patches and T̂_CLS.
4.2 Construct a similarity function sim(·,·) to measure the similarity between class units:
sim(a, b) = a^T b (11)
A cross-entropy function is used as the contrastive loss: for each image b, the class unit T̂_CLS[b] produced by the masked input is compared against the class units T_CLS[b′] produced by the original inputs, where the original input of the same image, i.e. the b-th image of the input data, serves as the positive sample and the original inputs of the other images in the batch serve as negative samples; τ is a temperature parameter used to control the inter-class distance.
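The following sketch expresses this cross-entropy contrastive loss in InfoNCE style over a batch; treating the in-batch original-branch class units as the candidate set, with the matching image as the positive, is an assumption consistent with the description above rather than a verbatim reproduction of the formula.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(cls_orig, cls_masked, tau=0.1):
    """cls_orig, cls_masked: (B, D) class units T_CLS and T̂_CLS; tau: temperature."""
    # sim(a, b) = a^T b, computed for every pair in the batch
    logits = cls_masked @ cls_orig.t() / tau             # (B, B)
    targets = torch.arange(cls_orig.size(0), device=cls_orig.device)
    # cross entropy: the positive pair is the same image, negatives are the other images in the batch
    return F.cross_entropy(logits, targets)
```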
4.3 Use the L1 loss to calculate the reconstruction loss of the masked region, where Patches[b, i] refers to the i-th image block of the b-th original image and T̂_Patches[b, i] refers to the i-th image block feature of the b-th image in the output of the masked branch.
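A sketch of the L1 reconstruction loss on the masked blocks follows; the linear prediction head that maps block features back to pixel dimension (pred_head) is an assumed detail added so that the shapes match, and is not spelled out in the text.

```python
import torch

def reconstruction_loss(pred_head, out_patches_masked, patches, mask):
    """pred_head: linear layer mapping D -> P*P*C (assumed prediction head)
    out_patches_masked: (B, N, D) image units T̂_Patches from the masked branch
    patches: (B, N, P*P*C) original pixel blocks; mask: (B, N) boolean."""
    pred = pred_head(out_patches_masked)                  # (B, N, P*P*C)
    l1 = (pred - patches).abs().mean(dim=-1)              # per-block L1 distance
    return l1[mask].mean()                                # only masked blocks contribute
```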
The total loss in step S5 is calculated as a weighted sum of the three losses: the perceptual loss (which already carries its per-stage weights ξ_j), the contrastive loss weighted by β, and the reconstruction loss weighted by γ,
where ξ, β and γ are coefficient hyper-parameters, and ξ refers to the set of ξ_j appearing in formula (9).
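As a small illustrative sketch, the three terms could be combined as below; the default values of beta and gamma are placeholders, not values disclosed in the embodiment.

```python
def total_loss(l_perceptual, l_contrastive, l_reconstruction, beta=1.0, gamma=1.0):
    # The perceptual term already carries its per-stage weights xi_j.
    return l_perceptual + beta * l_contrastive + gamma * l_reconstruction
```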
The neural network is trained using existing neural network training tools for a suitable number of epochs.
The downstream-task fine-tuning process after training in step S6 is as follows:
6.1 Input a batch of image data into the neural network f_n, obtain the final vision transformer output of the model, and divide it into the class unit and the image units. The class unit is passed through a linear layer and an activation function to produce a one-hot-style prediction output.
6.2 Train the entire network using a cross-entropy function as the classification loss.
The neural network is trained using existing neural network training tools for a suitable number of epochs.
Finally, the process of performing a classification task with the network is as follows:
the category of each image is calculated as Class[b] = argmax_i P̂[b][i],
where Class is a series of positive integers denoting, for each image within a batch, the class number that the neural network considers most likely, and P̂[b][i] refers to the one-hot-style prediction probability that the network assigns to the i-th class for image b.
Further, an embodiment of the present invention is provided to assist explanation; the effects of performing the pre-training are shown in Figs. 2 to 5. Specifically, Fig. 2 illustrates the input image data, Fig. 3 illustrates the masking operation applied to the image by the network, Fig. 4 illustrates the training effect of the pre-training network, and Fig. 5 illustrates the network output without the masking operation. As can be seen from Fig. 4, the invention enables the network to extract higher-level semantics and gives the network a degree of reasoning capability. As can be seen from Fig. 5, the invention allows the neural network to retain much of the structural and color information of the original image, which helps in training downstream tasks.
The above disclosure is only a preferred embodiment of the present invention, and it should be understood that the scope of the invention is not limited thereto; those skilled in the art will appreciate that equivalent changes to all or part of the procedures described above, made within the scope of the claims, still fall within the scope of the present invention.

Claims (5)

1. An unsupervised pre-training method for a neural network for image processing, comprising the following steps:
step 1: introducing a dataset having a plurality of types of sample images;
step 2: performing a masking operation on the images input from the dataset to obtain, respectively, an original dataset and a masked dataset of images;
step 3: dividing the neural network into a plurality of stages, with a plurality of vision transformers serving as the backbone network in each stage; inputting the original dataset and the masked dataset separately, and comparing the two outputs of each stage, the difference being recorded as a perceptual loss;
step 4: at the last layer of the neural network, obtaining the vision transformer output and dividing it into a class unit and image units; for the class unit, calculating the difference between the neural network output for the masked input of an image and the outputs for the original input of the same image and the original inputs of other images, recorded as a contrastive loss; for the image units, calculating the difference between the neural network output for the masked input and the pixel values of the original image, recorded as a reconstruction loss;
step 5: training the neural network using the perceptual loss, the contrastive loss and the reconstruction loss together as the total loss function;
step 6: after training, the trained model takes an image as input and outputs a class feature vector and a reconstructed image vector.
2. The unsupervised pre-training method for a neural network for image processing according to claim 1, wherein the process of masking the images input into the dataset comprises the following steps:
defining an image as X ∈ R^(B×H×W×C), where B refers to the number of input images per batch, and H, W and C refer to the height, width and channel dimensions of the image, respectively; first dividing the image into a plurality of image blocks of size P×P, and defining the set of divided image blocks as Patches ∈ R^(B×N×(P·P·C)), where N = HW/P² is the number of blocks; inputting the blocks into a linear layer of the neural network to obtain their vector-form embeddings, and concatenating a randomly initialized class unit V_CLS ∈ R^(B×1×D) to obtain the image block feature set T ∈ R^(B×(N+1)×D), where D refers to the feature dimension, expressed as:
T = Concat(Patches, V_CLS)
setting a mask rate m_r ∈ [0, 1] and constructing a mask M ∈ {0,1}^(B×N) such that (Σ_{M[i]=1} 1)/N ≈ m_r; constructing a randomly initialized mask vector V_MASK ∈ R^D and building the masked input from the mask, the operation being formulated as:
T_m[b, i] = M[i]·V_MASK + (1 − M[i])·T[b, i]
where M[i] denotes the mask flag of mask M for the i-th image block feature, and a value of 1 denotes that this block feature is to be masked, in which case the mask vector overwrites the original feature vector.
3. The unsupervised pre-training method for a neural network for image processing according to claim 2, wherein the process of dividing the neural network into a plurality of stages, comparing the two outputs of each stage and recording the difference as the perceptual loss comprises the following steps:
step 3.1: dividing the neural network f into n stages, with stage i denoted Stage_i(X); the composition of the first j stages is
f_j(X) = Stage_j ⊙ Stage_{j-1} ⊙ ... ⊙ Stage_1(X)
where "⊙" denotes function composition;
step 3.2: each Stage_i(X) contains λ_i vision transformer blocks, the flow of a vision transformer block being expressed as:
X′^(l) = X^(l) + MSA(LN(X^(l)))
X^(l+1) = X′^(l) + FFN(LN(X′^(l)))
where l denotes the l-th layer of the neural network, LN denotes layer normalization, MSA denotes the multi-head self-attention mechanism, and FFN denotes the feed-forward network;
the flow of MSA is formulated as:
MSA(X) = Concat(SelfAttention^(1)(X), ..., SelfAttention^(N_h)(X))
where Concat refers to the concatenation operation and N_h refers to the number of attention heads, i.e. there are N_h attention outputs; the attention of the h-th head is defined as:
SelfAttention^(h)(X) := [φ^(h)(X)] V^(h)
where φ^(h)(X) is a function that provides spatial attention based on the content of the input data, whose role is to aggregate V^(h), the value projection of X; it is defined as:
φ^(h)(X) := softmax((X W_Q^(h))(X W_K^(h))^T / τ_φ)
where W_Q^(h) and W_K^(h) are linear projection matrices and τ_φ is a temperature parameter;
the flow of FFN is described as:
FFN(X) = σ(X W_1) W_2
where W_1 and W_2 are linear projection matrices and σ is an activation function;
step 3.3: calculating the perceptual loss, which is computed only over the masked region: at each stage j, the stage outputs of the original input and of the masked input are compared at the masked image block positions, and the stage-wise differences are accumulated with weights ξ_j,
where ξ_j is a hyper-parameter coefficient weighting the perceptual loss of stage j, with the ξ_j ordered monotonically from stage to stage, and T[b, i] denotes the i-th image block feature of the b-th image.
4. The unsupervised pre-training method for a neural network for image processing according to claim 3, wherein the specific implementation of step 4 comprises the following steps:
re-dividing the output of the network fed with the original image into image units T_Patches ∈ R^(B×N×D) and a class unit T_CLS ∈ R^(B×1×D), and likewise dividing the output of the network fed with the masked image into T̂_Patches and T̂_CLS;
constructing a similarity function sim(·,·) to measure the similarity between class units, with the formula sim(a, b) = a^T b;
using a cross-entropy function as the contrastive loss: for each image b, the class unit T̂_CLS[b] produced by the masked input is compared against the class units T_CLS[b′] produced by the original inputs, where the original input of the same image, i.e. the b-th image of the input data, serves as the positive sample and the original inputs of the other images in the batch serve as negative samples, τ being a temperature parameter for controlling the inter-class distance;
using the L1 loss to calculate the reconstruction loss of the masked region, where Patches[b, i] refers to the i-th image block of the b-th original image and T̂_Patches[b, i] refers to the i-th image block feature of the b-th image in the output of the masked branch.
5. The unsupervised pre-training method for a neural network for image processing according to claim 4, wherein the total loss function is a weighted sum of the perceptual loss, the contrastive loss weighted by β, and the reconstruction loss weighted by γ,
where ξ, β and γ are hyper-parameter coefficients, and ξ refers to the set of hyper-parameter coefficients ξ_j that weight the perceptual loss of each stage in step 3.3.
CN202310656829.6A 2023-06-05 2023-06-05 Unsupervised pre-training method of neural network for image processing Pending CN116739075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310656829.6A CN116739075A (en) 2023-06-05 2023-06-05 Unsupervised pre-training method of neural network for image processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310656829.6A CN116739075A (en) 2023-06-05 2023-06-05 Unsupervised pre-training method of neural network for image processing

Publications (1)

Publication Number Publication Date
CN116739075A true CN116739075A (en) 2023-09-12

Family

ID=87905518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310656829.6A Pending CN116739075A (en) 2023-06-05 2023-06-05 Unsupervised pre-training method of neural network for image processing

Country Status (1)

Country Link
CN (1) CN116739075A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117459737A (en) * 2023-12-22 2024-01-26 中国科学技术大学 Training method of image preprocessing network and image preprocessing method
CN117459737B (en) * 2023-12-22 2024-03-29 中国科学技术大学 Training method of image preprocessing network and image preprocessing method

Similar Documents

Publication Publication Date Title
CN110909673B (en) Pedestrian re-identification method based on natural language description
CN109919204B (en) Noise image-oriented deep learning clustering method
CN109918671A (en) Electronic health record entity relation extraction method based on convolution loop neural network
Lu et al. Facial expression recognition based on convolutional neural network
CN112508077A (en) Social media emotion analysis method and system based on multi-modal feature fusion
CN110472518B (en) Fingerprint image quality judgment method based on full convolution network
CN112906500B (en) Facial expression recognition method and system based on deep privilege network
CN116739075A (en) Unsupervised pre-training method of neural network for image processing
CN105740911A (en) Structure sparsification maintenance based semi-supervised dictionary learning method
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
CN112818764A (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN114528928A (en) Two-training image classification algorithm based on Transformer
CN116168324A (en) Video emotion recognition method based on cyclic interaction transducer and dimension cross fusion
CN116680343A (en) Link prediction method based on entity and relation expression fusing multi-mode information
CN116797848A (en) Disease positioning method and system based on medical image text alignment
CN115331284A (en) Self-healing mechanism-based facial expression recognition method and system in real scene
CN108388918B (en) Data feature selection method with structure retention characteristics
CN117409411A (en) TFT-LCD liquid crystal panel defect segmentation method and system based on semi-supervised learning
Wang et al. Action units recognition based on deep spatial-convolutional and multi-label residual network
CN114782995A (en) Human interaction behavior detection method based on self-attention mechanism
CN115456025A (en) Electroencephalogram emotion recognition method based on layered attention time domain convolution network
CN111104868B (en) Cross-quality face recognition method based on convolutional neural network characteristics
CN113111945A (en) Confrontation sample defense method based on transform self-encoder
CN114220145A (en) Face detection model generation method and device and fake face detection method and device
CN114169433A (en) Industrial fault prediction method based on federal learning + image learning + CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination