WO2021184902A1 - Image classification method and apparatus, training method and apparatus therefor, device, and medium - Google Patents
Image classification method and apparatus, training method and apparatus therefor, device, and medium
- Publication number
- WO2021184902A1 (PCT/CN2020/140711)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- residual
- feature map
- convolutional
- training
- Prior art date
Classifications
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. likelihood ratio
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Combinations of networks (neural network architectures)
- G06V10/454 — Integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/764 — Recognition using classification, e.g. of video objects
- G06V10/7715 — Feature extraction, e.g. by transforming the feature space
- G06V10/774 — Generating sets of training patterns; bootstrap methods
- G06V10/776 — Validation; performance evaluation
- G06V40/172 — Human faces: classification, e.g. identification
- G06V40/174 — Facial expression recognition
Definitions
- the present disclosure relates to the field of image processing technology, and more specifically to an image classification method and apparatus, a training method and apparatus therefor, a device, and a readable storage medium.
- an image classification method including: using a first convolutional network to process an image to be processed to obtain a first feature map; using a residual network to process the first feature map to obtain a second feature map, wherein the residual network includes a depthwise separable convolutional layer; and using a second convolutional network to process the second feature map to determine the category label of the image to be processed.
- the residual network includes at least one residual module connected in series, and each residual module in the at least one residual module includes a first processing path and a second processing path, wherein the first processing path includes the depthwise separable convolutional layer, and the second processing path includes a convolutional layer and a batch normalization layer.
- when the residual network includes N residual modules connected in series, where N is a positive integer greater than 1, using the residual network to process the first feature map includes: using the first processing path and the second processing path in the first residual module of the residual network to separately process the received first feature map to obtain a first residual feature map; and using the first processing path and the second processing path in the i-th residual module of the residual network to separately process the received (i-1)-th residual feature map to obtain the i-th residual feature map, where i is a positive integer greater than 1 and less than or equal to N.
- the first convolutional network includes a convolutional layer, a batch normalization layer, and a non-linear processing layer
- the second convolutional network includes a convolutional layer and a global average pooling layer
- a training method for an image classification model, including: obtaining training samples; using a first convolutional network to process the training samples to obtain a first training feature map; using a residual network to process the first training feature map to obtain a second training feature map; calculating a local loss value from the second training feature map according to a locality-preserving loss function; and using an optimizer to train the first convolutional network, the residual network, and a second convolutional network based on the local loss value, wherein the locality-preserving loss function represents a feature distance between the training sample and at least one sample of the same category.
- the training method further includes: using the second convolutional network to process the second training feature map to determine the category label of the training sample; calculating a network loss value according to a cross-entropy loss function based on the category label and the real label of the training sample; and using an optimizer to train the first convolutional network, the residual network, and the second convolutional network based on the network loss value.
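The disclosure does not give a closed-form expression for the locality-preserving loss, only that it represents a feature distance between a training sample and samples of the same category. One plausible form is the mean squared Euclidean distance between feature vectors that share a class label; the sketch below uses that assumption, and all function and variable names are illustrative rather than taken from the disclosure.

```python
def locality_preserving_loss(features, labels):
    """Sketch of one plausible locality-preserving loss: the mean squared
    Euclidean distance over all pairs of feature vectors whose samples
    share the same class label. (Assumed form; not specified verbatim
    in the disclosure.)"""
    total, pairs = 0.0, 0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] == labels[j]:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                total += d2
                pairs += 1
    return total / pairs if pairs else 0.0

# Two class-0 samples one unit apart, one class-1 sample far away:
feats = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
labs = [0, 0, 1]
loss = locality_preserving_loss(feats, labs)  # -> 1.0 (only the class-0 pair counts)
```

Minimizing such a loss pulls same-category features together, which is consistent with the stated goal of preserving local structure in feature space.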
- an image classification device including: a first convolutional network unit configured to use the first convolutional network to process an image to be processed to obtain a first feature map; a residual network unit configured to use a residual network to process the first feature map to obtain a second feature map, wherein the residual network includes a depthwise separable convolutional layer; and a second convolutional network unit configured to use the second convolutional network to process the second feature map to determine the category label of the image to be processed.
- the residual network includes at least one residual module connected in series, and each residual module in the at least one residual module includes a first processing path and a second processing path, wherein the first processing path includes the depthwise separable convolutional layer, and the second processing path includes a convolutional layer and a batch normalization layer.
- when the residual network includes N residual modules connected in series, where N is a positive integer greater than 1, using the residual network to process the first feature map includes: using the first processing path and the second processing path in the first residual module of the residual network to separately process the received first feature map to obtain a first residual feature map; and using the first processing path and the second processing path in the i-th residual module of the residual network to separately process the received (i-1)-th residual feature map to obtain the i-th residual feature map, where i is a positive integer greater than 1 and less than or equal to N.
- the first convolutional network includes a convolutional layer, a batch normalization layer, and a non-linear processing layer
- the second convolutional network includes a convolutional layer and a global average pooling layer
- a training device for an image classification model, configured to: obtain training samples; use a first convolutional network to process the training samples to obtain a first training feature map; use a residual network to process the first training feature map to obtain a second training feature map; calculate a local loss value from the second training feature map according to a locality-preserving loss function; and use an optimizer to train the first convolutional network, the residual network, and a second convolutional network based on the local loss value, wherein the locality-preserving loss function represents a feature distance between the training sample and at least one sample of the same category.
- the training device is further configured to: use the second convolutional network to process the second training feature map to determine the category label of the training sample; calculate a network loss value according to a cross-entropy loss function based on the category label and the real label of the training sample; and use an optimizer to train the first convolutional network, the residual network, and the second convolutional network based on the network loss value.
- an image processing device including: a processor; and a memory storing computer-readable code, wherein, when the computer-readable code is executed by the processor, the above-described image classification method or image classification model training method is performed.
- a computer-readable storage medium having instructions stored thereon, wherein, when the instructions are executed by a processor, the processor performs the image classification method or the image classification model training method described above.
- Fig. 1 shows a schematic flowchart of an image classification method according to an embodiment of the present disclosure
- Fig. 2 shows a schematic structural diagram of a classification neural network according to an embodiment of the present disclosure
- Fig. 3 shows a network structure diagram of a classification neural network according to an embodiment of the present disclosure
- Figure 4 shows a schematic processing flow chart of the convolutional layer
- Fig. 5 shows a network structure diagram of a residual module according to an embodiment of the present disclosure
- Fig. 6A shows a schematic diagram of the parameters of conventional convolution
- Fig. 6B shows a schematic diagram of the parameters of a depthwise separable convolutional layer
- Figure 7A shows a processing flow chart of the maximum pooling layer
- Figure 7B shows a processing flow chart of the average pooling layer
- FIG. 8A shows a schematic flowchart of a training method for an image classification model according to an embodiment of the present disclosure
- FIG. 8B shows another network structure diagram of a classification neural network according to an embodiment of the present disclosure
- Fig. 9 shows a schematic block diagram of an image classification device according to an embodiment of the present disclosure.
- Fig. 10 shows a schematic block diagram of an image processing device according to an embodiment of the present disclosure
- FIG. 11 shows a schematic diagram of the architecture of an exemplary computing device according to an embodiment of the present disclosure
- Fig. 12 shows a schematic diagram of a computer storage medium according to an embodiment of the present disclosure.
- a flowchart is used in the present disclosure to illustrate the steps of the method according to the embodiments of the present disclosure. It should be understood that the steps need not be performed in the exact order shown; rather, various steps may be performed in reverse order or in parallel, and other operations may be added to these processes.
- AI (Artificial Intelligence) uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.
- artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
- artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning, and decision-making.
- Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
- artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning. By training a neural network on training samples, processing such as image classification can be implemented to identify image categories.
- existing network structures for image classification are relatively complicated, involve a large number of parameters and computations, are poorly suited to terminal devices, and cannot meet real-time processing requirements.
- the present disclosure provides an image classification method in which the classification neural network (or classification network for short) has a simple structure and a small amount of parameter computation; on the basis of ensuring classification accuracy, it can more easily be deployed on terminal devices, and the image processing rate is increased to enable real-time processing.
- Fig. 1 shows a schematic flowchart of an image classification method according to an embodiment of the present disclosure, and Fig. 2 shows a schematic structural diagram of a classification neural network according to an embodiment of the present disclosure; together they illustrate the image classification method provided by the present disclosure and the network structure of the classification network used.
- in step S101, the first convolutional network is used to process the image to be processed to obtain the first feature map.
- the image to be processed 201 may be an image including a human face
- the image classification method is used to recognize facial expressions, i.e., to classify the facial expressions in the image 201.
- the facial expressions may be classified as happy, surprised, calm, sad, angry, disgusted, or fearful.
- the method according to the present disclosure will be described using classification of human facial expressions as a specific example. It should be noted that the method provided by the present disclosure can also be used for other image classification processing, which is not limited here.
- the first convolutional network 202 can receive the image 201 to be processed to perform image processing on it.
- the first convolutional network 202 may include at least one convolutional layer, for example.
- the first feature map A represents the processing result obtained by the first convolutional network 202 processing the image 201 to be processed, and is passed to the residual network 203.
- a residual network is used to process the first feature map to obtain a second feature map.
- the second feature map B represents the processing result obtained by the residual network 203 processing the first feature map A output by the first convolutional network 202.
- the residual network 203 includes a depthwise separable convolutional layer; the network structure and processing flow of the depthwise separable convolutional layer are described in detail below.
- the second feature map is processed using a second convolutional network to determine a category label of the image to be processed, the category label indicating the category to which the image to be processed belongs.
- the second convolutional network 204 processes the second feature map B and obtains the category label of the image 201 to be processed.
- the second convolutional network 204 may obtain the probability distribution corresponding to the facial expression category, and determine the category with the highest probability value in the probability distribution as the category label of the image to be processed.
- the probability distribution may be [0.2, 0.2, 0.1, 0.3, 0.2, 0.1, 0.7]; therefore, the fear category, which has the highest probability value (i.e., 0.7), can be determined as the category label of the image to be processed, indicating that the facial expression in the image to be processed is fearful.
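The "highest probability wins" rule described above is a simple argmax over the seven expression categories. A minimal pure-Python sketch follows; the list of category names and the function name are illustrative, chosen to match the seven expressions listed earlier in the disclosure.

```python
# Order assumed to match the expression list given in the disclosure.
EXPRESSIONS = ["happy", "surprised", "calm", "sad", "angry", "disgusted", "fearful"]

def category_label(probabilities):
    # Pick the index with the highest probability and map it to its label.
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return EXPRESSIONS[best]

label = category_label([0.2, 0.2, 0.1, 0.3, 0.2, 0.1, 0.7])  # -> "fearful"
```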
- the design concept of the residual network is combined with depthwise separable convolution processing to balance network processing performance and model size, simplifying the network model while ensuring image classification accuracy, so that the model can perform real-time image classification on a terminal, for example to recognize facial expressions.
- Fig. 3 shows a network structure diagram of a classification neural network according to an embodiment of the present disclosure. The specific structure of the classification neural network applied according to the method of the present disclosure will be described in detail below in conjunction with FIG. 3.
- the first convolutional network includes a convolutional layer, a batch normalization layer, and a non-linear processing layer.
- the first convolutional network 202 includes a first convolutional subnetwork Conv1 and a second convolutional subnetwork Conv2.
- the first convolution sub-network Conv1 and the second convolution sub-network Conv2 may have the same network structure.
- each sub-network includes a convolutional layer, a batch normalization (BN) layer, and a non-linear processing (ReLU) layer.
- the input image of the first convolutional network 202 may be a face image with length H1, width H2, and C channels.
- for example, the input image size may be 7*7*3, that is, length 7, width 7, and 3 channels.
- Fig. 4 shows a schematic processing flow chart of the convolutional layer.
- the convolution kernel size of the convolutional layer is set to 3*3, and the 7*7*3 face image described above is received. Two sets of convolution parameters, W0 and W1, are used to obtain two output feature maps, respectively. Each set of parameters performs a convolution operation on the input images of the 3 channels: the convolution kernel of each channel (the boxes labeled W0 and W1) is multiplied element-wise with the values (i.e., pixel values) at the corresponding positions in the input image, the products are summed, and an offset value (b0 or b1) is added to obtain a value in the output feature map.
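The per-window multiply-accumulate-plus-bias computation described above can be sketched for a single channel in pure Python; the multi-channel case of Fig. 4 simply sums this result over all input channels before adding the bias. The function name and parameters are illustrative, not from the disclosure.

```python
def conv2d_single_channel(image, kernel, bias=0.0, stride=1):
    """Valid (no-padding) 2-D convolution of one channel: slide the kernel
    over the image, multiply-and-add within each window, then add the bias."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            out[r][c] = acc + bias
    return out

# A 3x3 all-ones image convolved with a 3x3 all-ones kernel, bias 1:
out = conv2d_single_channel([[1, 1, 1]] * 3, [[1, 1, 1]] * 3, bias=1.0)  # -> [[10.0]]
```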
- the BN layer normalizes a batch of data in the network; it accelerates model convergence and, to a certain extent, alleviates the "gradient dispersion" (vanishing gradient) problem in deep networks, thereby improving training speed.
- after convolution, a series of feature maps is obtained. Assuming the mini-batch size is M, the input data of a certain layer in the network can be expressed as a four-dimensional array (M, F, W, H), where M is the mini-batch size, F is the number of feature maps, and W and H are the length and width of the feature maps, respectively.
- each feature map can be regarded as one feature processing unit (i.e., one neuron), and the normalization for each feature map is computed over M*W*H values.
- each feature map has a pair of learnable parameters: γ, β.
- the BN layer performs normalization for each neuron: it computes the mean and variance of all values in a feature map, and then normalizes the feature map with them.
- the formula for the BN layer is as follows:

  μ_i = (1 / (M·W·H)) · Σ_{m,w,h} x_{m,w,h}
  σ_i² = (1 / (M·W·H)) · Σ_{m,w,h} (x_{m,w,h} − μ_i)²
  y_{m,w,h} = γ_i · (x_{m,w,h} − μ_i) / √(σ_i² + ε) + β_i

  where i ∈ (1, 2, ..., F) indexes the i-th feature map; x_{m,w,h} is the pixel value at coordinate position [w, h] of the m-th sample in the mini-batch; μ_i is the mean of the x_{m,w,h}; ε is a small non-zero value that keeps the denominator from being 0; y_{m,w,h} is the output value corresponding to x_{m,w,h}; and γ_i and β_i are the pair of learnable parameters of the i-th input feature map.
- the ReLU layer applies a non-linear function.
- since the convolution operation is linear, the non-linear function is used to activate neurons, which helps overcome the vanishing-gradient problem and speeds up training.
- the residual network 203 may include at least one residual module connected in series, and each residual module in the at least one residual module includes a first processing path and a second processing path, wherein the first processing path includes the depthwise separable convolutional layer, and the second processing path includes a convolutional layer and a batch normalization layer.
- series connection means that the residual modules are connected in sequence.
- when the residual network includes N residual modules connected in series, where N is a positive integer greater than 1, using the residual network to process the first feature map includes: using the first processing path and the second processing path in the first residual module of the residual network to separately process the received first feature map to obtain a first residual feature map; and using the first processing path and the second processing path in the i-th residual module of the residual network to separately process the received (i-1)-th residual feature map to obtain the i-th residual feature map, where i is a positive integer greater than 1 and less than or equal to N.
- the residual network is composed of 4 residual modules, namely, Resblock1, Resblock2, Resblock3, and Resblock4, wherein the above-mentioned 4 residual modules are connected in series, that is, connected in sequence.
- N is the number of residual modules, which is not limited here.
- each residual module may have the same network structure.
- Fig. 5 shows a network structure diagram of a residual module according to an embodiment of the present disclosure.
- the residual module includes a first processing path and a second processing path.
- the first processing path includes a convolutional layer (Conv), a BN layer, a ReLU layer, a depthwise separable convolutional layer (DW_Conv), a BN layer, and a pooling layer (Pooling) that are sequentially connected.
- the first processing path processes the input parameter x, and the output obtained can be expressed as H(x), where H(x) represents the intermediate result obtained by processing the input parameter x by the first processing path.
- the second processing path includes a convolutional layer (Conv) and a BN layer.
- the second processing path processes the input parameter x, and the output obtained can be expressed as x′, where x′ represents an intermediate result obtained by processing the input parameter x by the second processing path.
- the output of the residual module is the sum of the intermediate results output by the two processing paths, that is, H(x) + x′, which serves as the first residual feature map.
- the processing procedures of the convolutional layer (Conv), the BN layer, and the ReLU layer are the same as those described above for the first convolutional network 202 and are not repeated here.
- in the first processing path, the convolution operation does not change the size of the input feature map but increases the number of output channels to twice the number of input channels, and the pooling layer then reduces the feature map size to 1/2 of the original.
- in the second processing path, a convolutional layer with a stride of 2 is used to reduce the dimensionality of the input feature map, so that the number of channels becomes twice the number of input channels and the size is reduced to 1/2 of the original.
- the output of the residual module is obtained by adding the processing results of the two processing paths.
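Both paths thus transform an input of C channels and spatial size S into 2C channels of size S/2, so their outputs can be summed elementwise. The sketch below captures that shape rule and the final H(x) + x′ addition; function names are illustrative, not from the disclosure.

```python
def residual_block_shape(channels, size):
    # Each residual module doubles the channel count and halves the spatial size.
    return 2 * channels, size // 2

def add_paths(hx, x_prime):
    """Output of the residual module: elementwise sum H(x) + x' of the two
    processing paths, which have identical shapes after their transforms."""
    return [[a + b for a, b in zip(row_h, row_x)]
            for row_h, row_x in zip(hx, x_prime)]

shape = residual_block_shape(16, 28)        # -> (32, 14)
summed = add_paths([[1.0, 2.0]], [[3.0, 4.0]])  # -> [[4.0, 6.0]]
```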
- FIG. 6A shows a schematic diagram of parameters of conventional convolution.
- each output feature map corresponds to all input feature maps.
- the size of the input feature map is D_F × D_F × N1
- the size of the output feature map is D_F × D_F × N2
- the size of the convolution kernel is D_K × D_K
- the parameter count of the conventional convolution is: D_K × D_K × N1 × N2
- FIG. 6B shows a schematic diagram of parameters of a depth separable convolutional layer according to an embodiment of the present disclosure.
- the depthwise separable convolution layer decomposes the conventional convolution shown in Fig. 6A into a depthwise convolution and a 1×1 pointwise convolution.
- the parameter count of the depthwise separable convolution layer is the sum of the parameter counts of the depthwise convolution and the 1×1 convolution: D_K × D_K × N1 + N1 × N2
- the number of convolution parameters is indicative of the computational cost of the network.
- for a 3×3 kernel, the computation of the depthwise separable convolution is approximately 1/9 that of conventional convolution.
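The two parameter-count formulas above make the 1/9 claim easy to check: their ratio is 1/N2 + 1/D_K², which approaches 1/D_K² = 1/9 for a 3×3 kernel as N2 grows. The channel counts below (N1 = 64, N2 = 128) are example values, not from the disclosure.

```python
def conv_params(dk, n1, n2):
    # Conventional convolution: DK * DK * N1 * N2 weights.
    return dk * dk * n1 * n2

def ds_conv_params(dk, n1, n2):
    # Depthwise (DK * DK * N1) plus 1x1 pointwise (N1 * N2) convolution.
    return dk * dk * n1 + n1 * n2

ratio = ds_conv_params(3, 64, 128) / conv_params(3, 64, 128)
# ratio = 1/N2 + 1/DK**2 = 1/128 + 1/9, about 0.119 -- close to 1/9
```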
- the role of the pooling layer is to reduce the number of parameters, and it is generally placed after a convolutional layer.
- pooling can be divided into maximum (max) pooling and average pooling.
- Fig. 7A and Fig. 7B respectively show the processing flowcharts of the maximum pooling layer and the average pooling layer.
- the pooling layer shown in FIG. 5 may be an average pooling layer.
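The two pooling variants of Figs. 7A and 7B differ only in the reduction applied to each window. A pure-Python sketch of 2×2 pooling with stride 2 (a common configuration; the disclosure does not fix the kernel size) follows, with illustrative names.

```python
def pool2x2(fmap, op):
    """2x2 pooling with stride 2: apply `op` (max for max pooling, or a
    mean function for average pooling) to each non-overlapping 2x2 window."""
    return [[op([fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1]])
             for c in range(0, len(fmap[0]), 2)]
            for r in range(0, len(fmap), 2)]

def mean(vals):
    return sum(vals) / len(vals)

fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [1, 1, 2, 2],
        [1, 1, 2, 2]]
max_pooled = pool2x2(fmap, max)   # -> [[7, 8], [1, 2]]
avg_pooled = pool2x2(fmap, mean)  # -> [[4.0, 5.0], [1.0, 2.0]]
```

Either way the 4×4 map shrinks to 2×2, which is how the pooling layer in the residual module halves the feature-map size.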
- the first residual module Resblock1 in the residual network processes the received first feature map A to obtain the first residual feature map.
- the second residual module Resblock2 in the residual network processes the received first residual characteristic map to obtain a second residual characteristic map, and so on.
- the fourth residual feature map obtained by the last residual module Resblock4 is used as the output of the residual network, that is, the second feature map B.
- the second convolutional network includes a convolutional layer and a global average pooling layer.
- the second convolutional network 204 processes the received second feature map B and obtains the category label of the input face image.
- the second convolutional network 204 includes a convolutional layer (Conv4) and a global average pooling layer (GlobalAveragePooling).
- the global average pooling layer is a kind of average pooling layer in which the size of the pooling kernel equals the size of the input feature map; after pooling, the spatial size of the feature map becomes 1 × 1.
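The effect of global average pooling can be shown directly: averaging over the full spatial extent collapses each channel to a single value. The feature-map sizes are illustrative:

```python
import numpy as np

# Global average pooling: the pooling window equals the spatial size of
# the feature map, so each channel collapses to a single 1x1 value.
fmap = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)  # (channels, H, W)

gap = fmap.mean(axis=(1, 2))  # one scalar per channel
print(gap)        # [ 7.5 23.5]
print(gap.shape)  # (2,)
```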
- the classification network model shown in FIG. 3 can be used to achieve accurate and fast image classification, for example, to recognize the categories of facial expressions.
- the residual modules used in the classification network model for image processing simplify the network model and reduce its complexity.
- the residual network includes a depthwise separable convolutional layer, which further reduces the number of parameters and the amount of computation.
- the method according to the present disclosure can be used to realize rapid image classification processing, and, because the amount of network parameter calculation is reduced, the method can be applied to terminal devices with limited computing capabilities.
- the method according to the present disclosure can be used to classify various expressions of the input face image.
- the face image is processed by a 2-layer convolutional sub-network, 4 residual modules, a convolutional layer and a global average pooling layer, and the facial expression recognition result can be obtained.
- the execution speed of the method is relatively fast, and real-time expression recognition can be achieved.
- the design concept of the residual network is combined with depthwise separable convolution processing to balance network performance against model size, simplifying the network model while ensuring image classification accuracy, so that the model can perform real-time image classification on a terminal, for example, to recognize facial expressions.
- FIG. 8A shows a schematic flowchart of a training method of an image classification model according to an embodiment of the present disclosure, and the training method will be described below in conjunction with FIG. 8A.
- the training method includes steps S201-S205.
- S201: obtain a training sample;
- S202: process the training sample by using the first convolutional network to obtain a first training feature map;
- S203: process the first training feature map by using the residual network to obtain a second training feature map;
- S204: calculate a local loss value based on the second training feature map according to a locality-preserving loss function;
- S205: train the first convolutional network, the residual network, and the second convolutional network based on the local loss value by using an optimizer, wherein the locality-preserving loss function represents the feature distance between the training sample and at least one sample of the same category.
- the first convolutional network, the residual network, and the second convolutional network constitute an image classification model, which is used to implement image classification processing according to the image classification method described above.
- if only an overall loss is used, the overall loss of the network can be unduly affected by a single sample.
- the gap between samples of the same category may be relatively large, for example, among different "happy" expressions.
- the locality-preserving loss function represents the feature distance between the training sample and at least one sample of the same category; for example, it makes the feature distances between the training sample and K similar samples of the same category as small as possible, thereby supervising the learning process of the network.
- the locality-preserving loss function according to the present disclosure is expressed as: L1 = (1/n) Σ i Σ j S i,j ‖x i − x j ‖²
- x i represents the sample feature currently undergoing training processing, that is, the second training feature map obtained from the training sample;
- S i,j indicates whether the sample feature x j of the same category is a similar sample of x i ;
- n represents the number of samples in a mini-batch.
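Since formula (9) itself is not reproduced in the text, the following is a sketch of a standard locality-preserving loss consistent with the symbol definitions above; the exact patented form may differ. S i,j is realized as "x j is one of the K nearest same-category neighbours of x i", and the squared feature distances are averaged over the mini-batch:

```python
import numpy as np

# Assumed locality-preserving loss: for each feature x_i, pull in its
# K nearest same-category neighbours (an illustrative formulation, not
# the patent's formula (9) verbatim).
def local_preserving_loss(feats, labels, k=2):
    n = len(feats)
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    loss = 0.0
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        nearest = sorted(same, key=lambda j: dists[i, j])[:k]  # K similar samples
        loss += sum(dists[i, j] ** 2 for j in nearest)
    return loss / n

feats = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
labels = [0, 0, 1, 1]  # two samples per category
print(local_preserving_loss(feats, labels, k=1))  # small: same-class pairs are close
```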
- the local loss value of the network can be calculated, and the first convolutional network, the residual network, and the second convolutional network are trained based on the local loss value by using an optimizer.
- as an example, mini-batch stochastic gradient descent may be used as the optimizer.
- in some embodiments, the training method of the image classification model further includes: processing the second training feature map by using the second convolutional network to determine the category label of the training sample; calculating a network loss value based on the category label and the true label of the training sample according to a cross-entropy loss function; and training the first convolutional network, the residual network, and the second convolutional network based on the network loss value by using an optimizer.
- the cross-entropy loss function is used to calculate the overall loss value of the classification network.
- the cross-entropy loss function may adopt a softmax function.
- the softmax function is used to convert an array into a probability distribution. Assuming that y i is the i-th element in the array, the output of softmax is expressed as: s i = e^(y i ) / Σ j e^(y j ).
- the cross-entropy loss is then calculated based on s i and t i , where t i is the true label of the training sample.
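The softmax conversion and its comparison with the one-hot true label t can be sketched as follows; the score values are made up for illustration:

```python
import numpy as np

# Softmax turns an array of scores into a probability distribution;
# cross-entropy then compares it against the one-hot true label t.
def softmax(y):
    e = np.exp(y - y.max())  # shift by the max for numerical stability
    return e / e.sum()

def cross_entropy(probs, t):
    return -np.sum(t * np.log(probs))

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs.sum())              # ~1.0: a valid probability distribution
t = np.array([1.0, 0.0, 0.0])   # true label: class 0
print(cross_entropy(probs, t))  # -log of the probability given to class 0
```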
- the overall loss value of the network can be calculated, and the first convolutional network, the residual network, and the second convolutional network are trained based on the overall loss value by using an optimizer.
- as an example, mini-batch stochastic gradient descent may be used as the optimizer.
- FIG. 8B shows another network structure diagram of the classification neural network according to an embodiment of the present disclosure. Compared with the network structure shown in FIG. 3, FIG. 8B also includes the training process of the classification neural network. As shown in FIG. 8B, the local loss value of the network can be calculated based on the second training feature map output by the residual network according to the local loss function L1 of formula (9), and the overall loss value of the network can be calculated according to the loss function of formula (12). The final loss value is the weighted sum of these two loss values, expressed as: L = αL1 + βL2
- α ∈ (0,1) and β ∈ (0,1) represent the weight values
- α and β are a set of hyperparameters.
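The weighted combination is a one-line computation; the weight and loss values below are illustrative only, not values from the disclosure:

```python
# Total training loss as a weighted sum of the local loss L1 and the
# overall (cross-entropy) loss L2. alpha and beta are hyperparameters
# in (0, 1); all numbers here are made-up examples.
alpha, beta = 0.3, 0.7
L1, L2 = 0.05, 0.42           # example local and overall loss values
L = alpha * L1 + beta * L2    # final loss driving the optimizer
print(L)
```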
- the optimizer trains the classification network based on the calculated loss value L, for example, adjusts the parameters in the network.
- a local loss function is proposed to calculate the local loss value of the network, characterizing the feature distance between the training sample and at least one sample of the same category; this helps to ensure the accuracy of image classification.
- FIG. 9 shows a schematic block diagram of an image classification device according to an embodiment of the present disclosure.
- the device 1000 may include a first convolutional network unit 1010, a residual network unit 1020, and a second convolutional network unit 1030.
- the first convolutional network unit 1010 may be configured to use the first convolutional network to process the image to be processed to obtain the first feature map.
- the residual network unit 1020 may be configured to use a residual network to process the first feature map to obtain a second feature map, wherein the residual network includes a depthwise separable convolutional layer.
- the second convolutional network unit 1030 may be configured to use a second convolutional network to process the second feature map to determine the category label of the image to be processed.
- the residual network includes at least one residual module connected in series, and each residual module of the at least one residual module includes a first processing path and a second processing path, wherein the first processing path includes the depthwise separable convolutional layer, and the second processing path includes a convolutional layer and a batch normalization layer.
- when the residual network includes N residual modules connected in series, where N is a positive integer greater than 1, processing the first feature map with the residual network includes: separately processing the received first feature map by using the first processing path and the second processing path in the first residual module of the residual network to obtain a first residual feature map; and separately processing the received (i-1)-th residual feature map by using the first processing path and the second processing path in the i-th residual module of the residual network to obtain an i-th residual feature map, where i is a positive integer greater than 1 and less than or equal to N.
- the first convolutional network includes a convolutional layer, a batch normalization layer, and a non-linear processing layer
- the second convolutional network includes a convolutional layer and a global average pooling layer
- a training device for an image classification model is configured to: obtain a training sample; process the training sample by using the first convolutional network to obtain a first training feature map; process the first training feature map by using the residual network to obtain a second training feature map; calculate a local loss value based on the second training feature map according to a locality-preserving loss function; and train the first convolutional network, the residual network, and the second convolutional network based on the local loss value by using an optimizer, wherein the locality-preserving loss function represents the feature distance between the training sample and at least one sample of the same category.
- the training device is further configured to: use the second convolutional network to process the second training feature map to determine the category label of the training sample; calculate a network loss value based on the category label and the true label of the training sample according to a cross-entropy loss function; and use an optimizer to train the first convolutional network, the residual network, and the second convolutional network based on the network loss value.
- Fig. 10 shows a schematic block diagram of an image processing device according to an embodiment of the present disclosure.
- the device 2000 may include a processor 2010 and a memory 2020.
- computer readable code is stored in the memory 2020, and the computer readable code, when run by the processor 2010, executes the image classification method as described above or executes the image classification as described above The training method of the model.
- the processor 2010 may perform various actions and processing according to programs stored in the memory 2020.
- the processor 2010 may be an integrated circuit chip with signal processing capability.
- the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- Various methods, steps, and logical block diagrams disclosed in the embodiments of the present invention can be implemented or executed.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor, etc., and may be an X86 architecture or an ARM architecture.
- the memory 2020 stores computer-executable instruction codes, which when executed by the processor 2010 are used to implement the image classification method according to the embodiments of the present disclosure or execute the above-mentioned image classification model training method.
- the memory 2020 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory may be read only memory (ROM), programmable read only memory (PROM), erasable programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), or flash memory.
- volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
- the method or device according to the embodiment of the present disclosure may also be implemented by means of the architecture of the computing device 3000 shown in FIG. 11.
- the computing device 3000 may include a bus 3010, one or more CPUs 3020, a read-only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 connected to a network, input/output components 3060, a hard disk 3070, and the like.
- the storage devices in the computing device 3000, such as the ROM 3030 or the hard disk 3070, can store various data or files used in the processing and/or communication of the image classification method or image classification model training method provided in the present disclosure, as well as the program instructions executed by the CPU.
- the computing device 3000 may also include a user interface 3080.
- the architecture shown in FIG. 11 is only exemplary. When implementing different devices, one or more components of the computing device shown in FIG. 11 may be omitted according to actual needs.
- FIG. 12 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
- computer-readable instructions 4010 are stored on the computer storage medium 4020.
- when the computer-readable instructions 4010 are executed, the image classification method or the training method of the image classification model described with reference to the above figures can be performed, so as to recognize image categories using the trained classification neural network, in particular for facial expression recognition.
- the computer-readable storage medium includes, but is not limited to, for example, volatile memory and/or non-volatile memory.
- the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
- the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
- the computer storage medium 4020 may be connected to a computing device such as a computer, and when the computing device executes the computer-readable instructions 4010 stored on the computer storage medium 4020, the above-mentioned operations can be performed, for example, the image classification method or the training method of the image classification model provided according to the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Probability & Statistics with Applications (AREA)
- Biodiversity & Conservation Biology (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (14)
- An image classification method, comprising: processing an image to be processed by using a first convolutional network to obtain a first feature map (S101); processing the first feature map by using a residual network to obtain a second feature map (S102), wherein the residual network includes a depthwise separable convolutional layer; and processing the second feature map by using a second convolutional network to determine a category label of the image to be processed (S103).
- The method according to claim 1, wherein the residual network comprises at least one residual module connected in series, each of the at least one residual module comprising a first processing path and a second processing path, wherein the first processing path includes the depthwise separable convolutional layer, and the second processing path includes a convolutional layer and a batch normalization layer.
- The method according to claim 2, wherein, in a case where the residual network comprises N residual modules connected in series, N being a positive integer greater than 1, the processing the first feature map by using the residual network comprises: separately processing the received first feature map by using the first processing path and the second processing path in the 1st residual module of the residual network to obtain a first residual feature map; and separately processing the received (i-1)-th residual feature map by using the first processing path and the second processing path in the i-th residual module of the residual network to obtain an i-th residual feature map, where i is a positive integer greater than 1 and less than or equal to N.
- The method according to claim 1, wherein the first convolutional network comprises a convolutional layer, a batch normalization layer and a non-linear processing layer, and the second convolutional network comprises a convolutional layer and a global average pooling layer.
- A training method of an image classification model, comprising: obtaining a training sample (S201); processing the training sample by using a first convolutional network to obtain a first training feature map (S202); processing the first training feature map by using a residual network to obtain a second training feature map (S203); calculating a local loss value based on the second training feature map according to a locality-preserving loss function (S204); and training the first convolutional network, the residual network and a second convolutional network based on the local loss value by using an optimizer (S205), wherein the locality-preserving loss function represents the feature distance between the training sample and at least one sample of the same category.
- The training method according to claim 5, further comprising: processing the second training feature map by using the second convolutional network to determine a category label of the training sample; calculating a network loss value based on the category label and a true label of the training sample according to a cross-entropy loss function; and training the first convolutional network, the residual network and the second convolutional network based on the network loss value by using an optimizer.
- An image classification apparatus (1000), comprising: a first convolutional network unit (1010) configured to process an image to be processed by using a first convolutional network to obtain a first feature map; a residual network unit (1020) configured to process the first feature map by using a residual network to obtain a second feature map, wherein the residual network includes a depthwise separable convolutional layer; and a second convolutional network unit (1030) configured to process the second feature map by using a second convolutional network to determine a category label of the image to be processed.
- The apparatus according to claim 7, wherein the residual network comprises at least one residual module connected in series, each of the at least one residual module comprising a first processing path and a second processing path, wherein the first processing path includes the depthwise separable convolutional layer, and the second processing path includes a convolutional layer and a batch normalization layer.
- The apparatus according to claim 8, wherein, in a case where the residual network comprises N residual modules connected in series, N being a positive integer greater than 1, the processing the first feature map by using the residual network comprises: separately processing the received first feature map by using the first processing path and the second processing path in the 1st residual module of the residual network to obtain a first residual feature map; and separately processing the received (i-1)-th residual feature map by using the first processing path and the second processing path in the i-th residual module of the residual network to obtain an i-th residual feature map, where i is a positive integer greater than 1 and less than or equal to N.
- The apparatus according to claim 7, wherein the first convolutional network comprises a convolutional layer, a batch normalization layer and a non-linear processing layer, and the second convolutional network comprises a convolutional layer and a global average pooling layer.
- A training apparatus for an image classification model, configured to: obtain a training sample; process the training sample by using a first convolutional network to obtain a first training feature map; process the first training feature map by using a residual network to obtain a second training feature map; calculate a local loss value based on the second training feature map according to a locality-preserving loss function; and train the first convolutional network, the residual network and a second convolutional network based on the local loss value by using an optimizer, wherein the locality-preserving loss function represents the feature distance between the training sample and at least one sample of the same category.
- The training apparatus according to claim 11, further configured to: process the second training feature map by using the second convolutional network to determine a category label of the training sample; calculate a network loss value based on the category label and a true label of the training sample according to a cross-entropy loss function; and train the first convolutional network, the residual network and the second convolutional network based on the network loss value by using an optimizer.
- An image processing device (2000), comprising: a processor (2010); and a memory (2020), wherein computer-readable code is stored in the memory, and the computer-readable code, when run by the processor, executes the image classification method according to any one of claims 1-4, or executes the training method of the image classification model according to any one of claims 5-6.
- A computer-readable storage medium having instructions stored thereon which, when executed by a processor, cause the processor to execute the image classification method according to any one of claims 1-4, or to execute the training method of the image classification model according to any one of claims 5-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/417,127 US11854248B2 (en) | 2020-03-19 | 2020-12-29 | Image classification method, apparatus and training method, apparatus thereof, device and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
- CN202010194826.1A CN111368937B (zh) | 2020-03-19 | 2020-03-19 | Image classification method and apparatus, and training method, apparatus, device and medium therefor |
CN202010194826.1 | 2020-03-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021184902A1 true WO2021184902A1 (zh) | 2021-09-23 |
Family
ID=71209004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/140711 WO2021184902A1 (zh) | 2020-03-19 | 2020-12-29 | 图像分类方法、装置、及其训练方法、装置、设备、介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11854248B2 (zh) |
CN (1) | CN111368937B (zh) |
WO (1) | WO2021184902A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN113780249A (zh) * | 2021-11-10 | 2021-12-10 | 腾讯科技(深圳)有限公司 | Processing method, apparatus, device, medium and program product for an expression recognition model |
- CN114581427A (zh) * | 2022-03-11 | 2022-06-03 | 合肥极隆电子科技有限公司 | Intelligent industrial quality inspection method and system based on lightweight edge computing |
- WO2024060909A1 (zh) * | 2022-09-20 | 2024-03-28 | 支付宝(杭州)信息技术有限公司 | Method, apparatus, device and medium for expression recognition |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN111368937B (zh) | 2020-03-19 | 2024-05-28 | 京东方科技集团股份有限公司 | Image classification method and apparatus, and training method, apparatus, device and medium therefor |
- JP7486349B2 (ja) * | 2020-05-28 | 2024-05-17 | キヤノン株式会社 | Neural network, neural network training method, program, and image processing apparatus |
- CN112396103A (zh) * | 2020-11-16 | 2021-02-23 | 平安科技(深圳)有限公司 | Image classification method, apparatus and storage medium |
- CN112801128B (zh) * | 2020-12-14 | 2023-10-13 | 深圳云天励飞技术股份有限公司 | Non-motor-vehicle recognition method and apparatus, electronic device and storage medium |
- CN112801918A (zh) * | 2021-03-11 | 2021-05-14 | 苏州科达科技股份有限公司 | Training method of an image enhancement model, image enhancement method, and electronic device |
- CN113537124B (zh) * | 2021-07-28 | 2024-06-18 | 平安科技(深圳)有限公司 | Model training method, apparatus and storage medium |
- CN113627416B (zh) * | 2021-10-12 | 2022-01-25 | 上海蜜度信息技术有限公司 | Synchronous processing method, system, storage medium and terminal for image classification and object detection |
- CN117152542B (zh) * | 2023-10-30 | 2024-01-30 | 武昌理工学院 | Image classification method and system based on a lightweight network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- CN108764317A (zh) * | 2018-05-21 | 2018-11-06 | 浙江工业大学 | Residual convolutional neural network image classification method based on multipath feature weighting |
- CN109740534A (zh) * | 2018-12-29 | 2019-05-10 | 北京旷视科技有限公司 | Image processing method, apparatus, and processing device |
- CN109871830A (zh) * | 2019-03-15 | 2019-06-11 | 中国人民解放军国防科技大学 | Spatial-spectral fusion hyperspectral image classification method based on a three-dimensional deep residual network |
- CN110334715A (zh) * | 2019-07-04 | 2019-10-15 | 电子科技大学 | SAR target recognition method based on a residual attention network |
- CN111368937A (zh) * | 2020-03-19 | 2020-07-03 | 京东方科技集团股份有限公司 | Image classification method and apparatus, and training method, apparatus, device and medium therefor |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9858496B2 (en) * | 2016-01-20 | 2018-01-02 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
WO2018133034A1 (en) * | 2017-01-20 | 2018-07-26 | Intel Corporation | Dynamic emotion recognition in unconstrained scenarios |
US10891723B1 (en) * | 2017-09-29 | 2021-01-12 | Snap Inc. | Realistic neural network based image style transfer |
- WO2019119396A1 (zh) | 2017-12-22 | 2019-06-27 | 中国科学院深圳先进技术研究院 | Facial expression recognition method and apparatus |
US11636328B2 (en) * | 2018-03-28 | 2023-04-25 | University Of Maryland, College Park | L2 constrained softmax loss for discriminative face verification |
- CN110163215B (zh) * | 2018-06-08 | 2022-08-23 | 腾讯科技(深圳)有限公司 | Image processing method and apparatus, computer-readable medium, and electronic device |
- CN109348211B (zh) * | 2018-08-06 | 2020-11-06 | 中国科学院声学研究所 | General information hiding detection method for intra-frame and inter-frame video coding |
- CN109191369B (zh) * | 2018-08-06 | 2023-05-05 | 三星电子(中国)研发中心 | Method, storage medium and apparatus for converting a 2D picture set into a 3D model |
- KR102122065B1 (ko) * | 2018-10-23 | 2020-06-11 | 주식회사 아나패스 | Super-resolution inference method and apparatus using a residual convolutional neural network with interpolated global shortcut connections |
US11024037B2 (en) * | 2018-11-15 | 2021-06-01 | Samsung Electronics Co., Ltd. | Foreground-background-aware atrous multiscale network for disparity estimation |
- CN109815785A (zh) | 2018-12-05 | 2019-05-28 | 四川大学 | Facial emotion recognition method based on a two-stream convolutional neural network |
- CN109886190A (zh) | 2019-02-20 | 2019-06-14 | 哈尔滨工程大学 | Expression recognition method based on deep-learning bimodal fusion of facial expression and pose |
- CN112307826A (zh) * | 2019-07-30 | 2021-02-02 | 华为技术有限公司 | Pedestrian detection method and apparatus, computer-readable storage medium, and chip |
- CN110651277B (zh) * | 2019-08-08 | 2023-08-01 | 京东方科技集团股份有限公司 | Computer-implemented method, computer-implemented diagnostic method, image classification device, and computer program product |
-
2020
- 2020-03-19 CN CN202010194826.1A patent/CN111368937B/zh active Active
- 2020-12-29 US US17/417,127 patent/US11854248B2/en active Active
- 2020-12-29 WO PCT/CN2020/140711 patent/WO2021184902A1/zh active Application Filing
Also Published As
Publication number | Publication date |
---|---|
US20220165053A1 (en) | 2022-05-26 |
CN111368937B (zh) | 2024-05-28 |
CN111368937A (zh) | 2020-07-03 |
US11854248B2 (en) | 2023-12-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
- WO2021184902A1 (zh) | Image classification method and apparatus, and training method, apparatus, device and medium therefor | |
- WO2020238293A1 (zh) | Image classification method, and neural network training method and apparatus | |
- WO2019228317A1 (zh) | Face recognition method and apparatus, and computer-readable medium | |
- WO2021042828A1 (zh) | Neural network model compression method and apparatus, storage medium, and chip | |
- US20210012198A1 (en) | Method for training deep neural network and apparatus | |
- WO2021047286A1 (zh) | Training method for a text processing model, and text processing method and apparatus | |
- WO2020228376A1 (zh) | Text processing method, and model training method and apparatus | |
- WO2021114625A1 (zh) | Network structure construction method and apparatus for multi-task scenarios | |
- US11704817B2 (en) | Method, apparatus, terminal, and storage medium for training model | |
- WO2019100724A1 (zh) | Method and apparatus for training a multi-label classification model | |
- WO2021022521A1 (zh) | Data processing method, and method and device for training a neural network model | |
- WO2021159714A1 (zh) | Data processing method and related device | |
- WO2022001805A1 (zh) | Neural network distillation method and apparatus | |
- WO2021147325A1 (zh) | Object detection method, apparatus, and storage medium | |
- WO2021218517A1 (zh) | Method for acquiring a neural network model, and image processing method and apparatus | |
- CN112257449B (zh) | Named entity recognition method and apparatus, computer device, and storage medium | |
- EP4006776A1 (en) | Image classification method and apparatus | |
- WO2021051987A1 (zh) | Method and apparatus for training a neural network model | |
- WO2021057884A1 (zh) | Sentence paraphrase method, and method and apparatus for training a sentence paraphrase model | |
- CN114186063B (zh) | Training method and classification method for a cross-domain text sentiment classification model | |
- CN114266897A (zh) | Acne category prediction method and apparatus, electronic device, and storage medium | |
- WO2021127982A1 (zh) | Speech emotion recognition method, smart device, and computer-readable storage medium | |
- WO2020192523A1 (zh) | Translation quality detection method and apparatus, machine translation system, and storage medium | |
- WO2022156475A1 (zh) | Training method for a neural network model, and data processing method and apparatus | |
- CN113065512A (zh) | Facial micro-expression recognition method, apparatus, device, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20925616 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20925616 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.05.2023) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20925616 Country of ref document: EP Kind code of ref document: A1 |