WO2019223397A1 - Image processing method, apparatus, computer device and computer storage medium
Image processing method, apparatus, computer device and computer storage medium
- Publication number: WO2019223397A1 (application PCT/CN2019/077341)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- convolutional layers
- feature map
- processing
- layer
- Prior art date
Classifications
- G06F18/211 — Selection of the most significant subset of features
- G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06F18/253 — Fusion techniques of extracted features
- G06N3/04 — Architecture, e.g. interconnection topology
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06V10/764 — Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
- G06V10/82 — Image or video recognition using pattern recognition or machine learning, using neural networks
- G06N3/048 — Activation functions
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
Definitions
- The embodiments of the present application relate to the field of deep learning, and relate to, but are not limited to, an image recognition method and apparatus, a computer device, and a storage medium.
- Convolutional neural networks (CNNs) have become the mainstream method in the field of computer vision.
- The existing mainstream convolutional neural networks, such as VGG (from the Oxford Visual Geometry Group), the Residual Network (ResNet), and the Densely Connected Convolutional Network (DenseNet), all use batch normalization (BN) to speed up training.
- However, these convolutional neural networks are less robust to changes in image appearance: when the color, contrast, style, or scene of an image changes, their performance decreases significantly.
- In the field of image appearance transfer, instance normalization (IN) is used in convolutional neural networks to improve their adaptability to images with different appearances. However, IN has not been successfully applied to image understanding tasks, and in the existing art neither BN alone nor IN alone improves the performance of convolutional neural networks satisfactorily.
- Embodiments of the present application provide an image recognition method and device, a computer device, and a storage medium.
- An embodiment of the present application provides an image recognition method.
- The method includes: acquiring an image to be identified; inputting the image to be identified into a neural network model obtained through training, to obtain a recognition result of the image to be identified, wherein the neural network model is obtained by performing IN and BN processing on feature maps output by convolution layers in the neural network; and outputting the recognition result of the image to be identified.
- An embodiment of the present application provides an image recognition device.
- The device includes a first acquisition module, a first processing module, and a first output module, wherein: the first acquisition module is configured to acquire an image to be identified; the first processing module is configured to input the image to be identified into a trained neural network model to obtain a recognition result of the image to be identified, wherein the neural network model is obtained by performing IN and BN processing on the neural network; and the first output module is configured to output the recognition result of the image to be identified.
- An embodiment of the present application provides a computer program product.
- the computer program product includes computer-executable instructions. After the computer-executable instructions are executed, the steps in the image recognition method provided by the embodiments of the present application can be implemented.
- An embodiment of the present application provides a computer storage medium that stores computer-executable instructions. After the computer-executable instructions are executed, the steps in the image recognition method provided by the embodiments of the present application can be implemented.
- An embodiment of the present application provides a computer device including a memory and a processor, where the memory stores computer-executable instructions, and when the processor runs the computer-executable instructions on the memory, the processor can implement the steps of the image recognition method provided by the embodiments of the present application.
- In the embodiments of the present application, the combination of IN and BN is applied to a neural network, which effectively improves the accuracy of image recognition.
- FIG. 1A is a schematic diagram of a composition structure of a network architecture according to an embodiment of the present application.
- FIG. 1B is a schematic flowchart of an image recognition method according to an embodiment of the present application.
- FIG. 1C is a network architecture diagram of an image recognition method according to an embodiment of the present application.
- FIG. 1D is a network architecture diagram of another image recognition method according to an embodiment of the present application.
- FIG. 2 is a schematic flowchart of another implementation of an image recognition method according to an embodiment of the present application.
- FIG. 3 is a structural diagram of a residual network based on an embodiment of the present application.
- FIG. 4 is a structural diagram of a residual network based on another embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an image recognition device according to an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
- FIG. 1A is a schematic structural diagram of a network architecture according to an embodiment of the present application.
- The network architecture includes two or more computer devices 11 to 1N and a server 31, where the computer devices 11 to 1N interact with the server 31 through a network 21.
- In implementation, the computer device may be any of various types of computing devices with information processing capabilities; for example, the computer device may be a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital phone, a television, or the like.
- This embodiment proposes an image recognition method, which can effectively solve the problem that the structural information of the output image changes compared with the input image.
- The method is applied to a computer device, and the functions implemented by the method can be realized by a processor in the computer device calling program code; the program code can, of course, be stored in a computer storage medium. It can be seen that the computer device includes at least a processor and a storage medium.
- The word "channel" has two different meanings here. The first relates to sample images (images used as training samples), where a channel refers to a color channel of the sample image. The second is the dimension of the output space, such as the number of output channels in a convolution operation, or equivalently the number of convolution kernels in each convolution layer.
- A color channel decomposes an image into one or more color components. For a single color channel, only one value per pixel is needed to represent grayscale, where 0 is black.
- If the red, green and blue (RGB) color mode is used, the image is divided into three color channels (red, green, and blue), which together represent color; all values being 0 represents black.
- An alpha channel can be added to the RGB color mode to indicate transparency, where alpha 0 means fully transparent.
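- As a concrete illustration of the first meaning of "channel", the following minimal NumPy sketch (the pixel values are made up for illustration) decomposes an RGB image into its three color channels:

```python
import numpy as np

# A hypothetical 2x2 RGB image; each pixel stores (R, G, B) values in [0, 255].
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [0, 0, 0]]], dtype=np.uint8)

r, g, b = img[..., 0], img[..., 1], img[..., 2]  # the three color channels
print(r.shape)  # (2, 2): one grayscale plane per channel; an all-zero pixel is black
```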
- A convolutional neural network is a kind of multilayer supervised-learning neural network. The convolutional layers and pooling (sub-sampling) layers of the hidden part are the core modules that realize the feature extraction function of the convolutional neural network.
- The lower hidden layers of the convolutional neural network consist of alternating convolutional layers and max-pooling layers, while the higher layers correspond to the hidden layers and logistic regression classifier of a traditional multilayer perceptron, implemented as fully connected layers.
- The input of the first fully connected layer is the feature image obtained by feature extraction from the convolutional and sub-sampling layers.
- The final output layer is a classifier, which can use logistic regression, Softmax regression, or even a support vector machine to classify the input image.
- Each layer in a CNN is composed of multiple maps, and each map is composed of multiple neural units. All neural units of the same map share one convolution kernel (that is, one set of weights).
- A convolution kernel often represents one feature.
- A CNN generally alternates convolutional layers and sampling layers, that is, one convolutional layer is followed by one sampling layer, which is in turn followed by another convolutional layer; of course, multiple convolutional layers may also be connected to one sampling layer. In this way features are extracted and then combined to form more abstract features that finally describe the image; a CNN can end with fully connected layers.
- ReLU has three main differences compared with other activation functions such as the sigmoid function: (1) one-sided suppression; (2) a relatively wide activation boundary; (3) sparse activation.
- The VGG model structure is simple and effective. The first layers use only 3×3 convolution kernels to increase the network depth, and max pooling successively reduces the spatial size of each layer. The last three layers are two fully connected layers of 4096 neurons each and a softmax layer. "16" and "19" indicate the number of layers in the network whose weights (the parameters to be learned) need to be updated, counting convolutional and fully connected layers. The weights of both the VGG16 and VGG19 models are trained on ImageNet.
- FIG. 1B is a schematic flowchart of an image recognition method according to an embodiment of the present application. As shown in FIG. 1B, the method includes the following steps:
- Step S101 Acquire an image to be identified.
- Step S101 may be implemented by a computer device. The computer device may be a smart terminal, for example, a mobile terminal with wireless communication capabilities such as a mobile phone, a tablet computer, or a notebook computer; it may also be a terminal device that is not easily moved, such as a desktop computer. The computer device is used for image recognition or processing.
- The image to be processed may be an image with a complicated appearance or an image with a simple appearance.
- Step S102: Input the image to be identified into a neural network model obtained through training, to obtain a recognition result of the image to be identified.
- the step S102 may be implemented by a computer device.
- the neural network model is obtained by performing IN and BN processing on a feature map output by a convolution layer in the neural network.
- The feature maps output by the convolution layers in the neural network are processed by IN and BN to obtain the neural network model, namely IBN-Net.
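- To make this concrete, the following is a minimal PyTorch sketch of such a mixed normalization layer, in the spirit of the description above; the class name IBN, the 0.5 split ratio, and the affine settings are illustrative assumptions rather than requirements of the patent:

```python
import torch
import torch.nn as nn

class IBN(nn.Module):
    """Splits a convolutional feature map along the channel dimension:
    the first `half` channels are instance-normalized (IN) and the
    remaining channels are batch-normalized (BN), then re-concatenated."""
    def __init__(self, planes, in_ratio=0.5):  # in_ratio=0.5 is an assumed default
        super().__init__()
        self.half = int(planes * in_ratio)            # channels given to IN
        self.IN = nn.InstanceNorm2d(self.half, affine=True)
        self.BN = nn.BatchNorm2d(planes - self.half)

    def forward(self, x):                             # x: (N, C, H, W)
        first, rest = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat((self.IN(first.contiguous()),
                          self.BN(rest.contiguous())), dim=1)

# Usage sketch: replace a BatchNorm2d(64) that follows a convolution with IBN(64).
```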
- the recognition result may be a category of the image, a name of the image, and the like.
- the neural network may be a convolutional neural network, such as ResNet50, VGG, and DenseNet.
- Step S103: The recognition result of the image to be recognized is output.
- the step S103 may be implemented by a computer device.
- The computer device may output the recognition result of the image to be identified on its own display screen, or it may send the result to other devices, for example a user's smart terminal.
- In the embodiments of the present application, IN and BN are combined and applied to a neural network, and the image to be recognized is then analyzed by the neural network model processed with IN and BN, thereby effectively improving the accuracy of image recognition.
- The trained neural network model can be local to the computer device, or it can be located on the server side.
- When the trained neural network model is local to the computer device, the model may be installed together with the client on the computer device. In this way, as shown in FIG. 1C, the computer device obtains the image to be identified through step S101, obtains the recognition result of the image to be identified through step S102, and finally outputs the recognition result through step S103. It can be seen from the above process that after the client is installed, steps S101 to S103 are performed locally on the computer device, and the computer device finally outputs the recognition result to the user.
- The trained neural network model may also be located on the server side, as shown in FIG. 1D.
- In this case, the computer device sends the input image to the server, and the server receives the input image sent by the computer device; that is, the server implements step S101 by determining the image to be identified from the received input image. The server then obtains the recognition result through step S102 and outputs it through step S103. It can be seen from the above process that steps S101 to S103 are performed on the server side.
- The server may also send the recognition result to the computer device, so that after receiving the recognition result, the computer device outputs it to the user.
- In other words, the computer device uploads the user's image to be recognized, receives the recognition result sent by the server, and outputs the recognition result to the user.
- FIG. 2 is a schematic flowchart of another implementation of the image recognition method according to the embodiment of the present application. As shown in FIG. 2, the method includes the following steps:
- Step S201: Determine a first set of convolution layers and a second set of convolution layers from the convolution layers of the neural network.
- the set composed of the first convolution layer set and the second convolution layer set is all or part of all convolution layers of the neural network.
- When the set consisting of the first convolution layer set and the second convolution layer set comprises all of the convolution layers of the neural network, it can be understood that every convolution layer in the neural network undergoes IN and/or BN processing.
- When the set consisting of the first convolution layer set and the second convolution layer set is only part of all the convolution layers of the neural network, it can be understood that some of the convolution layers in the neural network undergo neither IN processing nor IN combined with BN processing.
- Step S202: Determine a first channel set from the channels corresponding to the feature map output by each convolution layer in the first set of convolution layers.
- the first channel set is all or part of all channels corresponding to a feature map output by each of the convolutional layers in the first set of convolutional layers.
- The first set of convolutional layers does not include the last convolutional layer in the neural network; that is, the last (deep) layers of the neural network are not subjected to IN processing, so that the discriminative content in the deep features is not weakened, while feature changes caused by image appearance transformations are still reduced. In this way, the image recognition accuracy of the neural network model is improved.
- Step S203: IN processing is performed on the first channel set.
- When the first channel set comprises all of the channels corresponding to the feature map output by each convolutional layer in the first set of convolutional layers, all of those channels are subjected to IN processing.
- When the first channel set is only part of all the channels corresponding to the feature map output by each convolutional layer in the first set of convolutional layers, IN processing is performed on that part of the channels, and the remaining channels are either subjected to BN processing or left unprocessed.
- Step S204 Determine a second channel set from the channels corresponding to the feature map output by each convolutional layer in the second set of convolutional layers.
- the second channel set is all or part of all channels corresponding to a feature map output by each of the convolutional layers in the second set of convolutional layers.
- Step S205 BN processing is performed on the second channel set.
- When the second channel set comprises all of the channels corresponding to the feature map output by each convolutional layer in the second set of convolutional layers, all of those channels are subjected to BN processing; when the second channel set is only part of those channels, BN processing is performed on that part of the channels, and IN processing is performed on the remaining channels.
- The relationship between the first set of convolutional layers and the second set of convolutional layers includes the following three cases.
- Case 1: the first set of convolutional layers and the second set of convolutional layers have no intersection, that is, the two sets undergo different normalization processing: IN processing is performed on the feature map output by each convolution layer in the first set, and BN processing is performed on the feature map output by each convolution layer in the second set. As shown in FIG. 4(b), IN processing is performed on only a part of the output obtained after the summing operation, and BN processing is performed on the feature maps output by the remaining convolutional layers.
- Case 2: the first set of convolutional layers and the second set of convolutional layers have an intersection, that is, the first set undergoes IN processing as well as IN combined with BN processing, and the second set undergoes BN processing as well as IN combined with BN processing. As described in steps S202 and S203, when the first channel set is part of all the channels corresponding to the feature map output by each convolution layer in the first set, IN processing is applied to that part and BN processing to the rest; as described in steps S204 and S205, when the second channel set is part of all the channels corresponding to the feature map output by each convolution layer in the second set, BN processing is applied to that part and IN processing to the rest. As shown in FIG. 4(d), the feature maps output by the convolution layers undergo BN processing as well as IN combined with BN processing.
- Case 3: the second set of convolutional layers is a subset of the first set of convolutional layers. When the second set is a proper subset of the first set, the first set undergoes IN processing as well as IN combined with BN processing, while the second set undergoes IN combined with BN processing. When the two sets coincide, IN combined with BN processing is performed on both; that is, for each convolution layer in the first set, part of the channels corresponding to the output feature map are subjected to IN processing, and the remaining channels are subjected to BN processing or left unprocessed (the first set thus includes two processing modes: IN, and IN combined with BN).
- In some embodiments, the method further includes: summing the feature maps corresponding to two blocks of the neural network to obtain an output result, and performing IN processing on the output result. As shown in FIG. 3(c), the feature map obtained through the three convolution layers of the presented residual block and the feature map obtained by the previous residual block through multi-layer convolution are first summed to obtain the summation result (that is, the output result); IN processing is then performed on the summed result.
- The appearance information can be retained in the residual path (path 1 in FIG. 3) or the identity path (path 2 in FIG. 3).
- In the embodiments of the present application, IN and BN are used in the same CNN after deeply studying their respective learning characteristics: BN is adopted as a key component for improving learning ability on high-level visual tasks, while IN is usually combined with a CNN to eliminate appearance variance of images in low-level visual tasks, such as image style transfer.
- the IBN-Net provided in the embodiments of the present application shows that combining IN and BN in an appropriate manner will improve the learning and generalization capabilities of CNNs.
- Features produced by the combination of IN and BN are retained in the shallow layers of the CNN, and BN-only features are retained in the deep layers of the CNN, to match the statistical characteristics at different depths of the network.
- the information related to the appearance of the image (such as color, contrast, style, etc.) is mainly present in the features of the shallow layer, while the information related to the object category in the image is mainly present in the features of the deep layer.
- the IN layer is introduced to the CNN according to two rules: First, in order to reduce the feature changes caused by the appearance in the shallow layer, while not disturbing the deep content differentiation, the IN layer is only added to the lower half of the CNN.
- the IBN-Net proposed in the embodiment of the present application improves the performance and generalization ability of the convolutional neural network.
- The accuracy of IBN-Net50 on the original validation set of the ImageNet database reaches top-5 and top-1 accuracies of 93.7% and 77.4%, respectively, which are 0.8% and 1.7% higher than those of ResNet50.
- The accuracy of IBN-Net50 on the new validation set obtained after style transfer of ImageNet reaches top-5 and top-1 accuracies of 72.9% and 48.9%, respectively, which are 2.2% and 2.9% higher than those of ResNet50.
- IN provides visual and appearance invariance, while BN speeds up training and preserves discriminative features. This observation guides the design of the IBN-Net architecture: IN is placed in the shallow layers to eliminate appearance variations, and, to maintain discrimination, the amount of IN in the deep layers is reduced.
- IBN-Net modules can be used to redevelop many of the deep architectures that are currently being researched to improve the learning and generalization capabilities of the deep structure, but keep the computational cost of the deep structure unchanged.
- IBN-Net significantly improves cross-domain performance.
- Real and virtual datasets of traffic scenes belong to two different image domains; here the real dataset can be Cityscapes and the virtual dataset can be GTA5 (Grand Theft Auto V). When trained on GTA5 and tested on Cityscapes, the performance of ResNet50 integrated with IBN-Net improved by 7.6%.
- The required sample size is also significantly reduced. For example, when fine-tuning with only 30% of the Cityscapes training data, the segmentation accuracy of the IBN-Net model provided in this embodiment reaches 65.5%, whereas ResNet50 fine-tuned using all the training data reaches only 63.8%.
- CNN invariance: the modules proposed in related technologies are usually used to improve a CNN's modeling ability, or to reduce overfitting so as to enhance its generalization ability on a single domain. These methods usually achieve this by introducing a specific invariance into the CNN architecture. For example, max pooling and deformable convolution introduce spatial invariance into CNNs, thereby increasing robustness to spatial changes such as affine, distortion, and perspective transformations.
- the role of the dropout layer and BN in training can be viewed as regularization to reduce the impact of sample noise.
- CNN network architecture: since CNNs have shown stronger performance than traditional methods, the CNN architecture has undergone many developments. Among them, the most widely used is ResNet, which uses shortcut connections to ease the training of very deep networks. Since then, multiple variants of ResNet have been proposed; compared with ResNet, ResNeXt improves model performance by increasing the "cardinality" of ResNet, which is achieved using group convolutions.
- Another paradigm for this problem is domain generalization, whose goal is to acquire knowledge from multiple related source domains and apply it to a new target domain whose statistics are unknown during training.
- Related techniques are often designed to capture factors common to different domains.
- the embodiment of the present application increases the model performance and generalization ability by designing a new appearance-invariant CNN architecture IBN-Net.
- Moreover, this application requires neither target-domain data nor related source domains. This embodiment is thus very useful when target-domain data cannot be obtained, a case that related technologies cannot handle.
- This embodiment introduces IN according to two rules. First, in order not to reduce the deep features' discrimination of image content information, IN is not added to the last layer or layers of the CNN. Second, in order to still preserve content information in the shallow layers, BN processing is retained on some features in the shallow layers.
- FIG. 3 is a structural diagram of a residual network based on an embodiment of the present application.
- ResNet is mainly composed of 4 groups of residual blocks.
- FIG. 3(a) is the structure diagram of a residual block in the original ResNet, and FIGS. 3(b) and 3(c) are structure diagrams in which the feature maps output by different convolution layers in ResNet are processed by IN combined with BN.
- In FIG. 3, path 1 is the residual path and path 2 is the identity path; x in (x, 256d) at 30 represents the input feature, and 256d means the input feature has 256 channels.
- 31 denotes a convolution layer with 1×1 kernels and 64 channels; 32, 34, and 36 denote activation (ReLU) layers; 33 denotes a convolution layer with 3×3 kernels and 64 channels; 35 denotes a convolution layer with 1×1 kernels and 256 channels.
- 311 denotes batch normalization (BN) over 64 channels; 313 denotes batch normalization (BN) over 256 channels.
- 321 indicates that half of the channels corresponding to the feature map output by the convolution layer (that is, 32 channels) undergo IN processing, while the other half (the other 32 channels) undergo BN processing.
- 331 indicates that IN processing is performed on the result of the summing operation; as shown in FIG. 3(c), the feature map obtained through the three convolution layers of the presented residual block is summed with the feature map obtained by the previous residual block through multi-layer convolution (that is, the input feature x).
- IN is applied to the first normalization layer (that is, to the feature map output by the first convolutional layer) rather than the last one, to reduce the probability of misalignment between the residual function F(x, {W_i}) and the identity x in the identity path.
- For the feature map output by the convolution layer, half of the channels are processed by BN and half by IN, which satisfies the requirement of preserving image content information in the shallow layers.
- This design is a pursuit of model performance.
- IN enables the model to learn appearance-invariant features, which can make better use of images with high appearance diversity in a data set.
- add IN in a modest way so that content-related information can be well retained.
- This model is represented as IBN-Net-a in this embodiment.
- This application also proposes another network, IBN-Net-b, that pursues maximum generalization capability. Since appearance information can be retained in both the residual path and the identity path, to ensure the generalization ability of the neural network, IN is added immediately after the addition operation, as shown in FIG. 3(c). In order not to degrade the performance of ResNet, this embodiment adds only three IN layers, after the first convolutional layer and the first two convolution groups.
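- A minimal sketch of this IBN-Net-b placement (names are illustrative; only the tail of a residual block is shown) applies IN to the sum of the residual path and the identity path, immediately after the addition, rather than inside the residual path:

```python
import torch.nn as nn

class ResidualTailWithIN(nn.Module):
    """Tail of an IBN-Net-b style residual block: the residual output and
    the identity shortcut are summed, the sum is instance-normalized, and
    the result is passed through ReLU."""
    def __init__(self, planes):
        super().__init__()
        self.IN = nn.InstanceNorm2d(planes, affine=True)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, residual, identity):
        out = residual + identity   # the summing operation
        out = self.IN(out)          # IN immediately after the addition
        return self.relu(out)
```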
- FIG. 4 is a structural diagram of a residual network according to another embodiment of the present application; FIGS. 4(a), 4(b), 4(c), and 4(d) are structural diagrams of the same block in a residual neural network, differing in how the feature maps output by the different convolutional layers (that is, the normalization layers) are processed.
- In FIG. 4, path 3 is the residual path and path 4 is the identity path.
- x in (x, 256d) at 40 represents the input features, and 256d means the input features have 256 channels (which can be interpreted as 256 feature images); 41 denotes a convolution layer with 1×1 kernels and 64 channels; 42, 44, and 46 denote activation (ReLU) layers; 43 denotes a convolution layer with 3×3 kernels and 64 channels; 45 denotes a convolution layer with 1×1 kernels and 256 channels; 411 denotes batch normalization (BN) over 64 channels; 412 denotes batch normalization (BN) over 256 channels. In FIG. 4(a), 431 indicates that the feature map output by the first convolution layer is processed by IN and by BN separately, and 47 indicates that the results of the two normalization methods are concatenated and then output to the next layer, the activation layer. In FIG. 4(b), 413 denotes batch normalization (BN) over 64 channels, and 431 indicates that IN processing is performed on half of the result of the summing operation (that is, on 128 of the 256 channels).
- In FIG. 4(b), the feature map obtained after the three convolution layers is summed with the feature map obtained by the previous block of the neural network through multi-layer convolution.
- 441 indicates that half of the channels corresponding to the feature map output by the first convolution layer (ie, 32 channels) are subjected to IN processing, and the other half (ie, the other 32 channels) are subjected to BN processing;
- 442 indicates that half of the channels corresponding to the feature map output by the second convolution layer (ie, 32 channels) are subjected to IN processing, and the other half (ie, the other 32 channels) are subjected to BN processing.
- In FIG. 4(a), the feature map output by the first convolution layer is processed by IN and by BN separately, and the results of the two normalization methods are then concatenated and output to the next layer, the activation layer. This maintains both the high generalization ability of IN and the high discrimination of BN, but introduces more parameters. Since the idea of retaining both kinds of features also applies to IBN-b, FIG. 4(b) is produced. In addition, the schemes shown in FIGS. 4(a) and 4(b) can be combined, as shown in FIGS. 4(c) and 4(d). A discussion of these variants is given in the experimental section below. Tables 1 and 3 show the top1- and top5-based error rates obtained when images from the ImageNet database are input into the IBN-Net neural network model.
- top1 means that the class predicted with the highest probability is the correct class; top5 means that the correct class is among the five classes predicted with the highest probabilities. top1/top5 err denote the corresponding top1- and top5-based error rates.
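- For clarity, top1/top5 error rates can be computed as in the following sketch (function and variable names are illustrative):

```python
import torch

def topk_error(logits, labels, k):
    """Fraction of samples whose true label is NOT among the k classes
    predicted with the highest scores (the top1/top5 err above)."""
    _, pred = logits.topk(k, dim=1)                  # (N, k) predicted class indices
    hit = (pred == labels.unsqueeze(1)).any(dim=1)   # is the true label in the top k?
    return 1.0 - hit.float().mean().item()

# top1_err = topk_error(logits, labels, 1); top5_err = topk_error(logits, labels, 5)
```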
- RGB+50 indicates that the R, G, and B channels of the image are each increased by 50 relative to the original image; R+50 indicates that only the red channel is increased by 50, making the image redder; contrast*1.5 indicates that the contrast is multiplied by 1.5.
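- The patent does not spell out the exact implementation of these appearance transforms; a plausible sketch, assuming 8-bit images and a mid-gray pivot for the contrast change, is:

```python
import numpy as np

def shift_channels(img, delta, channels=(0, 1, 2)):
    """'RGB+50'-style shift: add delta to the chosen channels (all three for
    RGB+50, only channel 0 for R+50), clipping to the valid [0, 255] range."""
    out = img.astype(np.int16)
    for c in channels:
        out[..., c] += delta
    return np.clip(out, 0, 255).astype(np.uint8)

def scale_contrast(img, factor, pivot=128.0):
    """'contrast*1.5'-style transform: scale pixel values about a mid-gray
    pivot; the pivot choice is an assumption, not given in the text."""
    out = (img.astype(np.float32) - pivot) * factor + pivot
    return np.clip(out, 0, 255).astype(np.uint8)
```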
- The top1- and top5-based error rates obtained with the model IBN-Net50-a are lower than those of the original ResNet50 without IBN-Net, and the top1- and top5-based error rates of the model IBN-Net50-b are likewise lower.
- This embodiment compares the performance of IBN-Net with popular CNN architectures on the original ImageNet validation set. As shown in Table 2, IBN-Net achieves consistent improvements over these CNNs, demonstrating stronger model performance. Specifically, IBN-ResNet101 outperforms ResNeXt101 and SE-ResNet101, both of which require more computation time or introduce additional parameters. In contrast, the IBN-Net model provided in this embodiment introduces no extra parameters and adds only a small amount of computation during the testing phase. The experimental results show that excluding some mean and variance statistics from the features helps the model learn from images with a high degree of appearance diversity.
- The results of the IBN-Net variants (IBN-Net-a through IBN-Net-d) described in the method section are as follows. All the IBN-Net variants provided in this embodiment show better performance than the original ResNet50, and suffer less performance degradation under appearance changes. Specifically, IBN-Net-c achieves performance similar to IBN-Net-a, providing another method of feature combination. The performance and generalization ability of IBN-Net-d lie between those of IBN-Net-a and IBN-Net-b, which indicates that retaining BN features in some channels of the feature map output by a convolution layer helps improve performance, but at the same time loses some generalization ability.
- IBN-Net-a and IBN-Net-b are basically equivalent to IBN-Net-d, which indicates that the influence of IN on the main path of ResNet is dominant. Finally, adding additional IBN layers to IBN-Net-a brings no further benefit; adding an appropriate number of IN layers is sufficient.
- Table 4 shows the performance of IBN-Net50-a when IN layers are added to different numbers of residual groups. It can be seen that performance improves as more IN layers are added to the shallow layers, but drops when IN layers are added to the last residual group. This shows that applying IN in the shallow layers helps improve model performance, while BN is needed in the deep layers to retain important content information.
- This embodiment also studies the effect of the IN-BN ratio on performance. As shown in Table 5, the top1- and top5-based error rates are lowest when the IN ratio is 0.25 to 0.5, which shows that a trade-off between IN and BN is needed in practice.
- ResNet50 with dilated (hole) convolution is used as the baseline, and IBN-Net follows the same modification. The models are trained on each dataset, and IBN-Net and ResNet50 are evaluated; the evaluation results are shown in Table 6, where mIoU (%) denotes the mean intersection-over-union. The experimental results of this embodiment are consistent with those on the ImageNet dataset: IBN-Net shows stronger model performance within a single dataset and better generalization across datasets from different domains. Specifically, IBN-Net-a shows stronger model performance; on the two datasets, its accuracy exceeds that of ResNet50 by 0.6% and 2.0%. In cross-domain evaluation, IBN-Net-b generalizes better.
- With IBN-Net-b, performance from Cityscapes to GTA5 is 8.6% higher than with the original ResNet50, and performance from GTA5 to Cityscapes improves by 7.6%. It is worth mentioning that the IBN-Net provided by this embodiment differs from domain adaptation work: domain adaptation is oriented to a target domain and requires target-domain data during training, which the method of this embodiment does not. Nevertheless, the performance gain of this method is still comparable to that of domain adaptation methods, and it takes an important step toward a more general model, because it builds appearance invariance into the model rather than forcing the model to adapt to a specific data domain.
- Another common way to apply a model to a new data domain is to fine-tune it with a small number of target domain annotations.
- the model provided by this embodiment has stronger generalization ability, so the data required by the network can be significantly reduced.
- This embodiment uses different amounts of Cityscapes data and annotations to fine-tune the model pre-trained on the GTA5 dataset.
- The initial learning rate and the number of epochs are set to 0.003 and 80, respectively.
- As shown in Table 7, using only 30% of the Cityscapes training data, IBN-Net50-a outperforms ResNet50 trained with all the training data.
- The feature divergence caused by domain bias is analyzed here, measured as follows.
- Denote the average value of a channel by F; F basically describes how strongly this channel is activated. Assume F follows a Gaussian distribution with mean μ and variance σ².
- The symmetric KL divergence between domain A and domain B on this channel can then be expressed as: D(F_A || F_B) = KL(F_A || F_B) + KL(F_B || F_A).
- For a layer with C channels, the divergence is averaged as D(L_A, L_B) = (1/C) Σ_{i=1}^{C} D(F_{iA} || F_{iB}), where D(F_{iA} || F_{iB}) denotes the symmetric KL divergence of the i-th channel. This metric provides a measure of the distance between the feature distributions of domain A and domain B.
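- Under the Gaussian assumption above, the symmetric KL divergence has a closed form; a minimal NumPy sketch (function names are illustrative) is:

```python
import numpy as np

def symmetric_kl(mu_a, var_a, mu_b, var_b):
    """Per-channel symmetric KL divergence between N(mu_a, var_a) and
    N(mu_b, var_b), using the closed form for 1-D Gaussians."""
    kl_ab = 0.5 * (np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0)
    kl_ba = 0.5 * (np.log(var_a / var_b) + (var_b + (mu_b - mu_a) ** 2) / var_a - 1.0)
    return kl_ab + kl_ba

def layer_feature_divergence(mu_a, var_a, mu_b, var_b):
    """Average the per-channel symmetric KL divergence over the C channels
    of one layer, given per-channel means and variances for domains A and B."""
    return float(np.mean(symmetric_kl(mu_a, var_a, mu_b, var_b)))
```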
- The first two groups of domain pairs are Cityscapes-GTA5 and original photographs versus Monet-style images (Photo-Monet); these two pairs of domains have obvious appearance differences.
- For the third group, the ImageNet-1k validation set is divided into two parts: the first part contains the images of 500 object categories, and the second contains the remaining 500 categories. The feature divergence of the output features of the 17 ReLU layers on the ResNet50 and IBN-Net50 main paths is then computed. Experiments on these three groups of domain pairs show that, in IBN-Net, the feature divergence caused by image appearance is significantly reduced.
- For IBN-Net-a, the divergence decreases moderately, while for IBN-Net-b a sudden drop occurs after the IN layers at layers 2, 4, and 8, and this effect persists into the deep layers, which means that appearance-induced differences in the deep features are reduced and interference with classification is therefore reduced.
- At the same time, the feature divergence caused by content differences does not decrease in IBN-Net, indicating that the content information in the features is well retained by the BN layers.
- In the embodiments of the present application, IN and BN are applied together within a single deep network to improve its performance and generalization ability. This embodiment applies IBN-Net to VGG, ResNet, ResNeXt, and SENet, and achieves consistent accuracy improvements on the ImageNet dataset.
- The built-in appearance invariance introduced by IN improves the generalization ability of neural network models across image domains. The roles of the IN and BN layers in a CNN can therefore be summarized as follows: IN introduces appearance invariance and improves generalization, while BN preserves the discrimination of content information in the features.
- FIG. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application.
- The apparatus 500 includes a first acquisition module 501, a first processing module 502, and a first output module 503, wherein: the first acquisition module 501 is configured to acquire an image to be identified; and the first processing module 502 is configured to input the image to be identified into a trained neural network model to obtain a recognition result of the image to be identified, where the neural network model is obtained by performing IN and BN processing on the neural network.
- the first output module 503 is configured to output a recognition result of the image to be recognized.
- The apparatus 500 further includes: a second processing module configured to perform IN and BN processing on the feature maps output by the convolution layers in the neural network, to obtain the neural network model.
- The second processing module includes: a first determination module configured to determine a first set of convolutional layers and a second set of convolutional layers from the convolutional layers of the neural network; a first sub-processing module configured to perform IN processing on the feature map output by each convolution layer in the first set of convolution layers; and a second sub-processing module configured to perform BN processing on the feature map output by each convolutional layer in the second set of convolution layers.
- the set consisting of the first convolution layer set and the second convolution layer set is all or part of all convolution layers of the neural network.
- the first set of convolutional layers and the second set of convolutional layers do not have an intersection, or the first set of convolutional layers and the second set of convolutional layers have an intersection;
- the second set of convolutional layers is a subset of the first set of convolutional layers.
- The first sub-processing module includes: a first sub-determination module configured to determine a first channel set from the channels corresponding to the feature map output by each convolution layer in the first set of convolution layers; and a third sub-processing module configured to perform IN processing on the first channel set.
- The second sub-processing module includes: a second sub-determination unit configured to determine a second channel set from the channels corresponding to the feature map output by each convolution layer in the second set of convolutional layers; and a fourth sub-processing module configured to perform BN processing on the second channel set.
- The first channel set is all or part of the channels corresponding to the feature map output by each convolution layer in the first set of convolution layers; the second channel set is all or part of the channels corresponding to the feature map output by each convolution layer in the second set of convolution layers.
- the apparatus further includes:
- a second processing module configured to sum the feature maps corresponding to the two blocks of the neural network to obtain an output result, and perform IN processing on the output result; wherein the neural network includes at least two blocks, and The number of channels corresponding to the feature map output from the last layer of each block is the same as the number of channels corresponding to the feature map output from the last layer of the previous block.
- The computer software product is stored in a storage medium and includes several instructions for enabling a device (which can be a terminal, a server, etc.) to execute all or part of the method described in each embodiment of this application.
- The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
- an embodiment of the present application further provides a computer program product, where the computer program product includes computer-executable instructions. After the computer-executable instructions are executed, the steps in the image recognition method provided by the embodiments of the present application can be implemented. Accordingly, an embodiment of the present application further provides a computer storage medium, where the computer storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the image recognition method provided by the foregoing embodiment is implemented. step. Accordingly, an embodiment of the present application provides a computer device.
- FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in FIG. 6, the device 600 includes: a processor 601, at least one communication bus 602, a user interface 603, at least one external communication interface 604, and a memory 605.
- the communication bus 602 is configured to implement connection and communication between these components.
- the user interface 603 may include a display screen, and the external communication interface 604 may include a standard wired interface and a wireless interface.
- The processor 601 is configured to execute an image recognition program stored in the memory, to implement the steps of the image recognition method provided by the foregoing embodiments.
- The size of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
- the sequence numbers of the foregoing embodiments of the present application are for description and do not represent the superiority or inferiority of the embodiments.
- the terms "including”, “including” or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, It also includes other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without more restrictions, an element limited by the sentence "including a " does not exclude that there are other identical elements in the process, method, article, or device that includes the element.
- the disclosed device and method may be implemented in other ways.
- The device embodiments described above are only illustrative. The division of the units is only a logical functional division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- The coupling, direct coupling, or communication between the displayed or discussed components may be through some interfaces, and the indirect coupling or communication between devices or units may be electrical, mechanical, or in other forms.
- The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
- The functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
- The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media that can store program code, such as a mobile storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
- If the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The foregoing storage media include various types of media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
Abstract
Embodiments of the present application provide an image recognition method and apparatus, a computer device, and a storage medium. The method includes: acquiring an image to be recognized; inputting the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image to be recognized, where the neural network model is obtained by performing instance normalization (IN) and batch normalization (BN) processing on the feature maps output by the convolutional layers of the neural network; and outputting the recognition result of the image to be recognized.
Description
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 201810500185.0 filed on May 23, 2018, the entire contents of which are incorporated herein by reference.
The embodiments of the present application relate to the field of deep learning, and relate to, but are not limited to, an image recognition method and apparatus, a computer device, and a storage medium.
Convolutional Neural Networks (CNN) have become the mainstream approach in the field of computer vision. For image understanding tasks such as image classification, object detection, and semantic segmentation, the prevailing CNN architectures, such as VGG (from the Visual Geometry Group at Oxford), the Residual Network (ResNet), and the Dense Convolutional Network (DenseNet), all use Batch Normalization (BN) to accelerate training. However, these networks have poor robustness to changes in image appearance: when the color, contrast, style, or scene of an image changes, their performance degrades significantly.
In addition, in the field of image appearance transformation, Instance Normalization (IN) has been used in CNNs to improve their adaptability to images of different appearances. However, IN has not been successfully applied to image understanding tasks, and in the prior art, using either BN or IN alone in a CNN does not substantially improve the network's performance.
Summary
Embodiments of the present application provide an image recognition method and apparatus, a computer device, and a storage medium.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides an image recognition method, the method including: acquiring an image to be recognized; inputting the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image to be recognized, where the neural network model is obtained by performing IN and BN processing on the feature maps output by the convolutional layers of the neural network; and outputting the recognition result of the image to be recognized.
An embodiment of the present application provides an image recognition apparatus, the apparatus including a first acquisition module, a first processing module, and a first output module, where: the first acquisition module is configured to acquire an image to be recognized; the first processing module is configured to input the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image, where the neural network model is obtained by performing IN and BN processing on a neural network; and the first output module is configured to output the recognition result of the image to be recognized.
An embodiment of the present application provides a computer program product, where the computer program product includes computer-executable instructions; when the computer-executable instructions are executed, the steps of the image recognition method provided by the embodiments of the present application can be implemented.
An embodiment of the present application provides a computer storage medium storing computer-executable instructions; when the computer-executable instructions are executed, the steps of the image recognition method provided by the embodiments of the present application can be implemented.
An embodiment of the present application provides a computer device including a memory and a processor, where the memory stores computer-executable instructions, and the processor can implement the steps of the image recognition method provided by the embodiments of the present application when running the computer-executable instructions on the memory.
In the embodiments of the present application, combining IN and BN and applying them in a neural network effectively improves the accuracy of image recognition.
FIG. 1A is a schematic structural diagram of a network architecture according to an embodiment of the present application;
FIG. 1B is a schematic flowchart of an implementation of an image recognition method according to an embodiment of the present application;
FIG. 1C is a diagram of a network architecture implementing the image recognition method according to an embodiment of the present application;
FIG. 1D is a diagram of yet another network architecture implementing the image recognition method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another implementation of the image recognition method according to an embodiment of the present application;
FIG. 3 is a structural diagram based on a residual network according to an embodiment of the present application;
FIG. 4 is another structural diagram based on a residual network according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
This embodiment first provides a network architecture. FIG. 1A is a schematic structural diagram of the network architecture according to an embodiment of the present application. As shown in FIG. 1A, the network architecture includes two or more computer devices 11 to 1N and a server 31, where the computer devices 11 to 1N interact with the server 31 through a network 21. In implementation, a computer device may be any type of computing device with information processing capability, for example a mobile phone, a tablet computer, a desktop computer, a personal digital assistant, a navigator, a digital telephone, or a television. This embodiment proposes an image recognition method that can effectively solve the problem that the structural information of the output image changes compared with the input image. The method is applied to a computer device, and the functions implemented by the method can be realized by a processor in the computer device calling program code; the program code can, of course, be stored in a computer storage medium. It can thus be seen that the computer device includes at least a processor and a storage medium.
To better understand this embodiment, technical terms related to neural networks are explained here. Channel: this term has two different meanings. The first refers to the channels of a sample image (an image used as a training sample), i.e., its color channels; below, "color channel" is used to denote the channels of a sample image. The second refers to the dimensionality of the output space, for example the number of output channels of a convolution operation, or equivalently the number of convolution kernels in each convolutional layer.
Color channel: decomposing an image into one or more color components. In a single-channel image, one pixel needs only one value to represent grayscale, with 0 being black. With three color channels, e.g., the red-green-blue (RGB) color model, the image is split into red, green, and blue channels to represent color, with all zeros being black. With four color channels, an alpha channel is added to the RGB model to represent transparency, with alpha = 0 meaning fully transparent. Convolutional neural network: a multi-layer supervised-learning neural network, in which the convolutional layers and pooling/sampling layers of the hidden layers are the core modules implementing feature extraction. The lower hidden layers of a CNN consist of alternating convolutional and max-pooling layers, while the upper layers are fully connected layers corresponding to the hidden layers and logistic-regression classifier of a traditional multi-layer perceptron. The input of the first fully connected layer is the feature image obtained by feature extraction from the convolutional and sub-sampling layers. The final output layer is a classifier that can classify the input image using logistic regression, Softmax regression, or even a support vector machine. Each layer of a CNN consists of multiple maps, each map consists of multiple neural units, and all neural units of the same map share one convolution kernel (i.e., one set of weights). A convolution kernel often represents a feature; for example, if a certain kernel represents an arc, then convolving this kernel over the whole image yields large convolution values in regions likely to contain an arc. A CNN typically alternates convolutional and sampling layers, i.e., a convolutional layer followed by a sampling layer and then another convolutional layer; several convolutional layers may also be followed by a single sampling layer. In this way, the convolutional layers extract features that are combined into more abstract features, finally forming descriptive features of the image object; fully connected layers may follow the CNN. ReLU function: its formula is ReLU(x) = max(0, x). Compared with other activation functions such as the sigmoid, the ReLU differs in three main respects: (1) one-sided suppression; (2) a relatively wide activation boundary; (3) sparse activation. VGG model: the VGG structure is simple and effective; the early layers use only 3x3 convolution kernels to increase network depth, max pooling successively reduces the number of neurons per layer, and the last three layers are two fully connected layers with 4096 neurons each and a softmax layer. The "16" and "19" denote the number of convolutional and fully connected layers in the network whose weights need to be updated (i.e., the parameters to be learned); the weights of both VGG16 and VGG19 are trained on ImageNet.
This embodiment provides an image recognition method. FIG. 1B is a schematic flowchart of an implementation of the image recognition method according to an embodiment of the present application. As shown in FIG. 1B, the method includes the following steps:
Step S101: acquire an image to be recognized. Here, step S101 may be implemented by a computer device. Further, the computer device may be a smart terminal, for example a mobile terminal device with wireless communication capability such as a mobile phone, a tablet computer, or a notebook computer, or a less portable smart terminal device such as a desktop computer. The computer device is used for image recognition or processing.
In this embodiment, the image to be processed may be an image with a complex appearance or an image with a simple appearance.
Step S102: input the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image. Here, step S102 may be implemented by the computer device. The neural network model is obtained by performing IN and BN processing on the feature maps output by the convolutional layers of the neural network; in this embodiment, the resulting model is referred to as IBN-Net. The recognition result may be the category of the image, the name of the image, and so on. The neural network may be a convolutional neural network, for example ResNet50, VGG, or DenseNet. Because the neural network model uses both IN and BN, IN introduces appearance invariance and improves generalization, while BN preserves the discriminative features of the content information. Therefore, a neural network model combining IN and BN not only improves the generalization ability of the network but also improves its image recognition accuracy.
Step S103: output the recognition result of the image to be recognized. Here, step S103 may be implemented by the computer device. In actual implementation, the computer device may output the analysis result of the image on its own display screen, or it may output (i.e., send) the result to another device, for example the user's smart terminal.
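To make steps S101 to S103 concrete, the following is a minimal inference sketch in PyTorch. It is illustrative only: the function name `recognize` and the preprocessing statistics (the common ImageNet mean/std) are assumptions of this sketch rather than values specified by this application, and `model` stands for any trained IBN-Net-style classifier.

```python
import torch
from PIL import Image
from torchvision import transforms

# Common ImageNet preprocessing; an assumption of this sketch, not a
# requirement stated in this application.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def recognize(model: torch.nn.Module, image_path: str) -> int:
    """Run the three steps: acquire (S101), recognize (S102), output (S103)."""
    image = Image.open(image_path).convert("RGB")   # S101: acquire the image
    batch = preprocess(image).unsqueeze(0)          # shape 1 x 3 x 224 x 224
    model.eval()
    with torch.no_grad():
        logits = model(batch)                       # S102: run the trained model
    return int(logits.argmax(dim=1).item())         # S103: output the class index
```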
In the image recognition method provided by this embodiment of the present application, IN and BN are combined and applied in a neural network, and the image to be recognized is then analyzed with the neural network model processed by IN and BN, which effectively improves the accuracy of image recognition.
In implementation, the trained neural network model may reside locally on the computer device or on the server side.
When the trained neural network model is local to the computer device, the model may be installed together with the client. In that case, referring to FIG. 1C, the computer device acquires the image to be recognized through step S101, obtains the recognition result through step S102, and outputs the result through step S103. As can be seen from this process, after the client is installed, steps S101 to S103 are all executed locally, and finally the computer device outputs the recognition result to the user.
In some embodiments, the trained neural network model may instead reside on the server side, referring to FIG. 1D. The computer device sends the input image to the server, and the server receives it, thereby realizing step S101. In other words, if the above method is implemented on the server side, step S101 includes: the server receives the input image sent by the computer device, i.e., the server determines the image to be recognized; the server then obtains the output result of the image through step S102, and obtains the output recognition result through step S103. As can be seen, steps S101 to S103 are all executed on the server side; finally, the server may send the recognition result back to the computer device, which, upon receiving it, outputs the recognition result to the user. In this embodiment, after installing the client, the computer device uploads the user's image to be recognized, receives the recognition result sent by the server, and outputs it to the user.
This embodiment provides an image recognition method. FIG. 2 is a schematic flowchart of another implementation of the image recognition method according to an embodiment of the present application. As shown in FIG. 2, the method includes the following steps:
Step S201: determine a first set of convolutional layers and a second set of convolutional layers from the convolutional layers of the neural network. Here, the union of the first set and the second set is all or part of the convolutional layers of the neural network. If the union is all of the convolutional layers, every convolutional layer of the network undergoes IN and/or BN processing. If the union is only part of the convolutional layers, some convolutional layers of the network undergo neither IN nor combined IN-and-BN processing.
Step S202: determine a first channel set from the channels corresponding to the feature map output by each convolutional layer in the first set. Here, the first channel set is all or part of the channels corresponding to the feature maps output by the convolutional layers in the first set. The first set does not include the last convolutional layer of the neural network; that is, no IN processing is performed on the last (deep) layer of the network. This avoids reducing the discriminability of content in the deep features while still reducing the feature variation caused by changes in image appearance, thereby improving the recognition accuracy of the neural network model. In practice, IN is generally applied to half of the channels of a feature map and BN to the other half; obviously, the proportion of channels processed by IN is adjustable. In this embodiment, to keep the identity path of the network clean, no IN processing is performed on the identity path.
Step S203: perform IN processing on the first channel set. Here, when the first channel set is all of the channels corresponding to the feature map output by each convolutional layer in the first set, IN processing is applied to all of those channels; when the first channel set is only part of those channels, IN processing is applied to that part, and the remaining channels undergo BN processing or no processing at all.
Step S204: determine a second channel set from the channels corresponding to the feature map output by each convolutional layer in the second set. Here, the second channel set is all or part of the channels corresponding to the feature maps output by the convolutional layers in the second set.
Step S205: perform BN processing on the second channel set. Here, when the second channel set is all of the channels corresponding to the feature map output by each convolutional layer in the second set, BN processing is applied to all of those channels; when the second channel set is only part of those channels, BN processing is applied to that part, and the remaining channels undergo IN processing.
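As a concrete illustration of steps S202 to S205, the following PyTorch sketch normalizes the first portion of a feature map's channels with IN and the rest with BN. The 50/50 split mirrors the text above but is exposed as a parameter; the class name `IBN` is an illustrative choice of this sketch.

```python
import torch
import torch.nn as nn

class IBN(nn.Module):
    """Apply IN to the first `ratio` of channels and BN to the rest."""
    def __init__(self, planes: int, ratio: float = 0.5):
        super().__init__()
        self.half = int(planes * ratio)            # channels handled by IN
        self.IN = nn.InstanceNorm2d(self.half, affine=True)
        self.BN = nn.BatchNorm2d(planes - self.half)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.IN(a), self.BN(b)], dim=1)
```

Because the two normalizations run on disjoint channel slices, the layer keeps the original channel count and adds essentially no parameters beyond the usual affine terms.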
In some embodiments, the relationship between the first set of convolutional layers and the second set includes the following three cases. Case 1: the first set and the second set have no intersection, i.e., the two sets undergo different normalization: IN processing is applied to the feature map output by each convolutional layer in the first set, and BN processing is applied to the feature map output by each convolutional layer in the second set. As shown in FIG. 4(b), IN processing is applied only to part of the output result obtained after the summation operation, while BN processing is applied to the feature maps output by the remaining convolutional layers. Case 2: the first set and the second set have an intersection, i.e., the first set undergoes IN and combined IN-and-BN processing, while the second set undergoes BN and combined IN-and-BN processing. That is, as described in steps S202 and S203, when the first channel set is part of the channels corresponding to the feature map output by each convolutional layer in the first set, IN is applied to that part and BN to the rest; or, as described in steps S204 and S205, when the second channel set is part of the channels corresponding to the feature map output by each convolutional layer in the second set, BN is applied to that part and IN to the rest. As shown in FIG. 4(d), the feature maps output by the convolutional layers undergo BN and combined IN-and-BN processing. Case 3: the second set is a subset of the first set. When the second set is a proper subset of the first set, the first set undergoes IN and combined IN-and-BN processing, and the second set undergoes combined IN-and-BN processing. When the two sets are identical, both undergo combined IN-and-BN processing; that is, when the first channel set is part of the channels corresponding to the feature map output by each convolutional layer in the first set, IN is applied to that part and BN, or no processing, to the rest (i.e., the first set involves two processing modes: IN, and IN combined with BN).
In some embodiments, the method further includes: summing the feature maps corresponding to two blocks of the neural network to obtain an output result, and performing IN processing on the output result. As shown in FIG. 3(c), first, the feature map obtained by the residual block of FIG. 3(c) after three convolutional layers is summed with the feature map obtained by the previous residual block after multiple convolutional layers, yielding the summation result (i.e., the output result); IN processing is then applied to the summation result. In this embodiment, appearance information can be retained both in the residual path and in the identity path; therefore, to improve the generalization ability of the network more effectively, IN processing is performed after the residual path (path 1 in FIG. 3) and the identity path (path 2 in FIG. 3) converge, which effectively improves the accuracy of image recognition.
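The summation-then-IN step of FIG. 3(c) can be sketched as follows, assuming PyTorch; `residual` and `identity` stand for the feature maps of path 1 and path 2, and placing a ReLU after the IN is an assumption carried over from standard residual blocks rather than an explicit statement of this application.

```python
import torch
import torch.nn as nn

def ibn_b_merge(residual: torch.Tensor, identity: torch.Tensor,
                inorm: nn.InstanceNorm2d) -> torch.Tensor:
    """Sum the residual-path and identity-path feature maps, then apply IN."""
    out = residual + identity    # both paths must have the same channel count
    return torch.relu(inorm(out))
```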
Unlike CNN structures in the related art that use IN or BN alone, the image recognition method provided by this embodiment combines IN and BN in the same CNN by studying their learning abilities in depth. For example, many advanced deep architectures adopt BN as a key component to improve their learning ability on high-level vision tasks, whereas IN is usually combined with CNNs to eliminate image variance in low-level vision tasks, such as image style transformation. However, the distinct characteristics of the features learned by IN and BN, and the effect of combining them, have not been established in the related art. By contrast, the IBN-Net provided by the embodiments of this application shows that combining IN and BN in an appropriate manner improves both the learning and the generalization ability of a CNN. In IBN-Net, features combining IN and BN are kept in the shallow layers of the CNN and BN-only features in the deep layers, matching the statistical characteristics of the network at different depths. In a CNN, information related to image appearance (e.g., color, contrast, style) exists mainly in the features of shallow layers, whereas information related to object categories exists mainly in the features of deep layers, while still being present in shallow-layer features. Based on this, IN layers are introduced into the CNN according to two rules. First, to reduce the appearance-induced feature variation in shallow layers without interfering with content discrimination in deep layers, IN layers are added only to the lower half of the CNN. Second, to avoid losing image content information in the shallow layers, half of the features in the original BN layers are replaced by IN, while the remaining half keep BN. On this basis, this embodiment processes the convolutional neural network with the combination of BN and IN, forming IBN-Net.
Using the IBN-Net provided by the embodiments of the present application in a convolutional neural network has the following advantages:
First, the proposed IBN-Net improves the performance and generalization ability of convolutional neural networks. For example, with a similar number of parameters and computational cost, IBN-Net50 achieves top-5 and top-1 accuracies of 93.7% and 77.4% on the original ImageNet validation set, 0.8% and 1.7% higher than ResNet50, respectively. On a new, style-transferred ImageNet validation set, IBN-Net50 achieves top-5 and top-1 accuracies of 72.9% and 48.9%, 2.2% and 2.9% higher than ResNet50, respectively.
Second, in the proposed IBN-Net, IN provides visual and appearance invariance, while BN accelerates training and preserves discriminative features. This characteristic informs the design of the IBN-Net architecture: IN is placed in the shallow layers to remove appearance variation, while its strength in the deep layers should be reduced to maintain discrimination. IBN-Net modules can be used to re-develop many recently studied deep architectures, improving their learning and generalization abilities while keeping their computational cost unchanged. For example, by applying IBN-Net to VGG16, ResNet101, ResNeXt101, and the Squeeze-and-Excitation Residual Network (SE-ResNet101), their top-1 accuracies on the ImageNet validation set exceed those of the original versions by 0.8%, 1.1%, 0.6%, and 0.7%, respectively.
Third, IBN-Net significantly improves cross-domain performance. For example, real and virtual traffic-scene datasets belong to two different image domains, where the real dataset may be Cityscapes and the virtual dataset may be one rendered from Grand Theft Auto (GTA). When training on GTA and testing on Cityscapes, the ResNet50 integrated with IBN-Net improves performance by 7.6%. When fine-tuning a GTA-pretrained model on Cityscapes, the required sample size is also significantly reduced: for example, when fine-tuning with only 30% of the Cityscapes training data, the IBN-Net model provided by this embodiment reaches a segmentation accuracy of 65.5%, whereas a ResNet50 tuned with all the training data reaches only 63.8%.
To better understand this embodiment, four aspects related to IBN-Net are presented here: invariance in CNNs, CNN network architectures, domain adaptation methods, and scene understanding methods. Invariance in CNNs: the modules proposed in the related art are usually designed to improve the modeling capacity of CNNs, or to reduce overfitting and thus enhance generalization within a single domain. These methods usually achieve this by introducing specific invariance into the CNN architecture. For example, max pooling and deformable convolution introduce spatial invariance into CNNs, increasing their robustness to spatial variations such as affine transformations, distortion, and viewpoint changes. The role of dropout layers and BN during training can be regarded as regularization that reduces the influence of sample noise. For image appearance, simple appearance changes such as color and brightness shifts can be removed by normalizing each RGB channel of the image with its mean and standard deviation. For more complex appearance changes, such as style transformations, recent studies have found that this information can be encoded in the mean and variance of the feature maps; instance normalization layers therefore show the potential to eliminate such appearance differences. CNN network architectures: since CNNs demonstrated performance superior to traditional methods, their architectures have undergone many developments. Among them, the most widely used is ResNet, which uses shortcuts to alleviate the training difficulty of very deep networks. Since then, multiple variants of ResNet have been proposed. Compared with ResNet, ResNeXt improves model performance by increasing the "cardinality" of ResNet, implemented via group convolutions; in practice, increasing cardinality increases the runtime of deep learning frameworks. In addition, the Squeeze-and-Excitation Network (SENet) introduces a channel-wise attention mechanism into ResNet; compared with ResNet, SENet achieves better performance on ImageNet, but it also increases the number of network parameters and the amount of computation. The recently proposed Densely Connected Networks (DenseNet) replace ResNet's shortcuts with concatenation operations and have been shown to be more efficient than ResNet. However, the above CNN architectures have two limitations. First, their limited set of basic modules prevents CNNs from gaining more attractive properties. For example, all of these architectures are composed of convolutions, BN, ReLU activations, and pooling; the only difference between CNNs is how these modules are organized. Such compositions, however, are naturally vulnerable to appearance changes. Second, these models are designed to achieve strong performance on a single task in a single domain, and their ability to generalize to new domains remains limited. In the field of image style transformation, some methods adopt IN to help remove image contrast. However, invariance to image appearance has not yet been successfully introduced into CNNs, especially in high-level tasks such as image classification or semantic segmentation, because IN discards useful content information from the features and harms model performance. Domain adaptation methods: mitigating the performance degradation caused by bias between different domains is an important problem. A natural approach is transfer learning, e.g., fine-tuning the model on the target domain; however, this requires manual annotations in the target domain, and the fine-tuned model's performance drops when it is applied back to the source domain. Many domain adaptation methods use statistics of the target domain to facilitate adaptation. They generally rely on carefully designed loss functions, such as Maximum Mean Discrepancy (MMD), Correlation Alignment (CORAL), and adversarial loss (AL), to reduce the feature differences caused by the bias between the two domains and thereby alleviate the performance degradation. Transfer learning and domain adaptation have two main limitations: first, in practical applications it is difficult to obtain statistics of the target domain, and collecting data that covers all possible scenarios in the target domain is also very difficult; second, most state-of-the-art methods adopt different models for the source and target domains to improve performance, whereas ideally a single model should adapt to all domains.
Another paradigm for this problem is domain generalization, which aims to acquire knowledge from many related source domains and apply it to a new target domain whose statistics are unknown during training. Algorithms in the related art are usually designed to capture the factors shared across different domains. However, for practical applications it is usually difficult to collect data from multiple related source domains, and the final performance depends highly on the particular set of source domains collected. In this work, the embodiments of this application increase model performance and generalization ability by designing IBN-Net, a new CNN architecture with built-in appearance invariance. Unlike domain adaptation and domain generalization, this application requires neither target-domain data nor related source domains. This embodiment is therefore very useful when target-domain data cannot be obtained, which the related art cannot achieve.
In this embodiment, for a BN-based CNN, information related to image appearance (color, contrast, style, etc.) exists mainly in the features of shallow layers, whereas information related to object categories exists mainly in the features of deep layers, while still being present in shallow-layer features. Therefore, this embodiment introduces IN according to two rules. First, to avoid reducing the discriminability of deep features with respect to image content, IN is not added to the last layer or last few layers of the CNN. Second, to still preserve content information in the shallow layers, BN processing is retained for a portion of the features in the shallow layers.
This embodiment applies IBN-Net to ResNet. FIG. 3 is a structural diagram based on a residual network according to an embodiment of the present application. ResNet mainly consists of four groups of residual blocks. FIG. 3(a) shows the structure of one residual block in the original ResNet, while FIG. 3(b) and FIG. 3(c) show structures in which combined IN-and-BN processing is applied to the feature maps output by different convolutional layers of ResNet. In FIG. 3(a), path 1 is the residual path and path 2 is the identity path. In the label (x, 256d) at 30, x denotes the input features and 256d indicates that the input has 256 channels; 31 denotes a convolutional layer with 1x1 kernels and 64 channels; 32, 34, and 36 denote activation layers (ReLU); 33 denotes a convolutional layer with 3x3 kernels and 64 channels; 35 denotes a convolutional layer with 1x1 kernels and 256 channels; 311 denotes batch normalization (BN) over 64 channels; 312 denotes BN over 64 channels; 313 denotes BN over 256 channels. In FIG. 3(b), 321 denotes applying IN to one half of the channels corresponding to the feature map output by the convolutional layer (i.e., 32 channels) and BN to the other half (i.e., the other 32 channels). In FIG. 3(c), 331 denotes applying IN to the result of the summation operation, where the summation adds the feature map obtained by the residual block of FIG. 3(c) after three convolutional layers to the feature map obtained by the previous residual block after multiple convolutional layers (i.e., the input features x).
For a residual block, to exploit the generalization potential of IN, in the feature map obtained after the first convolutional layer, BN is applied to half of the channels and IN to the other half, as shown in FIG. 3(b). There are three reasons for this design. First, a clean identity path is essential for optimizing ResNet, so IN is added to the residual path rather than the identity path. Second, in the residual learning function $y = F(x, \{W_i\}) + x$, the residual function $F(x, \{W_i\})$ is aligned with $x$ from the identity path during learning; therefore, IN is applied to the first normalization layer (i.e., the feature map output by the first convolutional layer) rather than the last one, to reduce the probability of misalignment between $F(x, \{W_i\})$ and $x$ from the identity path. Third, applying BN to half of the channels and IN to the other half of the feature map output by the convolutional layer satisfies the requirement of preserving image content information in the shallow layers.
This design pursues model performance. On the one hand, IN enables the model to learn appearance-invariant features, so it can better exploit images with high appearance diversity within a dataset. On the other hand, IN is added in a moderate way so that content-related information is well retained. This model is denoted IBN-Net-a. In addition, this application proposes another network, IBN-Net-b, which pursues maximal generalization ability. Since appearance information can be retained both in the residual path and in the identity path, to guarantee the generalization ability of the network, IN is added immediately after the addition operation, as shown in FIG. 3(c). To avoid degrading ResNet's performance, this embodiment adds only three IN layers: after the first convolutional layer and after the first two convolutional groups.
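The following is a sketch of the IBN-a bottleneck of FIG. 3(b), reusing the `IBN` module sketched after step S205. It is an illustration under stated assumptions rather than an authoritative implementation: stride and downsampling are omitted for brevity, and the channel sizes follow the 256/64 example from the figure description.

```python
import torch
import torch.nn as nn

class BottleneckIBNa(nn.Module):
    """Residual bottleneck with half-IN/half-BN after the first 1x1 conv
    (label 321); the identity path (path 2) is left untouched."""
    def __init__(self, in_planes: int = 256, planes: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.ibn = IBN(planes)                       # 32 channels IN, 32 BN
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)            # label 312
        self.conv3 = nn.Conv2d(planes, in_planes, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(in_planes)         # label 313
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                                 # clean identity path
        out = self.relu(self.ibn(self.conv1(x)))     # IN only on the first layer
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)
```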
Table 1.1 shows the overall network structures of the original ResNet50 and its two corresponding IBN-Nets. As can be seen from Table 1.1, compared with the original ResNet50, in IBN-Net50-a the first three groups of modules (conv2_x-conv4_x) are replaced with the IBN-a structure of FIG. 3(b), while in IBN-Net50-b the last residual block of the first two groups of modules (conv2_x-conv3_x) is replaced with the IBN-b structure of FIG. 3(c), and the BN after the first convolutional layer conv1 is replaced with IN. In Table 1.1, conv2_x denotes the first group of residual blocks.
Table 1.1: Overall network structures of the original ResNet50 and its two corresponding IBN-Nets
The above two IBN-Nets are not the only ways to use IN and BN in a CNN. This embodiment provides some interesting variants, as shown in FIG. 4. FIG. 4 is another structural diagram based on a residual network according to an embodiment of the present application; FIGs. 4(a), 4(b), 4(c), and 4(d) are structural diagrams of the same block in a residual neural network, each applying combined IN-and-BN processing to the feature maps (i.e., normalization layers) output by different convolutional layers. In FIG. 4(a), path 3 is the residual path and path 4 is the identity path. In the label (x, 256d) at 40, x denotes the input features and 256d indicates that the input has 256 channels (which can be understood as 256 feature maps); 41 denotes a convolutional layer with 1x1 kernels and 64 channels; 42, 44, and 46 denote activation layers (ReLU); 43 denotes a convolutional layer with 3x3 kernels and 64 channels; 45 denotes a convolutional layer with 1x1 kernels and 256 channels; 411 denotes BN over 64 channels; 412 denotes BN over 256 channels; 431 denotes applying IN and BN separately to the feature map output by the first convolutional layer; 47 denotes concatenating the results of the two normalizations and outputting them to the next layer, i.e., the activation layer. In FIG. 4(b), 413 denotes BN over 64 channels; 431 denotes applying IN to half of the result of the summation operation (i.e., to 128 of the 256 channels), where the summation adds the feature map obtained by the block of FIG. 4(b) after three convolutional layers to the feature map obtained by the previous block after multiple convolutional layers. In FIG. 4(c), 441 denotes applying IN to one half of the channels corresponding to the feature map output by the first convolutional layer (i.e., 32 channels) and BN to the other half (i.e., the other 32 channels). In FIG. 4(d), 442 denotes applying IN to one half of the channels corresponding to the feature map output by the second convolutional layer (i.e., 32 channels) and BN to the other half (i.e., the other 32 channels).
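The FIG. 4(a) variant (labels 431 and 47) can be sketched as follows: IN and BN are applied to the full feature map in parallel and their outputs are concatenated, which doubles the channel count, so the following convolution must accept twice as many input channels. This is a sketch of that one design point; the class name is an illustrative assumption.

```python
import torch
import torch.nn as nn

class INBNConcat(nn.Module):
    """Parallel IN and BN over the same feature map, outputs concatenated."""
    def __init__(self, planes: int):
        super().__init__()
        self.IN = nn.InstanceNorm2d(planes, affine=True)
        self.BN = nn.BatchNorm2d(planes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output has 2 * planes channels; the wider following convolution is
        # where this variant's extra parameters come from.
        return torch.cat([self.IN(x), self.BN(x)], dim=1)
```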
Table 1.2: Overall structures of the original VGG16 and its IBN-Net version
As can be seen from Table 1.2, the IBN-Net of this embodiment has multiple implementations, in which some parameters may vary provided two conditions are met: first, IN is not applied at the last layer of the network (depending on the training requirements, the model may also be configured to skip IN at the last two or three layers); second, in the remaining layers (all but the last), IN and BN are used in combination or alternately. For example, examining the four schemes given in FIG. 4, i.e., FIGs. 4(a), 4(b), 4(c), and 4(d), one can see that both the position of the IN layer and the number of channels processed by IN are adjustable; moreover, the four schemes of FIG. 4 can be used alternately in different modules of the same convolutional neural network.
Table 1.3: Error rates under image appearance changes on the ImageNet validation set
In FIG. 4(a), IN and BN are applied separately to the feature map output by the first convolutional layer, and the results of the two normalizations are then concatenated and output to the next layer, i.e., the activation layer; this keeps both the highly generalizable features of IN and the highly discriminative features of BN, but it introduces more parameters. Since the idea of retaining both kinds of features also applies to IBN-b, FIG. 4(b) results. In addition, the schemes shown in FIGs. 4(a) and 4(b) can be combined, as shown in FIGs. 4(c) and 4(d). These variants are discussed in the experiments below. Table 1.3 shows the top-1 and top-5 error rates obtained by feeding the ImageNet database as images to be recognized into the IBN-Net neural network model. Here, "top-1" means the probability that the highest-probability prediction is correct, "top-5" means the probability that the correct class is among the five highest-probability predictions, and "top1/top5 err" denotes the corresponding error rates. "RGB+50" means adding 50 to each of the image's R, G, and B channels; "R+50" means adding 50 to the red channel, i.e., making the image redder; "contrast*1.5" means multiplying the contrast by 1.5; "Monet" means converting the image into a Monet-style image using CycleGAN (an image style transfer tool). As Table 1.3 shows, for every kind of appearance change, the top-1 and top-5 error rates of the IBN-Net50-a model (FIG. 3(b)) are lower than those of the original ResNet50 without IBN-Net, and likewise the top-1 and top-5 error rates of the IBN-Net50-b model (FIG. 3(c)) are lower than those of the original ResNet50. Therefore, the neural network obtained with the IBN-Net module outperforms the one without it (i.e., the original ResNet, as shown in FIG. 3(a)).
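The top-1 and top-5 error rates used throughout these tables can be computed as in the following sketch, assuming `logits` of shape N x num_classes and integer ground-truth labels; the function name is illustrative.

```python
import torch

def topk_error(logits: torch.Tensor, target: torch.Tensor, k: int) -> float:
    """Fraction of samples whose true label is NOT among the k
    highest-scoring predictions (the top-1/top-5 error described above)."""
    topk = logits.topk(k, dim=1).indices                 # N x k predicted labels
    hit = (topk == target.unsqueeze(1)).any(dim=1)       # true label in top-k?
    return 1.0 - hit.float().mean().item()
```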
Table 2: Error rates on the ImageNet validation set when IBN-Net is applied to other CNN models
To show that IBN-Net delivers stronger model performance than traditional CNNs, this embodiment compares IBN-Net with popular CNN architectures on the original ImageNet validation set. As shown in Table 2, IBN-Net achieves consistent improvements over these CNNs, indicating stronger model performance. Specifically, IBN-ResNet101 outperforms ResNeXt101 and SE-ResNet101, both of which require more computation time or introduce additional parameters. The IBN-Net model provided in this embodiment, by contrast, introduces no additional parameters and adds only a very small amount of computation at the test stage. The experimental results show that removing some mean and variance statistics from the features helps the model learn from images with high appearance diversity.
Table 3: Error rates of IBN-Net variants on the ImageNet validation set and on Monet-style images
This embodiment further studies other variants of IBN-Net. Table 3 shows the results for the IBN-Net variants described in the method section. All IBN-Net variants provided in this embodiment show better performance than the original ResNet50, with a smaller performance drop under appearance transformations. Specifically, IBN-Net-c achieves performance similar to IBN-Net-a, providing another way of combining features. The performance and generalization ability of IBN-Net-d lie between those of IBN-Net-a and IBN-Net-b, which indicates that retaining some BN features in part of the channels of the feature map output by a convolutional layer helps performance but sacrifices some generalization ability. The combination of IBN-Net-a and IBN-Net-b is essentially equivalent to IBN-Net-d, suggesting that the effect of IN on the main path of ResNet dominates. Finally, adding extra IBN layers to IBN-Net-a brings no benefit; a moderate number of IN layers is sufficient.
Table 4: Error rates when different numbers of IN layers are added to the residual groups of IBN-Net50-a
Table 5: Effect of the proportion of IN in the IBN layer on the error rate
This embodiment studies IBN networks with different numbers of IN layers. Table 4 gives the performance of IBN-Net50-a with IN layers added to different numbers of residual groups. It can be seen that performance improves as more IN layers are added to the shallow layers, but decreases when IN layers are added to the last residual group. This shows that applying IN in shallow layers helps model performance, whereas BN should be used in deep layers to preserve important content information. In addition, this embodiment studies the influence of the IN-BN ratio on performance. As shown in Table 5, the top-1 and top-5 error rates are lowest when the proportion of IN is 0.25 to 0.5, which demonstrates that a trade-off between IN and BN is needed in use.
Table 6: Results on the Cityscapes-GTA datasets
This embodiment takes ResNet50 with dilated (hole) convolutions as the baseline, and IBN-Net follows the same modifications. Models are trained on each dataset, and IBN-Net and ResNet50 are evaluated; the results are shown in Table 6, where mIoU (%) denotes the Mean Intersection over Union. The experimental results are consistent with those on the ImageNet dataset: IBN-Net shows stronger model performance within a dataset and better generalization across datasets from different domains. Specifically, IBN-Net-a shows stronger model performance, outperforming ResNet50 by 4.6% and 2.0% on the two datasets. In cross-evaluation, IBN-Net-b generalizes better: compared with the original ResNet50, it improves performance by 8.6% from Cityscapes to Grand Theft Auto V (GTA5) and by 7.6% from GTA5 to Cityscapes. It is worth noting that the IBN-Net provided by this embodiment differs from domain adaptation work: domain adaptation is oriented toward the target domain and requires target-domain data during training, whereas the method of this embodiment does not. Nevertheless, the performance gain of this method is comparable to that of domain adaptation methods, and it takes an important step toward more general models, because it introduces built-in appearance invariance into the model rather than forcing the model to adapt to a specific data domain.
Table 7: Performance when fine-tuning with different percentages of data
Another common way to apply a model to a new data domain is to fine-tune it with a small number of target-domain annotations. The model provided by this embodiment has stronger generalization ability, so the amount of data required by the network can be significantly reduced. This embodiment uses different amounts of Cityscapes data and annotations to fine-tune a model pretrained on the GTA5 dataset, with the initial learning rate and number of epochs set to 0.003 and 80, respectively. As shown in Table 7, using only 30% of the Cityscapes training data, IBN-Net50-a outperforms a ResNet50 fine-tuned with all of the training data.
To understand how the IBN-Net of this embodiment achieves better generalization, the feature divergence caused by domain bias is analyzed here. The feature divergence is measured as follows. For the output features of a given layer in the CNN, denote the mean value of one channel by $F$; $F$ essentially describes how strongly this channel is activated, and is assumed to follow a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The symmetric KL divergence between domain A and domain B on this channel is then:

$$D(F_A \| F_B) = \mathrm{KL}(F_A \| F_B) + \mathrm{KL}(F_B \| F_A) \quad (1)$$

Let $D(F_{iA} \| F_{iB})$ denote the symmetric KL divergence of the $i$-th channel. Averaging over all channels of the layer's features gives a measure of the feature divergence between domain A and domain B at that layer:

$$D(L_A \| L_B) = \frac{1}{C} \sum_{i=1}^{C} D(F_{iA} \| F_{iB}) \quad (2)$$

In formula (2), $C$ is the number of channels in the layer. This metric provides a measure of the distance between the feature distributions of domain A and domain B.
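Formulas (1) and (2) can be evaluated with the closed-form KL divergence between Gaussians, as in the sketch below; treating the per-image channel means as samples of $F$ and adding a small `eps` for numerical stability are assumptions of this sketch.

```python
import torch

def feature_divergence(feat_a: torch.Tensor, feat_b: torch.Tensor,
                       eps: float = 1e-6) -> float:
    """Average symmetric KL divergence (formulas (1)-(2)) between Gaussian
    fits of per-channel mean activations from two domains; inputs are
    N x C x H x W feature tensors from the same layer."""
    f_a = feat_a.mean(dim=(2, 3))          # per-image channel means, N x C
    f_b = feat_b.mean(dim=(2, 3))
    mu_a, var_a = f_a.mean(0), f_a.var(0) + eps
    mu_b, var_b = f_b.mean(0), f_b.var(0) + eps

    def kl(mu1, v1, mu2, v2):
        # Closed form: KL(N(mu1, v1) || N(mu2, v2)) for scalar Gaussians
        return 0.5 * (torch.log(v2 / v1) + (v1 + (mu1 - mu2) ** 2) / v2 - 1.0)

    sym = kl(mu_a, var_a, mu_b, var_b) + kl(mu_b, var_b, mu_a, var_a)
    return sym.mean().item()               # average over the C channels
```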
To capture the influence of instance normalization on appearance information and content information, three pairs of domains are considered here. The first two pairs are Cityscapes-GTA5 and original photos versus Monet-style images (Photo-Monet), which have obvious appearance differences. To construct two domains with different content, the ImageNet-1k validation set is split into two parts: the first contains images of 500 object categories, and the second contains the remaining 500 categories. The feature divergence of the output features of the 17 ReLU layers on the main paths of ResNet50 and IBN-Net50 is then computed. Experiments on these three pairs of domains show that, in IBN-Net, the feature divergence caused by differences in image appearance decreases markedly. For IBN-Net-a the divergence decreases moderately, while for IBN-Net-b it drops sharply after the IN layers at layers 2, 4, and 8, and this effect persists into the deep layers, meaning that appearance-induced differences in deep features are reduced and therefore interfere less with classification. On the other hand, the feature divergence caused by content differences does not decrease in IBN-Net, indicating that the content information in the features is well preserved by the BN layers. In the IBN-Net proposed in this embodiment, IN and BN are applied within a single deep network to improve its performance and generalization ability. This embodiment applies IBN-Net to VGG, ResNet, ResNeXt, and SENet, achieving consistent accuracy improvements on the ImageNet dataset. Moreover, even without using target-domain data, the built-in appearance invariance introduced by IN improves the generalization of the neural network model across image domains. The roles of the IN and BN layers in a CNN can therefore be summarized as: IN introduces appearance invariance and improves generalization, while BN preserves the discriminability of content information in the features.
An embodiment of the present application provides an image recognition apparatus. FIG. 5 is a schematic structural diagram of the image recognition apparatus according to an embodiment of the present application. As shown in FIG. 5, the apparatus 500 includes: a first acquisition module 501, a first processing module 502, and a first output module 503, where: the first acquisition module 501 is configured to acquire an image to be recognized; the first processing module 502 is configured to input the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image, where the neural network model is obtained by performing IN and BN processing on a neural network; and the first output module 503 is configured to output the recognition result of the image to be recognized.
In some embodiments, the apparatus 500 further includes: a second processing module configured to perform IN and BN processing on the feature maps output by the convolutional layers of the neural network, to obtain the neural network model.
In some embodiments, the second processing module includes: a first determination module configured to determine a first set of convolutional layers and a second set of convolutional layers from the convolutional layers of the neural network; a first sub-processing module configured to perform IN processing on the feature map output by each convolutional layer in the first set; and a second sub-processing module configured to perform BN processing on the feature map output by each convolutional layer in the second set.
In some embodiments, the union of the first set of convolutional layers and the second set of convolutional layers is all or part of the convolutional layers of the neural network. In the embodiments of the present application, the first set and the second set have no intersection; or the first set and the second set have an intersection; or the second set is a subset of the first set.
In some embodiments, the first sub-processing module includes: a first sub-determination module configured to determine a first channel set from the channels corresponding to the feature map output by each convolutional layer in the first set; and a third sub-processing module configured to perform IN processing on the first channel set.
In some embodiments, the second sub-processing module includes: a second sub-determination unit configured to determine a second channel set from the channels corresponding to the feature map output by each convolutional layer; and a fourth sub-processing module configured to perform BN processing on the second channel set.
In some embodiments, the first channel set is all or part of the channels corresponding to the feature map output by each convolutional layer in the first set; the second channel set is all or part of the channels corresponding to the feature map output by each convolutional layer in the second set.
In some embodiments, the apparatus further includes:
a second processing module configured to sum the feature maps corresponding to two blocks of the neural network to obtain an output result, and to perform IN processing on the output result; where the neural network includes at least two blocks, and the number of channels corresponding to the feature map output by the last layer of each block is the same as the number of channels corresponding to the feature map output by the last layer of the previous block.
It should be noted that the description of the above apparatus embodiments is similar to that of the method embodiments, with similar beneficial effects. For technical details not disclosed in the apparatus embodiments of this application, please refer to the description of the method embodiments. It should also be noted that, in the embodiments of this application, if the above method is implemented in the form of software functional modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a device (which may be a terminal, a server, etc.) to execute all or part of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disc. In this way, the embodiments of this application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application further provides a computer program product, where the computer program product includes computer-executable instructions; when the computer-executable instructions are executed, the steps of the image recognition method provided by the embodiments of this application can be implemented. Correspondingly, an embodiment of the present application further provides a computer storage medium storing computer-executable instructions; when executed by a processor, the computer-executable instructions implement the steps of the image recognition method provided by the foregoing embodiments. Correspondingly, an embodiment of the present application provides a computer device. FIG. 6 is a schematic structural diagram of the computer device according to an embodiment of the present application. As shown in FIG. 6, the device 600 includes: a processor 601, at least one communication bus 602, a user interface 603, at least one external communication interface 604, and a memory 605. The communication bus 602 is configured to implement connection and communication between these components. The user interface 603 may include a display screen, and the external communication interface 604 may include a standard wired interface and a wireless interface. The processor 601 is configured to execute an image recognition program stored in the memory to implement the steps of the image recognition method provided by the foregoing embodiments.
The description of the above computer device and storage medium embodiments is similar to that of the method embodiments, with similar beneficial effects. For technical details not disclosed in the computer device and storage medium embodiments of this application, please refer to the description of the method embodiments. It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of this application; thus, occurrences of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of this application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application. The sequence numbers of the foregoing embodiments are for description only and do not represent the superiority or inferiority of the embodiments. It should be noted that, herein, the terms "comprising", "including", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further restrictions, an element qualified by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division manners in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connections between the displayed or discussed components may be through some interfaces, and the indirect coupling or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment. In addition, the functional units in the embodiments of this application may all be integrated into one processing unit, or each unit may serve separately as one unit, or two or more units may be integrated into one unit; the integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments can be completed by hardware related to program instructions; the foregoing program can be stored in a computer-readable storage medium, and when executed, the program performs the steps of the above method embodiments. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (Read Only Memory, ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of this application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of this application. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The above are only specific implementations of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in this application, and all such changes or substitutions shall be covered by the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.
Claims (22)
- An image recognition method, wherein the method comprises: acquiring an image to be recognized; inputting the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image to be recognized, wherein the neural network model is obtained by performing instance normalization (IN) and batch normalization (BN) processing on the feature maps output by the convolutional layers of the neural network; and outputting the recognition result of the image to be recognized.
- The method according to claim 1, wherein the training process of the neural network model comprises: determining a first set of convolutional layers and a second set of convolutional layers from the convolutional layers of the neural network; performing IN processing on the feature map output by each convolutional layer in the first set of convolutional layers; and performing BN processing on the feature map output by each convolutional layer in the second set of convolutional layers.
- The method according to claim 2, wherein the set composed of the first set of convolutional layers and the second set of convolutional layers is all or part of the convolutional layers of the neural network.
- The method according to claim 2, wherein the first set of convolutional layers and the second set of convolutional layers have no intersection; or the first set of convolutional layers and the second set of convolutional layers have an intersection; or the second set of convolutional layers is a subset of the first set of convolutional layers.
- The method according to claim 2, wherein performing IN processing on the feature map output by each convolutional layer in the first set of convolutional layers comprises: determining a first channel set from the channels corresponding to the feature map output by each convolutional layer in the first set of convolutional layers; and performing IN processing on the first channel set.
- The method according to claim 2, wherein performing BN processing on the feature map output by each convolutional layer in the second set of convolutional layers comprises: determining a second channel set from the channels corresponding to the feature map output by each convolutional layer in the second set of convolutional layers; and performing BN processing on the second channel set.
- The method according to any one of claims 2 to 6, wherein the first channel set is all or part of the channels corresponding to the feature map output by each convolutional layer in the first set of convolutional layers; and the second channel set is all or part of the channels corresponding to the feature map output by each convolutional layer in the second set of convolutional layers.
- The method according to any one of claims 2 to 7, wherein the first set of convolutional layers does not include the last convolutional layer of the neural network.
- The method according to any one of claims 2 to 8, wherein the neural network comprises at least two blocks, and the number of channels corresponding to the feature map output by the last layer of each block is the same as the number of channels corresponding to the feature map output by the last layer of the previous block, the method further comprising: summing the feature maps corresponding to two blocks of the neural network to obtain an output result; and performing IN processing on the output result.
- An image recognition apparatus, wherein the apparatus comprises: a first acquisition module, a first processing module, and a first output module, wherein: the first acquisition module is configured to acquire an image to be recognized; the first processing module is configured to input the image to be recognized into a neural network model obtained through training, to obtain a recognition result of the image to be recognized, wherein the neural network model is obtained by performing IN and BN processing on a neural network; and the first output module is configured to output the recognition result of the image to be recognized.
- The apparatus according to claim 10, further comprising: a second processing module configured to perform IN and BN processing on the feature maps output by the convolutional layers of the neural network, to obtain the neural network model.
- The apparatus according to claim 11, wherein the second processing module comprises: a first determination module configured to determine a first set of convolutional layers and a second set of convolutional layers from the convolutional layers of the neural network; a first sub-processing module configured to perform IN processing on the feature map output by each convolutional layer in the first set of convolutional layers; and a second sub-processing module configured to perform BN processing on the feature map output by each convolutional layer in the second set of convolutional layers.
- The apparatus according to claim 12, wherein the set composed of the first set of convolutional layers and the second set of convolutional layers is all or part of the convolutional layers of the neural network.
- The apparatus according to claim 12, wherein the first set of convolutional layers and the second set of convolutional layers have no intersection; or the first set of convolutional layers and the second set of convolutional layers have an intersection; or the second set of convolutional layers is a subset of the first set of convolutional layers.
- The apparatus according to claim 12, wherein the first sub-processing module comprises: a first sub-determination module configured to determine a first channel set from the channels corresponding to the feature map output by each convolutional layer in the first set of convolutional layers; and a third sub-processing module configured to perform IN processing on the first channel set.
- The apparatus according to claim 12, wherein the second sub-processing module comprises: a second sub-determination unit configured to determine a second channel set from the channels corresponding to the feature map output by each convolutional layer; and a fourth sub-processing module configured to perform BN processing on the second channel set.
- The apparatus according to any one of claims 12 to 16, wherein the first channel set is all or part of the channels corresponding to the feature map output by each convolutional layer in the first set of convolutional layers; and the second channel set is all or part of the channels corresponding to the feature map output by each convolutional layer in the second set of convolutional layers.
- The apparatus according to any one of claims 12 to 17, wherein the neural network comprises at least two blocks, and the number of channels corresponding to the feature map output by the last layer of each block is the same as the number of channels corresponding to the feature map output by the last layer of the previous block, the apparatus further comprising: a second processing module configured to sum the feature maps corresponding to two blocks of the neural network to obtain an output result, and to perform IN processing on the output result.
- The apparatus according to any one of claims 12 to 18, wherein the first set of convolutional layers does not include the last convolutional layer of the neural network.
- A computer storage medium, wherein the computer storage medium stores computer-executable instructions, and when the computer-executable instructions are executed, the method steps of any one of claims 1 to 9 can be implemented.
- A computer device, wherein the computer device comprises a memory and a processor, the memory stores computer-executable instructions, and the processor can implement the method steps of any one of claims 1 to 9 when running the computer-executable instructions on the memory.
- A computer program product, wherein the computer program product comprises computer-executable instructions, and when the computer-executable instructions are executed, the method steps of any one of claims 1 to 9 can be implemented.
Priority Applications (3)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2020548653A (JP6930039B2) | 2018-05-23 | 2019-03-07 | Image processing method, apparatus, computer device and computer storage medium |
| SG11202009173YA | 2018-05-23 | 2019-03-07 | Image processing method and apparatus, computer device, and computer storage medium |
| US17/072,324 (US11080569B2) | 2018-05-23 | 2020-10-16 | Method and device for image processing, and computer storage medium |
Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201810500185.0A (CN108875787B) | 2018-05-23 | 2018-05-23 | Image recognition method and apparatus, computer device and storage medium |
| CN201810500185.0 | 2018-05-23 | | |
Related Child Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/072,324 (Continuation, US11080569B2) | Method and device for image processing, and computer storage medium | 2018-05-23 | 2020-10-16 |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2019223397A1 | 2019-11-28 |
Family ID: 64333566
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/077341 (WO2019223397A1) | Image processing method, apparatus, computer device and computer storage medium | 2018-05-23 | 2019-03-07 |
Country Status (5)

| Country | Link |
|---|---|
| US (1) | US11080569B2 |
| JP (1) | JP6930039B2 |
| CN (1) | CN108875787B |
| SG (1) | SG11202009173YA |
| WO (1) | WO2019223397A1 |
Patent Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018075927A1 | 2016-10-21 | 2018-04-26 | Google Llc | Stylizing input images |
| CN107862374A | 2017-10-30 | 2018-03-30 | 中国科学院计算技术研究所 | Pipeline-based neural network processing system and processing method |
| CN107909016A | 2017-11-03 | 2018-04-13 | 车智互联(北京)科技有限公司 | Convolutional neural network generation method and vehicle-series recognition method |
| CN108875787A | 2018-05-23 | 2018-11-23 | 北京市商汤科技开发有限公司 | Image recognition method and apparatus, computer device and storage medium |
Also Published As

| Publication number | Publication date |
|---|---|
| CN108875787B | 2020-07-14 |
| SG11202009173YA | 2020-10-29 |
| JP6930039B2 | 2021-09-01 |
| US11080569B2 | 2021-08-03 |
| CN108875787A | 2018-11-23 |
| JP2021509994A | 2021-04-08 |
| US20210034913A1 | 2021-02-04 |
Legal Events

| Code | Title | Description |
|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19807565; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2020548653; Country of ref document: JP; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 19807565; Country of ref document: EP; Kind code of ref document: A1 |