WO2021051520A1 - Image recognition, method for training a recognition model, related device, and storage medium - Google Patents

Image recognition, method for training a recognition model, related device, and storage medium

Info

Publication number
WO2021051520A1
WO2021051520A1 (PCT/CN2019/116943)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
convolution
convolutional
output
input
Prior art date
Application number
PCT/CN2019/116943
Other languages
English (en)
French (fr)
Inventor
韦嘉楠
王义文
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021051520A1 publication Critical patent/WO2021051520A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

A method for image recognition, a method for training a recognition model, related devices, and a storage medium, which reduce the processing resources occupied on a terminal while maintaining the image recognition rate. The image recognition method includes: acquiring an image to be recognized (401); inputting the image to be recognized into an image recognition model, where the image recognition model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2 (402); and outputting a recognition result through the image recognition model (403).

Description

Image recognition, method for training a recognition model, related device, and storage medium
This application is based on Chinese invention patent application No. 201910882256.2, filed on September 18, 2019 and entitled "Image recognition, method for training a recognition model, related device, and storage medium", and claims priority thereto.
Technical Field
This application relates to the field of computer application technology, and in particular to image recognition, methods for training recognition models, related devices, and storage media.
Background
In recent years, with the rapid development of deep learning, convolutional neural networks have found wide application in many fields (such as semantic understanding and image recognition). In the field of image recognition, for example, one of the most representative structures among the convolutional neural networks in use is the Inception structure, which increases the depth and width of the network and thereby improves its performance. The Inception structure uses convolution kernels of several different sizes to enhance the adaptability of the network: kernels of different sizes, such as 1*1, 3*3 and 5*5, are introduced in the same layer, and these kernels extract features at different scales, increasing the diversity of the features.
In the traditional approach, raising the image recognition rate requires enlarging the receptive field so as to capture as many features as possible, which means configuring larger convolution kernels in the model. A convolutional neural network model deployed on a terminal then has more parameters and a larger volume, and the more resources such an image recognition model occupies on a mobile terminal, the more it slows the terminal down.
Summary
The embodiments of this application provide a method for training a recognition model, an image recognition method, related devices, and storage media, which reduce the processing resources occupied on a terminal while maintaining the image recognition rate.
In a first aspect, an embodiment of this application provides an image recognition method, including:
acquiring an image to be recognized;
inputting the image to be recognized into an image recognition model, where the image recognition model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
outputting a recognition result through the image recognition model.
In a second aspect, an embodiment of this application provides a method for training a recognition model, including:
acquiring a data set to be trained;
inputting the data set to be trained into a dilated convolutional neural network, where the dilated convolutional neural network includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N and T are both positive integers greater than or equal to 2;
in the M-th convolutional layer of the T convolutional layers, obtaining multiple feature maps in the process of convolving with the convolution kernel of each of the N dilated convolutions;
concatenating the multiple feature maps and inputting the result to the (M+1)-th convolutional layer, where the weights of the convolution kernel of the M-th convolutional layer and of the convolution kernel of the (M+1)-th convolutional layer are different;
concatenating the feature maps output by the T-th convolutional layer, inputting the result to the output layer, and outputting a classification result through the output layer, where T is greater than or equal to (M+1); and
determining the parameters of the convolutional neural network according to the classification result to obtain a recognition model.
In a third aspect, an embodiment of this application provides an image recognition apparatus, including:
an acquisition module, configured to acquire an image to be recognized;
an image input module, configured to input the image to be recognized acquired by the acquisition module into an image recognition model, where the image recognition model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
a result output module, configured to output a recognition result.
In a fourth aspect, an embodiment of this application provides an apparatus for training a recognition model, including:
an acquisition module, configured to acquire a data set to be trained;
a training data input module, configured to input the data set to be trained acquired by the acquisition module into a dilated convolutional neural network model, where the dilated convolutional neural network model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2;
a convolution module, configured to obtain multiple feature maps in the M-th convolutional layer of the T convolutional layers fed by the training data input module, in the process of convolving with the convolution kernel of each of the N dilated convolutions;
a feature map input module, configured to concatenate the multiple feature maps obtained by the convolution module and input the result to the (M+1)-th convolutional layer, where the weights of the convolution kernel of the M-th convolutional layer and of the convolution kernel of the (M+1)-th convolutional layer are different; and
a result output module, configured to concatenate the feature maps output by the T-th convolutional layer, input the result to the output layer, and output a recognition result through the output layer, where T is greater than or equal to (M+1).
In a fifth aspect, an embodiment of this application provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, performs the method described in the first aspect or the second aspect above.
In a sixth aspect, the embodiments of this application provide one or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the method described in the first aspect or the second aspect above.
The details of one or more embodiments of this application are set forth in the accompanying drawings and the description below; other features and advantages of this application will become apparent from the specification, the drawings, and the claims.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the following briefly introduces the drawings needed in the description of the embodiments. Evidently, the drawings below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a flowchart of an embodiment of a method for training a recognition model according to an embodiment of this application;
Fig. 2 is a schematic diagram of convolution kernels with different dilation rates according to an embodiment of this application;
Fig. 3 is a schematic structural diagram of a dilated convolutional neural network according to an embodiment of this application;
Fig. 4 is a flowchart of an embodiment of an image recognition method according to an embodiment of this application;
Fig. 5 is a schematic structural diagram of an embodiment of an image recognition apparatus according to an embodiment of this application;
Fig. 6 is a schematic structural diagram of an embodiment of an apparatus for training a recognition model according to an embodiment of this application;
Fig. 7 is a schematic structural diagram of an embodiment of a computer device according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the drawings in the embodiments of this application. Evidently, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application fall within the protection scope of this application.
The terms "first", "second", "third", "fourth" and the like (if present) in the specification, the claims, and the drawings of this application are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described here can be implemented in an order other than that illustrated or described here. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of this application provides an image recognition method that can be applied to a terminal, which may include but is not limited to a mobile phone, a tablet computer, and the like. In this embodiment, an image to be recognized is first acquired and then input into an image recognition model. The image recognition model includes T convolutional layers, and each of the T convolutional layers includes N parallel dilated convolutions whose convolution kernels have the same size and the same weights but mutually different dilation rates. The original convolution kernels within the same convolutional layer are of the same size; for example, the original kernels of the N dilated convolutions in one layer may all be 3*3, the N dilated convolutions share the same kernel weights, and the N branches have the same number of kernels. Because the dilation rates of the N dilated convolutions differ, N feature maps with different receptive fields are obtained, so features of different scales can be extracted within a single layer and the receptive field is enlarged, which preserves the accuracy of image recognition. Moreover, since the N dilated convolutions share the kernel weights, there is no need to add convolution kernels of different sizes to enlarge the receptive field as in the traditional approach. This effectively reduces the volume of the image recognition model and the resources it occupies when deployed on a mobile terminal, and improves the running speed of the mobile terminal.
The embodiments of this application comprise two parts: the first part trains the recognition model, and the second part recognizes images with the trained model. In the first part, the method for training the recognition model may be executed by a server or by a terminal, the terminal including but not limited to various personal computers, laptop computers, and the like; the server may be an independent server or a server cluster composed of multiple servers. The training may be performed online or offline, and the specific manner is not limited. The entity that trains the recognition model may be the same as or different from the entity that performs image recognition. In the embodiments of this application, the method for training the recognition model is described with a server as the executing entity, and the image recognition method is described with a terminal as the executing entity.
First, the method for training the recognition model is described:
Referring to Fig. 1, an embodiment of this application provides a method for training a recognition model, which may specifically include the following steps.
S101: Acquire a data set to be trained.
The data set to be trained may be an image data set, a text data set, or the like, and different types of data sets may be acquired according to the specific application scenario. In this application, the data set to be trained is described with an image data set as an example.
Optionally, a target data set may be crawled from the Internet by a web crawler. The target data set may, for example, be a set of animal images. To enrich the data set to be trained, extract features better, and generalize the model so as to prevent overfitting, a sufficient amount of data needs to be input; data augmentation can therefore be applied to the data set to obtain more training data.
Specifically, the target data set is augmented by geometric transformation to obtain the data set to be trained, where the geometric transformation includes at least one of rotation, flipping, scaling, and translation. Rotation can be understood as rotating an image by a random angle so that the orientation of the target object in the image changes; flipping as mirroring the image horizontally or vertically; scaling as enlarging or shrinking the image by a certain ratio; and translation as shifting the image in some manner so as to change the position of the target, where the direction and distance of the shift may be preset or randomly generated. With this augmentation, the data set to be trained can contain sufficient training samples; a sketch of such augmentations is given below.
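For illustration only, here is a minimal sketch of these geometric augmentations assuming torchvision's transforms API; the application itself names no framework, and all angles, ratios, and probabilities below are illustrative placeholders.

```python
# A sketch of the rotation / flip / scale / translation augmentations above,
# assuming torchvision; parameter values are illustrative, not from the patent.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),        # rotation transform
    transforms.RandomHorizontalFlip(p=0.5),       # flip transform (horizontal)
    transforms.RandomVerticalFlip(p=0.5),         # flip transform (vertical)
    transforms.RandomAffine(degrees=0,
                            translate=(0.1, 0.1), # translation transform
                            scale=(0.8, 1.2)),    # scaling transform
    transforms.ToTensor(),
])
```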
S102: Input the data set to be trained into a dilated convolutional neural network, where the dilated convolutional neural network includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the original convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, N is a positive integer greater than or equal to 2, and T is a positive integer greater than or equal to 2.
Convolutional layer: a layer that repeatedly performs the convolution operation over a feature map with a convolution kernel in a sliding-window fashion. Referring to Fig. 2, dilated convolution (also known as atrous or "hole" convolution) can be understood from two perspectives. 1. From the perspective of the input image, dilation can be understood as sampling the input image, where the sampling frequency is set by the dilation rate (denoted "rate"). When the dilation rate is 1, the input image is sampled without losing any information, and the convolution is a standard convolution. When the dilation rate is greater than 1, for example 2, the input image is sampled every (rate-1) pixels; the elements of the convolution kernel act on these sampling points, and the sampled image is then convolved with the kernel, which enlarges the receptive field. 2. From the perspective of the convolution kernel, holes (i.e. zeros) are injected into the kernel, and the kernel after zero injection may be called a "dilated kernel". Taking a 3*3 kernel as an example: when the dilation rate is 1, (rate-1) zeros are filled into the unoccupied positions of the original kernel, and the size of the receptive field of the convolution does not change; when the dilation rate is 2, (rate-1) zeros are injected between adjacent elements of the original kernel, that is, one zero between each pair of adjacent elements. The dilated kernel is then larger than the original kernel: its size becomes 5*5, so its receptive field becomes 5*5, which enlarges the receptive field.
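The second perspective, injecting zeros into the kernel, can be sketched in a few lines of Python with NumPy (an illustrative aid introduced here, not part of the application):

```python
import numpy as np

def dilate_kernel(kernel: np.ndarray, rate: int) -> np.ndarray:
    """Inject (rate-1) zeros between adjacent kernel elements ("dilated kernel")."""
    if rate == 1:
        return kernel                       # rate 1 is a standard convolution
    k = kernel.shape[0]
    size = (rate - 1) * (k - 1) + k         # dilated kernel size
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::rate, ::rate] = kernel            # original weights on a strided grid
    return out

k3 = np.ones((3, 3))
print(dilate_kernel(k3, 2).shape)           # (5, 5): a 3*3 kernel at rate 2 covers 5*5
```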
Referring to Fig. 3, the dilated convolutional neural network includes an input layer, hidden layers, and an output layer. The hidden layers include T convolutional layers, where T is a positive integer greater than or equal to 2; T may be 5, 6, 7, and so on, and the specific number is not limited. In one application example, T is described with the value 7, i.e. the hidden layers include 7 convolutional layers. Each convolutional layer includes N parallel dilated convolutions; N may be 2, 3, 4, and so on, and optionally N is described with 3 branches as an example. Take one of the T convolutional layers, say the M-th convolutional layer, as an example (see Fig. 3): the layer contains 3 dilated convolutions whose kernels have the same size, for example the original kernel of each of the 3 dilated convolutions is 3*3 or 5*5, which is not limited in this application. In this application, the original kernel size is described with 3*3 as an example.
The size of the dilated convolution kernel is computed as follows:
ksize1 = (rate - 1) * (ksize0 - 1) + ksize0, where ksize0 is the size of the original convolution kernel (e.g. ksize0 = 3) and rate is the dilation rate.
Receptive field: the size of the region of visual perception. In a convolutional neural network, the receptive field is the size of the region on the original image onto which a pixel of the feature map output by each layer maps. The size of the receptive field reflects how much information the extracted features contain: the larger the receptive field, the more contextual information is included. It should be noted that the receptive field of a pixel of the feature map output by the first convolutional layer equals the filter size, while the receptive field of a deeper convolutional layer depends on the kernel sizes and strides of all the layers before it.
The receptive field is computed as follows:
r = (m - 1) * stride + ksize1, where r is the receptive field of the current layer, m is the receptive field of the previous layer, stride is the convolution stride, and ksize1 is the size of the dilated kernel. The initial receptive field is 1. Both formulas are exercised in the short sketch below.
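A small Python helper can check both formulas; `dilated_ksize` and `receptive_field` are names introduced here for illustration. With a 3*3 original kernel and stride 1 it reproduces the values in Tables 1 and 2 below.

```python
def dilated_ksize(ksize0: int, rate: int) -> int:
    # ksize1 = (rate - 1) * (ksize0 - 1) + ksize0
    return (rate - 1) * (ksize0 - 1) + ksize0

def receptive_field(rates, ksize0=3, stride=1):
    """rates: list of dilation rates, one per stacked convolutional layer."""
    r = 1                                                # initial receptive field
    for rate in rates:
        r = (r - 1) * stride + dilated_ksize(ksize0, rate)  # r = (m-1)*stride + ksize1
    return r

for rate in (1, 2, 4):
    print(rate, receptive_field([rate]), receptive_field([rate, rate]))
# 1 -> 3, 5   2 -> 5, 9   4 -> 9, 17  (matching Tables 1 and 2)
```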
For ease of description, in the embodiments of this application the convolution stride is described with the value 1, and the first of the 7 dilated convolutional layers is taken as an example to illustrate the sizes of the dilated kernels and receptive fields under different dilation rates.
According to the above formulas, the dilated kernel size and receptive field of the first convolutional layer are computed. For the 3 branches with different dilation rates in the first convolutional layer, see Table 1 below:
Table 1
Layer 1  | Original kernel size | Dilation rate | Dilated kernel size | Receptive field
Branch 1 | 3*3                  | 1             | 3*3                 | 3*3
Branch 2 | 3*3                  | 2             | 5*5                 | 5*5
Branch 3 | 3*3                  | 4             | 9*9                 | 9*9
In this application, the original kernels within the same convolutional layer are of the same size, that is, the original kernels of the three dilated convolutions in one layer may all be 3*3; the kernels of the three dilated convolutions have the same weights, and the three branches have the same number of kernels. For example, the number of kernels may be a multiple of 16, such as 16, 32, or 64. The dilation rates of the 3 dilated convolutions differ, however; again taking 3 branches as an example, the rates may be 1 for the first dilated convolution, 2 for the second, and 4 for the third. In other words, within the same convolutional layer the dilation rates of the 3 dilated convolutions are mutually different, so feature maps with 3 different receptive fields are obtained and features of different scales can be extracted in the same layer. The 3 dilated convolutions share the kernel weights and thus use fewer weight parameters, which effectively reduces the model parameters, lowers the risk of overfitting, and speeds up computation, facilitating the construction and training of large-scale networks. A sketch of such a layer follows.
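A minimal sketch of such a shared-weight multi-dilation layer, assuming PyTorch (the application does not specify an implementation; channel counts, initialization, and the class name `SharedDilatedConv` are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedDilatedConv(nn.Module):
    """N parallel dilated convolutions sharing a single 3*3 kernel,
    each branch using its own dilation rate (a sketch, not the patent's code)."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.rates = rates
        # One shared weight tensor for all N branches (weight sharing).
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        nn.init.kaiming_normal_(self.weight)

    def forward(self, x):
        # Each branch convolves with the same kernel at its own dilation rate;
        # padding = rate keeps the N feature maps the same spatial size.
        return [F.conv2d(x, self.weight, padding=r, dilation=r) for r in self.rates]
```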
S103: In the M-th convolutional layer of the T convolutional layers, obtain multiple feature maps in the process of convolving with the convolution kernel of each of the N dilated convolutions.
M takes every value from 1 to (T-1); for ease of description, the M-th layer is described with the first layer as an example.
The first convolutional layer includes 3 parallel branches whose dilation rates differ and whose kernels have the same weights.
For example, the weights of the original convolution kernel may be:
[0 1 0]
[1 4 1]
[0 1 0].
It can be understood that, within the same convolutional layer, the 3 branches convolve with the same kernel, that is, each of the 3 branches uses a kernel of the same original size and the same weights, and the 3 branches share parameters. In the same convolutional layer, features of 3 different scales are therefore captured, yielding 3 feature maps; equivalently, the same type of feature is captured at different sampling rates of the same feature map.
S104: Concatenate the multiple feature maps and input the result to the (M+1)-th convolutional layer, where the weights of the convolution kernel of the M-th convolutional layer and of the convolution kernel of the (M+1)-th convolutional layer are different.
For example, the 3 feature maps obtained from the first layer are concatenated, and the concatenated feature map is input to the second convolutional layer, as in the short example below.
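Continuing the `SharedDilatedConv` sketch above, the concatenation along the channel dimension might look as follows (shapes are illustrative):

```python
# Concatenate the N branch outputs channel-wise before the next layer.
layer1 = SharedDilatedConv(in_ch=3, out_ch=16)
x = torch.randn(1, 3, 224, 224)      # a dummy input image
feats = layer1(x)                    # 3 feature maps, each (1, 16, 224, 224)
merged = torch.cat(feats, dim=1)     # (1, 48, 224, 224) -> next layer's input
```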
For ease of description, in this embodiment the M-th layer is described with the first convolutional layer as an example and the (M+1)-th layer with the second. According to the two formulas above, the dilated kernel sizes and receptive fields of the 3 branches with different dilation rates in the second convolutional layer are computed; see Table 2 below:
Table 2
Layer 2  | Dilated kernel size | Previous layer's receptive field | This layer's receptive field
Branch 1 | 3*3                 | 3*3                              | 5*5
Branch 2 | 5*5                 | 5*5                              | 9*9
Branch 3 | 9*9                 | 9*9                              | 17*17
It should be noted that in the embodiments of this application the dilation rates of the M-th layer and the (M+1)-th layer may be the same or different. In this embodiment, the receptive fields in Table 2 above are computed assuming the rates of the M-th and (M+1)-th layers are the same, that is, the dilation rates of the 3 branches of the second dilated convolutional layer are also 1, 2, and 4.
As Tables 1 and 2 show, in this embodiment the first layer yields 3 feature maps with receptive fields of 3*3, 5*5, and 9*9, and the second layer yields 3 feature maps with receptive fields of 5*5, 9*9, and 17*17, so features of different scales are obtained within a single layer while the parameters maintained in each layer stay the same. In the prior art, to obtain features of different scales and learn richer regional features, the first branch's kernel is 1*1, the second branch's is 3*3, and the third branch's is 5*5, and every added branch requires maintaining one more set of kernel parameters; compared with the prior art, this application greatly reduces the number of parameters, as the rough count below illustrates.
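A back-of-the-envelope comparison of the kernel weights per layer, counting kernel elements per input/output channel pair and ignoring biases (illustrative arithmetic only):

```python
# Three separate kernels (Inception-style) vs. one shared 3*3 kernel.
inception_like = 1 * 1 + 3 * 3 + 5 * 5   # 1*1 + 3*3 + 5*5 kernels: 35 weights
shared_dilated = 3 * 3                   # one shared 3*3 kernel:    9 weights
print(inception_like, shared_dilated)    # 35 vs 9, roughly a 4x reduction
```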
S105: Concatenate the N feature maps output by the T-th convolutional layer, input the result to the output layer, and output the classification result through the output layer, where T is greater than or equal to (M+1).
Further, the feature maps output by the T-th convolutional layer are concatenated and input to a global average pooling layer to obtain a feature vector;
the feature vector is input to the output layer, and the classification result is output through the output layer.
In this example, T is described with the value 7. The T-th convolutional layer likewise outputs 3 feature maps, which are concatenated in the same way, passed through an activation layer, and then input to the global average pooling layer.
From the first to the seventh convolutional layer, the 7 layers downsample a total of 7 times, so the height and width of the feature maps are small enough to be suitable for direct global average pooling. Global average pooling means pooling each feature map of the last layer to the mean over the whole map, forming one feature point; these feature points compose the final feature vector used for the softmax computation. For example, if the last layer's data is 10 feature maps of size 6*6, global average pooling computes the mean of all pixels of each feature map and outputs one value, so the 10 feature maps output 10 data points, which form a 1*10 vector; this feature vector is fed into the softmax classification computation to obtain the classification result, as in the snippet below.
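The worked example above, as a short PyTorch sketch (again an assumed framework; the random tensor stands in for the last layer's output):

```python
import torch

fmap = torch.randn(1, 10, 6, 6)      # last layer: 10 feature maps of size 6*6
vec = fmap.mean(dim=(2, 3))          # global average pooling -> shape (1, 10)
probs = torch.softmax(vec, dim=1)    # softmax classification probabilities
print(vec.shape, probs.sum().item()) # torch.Size([1, 10]), sum of probs = 1.0
```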
It should be noted that in the embodiments of this application an activation layer, composed of an activation function, follows each convolutional layer. The activation function in the embodiments of this application is the rectified linear unit (ReLU); the ReLU may be a leaky-relu, p-relu, or r-relu function.
S106: Determine the parameters of the dilated convolutional neural network according to the classification result to obtain the recognition model.
Image classification is performed by softmax regression, and the parameters of the dilated convolutional neural network are trained by the backpropagation algorithm: the training sample set is input to the network to compute classification results, a loss function evaluates the error between the classification results and the expected values, and the parameter weights are then continually updated by backpropagating the error and gradients, completing the training of the dilated convolutional neural network and yielding the recognition model.
The first step of the backpropagation algorithm is forward propagation, computing the final loss value; the second step is to obtain all parameter gradients according to the backpropagation algorithm, after which the parameters can be continually optimized with a gradient descent method, finally completing the training of the fully dilated convolutional neural network. Taking the gradient descent algorithm as an example, the algorithm framework is as follows:
Input: the sample set to be trained, for example a set containing multiple image samples; the network structure of the convolutional neural network model and the number of network layers; the structure of each layer in the network; and the activation function. For the gradient descent algorithm, set the iteration step size, the maximum number of iterations, the stopping threshold, and the initial dilation rates.
Output: the parameter values of each layer of the fully dilated convolutional neural network.
In the embodiments of this application, the per-layer parameter values include the weights of the convolution kernels in each convolutional layer; within one layer the N branches share the weights, while the kernel weights of different convolutional layers are mutually different. A minimal training-loop sketch is given below.
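A minimal gradient-descent training loop matching this framework, sketched in PyTorch under the assumption of a standard cross-entropy/softmax objective; `model` and `loader` are placeholders for the network and the training data set described above:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
    criterion = nn.CrossEntropyLoss()     # softmax classification loss
    for _ in range(epochs):               # cf. the maximum-iteration setting
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)  # forward pass + loss value
            loss.backward()               # backpropagate parameter gradients
            optimizer.step()              # update the kernel weights
    return model
```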
Optionally, the concatenating the multiple feature maps and inputting the result to the (M+1)-th convolutional layer may specifically include:
concatenating the multiple feature maps and inputting the result to a max pooling layer; and
downsampling through the max pooling layer and inputting the downsampled feature maps to the (M+1)-th convolutional layer.
A pooling layer serves two purposes. First, it reduces the dimensionality of the data: after the feature map size is compressed, complexity decreases and the computation required by the convolutional and activation layers also decreases, which speeds up training of the network. Second, it extracts important information and thereby suppresses noise in the data. In this embodiment, after features of different scales are obtained through the three branches, max pooling helps extract key information, such as key points. For example, with a 2*2 pooling window and a pooling stride of 2, max pooling maps each 2*2 region to the maximum value of that region; this pooling method better preserves texture features, because the most salient value of the features extracted by the previous layer stands in for the whole window.
In this embodiment, after the feature maps output by the previous convolutional layer pass through max pooling, the data has been downsampled and the features of the image's key points extracted at the same time; the downsampled feature maps are then input to the next convolutional layer, as in the one-line example below.
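Continuing the earlier sketch, the 2*2, stride-2 max pooling is a single call (assuming PyTorch):

```python
import torch.nn.functional as F

# Each 2*2 region is replaced by its maximum, halving width and height.
pooled = F.max_pool2d(merged, kernel_size=2, stride=2)  # e.g. 224*224 -> 112*112
```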
Further, in each of the T convolutional layers the number of output channels is greater than the number of input channels.
If the input image is a grayscale image, it has 1 channel; in practice, however, most images are 3-channel RGB images, and to extract more feature maps the convolutional layers are usually given more channels. In the embodiments of this application, the number of output channels increases gradually from the first convolutional layer to the T-th. Optionally, in the embodiments of this application the number of output channels of each convolutional layer may be two or four times the number of input channels.
A feature map is a tensor of shape channels * width * height. The lower a feature map's layer, the lower-level its features and the fewer forms they take; for example, the lowest-level feature maps extract image information such as textures and edges, whereas high-level features carry a great deal of semantic information and are therefore richer, so deeper layers need more channels. Multiples of 2 also suit GPU hardware design better, making computation faster. The pieces above can be assembled into a full network as sketched below.
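Assembling the sketches above into a complete network of the kind described (T = 7 shared-weight dilated layers, ReLU after each layer, max pooling between layers, output channels growing layer by layer, and global average pooling before the classifier) might look as follows. This is a sketch under the stated assumptions: it reuses `SharedDilatedConv` from earlier, and the class count and channel schedule are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedRecognitionNet(nn.Module):
    """Illustrative assembly: 7 shared-weight multi-dilation layers, channel
    counts in multiples of 16, global average pooling, softmax classifier."""
    def __init__(self, num_classes=10, rates=(1, 2, 4)):
        super().__init__()
        self.layers = nn.ModuleList()
        in_ch, out_ch = 3, 16                  # RGB input; kernels in multiples of 16
        for _ in range(7):                     # T = 7 convolutional layers
            self.layers.append(SharedDilatedConv(in_ch, out_ch, rates))
            in_ch, out_ch = out_ch * len(rates), out_ch * 2  # channels grow
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([F.relu(m) for m in layer(x)], dim=1)  # concat N maps
            x = F.max_pool2d(x, 2)             # downsample between layers (7 times)
        x = x.mean(dim=(2, 3))                 # global average pooling
        return self.fc(x)                      # logits for the softmax computation
```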
An embodiment of this application provides a method for training a recognition model. The method is applied to a computer device, which may be a server or a terminal device and is not specifically limited; the terminal includes but is not limited to various personal computers, laptop computers, and the like, and the server may be an independent server or a server cluster composed of multiple servers. In the embodiments of this application, the method is described with a server as the executing entity. First, the server acquires the data set to be trained and inputs it into a dilated convolutional neural network. The dilated convolutional neural network in the embodiments of this application includes T convolutional layers, each of which includes N parallel dilated convolutions whose convolution kernels have the same size and the same weights but mutually different dilation rates. The original kernels within the same convolutional layer are of the same size; for example, the original kernels of the N dilated convolutions in one layer may all be 3*3, the N dilated convolutions share the same kernel weights, and the N branches have the same number of kernels. Because the dilation rates of the N dilated convolutions differ, N feature maps with different receptive fields are obtained, so features of different scales can be extracted within one layer. Then, in the M-th of the T convolutional layers, multiple feature maps are obtained in the process of convolving with the kernel of each of the N dilated convolutions; the multiple feature maps are concatenated and input to the (M+1)-th convolutional layer, the kernel weights of the M-th and (M+1)-th convolutional layers being different; and so on, until the feature maps output by the T-th convolutional layer are concatenated and input to the output layer, through which the classification result is output. The parameters of the convolutional neural network are determined according to the classification result, yielding the recognition model. In the embodiments of this application, the N dilated convolutions share the kernel weights and use fewer parameters, which effectively reduces the model parameters, lowers the risk of overfitting, and increases computation speed.
The method for training the recognition model has been described above; the image recognition method is described below:
Referring to Fig. 4, which is a flowchart of an embodiment of an image recognition method according to an embodiment of this application, the image recognition method may specifically include the following steps:
S401: Acquire an image to be recognized.
The size of the image to be recognized is not limited. The image to be recognized may be an image sequence, which may include images of different sizes; for example, on a mobile terminal, an image of a distant scene captured by the terminal's camera is smaller in size, while an image of a nearer scene is larger.
S402: Input the image to be recognized into the recognition model described in the above embodiment.
The recognition model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2.
This recognition model is the one trained by the method for training a recognition model described above; for an understanding of the model in this step, refer to step S102 in the above embodiment, which is not repeated here.
S403: Output the recognition result through the image recognition model.
Specifically, in the M-th convolutional layer of the T convolutional layers, multiple feature maps are obtained in the process of convolving with the convolution kernel of each of the N dilated convolutions; for this step, refer to step S103 in the above embodiment, which is not repeated here.
The multiple feature maps are concatenated and input to the (M+1)-th convolutional layer, the weights of the convolution kernels of the M-th and (M+1)-th convolutional layers being different; for this step, refer to step S104 in the above embodiment, which is not repeated here.
The feature maps output by the T-th convolutional layer are concatenated and input to the output layer, and the recognition result is output through the output layer; for this step, refer to step S105 in the above embodiment, which is not repeated here.
In the embodiments of this application, the image to be recognized is first acquired and then input into the image recognition model. The image recognition model includes T convolutional layers, each of which includes N parallel dilated convolutions whose convolution kernels have the same size and the same weights but mutually different dilation rates; the original kernels within the same convolutional layer are of the same size, for example all 3*3, the N dilated convolutions share the same kernel weights, and the N branches have the same number of kernels. Because the dilation rates differ, N feature maps with different receptive fields are obtained, features of different scales can be extracted within one layer, and the receptive field is enlarged, which preserves the accuracy of image recognition. Moreover, since the N dilated convolutions share the kernel weights, there is no need to add kernels of different sizes to enlarge the receptive field as in the traditional approach; this effectively reduces the volume of the image recognition model while maintaining recognition accuracy, reduces the resources the model occupies when deployed on a mobile terminal, and improves the running speed of the mobile terminal.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
In an embodiment, an image recognition apparatus 500 is provided, corresponding to the image recognition method in the above embodiment. As shown in Fig. 5, the image recognition apparatus 500 may specifically include:
an acquisition module 501, configured to acquire an image to be recognized;
an image input module 502, configured to input the image to be recognized acquired by the acquisition module 501 into an image recognition model, where the image recognition model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
a result output module 503, configured to output a recognition result through the image recognition model fed by the image input module 502.
Each module of the above image recognition apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In an embodiment, an apparatus for training a recognition model is provided, corresponding to the method for training a recognition model in the above embodiment. As shown in Fig. 6, the apparatus 600 for training a recognition model may specifically include:
an acquisition module 601, configured to acquire a data set to be trained;
a training data input module 602, configured to input the data set to be trained acquired by the acquisition module 601 into a dilated convolutional neural network model, where the dilated convolutional neural network model includes T convolutional layers, each of the T convolutional layers includes N parallel dilated convolutions, the convolution kernels of the N dilated convolutions have the same size and the same weights but mutually different dilation rates, and N is a positive integer greater than or equal to 2;
a convolution module 603, configured to obtain multiple feature maps in the M-th convolutional layer of the T convolutional layers fed by the training data input module 602, in the process of convolving with the convolution kernel of each of the N dilated convolutions;
a feature map input module 604, configured to concatenate the multiple feature maps obtained by the convolution module 603 and input the result to the (M+1)-th convolutional layer, where the weights of the convolution kernel of the M-th convolutional layer and of the convolution kernel of the (M+1)-th convolutional layer are different; and
a result output module 605, configured to concatenate the feature maps output by the T-th convolutional layer, input the result to the output layer via the feature map input module 604, and output a recognition result through the output layer, where T is greater than or equal to (M+1).
Optionally, the feature map input module 604 is further specifically configured to concatenate the multiple feature maps and input the result to a max pooling layer, downsample through the max pooling layer, and input the downsampled feature maps to the (M+1)-th convolutional layer.
Optionally, the number of output channels of the M-th convolutional layer is greater than the number of input channels.
Optionally, the result output module 605 is further configured to concatenate the feature maps output by the T-th convolutional layer and input the result to a global average pooling layer to obtain a feature vector, and to input the feature vector into the output layer and output the classification result through the output layer, where T is greater than or equal to (M+1).
Optionally, the acquisition module 601 is further configured to acquire a target data set and augment the target data set by geometric transformation to obtain the data set to be trained, where the geometric transformation includes at least one of rotation, flipping, scaling, and translation.
In an embodiment, referring to Fig. 7, an embodiment of this application provides a computer device, which may be a server or a terminal. The computer device includes a memory 701, a processor 702, and a transceiver 703, which are connected by a bus 704.
The memory 701 stores computer-readable instructions executable on the processor 702. When executing the computer-readable instructions, the processor 702 implements the steps of the method for training a recognition model in the above embodiment, for example steps S101-S106 shown in Fig. 1, or implements the steps of the image recognition method in the above embodiment, for example steps S401-S403 shown in Fig. 4, which are not repeated here to avoid repetition. Alternatively, when executing the computer-readable instructions, the processor 702 implements the functions of the modules/units in the embodiment of the image recognition apparatus, or the functions of the modules/units in the embodiment of the apparatus for training a recognition model, which are not repeated here to avoid repetition.
In an embodiment, one or more readable storage media storing computer-readable instructions are provided, the readable storage media including non-volatile readable storage media and volatile readable storage media. When the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the method for training a recognition model in the above embodiment, for example steps S101-S106 shown in Fig. 1, or the steps of the image recognition method in the above embodiment, for example steps S401-S403 shown in Fig. 4; alternatively, the functions of the modules/units in the embodiment of the image recognition apparatus or in the embodiment of the apparatus for training a recognition model are implemented. These are not repeated here to avoid repetition.
A person of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be completed by computer-readable instructions instructing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium, and when executed may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is merely illustrative; in practical applications, the above functions may be assigned to different functional units or modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.

Claims (20)

  1. An image recognition method, characterized by comprising:
    obtaining an image to be recognized;
    inputting the image to be recognized into an image recognition model, wherein the image recognition model comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
    outputting a recognition result through the image recognition model.
  2. A method for training a recognition model, characterized by comprising:
    obtaining a data set to be trained;
    inputting the data set to be trained into a dilated convolutional neural network, wherein the dilated convolutional neural network comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N and T are both positive integers greater than or equal to 2;
    at the M-th convolutional layer of the T convolutional layers, obtaining multiple feature maps in a process of convolving with the convolution kernel of each of the N dilated convolutions;
    concatenating the multiple feature maps and inputting them to the (M+1)-th convolutional layer, wherein weights of convolution kernels of the M-th convolutional layer differ from those of the (M+1)-th convolutional layer;
    concatenating feature maps output by the T-th convolutional layer and inputting them to an output layer, and outputting a classification result through the output layer, wherein T is greater than or equal to (M+1); and
    determining parameters of the convolutional neural network according to the classification result to obtain the recognition model.
  3. The method according to claim 2, characterized in that the concatenating the multiple feature maps and inputting them to the (M+1)-th convolutional layer comprises:
    concatenating the multiple feature maps and inputting them to a max pooling layer; and
    performing downsampling through the max pooling layer, and inputting the downsampled feature maps to the (M+1)-th convolutional layer.
  4. The method according to claim 2, characterized in that the number of output channels of the M-th convolutional layer is greater than the number of input channels.
  5. The method according to claim 2, characterized in that the concatenating the feature maps output by the T-th convolutional layer and inputting them to the output layer comprises:
    concatenating the feature maps output by the T-th convolutional layer and inputting them to a global average pooling layer to obtain a feature vector; and
    inputting the feature vector into the output layer, and outputting the classification result through the output layer, wherein T is greater than or equal to (M+1).
  6. The method according to any one of claims 2 to 5, characterized in that the obtaining a data set to be trained comprises:
    obtaining a target data set; and
    performing augmentation on the target data set through geometric transformation to obtain the data set to be trained, wherein the geometric transformation comprises at least one of rotation, flipping, scaling, and translation.
  7. An image recognition apparatus, characterized by comprising:
    an obtaining module, configured to obtain an image to be recognized;
    an image input module, configured to input the image to be recognized obtained by the obtaining module into an image recognition model, wherein the image recognition model comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
    a result output module, configured to output a recognition result.
  8. An apparatus for training a recognition model, characterized by comprising:
    an obtaining module, configured to obtain a data set to be trained;
    a training data input module, configured to input the data set to be trained obtained by the obtaining module into a dilated convolutional neural network, wherein the dilated convolutional neural network comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and different dilation rates, and N is a positive integer greater than or equal to 2;
    a convolution module, configured to obtain multiple feature maps in a process of convolving with the convolution kernel of each of the N dilated convolutions at the M-th convolutional layer of the T convolutional layers fed by the training data input module;
    a feature map input module, configured to concatenate the multiple feature maps obtained by the convolution module and input them to the (M+1)-th convolutional layer, wherein weights of convolution kernels of the M-th convolutional layer differ from those of the (M+1)-th convolutional layer;
    a result output module, configured to concatenate feature maps output by the T-th convolutional layer and input them to an output layer, and output a classification result through the output layer, wherein T is greater than or equal to (M+1); and
    a determining module, configured to determine parameters of the convolutional neural network according to the classification result output by the result output module, to obtain a dilated convolutional neural network model.
  9. The apparatus for training a recognition model according to claim 8, characterized in that the feature map input module is specifically configured to:
    concatenate the multiple feature maps and input them to a max pooling layer; and
    perform downsampling through the max pooling layer, and input the downsampled feature maps to the (M+1)-th convolutional layer.
  10. The apparatus for training a recognition model according to claim 8, characterized in that the result output module is specifically configured to:
    concatenate the feature maps output by the T-th convolutional layer and input them to a global average pooling layer to obtain a feature vector; and
    input the feature vector into the output layer, and output the classification result through the output layer, wherein T is greater than or equal to (M+1).
  11. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining an image to be recognized;
    inputting the image to be recognized into an image recognition model, wherein the image recognition model comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
    outputting a recognition result through the image recognition model.
  12. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining a data set to be trained;
    inputting the data set to be trained into a dilated convolutional neural network, wherein the dilated convolutional neural network comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N and T are both positive integers greater than or equal to 2;
    at the M-th convolutional layer of the T convolutional layers, obtaining multiple feature maps in a process of convolving with the convolution kernel of each of the N dilated convolutions;
    concatenating the multiple feature maps and inputting them to the (M+1)-th convolutional layer, wherein weights of convolution kernels of the M-th convolutional layer differ from those of the (M+1)-th convolutional layer;
    concatenating feature maps output by the T-th convolutional layer and inputting them to an output layer, and outputting a classification result through the output layer, wherein T is greater than or equal to (M+1); and
    determining parameters of the convolutional neural network according to the classification result to obtain a recognition model.
  13. The computer device according to claim 12, characterized in that the concatenating the multiple feature maps and inputting them to the (M+1)-th convolutional layer comprises:
    concatenating the multiple feature maps and inputting them to a max pooling layer; and
    performing downsampling through the max pooling layer, and inputting the downsampled feature maps to the (M+1)-th convolutional layer.
  14. The computer device according to claim 12, characterized in that the concatenating the feature maps output by the T-th convolutional layer and inputting them to the output layer comprises:
    concatenating the feature maps output by the T-th convolutional layer and inputting them to a global average pooling layer to obtain a feature vector; and
    inputting the feature vector into the output layer, and outputting the classification result through the output layer, wherein T is greater than or equal to (M+1).
  15. The computer device according to any one of claims 12 to 14, characterized in that the obtaining a data set to be trained comprises:
    obtaining a target data set; and
    performing augmentation on the target data set through geometric transformation to obtain the data set to be trained, wherein the geometric transformation comprises at least one of rotation, flipping, scaling, and translation.
  16. One or more readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining an image to be recognized;
    inputting the image to be recognized into an image recognition model, wherein the image recognition model comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N is a positive integer greater than or equal to 2; and
    outputting a recognition result through the image recognition model.
  17. One or more readable storage media storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a data set to be trained;
    inputting the data set to be trained into a dilated convolutional neural network, wherein the dilated convolutional neural network comprises T convolutional layers, each of the T convolutional layers comprises N parallel dilated convolutions, convolution kernels of the N dilated convolutions have the same size, the same weights, and mutually different dilation rates, and N and T are both positive integers greater than or equal to 2;
    at the M-th convolutional layer of the T convolutional layers, obtaining multiple feature maps in a process of convolving with the convolution kernel of each of the N dilated convolutions;
    concatenating the multiple feature maps and inputting them to the (M+1)-th convolutional layer, wherein weights of convolution kernels of the M-th convolutional layer differ from those of the (M+1)-th convolutional layer;
    concatenating feature maps output by the T-th convolutional layer and inputting them to an output layer, and outputting a classification result through the output layer, wherein T is greater than or equal to (M+1); and
    determining parameters of the convolutional neural network according to the classification result to obtain a recognition model.
  18. The readable storage medium according to claim 17, characterized in that the concatenating the multiple feature maps and inputting them to the (M+1)-th convolutional layer comprises:
    concatenating the multiple feature maps and inputting them to a max pooling layer; and
    performing downsampling through the max pooling layer, and inputting the downsampled feature maps to the (M+1)-th convolutional layer.
  19. The readable storage medium according to claim 17, characterized in that the concatenating the feature maps output by the T-th convolutional layer and inputting them to the output layer comprises:
    concatenating the feature maps output by the T-th convolutional layer and inputting them to a global average pooling layer to obtain a feature vector; and
    inputting the feature vector into the output layer, and outputting the classification result through the output layer, wherein T is greater than or equal to (M+1).
  20. The readable storage medium according to any one of claims 17 to 19, characterized in that the obtaining a data set to be trained comprises:
    obtaining a target data set; and
    performing augmentation on the target data set through geometric transformation to obtain the data set to be trained, wherein the geometric transformation comprises at least one of rotation, flipping, scaling, and translation.
PCT/CN2019/116943 2019-09-18 2019-11-11 Image recognition method, recognition model training method, related device, and storage medium WO2021051520A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910882256.2A CN110796162B (zh) 2019-09-18 2019-09-18 Image recognition method, recognition model training method, related device, and storage medium
CN201910882256.2 2019-09-18

Publications (1)

Publication Number Publication Date
WO2021051520A1 true WO2021051520A1 (zh) 2021-03-25

Family

ID=69427288

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116943 WO2021051520A1 (zh) 2019-09-18 2019-11-11 图像识别、训练识别模型的方法、相关设备及存储介质

Country Status (2)

Country Link
CN (1) CN110796162B (zh)
WO (1) WO2021051520A1 (zh)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111368941B (zh) * 2020-04-10 2023-09-01 浙江大华技术股份有限公司 Image processing method and apparatus, and computer storage medium
CN113284104A (zh) * 2021-05-21 2021-08-20 哈尔滨理工大学 Medical image preprocessing method based on deep learning
CN113420579B (zh) * 2021-06-29 2023-05-26 北大方正集团有限公司 Training and positioning method and apparatus for an identification code position locating model, and electronic device
CN115294644B (zh) * 2022-06-24 2024-07-02 北京昭衍新药研究中心股份有限公司 Rapid monkey behavior recognition method based on 3D convolution parameter reconstruction


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784654B (zh) * 2016-08-26 2020-09-25 杭州海康威视数字技术股份有限公司 Image segmentation method and apparatus, and fully convolutional network system
CN107679477B (zh) * 2017-09-27 2021-02-02 深圳市未来媒体技术研究院 Face depth and surface normal vector prediction method based on a dilated convolutional neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190220746A1 (en) * 2017-08-29 2019-07-18 Boe Technology Group Co., Ltd. Image processing method, image processing device, and training method of neural network
CN108986124A (zh) * 2018-06-20 2018-12-11 天津大学 Retinal vessel image segmentation method combining multi-scale feature convolutional neural networks
CN110120020A (zh) * 2019-04-30 2019-08-13 西北工业大学 SAR image denoising method based on a multi-scale dilated residual attention network
CN110210497A (zh) * 2019-05-27 2019-09-06 华南理工大学 Robust real-time weld seam feature detection method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113687326A (zh) * 2021-07-13 2021-11-23 广州杰赛科技股份有限公司 Vehicle-mounted radar echo noise reduction method, apparatus, device, and medium
CN113687326B (zh) * 2021-07-13 2024-01-05 广州杰赛科技股份有限公司 Vehicle-mounted radar echo noise reduction method, apparatus, device, and medium
CN113538581A (zh) * 2021-07-19 2021-10-22 之江实验室 3D pose estimation method based on graph attention spatio-temporal convolution
CN113538581B (zh) * 2021-07-19 2024-03-12 之江实验室 3D pose estimation method based on graph attention spatio-temporal convolution
CN113537385A (zh) * 2021-08-01 2021-10-22 程文云 Hydrophobicity classification method for power composite insulators based on a TX2 device
CN113537385B (zh) * 2021-08-01 2023-12-05 国网冀北电力有限公司超高压分公司 Hydrophobicity classification method for power composite insulators based on a TX2 device
CN113611315B (zh) * 2021-08-03 2023-09-22 南开大学 Voiceprint recognition method and apparatus based on a lightweight convolutional neural network
CN113611315A (zh) * 2021-08-03 2021-11-05 南开大学 Voiceprint recognition method and apparatus based on a lightweight convolutional neural network
CN113610153A (zh) * 2021-08-06 2021-11-05 长沙理工大学 Human body infrared image recognition method and apparatus, computer device, and storage medium
CN113936028A (zh) * 2021-10-19 2022-01-14 深圳市金视电子科技有限公司 Digital matting technique combining automatically generated trimaps with a deep dilated convolutional network
CN114299305A (zh) * 2021-12-30 2022-04-08 安徽理工大学 Salient object detection algorithm aggregating dense and attention multi-scale features
CN114840938B (zh) * 2022-04-22 2024-08-20 武汉理工大学 Rolling bearing fault diagnosis method and apparatus, electronic device, and storage medium
CN114840938A (zh) * 2022-04-22 2022-08-02 武汉理工大学 Rolling bearing fault diagnosis method and apparatus, electronic device, and storage medium
CN115223017A (zh) * 2022-05-31 2022-10-21 昆明理工大学 Multi-scale feature fusion bridge detection method based on depthwise separable convolution
CN115223017B (zh) * 2022-05-31 2023-12-19 昆明理工大学 Multi-scale feature fusion bridge detection method based on depthwise separable convolution
CN115273060A (zh) * 2022-08-18 2022-11-01 杭州朗阳科技有限公司 Neural network model suitable for edge devices, and image recognition method and apparatus

Also Published As

Publication number Publication date
CN110796162A (zh) 2020-02-14
CN110796162B (zh) 2023-08-29

Similar Documents

Publication Publication Date Title
WO2021051520A1 (zh) Image recognition method, recognition model training method, related device, and storage medium
CN108510485B (zh) No-reference image quality assessment method based on a convolutional neural network
US10325181B2 (en) Image classification method, electronic device, and storage medium
CN109949255B (zh) Image reconstruction method and device
WO2021114625A1 (zh) Network structure construction method and apparatus for multi-task scenarios
US11080565B2 (en) Face detection method and apparatus, computer device, and storage medium
EP4145353A1 (en) Neural network construction method and apparatus
WO2020006881A1 (zh) Butterfly recognition network construction method and apparatus, computer device, and storage medium
WO2021184902A1 (zh) Image classification method and apparatus, and training method, apparatus, device, and medium therefor
CN113159073B (zh) Knowledge distillation method and apparatus, storage medium, and terminal
WO2019218136A1 (zh) Image segmentation method, computer device, and storage medium
CN108765425B (zh) Image segmentation method and apparatus, computer device, and storage medium
US20210256304A1 (en) Method and apparatus for training machine learning model, apparatus for video style transfer
US11449754B1 (en) Neural network training method for memristor memory for memristor errors
CN111192292A (zh) Target tracking method based on attention mechanism and Siamese network, and related device
WO2021218469A1 (zh) Image data detection method and apparatus, computer device, and storage medium
Zeng et al. Single image super-resolution using a polymorphic parallel CNN
EP4163832A1 (en) Neural network training method and apparatus, and image processing method and apparatus
US11176457B2 (en) Method and apparatus for reconstructing 3D microstructure using neural network
DE102017006563A1 (de) Image patch matching using probability-based sampling based on a prediction
CN111161274A (zh) Abdominal image segmentation method and computer device
US10832180B2 (en) Artificial intelligence system that employs windowed cellular automata to create plausible alternatives
WO2021218037A1 (zh) Target detection method and apparatus, computer device, and storage medium
CN110264407A (zh) Image super-resolution model training and reconstruction method, apparatus, device, and storage medium
CN112801107A (zh) Image segmentation method and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19946039

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19946039

Country of ref document: EP

Kind code of ref document: A1