WO2016090520A1 - A method and a system for image classification - Google Patents

A method and a system for image classification

Info

Publication number
WO2016090520A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolutional
pooling
kernel
map
error
Prior art date
Application number
PCT/CN2014/001115
Other languages
French (fr)
Inventor
Xiaogang Wang
Hongsheng LI
Rui Zhao
Original Assignee
Xiaogang Wang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaogang Wang filed Critical Xiaogang Wang
Priority to CN201480083906.2A priority Critical patent/CN107004142B/en
Priority to PCT/CN2014/001115 priority patent/WO2016090520A1/en
Publication of WO2016090520A1 publication Critical patent/WO2016090520A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects


Abstract

Disclosed is an apparatus for image classification. The apparatus comprises a converter and a forward propagator. The converter is configured to retrieve a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers. The forward propagator is configured to feed an image into the convolutional neural network to predict classes of all pixels in the image. The converter further comprises first and second converting units. The first converting unit is configured to insert all-zero rows and columns to the convolutional kernel of the convolutional layers such that every two neighboring entries are separated from each other. The second converting unit is configured to insert unmasked rows and columns to the pooling kernel of the pooling layers such that every two neighboring entries are separated from each other. The apparatus also comprises a backward propagator to update the convolutional kernels in the converted convolutional neural network. The present application also discloses a method for image classification.

Description

A METHOD AND A SYSTEM FOR IMAGE CLASSIFICATION
Technical Field
The present application relates to a method for image classification and a system thereof.
Background
The goal of pixel-wise classification is to classify all pixels in an image into different classes. Pixel-wise classification tasks include image segmentation and object detection, which require inputting image patches into a classifier and outputting the class labels for their central pixels.
Convolutional Neural Networks (CNNs) are trainable multistage feed-forward neural networks. They have been extensively investigated to extract good hierarchical feature representations for image classification tasks. The input and output of each layer are called feature maps. The CNN generally comprises convolution layers, pooling layers and non-linearity layers. The convolution layer convolves input feature maps with 3D filter banks to generate output feature maps. Each filter extracts the same type of local features at all locations of the input feature map. The pooling layer decreases the resolution of the feature maps to make the output feature maps less sensitive to input shift and distortions. Max-pooling and average-pooling are most commonly used. The non-linearity layer is a point-wise non-linear function applied to each entry of the feature maps.
After extracting features with a multilayer convolutional network, fully connected layers with a final classifier are added to output class predictions. Given training samples and their labels, the parameters of CNNs are learned in an end-to-end supervised way by minimizing a loss function on training data. Forward and backward propagation is used to make class predictions for input samples and to update CNN parameters based on prediction errors, respectively.
However, forward and backward propagation was originally designed for  whole-image classification. Directly applying it to pixel-wise classification in a patch-by-patch scanning manner is extremely inefficient, because surrounding patches of pixels have large overlaps, which lead to a lot of redundant computation.
Summary
The present application aims to eliminate the redundant computation of forward and backward propagation in CNN-based pixel-wise classification and thereby achieve a significant speedup.
In one aspect of the present application, disclosed is an apparatus for image classification. The apparatus may comprise a converter configured to convert a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers. The converter may comprise a first converting unit configured to insert all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and a second converting unit configured to insert unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other. The apparatus may further comprise a forward propagator configured to feed an image into the converted convolutional neural network to predict classes of all pixels in the image.
In one embodiment, the apparatus may further comprise a backward propagator. The backward propagator may be configured to update parameters of the convolutional kernel in the converted convolutional neural network.
In one embodiment, the apparatus may further comprise a chooser. The chooser may be configured to choose errors of pixels of interest, the errors being back-propagated through the converted convolutional neural network so as to update parameters of the convolutional kernel.
In another aspect of the present application, disclosed is a method for image classification. The method may comprise converting a convolutional neural network  with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers, and feeding an image into the converted convolutional neural network to predict classes of all pixels in the image. The step of converting may comprise inserting all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and inserting unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other.
In one embodiment, the method may further comprise a step of updating parameters of the convolutional kernel in the converted convolutional neural network.
In one embodiment, the method may further comprise a step of choosing errors of pixels of interest, and back-propagating errors through the converted convolutional neural network so as to update the parameters of the convolutional kernel.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating an exemplary apparatus according to one embodiment of the present application.
Fig. 2 is a schematic diagram illustrating an exemplary forward propagator according to one embodiment of the present application.
Fig. 3 is a schematic diagram illustrating another exemplary forward propagator according to one embodiment of the present application.
Fig. 4 is a schematic diagram illustrating an exemplary chooser according to  one embodiment of the present application.
Fig. 5 is a schematic diagram illustrating an exemplary backward propagator according to one embodiment of the present application.
Fig. 6 is a schematic diagram illustrating another exemplary backward propagator according to one embodiment of the present application.
Fig. 7 is a schematic diagram illustrating yet another exemplary backward propagator according to one embodiment of the present application.
Fig. 8 is a schematic flowchart illustrating an exemplary method for image classification according to one embodiment of the present application.
Fig. 9 is a schematic flowchart illustrating the steps for converting an original CNN to a converted CNN according to one embodiment of the present application.
Fig. 10 is a schematic view illustrating inserting all-zero rows and columns to the convolutional kernel Wk and pooling kernel Pk with d=2 and d=3, respectively.
Fig. 11 is a schematic flowchart illustrating the steps for forward propagating according to one embodiment of the present application.
Fig. 12 is a schematic view illustrating performing convolution as matrix multiplication with the converted convolutional kernel.
Fig. 13 is a schematic flowchart illustrating the choosing step according to one embodiment of the present application.
Fig. 14 is a schematic flowchart illustrating the steps for backward propagating according to one embodiment of the present application.
Fig. 15 is a comparison of patch-by-patch scanning for CNN based  pixel-wise classification and the advanced method disclosed in the present application.
Detailed Description
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
It is further understood that the use of relational terms such as first and second, and the like, if any, are used solely to distinguish one from another entity, item, or action without necessarily requiring or implying any actual such relationship or order between such entities, items or actions.
Much of the inventive functionality and many of the inventive principles when implemented, are best supported with or in software or integrated circuits (ICs) , such as a digital signal processor and software therefore or application specific ICs. It is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions or ICs with minimal experimentation. Therefore, in the interest of brevity and minimization of any risk of obscuring the principles and concepts according to the present invention, further discussion of such software and ICs, if any, will be limited to the essentials with respect to the principles and concepts used by the preferred embodiments.
Fig. 1 is a schematic diagram illustrating an exemplary apparatus 100 for image classification consistent with some disclosed embodiments. As shown, the apparatus 100 may comprise a converter 10 and a forward propagator 20. The converter 10 is configured to retrieve a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers. The forward propagator may be configured to feed an image into the converted convolutional neural network to generate a predicted label map for the image classification. In one embodiment of the present application, the converter 10 may comprise a first converting unit 11 and a second converting unit 12. The first converting unit 11 may be configured to insert all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other. The second converting unit 12 may be configured to insert unmasked rows and columns to  a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other. In some embodiments, the two neighboring entries are separated from each other by several pixels.
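As an illustration of the two converting units, the following minimal Python/NumPy sketch (not part of the original disclosure; the function names, the single 2-D kernel setting, and the boolean-mask representation of the pooling kernel are assumptions) builds a converted convolutional kernel with all-zero rows and columns inserted and a converted pooling kernel with unmasked rows and columns inserted, so that originally neighboring entries become d pixels apart.

```python
import numpy as np

def dilate_conv_kernel(W, d):
    """Insert all-zero rows/columns into a 2-D convolutional kernel so that
    entries that were neighbors become d pixels apart (first converting unit)."""
    kh, kw = W.shape
    out = np.zeros(((kh - 1) * d + 1, (kw - 1) * d + 1), dtype=W.dtype)
    out[::d, ::d] = W
    return out

def dilate_pool_mask(kh, kw, d):
    """Build a boolean mask for a pooling kernel with unmasked (False) rows/columns
    inserted between the originally neighboring masked entries (second converting unit)."""
    mask = np.zeros(((kh - 1) * d + 1, (kw - 1) * d + 1), dtype=bool)
    mask[::d, ::d] = True
    return mask

# Example: a 3x3 convolutional kernel converted with d = 2 becomes a 5x5 kernel whose
# non-zero entries sit two pixels apart; a 2x2 pooling kernel with d = 3 becomes a 4x4
# mask whose masked entries sit three pixels apart (cf. Fig. 10).
W = np.arange(1, 10, dtype=float).reshape(3, 3)
print(dilate_conv_kernel(W, 2))
print(dilate_pool_mask(2, 2, 3).astype(int))
```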
Referring to Fig. 1, to enable the convolutional neural network to work effectively, the apparatus 100 further comprises a backward propagator 30 for calculating the gradients of parameters of the modified CNN. In the present embodiment, the backward propagator 30 may be configured to update the parameters of the convolutional kernel in the converted convolutional neural network. In some embodiments, the apparatus 100 further comprises a chooser 40, which calculates the errors of the predicted label map and chooses only the errors of pixels of interest for training CNN parameters. In the present embodiment, the chooser 40 may be configured to choose errors of pixels of interest, the errors being back-propagated through the converted convolutional neural network so as to update the parameters of the convolutional kernel.
Fig. 2 is a schematic diagram illustrating an exemplary forward propagator 20. As shown, the forward propagator 20 may comprise a first extracting unit 21, a first vectorizing unit 22, and a first convolution unit 23, wherein the first extracting unit 21 is configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of images to be classified, the first vectorizing unit 22 is configured to vectorize the non-zero entries of the converted convolutional kernel, and the first convolution unit 23 is configured to perform convolution on the feature values extracted by the first extracting unit and the non-zero entries vectorized by the first vectorizing unit to generate an output feature map, which may be used in the CNN as an intermediate result.
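A minimal sketch of the forward pass through one converted convolution layer, assuming a single-channel input map and taking the un-dilated kernel entries as input (these are exactly the non-zero entries of the converted kernel); the helper name and the valid-convolution boundary handling are assumptions of the sketch. It gathers the feature values addressed by the non-zero kernel entries from every neighborhood (first extracting unit), vectorizes the kernel entries (first vectorizing unit), and performs the convolution as a single matrix multiplication (first convolution unit).

```python
import numpy as np

def conv_with_dilated_kernel(X, W, d):
    """Forward pass of one converted convolution layer on a single-channel map X.
    Only the non-zero entries of the dilated kernel are gathered, so the cost is
    the same as convolving with the original (un-dilated) kernel."""
    kh, kw = W.shape                      # original (un-dilated) kernel size
    H, w_in = X.shape
    out_h = H - (kh - 1) * d              # "valid" output size with stride 1
    out_w = w_in - (kw - 1) * d
    # First extracting unit: one row per neighborhood, one column per non-zero entry.
    cols = np.empty((out_h * out_w, kh * kw), dtype=X.dtype)
    idx = 0
    for i in range(kh):
        for j in range(kw):
            patch = X[i * d:i * d + out_h, j * d:j * d + out_w]
            cols[:, idx] = patch.ravel()
            idx += 1
    # First vectorizing unit: vectorize the non-zero entries of the converted kernel.
    w_vec = W.ravel()
    # First convolution unit: convolution performed as a matrix-vector multiplication.
    out = cols @ w_vec
    return out.reshape(out_h, out_w)

X = np.random.rand(8, 8)
W = np.random.rand(3, 3)
Y = conv_with_dilated_kernel(X, W, d=2)   # 4x4 output feature map
print(Y.shape)
```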
Fig. 3 is a schematic diagram illustrating another exemplary forward propagator 20’. As shown in Fig. 3, the forward propagator 20’ may comprise a second extracting unit 24, and a calculating unit 25, wherein the second extracting unit 24 is configured to extract feature values specified by masked entries in the converted pooling kernel from each neighborhood in an input feature of images to be classified, and the calculating unit 25 is configured to calculate a mean value for an average pooling layer in said plurality of pooling layers or a max value for a max pooling layer in said plurality of pooling layers from the feature values extracted in the second extracting unit to generate an output feature map. As well known in the art, the pooling layer may be a layer in a convolutional neural network and can be at any layer of the CNN, and the average pooling layer calculates the mean value from the feature values extracted from each neighborhood of the input feature map. As to the max pooling layer, it may calculate the max value from the feature values extracted from each neighborhood of the input feature map.
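A corresponding sketch for one converted pooling layer, again assuming a single-channel input map; only the values at the masked entries of the dilated pooling kernel are gathered (second extracting unit), and either their mean (average pooling) or their maximum (max pooling) is taken (calculating unit). The function name and the "valid" boundary handling are assumptions of the sketch.

```python
import numpy as np

def pool_with_dilated_mask(X, kh, kw, d, mode="max"):
    """Forward pass of one converted pooling layer on a single-channel map X.
    Only values at the masked entries of the dilated pooling kernel are used."""
    H, w_in = X.shape
    out_h = H - (kh - 1) * d
    out_w = w_in - (kw - 1) * d
    # Second extracting unit: gather the values at the masked kernel entries.
    vals = np.stack([X[i * d:i * d + out_h, j * d:j * d + out_w]
                     for i in range(kh) for j in range(kw)], axis=0)
    # Calculating unit: mean for average pooling, max for max pooling.
    if mode == "avg":
        return vals.mean(axis=0)
    out = vals.max(axis=0)
    # For max pooling, the argmax indices would also be recorded here for use
    # during backward propagation (cf. the first transferring unit).
    return out

X = np.random.rand(8, 8)
print(pool_with_dilated_mask(X, 2, 2, d=3).shape)   # (5, 5)
```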
It should be understood that, in some embodiments, the forward propagator 20 may comprise a first extracting unit 21, a first vectorizing unit 22, a first convolution unit 23, a second extracting unit 24, and a calculating unit 25. It should be understood that, although one forward propagator 20/20’ is shown in Figs. 2 and 3, there may be more than one forward propagators 20/20’ in other embodiments.
Fig. 4 is a schematic diagram illustrating an exemplary chooser 40. As shown in Fig. 4, the chooser 40 may comprise a comparer 41, which is configured to compare a predicted label map generated in the forward propagator 20 with a ground-truth label map to obtain pixel-wise errors for the label map.
In some embodiments, the chooser 40 may further comprise a multiplier 42, which is configured to multiply each of the pixel-wise errors with a pixel-of-interest mask to generate a masked map for the errors.
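A minimal sketch of the chooser, assuming the pixel-wise error is a simple difference between the predicted and ground-truth label maps (the disclosure only specifies that the maps are compared) and that the pixel-of-interest mask is 1 at pixels of interest and 0 elsewhere.

```python
import numpy as np

def choose_errors(predicted_label_map, ground_truth_map, pixel_of_interest_mask):
    """Comparer 41: pixel-wise errors between prediction and ground truth.
    Multiplier 42: keep only the errors of pixels of interest."""
    pixel_errors = predicted_label_map - ground_truth_map      # comparer (assumed error measure)
    masked_errors = pixel_errors * pixel_of_interest_mask      # multiplier
    return masked_errors

pred = np.random.rand(4, 4)
gt = np.random.rand(4, 4)
mask = (np.random.rand(4, 4) > 0.5).astype(float)   # 1 at pixels of interest
print(choose_errors(pred, gt, mask))
```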
Fig. 5 is a schematic diagram illustrating an exemplary backward propagator 30. As shown in Fig. 5, the backward propagator 30 may comprise a third extracting unit 31, a second vectorizing unit 32, and a second convolution unit 33. The third extracting unit 31 is configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of images to be classified, the second vectorizing unit 32 is configured to vectorize the error map received from the chooser 40 or the error map of the next layer, and the second convolution unit 33 is configured to perform convolution on the feature values extracted by the third extracting unit and the error map vectorized by the second vectorizing unit 32 to calculate the gradients of the convolutional kernel for updating the convolutional kernel.
In some embodiments, the backward propagator 30 further comprises a third vectorizing unit 321, a fourth extracting unit 311, and a third convolution unit 331. The third vectorizing unit 321 may be configured to rotate the non-zero entries of the converted convolutional kernel by a certain degree and vectorize the rotated non-zero entries. The fourth extracting unit 311 may be configured to extract feature values specified by the rotated non-zero entries. The third convolution unit 331 is configured to perform convolution on the feature values extracted by the fourth extracting unit 311 and the non-zero entries vectorized by the third vectorizing unit 321 to generate an error map, which is propagated backward through the network layer by layer to update the convolutional kernels of former convolutional layers.
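A sketch of the backward pass through one converted convolution layer for a single channel. The kernel gradient is obtained by correlating the gathered input values with the error map of the next layer (third extracting unit, second vectorizing unit, second convolution unit), and the error map handed to the previous layer is obtained with the 180-degree-rotated kernel (third vectorizing unit, fourth extracting unit, third convolution unit). The explicit loops and the zero-padding are implementation choices of the sketch, not prescribed by the disclosure.

```python
import numpy as np

def conv_backward_dilated(X, W, delta, d):
    """Backward pass of one converted convolution layer (single channel).
    Returns the kernel gradient, the bias gradient and the error map for the previous layer."""
    kh, kw = W.shape
    out_h, out_w = delta.shape
    # Kernel gradient: correlate the gathered input values with the error map.
    dW = np.zeros_like(W)
    for i in range(kh):
        for j in range(kw):
            patch = X[i * d:i * d + out_h, j * d:j * d + out_w]
            dW[i, j] = np.sum(patch * delta)
    db = delta.sum()                       # bias gradient for this channel
    # Error map of the current layer: spread the (zero-padded) error map with the
    # 180-degree rotated kernel, keeping the same d-pixel spacing between entries.
    P = (kh - 1) * d
    W_rot = W[::-1, ::-1]
    delta_pad = np.pad(delta, P)
    dX = np.zeros_like(X)
    for i in range(kh):
        for j in range(kw):
            dX += W_rot[i, j] * delta_pad[i * d:i * d + X.shape[0],
                                          j * d:j * d + X.shape[1]]
    return dW, db, dX

X = np.random.rand(8, 8)
W = np.random.rand(3, 3)
delta = np.random.rand(4, 4)               # error map of the next layer
dW, db, dX = conv_backward_dilated(X, W, delta, d=2)
print(dW.shape, dX.shape)                   # (3, 3) (8, 8)
```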
Fig. 6 is a schematic diagram illustrating another exemplary backward propagator 30’. As shown in Fig. 6, the backward propagator 30’ may comprise a first transferring unit 34, and a first accumulating unit 35. The first transferring unit 34 may be configured to transfer the error value of the error map received from the chooser 40 or the error map of a next layer to a corresponding entry on the error map of a current layer, whose indices are recorded in the forward propagator 20, and the first accumulating unit 35 may be configured to accumulate the transferred error values at each entry of the error map of the current layer.
Fig. 7 is a schematic diagram illustrating yet another exemplary backward propagator 30”. As shown in Fig. 7, the backward propagator 30” may comprise a dividing unit 36, a second transferring unit 37, and a second accumulating unit 38. The dividing unit 36 may be configured to divide each error value on the error map received from the chooser 40 or the error map of a next layer by the number of masked entries in the pooling kernel. The second transferring unit 37 may be configured to transfer the divided error values back to the neighborhood on the error map of a current layer, whose indices are recorded in the forward propagator 20. The second accumulating unit 38 may be configured to accumulate the transferred error values at each entry of the error map of the current layer.
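A sketch of the two pooling backward passes. For max pooling the sketch recomputes the argmax from the input map instead of reading indices recorded during forward propagation, which selects the same entries; for average pooling each error value is divided by the number of masked entries and spread back over its neighborhood. The function names and the single-channel setting are assumptions.

```python
import numpy as np

def max_pool_backward_dilated(X, delta, kh, kw, d):
    """Max pooling backward: each error value is transferred to the entry that produced
    the maximum during forward propagation (first transferring unit), and the transferred
    values are accumulated (first accumulating unit)."""
    out_h, out_w = delta.shape
    dX = np.zeros_like(X)
    for p in range(out_h):
        for q in range(out_w):
            rows = [p + i * d for i in range(kh)]          # masked entries of this neighborhood
            cols = [q + j * d for j in range(kw)]
            vals = np.array([[X[r, c] for c in cols] for r in rows])
            i_max, j_max = np.unravel_index(np.argmax(vals), vals.shape)
            dX[rows[i_max], cols[j_max]] += delta[p, q]
    return dX

def avg_pool_backward_dilated(delta, in_shape, kh, kw, d):
    """Average pooling backward: divide each error value by the number of masked entries
    (dividing unit), transfer it back to the whole neighborhood (second transferring unit),
    and accumulate (second accumulating unit)."""
    out_h, out_w = delta.shape
    dX = np.zeros(in_shape, dtype=delta.dtype)
    share = delta / (kh * kw)
    for i in range(kh):
        for j in range(kw):
            dX[i * d:i * d + out_h, j * d:j * d + out_w] += share
    return dX

X = np.random.rand(8, 8)
delta = np.random.rand(5, 5)               # error map of the next layer (2x2 kernel, d=3)
print(max_pool_backward_dilated(X, delta, 2, 2, d=3).shape)
print(avg_pool_backward_dilated(delta, X.shape, 2, 2, d=3).shape)
```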
Fig. 8 is a schematic flowchart illustrating an exemplary method 200 for image classification. As shown, the method 200 may comprise the following steps. At step 210, a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers is retrieved. At step 220, an image is fed into the convolutional neural network to predict classes of all pixels in the image. At step 260, errors of pixels of interest are chosen and back-propagated through the converted convolutional neural network so as to update the convolutional kernels. At step 240, the convolutional kernels in the converted convolutional neural network are updated.
In the present embodiment, the convolutional neural network is retrieved or obtained by converting an original CNN that takes image patches as input to a converted CNN that is able to take a whole image as input. Particularly, some parameters, such as a convolutional kernel Wk of a convolution layer, a pooling kernel Pk of a pooling layer, and a stride dk of a layer, are converted by the steps shown in Fig. 9. In step 211, d and k are initially set to 1. In step 212, the type of the layer k is determined. If the layer is neither a convolution layer nor a pooling layer, the conversion process goes to step 214. If the layer is a convolution layer in step 212, the method proceeds to step 213 at which the convolutional kernel Wk is converted by inserting all-zero rows and columns to the convolutional kernel Wk such that every two neighboring entries are d pixels away from each other. If the layer is a pooling layer at step 212, the pooling kernel Pk is converted by inserting unmasked rows and columns to the kernel Pk at step 213 such that every two neighboring entries are d pixels away from each other. Then the process proceeds to step 214 at which whether the current layer is the last layer is determined. If yes, the process goes to step 215 at which the process ends. If not, k is increased by 1 (i.e., k=k+1), and the process goes back to step 212 to process the next layer.
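A sketch of the Fig. 9 conversion loop, reusing the hypothetical dilate_conv_kernel and dilate_pool_mask helpers from the earlier sketch. The text does not state how d evolves from layer to layer; the sketch assumes d is multiplied by each converted layer's original stride, which is consistent with all converted layers being run with stride 1 during propagation (steps 224 and 229).

```python
import numpy as np

def convert_cnn(layers):
    """Sketch of the Fig. 9 conversion loop. `layers` is a list of dicts with keys
    'type' ('conv' | 'pool' | other), 'kernel' and 'stride'."""
    d = 1                                     # step 211: d and k start at 1
    for layer in layers:                      # steps 212-214: process layer k
        if layer['type'] == 'conv':
            layer['kernel'] = dilate_conv_kernel(layer['kernel'], d)   # step 213
        elif layer['type'] == 'pool':
            kh, kw = layer['kernel'].shape
            layer['kernel'] = dilate_pool_mask(kh, kw, d)              # step 213
        # other layer types (e.g. non-linearity layers) are left unchanged
        if layer['type'] in ('conv', 'pool'):
            d *= layer['stride']              # assumption: d accumulates the original strides
            layer['stride'] = 1               # converted layers then run with stride 1
    return layers

layers = [
    {'type': 'conv', 'kernel': np.ones((3, 3)), 'stride': 2},
    {'type': 'pool', 'kernel': np.ones((2, 2)), 'stride': 2},
    {'type': 'conv', 'kernel': np.ones((3, 3)), 'stride': 1},
]
converted = convert_cnn(layers)
print([l['kernel'].shape for l in converted])   # [(3, 3), (3, 3), (9, 9)]
```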
Fig. 10 is an illustration of inserting all-zero rows and columns to the convolutional kernel Wk and pooling kernel Pk with d=2 and d=3, respectively.
Fig. 11 is a schematic flowchart illustrating the forward propagation according to one embodiment of the present application, which may be carried out by the forward propagator 20 mentioned above. As shown in Fig. 11, the forward propagation starts from step 221 at which an image is set as the input feature map and k is set to 1. Then the process goes to step 222 at which the type of the current layer (i.e., layer k) is determined. If the current layer is neither a convolution layer nor a pooling layer, for example, the layer is a non-linearity layer, the method goes to step 223 at which the operation is performed in its original way. If the current layer is a convolution layer with the converted convolutional kernel Wk and the bias vector bk, the process goes to step 224 at which the stride of the layer is set to 1, and then goes to step 225 at which feature values specified by non-zero entries in the kernel are extracted from each neighborhood in the input feature map Xk to create a matrix. Then, the process goes to step 226 at which the non-zero entries of the convolutional kernel Wk are vectorized to create a vector. Then the process goes to step 227 at which convolution is performed by matrix multiplication between the matrix and the vector created in the former steps. Then the process goes to step 228 at which the results are stored in the output feature map.
At step 222, if the current layer is a pooling layer with the converted pooling kernel Pk, the process goes to step 229 at which the stride of the layer is set to 1, and then goes to step 230 at which feature values specified by masked entries in the pooling kernel are extracted from each neighborhood in the input feature map Xk. Then, the process goes to step 231 at which a mean value for the average pooling layer or a max value for the max pooling layer is calculated from the feature values extracted at step 230. Similar to step 228, the calculation results are stored in the output feature map at step 232. Then the process proceeds to step 233 at which whether the current layer is the last layer is determined. If yes, the process goes to step 234 at which the output feature map of the last layer is output to generate a predicted label map for classifying all pixels in the image, and the process ends. If not, the process goes back to step 222 to process the next layer.
In some embodiments, the output feature map Xk+1 may be created by re-organizing the matrix multiplication result. In some embodiments, a bias value bk (i) is added to all the values for the ith channel of the output feature map Xk+1.
Fig. 12 is a schematic view illustrating performing convolution as matrix multiplication with the converted convolutional kernel.
Fig. 13 is a schematic flowchart illustrating the choosing step according to one embodiment of the present application, which may be carried out by the chooser 40 mentioned above. As shown in Fig. 13, a predicted label map generated in the forward propagator based on the output feature map is compared with a ground-truth label map to calculate pixel-wise errors for the label map in step 261, and then each of the pixel-wise errors is multiplied with a pixel-of-interest mask to generate a masked map for the errors at step 262. Finally, the masked error map is output at step 263.
Fig. 14 is a schematic flowchart illustrating the steps for backward propagating according to one embodiment of the present application, which may be carried out by the backward propagator 30 mentioned above.
As shown in Fig. 14, the backward propagation starts from step 241 at which the current error map is set as the input error map and k is set to K. Then the process goes to step 242 at which the type of the current layer (i.e., layer k) is determined. If the current layer is neither a convolution layer nor a pooling layer, for example, the layer is a non-linearity layer, the method goes to step 243 at which the operation is performed in its original way.
At step 242, if the layer k is a convolution layer with the converted convolutional kernel Wk and the bias vector bk, the process goes to step 244 at which feature values specified by non-zero entries in the converted convolutional kernel are extracted from each neighborhood in the input feature map Xk to create a matrix. Then the error map δk+1 is rotated by a certain degree, for example, 180 degrees, and vectorized to create a vector at step 245. After that, convolution is performed as matrix multiplication with the created matrix and the vector to calculate the gradients of the kernel Wk at step 246. Then the process goes to step 247 at which the results are stored as the gradients of the kernel. For the ith channel in the error map δk+1, all the error values in that error channel are summed up as the gradient of the bias bk (i).
At the same time as step 244, the kernel may be rotated by a certain degree, such as 180 degrees, and vectorized to create a vector at step 248. Then, for each neighborhood in the error map δk+1, error values specified by non-zero entries in the kernel are extracted from the error map to create a matrix at step 250. Then the process goes to step 251 at which convolution is performed as matrix multiplication with the created matrix and vector to calculate the error map of the previous layer. Finally, the results are stored in the error map δk of the previous layer, i.e., layer k-1.
At step 242, if the layer k is a max pooling layer with the modified pooling kernel Pk, then the process goes to step 249 at which, for each error value of the error map δk+1, the error value is transferred to the corresponding entry on the error map δk, whose indices are recorded during the forward propagation. Then the transferred error values are accumulated at each entry of δk. If layer k is an average pooling layer with the modified pooling kernel Pk, each error value on δk+1 is divided by the number of masked entries in the pooling kernel Pk at step 249. Then the divided values are transferred back to the neighborhood on the error map δk, whose indices are recorded during the forward propagation. Finally, the transferred error values are accumulated at each entry of δk.
After above steps, the process proceeds to step 253 at which whether the current layer is the first layer is determined. If yes, the process goes to step 254 at which the process ends. If not, k is decreased by 1 (i.e., k=k-1) , and the process goes back to step 242 to continue.
At the end of the process, the gradients of all convolutional kernels and bias  vectors are output.
Fig. 15 is a comparison of patch-by-patch scanning for CNN based pixel-wise classification and the advanced method disclosed in the present application. Compared with the conventional classification scheme, the present solution has the advantage of eliminating the redundant computation of forward and backward propagation in CNN based pixel-wise classification, and achieves a significant speedup.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (20)

  1. An apparatus for image classification, comprising:
    a converter configured to convert a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers, wherein the converter further comprises:
    a first converting unit configured to insert all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and
    a second converting unit configured to insert unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other, and
    a forward propagator configured to feed an image into the converted convolutional neural network to generate a predicted label map for the image classification.
  2. The apparatus of claim 1, further comprising:
    a backward propagator configured to update the convolutional kernel in the converted convolutional neural network.
  3. The apparatus of claim 2, further comprising:
    a chooser configured to choose errors of pixels of interest, the errors being back-propagated through the converted convolutional neural network so as to update the convolutional kernel.
  4. The apparatus of claim 1, wherein the forward propagator further comprises:
    a first extracting unit configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    a first vectorizing unit configured to vectorize the non-zero entries of the converted convolutional kernel; and
    a first convolution unit configured to perform convolution on the feature values extracted by the first extracting unit and the non-zero entries vectorized by the first vectorizing unit to generate a feature map to be outputted from the convolutional layers.
  5. The apparatus of claim 1, wherein the pooling layers comprise an average pooling layer or a max pooling layer, and
    wherein the forward propagator further comprises:
    a second extracting unit configured to extract feature values specified by masked entries in the converted pooling kernel from each neighborhood in an input feature of an image to be classified; and
    a calculating unit configured to calculate a mean value for the average pooling layer or a max value for the max pooling layer from the feature values extracted in the second extracting unit to generate a feature map to be outputted from the pooling layers.
  6. The apparatus of claim 3, wherein the chooser further comprises:
    a comparer configured to compare the predicted label map with a ground-truth label map to obtain pixel-wise errors for the predicted label map; and
    a multiplier configured to multiply each of the pixel-wise errors with a pixel-of-interest mask to generate a masked error map.
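The chooser of claim 6 amounts to a per-pixel comparison followed by masking. A minimal sketch, assuming the predicted and ground-truth label maps are arrays of the same shape, the pixel-of-interest mask is 0/1, and a simple difference stands in for whatever loss derivative the network actually uses:

```python
import numpy as np

def masked_error_map(predicted, ground_truth, poi_mask):
    """Keep pixel-wise errors only at pixels of interest; everything else is zeroed out."""
    pixel_error = predicted - ground_truth     # stand-in for the pixel-wise error of the label map
    return pixel_error * poi_mask              # masked error map

pred = np.array([[0.9, 0.2], [0.4, 0.7]])
gt   = np.array([[1.0, 0.0], [0.0, 1.0]])
poi  = np.array([[1, 0], [0, 1]])              # only the diagonal pixels are of interest
print(masked_error_map(pred, gt, poi))
```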
  7. The apparatus of claim 2, wherein the backward propagator further comprises:
    a third extracting unit configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    a second vectorizing unit configured to vectorize the error map received from the chooser; and
    a second convolution unit configured to perform convolution on the feature values extracted by the third extracting unit and the error map vectorized by the second vectorizing unit to calculate the gradients of the convolutional kernel for updating the convolutional kernel.
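One way to read the gradient computation of claim 7, as an illustrative sketch only: each non-zero kernel entry accumulates the product of the masked error at every output position with the feature value that entry saw at that position.

```python
import numpy as np

def conv_kernel_gradient(feature, dilated_kernel, error_map):
    """Gradient of the non-zero kernel entries, driven by the (masked) error map."""
    ys, xs = np.nonzero(dilated_kernel)            # positions of the non-zero entries
    grad = np.zeros(len(ys), dtype=float)          # one gradient value per non-zero entry
    oh, ow = error_map.shape                       # error map has the output-map size
    for i in range(oh):
        for j in range(ow):
            values = feature[i + ys, j + xs]       # extracted feature values
            grad += values * error_map[i, j]       # accumulate error-weighted features
    return grad                                    # ordered as returned by np.nonzero
```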
  8. The apparatus of claim 2, wherein the backward propagator further comprises:
    a third vectorizing unit configured to rotate the non-zero entries of the converted convolutional kernel by a certain degree and vectorize the rotated non-zero entries;
    a fourth extracting unit configured to extract feature values specified by the rotated non-zero entries; and
    a third convolution unit configured to perform convolution on the feature values extracted by the fourth extracting unit and the non-zero entries vectorized by the third vectorizing unit to generate an error map for a former layer.
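Claim 8 rotates the kernel "by a certain degree"; the sketch below assumes the usual 180-degree rotation used when propagating errors back to the preceding layer, and uses zero padding to realize a full-size convolution. This is an interpretation offered for illustration, not the claim itself.

```python
import numpy as np

def backprop_error_to_former_layer(error_map, dilated_kernel):
    """Error map of the former layer: full correlation with the 180-degree rotated kernel."""
    rotated = np.rot90(dilated_kernel, 2)                      # assumed 180-degree rotation
    kh, kw = rotated.shape
    padded = np.pad(error_map, ((kh - 1, kh - 1), (kw - 1, kw - 1)))  # zero padding
    ys, xs = np.nonzero(rotated)
    weights = rotated[ys, xs]                                  # vectorized rotated non-zero entries
    oh, ow = padded.shape[0] - kh + 1, padded.shape[1] - kw + 1
    former = np.zeros((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            former[i, j] = padded[i + ys, j + xs] @ weights    # convolution with the rotated kernel
    return former
```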
  9. The apparatus of claim 8, wherein the backward propagator further comprises:
    a first transferring unit configured to transfer the error value of the error map of a next layer to a corresponding entry on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded in the forward propagator; and
    a first accumulating unit configured to accumulate the transferred error values at each entry of the error map of a current layer.
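For claim 9 (max pooling backward), a sketch assuming the argmax indices recorded in the pooling sketch above: each error value of the next layer is routed back to the single masked entry that produced the maximum, and errors landing on the same entry are accumulated.

```python
import numpy as np

def max_pool_backward(next_error, argmax, mask, input_shape):
    """Transfer each error to the recorded max position and accumulate overlapping transfers."""
    ys, xs = np.nonzero(mask)                     # offsets of the masked entries
    prev_error = np.zeros(input_shape, dtype=float)
    oh, ow = next_error.shape
    for i in range(oh):
        for j in range(ow):
            k = argmax[i, j]                      # index recorded during the forward pass
            prev_error[i + ys[k], j + xs[k]] += next_error[i, j]
    return prev_error
```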
  10. The apparatus of claim 8, wherein the backward propagator further comprises:
    a dividing unit configured to divide each error value on the error map of a next layer by the number of masked entries in the pooling kernel;
    a second transferring unit configured to transfer the divided error values back to the neighborhood on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded in the forward propagator; and
    a second accumulating unit configured to accumulate the transferred error values at each entry of the error map of a current layer.
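And for claim 10 (average pooling backward), the analogous sketch: every error value is divided by the number of masked entries in the pooling kernel and spread back over the masked positions of its neighborhood, with overlaps accumulated.

```python
import numpy as np

def avg_pool_backward(next_error, mask, input_shape):
    """Spread each error, divided by the number of masked entries, back over its neighborhood."""
    ys, xs = np.nonzero(mask)
    n = len(ys)                                   # number of masked entries in the pooling kernel
    prev_error = np.zeros(input_shape, dtype=float)
    oh, ow = next_error.shape
    for i in range(oh):
        for j in range(ow):
            share = next_error[i, j] / n          # divided error value
            prev_error[i + ys, j + xs] += share   # transferred back and accumulated
    return prev_error
```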
  11. A method for image classification, comprising:
    converting a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers, wherein the step of converting comprises:
    inserting all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and
    inserting unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other, and
    feeding an image into the converted convolutional neural network to generate a predicted label map for the image classification.
  12. The method of claim 11, further comprising:
    updating the convolutional kernel in the converted convolutional neural network.
  13. The method of claim 12, further comprising:
    choosing errors of pixels of interest, and back-propagating the errors through the converted convolutional neural network so as to update the convolutional kernel.
  14. The method of claim 11, wherein the step of feeding comprises:
    extracting feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    vectorizing the non-zero entries of the converted convolutional kernel; and
    performing convolution on the feature values extracted in the step of extracting and the non-zero entries vectorized in the step of vectorizing to generate a feature map to be outputted from the convolutional layers.
  15. The method of claim 11, wherein the pooling layers comprise an average pooling layer or a max pooling layer, and
    wherein the step of feeding further comprises:
    extracting feature values specified by masked entries in the converted pooling kernel from each neighborhood in an input feature of an image to be classified; and
    calculating a mean value for the average pooling layer or a max value for the max pooling layer from the feature values extracted in the step of extracting to generate a feature map to be outputted from the pooling layers.
  16. The method of claim 13, wherein the step of choosing comprises:
    comparing the predicted label map with a ground-truth label map to obtain pixel-wise errors for the predicted label map; and
    multiplying each of the pixel-wise errors with a pixel-of-interest mask to generate a masked error map.
  17. The method of claim 12, wherein the step of updating comprises:
    extracting feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    vectorizing the error map obtained in the step of choosing errors; and
    performing convolution on the feature values extracted in the step of extracting and the error map vectorized in the step of vectorizing to calculate the gradients of the convolutional kernel for updating the convolutional kernel.
  18. The method of claim 12, wherein the step of updating further comprises:
    rotating the non-zero entries of the converted convolutional kernel by a certain degree and vectorizing the rotated non-zero entries;
    extracting feature values specified by the rotated non-zero entries; and
    performing convolution on the feature values extracted in the step of extracting and the non-zero entries vectorized in the step of vectorizing to generate an error map for a former layer.
  19. The method of claim 18, wherein the step of updating further comprises:
    transferring the error value of the error map of a next layer to a corresponding entry on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded during the step of feeding; and
    accumulating the transferred error values at each entry of the error map of a current layer.
  20. The method of claim 18, wherein the step of updating further comprises:
    dividing each error value on the error map of a next layer by the number of masked entries in the pooling kernel;
    transferring the divided error values back to the neighborhood on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded during the step of feeding; and
    accumulating the transferred error values at each entry of the error map of a current layer.
PCT/CN2014/001115 2014-12-10 2014-12-10 A method and a system for image classification WO2016090520A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480083906.2A CN107004142B (en) 2014-12-10 2014-12-10 Method and system for image classification
PCT/CN2014/001115 WO2016090520A1 (en) 2014-12-10 2014-12-10 A method and a system for image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/001115 WO2016090520A1 (en) 2014-12-10 2014-12-10 A method and a system for image classification

Publications (1)

Publication Number Publication Date
WO2016090520A1 (en) 2016-06-16

Family

ID=56106391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/001115 WO2016090520A1 (en) 2014-12-10 2014-12-10 A method and a system for image classification

Country Status (2)

Country Link
CN (1) CN107004142B (en)
WO (1) WO2016090520A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967484A (en) * 2017-11-14 2018-04-27 中国计量大学 A kind of image classification method based on multiresolution
CN108734269A (en) * 2017-04-18 2018-11-02 三星电子株式会社 Convolutional neural network and computer-implemented method for generating a classification of an input image
CN109886404A (en) * 2019-02-01 2019-06-14 东南大学 A convolutional neural network pooling method with staggered diamond perception
US10719737B2 (en) 2018-08-23 2020-07-21 Denso International America, Inc. Image classification system for resizing images to maintain aspect ratio information
EP3687152A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Learning method and learning device for pooling roi by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same
CN112651420A (en) * 2019-10-11 2021-04-13 百度(美国)有限责任公司 System and method for training image classification model and method for classifying images
JP2022500786A (en) * 2019-05-21 2022-01-04 深圳市商湯科技有限公司 Shenzhen Sensetime Technology Co., Ltd. Information processing methods and devices, electronic devices, storage media and computer programs

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726709A (en) * 2017-10-31 2019-05-07 优酷网络技术(北京)有限公司 Icon-based programming method and apparatus based on convolutional neural networks
CN109165666A (en) * 2018-07-05 2019-01-08 南京旷云科技有限公司 Multi-tag image classification method, device, equipment and storage medium
CN109102070B (en) * 2018-08-22 2020-11-24 地平线(上海)人工智能技术有限公司 Preprocessing method and device for convolutional neural network data
CN111797881A (en) * 2019-07-30 2020-10-20 华为技术有限公司 Image classification method and device
CN113850275A (en) * 2019-09-27 2021-12-28 深圳市商汤科技有限公司 Image processing method, image processing device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network
CN103984959A (en) * 2014-05-26 2014-08-13 中国科学院自动化研究所 Data-driven and task-driven image classification method
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7747070B2 (en) * 2005-08-31 2010-06-29 Microsoft Corporation Training convolutional neural networks on graphics processing units

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method
CN103984959A (en) * 2014-05-26 2014-08-13 中国科学院自动化研究所 Data-driven and task-driven image classification method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734269A (en) * 2017-04-18 2018-11-02 三星电子株式会社 Convolutional neural network and computer-implemented method for generating a classification of an input image
US11164071B2 (en) * 2017-04-18 2021-11-02 Samsung Electronics Co., Ltd. Method and apparatus for reducing computational complexity of convolutional neural networks
CN108734269B (en) * 2017-04-18 2024-01-09 三星电子株式会社 Convolutional neural network and computer-implemented method for generating a classification of an input image
CN107967484A (en) * 2017-11-14 2018-04-27 中国计量大学 A kind of image classification method based on multiresolution
US10719737B2 (en) 2018-08-23 2020-07-21 Denso International America, Inc. Image classification system for resizing images to maintain aspect ratio information
EP3687152A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Learning method and learning device for pooling roi by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same
CN109886404A (en) * 2019-02-01 2019-06-14 东南大学 A convolutional neural network pooling method with staggered diamond perception
CN109886404B (en) * 2019-02-01 2023-08-04 东南大学 Convolutional neural network pooling method for staggered diamond perception
JP2022500786A (en) * 2019-05-21 2022-01-04 深圳市商湯科技有限公司 Shenzhen Sensetime Technology Co., Ltd. Information processing methods and devices, electronic devices, storage media and computer programs
JP7140912B2 2022-09-21 深圳市商湯科技有限公司 Information processing method and device, electronic device, storage medium and computer program
CN112651420A (en) * 2019-10-11 2021-04-13 百度(美国)有限责任公司 System and method for training image classification model and method for classifying images

Also Published As

Publication number Publication date
CN107004142B (en) 2018-04-17
CN107004142A (en) 2017-08-01

Similar Documents

Publication Publication Date Title
WO2016090520A1 (en) A method and a system for image classification
US20220327355A1 (en) Sparsified Training of Convolutional Neural Networks
Sameen et al. Classification of very high resolution aerial photos using spectral-spatial convolutional neural networks
Can et al. Learning to segment medical images with scribble-supervision alone
Lin et al. Exploring context with deep structured models for semantic segmentation
US11403486B2 (en) Methods and systems for training convolutional neural network using built-in attention
Lin et al. Towards accurate binary convolutional neural network
US11461628B2 (en) Method for optimizing neural networks
Gadde et al. Superpixel convolutional networks using bilateral inceptions
US20180181867A1 (en) Artificial neural network class-based pruning
WO2016054779A1 (en) Spatial pyramid pooling networks for image processing
US20190286982A1 (en) Neural network apparatus, vehicle control system, decomposition device, and program
US20150278634A1 (en) Information processing apparatus and information processing method
EP3480689B1 (en) Hierarchical mantissa bit length selection for hardware implementation of deep neural network
EP3528181B1 (en) Processing method of neural network and apparatus using the processing method
EP3480743A1 (en) End-to-end data format selection for hardware implementation of deep neural network
US20200202514A1 (en) Image analyzing method and electrical device
CN109255382B (en) Neural network system, method and device for picture matching positioning
CN114419406A (en) Image change detection method, training method, device and computer equipment
CN113920382B (en) Cross-domain image classification method based on class consistency structured learning and related device
Suzuki et al. Superpixel convolution for segmentation
Barbu Robust contour tracking model using a variational level-set algorithm
Wei et al. Sparsifiner: Learning sparse instance-dependent attention for efficient vision transformers
Chen et al. Dual dictionary learning for mining a unified feature subspace between different hyperspectral image scenes
Hu et al. Unifying label propagation and graph sparsification for hyperspectral image classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14907619

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14907619

Country of ref document: EP

Kind code of ref document: A1